omnizart music

Lists the detailed available options of each sub-command.

transcribe

omnizart music transcribe

Transcribe a single audio file and output the result as a MIDI file.

This will output a MIDI file with the same name as the given audio, except the extension will be replaced with ‘.mid’.

Supported modes are: Piano, Stream, Pop

Example Usage
$ omnizart music transcribe \
    example.wav \
    --model-path path/to/model \
    --output example.mid
omnizart music transcribe [OPTIONS] INPUT_AUDIO

Options

-m, --model-path <model_path>

Path to the pre-trained model or the supported transcription mode.

-o, --output <output>

Path to output the prediction file (could be MIDI, CSV, …, etc.)

Default

./

Arguments

INPUT_AUDIO

Required argument
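
Since -m also accepts one of the supported mode names listed above (Piano, Stream, Pop), a minimal invocation could look like the following sketch; the audio filename here is just a placeholder:

$ omnizart music transcribe example.wav -m Piano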

generate-feature

omnizart music generate-feature

Extract the features of the whole dataset for training.

The command will try to infer the dataset type from the given dataset path.

Available datasets are:
* Maps: Piano solo performances (smaller)
* Maestro: Piano solo performances (larger)
* MusicNet: Classical music performances, with 11 classes of instruments
* Pop: Pop music, including various instruments, drums, and vocal.
omnizart music generate-feature [OPTIONS]

Options

-d, --dataset-path <dataset_path>

Required. Path to the downloaded dataset.

-o, --output-path <output_path>

Path for saving the extracted features. Defaults to a folder under the dataset path.

-n, --num-threads <num_threads>

Number of threads used for parallel feature extraction.

Default

4

-h, --harmonic

Whether to use the harmonic version of the feature.
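
As an illustrative sketch only, feature extraction for a downloaded dataset could be launched as below; the dataset and output paths are placeholders, and the thread count simply overrides the default of 4:

# dataset and output paths below are placeholders
$ omnizart music generate-feature \
    --dataset-path path/to/maestro \
    --output-path path/to/features \
    --num-threads 8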

train-model

omnizart music train-model

Train a new model or continue training from a pre-trained model.

omnizart music train-model [OPTIONS]

Options

-d, --feature-path <feature_path>

Required. Path to the folder of extracted features.

-m, --model-name <model_name>

Name for the output model (can be a path)

-i, --input-model <input_model>

If given, the training will continue to fine-tune the pre-trained model.

-e, --epochs <epochs>

Number of training epochs

-s, --steps <steps>

Number of training steps of each epoch

-vs, --val-steps <val_steps>

Number of validation steps of each epoch

-b, --batch-size <batch_size>

Batch size of each training step

-vb, --val-batch-size <val_batch_size>

Batch size of each validation step

--early-stop <early_stop>

Stop the training if validation accuracy does not improve over the given number of epochs.

-y, --model-type <model_type>

Type of the neural network model

Default

attn

Options

attn | aspp

-f, --feature-type <feature_type>

Determine the input feature types for training

Default

Spec, Ceps

Options

Spec | Ceps | GCoS

-l, --label-type <label_type>

Determine whether the output labels should be note-level (onset, duration) or stream-level (onset, duration, instrument)

Default

note-stream

Options

note | note-stream | pop-note-stream | frame | frame-stream

-n, --loss-function <loss_function>

Determine which loss function to use

Default

smooth

Options

focal | smooth | bce

-t, --timesteps <timesteps>

Time width of each input feature

Default

256
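
Putting the options together, a training run could be launched roughly as in the sketch below; the feature path and model name are placeholders, and the remaining flags only illustrate overriding the documented defaults:

# feature path and model name below are placeholders
$ omnizart music train-model \
    --feature-path path/to/features \
    --model-name my-music-model \
    --epochs 20 \
    --batch-size 16 \
    --model-type attn \
    --label-type note-stream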