omnizart music

Lists the detailed available options of each sub-command.

transcribe

omnizart music transcribe

Transcribe a single audio file and output the result as a MIDI file.

This will output a MIDI file with the same name as the given audio, except the extension will be replaced with ‘.mid’.

Supported modes are: Piano, Stream, Pop

Example Usage
$ omnizart music transcribe \
    example.wav \
    --model-path path/to/model \
    --output example.mid
omnizart music transcribe [OPTIONS] INPUT_AUDIO

Options

-m, --model-path <model_path>

Path to the pre-trained model or the supported transcription mode.

-o, --output <output>

Path to output the prediction file (could be MIDI, CSV, …, etc.)

Default

./

Arguments

INPUT_AUDIO

Required argument
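
Since -m also accepts one of the supported mode names listed above (Piano, Stream, Pop), a minimal invocation could look like the following sketch; the audio filename here is just a placeholder:

$ omnizart music transcribe example.wav -m Piano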

generate-feature

omnizart music generate-feature

Extract the features of the whole dataset for training.

The command will try to infer the dataset type from the given dataset path.

Available datasets are:
* Maps: Piano solo performances (smaller)
* Maestro: Piano solo performances (larger)
* MusicNet: Classical music performances, with 11 classes of instruments
* Pop: Pop music, including various instruments, drums, and vocal.
omnizart music generate-feature [OPTIONS]

Options

-d, --dataset-path <dataset_path>

Required. Path to the downloaded dataset.

-o, --output-path <output_path>

Path for saving the extracted features. Defaults to a folder under the dataset path.

-n, --num-threads <num_threads>

Number of threads used for parallel feature extraction.

Default

4

-h, --harmonic

Whether to use the harmonic version of the feature.
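
As an illustrative sketch only, feature extraction for a downloaded dataset could be launched as below; the dataset and output paths are placeholders, and the thread count simply overrides the default of 4:

# dataset and output paths below are placeholders
$ omnizart music generate-feature \
    --dataset-path path/to/maestro \
    --output-path path/to/features \
    --num-threads 8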

train-model

omnizart music train-model

Train a new model or continue training from a pre-trained model.

omnizart music train-model [OPTIONS]

Options

-d, --feature-path <feature_path>

Required. Path to the folder of extracted features.

-m, --model-name <model_name>

Name for the output model (can be a path)

-i, --input-model <input_model>

If given, the training will continue to fine-tune the pre-trained model.

-e, --epochs <epochs>

Number of training epochs

-s, --steps <steps>

Number of training steps of each epoch

-vs, --val-steps <val_steps>

Number of validation steps of each epoch

-b, --batch-size <batch_size>

Batch size of each training step

-vb, --val-batch-size <val_batch_size>

Batch size of each validation step

--early-stop <early_stop>

Stop the training if validation accuracy does not improve over the given number of epochs.

-y, --model-type <model_type>

Type of the neural network model

Default

attn

Options

attn | aspp

-f, --feature-type <feature_type>

Determine the input feature types for training

Default

Spec, Ceps

Options

Spec | Ceps | GCoS

-l, --label-type <label_type>

Determine whether the output labels should be note-level (onset, duration) or stream-level (onset, duration, instrument)

Default

note-stream

Options

note | note-stream | pop-note-stream | frame | frame-stream

-n, --loss-function <loss_function>

Determine which loss function to use

Default

smooth

Options

focal | smooth | bce

-t, --timesteps <timesteps>

Time width of each input feature

Default

256
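
Putting the options together, a training run could be launched roughly as in the sketch below; the feature path and model name are placeholders, and the remaining flags only illustrate overriding the documented defaults:

# feature path and model name below are placeholders
$ omnizart music train-model \
    --feature-path path/to/features \
    --model-name my-music-model \
    --epochs 20 \
    --batch-size 16 \
    --model-type attn \
    --label-type note-stream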