Chord Transcription¶
Chord transcription for both the MIDI and audio domains.
A re-implementation of the repository Tsung-Ping/Harmony-Transformer.
Feature Storage Format¶
Processed features will be stored in .hdf file format, one file per piece.
Columns in the file are:
chroma: the input feature for audio-domain data.
chord: the first type of ground-truth label.
chord_change: the second type of ground-truth label.
tc: the tonal centroid feature (see compute_tonal_centroids below).
sequence_len: the length of each feature sequence.
num_sequence: the number of sequences in the piece.
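For reference, a minimal sketch of inspecting one of these files with h5py; the file name is a placeholder, and only the column names above come from this page.

```python
import h5py

# Hypothetical example: "some_piece.hdf" is a placeholder file name.
with h5py.File("some_piece.hdf", "r") as fin:
    for name in fin:                       # list the stored columns
        print(name, fin[name].shape)
    chroma = fin["chroma"][:]              # input feature (audio domain)
    chord = fin["chord"][:]                # first ground-truth label
    chord_change = fin["chord_change"][:]  # second ground-truth label
```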
References¶
Related publications and technical details can be found in [1] and [2].
[1] Tsung-Ping Chen and Li Su, “Harmony Transformer: Incorporating Chord Segmentation Into Harmony Recognition,” International Society for Music Information Retrieval Conference (ISMIR), 2019.
[2] Tsung-Ping Chen and Li Su, “Functional Harmony Recognition with Multi-task Recurrent Neural Networks,” International Society for Music Information Retrieval Conference (ISMIR), September 2018.
App¶
- class omnizart.chord.app.ChordTranscription(conf_path=None)¶
Bases: omnizart.base.BaseTranscription
Application class for chord transcription.
Methods
generate_feature(dataset_path[, ...]): Extract features of the McGill BillBoard dataset.
get_model(settings): Get the chord model.
train(feature_folder[, model_name, ...]): Model training.
transcribe(input_audio[, model_path, output]): Transcribe chords in the audio.
- generate_feature(dataset_path, chord_settings=None, num_threads=4)¶
Extract features of the McGill BillBoard dataset.
There are three main features that will be used in training:
chroma: input feature of the NN model
chord: the first type of ground-truth
chord_change: the second type of ground-truth
The last two features are both used for computing the training loss. During feature extraction, the feature data is stored as a numpy array with named fields, which makes it work like a dict.
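A minimal sketch of that named-field (structured) numpy array idea; the field names follow this page, but the dtypes and shapes are illustrative assumptions.

```python
import numpy as np

# Illustrative structured array; dtypes and shapes are assumptions.
records = np.zeros(
    10,
    dtype=[
        ("chroma", np.float32, (24,)),
        ("chord", np.int32),
        ("chord_change", np.int32),
    ],
)
print(records["chroma"].shape)  # (10, 24) -- dict-like access by field name
print(records[0]["chord"])      # a single record's chord label
```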
- get_model(settings)¶
Get the chord model.
For a more comprehensive explanation of why this method exists, please refer to omnizart.base.BaseTranscription.get_model.
- train(feature_folder, model_name=None, input_model_path=None, chord_settings=None)¶
Model training.
Train a new model or continue training from a pre-trained one.
- Parameters
- feature_folder: Path
Path to the generated feature.
- model_name: str
The name of the trained model. If not given, will default to the current timestamp.
- input_model_path: Path
Specify the path to a pre-trained model if you want to continue fine-tuning it.
- chord_settings: ChordSettings
The configuration instance that holds all the related settings for the life-cycle of building a model.
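A hedged usage sketch based on the signature above; the feature folder path and model name are placeholders.

```python
from omnizart.chord.app import ChordTranscription

# Usage sketch; paths and names are placeholders.
app = ChordTranscription()
app.train(
    "path/to/extracted/features",
    model_name="my_chord_model",
    input_model_path=None,  # set to a model path to fine-tune instead
)
```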
- transcribe(input_audio, model_path=None, output='./')¶
Transcribe chords in the audio.
This function transcribes the chord progression in the audio and outputs both MIDI and CSV files. The MIDI file is provided for quick validation by listening to the chords directly. The complete transcription results are listed in the CSV file, which contains each chord’s name along with its start and end times.
- Parameters
- input_audio: Path
Path to the raw audio file (.wav).
- model_path: Path
Path to the trained model or the supported transcription mode.
- output: Path (optional)
Path for writing out the transcribed MIDI file. Default to the current path.
- Returns
- midi: pretty_midi.PrettyMIDI
Transcribed chord progression with default chord-to-notes mappings.
See also
omnizart.cli.chord.transcribe
CLI entry point of this function.
omnizart.chord.inference
Records the default chord-to-notes mappings.
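A hedged usage sketch; the audio path is a placeholder, and "ChordV1" is the transcription mode listed in the Settings section below.

```python
from omnizart.chord.app import ChordTranscription

# Usage sketch; "example.wav" is a placeholder audio file.
app = ChordTranscription()
midi = app.transcribe("example.wav", model_path="ChordV1", output="./")
midi.write("chord_preview.mid")  # the return value is a pretty_midi.PrettyMIDI
```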
Feature¶
- omnizart.chord.features.augment_feature(feature)¶
Feature augmentation.
Varying the pitch with 12 different shifts.
- omnizart.chord.features.compute_tonal_centroids(chromagram, filtering=True, sigma=8)¶
Compute tonal centroids from a chromagram with shape [time, 12].
- omnizart.chord.features.extract_feature_label(feat_path, lab_path, segment_width=21, segment_hop=5, num_steps=100)¶
Basic feature extraction block.
Including multiple steps for processing the feature. Steps include:
Feature augmentation
Feature segmentation
Feature reshaping
- Parameters
- feat_path: Path
Path to the raw feature folder.
- lab_path: Path
Path to the corresponding label folder.
- segment_width: int
Width of each frame after segmentation.
- segment_hop: int
Hop size for processing each segment.
- num_steps: int
Number of steps while reshaping the feature.
- Returns
- feature:
Processed feature.
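A hedged usage sketch based on the documented signature; both folder paths are placeholders.

```python
from omnizart.chord import features

# Usage sketch; the folder paths are placeholders.
feature = features.extract_feature_label(
    "path/to/raw_features",
    "path/to/labels",
    segment_width=21,
    segment_hop=5,
    num_steps=100,
)
```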
- omnizart.chord.features.load_feature(feat_path, label)¶
Load and parse the feature into the desired format.
- omnizart.chord.features.load_label(lab_path)¶
Load and parse the label into the desired format for later processing.
- omnizart.chord.features.reshape_feature(feature, num_steps=100)¶
Reshape the feature into the final output.
- omnizart.chord.features.segment_feature(feature, segment_width=21, segment_hop=5)¶
Partition feature into segments.
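The segmentation step can be pictured as a sliding window over the time axis. Below is a minimal sketch under assumed zero padding at the boundaries; the library's exact implementation may differ.

```python
import numpy as np

# Illustrative sliding-window segmentation (the padding behavior is an
# assumption): one segment_width-frame window every segment_hop frames.
def segment_sketch(feature, segment_width=21, segment_hop=5):
    half = segment_width // 2
    padded = np.pad(feature, ((half, half), (0, 0)), mode="constant")
    return np.stack([
        padded[i:i + segment_width]
        for i in range(0, feature.shape[0], segment_hop)
    ])

chroma = np.random.rand(1000, 24)    # [time, freq] placeholder input
print(segment_sketch(chroma).shape)  # (200, 21, 24)
```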
- omnizart.chord.features.shift_chord(chord, shift)¶
Shift the chord label according to the given pitch shift.
- omnizart.chord.features.shift_chromagram(chromagram, shift)¶
Shift the chord’s chromagram according to the given pitch shift.
Dataset¶
- class omnizart.chord.app.McGillDatasetLoader(feature_folder=None, feature_files=None, num_samples=100, slice_hop=1)¶
Bases: omnizart.base.BaseDatasetLoader
McGill BillBoard dataset loader.
The feature column name stored in the .hdf files is slightly different from the other modules: the column is named chroma instead of feature. Also, the returned label is a tuple of two different ground-truth labels to fit the training scenario.
- Yields
- feature:
Input feature for training the model.
- label: tuple
gt_chord -> ground-truth chord label; gt_chord_change -> ground-truth chord change label.
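A hedged construction sketch based on the documented signature; the feature folder path is a placeholder, and how the loader is consumed (directly, or wrapped into the training pipeline) is left to the training code.

```python
from omnizart.chord.app import McGillDatasetLoader

# Construction sketch; the feature folder path is a placeholder.
loader = McGillDatasetLoader(
    feature_folder="path/to/extracted/features",
    num_samples=100,
    slice_hop=1,
)
```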
Inference¶
- omnizart.chord.inference.inference(chord_pred, t_unit, min_dura=0.1)¶
Infer chord events from the frame-level predictions. Chords shorter than min_dura (in seconds) are merged into the previous chord (see MinDura in the Settings section).
- omnizart.chord.inference.write_csv(info, output='./chord.csv')¶
Write the transcription results to a CSV file.
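An illustrative sketch of that minimum-duration rule; the event representation here is an assumption, not the library's own data structure.

```python
# Sketch of the MinDura rule: chord events shorter than min_dura are merged
# into the previous chord. The (name, start, end) tuple format is an assumption.
def merge_short_chords(events, min_dura=0.1):
    merged = []
    for name, start, end in events:
        if merged and end - start < min_dura:
            prev_name, prev_start, _ = merged[-1]
            merged[-1] = (prev_name, prev_start, end)  # extend the previous chord
        else:
            merged.append((name, start, end))
    return merged

print(merge_short_chords([("C:maj", 0.0, 1.0), ("G:maj", 1.0, 1.05), ("A:min", 1.05, 2.0)]))
# [('C:maj', 0.0, 1.05), ('A:min', 1.05, 2.0)]
```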
Settings¶
Below are the default settings for building the chord model. They are loaded by the class omnizart.setting_loaders.ChordSettings. The names of the attributes are converted to snake case (e.g., HopSize -> hop_size). There is also a path transformation process when applying the settings to the ChordSettings instance. For example, to access the attribute BatchSize defined at the YAML path General/Training/Settings/BatchSize, the corresponding attribute is ChordSettings.training.batch_size; the /Settings level is removed from all fields.
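A short sketch of that mapping, assuming ChordSettings can be instantiated with its defaults.

```python
from omnizart.setting_loaders import ChordSettings

# Sketch of the YAML-path-to-attribute mapping described above
# (assumes a default constructor).
settings = ChordSettings()
print(settings.training.batch_size)    # General/Training/Settings/BatchSize
print(settings.feature.segment_width)  # General/Feature/Settings/SegmentWidth
```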
General:
TranscriptionMode:
Description: Mode of transcription used when executing the `omnizart chord transcribe` command.
Type: String
Value: ChordV1
CheckpointPath:
Description: Path to the pre-trained models.
Type: Map
SubType: [String, String]
Value:
ChordV1: checkpoints/chord/chord_v1
Feature:
Description: Default settings of feature extraction for chord transcription.
Settings:
SegmentWidth:
Description: Width of segments. Each frame lasts 0.046 seconds, so each 21-frame segment spans about one second (roughly 0.5 seconds on either side of its center).
Type: Integer
Value: 21
SegmentHop:
Description: Hop size of the segment.
Type: Integer
Value: 5
NumSteps:
Description: Number of total steps. With the default settings, one sequence covers around 23 seconds.
Type: Integer
Value: 100
Dataset:
Description: Settings of datasets.
Settings:
SavePath:
Description: Path for storing the downloaded datasets.
Type: String
Value: ./
FeatureSavePath:
Description: Path for storing the extracted feature. Default to the path under the dataset folder.
Type: String
Value: +
Model:
Description: Default settings of training / testing the model.
Settings:
SavePrefix:
Description: Prefix of the trained model's name to be saved.
Type: String
Value: chord
SavePath:
Description: Path to save the trained model.
Type: String
Value: ./checkpoints/chord
NumEncAttnBlocks:
Description: Number of attention blocks for encoder.
Type: Integer
Value: 2
NumDecAttnBlocks:
Description: Number of attention blocks for decoder.
Type: Integer
Value: 2
FreqSize:
Description: Size of the frequency axis of the input feature.
Type: Integer
Value: 24
EncInputEmbSize:
Description: Embedding size of the encoder's input.
Type: Integer
Value: 512
DecInputEmbSize:
Description: Embedding size of the decoder's input.
Type: Integer
Value: 512
DropoutRate:
Description: Dropout rate of all dropout layers.
Type: Float
Value: 0.6
AnnealingRate:
Description: To be added...
Type: Float
Value: 1.1
Inference:
Description: Default settings when inferring chords.
Settings:
MinDura:
Description: Minimum duration (in seconds) for each chord. If a chord is shorter than this, its duration is merged into the previous chord.
Type: Float
Value: 0.1
Training:
Description: Hyperparameters for training.
Settings:
Epoch:
Description: Maximum number of epochs for training.
Type: Integer
Value: 10
Steps:
Description: Number of training steps for each epoch.
Type: Integer
Value: 1000
ValSteps:
Description: Number of validation steps after each training epoch.
Type: Integer
Value: 500
BatchSize:
Description: Batch size of each training step.
Type: Integer
Value: 32
ValBatchSize:
Description: Batch size of each validation step.
Type: Integer
Value: 32
EarlyStop:
Description: Terminate the training if the validation performance doesn't improve after n epochs.
Type: Integer
Value: 4
InitLearningRate:
Description: Initial learning rate.
Type: Float
Value: 0.0001
LearningRateDecay:
Description: Decaying rate of learning rate per epoch.
Type: Float
Value: 0.96