Chord Transcription

Chord transcription for both the MIDI and audio domains.

Re-implementation of the repository Tsung-Ping/Harmony-Transformer.

Feature Storage Format

Processed features are stored in the .hdf file format, one file per piece.

Columns in the file are:

  • chroma: the input feature for audio domain data.

  • chord: the first type of ground-truth label.

  • chord_change: the second type of ground-truth label.

  • tc: the tonal centroid feature (see omnizart.chord.features.compute_tonal_centroids).

  • sequence_len: the valid length of each sequence.

  • num_sequence: the number of sequences in the piece.
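
A minimal sketch of inspecting one of these files, assuming it was written through pandas' HDF support and holds a single object (the file name is a placeholder):

import pandas as pd

# "some_piece.hdf" is a hypothetical path to one processed piece.
df = pd.read_hdf("some_piece.hdf")
print(df.columns)            # chroma, chord, chord_change, tc, ...
print(df["chroma"].iloc[0])  # input feature of the first row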

References

The related publications and technical details can be found in [1] and [2].

[1] Tsung-Ping Chen and Li Su, “Harmony Transformer: Incorporating Chord Segmentation into Harmony Recognition,” International Society for Music Information Retrieval Conference (ISMIR), 2019.

[2] Tsung-Ping Chen and Li Su, “Functional Harmony Recognition with Multi-task Recurrent Neural Networks,” International Society for Music Information Retrieval Conference (ISMIR), 2018.

App

class omnizart.chord.app.ChordTranscription(conf_path=None)

Bases: omnizart.base.BaseTranscription

Application class for chord transcription.

Methods

generate_feature(dataset_path[, ...])

Extract features of the McGill BillBoard dataset.

get_model(settings)

Get the chord model.

train(feature_folder[, model_name, ...])

Model training.

transcribe(input_audio[, model_path, output])

Transcribe chords in the audio.

generate_feature(dataset_path, chord_settings=None, num_threads=4)

Extract features of the McGill BillBoard dataset.

There are three main features that will be used in the training:

  • chroma: input feature of the NN model

  • chord: the first type of the ground-truth

  • chord_change: the second type of the ground-truth

Both of the last two features are used for computing the training loss. During feature extraction, the feature data is stored as a numpy array with named fields, which makes it behave like a dict.
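
A minimal sketch of how such a named-field (structured) numpy array behaves; the field names follow the columns above, but the dtypes and shapes here are illustrative only:

import numpy as np

# Field shapes are illustrative; only the named-field access pattern matters.
feat = np.zeros(4, dtype=[("chroma", np.float32, (24,)), ("chord", np.int32)])
print(feat["chroma"].shape)  # (4, 24) -- columns are accessed by name, like a dict
print(feat["chord"])         # array([0, 0, 0, 0], dtype=int32)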

get_model(settings)

Get the chord model.

For a more comprehensive explanation of why this method exists, please refer to omnizart.base.BaseTranscription.get_model.

train(feature_folder, model_name=None, input_model_path=None, chord_settings=None)

Model training.

Train a new model or continue training from a pre-trained one.

Parameters
feature_folder: Path

Path to the generated feature.

model_name: str

The name of the trained model. If not given, will default to the current timestamp.

input_model_path: Path

Specify the path to a pre-trained model if you want to continue fine-tuning it.

chord_settings: ChordSettings

The configuration instance that holds all related settings for the life-cycle of building a model.
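
A minimal sketch of the overall flow from feature extraction to training; the paths and model name are placeholders:

from omnizart.chord.app import ChordTranscription

app = ChordTranscription()
# Extract features from a local copy of the dataset (path is hypothetical).
app.generate_feature("McGill-Billboard/")
# Train on the generated features; the model name is an arbitrary example.
app.train("McGill-Billboard/train_feature/", model_name="my-chord-model")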

transcribe(input_audio, model_path=None, output='./')

Transcribe chords in the audio.

This function transcribes the chord progression in the audio and outputs both MIDI and CSV files. The MIDI file is provided for quick validation by directly listening to the chords. The complete transcription results are listed in the CSV file, which contains each chord's name and its start and end times.

Parameters
input_audio: Path

Path to the raw audio file (.wav).

model_path: Path

Path to the trained model, or the name of a supported transcription mode.

output: Path (optional)

Path for writing out the transcribed MIDI file. Defaults to the current path.

Returns
midi: pretty_midi.PrettyMIDI

Transcribed chord progression with default chord-to-notes mappings.

See also

omnizart.cli.chord.transcribe

CLI entry point of this function.

omnizart.chord.inference

Records the default chord-to-notes mappings.
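
A minimal usage sketch; the audio path is a placeholder, and leaving model_path unset falls back to the default ChordV1 checkpoint (see Settings below):

from omnizart.chord.app import ChordTranscription

app = ChordTranscription()
midi = app.transcribe("song.wav", output="./")  # also writes out MIDI and CSV files
midi.write("song_chords.mid")  # the returned object is a pretty_midi.PrettyMIDI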

Feature

omnizart.chord.features.augment_feature(feature)

Feature augmentation.

Varies the pitch with 12 different shifts, one for each semitone.
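
A hedged sketch of the idea (not necessarily the library's exact implementation): transposing a chromagram amounts to rotating its 12 pitch-class bins.

import numpy as np

chroma = np.random.rand(100, 12)  # dummy [time, 12] chromagram
# Rolling along the pitch-class axis emulates transposition by `shift` semitones;
# the matching chord labels must be shifted by the same amount.
augmented = [np.roll(chroma, shift, axis=1) for shift in range(12)]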

omnizart.chord.features.compute_tonal_centroids(chromagram, filtering=True, sigma=8)

Compute tonal centroids of a chromagram with shape [time, 12]. When filtering is enabled, the centroids are smoothed along the time axis with the given sigma.
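
A minimal sketch of one common tonal-centroid definition (Harte et al., 2006), assuming that is the variant used here: each chroma frame is projected onto circles of fifths, minor thirds, and major thirds.

import numpy as np

def tonal_centroid(chroma_frame):
    """Project a 12-bin chroma frame onto a 6-D tonal centroid."""
    l = np.arange(12)
    phi = np.array([
        np.sin(l * 7 * np.pi / 6), np.cos(l * 7 * np.pi / 6),              # fifths
        np.sin(l * 3 * np.pi / 2), np.cos(l * 3 * np.pi / 2),              # minor thirds
        0.5 * np.sin(l * 2 * np.pi / 3), 0.5 * np.cos(l * 2 * np.pi / 3),  # major thirds
    ])
    norm = np.sum(np.abs(chroma_frame)) or 1.0  # L1-normalize; avoid division by zero
    return phi @ chroma_frame / norm

tc = tonal_centroid(np.random.rand(12))  # shape (6,)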

omnizart.chord.features.extract_feature_label(feat_path, lab_path, segment_width=21, segment_hop=5, num_steps=100)

Basic feature extraction block.

It chains multiple processing steps:

  • Feature augmentation

  • Feature segmentation

  • Feature reshaping

Parameters
feat_path: Path

Path to the raw feature folder.

lab_path: Path

Path to the corresponding label folder.

segment_width: int

Width of each frame after segmentation.

segment_hop: int

Hop size for processing each segment.

num_steps: int

Number of steps while reshaping the feature.

Returns
feature:

The processed feature.

omnizart.chord.features.load_feature(feat_path, label)

Load and parse the feature into the desired format.

omnizart.chord.features.load_label(lab_path)

Load and parse the label into the desired format for later processing.

omnizart.chord.features.reshape_feature(feature, num_steps=100)

Reshape the segmented feature into the final output of fixed-length sequences.
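
A hedged sketch of this step, assuming it groups segments into fixed-length sequences and zero-pads the tail:

import numpy as np

segments = np.random.rand(230, 21, 24)  # dummy segmented feature
num_steps = 100
pad = (-len(segments)) % num_steps  # pad so the segment count is a multiple of num_steps
padded = np.pad(segments, ((0, pad), (0, 0), (0, 0)))
sequences = padded.reshape(-1, num_steps, 21, 24)  # -> [num_sequence, num_steps, 21, 24]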

omnizart.chord.features.segment_feature(feature, segment_width=21, segment_hop=5)

Partition feature into segments.
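
A hedged sketch of the segmentation idea, assuming an odd segment_width and zero padding at the borders (not necessarily the library's exact implementation):

import numpy as np

def segment(feature, segment_width=21, segment_hop=5):
    half = segment_width // 2
    padded = np.pad(feature, ((half, half), (0, 0)))  # zero-pad along time
    centers = range(half, len(padded) - half, segment_hop)
    # One window of `segment_width` frames around every `segment_hop`-th frame.
    return np.stack([padded[c - half:c + half + 1] for c in centers])

windows = segment(np.random.rand(300, 24))  # -> shape (60, 21, 24)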

omnizart.chord.features.shift_chord(chord, shift)

Shift the chord label by the given number of semitones.

omnizart.chord.features.shift_chromagram(chromagram, shift)

Shift chord’s chromagram.

Dataset

class omnizart.chord.app.McGillDatasetLoader(feature_folder=None, feature_files=None, num_samples=100, slice_hop=1)

Bases: omnizart.base.BaseDatasetLoader

McGill BillBoard dataset loader.

The feature column name stored in the .hdf files differs slightly from other modules: it is chroma, not feature. Also, the returned label is a tuple of two different ground-truth labels to fit the training scenario.

Yields
feature:

Input feature for training the model.

label: tuple

gt_chord -> Ground-truth chord label. gt_chord_change -> Ground-truth chord change label.
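
A minimal sketch of consuming the loader, assuming it is iterable as the documented yields suggest; the folder path is a placeholder:

from omnizart.chord.app import McGillDatasetLoader

loader = McGillDatasetLoader(feature_folder="features/")
for feature, (gt_chord, gt_chord_change) in loader:
    # One input paired with two targets for the multi-task training loss.
    break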

Inference

omnizart.chord.inference.inference(chord_pred, t_unit, min_dura=0.1)

Infer the chord progression (chord names with start and end times) from the frame-level model predictions. Chords shorter than min_dura are merged into the preceding chord.

omnizart.chord.inference.write_csv(info, output='./chord.csv')

Write the transcription results to a CSV file.
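
A hedged sketch of the minimum-duration rule; the (name, start, end) tuple representation is an assumption for illustration:

def merge_short(segments, min_dura=0.1):
    merged = [segments[0]]
    for name, start, end in segments[1:]:
        if end - start < min_dura:
            prev_name, prev_start, _ = merged[-1]
            merged[-1] = (prev_name, prev_start, end)  # absorb into the previous chord
        else:
            merged.append((name, start, end))
    return merged

segs = [("C:maj", 0.0, 1.2), ("N", 1.2, 1.25), ("G:maj", 1.25, 2.0)]
print(merge_short(segs))  # [('C:maj', 0.0, 1.25), ('G:maj', 1.25, 2.0)]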

Settings

Below are the default settings for building the chord model. They are loaded by the class omnizart.setting_loaders.ChordSettings. The attribute names are converted to snake-case (e.g., HopSize -> hop_size). There is also a path transformation when applying the settings to the ChordSettings instance. For example, the attribute BatchSize defined at the yaml path General/Training/Settings/BatchSize corresponds to ChordSettings.training.batch_size; the /Settings level is removed from all paths.
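
A minimal sketch of the resulting attribute access; the attributes below are derived from the mapping rule above rather than verified individually:

from omnizart.setting_loaders import ChordSettings

settings = ChordSettings()
# General/Training/Settings/BatchSize -> settings.training.batch_size
print(settings.training.batch_size)    # 32
# General/Feature/Settings/SegmentWidth -> settings.feature.segment_width
print(settings.feature.segment_width)  # 21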

General:
    TranscriptionMode:
        Description: Mode of transcription used when executing the `omnizart chord transcribe` command.
        Type: String
        Value: ChordV1
    CheckpointPath:
        Description: Path to the pre-trained models.
        Type: Map
        SubType: [String, String]
        Value:
            ChordV1: checkpoints/chord/chord_v1
    Feature:
        Description: Default settings of feature extraction for chord transcription.
        Settings:
            SegmentWidth:
                Description: Width of segments. Each frame lasts 0.046 seconds, so each segment lasts around 0.5 seconds.
                Type: Integer
                Value: 21
            SegmentHop:
                Description: Hop size of the segment.
                Type: Integer
                Value: 5
            NumSteps:
                Description: Number of total steps. With the default settings, each sequence covers around 23 seconds (100 steps x 5-frame hop x 0.046 s per frame).
                Type: Integer
                Value: 100
    Dataset:
        Description: Settings of datasets.
        Settings:
            SavePath:
                Description: Path for storing the downloaded datasets.
                Type: String
                Value: ./
            FeatureSavePath:
                Description: Path for storing the extracted feature. Default to the path under the dataset folder.
                Type: String
                Value: +
    Model:
        Description: Default settings of training / testing the model.
        Settings:
            SavePrefix:
                Description: Prefix of the trained model's name to be saved.
                Type: String
                Value: chord
            SavePath:
                Description: Path to save the trained model.
                Type: String
                Value: ./checkpoints/chord
            NumEncAttnBlocks:
                Description: Number of attention blocks for encoder.
                Type: Integer
                Value: 2
            NumDecAttnBlocks:
                Description: Number of attention blocks for decoder.
                Type: Integer
                Value: 2
            FreqSize:
                Description: Size of the frequency axis of the input that the model sees.
                Type: Integer
                Value: 24
            EncInputEmbSize:
                Description: Embedding size of the encoder's input.
                Type: Integer
                Value: 512
            DecInputEmbSize:
                Description: Embedding size of the decoder's input.
                Type: Integer
                Value: 512
            DropoutRate:
                Description: Dropout rate of all dropout layers.
                Type: Float
                Value: 0.6
            AnnealingRate:
                Description: To be added...
                Type: Float
                Value: 1.1
    Inference:
        Description: Default settings when inferring chords.
        Settings:
            MinDura:
                Description: Minimum duration (in seconds) for each chord. If a chord is shorter than this, its duration is appended to the previous chord.
                Type: Float
                Value: 0.1
    Training:
        Description: Hyperparameters for training.
        Settings:
            Epoch:
                Description: Maximum number of epochs for training.
                Type: Integer
                Value: 10
            Steps:
                Description: Number of training steps for each epoch.
                Type: Integer
                Value: 1000
            ValSteps:
                Description: Number of validation steps after each training epoch.
                Type: Integer
                Value: 500
            BatchSize:
                Description: Batch size of each training step.
                Type: Integer
                Value: 32
            ValBatchSize:
                Description: Batch size of each validation step.
                Type: Integer
                Value: 32
            EarlyStop:
                Description: Terminate the training if the validation performance doesn't improve after n epochs.
                Type: Integer
                Value: 4
            InitLearningRate:
                Description: Initial learning rate.
                Type: Float
                Value: 0.0001
            LearningRateDecay:
                Description: Decaying rate of learning rate per epoch.
                Type: Float
                Value: 0.96