Beat Transcription

MIDI domain beat tracking.

Tracks beats and down beats in the symbolic domain and outputs the predicted beat positions in seconds. Re-implementation of the work [1] with Tensorflow 2.3.0.

Feature Storage Format

Processed features are stored in .hdf format, one file per piece.

Columns in the file are:

  • feature: Piano-roll-like representation with mixed information.

  • label:

References

[1] https://github.com/chuang76/symbolic-beat-tracking

App

class omnizart.beat.app.BeatTranscription(conf_path=None)

Bases: omnizart.base.BaseTranscription

Application class for beat tracking in MIDI domain.

Methods

generate_feature(dataset_path[, ...])

Extract the feature from the given dataset.

train(feature_folder[, model_name, ...])

Model training.

transcribe(input_audio[, model_path, output])

Transcribe beat positions in the given MIDI.

generate_feature(dataset_path, beat_settings=None, num_threads=8)

Extract the feature from the given dataset.

To train the model, the first step is to pre-process the data into feature representations. After downloading the dataset, use this function to generate the feature by giving the path of the stored dataset.

To specify the output path, modify the attribute beat_settings.dataset.feature_save_path. It defaults to a location under the folder where the dataset is stored, generating two sub-folders: train_feature and test_feature.

Parameters
dataset_path: Path

Path to the downloaded dataset.

beat_settings: BeatSettings

The configuration instance that holds all relevant settings for the life-cycle of building a model.

num_threads:

Number of threads for parallel feature extraction.
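
For example, a minimal usage sketch (the dataset path below is hypothetical):

from omnizart.beat.app import BeatTranscription

app = BeatTranscription()

# Extract features with 4 worker threads. Outputs go to the train_feature
# and test_feature folders under the dataset path unless
# beat_settings.dataset.feature_save_path is overridden.
app.generate_feature("./musicnet", num_threads=4)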

train(feature_folder, model_name=None, input_model_path=None, beat_settings=None)

Model training.

Train the model from scratch or continue training given a model checkpoint.

Parameters
feature_folder: Path

Path to the generated feature.

model_name: str

The name of the trained model. If not given, will default to the current timestamp.

input_model_path: Path

Specify the path to the model checkpoint in order to fine-tune the model.

beat_settings: BeatSettings

The configuration instance that holds all relevant settings for the life-cycle of model building.
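
A minimal training sketch, assuming features have already been generated (paths and model names below are hypothetical):

from omnizart.beat.app import BeatTranscription

app = BeatTranscription()

# Train from scratch on the extracted features.
app.train("./musicnet/train_feature", model_name="my-beat-blstm")

# Or continue training from an existing checkpoint (fine-tuning).
app.train(
    "./musicnet/train_feature",
    model_name="my-beat-blstm-finetuned",
    input_model_path="./checkpoints/beat/beat_blstm",
)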

transcribe(input_audio, model_path=None, output='./')

Transcribe beat positions in the given MIDI.

Tracks the beat in the symbolic domain. Outputs three files if the output path is given: <filename>.mid, <filename>_beat.csv, and <filename>_down_beat.csv, where filename is the name of the input MIDI without extension. The *.csv files record the beat positions in seconds.

Parameters
input_audio: Path

Path to the MIDI file (.mid).

model_path: Path

Path to the trained model or the supported transcription mode.

output: Path (optional)

Path for writing out the transcribed MIDI file. Defaults to the current path.

Returns
midi: pretty_midi.PrettyMIDI

The transcribed beat positions. There are two types of beat: beat and down beat, each recorded in an independent instrument track.

See also

omnizart.cli.beat.transcribe

CLI entry point of this function.
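
A minimal transcription sketch (the input path is hypothetical):

from omnizart.beat.app import BeatTranscription

app = BeatTranscription()

# Writes <filename>.mid, <filename>_beat.csv, and <filename>_down_beat.csv to
# the output folder, and also returns the result as a PrettyMIDI object.
midi = app.transcribe("./example.mid", output="./out")

# Beat and down beat positions live in two separate instrument tracks.
for inst in midi.instruments:
    print(inst.name, len(inst.notes))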

Dataset

class omnizart.beat.app.BeatDatasetLoader(feature_folder=None, feature_files=None, num_samples=100, slice_hop=1, feat_col_name='feature')

Bases: omnizart.base.BaseDatasetLoader

Data loader for training the model of beat.

Each feature slice will have an overlap size of timesteps//2.

Parameters
feature_folder: Path

Path to the extracted feature files, including *.hdf and *.pickle pairs, which refer to feature and label files, respectively.

feature_files: list[Path]

List of paths to *.hdf feature files. The corresponding label files should also be under the same folder.

num_samples: int

Total number of samples to yield.

timesteps: int

Time length of the feature.

Yields
feature:

Input features for model training.

label:

Corresponding labels.
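
A minimal construction sketch (the feature folder path is hypothetical):

from omnizart.beat.app import BeatDatasetLoader

# Yields (feature, label) pairs sliced from the stored *.hdf / *.pickle files,
# with consecutive slices overlapping by timesteps//2.
loader = BeatDatasetLoader(
    feature_folder="./musicnet/train_feature",
    num_samples=100,
)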

Inference

omnizart.beat.inference.inference(pred, beat_th=0.5, down_beat_th=0.5, min_dist=0.3, t_unit=0.1)

Infers the beat and down beat positions from the raw prediction values.

Parameters
pred: 2D numpy array

The prediction of the model.

beat_th: float

Threshold for beat channel.

down_beat_th: float

Threshold for down beat channel.

min_dist: float

Minimum distance between two beat positions in seconds.

t_unit: float

Time unit of each frame in seconds.

Returns
midi: pretty_midi.PrettyMIDI

Inferred beat positions recorded as MIDI notes. Information of beat and down beat are recorded in two different instrument tracks.
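
For intuition, a small NumPy sketch of the thresholding and minimum-distance idea; this illustrates the concept rather than the library's implementation, and the prediction values below are random:

import numpy as np

# Illustrative prediction with two channels: column 0 = beat, column 1 = down beat.
pred = np.random.rand(500, 2)

beat_th, min_dist, t_unit = 0.5, 0.3, 0.1
min_gap_frames = int(min_dist / t_unit)

# Frames whose beat probability exceeds the threshold.
candidates = np.where(pred[:, 0] > beat_th)[0]

# Keep only candidates that are at least min_gap_frames apart.
beat_frames = []
for idx in candidates:
    if not beat_frames or idx - beat_frames[-1] >= min_gap_frames:
        beat_frames.append(idx)

# Convert frame indices to beat positions in seconds.
beat_times = [idx * t_unit for idx in beat_frames]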

Loss Functions

omnizart.beat.app.weighted_binary_crossentropy(target, pred, down_beat_weight=5)

Wrapper around the binary cross-entropy loss that applies different weights to the channels.
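
A minimal sketch of the channel-weighted binary cross-entropy idea, assuming the last axis holds two channels (index 0 = beat, index 1 = down beat); an illustration rather than the exact implementation:

import tensorflow as tf

def weighted_bce_sketch(target, pred, down_beat_weight=5.0, eps=1e-7):
    pred = tf.clip_by_value(pred, eps, 1.0 - eps)
    target = tf.cast(target, pred.dtype)
    # Element-wise binary cross-entropy.
    bce = -(target * tf.math.log(pred) + (1.0 - target) * tf.math.log(1.0 - pred))
    # Emphasize the down beat channel, whose positive labels are much sparser.
    weights = tf.constant([1.0, down_beat_weight])
    return tf.reduce_mean(bce * weights)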

Features

omnizart.beat.features.extract_feature(labels, t_unit=0.01)

Extract feature representation required by beat module.

Parameters
labels: list[Label]

List of omnizart.base.Label instances.

t_unit: float

Time unit of each frame of the output representation.

Returns
feature: 2D numpy array

A piano roll like representation. Please refer to the original paper for more details.

omnizart.beat.features.extract_feature_from_midi(midi_path, t_unit=0.01)

Extract feature for beat module from MIDI file.

See also

omnizart.beat.features.extract_feature

The main feature extraction function of beat module.

omnizart.beat.features.extract_musicnet_feature(csv_path, t_unit=0.01)

Extract feature for beat module from MusicNet label file.

See also

omnizart.beat.features.extract_feature

The main feature extraction function of beat module.
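
A minimal extraction sketch, assuming the function returns the same 2D piano-roll-like representation as extract_feature (the input path is hypothetical):

from omnizart.beat.features import extract_feature_from_midi

feature = extract_feature_from_midi("./example.mid", t_unit=0.01)
print(feature.shape)  # frames x feature dimension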

omnizart.beat.features.extract_musicnet_label(csv_path, meter=4, t_unit=0.01, rounding=1, fade_out=15)

Label extraction function for MusicNet.

This function extracts the beat and down beat information given the symbolic representations of MusicNet.

Parameters
csv_path: Path

Path to the ground-truth file in CSV format.

meter: int

Meter of the piece. Defaults to the most common meter, 4. Since MusicNet does not record meter information, the value is always assumed to be 4, which is of course not always correct.

t_unit: float

Time unit of each frame in seconds.

rounding: int

Number of decimal places to which the start beat position is rounded.

fade_out: int

Augments the sparse positive labels in a fade-out manner, decaying the value from 1 down to 1/fade_out over a span of <fade_out> frames.
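
For intuition, a small NumPy sketch of the fade-out augmentation on a sparse label vector; this illustrates the idea described above, not the exact implementation:

import numpy as np

def fade_out_labels(label, fade_out=15):
    # label is a sparse binary vector of beat frames.
    out = label.astype(float)
    # Decay from 1 down to 1/fade_out over fade_out frames.
    decay = np.linspace(1.0, 1.0 / fade_out, fade_out)
    for idx in np.where(label == 1)[0]:
        end = min(idx + fade_out, len(out))
        out[idx:end] = np.maximum(out[idx:end], decay[: end - idx])
    return out

label = np.zeros(50, dtype=int)
label[[10, 30]] = 1
print(fade_out_labels(label)[10:25])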

Prediction

omnizart.beat.prediction.STEP_SIZE_RATIO = 0.5

Step size for slicing the feature. Ratio to the timesteps of the model input feature.

omnizart.beat.prediction.create_batches(feature, timesteps, batch_size=8)

Create a 4D output from the 2D feature for model prediction.

Create overlapped input features and collect the feature slices into batches. The overlap size is one quarter of the timesteps.

Parameters
feature: 2D numpy array

The feature representation for the model.

timesteps: int

Size of the input feature dimension.

batch_size: int

Batch size.

Returns
batches: 4D numpy array

Batched feature slices with dimension: batches x batch_size x timesteps x feat.
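
A shape-level sketch of the overlapped slicing and batching; illustrative only (the step size and feature dimension below are assumptions, and trailing frames that do not fill a full window are simply dropped here):

import numpy as np

def create_batches_sketch(feature, timesteps, batch_size=8, step_ratio=0.5):
    step = int(timesteps * step_ratio)
    n_frames, n_feat = feature.shape
    # Overlapped windows of length timesteps, advancing step frames each time.
    slices = [
        feature[start:start + timesteps]
        for start in range(0, n_frames - timesteps + 1, step)
    ]
    # Zero-pad the slice count to a multiple of batch_size, then group into batches.
    while len(slices) % batch_size != 0:
        slices.append(np.zeros((timesteps, n_feat)))
    return np.array(slices).reshape(-1, batch_size, timesteps, n_feat)

batches = create_batches_sketch(np.random.rand(3500, 100), timesteps=1000)
print(batches.shape)  # (1, 8, 1000, 100): batches x batch_size x timesteps x feat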

omnizart.beat.prediction.merge_batches(batch_pred)

Merge the batched predictions back to the 2D output.

omnizart.beat.prediction.predict(feature, model, timesteps=1000, batch_size=64)

Predict on the given feature with the model.

Parameters
feature: 2D numpy array

Input feature of the model.

model:

The pre-trained Tensorflow model.

timesteps: int

Size of the input feature dimension.

batch_size: int

Batch size for the model input.

Returns
pred: 2D numpy array

The predicted probabilities of beat and down beat positions.
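
A sketch of chaining the documented pieces into a prediction pipeline; the checkpoint and MIDI paths are hypothetical, and the keras-level model loading shown here is an assumption that may differ from the library's internal mechanism:

import tensorflow as tf

from omnizart.beat.features import extract_feature_from_midi
from omnizart.beat.inference import inference
from omnizart.beat.prediction import predict

# Load a trained beat model (may require custom_objects for the custom loss).
model = tf.keras.models.load_model("./checkpoints/beat/beat_blstm")

feature = extract_feature_from_midi("./example.mid", t_unit=0.01)  # frames x feat
pred = predict(feature, model, timesteps=1000, batch_size=64)      # beat / down beat probabilities
midi = inference(pred, beat_th=0.5, down_beat_th=0.3, min_dist=0.3, t_unit=0.1)
midi.write("beats_transcribed.mid")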

Settings

Below are the default settings for building the beat model. They are loaded by the class omnizart.setting_loaders.BeatSettings. The attribute names are converted to snake-case (e.g., HopSize -> hop_size). There is also a path transformation when applying the settings to the BeatSettings instance. For example, to access the attribute BatchSize defined at the yaml path General/Training/Settings/BatchSize, the corresponding attribute is BeatSettings.training.batch_size. The /Settings level is removed from all fields.

General:
    TranscriptionMode:
        Description: Mode of transcription by executing the `omnizart beat transcribe` command.
        Type: String
        Value: BLSTM
    CheckpointPath:
        Description: Path to the pre-trained models.
        Type: Map
        SubType: [String, String]
        Value:
            BLSTM: checkpoints/beat/beat_blstm
    Feature:
        Description: Default settings of feature extraction for beat transcription.
        Settings:
            TimeUnit:
                Description: Time unit of each frame in seconds.
                Type: Float
                Value: 0.01
    Dataset:
        Description: Settings of datasets.
        Settings:
            SavePath:
                Description: Path for storing the downloaded datasets.
                Type: String
                Value: ./
            FeatureSavePath:
                Description: Path for storing the extracted feature. Defaults to a path under the dataset folder.
                Type: String
                Value: +
    Model:
        Description: Default settings of training / testing the model.
        Settings:
            SavePrefix:
                Description: Prefix of the trained model's name to be saved.
                Type: String
                Value: beat
            SavePath:
                Description: Path to save the trained model.
                Type: String
                Value: ./checkpoints/beat
            ModelType:
                Description: One of 'blstm' or 'blstm_attn'.
                Type: String
                Value: blstm
            Timesteps:
                Description: Input length of the model.
                Type: Integer
                Value: 1000
            LstmHiddenDim:
                Description: Dimension of LSTM hidden layers.
                Type: Integer
                Value: 25
            NumLstmLayers:
                Description: Number of LSTM layers.
                Type: Integer
                Value: 2
            AttnHiddenDim:
                Description: Dimension of multi-head attention layers.
                Type: Integer
                Value: 256
    Inference:
        Description: Default settings when inferring beats.
        Settings:
            BeatThreshold:
                Description: Threshold that will be applied to clip the predicted beat values to either 0 or 1.
                Type: Float
                Value: 0.5
            DownBeatThreshold:
                Description: Same as above, but for down beat.
                Type: Float
                Value: 0.3
            MinDistance:
                Description: Minimum required distance between two beats in seconds.
                Type: Float
                Value: 0.3
    Training:
        Description: Hyper-parameters for training.
        Settings:
            Epoch:
                Description: Maximum number of epochs for training.
                Type: Integer
                Value: 10
            Steps:
                Description: Number of training steps for each epoch.
                Type: Integer
                Value: 1000
            ValSteps:
                Description: Number of validation steps after each training epoch.
                Type: Integer
                Value: 50
            BatchSize:
                Description: Batch size of each training step.
                Type: Integer
                Value: 64
            ValBatchSize:
                Description: Batch size of each validation step.
                Type: Integer
                Value: 64
            EarlyStop:
                Description: Terminate the training if the validation performance doesn't improve after n epochs.
                Type: Integer
                Value: 7
            InitLearningRate:
                Description: Initial learning rate.
                Type: Float
                Value: 0.001
            DownBeatWeight:
                Description: Weighting of down beat loss. Beat loss is always set to one.
                Type: Float
                Value: 5
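
A minimal sketch of how the YAML fields above map onto BeatSettings attributes, assuming the settings class can be constructed directly with its defaults:

from omnizart.setting_loaders import BeatSettings

settings = BeatSettings()

# General/Training/Settings/BatchSize  ->  settings.training.batch_size
print(settings.training.batch_size)  # 64
# General/Model/Settings/Timesteps    ->  settings.model.timesteps
print(settings.model.timesteps)      # 1000

# Override values before passing the instance to BeatTranscription methods.
settings.training.epoch = 20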