Feature

A collection of functions for processing raw audio into the input representations required by the models.

CFP

Author: Lisu

Maintainer: BreezeWhite

omnizart.feature.cfp.STFT(x, fr, fs, Hop, h)
omnizart.feature.cfp.cfp_filterbank(x, fr, fs, Hop, h, fc, tc, g, bin_per_octave)
omnizart.feature.cfp.extract_cfp(filename, down_fs=44100, **kwargs)

CFP feature extraction function.

Given the path to an audio file, returns the CFP feature. The feature is computed in parallel automatically to speed up processing.

Parameters
filename: Path

Path to the audio.

hop: float

Hop size in seconds, with regard to the sampling rate.

win_size: int

Window size.

fr: float

Frequency resolution.

fc: float

Lowest start frequency.

tc: float

Reciprocal of the highest frequency bound.

g: list[float]

Power factor of the output STFT results.

bin_per_octave: int

Number of bins in each octave.

down_fs: int

Resample to this sampling rate, if the loaded audio has a different value.

max_sample: int

Maximum number of frames to process in each computation. Reduce this value if you run out of memory.
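The hop and max_sample parameters can be illustrated with a little arithmetic. The following is a minimal sketch, not omnizart's implementation: it assumes the hop in seconds is turned into a sample count by multiplying by the sampling rate, and that frames are split into consecutive chunks of at most max_sample frames. The helper names hop_in_samples and chunk_ranges are hypothetical, and the values 0.02 and 2000 are borrowed from the extract_patch_cfp defaults listed in this document.

```python
def hop_in_samples(hop_sec, sampling_rate):
    """Convert a hop size in seconds to a whole number of samples."""
    return int(round(hop_sec * sampling_rate))


def chunk_ranges(num_frames, max_sample):
    """Split frame indices [0, num_frames) into chunks of at most max_sample frames."""
    return [
        (start, min(start + max_sample, num_frames))
        for start in range(0, num_frames, max_sample)
    ]


hop = hop_in_samples(0.02, 44100)   # 0.02 s at 44.1 kHz
ranges = chunk_ranges(4500, 2000)   # 4500 frames, at most 2000 per chunk
print(hop, ranges)
```

A smaller max_sample produces more, shorter chunks, which is why lowering it reduces peak memory use.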

Returns
Z

Multiplication of spectrum and cepstrum.

tfrL0

Spectrum of the audio.

tfrLF

Generalized Cepstrum of Spectrum (GCoS).

tfrLQ

Cepstrum of the audio.

cen_freq

Center frequencies corresponding to each feature.
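To make the description of Z concrete, here is a minimal sketch of an element-wise product of two freq x time matrices. It assumes the spectrum and cepstrum have already been mapped onto the same frequency grid, and it uses plain Python lists to stay dependency-free. The function name combine_spec_and_ceps is hypothetical and is not part of omnizart.

```python
def combine_spec_and_ceps(spec, ceps):
    """Element-wise product of two freq x time matrices of equal shape."""
    same_rows = len(spec) == len(ceps)
    if not same_rows or any(len(a) != len(b) for a, b in zip(spec, ceps)):
        raise ValueError("spectrum and cepstrum must have the same shape")
    return [
        [s * c for s, c in zip(spec_row, ceps_row)]
        for spec_row, ceps_row in zip(spec, ceps)
    ]


spec = [[0.0, 2.0], [1.0, 3.0], [4.0, 0.5]]   # toy spectrum, 3 bins x 2 frames
ceps = [[1.0, 0.0], [2.0, 1.0], [0.5, 2.0]]   # toy cepstrum on the same grid
Z = combine_spec_and_ceps(spec, ceps)
print(Z)
```

With a product, a bin stays large only where both representations have energy, so a value that is zero in either input is suppressed in Z.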

References

The CFP approach was first proposed in [1].

[1] L. Su and Y. Yang, “Combining Spectral and Temporal Representations for Multipitch Estimation of Polyphonic Music,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015.

omnizart.feature.cfp.extract_patch_cfp(filename, patch_size=25, threshold=0.5, hop=0.02, win_size=2049, fr=2.0, fc=80.0, tc=0.001, g=[0.24, 0.6, 1], bin_per_octave=48, down_fs=16000, max_sample=2000)

Extract patch CFP feature for PatchCNN module.

Parameters
filename: Path

Path to the audio.

patch_size: int

Height and width of each feature patch.

threshold: float

Threshold for determining peaks.

hop: float

Hop size in seconds, with regard to the sampling rate.

win_size: int

Window size.

fr: float

Frequency resolution.

fc: float

Lowest start frequency.

tc: float

Reciprocal of the highest frequency bound.

g: list[float]

Power factor of the output STFT results.

bin_per_octave: int

Number of bins in each octave.

down_fs: int

Resample to this sampling rate, if the loaded audio has a different value.

max_sample: int

Maximum number of frames to process in each computation. Reduce this value if you run out of memory.
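The fc, tc, and bin_per_octave parameters together define a log-frequency axis. The sketch below shows one plausible way such an axis could be built, assuming geometrically spaced center frequencies that start at fc and stop before 1 / tc. The function name log_center_freqs is hypothetical, and the exact bin boundaries used by omnizart may differ.

```python
def log_center_freqs(fc, tc, bin_per_octave):
    """Geometrically spaced center frequencies from fc up to (excluding) 1 / tc."""
    stop_freq = 1.0 / tc
    freqs = []
    i = 0
    while True:
        freq = fc * 2.0 ** (i / bin_per_octave)
        if freq >= stop_freq:
            break
        freqs.append(freq)
        i += 1
    return freqs


# Defaults taken from the extract_patch_cfp signature.
cen_freq = log_center_freqs(fc=80.0, tc=0.001, bin_per_octave=48)
print(len(cen_freq), cen_freq[0], cen_freq[48])
```

Under these assumptions, tc=0.001 corresponds to an upper bound of 1000 Hz, and moving 48 bins up the axis doubles the frequency.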

Returns
patch: 3D numpy array

Sequence of patch CFP features. The positions of the patches are inferred from the amplitude of the spectrogram.

mapping: 2D numpy array

Records the original frequency and time index of each patch, having dimension of len(patch) x 2.

Z: 2D numpy array

The original CFP feature. Dim: freq x time

cenf: list[float]

Records the corresponding center frequencies of the frequency dimension.
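The relationship between patch and mapping can be sketched as follows. This is an illustration under stated assumptions, not the method used by extract_patch_cfp: it treats a point as a peak when it exceeds threshold times the global maximum and both of its frequency neighbors, pads with zeros at the borders, and requires an odd patch_size. The function name extract_patches is hypothetical.

```python
def extract_patches(Z, patch_size=3, threshold=0.5):
    """Cut square patches around peaks of a freq x time matrix (assumed scheme)."""
    if patch_size % 2 == 0:
        raise ValueError("this sketch only supports odd patch sizes")
    n_freq, n_time = len(Z), len(Z[0])
    half = patch_size // 2
    peak_floor = threshold * max(max(row) for row in Z)

    def value(f, t):
        # Zero padding outside the matrix borders.
        return Z[f][t] if 0 <= f < n_freq and 0 <= t < n_time else 0.0

    patches, mapping = [], []
    for t in range(n_time):
        for f in range(n_freq):
            v = Z[f][t]
            # A peak exceeds the floor and both of its frequency neighbors.
            if v > peak_floor and v > value(f - 1, t) and v > value(f + 1, t):
                patches.append([
                    [value(f + df, t + dt) for dt in range(-half, half + 1)]
                    for df in range(-half, half + 1)
                ])
                mapping.append((f, t))  # (frequency index, time index)
    return patches, mapping


Z = [
    [0.0, 0.1, 0.0],
    [0.2, 1.0, 0.1],
    [0.0, 0.3, 0.9],
    [0.0, 0.0, 0.2],
]
patches, mapping = extract_patches(Z, patch_size=3, threshold=0.5)
print(mapping)
```

Each entry of mapping holds the frequency and time index of the patch at the same position in patches, which matches the documented len(patch) x 2 shape.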

omnizart.feature.cfp.extract_vocal_cfp(filename, down_fs=16000, **kwargs)

Specialized CFP feature extraction for vocal submodule.

omnizart.feature.cfp.freq_to_log_freq_mapping(tfr, f, fr, fc, tc, NumPerOct)
omnizart.feature.cfp.nonlinear_func(X, g, cutoff)
omnizart.feature.cfp.parallel_extract(x, samples, max_sample, fr, fs, Hop, h, fc, tc, g, bin_per_octave)
omnizart.feature.cfp.quef_to_log_freq_mapping(ceps, q, fs, fc, tc, NumPerOct)
omnizart.feature.cfp.spectral_flux(spec, invert=False, norm=True)

HCFP

omnizart.feature.hcfp.extract_hcfp(filename, hop=0.02, win_size=7939, fr=2.0, g=[0.24, 0.6, 1], bin_per_octave=48, down_fs=44100, max_sample=2000, harmonic_num=6)
omnizart.feature.hcfp.fetch_harmonic(data, cenf, ith_har, start_freq=27.5, num_per_octave=48, is_reverse=False)

CQT

omnizart.feature.cqt.extract_cqt(audio_path, sampling_rate=44100, lowest_note=16, note_num=120, a_hop=256, pad_sec=1)

Compute the constant-Q spectrogram of the audio, then normalize and log-scale it.

Parameters
audio_path: Path

Path to the input audio.

sampling_rate: int

Sampling rate of the audio data. Should be DOWN_SAMPLE_TO_SAPMLING_RATE.

lowest_note: int

Lowest MIDI note number.

note_num: int

Number of total notes. The highest note number would thus be lowest_note + note_num.

a_hop: int

Hop size for computing CQT.

pad_sec: float

Length of the padding added to the beginning and the end of the raw audio data, in seconds.

Returns
midi_gram: np.ndarray

Log-magnitude, L2-normalized constant-Q spectrogram of synthesized MIDI data.
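The default parameters can be put into physical units with the standard MIDI-to-frequency relation (A4, note 69, at 440 Hz). The sketch below is only arithmetic on the documented defaults; the helper midi_to_hz is hypothetical and is not an omnizart function, and the upper bound takes the documented lowest_note + note_num at face value.

```python
def midi_to_hz(note):
    """Frequency in Hz of a MIDI note number, with A4 (note 69) at 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


# Defaults taken from the extract_cqt signature.
lowest_note, note_num = 16, 120
sampling_rate, a_hop = 44100, 256

lowest_hz = midi_to_hz(lowest_note)              # lower edge of the CQT range
highest_hz = midi_to_hz(lowest_note + note_num)  # documented upper bound
frame_sec = a_hop / sampling_rate                # time step between CQT frames

print(round(lowest_hz, 2), round(highest_hz, 1), round(frame_sec * 1000, 2))
```

With these defaults the range starts near 20.6 Hz and its upper bound stays below the Nyquist frequency of 22050 Hz, while each frame advances by roughly 5.8 ms.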

omnizart.feature.cqt.post_process_cqt(gram)

Normalize and log-scale a Constant-Q spectrogram

Parameters
gram: np.ndarray

Constant-Q spectrogram, constructed from librosa.cqt.

Returns
log_normalized_gram: np.ndarray

Log-magnitude, L2-normalized constant-Q spectrogram.
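The two steps named here, log scaling and L2 normalization, can be sketched in plain Python. This is an illustration only: it assumes a frames x bins layout, a decibel-style log relative to the global peak, and normalization applied to each frame separately. None of these choices is confirmed by this document, and the function name log_scale_and_l2_normalize is hypothetical.

```python
import math


def log_scale_and_l2_normalize(gram, floor=1e-10):
    """Log-scale magnitudes relative to the peak, then L2-normalize each frame."""
    peak = max(max(abs(v) for v in frame) for frame in gram)
    ref = max(peak, floor)
    out = []
    for frame in gram:
        # Decibel-like log scaling; the floor avoids log(0).
        log_frame = [20.0 * math.log10(max(abs(v), floor) / ref) for v in frame]
        norm = math.sqrt(sum(v * v for v in log_frame))
        out.append([v / norm if norm > 0 else 0.0 for v in log_frame])
    return out


gram = [[1.0, 0.1, 0.01], [0.5, 0.5, 0.25]]   # toy magnitudes, frames x bins
normalized = log_scale_and_l2_normalize(gram)
print(normalized[0])
```

After this step every non-silent frame has unit Euclidean length, so frames are compared by spectral shape rather than by loudness.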

Beat Tracking

class omnizart.feature.beat_for_drum.MadmomBeatTracking(num_threads=3)

Extract beat information with madmom library.

Three different beat tracking methods are used together for producing a more stable beat tracking result.

Methods

process(audio_data)

Generate beat tracking results with multiple approaches.

omnizart.feature.beat_for_drum.extract_beat_with_madmom(audio_path, sampling_rate=44100)

Extract beat position (in seconds) of the audio.

Extracts beats with a mixture of beat tracking techniques using madmom.

Parameters
audio_path: Path

Path to the target audio

sampling_rate: int

Target sampling rate to resample the audio to.

Returns
beat_arr: 1D numpy array

Contains beat positions in seconds.

audio_len_sec: float

Total length of the audio in seconds.

omnizart.feature.beat_for_drum.extract_mini_beat_from_audio_path(audio_path, sampling_rate=44100, mini_beat_div_n=32)

Wrapper of extracting mini beats from audio path.

omnizart.feature.beat_for_drum.extract_mini_beat_from_beat_arr(beat_arr, audio_len_sec, mini_beat_div_n=32)

Extract mini beats from the beat array.

Further splits each beat into shorter intervals, called mini beats, to increase the beat resolution. The mini beats are generated with linear interpolation.

Parameters
beat_arr: 1D numpy array

Beat array generated by extract_beat_with_madmom.

audio_len_sec: float

Total length of the audio in seconds.

mini_beat_div_n: int

Number of mini beats in a single 4/4 measure.

Returns
mini_beat_pos_t: 1D numpy array

Positions of mini beats in seconds.
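The linear interpolation described above can be sketched as follows. This is a minimal illustration, not the library's code: it assumes four beats per 4/4 measure, so mini_beat_div_n=32 yields eight mini beats per beat, and it ignores audio_len_sec, which the real function also receives. The function name interpolate_mini_beats is hypothetical.

```python
def interpolate_mini_beats(beat_arr, mini_beat_div_n=32):
    """Linearly subdivide each beat interval into mini beats."""
    per_beat = mini_beat_div_n // 4   # a 4/4 measure holds four beats
    mini_beats = []
    for start, end in zip(beat_arr[:-1], beat_arr[1:]):
        step = (end - start) / per_beat
        mini_beats.extend(start + k * step for k in range(per_beat))
    if len(beat_arr) > 0:
        mini_beats.append(beat_arr[-1])  # close the grid on the final beat
    return mini_beats


beats = [0.0, 0.5, 1.0]               # beat positions in seconds
mini = interpolate_mini_beats(beats, mini_beat_div_n=32)
print(len(mini), mini[:3])
```

Because each interval is subdivided independently, the mini-beat spacing follows local tempo changes between consecutive beats.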