Feature¶

Collection of functions for processing raw audio into the required input representations.

CFP¶

Author: Lisu

Maintainer: BreezeWhite

omnizart.feature.cfp.STFT(x, fr, fs, Hop, h)¶

omnizart.feature.cfp.cfp_filterbank(x, fr, fs, Hop, h, fc, tc, g, bin_per_octave)¶

omnizart.feature.cfp.extract_cfp(filename, down_fs=44100, **kwargs)¶

CFP feature extraction function.

Given the path to an audio file, returns its CFP feature. The computation is automatically parallelized to speed it up.

Parameters

filename: Path: Path to the audio.
hop: float: Hop size in seconds, with regard to the sampling rate.
win_size: int: Window size.
fr: float: Frequency resolution.
fc: float: Lowest start frequency.
tc: float: Reciprocal of the highest frequency bound (the upper bound is 1 / tc).
g: list[float]: Power factor of the output STFT results.
bin_per_octave: int: Number of bins in each octave.
down_fs: int: Resample to this sampling rate, if the loaded audio has a different value.
max_sample: int: Maximum number of frames to be processed in each computation. Reduce this value if you run out of memory.
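The role of max_sample can be illustrated with a minimal sketch. It assumes the frames are split into consecutive chunks of at most max_sample frames and that each chunk is handed to a worker. The helper names (chunk_ranges, process_chunk, process_in_chunks) are hypothetical, and the per-chunk work is a placeholder rather than omnizart's actual computation.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(num_frames, max_sample):
    """Split [0, num_frames) into consecutive (start, end) ranges of at most max_sample frames."""
    return [
        (start, min(start + max_sample, num_frames))
        for start in range(0, num_frames, max_sample)
    ]


def process_chunk(frame_range):
    """Placeholder for the per-chunk feature computation (hypothetical)."""
    start, end = frame_range
    return end - start  # only reports how many frames were handled


def process_in_chunks(num_frames, max_sample=2000, workers=2):
    """Process all chunks with a small worker pool and return per-chunk results in order."""
    ranges = chunk_ranges(num_frames, max_sample)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so results can be concatenated in time order.
        return list(pool.map(process_chunk, ranges))


print(chunk_ranges(4500, 2000))
```

A smaller max_sample yields more, shorter chunks, which lowers the peak memory needed per computation at the cost of more scheduling overhead.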

Returns

Z: Product of the spectrum and the cepstrum.
tfrL0: Spectrum of the audio.
tfrLF: Generalized Cepstrum of Spectrum (GCoS).
tfrLQ: Cepstrum of the audio.
cen_freq: Center frequencies corresponding to each feature.
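As a rough illustration of how fc, tc, and bin_per_octave could determine the center frequencies, the sketch below generates log-spaced frequencies starting at fc and stopping below 1 / tc. This spacing rule is an assumption made for illustration, the function name log_center_frequencies is hypothetical, and the default values are borrowed from the extract_patch_cfp signature.

```python
import math


def log_center_frequencies(fc=80.0, tc=0.001, bin_per_octave=48):
    """Return log-spaced center frequencies from fc up to (but excluding) 1 / tc."""
    stop_freq = 1.0 / tc  # tc is the reciprocal of the highest frequency bound
    max_bins = math.ceil(math.log2(stop_freq / fc)) * bin_per_octave
    freqs = []
    for i in range(max_bins):
        freq = fc * 2 ** (i / bin_per_octave)  # one bin is 1/bin_per_octave of an octave
        if freq >= stop_freq:
            break
        freqs.append(freq)
    return freqs


cen_freq = log_center_frequencies()
print(len(cen_freq), cen_freq[0], cen_freq[-1])
```

Under these assumptions, every bin_per_octave consecutive entries span exactly one octave, so cen_freq[48] is twice cen_freq[0].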

References

The CFP approach was first proposed in [1].

1: L. Su and Y. Yang, “Combining Spectral and Temporal Representations for Multipitch Estimation of Polyphonic Music,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015.

omnizart.feature.cfp.extract_patch_cfp(filename, patch_size=25, threshold=0.5, hop=0.02, win_size=2049, fr=2.0, fc=80.0, tc=0.001, g=[0.24, 0.6, 1], bin_per_octave=48, down_fs=16000, max_sample=2000)¶

Extract patch CFP features for the PatchCNN module.

Parameters

filename: Path: Path to the audio.
patch_size: int: Height and width of each feature patch.
threshold: float: Threshold for determining peaks.
hop: float: Hop size in seconds, with regard to the sampling rate.
win_size: int: Window size.
fr: float: Frequency resolution.
fc: float: Lowest start frequency.
tc: float: Reciprocal of the highest frequency bound (the upper bound is 1 / tc).
g: list[float]: Power factor of the output STFT results.
bin_per_octave: int: Number of bins in each octave.
down_fs: int: Resample to this sampling rate, if the loaded audio has a different value.
max_sample: int: Maximum number of frames to be processed in each computation. Reduce this value if you run out of memory.
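Because hop is given in seconds, its length in samples depends on the sampling rate. The following minimal sketch shows the conversion, assuming the hop is simply rounded to a whole number of samples. The helper names are hypothetical and edge handling at the start and end of the signal is ignored.

```python
def hop_in_samples(hop=0.02, fs=16000):
    """Convert a hop size in seconds to a whole number of samples at rate fs."""
    return round(hop * fs)


def num_frames(duration_sec, hop=0.02, fs=16000):
    """Approximate frame count for a signal, one frame per hop (edge handling ignored)."""
    total_samples = int(duration_sec * fs)
    return -(-total_samples // hop_in_samples(hop, fs))  # ceiling division


print(hop_in_samples())            # default hop at down_fs=16000
print(hop_in_samples(fs=44100))    # same hop at 44.1 kHz
print(num_frames(10.0))            # frames in a 10-second clip
```

With the defaults from the signature (hop=0.02, down_fs=16000), one hop corresponds to 320 samples.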

Returns

patch: 3D numpy array: Sequence of patch CFP features. The positions of the patches are inferred from the amplitude of the spectrogram.
mapping: 2D numpy array: Records the original frequency and time index of each patch, having dimension of len(patch) x 2.
Z: 2D numpy array: The original CFP feature. Dim: freq x time
cenf: list[float]: Records the corresponding center frequencies of the frequency dimension.
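The relationship between patch, mapping, and Z can be sketched as follows. This is a simplified illustration, not the library's implementation: it assumes a peak is a value above threshold that exceeds both of its frequency neighbors, that patch_size is odd, and that patches near the border are zero-padded. The function name extract_patches is hypothetical, and plain nested lists stand in for numpy arrays.

```python
def extract_patches(Z, patch_size=25, threshold=0.5):
    """Cut square patches centered on local peaks of Z (freq x time) above threshold."""
    n_freq, n_time = len(Z), len(Z[0])
    half = patch_size // 2  # assumes an odd patch_size
    patches, mapping = [], []
    for t in range(n_time):
        for f in range(1, n_freq - 1):
            value = Z[f][t]
            # A peak is above the threshold and larger than both frequency neighbors.
            if value > threshold and value > Z[f - 1][t] and value > Z[f + 1][t]:
                patch = [
                    [
                        Z[ff][tt] if 0 <= ff < n_freq and 0 <= tt < n_time else 0.0
                        for tt in range(t - half, t + half + 1)
                    ]
                    for ff in range(f - half, f + half + 1)
                ]
                patches.append(patch)
                mapping.append((f, t))  # original (frequency index, time index)
    return patches, mapping


# Tiny example: a 5 x 4 feature map with a single peak at freq=2, time=1.
Z = [[0.0] * 4 for _ in range(5)]
Z[2][1] = 1.0
patches, mapping = extract_patches(Z, patch_size=3, threshold=0.5)
print(mapping)
```

Each entry of mapping pairs with the patch at the same index, which matches the documented shape of len(patch) x 2.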

omnizart.feature.cfp.extract_vocal_cfp(filename, down_fs=16000, **kwargs)¶: Specialized CFP feature extraction for the vocal submodule.

omnizart.feature.cfp.freq_to_log_freq_mapping(tfr, f, fr, fc, tc, NumPerOct)¶

omnizart.feature.cfp.nonlinear_func(X, g, cutoff)¶

omnizart.feature.cfp.parallel_extract(x, samples, max_sample, fr, fs, Hop, h, fc, tc, g, bin_per_octave)¶

omnizart.feature.cfp.quef_to_log_freq_mapping(ceps, q, fs, fc, tc, NumPerOct)¶

omnizart.feature.cfp.spectral_flux(spec, invert=False, norm=True)¶

HCFP¶

omnizart.feature.hcfp.extract_hcfp(filename, hop=0.02, win_size=7939, fr=2.0, g=[0.24, 0.6, 1], bin_per_octave=48, down_fs=44100, max_sample=2000, harmonic_num=6)¶

omnizart.feature.hcfp.fetch_harmonic(data, cenf, ith_har, start_freq=27.5, num_per_octave=48, is_reverse=False)¶

CQT¶

omnizart.feature.cqt.extract_cqt(audio_path, sampling_rate=44100, lowest_note=16, note_num=120, a_hop=256, pad_sec=1)¶

Compute the constant-Q spectrogram of the audio, then normalize and log-scale it.

Parameters

audio_path: Path: Path to the input audio.
sampling_rate: int: Sampling rate the audio data is sampled at, should be DOWN_SAMPLE_TO_SAPMLING_RATE.
lowest_note: int: Lowest MIDI note number.
note_num: int: Number of total notes. The highest note number would thus be lowest_note + note_num.
a_hop: int: Hop size for computing CQT.
pad_sec: float: Length of the padding, in seconds, added to the beginning and the end of the raw audio data.

Returns

midi_gram: np.ndarray: Log-magnitude, L2-normalized constant-Q spectrogram of the input audio.
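To make the lowest_note and note_num parameters concrete, the sketch below converts MIDI note numbers to frequencies. It assumes standard twelve-tone equal temperament with A4 (note 69) at 440 Hz. The helper midi_to_hz is defined here for illustration and is not part of this module.

```python
def midi_to_hz(note):
    """Frequency in Hz of a MIDI note number, with A4 (note 69) tuned to 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


lowest_note, note_num = 16, 120  # defaults from the extract_cqt signature
highest_note = lowest_note + note_num

print(round(midi_to_hz(lowest_note), 2))   # lowest frequency covered by the CQT
print(highest_note)                        # highest note number, as documented
print(round(midi_to_hz(highest_note), 2))  # corresponding upper frequency
```

With the default values, the covered range spans ten octaves (120 semitones) above the lowest note.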

omnizart.feature.cqt.post_process_cqt(gram)¶

Normalize and log-scale a constant-Q spectrogram.

Parameters

gram: np.ndarray: Constant-Q spectrogram, constructed from librosa.cqt.

Returns

log_normalized_gram: np.ndarray: Log-magnitude, L2-normalized constant-Q spectrogram.
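The two steps named above, log-scaling and L2 normalization, can be sketched in plain Python. The decibel floor, the shift to non-negative values, and the choice to normalize each time frame (column) are assumptions made for illustration; the library's exact constants and axis layout may differ. The function name log_scale_and_normalize is hypothetical.

```python
import math


def log_scale_and_normalize(gram, amin=1e-6):
    """Log-scale magnitudes, then L2-normalize each time frame (column) of gram."""
    n_bins, n_frames = len(gram), len(gram[0])
    floor_db = 20.0 * math.log10(amin)
    # Log-magnitude in decibels with a floor to avoid log(0), shifted so the floor maps to 0.
    log_gram = [
        [20.0 * math.log10(max(abs(v), amin)) - floor_db for v in row]
        for row in gram
    ]
    out = [[0.0] * n_frames for _ in range(n_bins)]
    for t in range(n_frames):
        # L2 norm of the frame across all frequency bins.
        norm = math.sqrt(sum(log_gram[b][t] ** 2 for b in range(n_bins)))
        for b in range(n_bins):
            out[b][t] = log_gram[b][t] / norm if norm > 0 else 0.0
    return out


gram = [[1.0, 0.5], [0.1, 0.5]]  # 2 frequency bins x 2 frames
normalized = log_scale_and_normalize(gram)
print(normalized)
```

After this step every non-silent frame has unit Euclidean length, so frames can be compared independently of their overall loudness.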

Beat Tracking¶

class omnizart.feature.beat_for_drum.MadmomBeatTracking(num_threads=3)¶

Extract beat information with the madmom library.

Three different beat tracking methods are combined to produce a more stable beat tracking result.

Methods

process(audio_data)¶: Generate beat tracking results with multiple approaches.

omnizart.feature.beat_for_drum.extract_beat_with_madmom(audio_path, sampling_rate=44100)¶

Extract beat position (in seconds) of the audio.

Extract beats with a mixture of beat tracking techniques using madmom.

Parameters

audio_path: Path: Path to the target audio.
sampling_rate: int: Sampling rate to which the audio is resampled.

Returns

beat_arr: 1D numpy array: Contains beat positions in seconds.
audio_len_sec: float: Total length of the audio in seconds.

omnizart.feature.beat_for_drum.extract_mini_beat_from_audio_path(audio_path, sampling_rate=44100, mini_beat_div_n=32)¶: Wrapper for extracting mini beats from an audio path.

omnizart.feature.beat_for_drum.extract_mini_beat_from_beat_arr(beat_arr, audio_len_sec, mini_beat_div_n=32)¶

Extract mini beats from the beat array.

Further split each beat into shorter intervals, called mini beats, to increase the beat resolution. Linear interpolation is used to generate the mini beats.

Parameters

beat_arr: 1D numpy array: Beat array generated by extract_beat_with_madmom.
audio_len_sec: float: Total length of the audio in seconds.
mini_beat_div_n: int: Number of mini beats in a single 4/4 measure.

Returns

mini_beat_pos_t: 1D numpy array: Positions of mini beats in seconds.
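The linear interpolation described above can be sketched as follows. The sketch assumes four beats per measure, so mini_beat_div_n=32 gives 8 mini beats per beat, and it only covers the span between the first and last detected beat. How the library uses audio_len_sec to handle the audio before the first beat and after the last one is not shown. The function name interpolate_mini_beats is hypothetical, and a plain list stands in for the numpy array.

```python
def interpolate_mini_beats(beat_arr, mini_beat_div_n=32, beats_per_measure=4):
    """Linearly interpolate evenly spaced mini beats between consecutive beats."""
    per_beat = mini_beat_div_n // beats_per_measure  # mini beats within one beat
    mini_beats = []
    for start, end in zip(beat_arr[:-1], beat_arr[1:]):
        step = (end - start) / per_beat
        mini_beats.extend(start + i * step for i in range(per_beat))
    if beat_arr:
        mini_beats.append(beat_arr[-1])  # close the final interval
    return mini_beats


beats = [0.0, 1.0, 2.0]  # beat positions in seconds
mini_beat_pos_t = interpolate_mini_beats(beats)
print(len(mini_beat_pos_t), mini_beat_pos_t[:3])
```

Because each beat interval is divided independently, the mini beats follow local tempo changes rather than assuming a constant tempo.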