Feature
Collection of functions for processing raw audio into the input representations required by the models.
CFP
Author: Lisu
Maintainer: BreezeWhite
- omnizart.feature.cfp.STFT(x, fr, fs, Hop, h)
 
- omnizart.feature.cfp.cfp_filterbank(x, fr, fs, Hop, h, fc, tc, g, bin_per_octave)
 
- omnizart.feature.cfp.extract_cfp(filename, down_fs=44100, **kwargs)
 CFP feature extraction function.
Given the audio path, return the CFP feature. The feature is automatically computed in parallel to accelerate the computation.
- Parameters
 - filename: Path
 Path to the audio.
- hop: float
 Hop size in seconds, with regard to the sampling rate.
- win_size: int
 Window size in samples.
- fr: float
 Frequency resolution in Hz.
- fc: float
 Lowest frequency bound in Hz.
- tc: float
 Inverse of the highest frequency bound (in seconds); for example, tc=0.001 sets the highest frequency to 1/tc = 1000 Hz.
- g: list[float]
 Power factor of the output STFT results.
- bin_per_octave: int
 Number of bins in each octave.
- down_fs: int
 Resample to this sampling rate, if the loaded audio has a different value.
- max_sample: int
 Maximum number of frames to be processed in each computation. Reduce this value if you run out of memory.
- Returns
 - Z
 Product of the spectrum and the cepstrum.
- tfrL0
 Spectrum of the audio.
- tfrLF
 Generalized Cepstrum of Spectrum (GCoS).
- tfrLQ
 Cepstrum of the audio.
- cen_freq
 Center frequencies of the frequency bins of the feature.
References
The CFP approach was first proposed in [1].
- [1] L. Su and Y. Yang, “Combining Spectral and Temporal Representations for Multipitch Estimation of Polyphonic Music,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015.
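To make the idea in [1] concrete, here is a heavily simplified single-frame sketch in NumPy. It is not omnizart's implementation: it skips windowing, the log-frequency mapping, and the frequency cutoffs, and the helper names compress and cfp_salience are hypothetical. It rectifies and power-compresses the magnitude spectrum, takes the inverse FFT of the result to obtain a generalized cepstrum, compresses that again, and scores each candidate pitch f0 by multiplying the spectrum at f0 with the cepstrum at quefrency 1/f0.

```python
import numpy as np


def compress(x, gamma):
    """Rectify, then apply power-law compression (a sketch of the CFP nonlinearity)."""
    return np.power(np.maximum(x, 0.0), gamma)


def cfp_salience(frame, fs, candidates, gammas=(0.24, 0.6)):
    """Score candidate pitches (Hz) of one audio frame by spectrum x cepstrum."""
    n = len(frame)
    # Generalized spectrum: compressed magnitude spectrum, bin k <-> k * fs / n Hz.
    spec = compress(np.abs(np.fft.rfft(frame)), gammas[0])
    # Generalized cepstrum: inverse FFT of the compressed spectrum, index q <-> q / fs seconds.
    ceps = compress(np.fft.irfft(spec, n=n), gammas[1])
    scores = []
    for f0 in candidates:
        spec_val = spec[int(round(f0 * n / fs))]  # spectrum evaluated at f0
        ceps_val = ceps[int(round(fs / f0))]      # cepstrum evaluated at quefrency 1 / f0
        scores.append(spec_val * ceps_val)
    return np.array(scores)


# A 200 Hz tone with five equal-amplitude harmonics, 0.1 s sampled at 8 kHz.
fs, n, f0 = 8000, 800, 200.0
t = np.arange(n) / fs
frame = sum(np.cos(2 * np.pi * k * f0 * t) for k in range(1, 6))
candidates = [100.0, 200.0, 400.0]
scores = cfp_salience(frame, fs, candidates)
best = candidates[int(np.argmax(scores))]
```

In this toy case the spectrum alone cannot separate 200 Hz from its harmonic at 400 Hz, and the cepstrum alone cannot separate it from the subharmonic at 100 Hz; their product singles out 200 Hz.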
- omnizart.feature.cfp.extract_patch_cfp(filename, patch_size=25, threshold=0.5, hop=0.02, win_size=2049, fr=2.0, fc=80.0, tc=0.001, g=[0.24, 0.6, 1], bin_per_octave=48, down_fs=16000, max_sample=2000)
 Extract patch CFP features for the PatchCNN module.
- Parameters
 - filename: Path
 Path to the audio.
- patch_size: int
 Height and width of each feature patch.
- threshold: float
 Threshold for determining peaks.
- hop: float
 Hop size in seconds, with regard to the sampling rate.
- win_size: int
 Window size in samples.
- fr: float
 Frequency resolution in Hz.
- fc: float
 Lowest frequency bound in Hz.
- tc: float
 Inverse of the highest frequency bound (in seconds); for example, tc=0.001 sets the highest frequency to 1/tc = 1000 Hz.
- g: list[float]
 Power factor of the output STFT results.
- bin_per_octave: int
 Number of bins in each octave.
- down_fs: int
 Resample to this sampling rate, if the loaded audio has a different value.
- max_sample: int
 Maximum number of frames to be processed in each computation. Reduce this value if you run out of memory.
- Returns
 - patch: 3D numpy array
 Sequence of patch CFP features. The positions of the patches are inferred from the amplitude of the spectrogram.
- mapping: 2D numpy array
 Records the original frequency and time indices of each patch, with shape (len(patch), 2).
- Z: 2D numpy array
 The original CFP feature, with shape (freq, time).
- cenf: list[float]
 Records the corresponding center frequencies of the frequency dimension.
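The patch extraction described above can be sketched as follows. This is an illustration under stated assumptions, not omnizart's code: the helper extract_patches is hypothetical, a peak is taken to be a local maximum along the frequency axis of a frame whose value is at least threshold times that frame's maximum, and patch_size is assumed odd so that each patch is centered on its peak.

```python
import numpy as np


def extract_patches(z, patch_size=25, threshold=0.5):
    """Cut patch_size x patch_size patches around per-frame peaks of a (freq, time) feature.

    Returns (patches, mapping): patches has shape (num_patches, patch_size, patch_size),
    and mapping has shape (num_patches, 2) holding the (freq_idx, time_idx) of each peak.
    """
    half = patch_size // 2
    # Zero-pad both axes so patches near the borders keep their full size.
    padded = np.pad(z, half, mode="constant")
    patches, mapping = [], []
    for t in range(z.shape[1]):
        col = z[:, t]
        col_max = col.max()
        if col_max <= 0:
            continue  # silent frame: nothing to extract
        for f in range(1, len(col) - 1):
            is_local_max = col[f] > col[f - 1] and col[f] >= col[f + 1]
            if is_local_max and col[f] >= threshold * col_max:
                # Original (f, t) sits at (f + half, t + half) in the padded array,
                # so this slice is centered on the peak.
                patches.append(padded[f:f + patch_size, t:t + patch_size])
                mapping.append((f, t))
    patches = np.array(patches).reshape(-1, patch_size, patch_size)
    mapping = np.array(mapping, dtype=int).reshape(-1, 2)
    return patches, mapping


z = np.zeros((20, 8))
z[10, 5] = 1.0  # a single peak at frequency bin 10, frame 5
patches, mapping = extract_patches(z, patch_size=5, threshold=0.5)
```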
- omnizart.feature.cfp.extract_vocal_cfp(filename, down_fs=16000, **kwargs)
 Specialized CFP feature extraction for vocal submodule.
- omnizart.feature.cfp.freq_to_log_freq_mapping(tfr, f, fr, fc, tc, NumPerOct)
 
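The log-frequency axis of the CFP features is controlled by fc, tc, and bin_per_octave (NumPerOct). A minimal sketch of how such center frequencies can be laid out, assuming geometrically spaced bins that start at fc, use bin_per_octave bins per octave, and stop below 1/tc; the helper log_center_freqs is hypothetical and not part of omnizart.

```python
def log_center_freqs(fc=80.0, tc=0.001, bin_per_octave=48):
    """Geometrically spaced center frequencies from fc (Hz) up to, but excluding, 1 / tc (Hz)."""
    stop_freq = 1.0 / tc
    freqs = []
    i = 0
    while True:
        # Each bin lies 1/bin_per_octave of an octave above the previous one.
        freq = fc * 2.0 ** (i / bin_per_octave)
        if freq >= stop_freq:
            break
        freqs.append(freq)
        i += 1
    return freqs


# Defaults of extract_patch_cfp: fc=80 Hz, tc=0.001 (1000 Hz), 48 bins per octave.
cen_freq = log_center_freqs()
```

With these defaults the axis spans about 3.6 octaves, from 80 Hz to roughly 987 Hz, in 175 bins.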
- omnizart.feature.cfp.nonlinear_func(X, g, cutoff)
 
- omnizart.feature.cfp.parallel_extract(x, samples, max_sample, fr, fs, Hop, h, fc, tc, g, bin_per_octave)
 
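extract_cfp processes the feature in parallel, with max_sample bounding the number of frames handled in each computation. The general chunking pattern can be sketched with the standard library as follows. The helpers chunk_ranges and map_in_chunks are hypothetical, and the thread pool is an illustrative assumption, not a description of how parallel_extract distributes its work.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(num_frames, max_sample):
    """Split frame indices [0, num_frames) into consecutive ranges of at most max_sample frames."""
    return [range(start, min(start + max_sample, num_frames))
            for start in range(0, num_frames, max_sample)]


def map_in_chunks(frame_fn, num_frames, max_sample, workers=4):
    """Apply frame_fn to every frame index, one chunk per task, preserving frame order."""
    chunks = chunk_ranges(num_frames, max_sample)

    def run_chunk(indices):
        # Each task handles at most max_sample frames.
        return [frame_fn(i) for i in indices]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order of `chunks`, so concatenation keeps frame order.
        per_chunk = list(pool.map(run_chunk, chunks))
    return [value for chunk in per_chunk for value in chunk]


sizes = [len(r) for r in chunk_ranges(4500, 2000)]  # [2000, 2000, 500]
```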
- omnizart.feature.cfp.quef_to_log_freq_mapping(ceps, q, fs, fc, tc, NumPerOct)
 
- omnizart.feature.cfp.spectral_flux(spec, invert=False, norm=True)
 
HCFP
- omnizart.feature.hcfp.extract_hcfp(filename, hop=0.02, win_size=7939, fr=2.0, g=[0.24, 0.6, 1], bin_per_octave=48, down_fs=44100, max_sample=2000, harmonic_num=6)
 
- omnizart.feature.hcfp.fetch_harmonic(data, cenf, ith_har, start_freq=27.5, num_per_octave=48, is_reverse=False)
 
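On a log-frequency axis with num_per_octave bins per octave, the i-th harmonic of a bin lies a fixed number of bins above it, namely num_per_octave * log2(i), independent of the bin's frequency. The sketch below computes these offsets for the default harmonic_num and num_per_octave; the helper harmonic_bin_offsets is hypothetical and is not claimed to be how fetch_harmonic is implemented.

```python
import math


def harmonic_bin_offsets(harmonic_num=6, num_per_octave=48):
    """Bin offsets of harmonics 1..harmonic_num on a log-frequency axis."""
    # Harmonic i has frequency i * f, which is log2(i) octaves above f.
    return [round(num_per_octave * math.log2(i)) for i in range(1, harmonic_num + 1)]


offsets = harmonic_bin_offsets()  # [0, 48, 76, 96, 111, 124]
```

Shifting a feature down by one of these offsets along the frequency axis aligns the energy at i * f with the bin of f, which is one way harmonics can be stacked.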
CQT
- omnizart.feature.cqt.extract_cqt(audio_path, sampling_rate=44100, lowest_note=16, note_num=120, a_hop=256, pad_sec=1)
 Compute the constant-Q spectrogram of the given audio, then normalize and log-scale it.
- Parameters
 - audio_path: Path
 Path to the input audio.
- sampling_rate: int
 Sampling rate of the audio data; should be DOWN_SAMPLE_TO_SAPMLING_RATE.
- lowest_note: int
 Lowest MIDI note number.
- note_num: int
 Total number of notes. The highest note number would thus be lowest_note + note_num.
- a_hop: int
 Hop size, in samples, for computing the CQT.
- pad_sec: float
 Length of the padding, in seconds, added to the beginning and the end of the raw audio data.
- Returns
 - midi_gram: np.ndarray
 Log-magnitude, L2-normalized constant-Q spectrogram of the input audio.
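The extract_cqt parameters translate into physical quantities with standard MIDI and framing arithmetic. The sketch below assumes standard MIDI tuning (note 69 = A4 = 440 Hz) and that the lowest CQT bin sits at lowest_note; cqt_settings is a hypothetical helper, not part of omnizart.

```python
def midi_to_hz(note):
    """Standard MIDI-to-frequency conversion (A4 = MIDI 69 = 440 Hz, 12 notes per octave)."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


def cqt_settings(sampling_rate=44100, lowest_note=16, a_hop=256, pad_sec=1):
    """Derive the lowest CQT frequency, frame rate, and padding length from extract_cqt-style arguments."""
    return {
        "fmin_hz": midi_to_hz(lowest_note),           # frequency of the lowest CQT bin
        "frames_per_sec": sampling_rate / a_hop,      # one CQT frame every a_hop samples
        "pad_samples": int(pad_sec * sampling_rate),  # samples added to each end of the audio
    }


settings = cqt_settings()
# fmin_hz is about 20.60 Hz (MIDI 16 is E0), frames_per_sec == 172.265625, pad_samples == 44100
```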
- omnizart.feature.cqt.post_process_cqt(gram)
 Normalize and log-scale a constant-Q spectrogram.
- Parameters
 - gram: np.ndarray
 Constant-Q spectrogram, constructed from librosa.cqt.
- Returns
 - log_normalized_gram: np.ndarray
 Log-magnitude, L2-normalized constant-Q spectrogram.
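A NumPy sketch of the log-scaling and L2-normalization step. It assumes the magnitude is converted to decibels relative to its maximum and that each time frame (a row after transposing to time x frequency) is scaled to unit L2 norm. The exact order, reference level, and output orientation used by post_process_cqt are not stated above, so treat log_normalize_cqt as a hypothetical illustration.

```python
import numpy as np


def log_normalize_cqt(gram, eps=1e-10):
    """Log-scale a (freq, time) CQT magnitude, then L2-normalize each time frame."""
    mag = np.abs(gram)
    # Decibels relative to the loudest bin; eps avoids log(0).
    log_gram = 20.0 * np.log10(np.maximum(mag, eps) / max(mag.max(), eps))
    frames = log_gram.T  # rows are time frames
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    return frames / np.maximum(norms, eps)  # all-zero frames stay zero


rng = np.random.default_rng(0)
gram = np.abs(rng.normal(size=(84, 10))) + 1e-3  # stand-in for the magnitude of librosa.cqt output
out = log_normalize_cqt(gram)
```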
Beat Tracking
- class omnizart.feature.beat_for_drum.MadmomBeatTracking(num_threads=3)
 Extract beat information with the madmom library.
Three different beat tracking methods are combined to produce a more stable beat tracking result.
Methods
process(audio_data): Generate beat tracking results with multiple approaches.
- process(audio_data)
 Generate beat tracking results with multiple approaches.
- omnizart.feature.beat_for_drum.extract_beat_with_madmom(audio_path, sampling_rate=44100)
 Extract beat positions (in seconds) from the audio.
Beats are extracted with a mixture of beat tracking techniques provided by madmom.
- Parameters
 - audio_path: Path
 Path to the target audio.
- sampling_rate: int
 Sampling rate to resample the audio to.
- Returns
 - beat_arr: 1D numpy array
 Contains beat positions in seconds.
- audio_len_sec: float
 Total length of the audio in seconds.
- omnizart.feature.beat_for_drum.extract_mini_beat_from_audio_path(audio_path, sampling_rate=44100, mini_beat_div_n=32)
 Wrapper that extracts mini beats directly from an audio path.
- omnizart.feature.beat_for_drum.extract_mini_beat_from_beat_arr(beat_arr, audio_len_sec, mini_beat_div_n=32)
 Extract mini beats from the beat array.
Further split each beat into shorter intervals, called mini beats, to increase the beat resolution. The mini beats are generated by linear interpolation.
- Parameters
 - beat_arr: 1D numpy array
 Beat array generated by extract_beat_with_madmom.
- audio_len_sec: float
 Total length of the audio in seconds.
- mini_beat_div_n: int
 Number of mini beats in a single 4/4 measure.
- Returns
 - mini_beat_pos_t: 1D numpy array
 Positions of mini beats in seconds.
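The linear interpolation described above can be sketched in plain Python. Since mini_beat_div_n counts mini beats per 4/4 measure, each beat is split into mini_beat_div_n / 4 equal steps between consecutive beat positions. How omnizart treats the region after the last beat (up to audio_len_sec) is not described here, so this hypothetical interpolate_mini_beats simply ends at the last beat.

```python
def interpolate_mini_beats(beat_arr, mini_beat_div_n=32):
    """Linearly interpolate mini-beat positions (seconds) between consecutive beats."""
    per_beat = mini_beat_div_n // 4  # 4 beats per 4/4 measure
    mini_beats = []
    for start, end in zip(beat_arr[:-1], beat_arr[1:]):
        step = (end - start) / per_beat
        mini_beats.extend(start + k * step for k in range(per_beat))
    if len(beat_arr) > 0:
        mini_beats.append(beat_arr[-1])  # close the last interval with the final beat
    return mini_beats


positions = interpolate_mini_beats([0.0, 0.5, 1.0])  # 17 positions, 0.0625 s apart
```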