Feature
A collection of functions for processing raw audio into the required input representations.
CFP
Author: Lisu
Maintainer: BreezeWhite
- omnizart.feature.cfp.STFT(x, fr, fs, Hop, h)
- omnizart.feature.cfp.cfp_filterbank(x, fr, fs, Hop, h, fc, tc, g, bin_per_octave)
- omnizart.feature.cfp.extract_cfp(filename, down_fs=44100, **kwargs)
CFP feature extraction function.
Given the path to an audio file, returns the CFP feature. The computation is automatically run in parallel to speed it up.
- Parameters
- filename: Path
Path to the audio.
- hop: float
Hop size in seconds, with regard to the sampling rate.
- win_size: int
Window size.
- fr: float
Frequency resolution.
- fc: float
Lowest start frequency.
- tc: float
Reciprocal of the highest frequency bound (for example, 0.001 corresponds to 1000 Hz).
- g: list[float]
Power factor of the output STFT results.
- bin_per_octave: int
Number of bins in each octave.
- down_fs: int
Resample to this sampling rate, if the loaded audio has a different value.
- max_sample: int
Maximum number of frames processed in each computation. Use a smaller value if you run out of RAM.
- Returns
- Z
Product of the spectrum and the cepstrum.
- tfrL0
Spectrum of the audio.
- tfrLF
Generalized Cepstrum of Spectrum (GCoS).
- tfrLQ
Cepstrum of the audio.
- cen_freq
Center frequency of each feature bin.
References
The CFP approach was first proposed in [1].
[1] L. Su and Y. Yang, “Combining Spectral and Temporal Representations for Multipitch Estimation of Polyphonic Music,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015.
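The roles of fc, tc, and bin_per_octave can be illustrated with a small sketch. It assumes that the center frequencies are spaced geometrically, starting at fc and stopping below 1 / tc, with bin_per_octave bins in each octave. The helper name log_center_freqs is hypothetical and not part of omnizart, and the library's own mapping may differ in detail.

```python
def log_center_freqs(fc=80.0, tc=0.001, bin_per_octave=48):
    """Hypothetical helper: geometrically spaced center frequencies.

    Assumes bins start at `fc` and stop below the upper bound `1 / tc`.
    """
    stop_freq = 1.0 / tc  # tc is treated as the reciprocal of the upper bound
    freqs = []
    i = 0
    while True:
        # Each step multiplies the frequency by 2 ** (1 / bin_per_octave).
        freq = fc * 2.0 ** (i / bin_per_octave)
        if freq >= stop_freq:
            break
        freqs.append(freq)
        i += 1
    return freqs


cen_freq = log_center_freqs()
print(len(cen_freq), cen_freq[0], cen_freq[48])
```

Under this assumption, moving up by bin_per_octave bins doubles the frequency, and no bin reaches 1 / tc.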
- omnizart.feature.cfp.extract_patch_cfp(filename, patch_size=25, threshold=0.5, hop=0.02, win_size=2049, fr=2.0, fc=80.0, tc=0.001, g=[0.24, 0.6, 1], bin_per_octave=48, down_fs=16000, max_sample=2000)
Extract patch CFP features for the PatchCNN module.
- Parameters
- filename: Path
Path to the audio.
- patch_size: int
Height and width of each feature patch.
- threshold: float
Threshold for determining peaks.
- hop: float
Hop size in seconds, with regard to the sampling rate.
- win_size: int
Window size.
- fr: float
Frequency resolution.
- fc: float
Lowest start frequency.
- tc: float
Reciprocal of the highest frequency bound (for example, 0.001 corresponds to 1000 Hz).
- g: list[float]
Power factor of the output STFT results.
- bin_per_octave: int
Number of bins in each octave.
- down_fs: int
Resample to this sampling rate, if the loaded audio has a different value.
- max_sample: int
Maximum number of frames processed in each computation. Use a smaller value if you run out of RAM.
- Returns
- patch: 3D numpy array
Sequence of patch CFP features. The positions of the patches are inferred from the amplitude of the spectrogram.
- mapping: 2D numpy array
Records the original frequency and time index of each patch. Dim: len(patch) x 2
- Z: 2D numpy array
The original CFP feature. Dim: freq x time
- cenf: list[float]
Center frequencies corresponding to the frequency dimension.
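The relationship between patch and mapping can be sketched as follows. This is a simplified illustration and not the library's peak-picking rule: it assumes that every cell above threshold is a patch center and that cells too close to the border are skipped. The function name cut_patches and the plain nested lists are hypothetical stand-ins for the numpy arrays described above.

```python
def cut_patches(feature, patch_size=3, threshold=0.5):
    """Hypothetical sketch: cut square patches around cells above `threshold`.

    `feature` is a list of rows laid out as freq x time. Returns
    (patches, mapping), where mapping[i] holds the (freq_index, time_index)
    of the center of patches[i].
    """
    half = patch_size // 2
    n_freq, n_time = len(feature), len(feature[0])
    patches, mapping = [], []
    # Skip the border so that every patch fits inside the feature.
    for f in range(half, n_freq - half):
        for t in range(half, n_time - half):
            if feature[f][t] <= threshold:
                continue
            rows = feature[f - half:f + half + 1]
            patches.append([row[t - half:t + half + 1] for row in rows])
            mapping.append((f, t))
    return patches, mapping


feature = [[0.0] * 5 for _ in range(5)]
feature[2][2] = 0.9  # interior cell, becomes a patch center
feature[0][0] = 0.9  # border cell, skipped in this sketch
patches, mapping = cut_patches(feature)
print(len(patches), mapping)
```

Each entry of mapping lets a prediction made on a patch be placed back at its original position in the full feature.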
- omnizart.feature.cfp.extract_vocal_cfp(filename, down_fs=16000, **kwargs)
Specialized CFP feature extraction for the vocal submodule.
- omnizart.feature.cfp.freq_to_log_freq_mapping(tfr, f, fr, fc, tc, NumPerOct)
- omnizart.feature.cfp.nonlinear_func(X, g, cutoff)
- omnizart.feature.cfp.parallel_extract(x, samples, max_sample, fr, fs, Hop, h, fc, tc, g, bin_per_octave)
- omnizart.feature.cfp.quef_to_log_freq_mapping(ceps, q, fs, fc, tc, NumPerOct)
- omnizart.feature.cfp.spectral_flux(spec, invert=False, norm=True)
HCFP
- omnizart.feature.hcfp.extract_hcfp(filename, hop=0.02, win_size=7939, fr=2.0, g=[0.24, 0.6, 1], bin_per_octave=48, down_fs=44100, max_sample=2000, harmonic_num=6)
- omnizart.feature.hcfp.fetch_harmonic(data, cenf, ith_har, start_freq=27.5, num_per_octave=48, is_reverse=False)
CQT
- omnizart.feature.cqt.extract_cqt(audio_path, sampling_rate=44100, lowest_note=16, note_num=120, a_hop=256, pad_sec=1)
Compute the constant-Q spectrogram of an audio file, then normalize and log-scale it.
- Parameters
- audio_path: Path
Path to the input audio.
- sampling_rate: int
Sampling rate the audio data is sampled at, should be DOWN_SAMPLE_TO_SAPMLING_RATE.
- lowest_note: int
Lowest MIDI note number.
- note_num: int
Number of total notes. The highest note number would thus be lowest_note + note_num.
- a_hop: int
Hop size for computing CQT.
- pad_sec: float
Length of the padding added to the beginning and the end of the raw audio data, in seconds.
- Returns
- midi_gram: np.ndarray
Log-magnitude, L2-normalized constant-Q spectrogram of synthesized MIDI data.
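To get a feel for the default arguments, the sketch below converts the note range and hop size into physical units. It assumes standard tuning (MIDI note 69 = A4 = 440 Hz) and that a_hop is counted in samples. The helper midi_to_hz is defined here for illustration and is not an omnizart function.

```python
def midi_to_hz(note, a4=440.0):
    """Frequency in Hz of a MIDI note number, assuming note 69 is `a4` Hz."""
    return a4 * 2.0 ** ((note - 69) / 12.0)


sampling_rate, lowest_note, note_num, a_hop = 44100, 16, 120, 256

lowest_hz = midi_to_hz(lowest_note)
highest_hz = midi_to_hz(lowest_note + note_num)  # upper end of the note range
nyquist = sampling_rate / 2
hop_sec = a_hop / sampling_rate  # hop size in seconds, if a_hop is in samples

print(f"{lowest_hz:.2f} Hz to {highest_hz:.2f} Hz, Nyquist {nyquist:.0f} Hz")
print(f"hop: {hop_sec * 1000:.2f} ms")
```

Under these assumptions the whole note range stays below the Nyquist frequency at a 44100 Hz sampling rate.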
- omnizart.feature.cqt.post_process_cqt(gram)
Normalize and log-scale a constant-Q spectrogram.
- Parameters
- gram: np.ndarray
Constant-Q spectrogram, constructed from librosa.cqt.
- Returns
- log_normalized_gram: np.ndarray
Log-magnitude, L2-normalized constant-Q spectrogram.
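One plausible reading of "log-magnitude, L2-normalized" is sketched below in plain Python. The order of operations (convert to decibels, shift so the minimum is zero, then L2-normalize each frame) and the function name log_l2_normalize are assumptions made for illustration. The actual implementation operates on numpy arrays and may differ.

```python
import math


def log_l2_normalize(gram, eps=1e-10):
    """Hypothetical sketch: log-scale magnitudes, then L2-normalize each frame.

    `gram` is a list of frames, each a list of non-negative magnitudes.
    """
    # Log-scale: magnitude to decibels, clamped to avoid log(0).
    log_gram = [[20.0 * math.log10(max(v, eps)) for v in frame] for frame in gram]
    # Shift so that the smallest value in the whole spectrogram becomes zero.
    floor = min(min(frame) for frame in log_gram)
    shifted = [[v - floor for v in frame] for frame in log_gram]
    # Divide each frame by its Euclidean norm (all-zero frames stay zero).
    normalized = []
    for frame in shifted:
        norm = math.sqrt(sum(v * v for v in frame))
        normalized.append([v / norm if norm > 0 else 0.0 for v in frame])
    return normalized


result = log_l2_normalize([[1.0, 10.0], [100.0, 1.0]])
print(result)
```

After this step every non-silent frame has unit Euclidean norm, so frames are compared by spectral shape and not by overall loudness.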
Beat Tracking
- class omnizart.feature.beat_for_drum.MadmomBeatTracking(num_threads=3)
Extract beat information with the madmom library.
Three different beat tracking methods are combined to produce a more stable beat tracking result.
Methods
process(audio_data): Generate beat tracking results with multiple approaches.
- process(audio_data)
Generate beat tracking results with multiple approaches.
- omnizart.feature.beat_for_drum.extract_beat_with_madmom(audio_path, sampling_rate=44100)
Extract beat positions (in seconds) from the audio.
Beats are extracted with a mixture of beat tracking techniques from madmom.
- Parameters
- audio_path: Path
Path to the target audio.
- sampling_rate: int
Sampling rate to which the audio is resampled.
- Returns
- beat_arr: 1D numpy array
Contains beat positions in seconds.
- audio_len_sec: float
Total length of the audio in seconds.
- omnizart.feature.beat_for_drum.extract_mini_beat_from_audio_path(audio_path, sampling_rate=44100, mini_beat_div_n=32)
Wrapper for extracting mini beats from an audio path.
- omnizart.feature.beat_for_drum.extract_mini_beat_from_beat_arr(beat_arr, audio_len_sec, mini_beat_div_n=32)
Extract mini beats from the beat array.
Further split each beat into shorter intervals, called mini beats, to increase the beat resolution. The mini beats are generated by linear interpolation.
- Parameters
- beat_arr: 1D numpy array
Beat array generated by extract_beat_with_madmom.
- audio_len_sec: float
Total length of the audio in seconds.
- mini_beat_div_n: int
Number of mini beats in a single 4/4 measure.
- Returns
- mini_beat_pos_t: 1D numpy array
Positions of mini beats in seconds.
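The linear interpolation described above can be sketched as follows. This is a minimal illustration and not the library's code: it assumes four beats per measure, so each beat interval is split into mini_beat_div_n / 4 equal steps, and it ignores audio_len_sec. The function name interpolate_mini_beats is hypothetical.

```python
def interpolate_mini_beats(beat_arr, mini_beat_div_n=32, beats_per_measure=4):
    """Hypothetical sketch: linearly interpolate mini beats between beats.

    Assumes `beats_per_measure` beats per measure, so every interval between
    two consecutive beats is divided into mini_beat_div_n // beats_per_measure
    equal steps.
    """
    steps = mini_beat_div_n // beats_per_measure
    mini_beats = []
    for start, end in zip(beat_arr, beat_arr[1:]):
        for k in range(steps):
            # Position k / steps of the way from this beat to the next one.
            mini_beats.append(start + (end - start) * k / steps)
    if beat_arr:
        mini_beats.append(beat_arr[-1])  # close the sequence with the last beat
    return mini_beats


mini_beat_pos_t = interpolate_mini_beats([0.0, 1.0, 2.0], mini_beat_div_n=8)
print(mini_beat_pos_t)
```

Because the steps are computed per interval, the mini beats follow tempo changes between consecutive beats.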