paddleaudio.compliance.librosa
- paddleaudio.compliance.librosa.adaptive_spect_augment(spect: numpy.ndarray, tempo_axis: int = 0, level: float = 0.1) -> numpy.ndarray
Do adaptive spectrogram augmentation. The level of augmentation is governed by the parameter level, which ranges from 0 to 1; 0 means no augmentation.
- paddleaudio.compliance.librosa.compute_fbank_matrix(sr: int, n_fft: int, n_mels: int = 128, fmin: float = 0.0, fmax: typing.Optional[float] = None, htk: bool = False, norm: str = 'slaney', dtype: type = <class 'numpy.float32'>) -> numpy.ndarray
Compute fbank matrix.
- Parameters
sr (int) -- Sample rate.
n_fft (int) -- FFT size.
n_mels (int, optional) -- Number of mel bins. Defaults to 128.
fmin (float, optional) -- Minimum frequency in Hz. Defaults to 0.0.
fmax (Optional[float], optional) -- Maximum frequency in Hz. Defaults to None.
htk (bool, optional) -- Use htk scaling. Defaults to False.
norm (str, optional) -- Type of normalization. Defaults to "slaney".
dtype (type, optional) -- Data type. Defaults to np.float32.
- Returns
Mel transform matrix with shape (n_mels, n_fft//2 + 1).
- Return type
np.ndarray
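As a rough illustration of what a mel transform matrix with shape (n_mels, n_fft//2 + 1) contains, the sketch below builds unnormalized triangular filters in plain NumPy. It is not the library's implementation: the helper names are hypothetical, it assumes the HTK mel formula (the htk=True case), and it skips the norm step entirely.

```python
import numpy as np

def hz_to_mel_htk(f):
    # HTK mel formula; the default htk=False scale is not covered here.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)

def mel_to_hz_htk(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)

def fbank_matrix_sketch(sr, n_fft, n_mels=40, fmin=0.0, fmax=None):
    """Hypothetical helper: triangular mel filters, shape (n_mels, n_fft//2 + 1)."""
    fmax = sr / 2.0 if fmax is None else fmax
    # n_mels + 2 band edges, equally spaced on the mel scale.
    edges = mel_to_hz_htk(np.linspace(hz_to_mel_htk(fmin), hz_to_mel_htk(fmax), n_mels + 2))
    # Center frequency of every FFT bin.
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    weights = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        rising = (fft_freqs - edges[i]) / (edges[i + 1] - edges[i])
        falling = (edges[i + 2] - fft_freqs) / (edges[i + 2] - edges[i + 1])
        weights[i] = np.maximum(0.0, np.minimum(rising, falling))
    return weights

fb = fbank_matrix_sketch(sr=16000, n_fft=512, n_mels=40)
print(fb.shape)
```

Multiplying such a matrix with a power spectrogram of shape (n_fft//2 + 1, num_frames) gives a (n_mels, num_frames) mel spectrogram.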
- paddleaudio.compliance.librosa.depth_augment(y: numpy.ndarray, choices: List = ['int8', 'int16'], probs: List[float] = [0.5, 0.5]) -> numpy.ndarray
Audio depth augmentation. Simulates the distortion introduced by quantization.
- Parameters
y (np.ndarray) -- Input waveform array in 1D or 2D.
choices (List, optional) -- A list of data types for depth conversion. Defaults to ['int8', 'int16'].
probs (List[float], optional) -- Probabilities of selecting each data type in choices. Defaults to [0.5, 0.5].
- Returns
The augmented waveform.
- Return type
np.ndarray
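The idea of depth augmentation can be sketched in plain NumPy: round a float waveform to an integer grid and scale it back, so the rounding error stands in for quantization distortion. This is an illustration only. The helper simulate_depth, the scaling by the integer maximum, and the assumption that the input lies in [-1, 1] are not taken from the library.

```python
import numpy as np

def simulate_depth(y: np.ndarray, dtype: str = "int8") -> np.ndarray:
    """Hypothetical helper: quantize a float waveform in [-1, 1] to `dtype` and back."""
    info = np.iinfo(dtype)
    scale = float(info.max)  # 127 for int8, 32767 for int16
    q = np.clip(np.round(y * scale), info.min, info.max).astype(dtype)
    return q.astype(np.float32) / scale

rng = np.random.default_rng(0)
y = rng.uniform(-1.0, 1.0, size=16000).astype(np.float32)

# Pick a target depth at random, mirroring the choices/probs parameters.
choice = str(rng.choice(["int8", "int16"], p=[0.5, 0.5]))
y_aug = simulate_depth(y, choice)

# Coarser depths introduce larger rounding errors.
err8 = np.abs(simulate_depth(y, "int8") - y).max()
err16 = np.abs(simulate_depth(y, "int16") - y).max()
print(choice, err8, err16)
```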
- paddleaudio.compliance.librosa.hz_to_mel(frequencies: Union[float, List[float], numpy.ndarray], htk: bool = False) -> numpy.ndarray
Convert Hz to Mels.
- paddleaudio.compliance.librosa.mel_frequencies(n_mels: int = 128, fmin: float = 0.0, fmax: float = 11025.0, htk: bool = False) -> numpy.ndarray
Compute mel frequencies.
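- Parameters
n_mels (int, optional) -- Number of mel bins. Defaults to 128.
fmin (float, optional) -- Minimum frequency in Hz. Defaults to 0.0.
fmax (float, optional) -- Maximum frequency in Hz. Defaults to 11025.0.
htk (bool, optional) -- Use htk scaling. Defaults to False.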
- Returns
Vector of n_mels frequencies in Hz with shape (n_mels,).
- Return type
np.ndarray
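The relationship between the Hz/mel conversions and the mel frequency vector can be sketched as follows: convert the band limits to mels, space n_mels points evenly on the mel scale, and convert back to Hz. The sketch assumes the HTK formula, mel = 2595 * log10(1 + f / 700), which corresponds to htk=True. The default htk=False scale is not shown, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel_htk(f):
    # HTK formula: mel = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)

def mel_to_hz_htk(m):
    # Inverse of the HTK formula above.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)

def mel_frequencies_sketch(n_mels=128, fmin=0.0, fmax=11025.0):
    """Hypothetical helper: n_mels frequencies in Hz, evenly spaced in mels."""
    mels = np.linspace(hz_to_mel_htk(fmin), hz_to_mel_htk(fmax), n_mels)
    return mel_to_hz_htk(mels)

freqs = mel_frequencies_sketch(n_mels=8, fmin=0.0, fmax=8000.0)
print(freqs)
```

The spacing in Hz grows with frequency, which is the point of the mel scale: resolution is finer at low frequencies.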
- paddleaudio.compliance.librosa.mel_to_hz(mels: Union[float, List[float], numpy.ndarray], htk: int = False) -> numpy.ndarray
Convert mel bin numbers to frequencies.
- paddleaudio.compliance.librosa.melspectrogram(x: numpy.ndarray, sr: int = 16000, window_size: int = 512, hop_length: int = 320, n_mels: int = 64, fmin: float = 50.0, fmax: Optional[float] = None, window: str = 'hann', center: bool = True, pad_mode: str = 'reflect', power: float = 2.0, to_db: bool = True, ref: float = 1.0, amin: float = 1e-10, top_db: Optional[float] = None) -> numpy.ndarray
Compute mel-spectrogram.
- Parameters
x (np.ndarray) -- Input waveform in one dimension.
sr (int, optional) -- Sample rate. Defaults to 16000.
window_size (int, optional) -- Size of FFT and window length. Defaults to 512.
hop_length (int, optional) -- Number of steps to advance between adjacent windows. Defaults to 320.
n_mels (int, optional) -- Number of mel bins. Defaults to 64.
fmin (float, optional) -- Minimum frequency in Hz. Defaults to 50.0.
fmax (Optional[float], optional) -- Maximum frequency in Hz. Defaults to None.
window (str, optional) -- A string of window specification. Defaults to "hann".
center (bool, optional) -- Whether to pad x so that the t-th frame is centered at \(t \times hop\_length\). Defaults to True.
pad_mode (str, optional) -- Choose padding pattern when center is True. Defaults to "reflect".
power (float, optional) -- Exponent for the magnitude melspectrogram. Defaults to 2.0.
to_db (bool, optional) -- Enable db scale. Defaults to True.
ref (float, optional) -- The reference value. If smaller than 1.0, the db level of the signal will be pulled up accordingly. Otherwise, the db level is pushed down. Defaults to 1.0.
amin (float, optional) -- Minimum threshold. Defaults to 1e-10.
top_db (Optional[float], optional) -- Threshold the output at top_db below the peak. Defaults to None.
- Returns
The mel-spectrogram in power scale or db scale with shape (n_mels, num_frames).
- Return type
np.ndarray
- paddleaudio.compliance.librosa.mfcc(x: numpy.ndarray, sr: int = 16000, spect: Optional[numpy.ndarray] = None, n_mfcc: int = 20, dct_type: int = 2, norm: str = 'ortho', lifter: int = 0, **kwargs) -> numpy.ndarray
Mel-frequency cepstral coefficients (MFCCs)
- Parameters
x (np.ndarray) -- Input waveform in one dimension.
sr (int, optional) -- Sample rate. Defaults to 16000.
spect (Optional[np.ndarray], optional) -- Input log-power Mel spectrogram. Defaults to None.
n_mfcc (int, optional) -- Number of cepstra in MFCC. Defaults to 20.
dct_type (int, optional) -- Discrete cosine transform (DCT) type. Defaults to 2.
norm (str, optional) -- Type of normalization. Defaults to "ortho".
lifter (int, optional) -- Cepstral filtering. Defaults to 0.
- Returns
Mel frequency cepstral coefficients array with shape (n_mfcc, num_frames).
- Return type
np.ndarray
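The step from a log-power mel spectrogram to MFCCs can be sketched as a DCT along the mel axis, keeping the first n_mfcc coefficients. The sketch below assumes the defaults dct_type=2 and norm="ortho" with no liftering (lifter=0). It builds the orthonormal DCT-II basis by hand in NumPy, uses random data as a stand-in for a real log-mel spectrogram, and uses a hypothetical helper name.

```python
import numpy as np

def dct2_ortho_basis(n_out: int, n_in: int) -> np.ndarray:
    """Hypothetical helper: first n_out rows of the orthonormal DCT-II basis."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)  # extra scaling of the k = 0 row for orthonormality
    return basis

rng = np.random.default_rng(0)
log_mel = rng.normal(size=(64, 50))  # stand-in with shape (n_mels, num_frames)

n_mfcc = 20
mfccs = dct2_ortho_basis(n_mfcc, log_mel.shape[0]) @ log_mel
print(mfccs.shape)
```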
- paddleaudio.compliance.librosa.mu_decode(y: numpy.ndarray, mu: int = 255, quantized: bool = True) -> numpy.ndarray
Mu-law decoding. Computes the mu-law decoding of an input code. The input y is assumed to be in the range [0, mu-1] when quantized is True and [-1, 1] otherwise.
- Parameters
- Returns
The mu-law decoded waveform.
- Return type
np.ndarray
- paddleaudio.compliance.librosa.mu_encode(x: numpy.ndarray, mu: int = 255, quantized: bool = True) -> numpy.ndarray
Mu-law encoding. Encodes a waveform using mu-law companding. When quantized is True, the result is converted to integers in the range [0, mu-1]. Otherwise, the resulting waveform is in the range [-1, 1].
- Parameters
- Returns
The mu-law encoded waveform.
- Return type
np.ndarray
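The continuous part of mu-law companding (the quantized=False case) follows the standard formulas y = sign(x) * ln(1 + mu|x|) / ln(1 + mu) and its inverse. The sketch below is a plain NumPy illustration with hypothetical function names. It omits the integer quantization step, so it says nothing about how the library maps values to integer codes.

```python
import numpy as np

def mu_compress(x, mu=255):
    """Continuous mu-law compression of a waveform in [-1, 1]."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_expand(y, mu=255):
    """Inverse of mu_compress."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

x = np.linspace(-1.0, 1.0, 11)
y = mu_compress(x)
x_rec = mu_expand(y)
print(np.abs(x_rec - x).max())
```

Compression expands small amplitudes and squeezes large ones, which is why a later uniform quantization loses less detail in quiet passages.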
- paddleaudio.compliance.librosa.power_to_db(spect: numpy.ndarray, ref: float = 1.0, amin: float = 1e-10, top_db: Optional[float] = 80.0) -> numpy.ndarray
Convert a power spectrogram (amplitude squared) to decibel (dB) units. The function computes the scaling 10 * log10(x / ref) in a numerically stable way.
- Parameters
spect (np.ndarray) -- STFT power spectrogram of an input waveform.
ref (float, optional) -- The reference value. If smaller than 1.0, the db level of the signal will be pulled up accordingly. Otherwise, the db level is pushed down. Defaults to 1.0.
amin (float, optional) -- Minimum threshold. Defaults to 1e-10.
top_db (Optional[float], optional) -- Threshold the output at top_db below the peak. Defaults to 80.0.
- Returns
Power spectrogram in db scale.
- Return type
np.ndarray
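One common way to compute 10 * log10(x / ref) in a numerically stable way is to floor both terms at amin before taking the logarithm, then clip the result to top_db below the peak. The sketch below follows that recipe in plain NumPy. It is an assumption about the general approach, not a copy of the library code, and the function name is hypothetical.

```python
import numpy as np

def power_to_db_sketch(spect, ref=1.0, amin=1e-10, top_db=80.0):
    """Hypothetical helper: power spectrogram to dB with flooring and clipping."""
    # Flooring at amin avoids log10(0).
    log_spec = 10.0 * np.log10(np.maximum(amin, spect))
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref))
    if top_db is not None:
        # Keep everything within top_db of the loudest value.
        log_spec = np.maximum(log_spec, log_spec.max() - top_db)
    return log_spec

spect = np.array([[1.0, 0.1], [1e-3, 0.0]])
db = power_to_db_sketch(spect)
print(db)
```

With ref=1.0, a power of 1.0 maps to 0 dB and 0.1 maps to -10 dB. The zero entry is floored at amin (-100 dB) and then clipped to -80 dB by top_db.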
- paddleaudio.compliance.librosa.random_crop1d(y: numpy.ndarray, crop_len: int) -> numpy.ndarray
Random cropping on an input waveform.
- Parameters
y (np.ndarray) -- Input waveform array in 1D.
crop_len (int) -- Length of waveform to crop.
- Returns
The cropped waveform.
- Return type
np.ndarray
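Random cropping of a 1D waveform amounts to drawing a start index uniformly and slicing crop_len samples from it. A minimal NumPy sketch is shown below. The function name, the explicit rng argument, and the error handling are assumptions made for illustration, not the library's behavior.

```python
import numpy as np

def random_crop1d_sketch(y, crop_len, rng=None):
    """Hypothetical helper: return a random contiguous slice of length crop_len."""
    rng = np.random.default_rng() if rng is None else rng
    if y.ndim != 1:
        raise ValueError("expected a 1D waveform")
    if crop_len > len(y):
        raise ValueError("crop_len exceeds the waveform length")
    # Valid start positions are 0 .. len(y) - crop_len inclusive.
    start = int(rng.integers(0, len(y) - crop_len + 1))
    return y[start:start + crop_len]

y = np.arange(16000, dtype=np.float32)
crop = random_crop1d_sketch(y, 4000, rng=np.random.default_rng(0))
print(crop.shape, crop[0])
```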
- paddleaudio.compliance.librosa.random_crop2d(s: numpy.ndarray, crop_len: int, tempo_axis: int = 0) -> numpy.ndarray
Random cropping on a spectrogram.
- paddleaudio.compliance.librosa.spect_augment(spect: numpy.ndarray, tempo_axis: int = 0, max_time_mask: int = 3, max_freq_mask: int = 3, max_time_mask_width: int = 30, max_freq_mask_width: int = 20) -> numpy.ndarray
Apply spectrogram augmentation along both the time and frequency axes.
- Parameters
spect (np.ndarray) -- Input spectrogram.
tempo_axis (int, optional) -- Indicate the tempo axis. Defaults to 0.
max_time_mask (int, optional) -- Maximum number of time masks. Defaults to 3.
max_freq_mask (int, optional) -- Maximum number of frequency masks. Defaults to 3.
max_time_mask_width (int, optional) -- Maximum width of a time mask. Defaults to 30.
max_freq_mask_width (int, optional) -- Maximum width of a frequency mask. Defaults to 20.
- Returns
The augmented spectrogram.
- Return type
np.ndarray
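Time and frequency masking can be sketched as zeroing out a random number of random-width bands along each axis of a 2D spectrogram. The sketch below is one plausible reading of the parameters above, not the library's code. The mask value of 0, the uniform sampling of mask counts and widths, and the function name are all assumptions.

```python
import numpy as np

def mask_spectrogram_sketch(spect, tempo_axis=0, max_time_mask=3, max_freq_mask=3,
                            max_time_mask_width=30, max_freq_mask_width=20, rng=None):
    """Hypothetical helper: zero random bands along the time and frequency axes."""
    rng = np.random.default_rng() if rng is None else rng
    out = spect.copy()  # leave the input untouched
    freq_axis = 1 - tempo_axis
    for axis, max_masks, max_width in (
        (tempo_axis, max_time_mask, max_time_mask_width),
        (freq_axis, max_freq_mask, max_freq_mask_width),
    ):
        size = out.shape[axis]
        for _ in range(int(rng.integers(0, max_masks + 1))):
            width = int(rng.integers(0, min(max_width, size) + 1))
            start = int(rng.integers(0, size - width + 1))
            index = [slice(None), slice(None)]
            index[axis] = slice(start, start + width)
            out[tuple(index)] = 0.0
    return out

spect = np.ones((100, 64), dtype=np.float32)  # (time, freq) since tempo_axis=0
aug = mask_spectrogram_sketch(spect, rng=np.random.default_rng(0))
print(aug.shape, float(aug.mean()))
```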
- paddleaudio.compliance.librosa.spectrogram(x: numpy.ndarray, sr: int = 16000, window_size: int = 512, hop_length: int = 320, window: str = 'hann', center: bool = True, pad_mode: str = 'reflect', power: float = 2.0) -> numpy.ndarray
Compute spectrogram.
- Parameters
x (np.ndarray) -- Input waveform in one dimension.
sr (int, optional) -- Sample rate. Defaults to 16000.
window_size (int, optional) -- Size of FFT and window length. Defaults to 512.
hop_length (int, optional) -- Number of steps to advance between adjacent windows. Defaults to 320.
window (str, optional) -- A string of window specification. Defaults to "hann".
center (bool, optional) -- Whether to pad x so that the t-th frame is centered at \(t \times hop\_length\). Defaults to True.
pad_mode (str, optional) -- Choose padding pattern when center is True. Defaults to "reflect".
power (float, optional) -- Exponent for the magnitude spectrogram. Defaults to 2.0.
- Returns
The STFT spectrogram in power scale with shape (n_fft//2 + 1, num_frames).
- Return type
np.ndarray
- paddleaudio.compliance.librosa.stft(x: numpy.ndarray, n_fft: int = 2048, hop_length: typing.Optional[int] = None, win_length: typing.Optional[int] = None, window: str = 'hann', center: bool = True, dtype: type = <class 'numpy.complex64'>, pad_mode: str = 'reflect') -> numpy.ndarray
Short-time Fourier transform (STFT).
- Parameters
x (np.ndarray) -- Input waveform in one dimension.
n_fft (int, optional) -- FFT size. Defaults to 2048.
hop_length (Optional[int], optional) -- Number of steps to advance between adjacent windows. Defaults to None.
win_length (Optional[int], optional) -- The size of window. Defaults to None.
window (str, optional) -- A string of window specification. Defaults to "hann".
center (bool, optional) -- Whether to pad x so that the t-th frame is centered at \(t \times hop\_length\). Defaults to True.
dtype (type, optional) -- Data type of STFT results. Defaults to np.complex64.
pad_mode (str, optional) -- Choose padding pattern when center is True. Defaults to "reflect".
- Returns
The complex STFT output with shape (n_fft//2 + 1, num_frames).
- Return type
np.ndarray
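To make the output shape (n_fft//2 + 1, num_frames) concrete, the sketch below implements a bare-bones STFT in NumPy: optional reflect padding so frames are centered, a Hann window, and a real FFT per frame. It is an illustration, not the library's implementation. It assumes the window length equals n_fft and a librosa-style fallback of n_fft // 4 when hop_length is None, and the function name is hypothetical.

```python
import numpy as np

def stft_sketch(x, n_fft=512, hop_length=None, center=True, pad_mode="reflect"):
    """Hypothetical helper: complex STFT with shape (n_fft//2 + 1, num_frames)."""
    hop_length = n_fft // 4 if hop_length is None else hop_length
    if center:
        # Pad so that frame t is centered at sample t * hop_length.
        x = np.pad(x, n_fft // 2, mode=pad_mode)
    num_frames = 1 + (len(x) - n_fft) // hop_length
    # Periodic Hann window of length n_fft.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n_fft) / n_fft)
    frames = np.stack([x[t * hop_length:t * hop_length + n_fft] for t in range(num_frames)])
    return np.fft.rfft(frames * window, axis=1).T.astype(np.complex64)

sr = 16000
x = np.sin(2.0 * np.pi * 1000.0 * np.arange(sr) / sr).astype(np.float32)  # 1 s, 1 kHz tone
S = stft_sketch(x, n_fft=512, hop_length=320)
power = np.abs(S) ** 2  # power spectrogram (power=2.0)
print(S.shape, int(power.mean(axis=1).argmax()))
```

With center=True and one second of 16 kHz audio, the frame count is 1 + 16000 // 320 = 51, and the 1 kHz tone peaks at bin 1000 / (16000 / 512) = 32.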