paddleaudio.compliance.librosa

paddleaudio.compliance.librosa.adaptive_spect_augment(spect: numpy.ndarray, tempo_axis: int = 0, level: float = 0.1) → numpy.ndarray

Apply adaptive spectrogram augmentation. The strength of the augmentation is governed by the parameter level, which ranges from 0 to 1, where 0 means no augmentation.

Parameters

spect (np.ndarray) -- Input spectrogram.
tempo_axis (int, optional) -- Index of the tempo (time) axis. Defaults to 0.
level (float, optional) -- The level factor of masking. Defaults to 0.1.

Returns

The augmented spectrogram.

Return type

np.ndarray

paddleaudio.compliance.librosa.compute_fbank_matrix(sr: int, n_fft: int, n_mels: int = 128, fmin: float = 0.0, fmax: Optional[float] = None, htk: bool = False, norm: str = 'slaney', dtype: type = numpy.float32) → numpy.ndarray

Compute the mel filterbank (fbank) matrix.

Parameters

sr (int) -- Sample rate.
n_fft (int) -- FFT size.
n_mels (int, optional) -- Number of mel bins. Defaults to 128.
fmin (float, optional) -- Minimum frequency in Hz. Defaults to 0.0.
fmax (Optional[float], optional) -- Maximum frequency in Hz. Defaults to None.
htk (bool, optional) -- Use htk scaling. Defaults to False.
norm (str, optional) -- Type of normalization. Defaults to "slaney".
dtype (type, optional) -- Data type. Defaults to np.float32.

Returns

Mel transform matrix with shape (n_mels, n_fft//2 + 1).

Return type

np.ndarray
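
The shape of the returned matrix can be illustrated with a NumPy-only sketch of a triangular mel filterbank. This is not the library's implementation: the helper names are hypothetical, the HTK mel formula is assumed, and the normalization selected by norm is omitted.

```python
import numpy as np

def hz_to_mel_htk(f):
    # HTK-style mel formula (assumed here for simplicity)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)

def mel_to_hz_htk(m):
    # Inverse of the HTK-style formula above
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)

def fbank_matrix_sketch(sr, n_fft, n_mels=128, fmin=0.0, fmax=None):
    """Hypothetical helper: unnormalized triangular mel filters."""
    fmax = sr / 2.0 if fmax is None else fmax
    n_bins = n_fft // 2 + 1
    # Center frequencies of the FFT bins, from 0 Hz to sr/2
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 band edges, evenly spaced on the mel scale
    mel_edges = np.linspace(hz_to_mel_htk(fmin), hz_to_mel_htk(fmax), n_mels + 2)
    edges = mel_to_hz_htk(mel_edges)
    weights = np.zeros((n_mels, n_bins), dtype=np.float32)
    for i in range(n_mels):
        # Rising and falling slopes of the i-th triangle
        lower = (fft_freqs - edges[i]) / (edges[i + 1] - edges[i])
        upper = (edges[i + 2] - fft_freqs) / (edges[i + 2] - edges[i + 1])
        weights[i] = np.maximum(0.0, np.minimum(lower, upper))
    return weights

fbank = fbank_matrix_sketch(sr=16000, n_fft=512, n_mels=40)
print(fbank.shape)
```

Multiplying this matrix with a power spectrogram of shape (n_fft//2 + 1, num_frames) gives a mel spectrogram of shape (n_mels, num_frames).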

paddleaudio.compliance.librosa.depth_augment(y: numpy.ndarray, choices: List = ['int8', 'int16'], probs: List[float] = [0.5, 0.5]) → numpy.ndarray

Audio depth augmentation. Simulates the distortion introduced by quantization.

Parameters

y (np.ndarray) -- Input waveform array in 1D or 2D.
choices (List, optional) -- A list of data types to choose from for depth conversion. Defaults to ['int8', 'int16'].
probs (List[float], optional) -- Probabilities of selecting each entry in choices. Defaults to [0.5, 0.5].

Returns

The augmented waveform.

Return type

np.ndarray
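
One way to picture the quantization distortion described above is a round trip through an integer type. The sketch below is an assumption about the general technique, not the library's code: the helper names are hypothetical and the input is assumed to be a float waveform in [-1, 1].

```python
import numpy as np

def depth_roundtrip(y, dtype="int8"):
    """Hypothetical helper: quantize a float waveform in [-1, 1] and convert back."""
    info = np.iinfo(dtype)
    scale = float(info.max)  # 127 for int8, 32767 for int16
    q = np.clip(np.round(y * scale), info.min, info.max).astype(dtype)
    return q.astype(np.float32) / scale

def depth_augment_sketch(y, choices=("int8", "int16"), probs=(0.5, 0.5), rng=None):
    """Hypothetical helper: pick one target depth at random, then round-trip."""
    rng = np.random.default_rng() if rng is None else rng
    idx = int(rng.choice(len(choices), p=list(probs)))
    return depth_roundtrip(y, choices[idx])

y = np.linspace(-1.0, 1.0, 101).astype(np.float32)
y8 = depth_roundtrip(y, "int8")
y_aug = depth_augment_sketch(y, rng=np.random.default_rng(0))
print(np.max(np.abs(y8 - y)))  # worst-case quantization error for int8
```

The rounding error is bounded by half a quantization step, so lower bit depths produce audibly coarser waveforms.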

paddleaudio.compliance.librosa.hz_to_mel(frequencies: Union[float, List[float], numpy.ndarray], htk: bool = False) → numpy.ndarray

Convert Hz to Mels.

Parameters

frequencies (Union[float, List[float], np.ndarray]) -- Frequencies in Hz.
htk (bool, optional) -- Use htk scaling. Defaults to False.

Returns

Frequency in mels.

Return type

np.ndarray
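
The two common mel-scale conventions can be written out in NumPy as follows. This is a sketch for illustration with hypothetical function names; it assumes that htk=True corresponds to the HTK formula and htk=False to the Slaney-style piecewise formula used by librosa.

```python
import numpy as np

def hz_to_mel_htk(frequencies):
    # HTK convention: a single logarithmic formula
    f = np.asarray(frequencies, dtype=np.float64)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def hz_to_mel_slaney(frequencies):
    # Slaney convention: linear below 1 kHz, logarithmic above
    f = np.asarray(frequencies, dtype=np.float64)
    f_sp = 200.0 / 3.0                # Hz per mel in the linear region
    min_log_hz = 1000.0               # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp   # 15 mel
    logstep = np.log(6.4) / 27.0      # step size in the logarithmic region
    linear = f / f_sp
    # np.maximum keeps the log argument valid; values below 1 kHz are discarded by np.where
    log_part = min_log_mel + np.log(np.maximum(f, min_log_hz) / min_log_hz) / logstep
    return np.where(f >= min_log_hz, log_part, linear)

print(hz_to_mel_slaney([500.0, 1000.0, 6400.0]))
print(hz_to_mel_htk(700.0))
```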

paddleaudio.compliance.librosa.mel_frequencies(n_mels: int = 128, fmin: float = 0.0, fmax: float = 11025.0, htk: bool = False) → numpy.ndarray

Compute mel frequencies.

Parameters

n_mels (int, optional) -- Number of mel bins. Defaults to 128.
fmin (float, optional) -- Minimum frequency in Hz. Defaults to 0.0.
fmax (float, optional) -- Maximum frequency in Hz. Defaults to 11025.0.
htk (bool, optional) -- Use htk scaling. Defaults to False.

Returns

Vector of n_mels frequencies in Hz with shape (n_mels,).

Return type

np.ndarray

paddleaudio.compliance.librosa.mel_to_hz(mels: Union[float, List[float], numpy.ndarray], htk: int = False) → numpy.ndarray

Convert mel bin numbers to frequencies.

Parameters

mels (Union[float, List[float], np.ndarray]) -- Frequency in mels.
htk (bool, optional) -- Use htk scaling. Defaults to False.

Returns

Frequencies in Hz.

Return type

np.ndarray

paddleaudio.compliance.librosa.melspectrogram(x: numpy.ndarray, sr: int = 16000, window_size: int = 512, hop_length: int = 320, n_mels: int = 64, fmin: float = 50.0, fmax: Optional[float] = None, window: str = 'hann', center: bool = True, pad_mode: str = 'reflect', power: float = 2.0, to_db: bool = True, ref: float = 1.0, amin: float = 1e-10, top_db: Optional[float] = None) → numpy.ndarray

Compute mel-spectrogram.

Parameters

x (np.ndarray) -- Input waveform in one dimension.
sr (int, optional) -- Sample rate. Defaults to 16000.
window_size (int, optional) -- Size of FFT and window length. Defaults to 512.
hop_length (int, optional) -- Number of steps to advance between adjacent windows. Defaults to 320.
n_mels (int, optional) -- Number of mel bins. Defaults to 64.
fmin (float, optional) -- Minimum frequency in Hz. Defaults to 50.0.
fmax (Optional[float], optional) -- Maximum frequency in Hz. Defaults to None.
window (str, optional) -- A string of window specification. Defaults to "hann".
center (bool, optional) -- Whether to pad x so that the t-th frame is centered at \(t \times hop\_length\). Defaults to True.
pad_mode (str, optional) -- Choose padding pattern when center is True. Defaults to "reflect".
power (float, optional) -- Exponent for the magnitude melspectrogram. Defaults to 2.0.
to_db (bool, optional) -- Enable db scale. Defaults to True.
ref (float, optional) -- The reference value. If smaller than 1.0, the dB level of the signal is raised accordingly; otherwise, the dB level is lowered. Defaults to 1.0.
amin (float, optional) -- Minimum threshold. Defaults to 1e-10.
top_db (Optional[float], optional) -- Threshold the output at top_db below the peak. Defaults to None.

Returns

The mel-spectrogram in power scale or dB scale with shape (n_mels, num_frames).

Return type

np.ndarray

paddleaudio.compliance.librosa.mfcc(x: numpy.ndarray, sr: int = 16000, spect: Optional[numpy.ndarray] = None, n_mfcc: int = 20, dct_type: int = 2, norm: str = 'ortho', lifter: int = 0, **kwargs) → numpy.ndarray

Compute mel-frequency cepstral coefficients (MFCCs).

Parameters

x (np.ndarray) -- Input waveform in one dimension.
sr (int, optional) -- Sample rate. Defaults to 16000.
spect (Optional[np.ndarray], optional) -- Input log-power Mel spectrogram. Defaults to None.
n_mfcc (int, optional) -- Number of cepstra in MFCC. Defaults to 20.
dct_type (int, optional) -- Discrete cosine transform (DCT) type. Defaults to 2.
norm (str, optional) -- Type of normalization. Defaults to "ortho".
lifter (int, optional) -- Cepstral filtering. Defaults to 0.

Returns

Mel frequency cepstral coefficients array with shape (n_mfcc, num_frames).

Return type

np.ndarray
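
The core step of an MFCC computation is a DCT along the mel axis of a log-power mel spectrogram. The sketch below shows that step for the default dct_type=2 and norm="ortho" using a NumPy-only DCT matrix. The helper names are hypothetical, liftering is left out, and the library itself may rely on a different DCT routine.

```python
import numpy as np

def dct2_ortho_matrix(n_out, n_in):
    """Hypothetical helper: first n_out rows of the orthonormal DCT-II matrix."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * (n + 0.5) * k / n_in)
    basis[0] *= np.sqrt(1.0 / n_in)   # scaling of the k = 0 row
    basis[1:] *= np.sqrt(2.0 / n_in)  # scaling of all other rows
    return basis

def mfcc_from_logmel(log_mel, n_mfcc=20):
    """Hypothetical helper: log_mel has shape (n_mels, num_frames)."""
    n_mels = log_mel.shape[0]
    return dct2_ortho_matrix(n_mfcc, n_mels) @ log_mel

# A spectrally flat input puts all energy into the first coefficient
log_mel = np.full((64, 5), 3.0)
coeffs = mfcc_from_logmel(log_mel, n_mfcc=20)
print(coeffs.shape)
```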

paddleaudio.compliance.librosa.mu_decode(y: numpy.ndarray, mu: int = 255, quantized: bool = True) → numpy.ndarray

Mu-law decoding. Compute the mu-law decoding of an input code. The input y is assumed to be in the range [0,mu-1] when quantized is True and in [-1,1] otherwise.

Parameters

y (np.ndarray) -- The encoded waveform.
mu (int, optional) -- The encoding parameter. Defaults to 255.
quantized (bool, optional) -- If True, the input is assumed to be quantized to 1 + mu distinct integer values. Defaults to True.

Returns

The mu-law decoded waveform.

Return type

np.ndarray

paddleaudio.compliance.librosa.mu_encode(x: numpy.ndarray, mu: int = 255, quantized: bool = True) → numpy.ndarray

Mu-law encoding. Encode a waveform using mu-law companding. When quantized is True, the result is converted to integers in the range [0,mu-1]. Otherwise, the resulting waveform is in the range [-1,1].

Parameters

x (np.ndarray) -- The input waveform to encode.
mu (int, optional) -- The encoding parameter. Defaults to 255.
quantized (bool, optional) -- If True, quantize the encoded values into 1 + mu distinct integer values. Defaults to True.

Returns

The mu-law encoded waveform.

Return type

np.ndarray
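
For the non-quantized case, mu-law companding and its inverse follow the standard formulas, sketched here in NumPy. The function names are hypothetical and the quantization step is deliberately left out, so this covers only the continuous mapping between [-1, 1] and [-1, 1].

```python
import numpy as np

def mu_compress(x, mu=255):
    # y = sign(x) * ln(1 + mu * |x|) / ln(1 + mu)
    x = np.asarray(x, dtype=np.float64)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_expand(y, mu=255):
    # x = sign(y) * ((1 + mu) ** |y| - 1) / mu
    y = np.asarray(y, dtype=np.float64)
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

x = np.linspace(-1.0, 1.0, 11)
y = mu_compress(x)
x_rec = mu_expand(y)
print(np.max(np.abs(x_rec - x)))  # round-trip error is at floating-point level
```

Small amplitudes are expanded before quantization (for example, an input of 0.1 maps to a value well above 0.1), which is why mu-law coding preserves quiet passages better than uniform quantization at the same bit depth.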

paddleaudio.compliance.librosa.power_to_db(spect: numpy.ndarray, ref: float = 1.0, amin: float = 1e-10, top_db: Optional[float] = 80.0) → numpy.ndarray

Convert a power spectrogram (amplitude squared) to decibel (dB) units. The function computes the scaling 10 * log10(x / ref) in a numerically stable way.

Parameters

spect (np.ndarray) -- STFT power spectrogram of an input waveform.
ref (float, optional) -- The reference value. If smaller than 1.0, the dB level of the signal is raised accordingly; otherwise, the dB level is lowered. Defaults to 1.0.
amin (float, optional) -- Minimum threshold. Defaults to 1e-10.
top_db (Optional[float], optional) -- Threshold the output at top_db below the peak. Defaults to 80.0.

Returns

Power spectrogram in dB scale.

Return type

np.ndarray
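
The scaling 10 * log10(x / ref), together with the amin floor and the top_db threshold, can be sketched in NumPy as below. The function name is hypothetical and the code follows the parameter descriptions above rather than the library source.

```python
import numpy as np

def power_to_db_sketch(spect, ref=1.0, amin=1e-10, top_db=80.0):
    spect = np.asarray(spect, dtype=np.float64)
    # Floor the input at amin so that log10 never sees zero
    log_spec = 10.0 * np.log10(np.maximum(amin, spect))
    # Subtracting the reference level is equivalent to dividing by ref
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref))
    if top_db is not None:
        # Keep everything within top_db of the peak
        log_spec = np.maximum(log_spec, log_spec.max() - top_db)
    return log_spec

power = np.array([1.0, 10.0, 100.0])
db_full = power_to_db_sketch(power)
db_clipped = power_to_db_sketch(power, top_db=15.0)
db_floor = power_to_db_sketch(np.array([0.0]), top_db=None)
print(db_full, db_clipped, db_floor)
```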

paddleaudio.compliance.librosa.random_crop1d(y: numpy.ndarray, crop_len: int) → numpy.ndarray

Random cropping of an input waveform.

Parameters

y (np.ndarray) -- Input waveform array in 1D.
crop_len (int) -- Length of waveform to crop.

Returns

The cropped waveform.

Return type

np.ndarray
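
Random cropping amounts to drawing a start index and slicing a contiguous segment. The sketch below illustrates the idea with a hypothetical function and an explicit random generator argument (an addition for reproducibility that is not part of the documented signature).

```python
import numpy as np

def random_crop1d_sketch(y, crop_len, rng=None):
    if y.ndim != 1:
        raise ValueError("expected a 1D waveform")
    if crop_len > len(y):
        raise ValueError("crop_len exceeds the waveform length")
    rng = np.random.default_rng() if rng is None else rng
    # Upper bound is exclusive, so the last valid start is len(y) - crop_len
    start = int(rng.integers(0, len(y) - crop_len + 1))
    return y[start:start + crop_len]

wave = np.arange(100)
crop = random_crop1d_sketch(wave, crop_len=10, rng=np.random.default_rng(0))
print(crop.shape)
```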

paddleaudio.compliance.librosa.random_crop2d(s: numpy.ndarray, crop_len: int, tempo_axis: int = 0) → numpy.ndarray

Random cropping on a spectrogram.

Parameters

s (np.ndarray) -- Input spectrogram in 2D.
crop_len (int) -- Length of spectrogram to crop.
tempo_axis (int, optional) -- Index of the tempo (time) axis. Defaults to 0.

Returns

The cropped spectrogram.

Return type

np.ndarray

paddleaudio.compliance.librosa.spect_augment(spect: numpy.ndarray, tempo_axis: int = 0, max_time_mask: int = 3, max_freq_mask: int = 3, max_time_mask_width: int = 30, max_freq_mask_width: int = 20) → numpy.ndarray

Apply spectrogram augmentation along both the time and frequency axes.

Parameters

spect (np.ndarray) -- Input spectrogram.
tempo_axis (int, optional) -- Index of the tempo (time) axis. Defaults to 0.
max_time_mask (int, optional) -- Maximum number of time masks. Defaults to 3.
max_freq_mask (int, optional) -- Maximum number of frequency masks. Defaults to 3.
max_time_mask_width (int, optional) -- Maximum width of a time mask. Defaults to 30.
max_freq_mask_width (int, optional) -- Maximum width of a frequency mask. Defaults to 20.

Returns

The augmented spectrogram.

Return type

np.ndarray
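
Time and frequency masking can be sketched as zeroing out random bands along each axis of a 2D spectrogram. The code below is an assumption about the general technique, not the library's implementation: the function name is hypothetical, masked regions are filled with zeros, and mask counts and widths are drawn uniformly up to the given maxima.

```python
import numpy as np

def spect_augment_sketch(spect, tempo_axis=0, max_time_mask=3, max_freq_mask=3,
                         max_time_mask_width=30, max_freq_mask_width=20, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    out = spect.copy()           # leave the input untouched
    freq_axis = 1 - tempo_axis   # assumes a 2D spectrogram
    plan = (
        (tempo_axis, max_time_mask, max_time_mask_width),
        (freq_axis, max_freq_mask, max_freq_mask_width),
    )
    for axis, max_masks, max_width in plan:
        size = out.shape[axis]
        for _ in range(int(rng.integers(0, max_masks + 1))):
            width = int(rng.integers(0, min(max_width, size) + 1))
            start = int(rng.integers(0, size - width + 1))
            index = [slice(None), slice(None)]
            index[axis] = slice(start, start + width)
            out[tuple(index)] = 0.0  # mask the selected band
    return out

spect = np.ones((100, 64))
masked = spect_augment_sketch(spect, rng=np.random.default_rng(0))
print(masked.shape, masked.mean())
```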

paddleaudio.compliance.librosa.spectrogram(x: numpy.ndarray, sr: int = 16000, window_size: int = 512, hop_length: int = 320, window: str = 'hann', center: bool = True, pad_mode: str = 'reflect', power: float = 2.0) → numpy.ndarray

Compute spectrogram.

Parameters

x (np.ndarray) -- Input waveform in one dimension.
sr (int, optional) -- Sample rate. Defaults to 16000.
window_size (int, optional) -- Size of FFT and window length. Defaults to 512.
hop_length (int, optional) -- Number of steps to advance between adjacent windows. Defaults to 320.
window (str, optional) -- A string of window specification. Defaults to "hann".
center (bool, optional) -- Whether to pad x so that the t-th frame is centered at \(t \times hop\_length\). Defaults to True.
pad_mode (str, optional) -- Choose padding pattern when center is True. Defaults to "reflect".
power (float, optional) -- Exponent for the magnitude spectrogram. Defaults to 2.0.

Returns

The STFT spectrogram in power scale with shape (n_fft//2 + 1, num_frames).

Return type

np.ndarray

paddleaudio.compliance.librosa.stft(x: numpy.ndarray, n_fft: int = 2048, hop_length: Optional[int] = None, win_length: Optional[int] = None, window: str = 'hann', center: bool = True, dtype: type = numpy.complex64, pad_mode: str = 'reflect') → numpy.ndarray

Short-time Fourier transform (STFT).

Parameters

x (np.ndarray) -- Input waveform in one dimension.
n_fft (int, optional) -- FFT size. Defaults to 2048.
hop_length (Optional[int], optional) -- Number of steps to advance between adjacent windows. Defaults to None.
win_length (Optional[int], optional) -- The size of window. Defaults to None.
window (str, optional) -- A string of window specification. Defaults to "hann".
center (bool, optional) -- Whether to pad x so that the t-th frame is centered at \(t \times hop\_length\). Defaults to True.
dtype (type, optional) -- Data type of STFT results. Defaults to np.complex64.
pad_mode (str, optional) -- Choose padding pattern when center is True. Defaults to "reflect".

Returns

The complex STFT output with shape (n_fft//2 + 1, num_frames).

Return type

np.ndarray
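
The output shape and the effect of center can be seen in a NumPy-only sketch of a framed, windowed FFT. The function name is hypothetical, the window is fixed to a periodic Hann window of length n_fft, and the fallback hop_length of n_fft // 4 is an assumption borrowed from librosa's convention.

```python
import numpy as np

def stft_sketch(x, n_fft=2048, hop_length=None, center=True, pad_mode="reflect"):
    hop_length = n_fft // 4 if hop_length is None else hop_length
    if center:
        # Pad both ends so that frame t is centered at sample t * hop_length
        x = np.pad(x, n_fft // 2, mode=pad_mode)
    # Periodic Hann window of length n_fft
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n_fft) / n_fft)
    num_frames = 1 + (len(x) - n_fft) // hop_length
    frames = np.stack(
        [x[t * hop_length:t * hop_length + n_fft] for t in range(num_frames)],
        axis=1,
    )
    # One-sided FFT of each windowed frame: n_fft // 2 + 1 frequency bins
    return np.fft.rfft(frames * window[:, None], axis=0).astype(np.complex64)

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2.0 * np.pi * 1000.0 * t)  # one second of a 1 kHz sine
S = stft_sketch(tone, n_fft=512, hop_length=320)
peak_bin = int(np.argmax(np.abs(S[:, S.shape[1] // 2])))
print(S.shape, peak_bin)
```

With n_fft=512 at 16 kHz each bin spans 31.25 Hz, so the 1 kHz tone peaks at bin 32. Raising the magnitude of this output to the chosen power gives a spectrogram of the same shape.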