paddleaudio.compliance.librosa

paddleaudio.compliance.librosa.adaptive_spect_augment(spect: numpy.ndarray, tempo_axis: int = 0, level: float = 0.1) → numpy.ndarray

Do adaptive spectrogram augmentation. The strength of the augmentation is governed by the parameter level, which ranges from 0 to 1; 0 means no augmentation.

Parameters
  • spect (np.ndarray) -- Input spectrogram.

  • tempo_axis (int, optional) -- Index of the time (tempo) axis. Defaults to 0.

  • level (float, optional) -- The level factor of masking. Defaults to 0.1.

Returns

The augmented spectrogram.

Return type

np.ndarray

paddleaudio.compliance.librosa.compute_fbank_matrix(sr: int, n_fft: int, n_mels: int = 128, fmin: float = 0.0, fmax: typing.Optional[float] = None, htk: bool = False, norm: str = 'slaney', dtype: type = numpy.float32) → numpy.ndarray

Compute the mel filter bank (fbank) matrix.

Parameters
  • sr (int) -- Sample rate.

  • n_fft (int) -- FFT size.

  • n_mels (int, optional) -- Number of mel bins. Defaults to 128.

  • fmin (float, optional) -- Minimum frequency in Hz. Defaults to 0.0.

  • fmax (Optional[float], optional) -- Maximum frequency in Hz. Defaults to None.

  • htk (bool, optional) -- Use htk scaling. Defaults to False.

  • norm (str, optional) -- Type of normalization. Defaults to "slaney".

  • dtype (type, optional) -- Data type. Defaults to np.float32.

Returns

Mel transform matrix with shape (n_mels, n_fft//2 + 1).

Return type

np.ndarray

paddleaudio.compliance.librosa.depth_augment(y: numpy.ndarray, choices: List = ['int8', 'int16'], probs: List[float] = [0.5, 0.5]) → numpy.ndarray

Audio depth augmentation. Simulates the distortion introduced by quantization.

Parameters
  • y (np.ndarray) -- Input waveform array in 1D or 2D.

  • choices (List, optional) -- A list of data types to choose from for depth conversion. Defaults to ['int8', 'int16'].

  • probs (List[float], optional) -- Probability of selecting each entry in choices. Defaults to [0.5, 0.5].

Returns

The augmented waveform.

Return type

np.ndarray
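
As an illustration of the quantization distortion described above, the following sketch rounds a floating-point waveform to a signed integer grid and converts it back. It is not the library's implementation: the helper name quantize_roundtrip, the assumption that the waveform lies in [-1, 1], and the symmetric scaling are all choices made for this example.

```python
import numpy as np

def quantize_roundtrip(y, bits):
    """Round y (assumed to lie in [-1, 1]) to a signed `bits`-bit grid and back."""
    scale = 2 ** (bits - 1) - 1            # 127 for 8 bits, 32767 for 16 bits
    q = np.round(np.clip(y, -1.0, 1.0) * scale)
    return (q / scale).astype(y.dtype)

rng = np.random.default_rng(0)
y = np.linspace(-1.0, 1.0, 1001).astype(np.float32)

# Pick a bit depth at random, mirroring the documented choices/probs defaults.
depth = str(rng.choice(["int8", "int16"], p=[0.5, 0.5]))
bits = {"int8": 8, "int16": 16}[depth]
y_aug = quantize_roundtrip(y, bits)

# The coarser the grid, the larger the worst-case rounding error.
err8 = np.max(np.abs(quantize_roundtrip(y, 8) - y))
err16 = np.max(np.abs(quantize_roundtrip(y, 16) - y))
```

The 8-bit round trip leaves at most 255 distinct sample values, which is the kind of distortion the augmentation is meant to expose a model to.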

paddleaudio.compliance.librosa.hz_to_mel(frequencies: Union[float, List[float], numpy.ndarray], htk: bool = False) → numpy.ndarray

Convert Hz to Mels.

Parameters
  • frequencies (Union[float, List[float], np.ndarray]) -- Frequencies in Hz.

  • htk (bool, optional) -- Use htk scaling. Defaults to False.

Returns

Frequency in mels.

Return type

np.ndarray

paddleaudio.compliance.librosa.mel_frequencies(n_mels: int = 128, fmin: float = 0.0, fmax: float = 11025.0, htk: bool = False) → numpy.ndarray

Compute mel frequencies.

Parameters
  • n_mels (int, optional) -- Number of mel bins. Defaults to 128.

  • fmin (float, optional) -- Minimum frequency in Hz. Defaults to 0.0.

  • fmax (float, optional) -- Maximum frequency in Hz. Defaults to 11025.0.

  • htk (bool, optional) -- Use htk scaling. Defaults to False.

Returns

Vector of n_mels frequencies in Hz with shape (n_mels,).

Return type

np.ndarray

paddleaudio.compliance.librosa.mel_to_hz(mels: Union[float, List[float], numpy.ndarray], htk: int = False) → numpy.ndarray

Convert mel bin numbers to frequencies.

Parameters
  • mels (Union[float, List[float], np.ndarray]) -- Frequency in mels.

  • htk (bool, optional) -- Use htk scaling. Defaults to False.

Returns

Frequencies in Hz.

Return type

np.ndarray
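
The three mel helpers above (hz_to_mel, mel_frequencies, mel_to_hz) can be related with a short sketch. It assumes that htk=True selects the widely used HTK formula mel = 2595 * log10(1 + f / 700); the default htk=False path is not reproduced here. The function names are hypothetical and exist only in this example.

```python
import numpy as np

def hz_to_mel_htk(frequencies):
    """HTK-style Hz -> mel conversion (assumed meaning of htk=True)."""
    f = np.asarray(frequencies, dtype=np.float64)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz_htk(mels):
    """Inverse of hz_to_mel_htk."""
    m = np.asarray(mels, dtype=np.float64)
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_frequencies_htk(n_mels=128, fmin=0.0, fmax=11025.0):
    """n_mels frequencies in Hz, evenly spaced on the mel axis."""
    lo = float(hz_to_mel_htk(fmin))
    hi = float(hz_to_mel_htk(fmax))
    return mel_to_hz_htk(np.linspace(lo, hi, n_mels))

freqs = mel_frequencies_htk(n_mels=8, fmin=0.0, fmax=8000.0)
roundtrip = mel_to_hz_htk(hz_to_mel_htk([110.0, 440.0, 1760.0]))
```

Because the points are evenly spaced in mels, the gaps between consecutive entries of freqs widen as frequency increases, which is why mel filter banks are denser at low frequencies.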

paddleaudio.compliance.librosa.melspectrogram(x: numpy.ndarray, sr: int = 16000, window_size: int = 512, hop_length: int = 320, n_mels: int = 64, fmin: float = 50.0, fmax: Optional[float] = None, window: str = 'hann', center: bool = True, pad_mode: str = 'reflect', power: float = 2.0, to_db: bool = True, ref: float = 1.0, amin: float = 1e-10, top_db: Optional[float] = None) → numpy.ndarray

Compute mel-spectrogram.

Parameters
  • x (np.ndarray) -- Input waveform in one dimension.

  • sr (int, optional) -- Sample rate. Defaults to 16000.

  • window_size (int, optional) -- Size of FFT and window length. Defaults to 512.

  • hop_length (int, optional) -- Number of steps to advance between adjacent windows. Defaults to 320.

  • n_mels (int, optional) -- Number of mel bins. Defaults to 64.

  • fmin (float, optional) -- Minimum frequency in Hz. Defaults to 50.0.

  • fmax (Optional[float], optional) -- Maximum frequency in Hz. Defaults to None.

  • window (str, optional) -- A string of window specification. Defaults to "hann".

  • center (bool, optional) -- Whether to pad x so that the t-th frame is centered at sample \(t \times hop\_length\). Defaults to True.

  • pad_mode (str, optional) -- Choose padding pattern when center is True. Defaults to "reflect".

  • power (float, optional) -- Exponent for the magnitude melspectrogram. Defaults to 2.0.

  • to_db (bool, optional) -- Enable db scale. Defaults to True.

  • ref (float, optional) -- The reference value. If smaller than 1.0, the db level of the signal will be pulled up accordingly. Otherwise, the db level is pushed down. Defaults to 1.0.

  • amin (float, optional) -- Minimum threshold. Defaults to 1e-10.

  • top_db (Optional[float], optional) -- Threshold the output at top_db below the peak. Defaults to None.

Returns

The mel-spectrogram in power scale or db scale with shape (n_mels, num_frames).

Return type

np.ndarray

paddleaudio.compliance.librosa.mfcc(x: numpy.ndarray, sr: int = 16000, spect: Optional[numpy.ndarray] = None, n_mfcc: int = 20, dct_type: int = 2, norm: str = 'ortho', lifter: int = 0, **kwargs) → numpy.ndarray

Compute Mel-frequency cepstral coefficients (MFCCs).

Parameters
  • x (np.ndarray) -- Input waveform in one dimension.

  • sr (int, optional) -- Sample rate. Defaults to 16000.

  • spect (Optional[np.ndarray], optional) -- Input log-power Mel spectrogram. Defaults to None.

  • n_mfcc (int, optional) -- Number of cepstra in MFCC. Defaults to 20.

  • dct_type (int, optional) -- Discrete cosine transform (DCT) type. Defaults to 2.

  • norm (str, optional) -- Type of normalization. Defaults to "ortho".

  • lifter (int, optional) -- Cepstral filtering. Defaults to 0.

Returns

Mel frequency cepstral coefficients array with shape (n_mfcc, num_frames).

Return type

np.ndarray
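
The final step of an MFCC computation, a DCT over the mel axis of a log-power mel spectrogram, can be sketched as follows for the documented defaults dct_type=2 and norm="ortho". This is a plain NumPy illustration under those assumptions, not the library's code; dct2_ortho_matrix is a name invented for the example, and liftering is left out.

```python
import numpy as np

def dct2_ortho_matrix(n_out, n_in):
    """First n_out rows of the orthonormal DCT-II basis for length-n_in inputs."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.sqrt(2.0 / n_in) * np.cos(np.pi * (n + 0.5) * k / n_in)
    basis[0] /= np.sqrt(2.0)               # extra scaling of the k = 0 row
    return basis

n_mels, n_mfcc, num_frames = 64, 20, 5

# Placeholder log-mel input: every mel bin holds the same value in each frame.
log_mel = np.full((n_mels, num_frames), 2.0)

# Apply the DCT along the mel axis and keep the first n_mfcc coefficients.
mfcc = dct2_ortho_matrix(n_mfcc, n_mels) @ log_mel
```

With a flat input like this placeholder, all of the energy lands in coefficient 0 and the higher coefficients vanish, which shows that they encode the shape of the spectrum rather than its overall level.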

paddleaudio.compliance.librosa.mu_decode(y: numpy.ndarray, mu: int = 255, quantized: bool = True) → numpy.ndarray

Mu-law decoding. Compute the mu-law decoding of an input code. The input y is assumed to be in the range [0, mu-1] when quantized is True and in [-1, 1] otherwise.

Parameters
  • y (np.ndarray) -- The encoded waveform.

  • mu (int, optional) -- The encoding parameter. Defaults to 255.

  • quantized (bool, optional) -- If True, the input is assumed to be quantized to 1 + mu distinct integer values. Defaults to True.

Returns

The mu-law decoded waveform.

Return type

np.ndarray

paddleaudio.compliance.librosa.mu_encode(x: numpy.ndarray, mu: int = 255, quantized: bool = True) → numpy.ndarray

Mu-law encoding. Encode a waveform using mu-law companding. When quantized is True, the result is converted to integers in the range [0, mu-1]. Otherwise, the resulting waveform is in the range [-1, 1].

Parameters
  • x (np.ndarray) -- The input waveform to encode.

  • mu (int, optional) -- The encoding parameter. Defaults to 255.

  • quantized (bool, optional) -- If True, quantize the encoded values into 1 + mu distinct integer values. Defaults to True.

Returns

The mu-law encoded waveform.

Return type

np.ndarray
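
The companding step shared by mu_encode and mu_decode can be sketched with the standard mu-law formulas. Only the continuous case (quantized=False) is shown, so inputs and outputs stay in [-1, 1]; the integer quantization step is omitted. The function names are illustrative and not part of paddleaudio.

```python
import numpy as np

def mu_compress(x, mu=255):
    """Standard mu-law compression of values in [-1, 1]."""
    x = np.asarray(x, dtype=np.float64)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_expand(y, mu=255):
    """Inverse of mu_compress: (1 + mu) ** |y| - 1, rescaled by mu."""
    y = np.asarray(y, dtype=np.float64)
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

x = np.linspace(-1.0, 1.0, 11)
y = mu_compress(x)      # small amplitudes are boosted, large ones compressed
x_rec = mu_expand(y)    # recovers x up to floating-point error
```

The compression spends more of the output range on quiet samples, so a later uniform quantization loses less detail at low amplitudes.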

paddleaudio.compliance.librosa.power_to_db(spect: numpy.ndarray, ref: float = 1.0, amin: float = 1e-10, top_db: Optional[float] = 80.0) → numpy.ndarray

Convert a power spectrogram (amplitude squared) to decibel (dB) units. The function computes the scaling 10 * log10(x / ref) in a numerically stable way.

Parameters
  • spect (np.ndarray) -- STFT power spectrogram of an input waveform.

  • ref (float, optional) -- The reference value. If smaller than 1.0, the db level of the signal will be pulled up accordingly. Otherwise, the db level is pushed down. Defaults to 1.0.

  • amin (float, optional) -- Minimum threshold. Defaults to 1e-10.

  • top_db (Optional[float], optional) -- Threshold the output at top_db below the peak. Defaults to 80.0.

Returns

Power spectrogram in db scale.

Return type

np.ndarray
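
The numerically stable scaling described above can be sketched as below. The treatment of amin (flooring both the spectrogram and ref) and of top_db (clipping relative to the maximum of the output) follows the librosa convention and is an assumption about this function, not a statement of its exact code.

```python
import numpy as np

def power_to_db_sketch(spect, ref=1.0, amin=1e-10, top_db=80.0):
    """10 * log10(spect / ref), with a floor at amin and optional top_db clipping."""
    spect = np.asarray(spect, dtype=np.float64)
    log_spec = 10.0 * np.log10(np.maximum(amin, spect))
    log_spec -= 10.0 * np.log10(max(amin, ref))
    if top_db is not None:
        # Keep everything within top_db decibels of the peak.
        log_spec = np.maximum(log_spec, log_spec.max() - top_db)
    return log_spec

power = np.array([0.0, 1e-3, 1.0, 100.0])
db_full = power_to_db_sketch(power, top_db=None)   # zero maps to the amin floor
db_clip = power_to_db_sketch(power, top_db=80.0)   # floor raised to peak - 80 dB
```

Flooring at amin is what keeps the logarithm finite for silent bins, and top_db bounds the dynamic range so that those bins do not dominate later normalization.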

paddleaudio.compliance.librosa.random_crop1d(y: numpy.ndarray, crop_len: int) → numpy.ndarray

Random cropping on an input waveform.

Parameters
  • y (np.ndarray) -- Input waveform array in 1D.

  • crop_len (int) -- Length of waveform to crop.

Returns

The cropped waveform.

Return type

np.ndarray

paddleaudio.compliance.librosa.random_crop2d(s: numpy.ndarray, crop_len: int, tempo_axis: int = 0) → numpy.ndarray

Random cropping on a spectrogram.

Parameters
  • s (np.ndarray) -- Input spectrogram in 2D.

  • crop_len (int) -- Length of spectrogram to crop.

  • tempo_axis (int, optional) -- Index of the time (tempo) axis. Defaults to 0.

Returns

The cropped spectrogram.

Return type

np.ndarray
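
Both cropping helpers amount to picking a random start index and slicing a contiguous window along one axis. A minimal sketch of that idea follows, with a hypothetical helper name and an explicit random generator so the result can be made reproducible (neither is part of the documented API).

```python
import numpy as np

def random_crop_along_axis(a, crop_len, axis=0, rng=None):
    """Return a contiguous window of length crop_len taken along `axis`."""
    rng = np.random.default_rng() if rng is None else rng
    n = a.shape[axis]
    if crop_len > n:
        raise ValueError("crop_len must not exceed the length of the axis")
    start = int(rng.integers(0, n - crop_len + 1))   # upper bound is exclusive
    index = [slice(None)] * a.ndim
    index[axis] = slice(start, start + crop_len)
    return a[tuple(index)]

rng = np.random.default_rng(0)

wave = np.arange(50)                                  # 1D waveform stand-in
wave_crop = random_crop_along_axis(wave, 20, rng=rng)

spec = np.zeros((100, 64))                            # (time, freq) stand-in
spec_crop = random_crop_along_axis(spec, 30, axis=0, rng=rng)
```

Only the chosen axis shrinks to crop_len; the other axis of the spectrogram keeps its full size.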

paddleaudio.compliance.librosa.spect_augment(spect: numpy.ndarray, tempo_axis: int = 0, max_time_mask: int = 3, max_freq_mask: int = 3, max_time_mask_width: int = 30, max_freq_mask_width: int = 20) → numpy.ndarray

Do spectrogram augmentation along both the time and frequency axes.

Parameters
  • spect (np.ndarray) -- Input spectrogram.

  • tempo_axis (int, optional) -- Index of the time (tempo) axis. Defaults to 0.

  • max_time_mask (int, optional) -- Maximum number of time masks. Defaults to 3.

  • max_freq_mask (int, optional) -- Maximum number of frequency masks. Defaults to 3.

  • max_time_mask_width (int, optional) -- Maximum width of a time mask. Defaults to 30.

  • max_freq_mask_width (int, optional) -- Maximum width of a frequency mask. Defaults to 20.

Returns

The augmented spectrogram.

Return type

np.ndarray
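
A rough sketch of time and frequency masking on a 2D spectrogram follows. The details are assumptions made for illustration: masked cells are set to zero, exactly max_time_mask and max_freq_mask masks are drawn, and each width is sampled uniformly up to the maximum. The library may choose the number of masks, the widths, and the fill value differently.

```python
import numpy as np

def mask_sketch(spect, tempo_axis=0, max_time_mask=3, max_freq_mask=3,
                max_time_mask_width=30, max_freq_mask_width=20, rng=None):
    """Zero out random bands along the time and frequency axes of a 2D array."""
    rng = np.random.default_rng() if rng is None else rng
    out = spect.copy()                      # leave the caller's array untouched
    freq_axis = 1 - tempo_axis
    plan = (
        (tempo_axis, max_time_mask, max_time_mask_width),
        (freq_axis, max_freq_mask, max_freq_mask_width),
    )
    for axis, n_masks, max_width in plan:
        size = out.shape[axis]
        for _ in range(n_masks):
            width = int(rng.integers(0, min(max_width, size) + 1))
            start = int(rng.integers(0, size - width + 1))
            index = [slice(None), slice(None)]
            index[axis] = slice(start, start + width)
            out[tuple(index)] = 0.0         # assumed fill value
    return out

spect = np.ones((200, 64))                  # (time, freq) stand-in
augmented = mask_sketch(spect, rng=np.random.default_rng(0))
```

Masking whole bands rather than individual cells forces a model to rely on context from neighbouring frames and frequency bins.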

paddleaudio.compliance.librosa.spectrogram(x: numpy.ndarray, sr: int = 16000, window_size: int = 512, hop_length: int = 320, window: str = 'hann', center: bool = True, pad_mode: str = 'reflect', power: float = 2.0) → numpy.ndarray

Compute spectrogram.

Parameters
  • x (np.ndarray) -- Input waveform in one dimension.

  • sr (int, optional) -- Sample rate. Defaults to 16000.

  • window_size (int, optional) -- Size of FFT and window length. Defaults to 512.

  • hop_length (int, optional) -- Number of steps to advance between adjacent windows. Defaults to 320.

  • window (str, optional) -- A string of window specification. Defaults to "hann".

  • center (bool, optional) -- Whether to pad x so that the t-th frame is centered at sample \(t \times hop\_length\). Defaults to True.

  • pad_mode (str, optional) -- Choose padding pattern when center is True. Defaults to "reflect".

  • power (float, optional) -- Exponent for the magnitude spectrogram. Defaults to 2.0.

Returns

The STFT spectrogram in power scale with shape (n_fft//2 + 1, num_frames).

Return type

np.ndarray

paddleaudio.compliance.librosa.stft(x: numpy.ndarray, n_fft: int = 2048, hop_length: typing.Optional[int] = None, win_length: typing.Optional[int] = None, window: str = 'hann', center: bool = True, dtype: type = numpy.complex64, pad_mode: str = 'reflect') → numpy.ndarray

Short-time Fourier transform (STFT).

Parameters
  • x (np.ndarray) -- Input waveform in one dimension.

  • n_fft (int, optional) -- FFT size. Defaults to 2048.

  • hop_length (Optional[int], optional) -- Number of steps to advance between adjacent windows. Defaults to None.

  • win_length (Optional[int], optional) -- The size of window. Defaults to None.

  • window (str, optional) -- A string of window specification. Defaults to "hann".

  • center (bool, optional) -- Whether to pad x so that the t-th frame is centered at sample \(t \times hop\_length\). Defaults to True.

  • dtype (type, optional) -- Data type of STFT results. Defaults to np.complex64.

  • pad_mode (str, optional) -- Choose padding pattern when center is True. Defaults to "reflect".

Returns

The complex STFT output with shape (n_fft//2 + 1, num_frames).

Return type

np.ndarray
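
To make the framing and the output shape concrete, here is a small NumPy sketch of a centered STFT with a periodic Hann window. It assumes that win_length equals n_fft and that center=True pads n_fft // 2 samples on each side, as in librosa; stft_sketch is a name invented for this example and the code is not the library's implementation.

```python
import numpy as np

def stft_sketch(x, n_fft=512, hop_length=320, center=True, pad_mode="reflect"):
    """Complex STFT with shape (n_fft // 2 + 1, num_frames)."""
    if center:
        # Pad so that frame t is centered at sample t * hop_length.
        x = np.pad(x, n_fft // 2, mode=pad_mode)
    n = np.arange(n_fft)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / n_fft)   # periodic Hann
    num_frames = 1 + (len(x) - n_fft) // hop_length
    frames = np.stack([x[t * hop_length: t * hop_length + n_fft]
                       for t in range(num_frames)])
    return np.fft.rfft(frames * window, axis=1).T

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2.0 * np.pi * 1000.0 * t)       # one second of a 1 kHz tone

S = stft_sketch(x, n_fft=512, hop_length=320)
power = np.abs(S) ** 2                      # power spectrogram (power = 2.0)
peak_bin = int(np.argmax(power[:, 25]))     # strongest bin in a middle frame
peak_hz = peak_bin * sr / 512
```

Under these assumptions a one-second signal at 16 kHz yields 1 + 16000 // 320 frames, and the strongest bin of the test tone sits at its frequency divided by the bin spacing sr / n_fft.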