paddleaudio.features.layers

class paddleaudio.features.layers.LogMelSpectrogram(sr: int = 22050, n_fft: int = 512, hop_length: Optional[int] = None, win_length: Optional[int] = None, window: str = 'hann', power: float = 2.0, center: bool = True, pad_mode: str = 'reflect', n_mels: int = 64, f_min: float = 50.0, f_max: Optional[float] = None, htk: bool = False, norm: Union[str, float] = 'slaney', ref_value: float = 1.0, amin: float = 1e-10, top_db: Optional[float] = None, dtype: str = 'float32')[source]

Bases: paddle.fluid.dygraph.layers.Layer

Compute the log-mel spectrogram features of the given signals, typically audio waveforms.

Parameters
  • sr (int, optional) -- Sample rate. Defaults to 22050.

  • n_fft (int, optional) -- The number of frequency components of the discrete Fourier transform. Defaults to 512.

  • hop_length (Optional[int], optional) -- The hop length of the short-time Fourier transform (STFT). If None, it is set to win_length // 4. Defaults to None.

  • win_length (Optional[int], optional) -- The window length of the short-time Fourier transform (STFT). If None, it is set to n_fft. Defaults to None.

  • window (str, optional) -- The window function applied to the signal before the Fourier transform. Supported window functions: 'hamming', 'hann', 'kaiser', 'gaussian', 'exponential', 'triang', 'bohman', 'blackman', 'cosine', 'tukey', 'taylor'. Defaults to 'hann'.

  • power (float, optional) -- Exponent for the magnitude spectrogram. Defaults to 2.0.

  • center (bool, optional) -- Whether to pad x so that the t-th frame is centered at \(t \times hop\_length\). Defaults to True.

  • pad_mode (str, optional) -- Choose padding pattern when center is True. Defaults to 'reflect'.

  • n_mels (int, optional) -- Number of mel bins. Defaults to 64.

  • f_min (float, optional) -- Minimum frequency in Hz. Defaults to 50.0.

  • f_max (Optional[float], optional) -- Maximum frequency in Hz. Defaults to None.

  • htk (bool, optional) -- Use HTK formula in computing fbank matrix. Defaults to False.

  • norm (Union[str, float], optional) -- Type of normalization used when computing the fbank matrix. Slaney-style normalization is used by default. Specify norm=1.0 or norm=2.0 to use p-norm normalization instead. Defaults to 'slaney'.

  • ref_value (float, optional) -- The reference value. If it is smaller than 1.0, the dB level of the signal is raised accordingly; otherwise, the dB level is lowered. Defaults to 1.0.

  • amin (float, optional) -- The minimum value of input magnitude. Defaults to 1e-10.

  • top_db (Optional[float], optional) -- The maximum dB value of the spectrogram. Defaults to None.

  • dtype (str, optional) -- Data type of input and window. Defaults to 'float32'.

forward(x: paddle.Tensor) -> paddle.Tensor[source]
Parameters

x (Tensor) -- Tensor of waveforms with shape (N, T)

Returns

Log mel spectrograms with shape (N, n_mels, num_frames).

Return type

Tensor
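
The ref_value, amin and top_db parameters describe a power-to-decibel conversion. The sketch below illustrates one common (librosa-style) convention in plain Python; it is an assumption about how these parameters interact, not the layer's actual implementation, and power_to_db is a hypothetical helper name.

```python
import math


def power_to_db(values, ref_value=1.0, amin=1e-10, top_db=None):
    """Convert power values to decibels (assumed librosa-style convention)."""
    if amin <= 0:
        raise ValueError("amin must be strictly positive")
    # Clamp from below so that log10 never receives zero.
    db = [10.0 * math.log10(max(amin, v)) for v in values]
    # Subtract the reference level; ref_value < 1.0 raises every result.
    offset = 10.0 * math.log10(max(amin, ref_value))
    db = [d - offset for d in db]
    if top_db is not None:
        # Limit the dynamic range to top_db below the loudest value.
        floor = max(db) - top_db
        db = [max(d, floor) for d in db]
    return db


power = [1.0, 0.1, 1e-12]
print(power_to_db(power))               # the last value is clamped by amin
print(power_to_db(power, top_db=80.0))  # the last value is clamped by top_db
print(power_to_db(power, ref_value=0.1))
```

Under this convention, amin bounds how negative a value can become, while top_db bounds the range relative to the peak of the input.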

class paddleaudio.features.layers.MFCC(sr: int = 22050, n_mfcc: int = 40, n_fft: int = 512, hop_length: Optional[int] = None, win_length: Optional[int] = None, window: str = 'hann', power: float = 2.0, center: bool = True, pad_mode: str = 'reflect', n_mels: int = 64, f_min: float = 50.0, f_max: Optional[float] = None, htk: bool = False, norm: Union[str, float] = 'slaney', ref_value: float = 1.0, amin: float = 1e-10, top_db: Optional[float] = None, dtype: str = paddle.float32)[source]

Bases: paddle.fluid.dygraph.layers.Layer

Compute the mel frequency cepstral coefficients (MFCCs) of the given waveforms.

Parameters
  • sr (int, optional) -- Sample rate. Defaults to 22050.

  • n_mfcc (int, optional) -- Number of mel frequency cepstral coefficients to return. Defaults to 40.

  • n_fft (int, optional) -- The number of frequency components of the discrete Fourier transform. Defaults to 512.

  • hop_length (Optional[int], optional) -- The hop length of the short-time Fourier transform (STFT). If None, it is set to win_length // 4. Defaults to None.

  • win_length (Optional[int], optional) -- The window length of the short-time Fourier transform (STFT). If None, it is set to n_fft. Defaults to None.

  • window (str, optional) -- The window function applied to the signal before the Fourier transform. Supported window functions: 'hamming', 'hann', 'kaiser', 'gaussian', 'exponential', 'triang', 'bohman', 'blackman', 'cosine', 'tukey', 'taylor'. Defaults to 'hann'.

  • power (float, optional) -- Exponent for the magnitude spectrogram. Defaults to 2.0.

  • center (bool, optional) -- Whether to pad x so that the t-th frame is centered at \(t \times hop\_length\). Defaults to True.

  • pad_mode (str, optional) -- Choose padding pattern when center is True. Defaults to 'reflect'.

  • n_mels (int, optional) -- Number of mel bins. Defaults to 64.

  • f_min (float, optional) -- Minimum frequency in Hz. Defaults to 50.0.

  • f_max (Optional[float], optional) -- Maximum frequency in Hz. Defaults to None.

  • htk (bool, optional) -- Use HTK formula in computing fbank matrix. Defaults to False.

  • norm (Union[str, float], optional) -- Type of normalization used when computing the fbank matrix. Slaney-style normalization is used by default. Specify norm=1.0 or norm=2.0 to use p-norm normalization instead. Defaults to 'slaney'.

  • ref_value (float, optional) -- The reference value. If it is smaller than 1.0, the dB level of the signal is raised accordingly; otherwise, the dB level is lowered. Defaults to 1.0.

  • amin (float, optional) -- The minimum value of input magnitude. Defaults to 1e-10.

  • top_db (Optional[float], optional) -- The maximum dB value of the spectrogram. Defaults to None.

  • dtype (str, optional) -- Data type of input and window. Defaults to 'float32'.

forward(x: paddle.Tensor) -> paddle.Tensor[source]
Parameters

x (Tensor) -- Tensor of waveforms with shape (N, T)

Returns

Mel frequency cepstral coefficients with shape (N, n_mfcc, num_frames).

Return type

Tensor
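
MFCCs are conventionally obtained by applying a discrete cosine transform (DCT) along the mel axis of a log-mel spectrogram and keeping the first n_mfcc coefficients. The sketch below shows that step for a single frame in plain Python. It assumes an orthonormal DCT-II, which is a common choice but is not confirmed by this page; dct_matrix and mfcc_from_log_mel are hypothetical helper names.

```python
import math


def dct_matrix(n_mfcc, n_mels):
    """Orthonormal DCT-II basis as a list of n_mfcc rows of length n_mels."""
    basis = []
    for k in range(n_mfcc):
        # The first row uses a different scale so that the rows are orthonormal.
        scale = math.sqrt(1.0 / n_mels) if k == 0 else math.sqrt(2.0 / n_mels)
        row = [
            scale * math.cos(math.pi * k * (2 * n + 1) / (2 * n_mels))
            for n in range(n_mels)
        ]
        basis.append(row)
    return basis


def mfcc_from_log_mel(log_mel_frame, n_mfcc):
    """Project one log-mel frame (length n_mels) onto the first n_mfcc rows."""
    basis = dct_matrix(n_mfcc, len(log_mel_frame))
    return [sum(b * x for b, x in zip(row, log_mel_frame)) for row in basis]


# A made-up log-mel frame with n_mels = 4, reduced to n_mfcc = 3 coefficients.
frame = [-20.0, -35.0, -50.0, -42.0]
coeffs = mfcc_from_log_mel(frame, n_mfcc=3)
print(coeffs)
```

Applied to every frame of a (N, n_mels, num_frames) log-mel tensor, this yields the (N, n_mfcc, num_frames) shape documented above.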

class paddleaudio.features.layers.MelSpectrogram(sr: int = 22050, n_fft: int = 512, hop_length: Optional[int] = None, win_length: Optional[int] = None, window: str = 'hann', power: float = 2.0, center: bool = True, pad_mode: str = 'reflect', n_mels: int = 64, f_min: float = 50.0, f_max: Optional[float] = None, htk: bool = False, norm: Union[str, float] = 'slaney', dtype: str = 'float32')[source]

Bases: paddle.fluid.dygraph.layers.Layer

Compute the mel spectrogram of the given signals, typically audio waveforms. It is computed by multiplying the spectrogram by the mel filter bank matrix.

Parameters
  • sr (int, optional) -- Sample rate. Defaults to 22050.

  • n_fft (int, optional) -- The number of frequency components of the discrete Fourier transform. Defaults to 512.

  • hop_length (Optional[int], optional) -- The hop length of the short-time Fourier transform (STFT). If None, it is set to win_length // 4. Defaults to None.

  • win_length (Optional[int], optional) -- The window length of the short-time Fourier transform (STFT). If None, it is set to n_fft. Defaults to None.

  • window (str, optional) -- The window function applied to the signal before the Fourier transform. Supported window functions: 'hamming', 'hann', 'kaiser', 'gaussian', 'exponential', 'triang', 'bohman', 'blackman', 'cosine', 'tukey', 'taylor'. Defaults to 'hann'.

  • power (float, optional) -- Exponent for the magnitude spectrogram. Defaults to 2.0.

  • center (bool, optional) -- Whether to pad x so that the t-th frame is centered at \(t \times hop\_length\). Defaults to True.

  • pad_mode (str, optional) -- Choose padding pattern when center is True. Defaults to 'reflect'.

  • n_mels (int, optional) -- Number of mel bins. Defaults to 64.

  • f_min (float, optional) -- Minimum frequency in Hz. Defaults to 50.0.

  • f_max (Optional[float], optional) -- Maximum frequency in Hz. Defaults to None.

  • htk (bool, optional) -- Use HTK formula in computing fbank matrix. Defaults to False.

  • norm (Union[str, float], optional) -- Type of normalization used when computing the fbank matrix. Slaney-style normalization is used by default. Specify norm=1.0 or norm=2.0 to use p-norm normalization instead. Defaults to 'slaney'.

  • dtype (str, optional) -- Data type of input and window. Defaults to 'float32'.

forward(x: paddle.Tensor) -> paddle.Tensor[source]
Parameters

x (Tensor) -- Tensor of waveforms with shape (N, T)

Returns

Mel spectrograms with shape (N, n_mels, num_frames).

Return type

Tensor
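
The htk flag selects between two well-known Hz-to-mel formulas. The sketch below writes both out in plain Python and uses them to place n_mels triangular bands between f_min and f_max. The function names are hypothetical, and the explicit f_max of 11025.0 (half the default sr) is an assumption, since this page does not state what f_max=None resolves to.

```python
import math

F_SP = 200.0 / 3.0             # Slaney: Hz per mel in the linear region
MIN_LOG_HZ = 1000.0            # Slaney: start of the logarithmic region
MIN_LOG_MEL = MIN_LOG_HZ / F_SP
LOGSTEP = math.log(6.4) / 27.0


def hz_to_mel(freq, htk=False):
    if htk:
        return 2595.0 * math.log10(1.0 + freq / 700.0)
    if freq >= MIN_LOG_HZ:
        return MIN_LOG_MEL + math.log(freq / MIN_LOG_HZ) / LOGSTEP
    return freq / F_SP


def mel_to_hz(mel, htk=False):
    if htk:
        return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    if mel >= MIN_LOG_MEL:
        return MIN_LOG_HZ * math.exp(LOGSTEP * (mel - MIN_LOG_MEL))
    return F_SP * mel


def mel_band_edges(n_mels, f_min, f_max, htk=False):
    """Return n_mels + 2 frequencies (Hz) spaced evenly on the mel scale.

    Band i spans edges[i] .. edges[i + 2] and peaks at edges[i + 1].
    """
    lo, hi = hz_to_mel(f_min, htk), hz_to_mel(f_max, htk)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step, htk) for i in range(n_mels + 2)]


# f_max is passed explicitly here; 11025.0 is an assumed value (sr / 2).
edges = mel_band_edges(n_mels=64, f_min=50.0, f_max=11025.0)
print(len(edges), edges[:3])
```

Both scales are close to linear at low frequencies and logarithmic at high frequencies, so the bands widen as frequency increases.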

class paddleaudio.features.layers.Spectrogram(n_fft: int = 512, hop_length: Optional[int] = None, win_length: Optional[int] = None, window: str = 'hann', power: float = 2.0, center: bool = True, pad_mode: str = 'reflect', dtype: str = 'float32')[source]

Bases: paddle.fluid.dygraph.layers.Layer

Compute the spectrogram of the given signals, typically audio waveforms. The spectrogram is defined as the complex norm of the short-time Fourier transform.

Parameters
  • n_fft (int, optional) -- The number of frequency components of the discrete Fourier transform. Defaults to 512.

  • hop_length (Optional[int], optional) -- The hop length of the short-time Fourier transform (STFT). If None, it is set to win_length // 4. Defaults to None.

  • win_length (Optional[int], optional) -- The window length of the short-time Fourier transform (STFT). If None, it is set to n_fft. Defaults to None.

  • window (str, optional) -- The window function applied to the signal before the Fourier transform. Supported window functions: 'hamming', 'hann', 'kaiser', 'gaussian', 'exponential', 'triang', 'bohman', 'blackman', 'cosine', 'tukey', 'taylor'. Defaults to 'hann'.

  • power (float, optional) -- Exponent for the magnitude spectrogram. Defaults to 2.0.

  • center (bool, optional) -- Whether to pad x so that the t-th frame is centered at \(t \times hop\_length\). Defaults to True.

  • pad_mode (str, optional) -- Choose padding pattern when center is True. Defaults to 'reflect'.

  • dtype (str, optional) -- Data type of input and window. Defaults to 'float32'.

forward(x: paddle.Tensor) -> paddle.Tensor[source]
Parameters

x (Tensor) -- Tensor of waveforms with shape (N, T)

Returns

Spectrograms with shape (N, n_fft//2 + 1, num_frames).

Return type

Tensor
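
The output shape (N, n_fft//2 + 1, num_frames) can be estimated ahead of time from the constructor arguments. The sketch below resolves the documented hop_length and win_length defaults and applies the usual STFT framing rule. It assumes that center=True pads n_fft // 2 samples on each side, which is the standard convention but is not stated on this page; spectrogram_shape is a hypothetical helper, not part of paddleaudio.

```python
def spectrogram_shape(batch, num_samples, n_fft=512, hop_length=None,
                      win_length=None, center=True):
    """Estimate the output shape for an input of shape (batch, num_samples)."""
    # Documented defaults: win_length falls back to n_fft,
    # hop_length falls back to win_length // 4.
    if win_length is None:
        win_length = n_fft
    if hop_length is None:
        hop_length = win_length // 4
    # Assumed convention: centering pads n_fft // 2 samples on both sides.
    padded = num_samples + 2 * (n_fft // 2) if center else num_samples
    if padded < n_fft:
        raise ValueError("input is shorter than one frame")
    num_frames = 1 + (padded - n_fft) // hop_length
    return (batch, n_fft // 2 + 1, num_frames)


# Four one-second clips at 22050 Hz with the default settings.
print(spectrogram_shape(4, 22050))                # (4, 257, 173)
print(spectrogram_shape(4, 22050, center=False))  # (4, 257, 169)
```

The same num_frames applies to the mel, log-mel and MFCC layers, since only the frequency axis differs between them.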