paddleaudio.functional.functional

paddleaudio.functional.functional.compute_fbank_matrix(sr: int, n_fft: int, n_mels: int = 64, f_min: float = 0.0, f_max: Optional[float] = None, htk: bool = False, norm: Union[str, float] = 'slaney', dtype: str = 'float32') -> paddle.Tensor

Compute the mel filter bank (fbank) matrix, which maps FFT bins to mel bins.

Parameters
  • sr (int) -- Sample rate.

  • n_fft (int) -- Number of FFT bins.

  • n_mels (int, optional) -- Number of mel bins. Defaults to 64.

  • f_min (float, optional) -- Minimum frequency in Hz. Defaults to 0.0.

  • f_max (Optional[float], optional) -- Maximum frequency in Hz. Defaults to None.

  • htk (bool, optional) -- Use HTK scaling. Defaults to False.

  • norm (Union[str, float], optional) -- Type of normalization. Defaults to 'slaney'.

  • dtype (str, optional) -- The data type of the return matrix. Defaults to 'float32'.

Returns

Mel transform matrix with shape (n_mels, n_fft//2 + 1).

Return type

Tensor
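As an illustration of what a mel transform matrix of shape (n_mels, n_fft//2 + 1) contains, the following pure-Python sketch builds triangular filters whose edges are evenly spaced on the mel axis. It is not the paddleaudio implementation: the helper names are hypothetical, only the conventional HTK mel formula is used, no normalization (norm) is applied, and f_max=None is assumed to mean sr / 2.

```python
import math


def hz_to_mel_htk(freq):
    # Conventional HTK mel formula (assumed here for simplicity).
    return 2595.0 * math.log10(1.0 + freq / 700.0)


def mel_to_hz_htk(mel):
    # Inverse of the HTK formula above.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def fbank_matrix_sketch(sr, n_fft, n_mels=64, f_min=0.0, f_max=None):
    """Hypothetical helper: unnormalized triangular mel filters as nested lists."""
    if f_max is None:
        f_max = sr / 2.0  # assumption: None means the Nyquist frequency
    n_bins = n_fft // 2 + 1
    # Center frequency in Hz of every FFT bin.
    fft_freqs = [k * sr / n_fft for k in range(n_bins)]
    # n_mels + 2 band edges, evenly spaced on the mel axis.
    lo, hi = hz_to_mel_htk(f_min), hz_to_mel_htk(f_max)
    edges = [
        mel_to_hz_htk(lo + (hi - lo) * i / (n_mels + 1))
        for i in range(n_mels + 2)
    ]
    weights = []
    for i in range(n_mels):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        row = []
        for f in fft_freqs:
            rising = (f - left) / (center - left)
            falling = (right - f) / (right - center)
            # Triangle: zero outside [left, right], peak of 1 at center.
            row.append(max(0.0, min(rising, falling)))
        weights.append(row)
    return weights


fbank = fbank_matrix_sketch(sr=16000, n_fft=512, n_mels=40)
print(len(fbank), len(fbank[0]))
```

Multiplying such a matrix by a power spectrogram of shape (n_fft//2 + 1, num_frames) gives a mel spectrogram of shape (n_mels, num_frames).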

paddleaudio.functional.functional.create_dct(n_mfcc: int, n_mels: int, norm: Optional[str] = 'ortho', dtype: str = 'float32') -> paddle.Tensor

Create a discrete cosine transform (DCT) matrix.

Parameters
  • n_mfcc (int) -- Number of mel frequency cepstral coefficients.

  • n_mels (int) -- Number of mel filterbanks.

  • norm (Optional[str], optional) -- Normalization type. Defaults to 'ortho'.

  • dtype (str, optional) -- The data type of the return matrix. Defaults to 'float32'.

Returns

The DCT matrix with shape (n_mels, n_mfcc).

Return type

Tensor
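A minimal pure-Python sketch of a DCT-II matrix with shape (n_mels, n_mfcc) follows. It assumes that 'ortho' denotes the usual orthonormal scaling and that any other value leaves the basis scaled by 2; the function name is hypothetical and the code is not taken from paddleaudio.

```python
import math


def dct_matrix_sketch(n_mfcc, n_mels, norm="ortho"):
    """Hypothetical helper: DCT-II basis as nested lists, shape (n_mels, n_mfcc)."""
    rows = []
    for n in range(n_mels):
        row = []
        for k in range(n_mfcc):
            value = math.cos(math.pi / n_mels * (n + 0.5) * k)
            if norm == "ortho":
                if k == 0:
                    # The constant basis vector gets an extra 1/sqrt(2).
                    value *= 1.0 / math.sqrt(2.0)
                value *= math.sqrt(2.0 / n_mels)
            else:
                # Assumption: no normalization means the plain DCT-II scaling.
                value *= 2.0
            row.append(value)
        rows.append(row)
    return rows


dct = dct_matrix_sketch(n_mfcc=13, n_mels=40)
print(len(dct), len(dct[0]))
```

With the orthonormal scaling, the columns of the matrix are mutually orthogonal and have unit length, so projecting a log-mel vector of length n_mels onto them yields n_mfcc cepstral coefficients.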

paddleaudio.functional.functional.fft_frequencies(sr: int, n_fft: int, dtype: str = 'float32') -> paddle.Tensor

Compute Fourier frequencies.

Parameters
  • sr (int) -- Sample rate.

  • n_fft (int) -- Number of FFT bins.

  • dtype (str, optional) -- The data type of the return frequencies. Defaults to 'float32'.

Returns

FFT frequencies in Hz with shape (n_fft//2 + 1,).

Return type

Tensor
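The returned values are the center frequencies of the non-negative FFT bins. A minimal sketch of that arithmetic, assuming an even n_fft so that the last bin lands on sr / 2 (the function name is hypothetical):

```python
def fft_frequencies_sketch(sr, n_fft):
    """Hypothetical helper: bin k of an n_fft-point FFT sits at k * sr / n_fft Hz."""
    return [k * sr / n_fft for k in range(n_fft // 2 + 1)]


freqs = fft_frequencies_sketch(sr=16000, n_fft=512)
print(len(freqs), freqs[1], freqs[-1])
```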

paddleaudio.functional.functional.hz_to_mel(freq: Union[paddle.Tensor, float], htk: bool = False) -> Union[paddle.Tensor, float]

Convert Hz to Mels.

Parameters
  • freq (Union[Tensor, float]) -- The input frequency in Hz, as a float or a tensor with arbitrary shape.

  • htk (bool, optional) -- Use HTK scaling. Defaults to False.

Returns

Frequency in mels.

Return type

Union[Tensor, float]
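The two scales selected by htk can be sketched for a scalar input as below. This assumes that htk=True corresponds to the conventional HTK formula and that htk=False corresponds to the Slaney (Auditory Toolbox) scale, which is linear below 1000 Hz and logarithmic above; the function name and constants are written out for illustration and are not copied from paddleaudio.

```python
import math


def hz_to_mel_sketch(freq, htk=False):
    """Hypothetical scalar helper for the Hz -> mel conversion."""
    if htk:
        # HTK formula.
        return 2595.0 * math.log10(1.0 + freq / 700.0)
    # Slaney scale: linear part, 200/3 Hz per mel.
    f_sp = 200.0 / 3.0
    min_log_hz = 1000.0               # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp   # 15 mels
    logstep = math.log(6.4) / 27.0    # step size in the logarithmic region
    if freq >= min_log_hz:
        return min_log_mel + math.log(freq / min_log_hz) / logstep
    return freq / f_sp


print(hz_to_mel_sketch(440.0), hz_to_mel_sketch(440.0, htk=True))
```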

paddleaudio.functional.functional.mel_frequencies(n_mels: int = 64, f_min: float = 0.0, f_max: float = 11025.0, htk: bool = False, dtype: str = 'float32') -> paddle.Tensor

Compute mel frequencies.

Parameters
  • n_mels (int, optional) -- Number of mel bins. Defaults to 64.

  • f_min (float, optional) -- Minimum frequency in Hz. Defaults to 0.0.

  • f_max (float, optional) -- Maximum frequency in Hz. Defaults to 11025.0.

  • htk (bool, optional) -- Use HTK scaling. Defaults to False.

  • dtype (str, optional) -- The data type of the return frequencies. Defaults to 'float32'.

Returns

Tensor of n_mels frequencies in Hz with shape (n_mels,).

Return type

Tensor
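Mel frequencies are points that are evenly spaced on the mel axis between f_min and f_max and then mapped back to Hz. The sketch below shows that idea using only the conventional HTK formula; the function names are hypothetical and this is not the paddleaudio implementation.

```python
import math


def hz_to_mel_htk(freq):
    # Conventional HTK mel formula (assumed here for simplicity).
    return 2595.0 * math.log10(1.0 + freq / 700.0)


def mel_to_hz_htk(mel):
    # Inverse of the HTK formula above.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_frequencies_sketch(n_mels=64, f_min=0.0, f_max=11025.0):
    """Hypothetical helper: n_mels frequencies in Hz, uniform on the mel axis."""
    lo, hi = hz_to_mel_htk(f_min), hz_to_mel_htk(f_max)
    steps = max(n_mels - 1, 1)  # avoid dividing by zero when n_mels == 1
    return [mel_to_hz_htk(lo + (hi - lo) * i / steps) for i in range(n_mels)]


freqs = mel_frequencies_sketch()
print(len(freqs), freqs[0], freqs[-1])
```

Because the mel axis is compressive, neighboring values are close together at low frequencies and spread out toward f_max.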

paddleaudio.functional.functional.mel_to_hz(mel: Union[float, paddle.Tensor], htk: bool = False) -> Union[float, paddle.Tensor]

Convert mel bin numbers to frequencies.

Parameters
  • mel (Union[float, Tensor]) -- The mel frequency represented as a tensor with arbitrary shape.

  • htk (bool, optional) -- Use HTK scaling. Defaults to False.

Returns

Frequencies in Hz.

Return type

Union[float, Tensor]
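The inverse conversion can be sketched for a scalar input in the same way. As before, this assumes htk=True means the conventional HTK formula and htk=False means the Slaney scale (linear up to 15 mels, exponential above); the function name is hypothetical.

```python
import math


def mel_to_hz_sketch(mel, htk=False):
    """Hypothetical scalar helper for the mel -> Hz conversion."""
    if htk:
        # Inverse of the HTK formula 2595 * log10(1 + f / 700).
        return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    f_sp = 200.0 / 3.0
    min_log_hz = 1000.0               # start of the exponential region in Hz
    min_log_mel = min_log_hz / f_sp   # the same point in mels (15)
    logstep = math.log(6.4) / 27.0
    if mel >= min_log_mel:
        return min_log_hz * math.exp(logstep * (mel - min_log_mel))
    return f_sp * mel


print(mel_to_hz_sketch(10.0), mel_to_hz_sketch(1000.0, htk=True))
```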

paddleaudio.functional.functional.power_to_db(spect: paddle.Tensor, ref_value: float = 1.0, amin: float = 1e-10, top_db: Optional[float] = None) -> paddle.Tensor

Convert a power spectrogram (amplitude squared) to decibel (dB) units. The function computes the scaling 10 * log10(x / ref) in a numerically stable way.

Parameters
  • spect (Tensor) -- STFT power spectrogram.

  • ref_value (float, optional) -- The reference value. If smaller than 1.0, the dB level of the signal is pulled up accordingly. Otherwise, the dB level is pushed down. Defaults to 1.0.

  • amin (float, optional) -- Minimum threshold. Defaults to 1e-10.

  • top_db (Optional[float], optional) -- Clip the output so that no value falls more than top_db dB below the peak. Defaults to None.

Returns

Power spectrogram in dB scale.

Return type

Tensor
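The computation described above can be sketched on a flat list of power values. This assumes that amin floors both the input and the reference before the logarithm and that top_db clips relative to the maximum output value; the function name is hypothetical and the code is not the paddleaudio implementation.

```python
import math


def power_to_db_sketch(power, ref_value=1.0, amin=1e-10, top_db=None):
    """Hypothetical helper: 10 * log10(x / ref) with a floor at amin."""
    ref_db = 10.0 * math.log10(max(amin, ref_value))
    # Flooring at amin keeps log10 away from zero and negative inputs.
    db = [10.0 * math.log10(max(amin, p)) - ref_db for p in power]
    if top_db is not None:
        # Keep every value within top_db of the loudest one.
        floor = max(db) - top_db
        db = [max(value, floor) for value in db]
    return db


print(power_to_db_sketch([1.0, 0.1, 0.0]))
print(power_to_db_sketch([1.0, 0.1, 0.0], top_db=80.0))
```

In the second call the silent bin is raised from the amin floor to exactly top_db below the peak, which limits the dynamic range of the result.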