paddleaudio.functional.functional
- paddleaudio.functional.functional.compute_fbank_matrix(sr: int, n_fft: int, n_mels: int = 64, f_min: float = 0.0, f_max: Optional[float] = None, htk: bool = False, norm: Union[str, float] = 'slaney', dtype: str = 'float32') -> paddle.Tensor
Compute the mel filter bank (fbank) matrix that maps FFT bins to mel bins.
- Parameters
sr (int) -- Sample rate.
n_fft (int) -- FFT size (number of points in the FFT).
n_mels (int, optional) -- Number of mel bins. Defaults to 64.
f_min (float, optional) -- Minimum frequency in Hz. Defaults to 0.0.
f_max (Optional[float], optional) -- Maximum frequency in Hz. Defaults to None.
htk (bool, optional) -- Use the HTK formula for the mel scale. Defaults to False.
norm (Union[str, float], optional) -- Type of normalization. Defaults to 'slaney'.
dtype (str, optional) -- The data type of the return matrix. Defaults to 'float32'.
- Returns
Mel transform matrix with shape (n_mels, n_fft//2 + 1).
- Return type
Tensor
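As a rough illustration of how such a matrix can be built, the pure-Python sketch below places triangular filters at band edges evenly spaced on the mel axis and scales each filter by the inverse of its bandwidth (Slaney-style area normalization). It is not the paddleaudio implementation: the names `fbank_matrix_sketch`, `_hz_to_mel` and `_mel_to_hz` are hypothetical, the HTK mel formula is used for brevity, `n_fft` is assumed to be even, and `f_max=None` is assumed to mean the Nyquist frequency `sr / 2`.

```python
import math


def _hz_to_mel(freq):
    # HTK mel formula (chosen here only to keep the sketch short)
    return 2595.0 * math.log10(1.0 + freq / 700.0)


def _mel_to_hz(mel):
    # Inverse of the HTK formula above
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def fbank_matrix_sketch(sr, n_fft, n_mels=64, f_min=0.0, f_max=None):
    if f_max is None:
        f_max = sr / 2.0  # assumption: None falls back to Nyquist
    n_bins = n_fft // 2 + 1
    # Frequency of each FFT bin in Hz (even n_fft assumed)
    fft_freqs = [i * sr / n_fft for i in range(n_bins)]
    # n_mels + 2 band edges, evenly spaced on the mel axis
    lo, hi = _hz_to_mel(f_min), _hz_to_mel(f_max)
    edges = [_mel_to_hz(lo + (hi - lo) * i / (n_mels + 1))
             for i in range(n_mels + 2)]
    matrix = []
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        scale = 2.0 / (right - left)  # area normalization
        row = []
        for f in fft_freqs:
            rising = (f - left) / (center - left)
            falling = (right - f) / (right - center)
            row.append(max(0.0, min(rising, falling)) * scale)
        matrix.append(row)
    return matrix


fbank = fbank_matrix_sketch(sr=16000, n_fft=512, n_mels=40)
print(len(fbank), len(fbank[0]))  # 40 257
```

The resulting shape matches the documented `(n_mels, n_fft//2 + 1)` layout, so each row can be applied to one frame of a power spectrogram.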
- paddleaudio.functional.functional.create_dct(n_mfcc: int, n_mels: int, norm: Optional[str] = 'ortho', dtype: str = 'float32') -> paddle.Tensor
Create a discrete cosine transform (DCT) matrix.
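- Parameters
n_mfcc (int) -- Number of mel frequency cepstral coefficients.
n_mels (int) -- Number of mel bins.
norm (Optional[str], optional) -- Type of normalization. Defaults to 'ortho'.
dtype (str, optional) -- The data type of the return matrix. Defaults to 'float32'.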
- Returns
The DCT matrix with shape (n_mels, n_mfcc).
- Return type
Tensor
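A minimal sketch of what a DCT matrix of this shape can look like, assuming a DCT-II basis with orthonormal scaling (the usual meaning of `norm='ortho'`). The function name `create_dct_sketch` is hypothetical, only the orthonormal case is shown, and nested lists stand in for a tensor.

```python
import math


def create_dct_sketch(n_mfcc, n_mels):
    # DCT-II basis with orthonormal scaling, laid out as (n_mels, n_mfcc)
    dct = []
    for n in range(n_mels):
        row = []
        for k in range(n_mfcc):
            value = math.cos(math.pi / n_mels * (n + 0.5) * k)
            # The k = 0 column gets a smaller factor so all columns have unit norm
            scale = math.sqrt(1.0 / n_mels) if k == 0 else math.sqrt(2.0 / n_mels)
            row.append(value * scale)
        dct.append(row)
    return dct


dct = create_dct_sketch(n_mfcc=13, n_mels=40)
print(len(dct), len(dct[0]))  # 40 13
```

With this layout, multiplying a log-mel spectrogram of shape `(frames, n_mels)` by the matrix would give `(frames, n_mfcc)` cepstral coefficients.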
- paddleaudio.functional.functional.fft_frequencies(sr: int, n_fft: int, dtype: str = 'float32') -> paddle.Tensor
Compute Fourier frequencies.
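The entry above does not spell out the return value. The sketch below assumes the librosa-style convention, in which the result holds the `n_fft//2 + 1` non-negative bin frequencies evenly spaced from 0 to `sr / 2`. The name `fft_frequencies_sketch` is hypothetical and a plain list stands in for a tensor.

```python
def fft_frequencies_sketch(sr, n_fft):
    n_bins = n_fft // 2 + 1
    # n_bins evenly spaced points covering [0, sr / 2]
    return [i * (sr / 2.0) / (n_bins - 1) for i in range(n_bins)]


freqs = fft_frequencies_sketch(sr=16000, n_fft=512)
print(len(freqs), freqs[0], freqs[1], freqs[-1])  # 257 0.0 31.25 8000.0
```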
- paddleaudio.functional.functional.hz_to_mel(freq: Union[paddle.Tensor, float], htk: bool = False) -> Union[paddle.Tensor, float]
Convert Hz to Mels.
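For reference, the two mel-scale conventions that an `htk` flag typically selects between can be sketched for scalar input as follows. This assumes the widely used Slaney formula (linear below 1 kHz, logarithmic above) for `htk=False` and the HTK formula for `htk=True`. The name `hz_to_mel_sketch` is hypothetical, and the constants are the commonly published ones rather than values taken from the paddleaudio source.

```python
import math


def hz_to_mel_sketch(freq, htk=False):
    if htk:
        # HTK formula
        return 2595.0 * math.log10(1.0 + freq / 700.0)
    # Slaney scale: linear below 1 kHz, logarithmic above
    f_sp = 200.0 / 3.0
    min_log_hz = 1000.0
    min_log_mel = min_log_hz / f_sp
    logstep = math.log(6.4) / 27.0
    if freq >= min_log_hz:
        return min_log_mel + math.log(freq / min_log_hz) / logstep
    return freq / f_sp


print(hz_to_mel_sketch(440.0))
print(hz_to_mel_sketch(440.0, htk=True))
```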
- paddleaudio.functional.functional.mel_frequencies(n_mels: int = 64, f_min: float = 0.0, f_max: float = 11025.0, htk: bool = False, dtype: str = 'float32') -> paddle.Tensor
Compute mel frequencies.
- Parameters
n_mels (int, optional) -- Number of mel bins. Defaults to 64.
f_min (float, optional) -- Minimum frequency in Hz. Defaults to 0.0.
f_max (float, optional) -- Maximum frequency in Hz. Defaults to 11025.0.
htk (bool, optional) -- Use the HTK formula for the mel scale. Defaults to False.
dtype (str, optional) -- The data type of the return frequencies. Defaults to 'float32'.
- Returns
Tensor of n_mels frequencies in Hz with shape (n_mels,).
- Return type
Tensor
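One way to picture the result: take `n_mels` points evenly spaced on the mel axis between `f_min` and `f_max`, then convert each back to Hz. The sketch below does that with the HTK formulas only (roughly the `htk=True` case). `mel_frequencies_sketch` and its inner helpers are hypothetical names, and a plain list stands in for a tensor.

```python
import math


def mel_frequencies_sketch(n_mels=64, f_min=0.0, f_max=11025.0):
    def to_mel(freq):
        return 2595.0 * math.log10(1.0 + freq / 700.0)

    def to_hz(mel):
        return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

    lo, hi = to_mel(f_min), to_mel(f_max)
    step = (hi - lo) / (n_mels - 1)
    # Evenly spaced in mel, hence increasingly far apart in Hz
    return [to_hz(lo + i * step) for i in range(n_mels)]


mel_freqs = mel_frequencies_sketch(n_mels=64)
print(len(mel_freqs))  # 64
```

Because the spacing is uniform in mel rather than in Hz, neighbouring values are close together at low frequencies and spread out toward `f_max`.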
- paddleaudio.functional.functional.mel_to_hz(mel: Union[float, paddle.Tensor], htk: bool = False) -> Union[float, paddle.Tensor]
Convert mel bin numbers to frequencies.
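The inverse mapping can be sketched for scalar input in the same spirit, again assuming the commonly published Slaney constants for `htk=False` and the HTK formula for `htk=True`. The name `mel_to_hz_sketch` is hypothetical and this is not taken from the paddleaudio source.

```python
import math


def mel_to_hz_sketch(mel, htk=False):
    if htk:
        # Inverse of the HTK formula
        return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    # Slaney scale: linear region below 1 kHz, exponential above
    f_sp = 200.0 / 3.0
    min_log_hz = 1000.0
    min_log_mel = min_log_hz / f_sp
    logstep = math.log(6.4) / 27.0
    if mel >= min_log_mel:
        return min_log_hz * math.exp(logstep * (mel - min_log_mel))
    return f_sp * mel


print(mel_to_hz_sketch(6.6))
print(mel_to_hz_sketch(42.0))
```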
- paddleaudio.functional.functional.power_to_db(spect: paddle.Tensor, ref_value: float = 1.0, amin: float = 1e-10, top_db: Optional[float] = None) -> paddle.Tensor
Convert a power spectrogram (amplitude squared) to decibel (dB) units. The function computes the scaling 10 * log10(x / ref) in a numerically stable way.
- Parameters
spect (Tensor) -- STFT power spectrogram.
ref_value (float, optional) -- The reference value. A value smaller than 1.0 raises the dB level of the output accordingly; a value larger than 1.0 lowers it. Defaults to 1.0.
amin (float, optional) -- Minimum threshold. Defaults to 1e-10.
top_db (Optional[float], optional) -- Threshold the output at top_db below the peak. Defaults to None.
- Returns
Power spectrogram in dB scale.
- Return type
Tensor
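The computation described above can be sketched on a flat list of power values as follows. This assumes that `amin` is applied as a floor to both the input and the reference before taking the logarithm, and that `top_db` clips everything more than `top_db` below the maximum. `power_to_db_sketch` is a hypothetical name, not the library function.

```python
import math


def power_to_db_sketch(spect, ref_value=1.0, amin=1e-10, top_db=None):
    # spect: flat list of non-negative power values
    # Flooring at amin avoids log10(0)
    ref_db = 10.0 * math.log10(max(amin, ref_value))
    log_spec = [10.0 * math.log10(max(amin, p)) - ref_db for p in spect]
    if top_db is not None:
        # Clip values that fall more than top_db below the peak
        floor = max(log_spec) - top_db
        log_spec = [max(v, floor) for v in log_spec]
    return log_spec


power = [0.0, 1.0, 100.0]
print(power_to_db_sketch(power))
print(power_to_db_sketch(power, top_db=80.0))
```

In this example the zero-power entry is floored at `amin`, giving roughly -100 dB without `top_db` and roughly -60 dB (80 dB below the 20 dB peak) with `top_db=80.0`.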