paddleaudio.compliance.kaldi
- paddleaudio.compliance.kaldi.fbank(waveform: paddle.Tensor, blackman_coeff: float = 0.42, channel: int = -1, dither: float = 0.0, energy_floor: float = 1.0, frame_length: float = 25.0, frame_shift: float = 10.0, high_freq: float = 0.0, htk_compat: bool = False, low_freq: float = 20.0, n_mels: int = 23, preemphasis_coefficient: float = 0.97, raw_energy: bool = True, remove_dc_offset: bool = True, round_to_power_of_two: bool = True, sr: int = 16000, snip_edges: bool = True, subtract_mean: bool = False, use_energy: bool = False, use_log_fbank: bool = True, use_power: bool = True, vtln_high: float = -500.0, vtln_low: float = 100.0, vtln_warp: float = 1.0, window_type: str = 'povey') -> paddle.Tensor
Compute and return filter banks from a waveform. The output is identical to Kaldi's.
- Parameters
waveform (Tensor) -- A waveform tensor with shape (C, T).
blackman_coeff (float, optional) -- Coefficient for the Blackman window. Defaults to 0.42.
channel (int, optional) -- Index of the waveform channel to use. Defaults to -1.
dither (float, optional) -- Dithering constant. Defaults to 0.0.
energy_floor (float, optional) -- Floor on energy of the output Spectrogram. Defaults to 1.0.
frame_length (float, optional) -- Frame length in milliseconds. Defaults to 25.0.
frame_shift (float, optional) -- Shift between adjacent frames in milliseconds. Defaults to 10.0.
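high_freq (float, optional) -- The upper cut-off frequency in Hz. Under Kaldi's convention, a value at or below 0 is treated as an offset from the Nyquist frequency. Defaults to 0.0.
htk_compat (bool, optional) -- If True, put the energy last in the output. Defaults to False.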
low_freq (float, optional) -- The lower cut-off frequency. Defaults to 20.0.
n_mels (int, optional) -- Number of output mel bins. Defaults to 23.
preemphasis_coefficient (float, optional) -- Preemphasis coefficient for input waveform. Defaults to 0.97.
raw_energy (bool, optional) -- Whether to compute the energy before preemphasis and windowing. Defaults to True.
remove_dc_offset (bool, optional) -- Whether to subtract the mean from the waveform on each frame. Defaults to True.
round_to_power_of_two (bool, optional) -- If True, round window size to power of two by zero-padding input to FFT. Defaults to True.
sr (int, optional) -- Sample rate of input waveform. Defaults to 16000.
snip_edges (bool, optional) -- If True, drop the samples at the end of the waveform that cannot fill a single frame. Otherwise, apply reflect padding to the end of the waveform. Defaults to True.
subtract_mean (bool, optional) -- Whether to subtract the mean of each feature file. Defaults to False.
use_energy (bool, optional) -- Add a dimension with the energy of the spectrogram to the output. Defaults to False.
use_log_fbank (bool, optional) -- Return log fbank when it is set True. Defaults to True.
use_power (bool, optional) -- Whether to use power instead of magnitude. Defaults to True.
vtln_high (float, optional) -- High inflection point in piecewise linear VTLN warping function. Defaults to -500.0.
vtln_low (float, optional) -- Low inflection point in piecewise linear VTLN warping function. Defaults to 100.0.
vtln_warp (float, optional) -- VTLN warp factor. Defaults to 1.0.
window_type (str, optional) -- Type of window to use for the FFT computation. Defaults to 'povey'.
- Returns
A filter bank tensor with shape (m, n_mels), where m is the number of frames.
- Return type
Tensor
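The frequency parameters above can be made concrete with a small sketch. It assumes Kaldi's conventions, which this function is documented to match: the mel scale 1127 * ln(1 + f / 700), a high_freq at or below zero read as an offset from the Nyquist frequency, and bin centers spaced evenly on the mel scale. The helper names (hz_to_mel, mel_to_hz, resolve_high_freq, mel_bin_centers_hz) are hypothetical and are not part of paddleaudio.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Kaldi-style mel scale (assumed to be what fbank uses internally)."""
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def resolve_high_freq(high_freq: float, sr: int) -> float:
    """Assumption: a value <= 0 is an offset from the Nyquist frequency."""
    nyquist = 0.5 * sr
    return high_freq + nyquist if high_freq <= 0.0 else high_freq


def mel_bin_centers_hz(n_mels: int = 23, low_freq: float = 20.0,
                       high_freq: float = 0.0, sr: int = 16000) -> list:
    """Center frequencies (Hz) of n_mels bins spaced evenly on the mel scale."""
    mel_low = hz_to_mel(low_freq)
    mel_high = hz_to_mel(resolve_high_freq(high_freq, sr))
    delta = (mel_high - mel_low) / (n_mels + 1)
    return [mel_to_hz(mel_low + (i + 1) * delta) for i in range(n_mels)]


centers = mel_bin_centers_hz()
print(len(centers))                     # 23
print(resolve_high_freq(0.0, 16000))    # 8000.0
```

With the documented defaults (sr=16000, low_freq=20.0, high_freq=0.0), this reading places the 23 bins between 20 Hz and 8000 Hz.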
- paddleaudio.compliance.kaldi.mfcc(waveform: paddle.Tensor, blackman_coeff: float = 0.42, cepstral_lifter: float = 22.0, channel: int = -1, dither: float = 0.0, energy_floor: float = 1.0, frame_length: float = 25.0, frame_shift: float = 10.0, high_freq: float = 0.0, htk_compat: bool = False, low_freq: float = 20.0, n_mfcc: int = 13, n_mels: int = 23, preemphasis_coefficient: float = 0.97, raw_energy: bool = True, remove_dc_offset: bool = True, round_to_power_of_two: bool = True, sr: int = 16000, snip_edges: bool = True, subtract_mean: bool = False, use_energy: bool = False, vtln_high: float = -500.0, vtln_low: float = 100.0, vtln_warp: float = 1.0, window_type: str = 'povey') -> paddle.Tensor
Compute and return mel frequency cepstral coefficients from a waveform. The output is identical to Kaldi's.
- Parameters
waveform (Tensor) -- A waveform tensor with shape (C, T).
blackman_coeff (float, optional) -- Coefficient for the Blackman window. Defaults to 0.42.
cepstral_lifter (float, optional) -- Liftering constant that controls the scaling of the output MFCCs. Defaults to 22.0.
channel (int, optional) -- Index of the waveform channel to use. Defaults to -1.
dither (float, optional) -- Dithering constant. Defaults to 0.0.
energy_floor (float, optional) -- Floor on energy of the output Spectrogram. Defaults to 1.0.
frame_length (float, optional) -- Frame length in milliseconds. Defaults to 25.0.
frame_shift (float, optional) -- Shift between adjacent frames in milliseconds. Defaults to 10.0.
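high_freq (float, optional) -- The upper cut-off frequency in Hz. Under Kaldi's convention, a value at or below 0 is treated as an offset from the Nyquist frequency. Defaults to 0.0.
htk_compat (bool, optional) -- If True, put the energy last in the output. Defaults to False.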
low_freq (float, optional) -- The lower cut-off frequency. Defaults to 20.0.
n_mfcc (int, optional) -- Number of cepstra in MFCC. Defaults to 13.
n_mels (int, optional) -- Number of output mel bins. Defaults to 23.
preemphasis_coefficient (float, optional) -- Preemphasis coefficient for input waveform. Defaults to 0.97.
raw_energy (bool, optional) -- Whether to compute the energy before preemphasis and windowing. Defaults to True.
remove_dc_offset (bool, optional) -- Whether to subtract the mean from the waveform on each frame. Defaults to True.
round_to_power_of_two (bool, optional) -- If True, round window size to power of two by zero-padding input to FFT. Defaults to True.
sr (int, optional) -- Sample rate of input waveform. Defaults to 16000.
snip_edges (bool, optional) -- If True, drop the samples at the end of the waveform that cannot fill a single frame. Otherwise, apply reflect padding to the end of the waveform. Defaults to True.
subtract_mean (bool, optional) -- Whether to subtract the mean of each feature file. Defaults to False.
use_energy (bool, optional) -- Add a dimension with the energy of the spectrogram to the output. Defaults to False.
vtln_high (float, optional) -- High inflection point in piecewise linear VTLN warping function. Defaults to -500.0.
vtln_low (float, optional) -- Low inflection point in piecewise linear VTLN warping function. Defaults to 100.0.
vtln_warp (float, optional) -- VTLN warp factor. Defaults to 1.0.
window_type (str, optional) -- Type of window to use for the FFT computation. Defaults to 'povey'.
- Returns
A mel frequency cepstral coefficients tensor with shape (m, n_mfcc), where m is the number of frames.
- Return type
Tensor
- paddleaudio.compliance.kaldi.spectrogram(waveform: paddle.Tensor, blackman_coeff: float = 0.42, channel: int = -1, dither: float = 0.0, energy_floor: float = 1.0, frame_length: float = 25.0, frame_shift: float = 10.0, preemphasis_coefficient: float = 0.97, raw_energy: bool = True, remove_dc_offset: bool = True, round_to_power_of_two: bool = True, sr: int = 16000, snip_edges: bool = True, subtract_mean: bool = False, window_type: str = 'povey') -> paddle.Tensor
Compute and return a spectrogram from a waveform. The output is identical to Kaldi's.
- Parameters
waveform (Tensor) -- A waveform tensor with shape (C, T).
blackman_coeff (float, optional) -- Coefficient for the Blackman window. Defaults to 0.42.
channel (int, optional) -- Index of the waveform channel to use. Defaults to -1.
dither (float, optional) -- Dithering constant. Defaults to 0.0.
energy_floor (float, optional) -- Floor on energy of the output Spectrogram. Defaults to 1.0.
frame_length (float, optional) -- Frame length in milliseconds. Defaults to 25.0.
frame_shift (float, optional) -- Shift between adjacent frames in milliseconds. Defaults to 10.0.
preemphasis_coefficient (float, optional) -- Preemphasis coefficient for input waveform. Defaults to 0.97.
raw_energy (bool, optional) -- Whether to compute the energy before preemphasis and windowing. Defaults to True.
remove_dc_offset (bool, optional) -- Whether to subtract the mean from the waveform on each frame. Defaults to True.
round_to_power_of_two (bool, optional) -- If True, round window size to power of two by zero-padding input to FFT. Defaults to True.
sr (int, optional) -- Sample rate of input waveform. Defaults to 16000.
snip_edges (bool, optional) -- If True, drop the samples at the end of the waveform that cannot fill a single frame. Otherwise, apply reflect padding to the end of the waveform. Defaults to True.
subtract_mean (bool, optional) -- Whether to subtract the mean of each feature file. Defaults to False.
window_type (str, optional) -- Type of window to use for the FFT computation. Defaults to 'povey'.
- Returns
A spectrogram tensor with shape (m, padded_window_size // 2 + 1), where m is the number of frames, which depends on frame_length and frame_shift.
- Return type
Tensor
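The return shape can be estimated ahead of time from the framing parameters. The sketch below assumes Kaldi's frame-counting rule (with snip_edges=True only complete frames are kept; otherwise the count is the number of samples divided by the shift, rounded to nearest) and assumes that padded_window_size is the window length in samples rounded up to the next power of two when round_to_power_of_two is True. The function names are hypothetical and are not part of paddleaudio.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (1 for n == 0)."""
    return 1 if n == 0 else 2 ** (n - 1).bit_length()


def spectrogram_shape(num_samples: int, sr: int = 16000,
                      frame_length: float = 25.0, frame_shift: float = 10.0,
                      snip_edges: bool = True,
                      round_to_power_of_two: bool = True) -> tuple:
    """Estimate (m, padded_window_size // 2 + 1) for a single channel."""
    # frame_length and frame_shift are given in milliseconds.
    window_size = int(sr * frame_length / 1000)
    window_shift = int(sr * frame_shift / 1000)
    padded = next_power_of_two(window_size) if round_to_power_of_two else window_size

    if snip_edges:
        # Only frames that fit entirely inside the waveform are kept.
        if num_samples < window_size:
            m = 0
        else:
            m = 1 + (num_samples - window_size) // window_shift
    else:
        # Kaldi's rule when the end of the waveform is padded.
        m = (num_samples + window_shift // 2) // window_shift
    return m, padded // 2 + 1


print(spectrogram_shape(16000))                    # (98, 257)
print(spectrogram_shape(16000, snip_edges=False))  # (100, 257)
```

For one second of 16 kHz audio with the defaults, the window is 400 samples (padded to 512) and the shift is 160 samples, which gives 98 frames of 257 frequency bins under these assumptions.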