paddleaudio.compliance.kaldi

paddleaudio.compliance.kaldi.fbank(waveform: paddle.Tensor, blackman_coeff: float = 0.42, channel: int = -1, dither: float = 0.0, energy_floor: float = 1.0, frame_length: float = 25.0, frame_shift: float = 10.0, high_freq: float = 0.0, htk_compat: bool = False, low_freq: float = 20.0, n_mels: int = 23, preemphasis_coefficient: float = 0.97, raw_energy: bool = True, remove_dc_offset: bool = True, round_to_power_of_two: bool = True, sr: int = 16000, snip_edges: bool = True, subtract_mean: bool = False, use_energy: bool = False, use_log_fbank: bool = True, use_power: bool = True, vtln_high: float = -500.0, vtln_low: float = 100.0, vtln_warp: float = 1.0, window_type: str = 'povey') -> paddle.Tensor

Compute and return filter banks from a waveform. The output is identical to Kaldi's.

Parameters
  • waveform (Tensor) -- A waveform tensor with shape (C, T).

  • blackman_coeff (float, optional) -- Coefficient for the Blackman window. Defaults to 0.42.

  • channel (int, optional) -- Select the channel of waveform. Defaults to -1.

  • dither (float, optional) -- Dithering constant. Defaults to 0.0.

  • energy_floor (float, optional) -- Floor on energy of the output Spectrogram. Defaults to 1.0.

  • frame_length (float, optional) -- Frame length in milliseconds. Defaults to 25.0.

  • frame_shift (float, optional) -- Shift between adjacent frames in milliseconds. Defaults to 10.0.

  • high_freq (float, optional) -- The upper cut-off frequency. Defaults to 0.0.

  • htk_compat (bool, optional) -- If True, put the energy term last. Defaults to False.

  • low_freq (float, optional) -- The lower cut-off frequency. Defaults to 20.0.

  • n_mels (int, optional) -- Number of output mel bins. Defaults to 23.

  • preemphasis_coefficient (float, optional) -- Preemphasis coefficient for input waveform. Defaults to 0.97.

  • raw_energy (bool, optional) -- Whether to compute energy before preemphasis and windowing. Defaults to True.

  • remove_dc_offset (bool, optional) -- Whether to subtract the mean from the waveform on each frame. Defaults to True.

  • round_to_power_of_two (bool, optional) -- If True, round window size to power of two by zero-padding input to FFT. Defaults to True.

  • sr (int, optional) -- Sample rate of input waveform. Defaults to 16000.

  • snip_edges (bool, optional) -- If True, drop samples at the end of the waveform that cannot fill a complete frame. Otherwise, apply reflect padding to the end of the waveform. Defaults to True.

  • subtract_mean (bool, optional) -- Whether to subtract mean of feature files. Defaults to False.

  • use_energy (bool, optional) -- Add a dimension with the energy of the spectrogram to the output. Defaults to False.

  • use_log_fbank (bool, optional) -- If True, return log filter banks. Defaults to True.

  • use_power (bool, optional) -- Whether to use power instead of magnitude. Defaults to True.

  • vtln_high (float, optional) -- High inflection point in piecewise linear VTLN warping function. Defaults to -500.0.

  • vtln_low (float, optional) -- Low inflection point in piecewise linear VTLN warping function. Defaults to 100.0.

  • vtln_warp (float, optional) -- VTLN warp factor. Defaults to 1.0.

  • window_type (str, optional) -- Type of window used for FFT computation. Defaults to 'povey'.

Returns

A filter bank tensor with shape (m, n_mels), where m is the number of frames.

Return type

Tensor
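
The interaction of low_freq, high_freq and n_mels can be illustrated with a small standalone sketch. It is not the library's implementation: it assumes Kaldi's conventions, namely the mel scale 1127 * ln(1 + f / 700) and a high_freq value of 0.0 or below being read as an offset from the Nyquist frequency. The helper names hz_to_mel, mel_to_hz and mel_bin_centers are hypothetical.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Mel scale in the form Kaldi uses: 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def mel_bin_centers(sr: int = 16000, n_mels: int = 23,
                    low_freq: float = 20.0, high_freq: float = 0.0) -> list:
    """Center frequencies (Hz) of n_mels triangular filters (hypothetical helper)."""
    nyquist = 0.5 * sr
    # Assumption (Kaldi convention): high_freq <= 0 is an offset from Nyquist,
    # so the default 0.0 means "up to Nyquist".
    if high_freq <= 0.0:
        high_freq += nyquist
    mel_low = hz_to_mel(low_freq)
    mel_high = hz_to_mel(high_freq)
    # n_mels triangles share n_mels + 2 edge points equally spaced in mel.
    step = (mel_high - mel_low) / (n_mels + 1)
    return [mel_to_hz(mel_low + (i + 1) * step) for i in range(n_mels)]


centers = mel_bin_centers()
print(len(centers), round(centers[0], 1), round(centers[-1], 1))
```

Under these assumptions the centers are equally spaced on the mel scale, so they crowd together at low frequencies and spread out toward the upper cut-off.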

paddleaudio.compliance.kaldi.mfcc(waveform: paddle.Tensor, blackman_coeff: float = 0.42, cepstral_lifter: float = 22.0, channel: int = -1, dither: float = 0.0, energy_floor: float = 1.0, frame_length: float = 25.0, frame_shift: float = 10.0, high_freq: float = 0.0, htk_compat: bool = False, low_freq: float = 20.0, n_mfcc: int = 13, n_mels: int = 23, preemphasis_coefficient: float = 0.97, raw_energy: bool = True, remove_dc_offset: bool = True, round_to_power_of_two: bool = True, sr: int = 16000, snip_edges: bool = True, subtract_mean: bool = False, use_energy: bool = False, vtln_high: float = -500.0, vtln_low: float = 100.0, vtln_warp: float = 1.0, window_type: str = 'povey') -> paddle.Tensor
Compute and return mel frequency cepstral coefficients from a waveform. The output is identical to Kaldi's.

Parameters
  • waveform (Tensor) -- A waveform tensor with shape (C, T).

  • blackman_coeff (float, optional) -- Coefficient for the Blackman window. Defaults to 0.42.

  • cepstral_lifter (float, optional) -- Scaling of the output MFCCs. Defaults to 22.0.

  • channel (int, optional) -- Select the channel of waveform. Defaults to -1.

  • dither (float, optional) -- Dithering constant. Defaults to 0.0.

  • energy_floor (float, optional) -- Floor on energy of the output Spectrogram. Defaults to 1.0.

  • frame_length (float, optional) -- Frame length in milliseconds. Defaults to 25.0.

  • frame_shift (float, optional) -- Shift between adjacent frames in milliseconds. Defaults to 10.0.

  • high_freq (float, optional) -- The upper cut-off frequency. Defaults to 0.0.

  • htk_compat (bool, optional) -- If True, put the energy term last. Defaults to False.

  • low_freq (float, optional) -- The lower cut-off frequency. Defaults to 20.0.

  • n_mfcc (int, optional) -- Number of cepstra in MFCC. Defaults to 13.

  • n_mels (int, optional) -- Number of output mel bins. Defaults to 23.

  • preemphasis_coefficient (float, optional) -- Preemphasis coefficient for input waveform. Defaults to 0.97.

  • raw_energy (bool, optional) -- Whether to compute energy before preemphasis and windowing. Defaults to True.

  • remove_dc_offset (bool, optional) -- Whether to subtract the mean from the waveform on each frame. Defaults to True.

  • round_to_power_of_two (bool, optional) -- If True, round window size to power of two by zero-padding input to FFT. Defaults to True.

  • sr (int, optional) -- Sample rate of input waveform. Defaults to 16000.

  • snip_edges (bool, optional) -- If True, drop samples at the end of the waveform that cannot fill a complete frame. Otherwise, apply reflect padding to the end of the waveform. Defaults to True.

  • subtract_mean (bool, optional) -- Whether to subtract mean of feature files. Defaults to False.

  • use_energy (bool, optional) -- Add a dimension with the energy of the spectrogram to the output. Defaults to False.

  • vtln_high (float, optional) -- High inflection point in piecewise linear VTLN warping function. Defaults to -500.0.

  • vtln_low (float, optional) -- Low inflection point in piecewise linear VTLN warping function. Defaults to 100.0.

  • vtln_warp (float, optional) -- VTLN warp factor. Defaults to 1.0.

  • window_type (str, optional) -- Type of window used for FFT computation. Defaults to 'povey'.

Returns

A mel frequency cepstral coefficients tensor with shape (m, n_mfcc), where m is the number of frames.

Return type

Tensor
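
To give a feel for what cepstral_lifter scales, the sketch below evaluates Kaldi's liftering weights, 1 + (Q / 2) * sin(pi * i / Q), for the default n_mfcc and cepstral_lifter. That this exact formula is applied here is an assumption based on the stated Kaldi compatibility, and lifter_coeffs is a hypothetical helper, not part of paddleaudio.

```python
import math


def lifter_coeffs(n_mfcc: int = 13, cepstral_lifter: float = 22.0) -> list:
    """Per-coefficient liftering weights in Kaldi's form (hypothetical helper).

    Weight for cepstral index i: 1 + (Q / 2) * sin(pi * i / Q), Q = cepstral_lifter.
    """
    q = cepstral_lifter
    return [1.0 + 0.5 * q * math.sin(math.pi * i / q) for i in range(n_mfcc)]


coeffs = lifter_coeffs()
for i, weight in enumerate(coeffs):
    print(i, round(weight, 3))
```

With these defaults the weight for index 0 is 1.0 and the weights grow with the index, so higher-order cepstra are boosted relative to the first coefficient.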

paddleaudio.compliance.kaldi.spectrogram(waveform: paddle.Tensor, blackman_coeff: float = 0.42, channel: int = -1, dither: float = 0.0, energy_floor: float = 1.0, frame_length: float = 25.0, frame_shift: float = 10.0, preemphasis_coefficient: float = 0.97, raw_energy: bool = True, remove_dc_offset: bool = True, round_to_power_of_two: bool = True, sr: int = 16000, snip_edges: bool = True, subtract_mean: bool = False, window_type: str = 'povey') -> paddle.Tensor

Compute and return a spectrogram from a waveform. The output is identical to Kaldi's.

Parameters
  • waveform (Tensor) -- A waveform tensor with shape (C, T).

  • blackman_coeff (float, optional) -- Coefficient for the Blackman window. Defaults to 0.42.

  • channel (int, optional) -- Select the channel of waveform. Defaults to -1.

  • dither (float, optional) -- Dithering constant. Defaults to 0.0.

  • energy_floor (float, optional) -- Floor on energy of the output Spectrogram. Defaults to 1.0.

  • frame_length (float, optional) -- Frame length in milliseconds. Defaults to 25.0.

  • frame_shift (float, optional) -- Shift between adjacent frames in milliseconds. Defaults to 10.0.

  • preemphasis_coefficient (float, optional) -- Preemphasis coefficient for input waveform. Defaults to 0.97.

  • raw_energy (bool, optional) -- Whether to compute energy before preemphasis and windowing. Defaults to True.

  • remove_dc_offset (bool, optional) -- Whether to subtract the mean from the waveform on each frame. Defaults to True.

  • round_to_power_of_two (bool, optional) -- If True, round window size to power of two by zero-padding input to FFT. Defaults to True.

  • sr (int, optional) -- Sample rate of input waveform. Defaults to 16000.

  • snip_edges (bool, optional) -- If True, drop samples at the end of the waveform that cannot fill a complete frame. Otherwise, apply reflect padding to the end of the waveform. Defaults to True.

  • subtract_mean (bool, optional) -- Whether to subtract mean of feature files. Defaults to False.

  • window_type (str, optional) -- Type of window used for FFT computation. Defaults to 'povey'.

Returns

A spectrogram tensor with shape (m, padded_window_size // 2 + 1), where m is the number of frames, which depends on frame_length and frame_shift.

Return type

Tensor
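
The output shape described above can be estimated before calling the function. The following is a minimal sketch, assuming Kaldi's framing rules: frame sizes are truncated to whole samples, snip_edges=True keeps only frames that fit entirely in the waveform, and snip_edges=False yields roughly one frame per shift. The helper spectrogram_shape is hypothetical and not part of paddleaudio.

```python
def spectrogram_shape(num_samples: int, sr: int = 16000,
                      frame_length: float = 25.0, frame_shift: float = 10.0,
                      round_to_power_of_two: bool = True,
                      snip_edges: bool = True) -> tuple:
    """Estimate (m, padded_window_size // 2 + 1) for a waveform of num_samples."""
    # Frame length and shift are given in milliseconds; convert to samples.
    window_size = int(sr * frame_length / 1000)
    window_shift = int(sr * frame_shift / 1000)

    # Optionally zero-pad each frame up to the next power of two for the FFT.
    padded_window_size = window_size
    if round_to_power_of_two:
        padded_window_size = 1 << (window_size - 1).bit_length()

    if snip_edges:
        # Assumed Kaldi rule: only frames that fit completely are kept.
        if num_samples < window_size:
            m = 0
        else:
            m = 1 + (num_samples - window_size) // window_shift
    else:
        # Assumed Kaldi rule: frame count is num_samples / shift, rounded.
        m = (num_samples + window_shift // 2) // window_shift

    return m, padded_window_size // 2 + 1


# One second of 16 kHz audio with the default framing parameters.
print(spectrogram_shape(16000))                    # (98, 257)
print(spectrogram_shape(16000, snip_edges=False))  # (100, 257)
```

With the defaults, a 25 ms window is 400 samples and is padded to 512 for the FFT, which gives 257 frequency bins per frame.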