
torchaudio.models

The models subpackage contains definitions of models for addressing common audio tasks.

ConvTasNet

class torchaudio.models.ConvTasNet(num_sources: int = 2, enc_kernel_size: int = 16, enc_num_feats: int = 512, msk_kernel_size: int = 3, msk_num_feats: int = 128, msk_num_hidden_feats: int = 512, msk_num_layers: int = 8, msk_num_stacks: int = 3)[source]

Conv-TasNet: a fully-convolutional time-domain audio separation network

Parameters
  • num_sources (int) – The number of sources to split.

  • enc_kernel_size (int) – The convolution kernel size of the encoder/decoder, <L>.

  • enc_num_feats (int) – The feature dimension passed to the mask generator, <N>.

  • msk_kernel_size (int) – The convolution kernel size of the mask generator, <P>.

  • msk_num_feats (int) – The input/output feature dimension of the conv blocks in the mask generator, <B, Sc>.

  • msk_num_hidden_feats (int) – The internal feature dimension of the conv blocks of the mask generator, <H>.

  • msk_num_layers (int) – The number of layers in one conv block of the mask generator, <X>.

  • msk_num_stacks (int) – The number of conv blocks of the mask generator, <R>.

Note

This implementation corresponds to the “non-causal” setting in the paper.

Reference:
Luo, Yi, and Nima Mesgarani. “Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation.” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019.

forward(input: torch.Tensor) → torch.Tensor[source]

Perform source separation. Generate audio source waveforms.

Parameters

input (torch.Tensor) – 3D Tensor with shape [batch, channel==1, frames]

Returns

3D Tensor with shape [batch, channel==num_sources, frames]

Return type

torch.Tensor
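
A minimal usage sketch; the batch size and frame count below are illustrative, and the input must be a single-channel mixture:
>>> import torch
>>> from torchaudio.models import ConvTasNet
>>> model = ConvTasNet(num_sources=2)
>>> mixture = torch.randn(3, 1, 32000)  # (batch, channel==1, frames)
>>> sources = model(mixture)            # one waveform per separated source
>>> sources.shape                       # (batch, num_sources, frames)
torch.Size([3, 2, 32000])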

Wav2Letter

class torchaudio.models.Wav2Letter(num_classes: int = 40, input_type: str = 'waveform', num_features: int = 1)[source]

Wav2Letter model architecture from “Wav2Letter: an End-to-End ConvNet-based Speech Recognition System”.

\(\text{padding} = \frac{\text{ceil}(\text{kernel} - \text{stride})}{2}\)
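
For example, the first convolution of the waveform front end in this implementation uses kernel = 250 and stride = 160, giving padding = (250 − 160) / 2 = 45.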

Parameters
  • num_classes (int, optional) – Number of classes to be classified. (Default: 40)

  • input_type (str, optional) – Type of input the network accepts: one of waveform, power_spectrum, or mfcc. (Default: waveform)

  • num_features (int, optional) – Number of input features that the network will receive (Default: 1).

forward(x: torch.Tensor) → torch.Tensor[source]
Parameters

x (torch.Tensor) – Tensor of dimension (batch_size, num_features, input_length).

Returns

Predictor tensor of dimension (batch_size, number_of_classes, input_length).

Return type

Tensor
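
A minimal usage sketch with the default waveform input; the sizes below are illustrative. With input_type set to mfcc or power_spectrum, num_features should match the feature dimension of the input instead of 1:
>>> import torch
>>> from torchaudio.models import Wav2Letter
>>> model = Wav2Letter(num_classes=40, input_type="waveform", num_features=1)
>>> x = torch.randn(4, 1, 16000)  # (batch_size, num_features, input_length)
>>> out = model(x)                # log-probabilities over classes per time step

The network ends with a log-softmax over the class dimension, so the output can be permuted to (time, batch_size, num_classes) and fed to torch.nn.CTCLoss.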

WaveRNN

class torchaudio.models.WaveRNN(upsample_scales: List[int], n_classes: int, hop_length: int, n_res_block: int = 10, n_rnn: int = 512, n_fc: int = 512, kernel_size: int = 5, n_freq: int = 128, n_hidden: int = 128, n_output: int = 128)[source]

WaveRNN model based on the implementation from the fatchord/WaveRNN repository.

The original implementation was introduced in “Efficient Neural Audio Synthesis”. The input channels of waveform and spectrogram have to be 1. The product of upsample_scales must equal hop_length.

Parameters
  • upsample_scales – the list of upsample scales.

  • n_classes – the number of output classes.

  • hop_length – the number of samples between the starts of consecutive frames.

  • n_res_block – the number of ResBlocks in the stack. (Default: 10)

  • n_rnn – the dimension of RNN layer. (Default: 512)

  • n_fc – the dimension of fully connected layer. (Default: 512)

  • kernel_size – the kernel size of the first Conv1d layer. (Default: 5)

  • n_freq – the number of bins in a spectrogram. (Default: 128)

  • n_hidden – the number of hidden dimensions of the ResBlock. (Default: 128)

  • n_output – the number of output dimensions of the MelResNet. (Default: 128)

Example
>>> wavernn = WaveRNN(upsample_scales=[5,5,8], n_classes=512, hop_length=200)
>>> waveform, sample_rate = torchaudio.load(file)
>>> # waveform shape: (n_batch, n_channel, (n_time - kernel_size + 1) * hop_length)
>>> specgram = MelSpectrogram(sample_rate)(waveform)  # shape: (n_batch, n_channel, n_freq, n_time)
>>> output = wavernn(waveform, specgram)
>>> # output shape: (n_batch, n_channel, (n_time - kernel_size + 1) * hop_length, n_classes)
forward(waveform: torch.Tensor, specgram: torch.Tensor) → torch.Tensor[source]

Pass the input through the WaveRNN model.

Parameters
  • waveform – the input waveform to the WaveRNN layer (n_batch, 1, (n_time - kernel_size + 1) * hop_length)

  • specgram – the input spectrogram to the WaveRNN layer (n_batch, 1, n_freq, n_time)

Returns

(n_batch, 1, (n_time - kernel_size + 1) * hop_length, n_classes)

Return type

Tensor
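
A concrete shape check (sizes illustrative): the waveform length must equal (n_time - kernel_size + 1) * hop_length, and the product of upsample_scales must equal hop_length.
>>> import torch
>>> from torchaudio.models import WaveRNN
>>> model = WaveRNN(upsample_scales=[5, 5, 8], n_classes=512, hop_length=200)
>>> n_time = 20
>>> specgram = torch.randn(2, 1, 128, n_time)             # (n_batch, 1, n_freq, n_time)
>>> waveform = torch.randn(2, 1, (n_time - 5 + 1) * 200)  # kernel_size=5, hop_length=200
>>> model(waveform, specgram).shape
torch.Size([2, 1, 3200, 512])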
