torchaudio.models

The torchaudio.models subpackage contains definitions of models for addressing common audio tasks.

Note

For models with pre-trained parameters, please refer to the torchaudio.pipelines module.

Model definitions are responsible for constructing computation graphs and executing them.

Some models have a complex structure and come in several variations. For such models, factory functions are provided.

Conformer

Conformer architecture introduced in Conformer: Convolution-augmented Transformer for Speech Recognition [Gulati et al., 2020].

ConvTasNet

Conv-TasNet architecture introduced in Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation [Luo and Mesgarani, 2019].

DeepSpeech

DeepSpeech architecture introduced in Deep Speech: Scaling up end-to-end speech recognition [Hannun et al., 2014].

Emformer

Emformer architecture introduced in Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition [Shi et al., 2021].

HDemucs

Hybrid Demucs model from Hybrid Spectrogram and Waveform Source Separation [Défossez, 2021].

HuBERTPretrainModel

HuBERT model used for pretraining in HuBERT [Hsu et al., 2021].

RNNT

Recurrent neural network transducer (RNN-T) model.

RNNTBeamSearch

Beam search decoder for RNN-T model.

SquimObjective

Speech Quality and Intelligibility Measures (SQUIM) model that predicts objective metric scores for speech enhancement (e.g., STOI, PESQ, and SI-SDR).

SquimSubjective

Speech Quality and Intelligibility Measures (SQUIM) model that predicts subjective metric scores for speech enhancement (e.g., Mean Opinion Score (MOS)).

Tacotron2

Tacotron2 model from Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions [Shen et al., 2018] based on the implementation from Nvidia Deep Learning Examples.

Wav2Letter

Wav2Letter model architecture from Wav2Letter: an End-to-End ConvNet-based Speech Recognition System [Collobert et al., 2016].

Wav2Vec2Model

Acoustic model used in wav2vec 2.0 [Baevski et al., 2020].

WaveRNN

WaveRNN model from Efficient Neural Audio Synthesis [Kalchbrenner et al., 2018] based on the implementation from fatchord/WaveRNN.
