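torchaudio.datasets

All datasets are subclasses of torch.utils.data.Dataset and implement the __getitem__ and __len__ methods. Hence, they can all be passed to a torch.utils.data.DataLoader, which can load multiple samples in parallel using torch.multiprocessing workers. For example: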

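import torch
import torchaudio

# Download YESNO into the current directory if it is not already there.
yesno_data = torchaudio.datasets.YESNO('.', download=True)
data_loader = torch.utils.data.DataLoader(yesno_data,
                                          batch_size=1,
                                          shuffle=True,
                                          num_workers=2)  # number of worker processes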
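
To illustrate what implementing __getitem__ and __len__ means in practice, here is a minimal sketch of a map-style dataset in plain Python. ToySamples and its contents are hypothetical stand-ins, not part of torchaudio; a real dataset would return a waveform tensor rather than a list of floats.

```python
class ToySamples:
    """Hypothetical map-style dataset: index in, sample tuple out."""

    def __init__(self, samples):
        self._samples = list(samples)

    def __getitem__(self, n):
        # Return the n-th sample, mirroring a (waveform, sample_rate, labels) layout.
        return self._samples[n]

    def __len__(self):
        # Number of samples; a loader uses this to know the valid index range.
        return len(self._samples)


toy = ToySamples([
    ([0.0, 0.1, -0.1], 8000, [0, 1]),
    ([0.2, 0.0, 0.3], 8000, [1, 1]),
])

# Any object with these two methods can be sized and indexed like the real datasets.
for i in range(len(toy)):
    waveform, sample_rate, labels = toy[i]
    print(i, sample_rate, labels)
```

Every dataset documented on this page supports the same two operations, so the same indexing loop applies to any of them.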

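CMUARCTIC

class torchaudio.datasets.CMUARCTIC(root: Union[str, pathlib.Path], url: str = 'aew', folder_in_archive: str = 'ARCTIC', download: bool = False)

Create a Dataset for CMU_ARCTIC.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. (default: "aew") Allowed type values are "aew", "ahw", "aup", "awb", "axb", "bdl", "clb", "eey", "fem", "gka", "jmk", "ksp", "ljm", "lnh", "rms", "rxr", "slp" or "slt".
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "ARCTIC")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)

__getitem__(n: int) → Tuple[torch.Tensor, int, str, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, transcript, utterance_id)
Return type: (Tensor, int, str, str)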

CMUDict

class torchaudio.datasets.CMUDict(root: Union[str, pathlib.Path], exclude_punctuations: bool = True, *, download: bool = False, url: str = 'http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b', url_symbols: str = 'http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b.symbols')

Create a Dataset for the CMU Pronouncing Dictionary (CMUDict).

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
exclude_punctuations (bool, optional) – When enabled, exclude the pronunciation of punctuation entries, such as !EXCLAMATION-POINT and #HASH-MARK.
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)
url (str, optional) – The URL to download the dictionary from. (default: "http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b")
url_symbols (str, optional) – The URL to download the list of symbols from. (default: "http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b.symbols")

__getitem__(n: int) → Tuple[str, List[str]]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded.
Returns: The corresponding word and phonemes, as (word, [phonemes]).
Return type: (str, List[str])

property symbols

A list of the phoneme symbols, such as AA, AE, AH.

Type: list[str]
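
As a rough sketch of how the (word, [phonemes]) pairs might be consumed, the snippet below builds a word-to-phonemes lookup. The entries list is a hand-written stand-in for iterating over a CMUDict instance, and build_lookup is a hypothetical helper, not a torchaudio function.

```python
from typing import Dict, Iterable, List, Tuple


def build_lookup(entries: Iterable[Tuple[str, List[str]]]) -> Dict[str, List[str]]:
    """Map each word to its phoneme list (hypothetical helper)."""
    lookup: Dict[str, List[str]] = {}
    for word, phonemes in entries:
        # Keep the first pronunciation seen for a word.
        lookup.setdefault(word, phonemes)
    return lookup


# Stand-in for samples shaped like CMUDict's (word, [phonemes]) items.
entries = [
    ("HELLO", ["HH", "AH0", "L", "OW1"]),
    ("WORLD", ["W", "ER1", "L", "D"]),
]

lookup = build_lookup(entries)
print(lookup["HELLO"])
```

With a real dataset, the same helper could be fed the items obtained by indexing from 0 to len(dataset) - 1.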

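COMMONVOICE

class torchaudio.datasets.COMMONVOICE(root: Union[str, pathlib.Path], tsv: str = 'train.tsv')

Create a Dataset for CommonVoice.

Parameters

root (str or Path) – Path to the directory where the dataset is located (where the tsv file is present).
tsv (str, optional) – The name of the tsv file used to construct the metadata, such as "train.tsv", "test.tsv", "dev.tsv", "invalidated.tsv", "validated.tsv" and "other.tsv". (default: "train.tsv")

__getitem__(n: int) → Tuple[torch.Tensor, int, Dict[str, str]]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, dictionary), where dictionary is built from the TSV file with the following keys: client_id, path, sentence, up_votes, down_votes, age, gender and accent.
Return type: (Tensor, int, Dict[str, str])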
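
The dictionary element of each sample can be pictured as one row of the TSV file keyed by its header. The sketch below is an assumption about the general shape, not the library's implementation: it parses a small in-memory TSV with made-up values using the standard csv module.

```python
import csv
import io

# Made-up two-row TSV using the column names listed above.
tsv_text = (
    "client_id\tpath\tsentence\tup_votes\tdown_votes\tage\tgender\taccent\n"
    "abc123\tclip_0001.mp3\tHello there.\t2\t0\ttwenties\tfemale\tus\n"
    "def456\tclip_0002.mp3\tGood morning.\t3\t1\t\t\t\n"
)

reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
rows = list(reader)

# Every value is a string, matching the Dict[str, str] return type.
first = rows[0]
print(first["path"], first["sentence"], first["up_votes"])
```

Numeric-looking fields such as up_votes stay strings in this sketch, so convert them explicitly (for example with int()) before doing arithmetic.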

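GTZAN

class torchaudio.datasets.GTZAN(root: Union[str, pathlib.Path], url: str = 'http://opihi.cs.uvic.ca/sound/genres.tar.gz', folder_in_archive: str = 'genres', download: bool = False, subset: Optional[str] = None)

Create a Dataset for GTZAN.

Note

Please see http://marsyas.info/downloads/datasets.html if you are planning to publish results using this dataset.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from. (default: "http://opihi.cs.uvic.ca/sound/genres.tar.gz")
folder_in_archive (str, optional) – The top-level directory of the dataset.
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)
subset (str or None, optional) – Which subset of the dataset to use. One of "training", "validation", "testing" or None. If None, the entire dataset is used. (default: None)

__getitem__(n: int) → Tuple[torch.Tensor, int, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, label)
Return type: (Tensor, int, str)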

LIBRISPEECH

class torchaudio.datasets.LIBRISPEECH(root: Union[str, pathlib.Path], url: str = 'train-clean-100', folder_in_archive: str = 'LibriSpeech', download: bool = False)

Create a Dataset for LibriSpeech.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "dev-clean", "dev-other", "test-clean", "test-other", "train-clean-100", "train-clean-360" and "train-other-500". (default: "train-clean-100")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "LibriSpeech")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)

__getitem__(n: int) → Tuple[torch.Tensor, int, str, int, int, int]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id)
Return type: (Tensor, int, str, int, int, int)
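
Since each sample carries speaker_id, chapter_id and utterance_id alongside the audio, a common first step is to group utterances by speaker. The sketch below uses made-up tuples in the documented layout, with None in the waveform slot so that it runs without torch; the IDs and transcripts are illustrative only.

```python
from collections import defaultdict

# Stand-in samples: (waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id).
# The waveform slot is None here to keep the sketch free of torch.
samples = [
    (None, 16000, "FIRST LINE", 84, 121123, 0),
    (None, 16000, "SECOND LINE", 84, 121123, 1),
    (None, 16000, "ANOTHER SPEAKER", 174, 50561, 0),
]

by_speaker = defaultdict(list)
for _waveform, _sample_rate, transcript, speaker_id, chapter_id, utterance_id in samples:
    # Collect the identifying fields and transcript under each speaker.
    by_speaker[speaker_id].append((chapter_id, utterance_id, transcript))

for speaker_id, utterances in sorted(by_speaker.items()):
    print(speaker_id, len(utterances))
```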

LIBRITTS

class torchaudio.datasets.LIBRITTS(root: Union[str, pathlib.Path], url: str = 'train-clean-100', folder_in_archive: str = 'LibriTTS', download: bool = False)

Create a Dataset for LibriTTS.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "dev-clean", "dev-other", "test-clean", "test-other", "train-clean-100", "train-clean-360" and "train-other-500". (default: "train-clean-100")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "LibriTTS")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)

__getitem__(n: int) → Tuple[torch.Tensor, int, str, str, int, int, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, original_text, normalized_text, speaker_id, chapter_id, utterance_id)
Return type: (Tensor, int, str, str, int, int, str)

LJSPEECH

class torchaudio.datasets.LJSPEECH(root: Union[str, pathlib.Path], url: str = 'https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2', folder_in_archive: str = 'wavs', download: bool = False)

Create a Dataset for LJSpeech-1.1.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from. (default: "https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "wavs")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)

__getitem__(n: int) → Tuple[torch.Tensor, int, str, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, transcript, normalized_transcript)
Return type: (Tensor, int, str, str)

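SPEECHCOMMANDS

class torchaudio.datasets.SPEECHCOMMANDS(root: Union[str, pathlib.Path], url: str = 'speech_commands_v0.02', folder_in_archive: str = 'SpeechCommands', download: bool = False, subset: Optional[str] = None)

Create a Dataset for Speech Commands.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "speech_commands_v0.01" and "speech_commands_v0.02". (default: "speech_commands_v0.02")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "SpeechCommands")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)
subset (str or None, optional) – Select a subset of the dataset: None, "training", "validation" or "testing". None means the whole dataset. "validation" and "testing" are defined in "validation_list.txt" and "testing_list.txt", respectively, and "training" is the rest. Details of the files "validation_list.txt" and "testing_list.txt" are explained in the README of the dataset, and in the introduction of Section 7 of the original paper and its reference 12. The original paper can be found here. (default: None)

__getitem__(n: int) → Tuple[torch.Tensor, int, str, str, int]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, label, speaker_id, utterance_number)
Return type: (Tensor, int, str, str, int)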
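
The subset rule described for this dataset (validation and testing come from the two list files, training is everything else) can be sketched as follows. The file paths and the split_subsets helper are hypothetical; the real dataset reads validation_list.txt and testing_list.txt from disk, and its internal logic may differ.

```python
from typing import Dict, Iterable, List


def split_subsets(all_files: Iterable[str],
                  validation_list: Iterable[str],
                  testing_list: Iterable[str]) -> Dict[str, List[str]]:
    """Assign every file to exactly one subset (hypothetical helper)."""
    validation = set(validation_list)
    testing = set(testing_list)
    subsets: Dict[str, List[str]] = {"training": [], "validation": [], "testing": []}
    for path in all_files:
        if path in validation:
            subsets["validation"].append(path)
        elif path in testing:
            subsets["testing"].append(path)
        else:
            # Anything not named in either list file is training data.
            subsets["training"].append(path)
    return subsets


# Made-up relative paths in a "label/file.wav" layout.
all_files = [
    "yes/a_nohash_0.wav",
    "yes/b_nohash_0.wav",
    "no/c_nohash_0.wav",
    "no/d_nohash_1.wav",
]
subsets = split_subsets(
    all_files,
    validation_list=["yes/b_nohash_0.wav"],
    testing_list=["no/c_nohash_0.wav"],
)
print(subsets["training"])
```

Defining training as the remainder means no separate training list has to be shipped or kept in sync with the other two.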

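TEDLIUM

class torchaudio.datasets.TEDLIUM(root: Union[str, pathlib.Path], release: str = 'release1', subset: Optional[str] = None, download: bool = False, audio_ext: str = '.sph')

Create a Dataset for Tedlium. It supports releases 1, 2 and 3.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
release (str, optional) – Release version. Allowed values are "release1", "release2" or "release3". (default: "release1")
subset (str, optional) – The subset of the dataset to use. Valid options are "train", "dev" and "test" for releases 1 and 2, and None for release 3. Defaults to "train" or None.
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)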
audio_ext (str, optional) – Extension of the audio files. (default: ".sph")

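__getitem__(n: int) → Tuple[torch.Tensor, int, str, int, int, int]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, transcript, talk_id, speaker_id, identifier)
Return type: tuple

property phoneme_dict

Phonemes. A mapping from each word to a tuple of phonemes. Note that some words have empty phonemes.

Type: dict[str, tuple[str]]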
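
Because some words map to an empty phoneme tuple, code that consumes phoneme_dict may want to treat missing and empty entries alike. A minimal sketch, using a hand-written dictionary in place of the real property and a hypothetical phonemes_for helper:

```python
from typing import Dict, Optional, Tuple


def phonemes_for(word: str,
                 phoneme_dict: Dict[str, Tuple[str, ...]]) -> Optional[Tuple[str, ...]]:
    """Return the phonemes for a word, or None if absent or empty (hypothetical helper)."""
    phonemes = phoneme_dict.get(word, ())
    # An empty tuple means the dictionary lists the word without a pronunciation.
    return phonemes if phonemes else None


# Stand-in for the dataset's phoneme_dict property; entries are illustrative.
phoneme_dict = {
    "hello": ("HH", "AH", "L", "OW"),
    "uh": (),
}

print(phonemes_for("hello", phoneme_dict))
print(phonemes_for("uh", phoneme_dict))
print(phonemes_for("missing", phoneme_dict))
```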

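VCTK

class torchaudio.datasets.VCTK(root: Union[str, pathlib.Path], url: str = 'https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip', folder_in_archive: str = 'VCTK-Corpus', download: bool = False, downsample: bool = False)

Create a Dataset for VCTK.

Note

This dataset is no longer publicly available. Please use VCTK_092 instead.
Directory p315 is ignored because it has no corresponding text files. For more information about the dataset, visit: https://datashare.is.ed.ac.uk/handle/10283/3443

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – Not used, as the dataset is no longer publicly available.
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "VCTK-Corpus")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False) Passing download=True results in an error, as the dataset is no longer publicly available.
downsample (bool, optional) – Not used.

__getitem__(n: int) → Tuple[torch.Tensor, int, str, str, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, utterance, speaker_id, utterance_id)
Return type: tuple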

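VCTK_092

class torchaudio.datasets.VCTK_092(root: str, mic_id: str = 'mic2', download: bool = False, url: str = 'https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip', audio_ext='.flac')

Create a Dataset for VCTK 0.92.

Parameters

root (str) – Root directory where the dataset's top-level directory is found.
mic_id (str, optional) – Microphone ID. Either "mic1" or "mic2". (default: "mic2")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)
url (str, optional) – The URL to download the dataset from. (default: "https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip")
audio_ext (str, optional) – Custom audio extension, for when the dataset has been converted to a non-default audio format.

Note

All the speeches from speaker p315 are skipped because the corresponding text files are missing.
All the speeches from speaker p280 are skipped for mic_id="mic2" because the audio files are missing.
Some of the speeches from speaker p362 are skipped because the audio files are missing.
See also: https://datashare.is.ed.ac.uk/handle/10283/3443

__getitem__(n: int) → Tuple[torch.Tensor, int, str, str, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, transcript, speaker_id, utterance_id)
Return type: (Tensor, int, str, str, str)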

YESNO

class torchaudio.datasets.YESNO(root: Union[str, pathlib.Path], url: str = 'http://www.openslr.org/resources/1/waves_yesno.tar.gz', folder_in_archive: str = 'waves_yesno', download: bool = False)

Create a Dataset for YesNo.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from. (default: "http://www.openslr.org/resources/1/waves_yesno.tar.gz")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "waves_yesno")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)

__getitem__(n: int) → Tuple[torch.Tensor, int, List[int]]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, labels)
Return type: (Tensor, int, List[int])
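
The labels element is a list of integers, one per spoken word in the recording. As an illustration only, and assuming the recordings are named after their labels (for example 0_1_0_0_1_1_0_1.wav, which this reference does not state), such a list could be derived as below. labels_from_filename is a hypothetical helper, not a torchaudio function.

```python
from pathlib import Path
from typing import List


def labels_from_filename(filename: str) -> List[int]:
    """Split an underscore-separated stem such as '0_1_0' into [0, 1, 0]."""
    stem = Path(filename).stem  # drop the directory and the .wav extension
    return [int(part) for part in stem.split("_")]


labels = labels_from_filename("waves_yesno/0_1_0_0_1_1_0_1.wav")
print(labels)
```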
