torchaudio.datasets
All datasets are subclasses of torch.utils.data.Dataset,
i.e., they implement the __getitem__ and __len__ methods.
Hence, they can all be passed to a torch.utils.data.DataLoader,
which can load multiple samples in parallel using torch.multiprocessing workers.
For example:
import torch
import torchaudio

yesno_data = torchaudio.datasets.YESNO('.', download=True)
data_loader = torch.utils.data.DataLoader(yesno_data,
                                          batch_size=1,
                                          shuffle=True,
                                          num_workers=4)  # number of worker processes; adjust for your machine
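The reason any of these datasets can be handed to a DataLoader is the map-style protocol described above. The following is a minimal pure-Python sketch of that protocol, not torchaudio code: ToyDataset and iter_batches are hypothetical names, and the single-process loop only approximates how a loader uses __len__ and __getitem__ to build batches.

```python
import random


class ToyDataset:
    """Hypothetical map-style dataset: only __getitem__ and __len__ are needed."""

    def __init__(self, samples):
        self._samples = list(samples)

    def __getitem__(self, index):
        return self._samples[index]

    def __len__(self):
        return len(self._samples)


def iter_batches(dataset, batch_size=1, shuffle=False, seed=None):
    """Yield lists of samples, roughly as a single-process loader would."""
    indices = list(range(len(dataset)))  # relies on __len__
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        yield [dataset[i] for i in chunk]  # relies on __getitem__


# Strings stand in for audio clips so the sketch needs no downloads.
toy = ToyDataset([("clip0", 0), ("clip1", 1), ("clip2", 0)])
batches = list(iter_batches(toy, batch_size=2))
```

With three samples and batch_size=2, the last batch holds the single remaining sample.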
The following datasets are available:
All the datasets have a similar API. Some of them, such as VCTK and YESNO, accept two common arguments:
transform and target_transform, which transform the input and the target respectively.
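How transform and target_transform are typically applied can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: TransformedDataset and peak_normalize are hypothetical names, and lists of floats stand in for audio tensors.

```python
class TransformedDataset:
    """Hypothetical wrapper that applies transform/target_transform on access."""

    def __init__(self, samples, transform=None, target_transform=None):
        self._samples = list(samples)
        self.transform = transform
        self.target_transform = target_transform

    def __getitem__(self, index):
        waveform, target = self._samples[index]
        if self.transform is not None:
            waveform = self.transform(waveform)  # transforms the input
        if self.target_transform is not None:
            target = self.target_transform(target)  # transforms the target
        return waveform, target

    def __len__(self):
        return len(self._samples)


def peak_normalize(waveform):
    """Scale a waveform so that its largest absolute value is 1."""
    peak = max(abs(x) for x in waveform) or 1.0  # avoid dividing by zero
    return [x / peak for x in waveform]


# Plain lists stand in for audio tensors so the sketch has no dependencies.
raw = [([0.5, -1.0, 0.25], "yes"), ([2.0, 1.0], "no")]
data = TransformedDataset(raw, transform=peak_normalize, target_transform=str.upper)
waveform, target = data[1]
```

Both callables run lazily, each time a sample is indexed, so the stored samples stay unchanged.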
CMUARCTIC
class torchaudio.datasets.CMUARCTIC(root: str, url: str = 'aew', folder_in_archive: str = 'ARCTIC', download: bool = False)
Create a Dataset for CMU_ARCTIC.
- Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "aew", "ahw", "aup", "awb", "axb", "bdl", "clb", "eey", "fem", "gka", "jmk", "ksp", "ljm", "lnh", "rms", "rxr", "slp" or "slt". (default: "aew")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "ARCTIC")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
COMMONVOICE
class torchaudio.datasets.COMMONVOICE(root: str, tsv: str = 'train.tsv', url: str = 'english', folder_in_archive: str = 'CommonVoice', version: str = 'cv-corpus-4-2019-12-10', download: bool = False)
Create a Dataset for CommonVoice.
- Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
tsv (str, optional) – The name of the tsv file used to construct the metadata. (default: "train.tsv")
url (str, optional) – The URL to download the dataset from, or the language of the dataset to download. Allowed language values are "tatar", "english", "german", "french", "welsh", "breton", "chuvash", "turkish", "kyrgyz", "irish", "kabyle", "catalan", "taiwanese", "slovenian", "italian", "dutch", "hakha chin", "esperanto", "estonian", "persian", "portuguese", "basque", "spanish", "chinese", "mongolian", "sakha", "dhivehi", "kinyarwanda", "swedish", "russian", "indonesian", "arabic", "tamil", "interlingua", "latvian", "japanese", "votic", "abkhaz", "cantonese" and "romansh sursilvan". (default: "english")
folder_in_archive (str, optional) – The top-level directory of the dataset.
version (str) – Version string. For the other allowed values, please see https://commonvoice.mozilla.org/en/datasets. (default: "cv-corpus-4-2019-12-10")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
GTZAN
class torchaudio.datasets.GTZAN(root: str, url: str = 'http://opihi.cs.uvic.ca/sound/genres.tar.gz', folder_in_archive: str = 'genres', download: bool = False, subset: Optional[str] = None)
Create a Dataset for GTZAN.
Note
Please see http://marsyas.info/downloads/datasets.html if you are planning to use this dataset to publish results.
- Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from. (default: "http://opihi.cs.uvic.ca/sound/genres.tar.gz")
folder_in_archive (str, optional) – The top-level directory of the dataset.
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
subset (str, optional) – Which subset of the dataset to use. One of "training", "validation", "testing" or None. If None, the entire dataset is used. (default: None)
LIBRISPEECH
class torchaudio.datasets.LIBRISPEECH(root: str, url: str = 'train-clean-100', folder_in_archive: str = 'LibriSpeech', download: bool = False)
Create a Dataset for LibriSpeech.
- Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "dev-clean", "dev-other", "test-clean", "test-other", "train-clean-100", "train-clean-360" and "train-other-500". (default: "train-clean-100")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "LibriSpeech")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
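Because url accepts either a full URL or one of the type names listed above, a typo in a type name is easy to make. The sketch below validates the name before constructing the dataset. check_subset and load_librispeech are hypothetical helpers, not part of torchaudio, and they deliberately restrict url to the documented type names (a full URL would be rejected). Only the torchaudio.datasets.LIBRISPEECH call itself follows the signature documented above.

```python
LIBRISPEECH_SUBSETS = (
    "dev-clean", "dev-other", "test-clean", "test-other",
    "train-clean-100", "train-clean-360", "train-other-500",
)


def check_subset(name):
    """Hypothetical helper: reject names outside the documented list."""
    if name not in LIBRISPEECH_SUBSETS:
        raise ValueError(f"unknown LibriSpeech subset: {name!r}")
    return name


def load_librispeech(root, subset="train-clean-100", download=False):
    """Hypothetical helper that builds the dataset from a validated subset name."""
    url = check_subset(subset)
    # Imported lazily so that the validation above works without torchaudio.
    import torchaudio

    return torchaudio.datasets.LIBRISPEECH(root, url=url, download=download)
```

A call such as load_librispeech(".", subset="dev-clean", download=True) would then download the subset into the current directory if it is not already present.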
LIBRITTS
class torchaudio.datasets.LIBRITTS(root: str, url: str = 'train-clean-100', folder_in_archive: str = 'LibriTTS', download: bool = False)
Create a Dataset for LibriTTS.
- Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "dev-clean", "dev-other", "test-clean", "test-other", "train-clean-100", "train-clean-360" and "train-other-500". (default: "train-clean-100")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "LibriTTS")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
LJSPEECH
class torchaudio.datasets.LJSPEECH(root: str, url: str = 'https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2', folder_in_archive: str = 'wavs', download: bool = False)
Create a Dataset for LJSpeech-1.1.
- Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from. (default: "https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "wavs")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
SPEECHCOMMANDS
class torchaudio.datasets.SPEECHCOMMANDS(root: str, url: str = 'speech_commands_v0.02', folder_in_archive: str = 'SpeechCommands', download: bool = False)
Create a Dataset for Speech Commands.
- Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "speech_commands_v0.01" and "speech_commands_v0.02". (default: "speech_commands_v0.02")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "SpeechCommands")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
TEDLIUM
class torchaudio.datasets.TEDLIUM(root: str, release: str = 'release1', subset: str = None, download: bool = False, audio_ext='.sph')
Create a Dataset for Tedlium. It supports releases 1, 2 and 3.
- Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
release (str, optional) – Release version. Allowed values are "release1", "release2" or "release3". (default: "release1")
subset (str, optional) – The subset of dataset to use. Valid options are "train", "dev", and "test" for releases 1 & 2, None for release 3. Defaults to "train" or None.
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
VCTK
class torchaudio.datasets.VCTK(root: str, url: str = 'https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip', folder_in_archive: str = 'VCTK-Corpus', download: bool = False, downsample: bool = False, transform: Any = None, target_transform: Any = None)
Create a Dataset for VCTK.
Note
This dataset is no longer publicly available. Please use VCTK_092.
Directory p315 is ignored because there are no corresponding text files.
For more information about the dataset visit: https://datashare.is.ed.ac.uk/handle/10283/3443
- Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – Not used as the dataset is no longer publicly available.
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "VCTK-Corpus")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False). Giving download=True will result in an error as the dataset is no longer publicly available.
downsample (bool, optional) – Not used.
transform (callable, optional) – Optional transform applied on waveform. (default: None)
target_transform (callable, optional) – Optional transform applied on utterance. (default: None)
VCTK_092
class torchaudio.datasets.VCTK_092(root: str, mic_id: str = 'mic2', download: bool = False, url: str = 'https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip', audio_ext='.flac')
Create VCTK 0.92 Dataset
- Parameters
root (str) – Root directory where the dataset’s top level directory is found.
mic_id (str) – Microphone ID. Either "mic1" or "mic2". (default: "mic2")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
url (str, optional) – The URL to download the dataset from. (default: "https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip")
audio_ext (str, optional) – Custom audio extension if dataset is converted to non-default audio format.
Note
All the speeches from speaker p315 will be skipped due to the lack of the corresponding text files.
All the speeches from p280 will be skipped for mic_id="mic2" due to the lack of the audio files.
Some of the speeches from speaker p362 will be skipped due to the lack of the audio files.
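The speaker-level rules in this note can be written down as a small predicate, which may help when estimating which speakers a given mic_id will yield. This is only a sketch of the note, not torchaudio's implementation: is_speaker_skipped is a hypothetical helper, and p225 is included merely as an arbitrary additional speaker ID.

```python
def is_speaker_skipped(speaker_id, mic_id="mic2"):
    """Hypothetical helper: True if every utterance of the speaker is skipped."""
    if speaker_id == "p315":
        return True  # no corresponding text files
    if speaker_id == "p280" and mic_id == "mic2":
        return True  # no audio files for mic2
    # p362 loses only some utterances, so the speaker is not skipped outright.
    return False


speakers = ["p225", "p280", "p315", "p362"]
kept_mic1 = [s for s in speakers if not is_speaker_skipped(s, "mic1")]
kept_mic2 = [s for s in speakers if not is_speaker_skipped(s, "mic2")]
```

The partial omissions for p362 cannot be expressed at the speaker level, so they are left out of the predicate.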
YESNO
class torchaudio.datasets.YESNO(root: str, url: str = 'http://www.openslr.org/resources/1/waves_yesno.tar.gz', folder_in_archive: str = 'waves_yesno', download: bool = False, transform: Any = None, target_transform: Any = None)
Create a Dataset for YesNo.
- Parameters
root (str) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from. (default: "http://www.openslr.org/resources/1/waves_yesno.tar.gz")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "waves_yesno")
download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
transform (callable, optional) – Optional transform applied on waveform. (default: None)
target_transform (callable, optional) – Optional transform applied on utterance. (default: None)