torchaudio.sox_effects

Warning

The SoxEffect and SoxEffectsChain classes are deprecated. Please migrate to apply_effects_tensor() and apply_effects_file().

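Resource initialization / shutdown

torchaudio.sox_effects.init_sox_effects()

Initialize the resources required to use sox effects.

Note

You do not need to call this function manually. It is called automatically.

Once initialized, you do not need to call this function again across multiple uses of sox effects, though it is safe to do so as long as shutdown_sox_effects() has not been called yet. Once shutdown_sox_effects() is called, you can no longer use SoX effects, and initializing again will result in an error.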

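torchaudio.sox_effects.shutdown_sox_effects()

Clean up the resources required to use sox effects.

Note

You do not need to call this function manually. It is called automatically.

It is safe to call this function multiple times. Once shutdown_sox_effects() is called, you can no longer use SoX effects, and initializing again will result in an error.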

Listing supported effects

torchaudio.sox_effects.effect_names() → List[str]

Get the list of valid sox effect names.

Returns: List of available effect names.
Return type: List[str]

Example

>>> torchaudio.sox_effects.effect_names()
['allpass', 'band', 'bandpass', ... ]
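
Because effect names are plain strings, a chain can be checked against this list before it is applied. The sketch below is illustrative only: `unsupported_effects` is a hypothetical helper, not part of torchaudio, and the `supported` list is hard-coded so that the snippet runs without torchaudio installed.

```python
from typing import List


def unsupported_effects(effects: List[List[str]], supported: List[str]) -> List[str]:
    """Return the effect names in `effects` that are missing from `supported`.

    Hypothetical helper, not part of torchaudio. Each entry of `effects` is
    expected to look like ['effect_name', 'arg1', 'arg2', ...].
    """
    known = set(supported)
    return [effect[0] for effect in effects if effect[0] not in known]


# In real use the list would come from torchaudio.sox_effects.effect_names();
# a short hard-coded stand-in is used here.
supported = ['allpass', 'band', 'bandpass', 'gain', 'rate']
effects = [['gain', '-n'], ['rate', '8000'], ['no_such_effect', '1']]
missing = unsupported_effects(effects, supported)
print(missing)  # ['no_such_effect']
```

An empty result means every effect name in the chain appears in the supported list.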

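Applying effects

Apply a SoX effects chain to a torch.Tensor, or to a file whose result is loaded as a torch.Tensor.

Applying effects on Tensor

torchaudio.sox_effects.apply_effects_tensor(tensor: torch.Tensor, sample_rate: int, effects: List[List[str]], channels_first: bool = True) → Tuple[torch.Tensor, int]

Apply sox effects to the given Tensor.

Note

This function works very similarly to the sox command, but there are slight differences. For example, the sox command adds certain effects automatically (such as a rate effect after speed, pitch and other effects), whereas this function applies only the given effects. Therefore, to actually apply the speed effect, you also need to give a rate effect with the desired sampling rate.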

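Parameters

tensor (torch.Tensor) – Input 2D Tensor.
sample_rate (int) – Sample rate.
effects (List[List[str]]) – List of effects.
channels_first (bool) – Indicates whether the input Tensor's dimension is [channels, time] or [time, channels].

Returns

Resulting Tensor and sample rate. The resulting Tensor has the same dtype as the input Tensor and the same channel order. The shape of the Tensor can differ depending on the effects applied. The sample rate can also differ depending on the effects applied.

Return type

Tuple[torch.Tensor, int]

Example - Basic usage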

>>>
>>> # Defines the effects to apply
>>> effects = [
...     ['gain', '-n'],  # normalises to 0dB
...     ['pitch', '5'],  # 5 cent pitch shift
...     ['rate', '8000'],  # resample to 8000 Hz
... ]
>>>
>>> # Generate pseudo wave:
>>> # normalized, channels first, 2ch, sampling rate 16000, 1 second
>>> sample_rate = 16000
>>> waveform = 2 * torch.rand([2, sample_rate * 1]) - 1
>>> waveform.shape
torch.Size([2, 16000])
>>> waveform
tensor([[ 0.3138,  0.7620, -0.9019,  ..., -0.7495, -0.4935,  0.5442],
        [-0.0832,  0.0061,  0.8233,  ..., -0.5176, -0.9140, -0.2434]])
>>>
>>> # Apply effects
>>> waveform, sample_rate = apply_effects_tensor(
...     waveform, sample_rate, effects, channels_first=True)
>>>
>>> # Check the result
>>> # The new waveform is sampling rate 8000, 1 second.
>>> # normalization and channel order are preserved
>>> waveform.shape
torch.Size([2, 8000])
>>> waveform
tensor([[ 0.5054, -0.5518, -0.4800,  ..., -0.0076,  0.0096, -0.0110],
        [ 0.1331,  0.0436, -0.3783,  ..., -0.0035,  0.0012,  0.0008]])
>>> sample_rate
8000
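
As the note for this function explains, a speed effect only takes hold when a rate effect follows it. A minimal sketch of building such a chain is shown below. `speed_chain` is a hypothetical helper, not a torchaudio function, and it only constructs the List[List[str]] argument.

```python
from typing import List


def speed_chain(speed: float, sample_rate: int) -> List[List[str]]:
    """Build an effects list that changes speed and then resamples.

    Hypothetical helper, not part of torchaudio. The trailing 'rate' entry
    is the one the sox command would add automatically.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return [
        ['speed', f'{speed:.5f}'],   # sox expects every option as a string
        ['rate', str(sample_rate)],  # resample to the target rate
    ]


effects = speed_chain(1.1, 16000)
print(effects)  # [['speed', '1.10000'], ['rate', '16000']]
```

The resulting list could then be passed as the effects argument, for example apply_effects_tensor(waveform, sample_rate, effects).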

Example - TorchScript-able transform

>>>
>>> # Use `apply_effects_tensor` in `torch.nn.Module` and dump it to file,
>>> # then run sox effect via Torchscript runtime.
>>>
>>> class SoxEffectTransform(torch.nn.Module):
...     effects: List[List[str]]
...
...     def __init__(self, effects: List[List[str]]):
...         super().__init__()
...         self.effects = effects
...
...     def forward(self, tensor: torch.Tensor, sample_rate: int):
...         return torchaudio.sox_effects.apply_effects_tensor(
...             tensor, sample_rate, self.effects)
...
...
>>> # Create transform object
>>> effects = [
...     ["lowpass", "-1", "300"],  # apply single-pole lowpass filter
...     ["rate", "8000"],  # change sample rate to 8000
... ]
>>> transform = SoxEffectTransform(effects)
>>>
>>> # Dump it to file and load
>>> path = 'sox_effect.zip'
>>> torch.jit.script(transform).save(path)
>>> transform = torch.jit.load(path)
>>>
>>> # Run transform
>>> waveform, input_sample_rate = torchaudio.load("input.wav")
>>> waveform, sample_rate = transform(waveform, input_sample_rate)
>>> assert sample_rate == 8000

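Applying effects on file

torchaudio.sox_effects.apply_effects_file(path: str, effects: List[List[str]], normalize: bool = True, channels_first: bool = True) → Tuple[torch.Tensor, int]

Apply sox effects to an audio file and load the resulting data as a Tensor.

Note

This function works very similarly to the sox command, but there are slight differences. For example, the sox command adds certain effects automatically (such as a rate effect after speed, pitch, etc.), whereas this function applies only the given effects. Therefore, to actually apply the speed effect, you also need to give a rate effect with the desired sampling rate, because internally the speed effect only alters the sampling rate and leaves the samples untouched.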

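Parameters

path (str) – Path to the audio file.
effects (List[List[str]]) – List of effects.
normalize (bool) – When True, this function always returns float32, and sample values are normalized to [-1.0, 1.0]. If the input file is an integer WAV, giving False changes the resulting Tensor type to an integer type. This argument has no effect for formats other than integer WAV.
channels_first (bool) – When True, the returned Tensor has dimension [channel, time]. Otherwise, the returned Tensor has dimension [time, channel].

Returns

Resulting Tensor and sample rate. If normalize=True, the resulting Tensor is always of float32 type. If normalize=False and the input audio file is an integer-type WAV file, the resulting Tensor has the corresponding integer type. (Note: the 24-bit integer type is not supported.) If channels_first=True, the resulting Tensor has dimension [channel, time], otherwise [time, channel].

Return type

Tuple[torch.Tensor, int]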
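
The normalize rules above can be summarised as a small decision function. This is a sketch for illustration only: `expected_dtype` is a hypothetical helper that returns dtype names as strings, and the bit-depth mapping (8-bit to uint8, 16-bit to int16, 32-bit to int32) is an assumption based on common WAV conventions rather than something stated on this page. The page also does not say how 24-bit input behaves with normalize=True, so the sketch treats it like any other input in that case.

```python
from typing import Optional


def expected_dtype(normalize: bool, integer_wav_bits: Optional[int]) -> str:
    """Name the dtype expected from apply_effects_file under the stated rules.

    Hypothetical helper. `integer_wav_bits` is the bit depth of an
    integer-type WAV input, or None for any other format.
    """
    if normalize or integer_wav_bits is None:
        # normalize=True always gives float32, and normalize has no effect
        # on formats other than integer WAV.
        return 'float32'
    if integer_wav_bits == 24:
        raise ValueError('24-bit integer type is not supported')
    # Assumed mapping from bit depth to dtype name.
    mapping = {8: 'uint8', 16: 'int16', 32: 'int32'}
    if integer_wav_bits not in mapping:
        raise ValueError(f'unexpected bit depth: {integer_wav_bits}')
    return mapping[integer_wav_bits]


print(expected_dtype(True, 16))     # float32
print(expected_dtype(False, 16))    # int16
print(expected_dtype(False, None))  # float32
```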

Example - Basic usage

>>>
>>> # Defines the effects to apply
>>> effects = [
...     ['gain', '-n'],  # normalises to 0dB
...     ['pitch', '5'],  # 5 cent pitch shift
...     ['rate', '8000'],  # resample to 8000 Hz
... ]
>>>
>>> # Apply effects and load data with channels_first=True
>>> waveform, sample_rate = apply_effects_file("data.wav", effects, channels_first=True)
>>>
>>> # Check the result
>>> waveform.shape
torch.Size([2, 8000])
>>> waveform
tensor([[ 5.1151e-03,  1.8073e-02,  2.2188e-02,  ...,  1.0431e-07,
         -1.4761e-07,  1.8114e-07],
        [-2.6924e-03,  2.1860e-03,  1.0650e-02,  ...,  6.4122e-07,
         -5.6159e-07,  4.8103e-07]])
>>> sample_rate
8000

Example - Apply random speed perturbation to a dataset

>>>
>>> # Load data from file, apply random speed perturbation
>>> class RandomPerturbationFile(torch.utils.data.Dataset):
...     """Given flist, apply random speed perturbation
...
...     Suppose all the input files are at least one second long.
...     """
...     def __init__(self, flist: List[str], sample_rate: int):
...         super().__init__()
...         self.flist = flist
...         self.sample_rate = sample_rate
...         self.rng = random.Random()  # requires `import random`
...
...     def __getitem__(self, index):
...         speed = self.rng.uniform(0.5, 2.0)
...         effects = [
...             ['gain', '-n', '-10'],  # apply 10 db attenuation
...             ['remix', '-'],  # merge all the channels
...             ['speed', f'{speed:.5f}'],  # duration is now 0.5 ~ 2.0 seconds.
...             ['rate', f'{self.sample_rate}'],
...             ['pad', '0', '1.5'],  # add 1.5 seconds silence at the end
...             ['trim', '0', '2'],  # get the first 2 seconds
...         ]
...         waveform, _ = torchaudio.sox_effects.apply_effects_file(
...             self.flist[index], effects)
...         return waveform
...
...     def __len__(self):
...         return len(self.flist)
...
>>> dataset = RandomPerturbationFile(file_list, sample_rate=8000)
>>> loader = torch.utils.data.DataLoader(dataset, batch_size=32)
>>> for batch in loader:
...     pass
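
The pad and trim effects in the example above are what give every item the same length, which batching requires. The arithmetic can be checked with a small sketch. `output_seconds` is a hypothetical helper that mirrors the chain (speed, then 1.5 s of padding, then trimming to 2 s) and does not call sox.

```python
def output_seconds(input_seconds: float, speed: float,
                   pad_seconds: float = 1.5, trim_seconds: float = 2.0) -> float:
    """Estimate clip length after the speed, pad and trim effects.

    Hypothetical helper for illustration; it does not call sox.
    """
    after_speed = input_seconds / speed    # speed 2.0 halves the duration
    after_pad = after_speed + pad_seconds  # silence appended at the end
    return min(after_pad, trim_seconds)    # keep at most the first 2 seconds


# For a one-second input, both extremes of the speed range end up
# at exactly 2 seconds, i.e. 16000 frames at 8000 Hz.
for speed in (0.5, 2.0):
    seconds = output_seconds(1.0, speed)
    print(speed, seconds, int(seconds * 8000))
```

Under the same arithmetic, an input shorter than one second can come out shorter than two seconds at high speed (0.5 s at speed 2.0 gives 1.75 s), which is why the docstring assumes inputs of at least one second.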

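Legacy

SoxEffect

class torchaudio.sox_effects.SoxEffect

Create an object for passing sox effect information between Python and C++.

Warning

This class is deprecated. Please migrate to apply_effects_file() or apply_effects_tensor().

Returns: An object with the following attributes: ename (str), the name of the effect, and eopts (List[str]), a list of effect options.
Return type: SoxEffect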

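SoxEffectsChain

class torchaudio.sox_effects.SoxEffectsChain(normalization: Union[bool, float, Callable] = True, channels_first: bool = True, out_siginfo: Any = None, out_encinfo: Any = None, filetype: str = 'raw')

SoX effects chain class.

Warning

This class is deprecated. Please migrate to apply_effects_file() or apply_effects_tensor().

Parameters

normalization (bool, number, or callable, optional) – If boolean True, the output is divided by 1 << 31 (assumes signed 32-bit audio) and normalized to [-1, 1]. If number, the output is divided by that number. If callable, the output is passed as a parameter to the given function, then the output is divided by the result. (Default: True)
channels_first (bool, optional) – Set channels first or length first in the result. (Default: True)
out_siginfo (sox_signalinfo_t, optional) – A sox_signalinfo_t type, which could be helpful if the audio type cannot be automatically determined. (Default: None)
out_encinfo (sox_encodinginfo_t, optional) – A sox_encodinginfo_t type, which could be set if the audio type cannot be automatically determined. (Default: None)
filetype (str, optional) – A file type or extension to be set if sox cannot determine it automatically. (Default: 'raw')

Returns

An output Tensor of size [C x L] or [L x C], where L is the number of audio frames and C is the number of channels, and an integer that is the sample rate of the audio (as listed in the metadata of the file).

Return type

Tuple[Tensor, int]

Example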

>>> class MyDataset(Dataset):
...     def __init__(self, audiodir_path):
...         self.data = [
...             os.path.join(audiodir_path, fn)
...             for fn in os.listdir(audiodir_path)]
...         self.E = torchaudio.sox_effects.SoxEffectsChain()
...         self.E.append_effect_to_chain("rate", [16000])  # resample to 16000hz
...         self.E.append_effect_to_chain("channels", ["1"])  # mono signal
...     def __getitem__(self, index):
...         fn = self.data[index]
...         self.E.set_input_file(fn)
...         x, sr = self.E.sox_build_flow_effects()
...         return x, sr
...
...     def __len__(self):
...         return len(self.data)
...
>>> ds = MyDataset(path_to_audio_files)
>>> for sig, sr in ds:
...    pass

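append_effect_to_chain(ename: str, eargs: Union[List[str], str, None] = None) → None

Append an effect to the sox effects chain.

Parameters

ename (str) – The name of the effect.
eargs (List[str] or str, optional) – A list of effect options. (Default: None)

clear_chain() → None

Clear the effects chain in Python.

set_input_file(input_file: str) → None

Set the input file for the input of the chain.

Parameters: input_file (str) – The path to the input file.

sox_build_flow_effects(out: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, int]

Build the effects chain and flow the effects from the input file to the output tensor.

Parameters: out (Tensor, optional) – Where the output will be written. (Default: None)
Returns: An output Tensor of size [C x L] or [L x C], where L is the number of audio frames and C is the number of channels, and an integer that is the sample rate of the audio (as listed in the metadata of the file).
Return type: Tuple[Tensor, int]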
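
When migrating away from SoxEffectsChain, the (ename, eargs) pairs passed to append_effect_to_chain() need to become the List[List[str]] form used by apply_effects_file() and apply_effects_tensor(). A minimal sketch of that conversion follows. `to_effects_list` is a hypothetical helper, not part of torchaudio, and it assumes that a string eargs is a single option rather than a space-separated list.

```python
from typing import List, Optional, Sequence, Tuple, Union

# Mirrors the eargs type accepted by append_effect_to_chain().
EffectArgs = Optional[Union[Sequence[object], str]]


def to_effects_list(chain: Sequence[Tuple[str, EffectArgs]]) -> List[List[str]]:
    """Convert legacy (ename, eargs) pairs to the List[List[str]] form.

    Hypothetical helper, not part of torchaudio.
    """
    effects: List[List[str]] = []
    for ename, eargs in chain:
        if eargs is None:
            options: List[str] = []
        elif isinstance(eargs, str):
            options = [eargs]  # assumption: one string is one option
        else:
            options = [str(arg) for arg in eargs]  # e.g. 16000 -> '16000'
        effects.append([ename] + options)
    return effects


# The two effects appended in the legacy dataset example.
legacy_chain = [("rate", [16000]), ("channels", ["1"])]
effects = to_effects_list(legacy_chain)
print(effects)  # [['rate', '16000'], ['channels', '1']]
```

The resulting list could then be passed as the effects argument of apply_effects_file() in place of the set_input_file() and sox_build_flow_effects() calls.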
