注意

单击此处下载完整的示例代码

StreamReader 基本用法¶

作者： Moto Hira

本教程介绍如何使用torchaudio.io.StreamReader自获取和解码音频/视频数据，并应用预处理 libavfilter 提供。

注意

本教程需要 FFmpeg 库。请参考 FFmpeg 依赖细节。

概述¶

Streaming API 利用 ffmpeg 强大的 I/O 功能。

它可以

加载各种格式的音频/视频
从本地/远程源加载音频/视频
从类文件对象加载音频/视频
从麦克风、摄像头和屏幕加载音频/视频
生成合成音视频信号。
逐块加载音频/视频
即时更改采样率/帧速率、图像大小
应用过滤器和预处理

流式处理 API 分三个步骤工作。

开放媒体源（文件、设备、合成模式生成器）
配置输出流
流式传输媒体

此时，ffmpeg 集成提供的功能仅限于

<一些媒体源> -> <可选处理> -> <tensor>

如果您有其他对您的用例有用的表单，（例如与 Torch 集成。Tensor 类型）请提交功能请求。

制备¶

import torch
import torchaudio

print(torch.__version__)
print(torchaudio.__version__)

import matplotlib.pyplot as plt
from torchaudio.io import StreamReader

base_url = "https://download.pytorch.org/torchaudio/tutorial-assets"
AUDIO_URL = f"{base_url}/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
VIDEO_URL = f"{base_url}/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4"

2.4.0
2.4.0

打开源¶

流式 API 主要有三种不同的来源处理。无论使用哪个源，其余进程（配置输出，应用预处理）是相同的。

常见媒体格式（字符串类型或类似文件对象的资源指示符）
音频 / 视频设备
合成音频/视频源

以下部分介绍如何打开常见的媒体格式。对于其他流，请参阅 StreamReader 高级用法。

注意

支持的媒体（例如容器、编解码器和协议）的覆盖范围依赖于系统中找到的 FFmpeg 库。

如果 StreamReader 在打开源时引发错误，请检查那个 ffmpeg 命令可以处理它。

本地文件¶

要打开媒体文件，您只需将文件的路径传递给 StreamReader 的构造函数。

StreamReader(src="audio.wav")

StreamReader(src="audio.mp3")

这适用于图像文件、视频文件和视频流。

# Still image
StreamReader(src="image.jpeg")

# Video file
StreamReader(src="video.mpeg")

网络协议¶

您也可以直接传递 URL。

# Video on remote server
StreamReader(src="https://example.com/video.mp4")

# Playlist format
StreamReader(src="https://example.com/playlist.m3u")

# RTMP
StreamReader(src="rtmp://example.com:1935/live/app")

类似文件的对象¶

您还可以传递类似文件的对象。类文件对象必须实现符合readio.RawIOBase.read.

如果给定的类文件对象具有 method，则 StreamReader 会使用它也。在这种情况下，该方法应符合seekseekio.IOBase.seek.

# Open as fileobj with seek support
with open("input.mp4", "rb") as src:
    StreamReader(src=src)

如果第三方库实现，使其引发错误，您可以编写一个包装类来屏蔽该方法。seekseek

class UnseekableWrapper:
    def __init__(self, obj):
        self.obj = obj

    def read(self, n):
        return self.obj.read(n)

import requests

response = requests.get("https://example.com/video.mp4", stream=True)
s = StreamReader(UnseekableWrapper(response.raw))

import boto3

response = boto3.client("s3").get_object(Bucket="my_bucket", Key="key")
s = StreamReader(UnseekableWrapper(response["Body"]))

注意

当使用不可查找的类文件对象时，源媒体必须为 streamable 的。例如，有效的 MP4 格式对象可以具有其元数据在媒体数据的开头或结尾。开头有 metadata 的不能在没有 method seek 的情况下打开，但不能打开 metadata 在结尾的 metadata 的没有寻找。

无头介质¶

如果尝试加载无标题原始数据，则可以使用和指定数据的格式。formatoption

假设您使用命令将音频文件转换为 faw 格式如下;sox

# Headerless, 16-bit signed integer PCM, resampled at 16k Hz.
$ sox original.wav -r 16000 raw.s2

此类音频可以按如下方式打开。

StreamReader(src="raw.s2", format="s16le", option={"sample_rate": "16000"})

检查源流¶

打开媒体后，我们可以检查流并配置输出流。

您可以使用 .num_src_streams

注意

流的数量不是通道的数量。每个音频流可以包含任意数量的声道。

要检查源流的元数据，您可以使用 method 并提供源流的索引。get_src_stream_info()

此方法返回。如果源 stream 是 audio 类型，则返回类型为，即 SourceStream 的子类，具有其他特定于音频的属性。同样，如果源流为 video 类型，则返回类型为 .SourceStreamSourceAudioStreamSourceVideoStream

对于常规音频格式和静止图像格式（如 WAV 和 JPEG），存储流的数量为 1。

streamer = StreamReader(AUDIO_URL)
print("The number of source streams:", streamer.num_src_streams)
print(streamer.get_src_stream_info(0))

The number of source streams: 1
SourceAudioStream(media_type='audio', codec='pcm_s16le', codec_long_name='PCM signed 16-bit little-endian', format='s16', bit_rate=256000, num_frames=0, bits_per_sample=0, metadata={}, sample_rate=16000.0, num_channels=1)

容器格式和播放列表格式可能包含多个流不同媒体类型。

src = "https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8"
streamer = StreamReader(src)
print("The number of source streams:", streamer.num_src_streams)
for i in range(streamer.num_src_streams):
    print(streamer.get_src_stream_info(i))

The number of source streams: 27
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=335457, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:39.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '2177116'}, width=960, height=540, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=1351204, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:42.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '8001098'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=1019347, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:48.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '6312875'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=750899, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:54.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '4943747'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=513057, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:59.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '3216424'}, width=1280, height=720, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=185749, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:03.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '1268994'}, width=768, height=432, frame_rate=30.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=111939, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:06.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '902298'}, width=640, height=360, frame_rate=30.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=59938, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:07.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '541052'}, width=480, height=270, frame_rate=30.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=335457, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:39.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '2399619'}, width=960, height=540, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=1351204, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:42.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '8223601'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=1019347, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:48.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '6535378'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=750899, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:54.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '5166250'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=513057, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:59.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '3438927'}, width=1280, height=720, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=185749, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:03.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '1491497'}, width=768, height=432, frame_rate=30.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=111939, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:06.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '1124801'}, width=640, height=360, frame_rate=30.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=59938, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:07.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '763555'}, width=480, height=270, frame_rate=30.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=335457, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:39.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '2207619'}, width=960, height=540, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=1351204, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:42.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '8031601'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=1019347, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:48.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '6343378'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=750899, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:54.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '4974250'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=513057, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:59.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '3246927'}, width=1280, height=720, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=185749, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:03.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '1299497'}, width=768, height=432, frame_rate=30.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=111939, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:06.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '932801'}, width=640, height=360, frame_rate=30.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=59938, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:07.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '571555'}, width=480, height=270, frame_rate=30.0)
SourceAudioStream(media_type='audio', codec='aac', codec_long_name='AAC (Advanced Audio Coding)', format='fltp', bit_rate=60076, num_frames=0, bits_per_sample=0, metadata={'comment': 'English', 'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T22:14:57.000000Z', 'language': 'en', 'major_brand': 'mp42', 'minor_version': '1'}, sample_rate=48000.0, num_channels=2)
SourceAudioStream(media_type='audio', codec='ac3', codec_long_name='ATSC A/52A (AC-3)', format='fltp', bit_rate=384000, num_frames=0, bits_per_sample=0, metadata={'comment': 'English', 'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T22:15:46.000000Z', 'language': 'en', 'major_brand': 'mp42', 'minor_version': '1'}, sample_rate=48000.0, num_channels=6)
SourceAudioStream(media_type='audio', codec='eac3', codec_long_name='ATSC A/52B (AC-3, E-AC-3)', format='fltp', bit_rate=192000, num_frames=0, bits_per_sample=0, metadata={'comment': 'English', 'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T22:16:17.000000Z', 'language': 'en', 'major_brand': 'mp42', 'minor_version': '1'}, sample_rate=48000.0, num_channels=6)

配置输出流¶

流 API 允许您从任意组合中流式传输数据输入流。如果您的应用程序不需要音频或视频，您可以省略它们。或者，如果要应用不同的预处理添加到同一源流中，您可以复制源流。

默认流¶

当源中有多个流时，它不会立即明确应该使用哪个流。

FFmpeg 实现了一些启发式方法来确定默认流。生成的流索引通过

default_audio_stream和。default_video_stream

配置输出流¶

一旦您知道要使用哪个源流，您就可以使用和配置输出流。add_basic_audio_stream()add_basic_video_stream()

这些方法提供了一种简单的方法来更改 media 来匹配应用程序的要求。

两种方法共有的参数是;

frames_per_chunk：最大帧数在每次迭代时返回。对于音频，生成的张量将是（frames_per_chunk， num_channels）的形状。对于视频，它将是（frames_per_chunk， num_channels， height， width）。
buffer_chunk_size：内部缓冲的最大块数。当 StreamReader 缓冲了此数量的块并被要求拉取 more frames，则 StreamReader 会丢弃旧的帧/块。
stream_index：源流的索引。
decoder：如果提供，则覆盖解码器。如果检测不到，则很有用编解码器。
decoder_option：解码器的选项。

对于音频输出流，您可以提供以下附加参数来更改音频属性。

format：默认情况下，StreamReader 返回 float32 dtype 的张量，样本值范围为 [-1， 1]。通过提供参数生成的 dtype 和 value 范围已更改。format
sample_rate：如果提供，StreamReader 会动态对音频进行重新采样。

对于视频输出流，可以使用以下参数。

format：图像帧格式。默认情况下，StreamReader 返回 8 位 3 通道中的帧，按 RGB 顺序。
frame_rate：通过拖放或复制来更改帧速率框架。不执行插值。
width、：更改图像大小。height

streamer = StreamReader(...)

# Stream audio from default audio source stream
# 256 frames at a time, keeping the original sampling rate.
streamer.add_basic_audio_stream(
    frames_per_chunk=256,
)

# Stream audio from source stream `i`.
# Resample audio to 8k Hz, stream 256 frames at each
streamer.add_basic_audio_stream(
    frames_per_chunk=256,
    stream_index=i,
    sample_rate=8000,
)

# Stream video from default video source stream.
# 10 frames at a time, at 30 FPS
# RGB color channels.
streamer.add_basic_video_stream(
    frames_per_chunk=10,
    frame_rate=30,
    format="rgb24"
)

# Stream video from source stream `j`,
# 10 frames at a time, at 30 FPS
# BGR color channels with rescaling to 128x128
streamer.add_basic_video_stream(
    frames_per_chunk=10,
    stream_index=j,
    frame_rate=30,
    width=128,
    height=128,
    format="bgr24"
)

您可以按照与检查源流。报告配置的输出流的数量，并获取有关输出流的信息。num_out_streamsget_out_stream_info()

for i in range(streamer.num_out_streams):
    print(streamer.get_out_stream_info(i))

如果要删除输出流，可以使用 method 来实现。remove_stream()

# Removes the first output stream.
streamer.remove_stream(0)

流¶

要流式传输媒体数据，streamer 会交替执行获取和解码源数据，并将生成的音频/视频数据到客户端代码。

有一些低级方法可以执行这些作。和。is_buffer_ready()process_packet()pop_chunks()

在本教程中，我们将使用高级 API，即迭代器协议。它就像一个循环一样简单。for

streamer = StreamReader(...)
streamer.add_basic_audio_stream(...)
streamer.add_basic_video_stream(...)

for chunks in streamer.stream():
    audio_chunk, video_chunk = chunks
    ...

例¶

让我们通过一个示例视频来配置输出流。我们将使用以下视频。

来源：https://svs.gsfc.nasa.gov/13013（此视频属于公共领域）

图片来源：NASA 的戈达德太空飞行中心。

NASA 的媒体使用指南：https://www.nasa.gov/multimedia/guidelines/index.html

打开源媒体¶

首先，我们列出可用的 streams 及其属性。

streamer = StreamReader(VIDEO_URL)
for i in range(streamer.num_src_streams):
    print(streamer.get_src_stream_info(i))

SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=9958354, num_frames=6175, bits_per_sample=8, metadata={'creation_time': '2018-07-24T17:40:48.000000Z', 'encoder': 'AVC Coding', 'handler_name': '\x1fMainconcept Video Media Handler', 'language': 'eng', 'vendor_id': '[0][0][0][0]'}, width=1920, height=1080, frame_rate=29.97002997002997)
SourceAudioStream(media_type='audio', codec='aac', codec_long_name='AAC (Advanced Audio Coding)', format='fltp', bit_rate=317375, num_frames=9658, bits_per_sample=0, metadata={'creation_time': '2018-07-24T17:40:48.000000Z', 'handler_name': '#Mainconcept MP4 Sound Media Handler', 'language': 'eng', 'vendor_id': '[0][0][0][0]'}, sample_rate=48000.0, num_channels=2)

现在我们配置输出流。

配置 ouptut 流¶

# fmt: off
# Audio stream with 8k Hz
streamer.add_basic_audio_stream(
    frames_per_chunk=8000,
    sample_rate=8000,
)

# Audio stream with 16k Hz
streamer.add_basic_audio_stream(
    frames_per_chunk=16000,
    sample_rate=16000,
)

# Video stream with 960x540 at 1 FPS.
streamer.add_basic_video_stream(
    frames_per_chunk=1,
    frame_rate=1,
    width=960,
    height=540,
    format="rgb24",
)

# Video stream with 320x320 (stretched) at 3 FPS, grayscale
streamer.add_basic_video_stream(
    frames_per_chunk=3,
    frame_rate=3,
    width=320,
    height=320,
    format="gray",
)
# fmt: on

注意

在配置多个 output streams 时，为了保留所有 streams synced，请设置参数，使和或之间的比率为在输出流中保持一致。frames_per_chunksample_rateframe_rate

检查输出流。

for i in range(streamer.num_out_streams):
    print(streamer.get_out_stream_info(i))

OutputAudioStream(source_index=1, filter_description='aresample=8000,aformat=sample_fmts=fltp', media_type='audio', format='fltp', sample_rate=8000.0, num_channels=2)
OutputAudioStream(source_index=1, filter_description='aresample=16000,aformat=sample_fmts=fltp', media_type='audio', format='fltp', sample_rate=16000.0, num_channels=2)
OutputVideoStream(source_index=0, filter_description='fps=1,scale=width=960:height=540,format=pix_fmts=rgb24', media_type='video', format='rgb24', width=960, height=540, frame_rate=1.0)
OutputVideoStream(source_index=0, filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray', media_type='video', format='gray', width=320, height=320, frame_rate=3.0)

删除第二个音频流。

streamer.remove_stream(1)
for i in range(streamer.num_out_streams):
    print(streamer.get_out_stream_info(i))

OutputAudioStream(source_index=1, filter_description='aresample=8000,aformat=sample_fmts=fltp', media_type='audio', format='fltp', sample_rate=8000.0, num_channels=2)
OutputVideoStream(source_index=0, filter_description='fps=1,scale=width=960:height=540,format=pix_fmts=rgb24', media_type='video', format='rgb24', width=960, height=540, frame_rate=1.0)
OutputVideoStream(source_index=0, filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray', media_type='video', format='gray', width=320, height=320, frame_rate=3.0)

流¶

跳到 10 秒点。

streamer.seek(10.0)

现在，让我们最终迭代输出流。

n_ite = 3
waveforms, vids1, vids2 = [], [], []
for i, (waveform, vid1, vid2) in enumerate(streamer.stream()):
    waveforms.append(waveform)
    vids1.append(vid1)
    vids2.append(vid2)
    if i + 1 == n_ite:
        break

对于音频流，chunk Tensor 的形状为（frames_per_chunk， num_channels），对于视频流，它是（frames_per_chunk， num_color_channels， height， width）。

print(waveforms[0].shape)
print(vids1[0].shape)
print(vids2[0].shape)

torch.Size([8000, 2])
torch.Size([1, 3, 540, 960])
torch.Size([3, 1, 320, 320])

让我们想象一下我们收到了什么。

k = 3
fig = plt.figure()
gs = fig.add_gridspec(3, k * n_ite)
for i, waveform in enumerate(waveforms):
    ax = fig.add_subplot(gs[0, k * i : k * (i + 1)])
    ax.specgram(waveform[:, 0], Fs=8000)
    ax.set_yticks([])
    ax.set_xticks([])
    ax.set_title(f"Iteration {i}")
    if i == 0:
        ax.set_ylabel("Stream 0")
for i, vid in enumerate(vids1):
    ax = fig.add_subplot(gs[1, k * i : k * (i + 1)])
    ax.imshow(vid[0].permute(1, 2, 0))  # NCHW->HWC
    ax.set_yticks([])
    ax.set_xticks([])
    if i == 0:
        ax.set_ylabel("Stream 1")
for i, vid in enumerate(vids2):
    for j in range(3):
        ax = fig.add_subplot(gs[2, k * i + j : k * i + j + 1])
        ax.imshow(vid[j].permute(1, 2, 0), cmap="gray")
        ax.set_yticks([])
        ax.set_xticks([])
        if i == 0 and j == 0:
            ax.set_ylabel("Stream 2")
plt.tight_layout()

标记：torchaudio.io

脚本总运行时间：（0 分 13.447 秒）

由 Sphinx-Gallery 生成的图库

StreamReader 基本用法¶

概述¶

制备¶

打开源¶

本地文件¶

网络协议¶

类似文件的对象¶

无头介质¶

检查源流¶

配置输出流¶

默认流¶

配置输出流¶

流¶

例¶

打开源媒体¶

配置 ouptut 流¶

流¶

文档

教程

资源