audio_ogg

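Ogg overview

RFC: https://datatracker.ietf.org/doc/html/rfc3533
Ogg is a multimedia container format, most often used for audio. Other file formats commonly seen for audio include MP3, AAC and WAV.
"Ogg" refers to a file format that can carry a variety of free and open-source codecs, covering audio, video, text (such as subtitles) and metadata.
Vorbis is a lossy audio compression format comparable to MP3; "Ogg Vorbis" is Vorbis audio stored in an Ogg container.
Vorbis is completely free, open and unencumbered by patents. Ogg Vorbis files use the extension .ogg.
The format leaves room for encoders to keep improving file size and sound quality without breaking existing decoders or players. Vorbis also supports multichannel audio.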

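Features of Ogg

  • Ogg encapsulates binary data of many formats. It can carry any kind of video, audio, image, text or, generally speaking, any time-continuously sampled data.
  • Transports that provide their own framing (such as UDP or RTP) can carry the raw codec packets directly. Ogg is meant for stream-based storage (files) and transport (TCP streams, pipes), where that framing is missing. In either case the receiver has to know which codec is being carried.
  • Ogg can encapsulate several codec formats and presents each one as a logical bitstream. The Ogg transport bitstream is designed to provide framing, error protection and seeking structure for higher-level codec streams made of raw, unencapsulated data packets,
    such as the Vorbis audio codec or the Tarkin and Theora video codecs mentioned in RFC 3533.
  • It can interleave different binary media and other time-continuous data streams that an encoder has prepared as a sequence of packets. Ogg provides enough information to split the data back into packets at the original packet boundaries, without relying on decoding to find them.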

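Physical and logical bitstreams in Ogg

  • An actual Ogg file (or stream) is called a physical bitstream; the one or more encoded streams encapsulated in it are called logical bitstreams. A logical bitstream handed to the Ogg encapsulation process has a structure:
    it is split into a sequence of packets. Packets are created by the encoder of that logical bitstream and are meaningful entities only for that encoder (an uncompressed stream might, for example, use
    video frames as packets). They carry no boundary information; strung together they look like a stream of random bytes with no landmarks. (Note that "packet" here does not mean a network packet.)
  • The design idea behind Ogg is to provide a generic, linear media transport format that supports both file-based storage and stream-based transmission of one or several interleaved media streams,
    independent of the encoding format of the media data. Such an encapsulation format needs to provide the following, which are in effect the features Ogg itself supports:
    • framing for logical bitstreams.
    • interleaving of different logical bitstreams.
    • detection of corruption.
    • recapture after a parsing error.
    • position landmarks for direct random access of arbitrary positions in the bitstream.
    • streaming capability (i.e., no seeking is needed to build a 100% complete bitstream).
    • small overhead (i.e., use no more than approximately 1-2% of bitstream bandwidth for packet boundary marking, high-level framing, sync and seeking).
    • simplicity to enable fast parsing.
    • simple concatenation mechanism of several physical bitstreams.
    Ogg supports logical bitstreams: several of them can be encapsulated together, each with its own header pages and data pages.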

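Encapsulating logical bitstreams in a physical bitstream: bos, eos and related terms

  • A physical bitstream contains one or more logical bitstreams, interleaved at page granularity.
    Each logical bitstream is identified by a unique serial number carried in the header of every one of its pages. The number is arbitrary (typically random) and says nothing about the content or the codec.
    Pages of different logical bitstreams coexist and need no particular order relative to one another; only the pages within each logical bitstream must be in order. When demultiplexing, the header fields are used
    to put the pages of each logical bitstream back together in order.

  • Each logical bitstream carries only one type of data. Pages are variable in length, and each has a page header carrying encapsulation and error-recovery information. Every logical bitstream starts with a bos page
    (beginning of stream) and ends with an eos page (end of stream).

  • The bos page carries what is needed to identify and set up the codec; for audio, fields such as the sample rate and channel count. Its first bytes are usually a magic number identifying the codec. A codec may also need secondary header packets.
    Ogg does not know how many there are or how large they are, and they carry no actual payload data either. A logical bitstream therefore starts with the bos page, followed by the secondary headers, followed by the actual data.

  • The specification of how one or more logical bitstreams are encapsulated is called a media mapping. One example is Ogg Vorbis, which uses Ogg to encapsulate Vorbis-encoded audio for stream-based storage (files) and transport (such as TCP).

  • Ogg offers two ways of multiplexing, grouping and chaining. Grouping interleaves the pages of several logical bitstreams and is used when streams must be presented together, for example synchronized audio and video. Chaining is plain concatenation: one logical bitstream (or group) ends completely before the next one begins.

  • With grouping, all bos pages come first, one after another; then the secondary header pages; then the data pages; and finally the eos pages, which need not be adjacent (see the example below). Each logical bitstream has a unique serial number within the physical bitstream.

  • The two methods can be combined as long as each keeps its own rules. In the example below, a grouped set of streams is followed by a chained stream.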

               physical bitstream with pages of
          different logical bitstreams grouped and chained
    -------------------------------------------------------------
    |*A*|*B*|*C*|A|A|C|B|A|B|#A#|C|...|B|C|#B#|#C#|*D*|D|...|#D#|
    -------------------------------------------------------------
     bos bos bos             eos           eos eos bos       eos

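    Explanation:
    A, B and C are three different logical bitstreams encapsulated in one physical bitstream. *A* is the bos page of stream A (and likewise for the others), and #A# is the eos page of stream A.
    A, B and C are grouped, while D is chained because it starts only after A, B and C have ended (secondary headers are not necessarily present).

  • Ogg has no notion of time. It only knows order and position, and leaves timing (including audio/video synchronization) to higher layers. In the words of RFC 3533:
    Ogg does not know any specifics about the codec data except that each
    logical bitstream belongs to a different codec, the data from the codec comes in order and has position markers (so-called "Granule positions").
    Ogg only knows about sequentially increasing, unitless position markers. An application can only get temporal information through higher layers that have access to the codec APIs to assign and convert granule positions or time.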

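How Ogg encapsulates a logical bitstream: a packet may span pages, and a page may hold several packets

  • A packet is a chunk of data produced by the encoder; its content depends on the codec.
  • From Ogg's point of view a packet can be of any size; a specific media mapping defines how packets from a particular encoder are grouped or split. Because an Ogg page holds at most about 64 KB, a packet sometimes has to be spread over several pages. To simplify this, Ogg divides each packet into 255-byte chunks plus a final shorter chunk (packet size % 255 bytes, possibly 0). These chunks are called Ogg segments; they are a purely logical construct and have no header of their own.
  • A run of contiguous segments is wrapped in a variable-length page with a page header in front. The segment table in the page header gives the length of each segment. The header_type field says whether the page begins with data continued from a packet on the previous page. A segment length of 255 means the packet continues in the next segment; a length below 255 marks the last segment of a packet, so a packet whose size is an exact multiple of 255 is terminated by an extra segment of length 0.
  • The encoding is optimized for speed and for the expected case of packets between 20 and 200 bytes; this is what the design is tuned for, not a requirement. At one extreme, giving every tiny packet (say 2 bytes) its own page header would make the overhead very large; at the other, a large packet has to be split across pages.
    The following diagram shows the process: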
            logical bitstream with packet boundaries
    -----------------------------------------------------------------
    > |       packet_1             | packet_2         | packet_3 |  <
    -----------------------------------------------------------------

                        |segmentation (logically only)
                        v

         packet_1 (5 segments)          packet_2 (4 segs)       p_3 (2 segs)
        ------------------------------ ----------------------- ------------
    ..  |seg_1|seg_2|seg_3|seg_4|s_5 | |seg_1|seg_2|seg_3|s_4| |seg_1|s_2 | ..
        ------------------------------ ----------------------- ------------

                        | page encapsulation
                        v

     page_1 (packet_1 data)   page_2 (pket_1 data)   page_3 (packet_2 data)
    ------------------------  ----------------  ------------------------
    |H|------------------- |  |H|----------- |  |H|------------------- |
    |D||seg_1|seg_2|seg_3| |  |D|seg_4|s_5 | |  |D||seg_1|seg_2|seg_3| | ...
    |R|------------------- |  |R|----------- |  |R|------------------- |
    ------------------------  ----------------  ------------------------

                        |
    pages of            |
    other    --------|  |
    logical         -------
    bitstreams      | MUX |
                    -------
                       |
                       v

                  page_1  page_2          page_3
          ------  ------  -------  -----  -------
     ...  ||   |  ||   |  ||    |  ||  |  ||    |  ...
          ------  ------  -------  -----  -------
                  physical Ogg bitstream
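
    The reverse direction, recovering packet boundaries from the segment lengths on a page, can be sketched as follows. This is a minimal illustration and not libogg's code; the function name `ogg_packets_on_page` and its parameters are made up for this example. It assumes the caller has already read the page's segment table. When the page's header_type has the "continued" bit set, the first size reported is only the tail of a packet that started on an earlier page.

```c
#include <stddef.h>
#include <stdint.h>

/* Walks a page's segment table and stores the size of every packet that
   ends on this page in sizes[] (at most max_sizes entries are written).
   Returns the number of packets that end on the page. Bytes belonging to a
   packet that continues on the next page are counted in *unfinished.
   Hypothetical helper for illustration only. */
static size_t ogg_packets_on_page(const uint8_t *segment_table,
                                  size_t n_segments,
                                  size_t *sizes, size_t max_sizes,
                                  size_t *unfinished)
{
    size_t count = 0;
    size_t current = 0;

    for (size_t i = 0; i < n_segments; i++) {
        current += segment_table[i];
        if (segment_table[i] < 255) {   /* a short segment ends a packet */
            if (count < max_sizes)
                sizes[count] = current;
            count++;
            current = 0;
        }
    }
    *unfinished = current;              /* non-zero only if the last value was 255 */
    return count;
}
```

    For a segment table of 255, 255, 243, 30, 255 the sketch reports two completed packets (753 and 30 bytes) and 255 bytes of a third packet that continues on the following page.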

    Ogg page header layout:

    // Ogg page header
    // --------------------------------------------------------------------------
    // Field                 Bytes  Description
    // --------------------------------------------------------------------------
    // capture_pattern       4      Page marker, the ASCII characters "OggS" (4F 67 67 53)
    // structure_version     1      Version, currently always 0
    // header_type_flag      1      Page type flags
    // granule_position      8      Granule position
    // serial_number         4      Serial number of the logical bitstream
    // page_sequence_number  4      Sequence number of the page within its logical bitstream; lets a decoder detect page loss
    // CRC_checksum          4      CRC checksum of the page
    // number_page_segments  1      Number of entries in the segment table (0 to 255)
    // segment_table         <=255  Segment lengths, one byte per segment
    // --------------------------------------------------------------------------
    Details:
    The Ogg page header has the following format:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | capture_pattern: Magic number for page start "OggS"           | 0-3
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | version       | header_type   | granule_position 8B           | 4-7
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               | 8-11
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               | bitstream_serial_number       | 12-15
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               | page_sequence_number          | 16-19
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               | CRC_checksum                  | 20-23
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               |page_segments  | segment_table | 24-27
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                                                           | 28-
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    The fields in the page header have the following meaning:

    1. capture_pattern: a 4 Byte field that signifies the beginning of a
       page. It contains the magic numbers:

          0x4f 'O'
          0x67 'g'
          0x67 'g'
          0x53 'S'

       It helps a decoder to find the page boundaries and regain
       synchronisation after parsing a corrupted stream. Once the
       capture pattern is found, the decoder verifies page sync and
       integrity by computing and comparing the checksum.

    2. stream_structure_version: 1 Byte signifying the version number of
       the Ogg file format used in this stream (this document specifies
       version 0).

    3. header_type_flag: the bits in this 1 Byte field identify the
       specific type of this page.

       * bit 0x01
         set: page contains data of a packet continued from the previous
         page
         unset: page contains a fresh packet

       * bit 0x02
         set: this is the first page of a logical bitstream (bos)
         unset: this page is not a first page

       * bit 0x04
         set: this is the last page of a logical bitstream (eos)
         unset: this page is not a last page

    4. granule_position: an 8 Byte field containing position information.
       For example, for an audio stream, it MAY contain the total number
       of PCM samples encoded after including all frames finished on this
       page. For a video stream it MAY contain the total number of video
       frames encoded after this page. This is a hint for the decoder
       and gives it some timing and position information. Its meaning is
       dependent on the codec for that logical bitstream and specified in
       a specific media mapping. A special value of -1 (in two's
       complement) indicates that no packets finish on this page.
       (This field is the most involved one; it is discussed in more detail further down.)

    5. bitstream_serial_number: a 4 Byte field containing the unique
       serial number by which the logical bitstream is identified.

    6. page_sequence_number: a 4 Byte field containing the sequence
       number of the page so the decoder can identify page loss. This
       sequence number is increasing on each logical bitstream
       separately.

    7. CRC_checksum: a 4 Byte field containing a 32 bit CRC checksum of
       the page (including header with zero CRC field and page content).
       The generator polynomial is 0x04c11db7.

    8. number_page_segments: 1 Byte giving the number of segment entries
       encoded in the segment table. In other words, a page is made of
       segments, and a packet is split into segments that are then placed
       into one or more pages (see above).

    9. segment_table: number_page_segments Bytes containing the lacing
       values of all segments in this page. Each Byte contains one
       lacing value.

    The total header size in bytes is given by:
    header_size = number_page_segments + 27 [Byte]

    The total page size in Bytes is given by:
    page_size = header_size + sum(lacing_values: 1..number_page_segments)
    [Byte]
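
    The field list and the two size formulas translate fairly directly into a parser. The sketch below is an illustration under assumptions, not libogg's API: the struct `ogg_page_header`, the `OGG_FLAG_*` macros and `ogg_parse_page_header` are names invented for this example. Multi-byte fields are read least significant byte first, and the cast of the granule position to a signed type assumes a two's-complement platform.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OGG_FLAG_CONTINUED 0x01  /* page starts with data of a continued packet */
#define OGG_FLAG_BOS       0x02  /* first page of a logical bitstream */
#define OGG_FLAG_EOS       0x04  /* last page of a logical bitstream */

/* Hypothetical struct for this sketch; libogg has its own ogg_page type. */
typedef struct {
    uint8_t  version;
    uint8_t  header_type;
    int64_t  granule_position;
    uint32_t serial_number;
    uint32_t page_sequence_number;
    uint32_t crc_checksum;
    uint8_t  number_page_segments;
    size_t   header_size;   /* number_page_segments + 27 */
    size_t   page_size;     /* header_size + sum of lacing values */
} ogg_page_header;

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t read_le64(const uint8_t *p)
{
    return (uint64_t)read_le32(p) | ((uint64_t)read_le32(p + 4) << 32);
}

/* Returns 0 on success, -1 if buf does not start with a complete,
   version-0 page header. */
static int ogg_parse_page_header(const uint8_t *buf, size_t len,
                                 ogg_page_header *h)
{
    if (len < 27 || memcmp(buf, "OggS", 4) != 0 || buf[4] != 0)
        return -1;
    h->version              = buf[4];
    h->header_type          = buf[5];
    h->granule_position     = (int64_t)read_le64(buf + 6);
    h->serial_number        = read_le32(buf + 14);
    h->page_sequence_number = read_le32(buf + 18);
    h->crc_checksum         = read_le32(buf + 22);
    h->number_page_segments = buf[26];
    h->header_size          = 27 + (size_t)buf[26];
    if (len < h->header_size)
        return -1;
    h->page_size = h->header_size;
    for (size_t i = 0; i < h->number_page_segments; i++)
        h->page_size += buf[27 + i];    /* add each lacing value */
    return 0;
}
```

    A caller would check `h.header_type & OGG_FLAG_BOS` to recognise the first page of a stream, and would skip `page_size` bytes to reach the next capture pattern.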

An Ogg file example:

The first bytes of an Ogg audio file (the attachment of the original post) look like this:

4f 67 67 53 00 02 00 00 00 00 00 00 00 00 5c 59 
48 bg 00 00 00 00 e1 8f c5 fe 01 1e 01 76 6f 72 62 69 73

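4f 67 67 53: 4-byte capture pattern, the characters "OggS"
00: 1-byte version, 0
02: 1-byte header_type; 0x02 marks a bos page
00 00 00 00 00 00 00 00: 8-byte granule position, 0
5c 59 48 bg: 4-byte serial number of the logical bitstream (shown as printed; "bg" is not a valid hex byte, so the last byte is unknown)
00 00 00 00: 4-byte page sequence number; this is page 0 of its logical bitstream
e1 8f c5 fe: 4-byte CRC checksum
01: number of segments
1e: length of the first segment, 0x1e = 30 bytes
01 76 6f 72 62 69 73: packet type 0x01 followed by the ASCII characters "vorbis"
What follows is Vorbis data; see the Vorbis specification.
Ogg can of course carry Opus as well, in which case the first packet is an OpusHead header; see the Opus notes.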
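
The CRC field of a page such as the one above covers the whole page, with the four CRC bytes themselves treated as zero. A minimal bitwise sketch is given below; it assumes the polynomial 0x04c11db7 from RFC 3533 with an initial value of 0, no bit reflection and no final XOR, which is how libogg is understood to compute it. The function names are made up for this example, and a real implementation would normally use a lookup table for speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Feeds one byte into the CRC register, most significant bit first. */
static uint32_t ogg_crc_update(uint32_t crc, uint8_t byte)
{
    crc ^= (uint32_t)byte << 24;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04c11db7u : (crc << 1);
    return crc;
}

/* CRC over an arbitrary buffer, starting from 0. */
static uint32_t ogg_crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = ogg_crc_update(crc, data[i]);
    return crc;
}

/* CRC of a complete page (header plus body), treating the CRC field at
   byte offsets 22..25 as zero, as the specification requires. */
static uint32_t ogg_page_crc(const uint8_t *page, size_t page_size)
{
    uint32_t crc = 0;
    for (size_t i = 0; i < page_size; i++) {
        uint8_t b = (i >= 22 && i < 26) ? 0 : page[i];
        crc = ogg_crc_update(crc, b);
    }
    return crc;
}

/* The checksum stored in the header, least significant byte first. */
static uint32_t ogg_stored_crc(const uint8_t *page)
{
    return (uint32_t)page[22] | ((uint32_t)page[23] << 8) |
           ((uint32_t)page[24] << 16) | ((uint32_t)page[25] << 24);
}
```

A page is intact when `ogg_page_crc(page, page_size) == ogg_stored_crc(page)`. The dump above cannot be verified this way, because only the first bytes of the page are shown.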

Ogg and its media mappings

The first bytes of the data on the first page are an identifier for the codec.
Known identifiers include:

     Codec Identifier             | Codecs Parameter
-----------------------------------------------------------
 char[5]: 'BBCD\0'                | dirac
 char[5]: '\177FLAC'              | flac
 char[7]: '\x80theora'            | theora
 char[7]: '\x01vorbis'            | vorbis
 char[8]: 'CELT    '              | celt
 char[8]: 'CMML\0\0\0\0'          | cmml
 char[8]: '\213JNG\r\n\032\n'     | jng
 char[8]: '\x80kate\0\0\0'        | kate
 char[8]: 'OggMIDI\0'             | midi
 char[8]: '\212MNG\r\n\032\n'     | mng
 char[8]: 'PCM     '              | pcm
 char[8]: '\211PNG\r\n\032\n'     | png
 char[8]: 'Speex   '              | speex
 char[8]: 'YUV4MPEG'              | yuv4mpeg

An up-to-date version of this table is kept at Xiph.org (see
[Codecs]).
Opus has been added since (its first packet starts with 'OpusHead'); see the Ogg Opus RFC: https://datatracker.ietf.org/doc/html/rfc7845
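
Matching the first packet of a logical bitstream against these identifiers is a simple prefix comparison. The sketch below covers only a few entries of the table plus Opus; the function name `ogg_identify_codec` is hypothetical and the table is deliberately incomplete.

```c
#include <stddef.h>
#include <string.h>

/* Returns a short codec name for the first packet of a logical bitstream,
   or NULL if the identifier is not in this (partial) table. */
static const char *ogg_identify_codec(const unsigned char *packet, size_t len)
{
    static const struct {
        const char *magic;
        size_t      magic_len;
        const char *name;
    } table[] = {
        { "BBCD\0",     5, "dirac"  },
        { "\177FLAC",   5, "flac"   },
        { "\x80theora", 7, "theora" },
        { "\x01vorbis", 7, "vorbis" },
        { "Speex   ",   8, "speex"  },
        { "OpusHead",   8, "opus"   },
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (len >= table[i].magic_len &&
            memcmp(packet, table[i].magic, table[i].magic_len) == 0)
            return table[i].name;
    }
    return NULL;
}
```

Applied to the packet bytes `01 76 6f 72 62 69 73 ...` from the example file, the sketch returns "vorbis".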

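Key Ogg concepts explained:

  • Granule Position
    It plays a role similar to a DTS (decoding timestamp), and some implementations fill this field from a DTS.
    A granule is the codec's smallest countable unit; for audio, the granule position is a running count of samples rather than a rate. For Ogg Opus it must be 0 on the first page and on the page where the comment header completes, that is, on all header pages before the audio data: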
    The granule position MUST be zero for the ID header page and the page
    where the comment header completes. That is, the first page in the
    logical stream and the last header page before the first audio data
    page both have a granule position of zero.

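The granule position of an audio data page encodes the total number of PCM samples in the stream up to and including the last fully decodable sample of the last packet completed on that page, so it is normally greater than 0. A DTS can therefore be used to fill it only after it has been converted into a sample count: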
The granule position of an audio data page encodes the total number
of PCM samples in the stream up to and including the last fully
decodable sample from the last packet completed on that page. The
granule position of the first audio data page will usually be larger
than zero, as described in Section 4.5.

When a packet spans several pages, the pages in the middle carry -1:
A page that is entirely spanned by a single packet (that completes on
a subsequent page) has no granule position, and the granule position
field is set to the special value ‘-1’ in two’s complement.

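The granule position of an audio data page counts PCM samples at a fixed rate of 48 kHz. An Opus decoder may run at a different sample rate, but every Opus packet encodes samples at a rate that evenly divides 48 kHz, so the granule position always counts samples as if decoding at 48 kHz: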
The granule position of an audio data page is in units of PCM audio
samples at a fixed rate of 48 kHz (per channel; a stereo stream’s
granule position does not increment at twice the speed of a mono
stream). It is possible to run an Opus decoder at other sampling
rates, but all Opus packets encode samples at a sampling rate that
evenly divides 48 kHz. Therefore, the value in the granule position
field always counts samples assuming a 48 kHz decoding rate, and the
rest of this specification makes the same assumption.

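The duration of an Opus packet can be any multiple of 2.5 ms, up to 120 ms. It is encoded in the TOC sequence at the start of each packet, and the number of samples a decoder returns matches it exactly, even for the first few packets.
For example, a 20 ms packet fed to a decoder running at 48 kHz returns 960 samples. A demuxer can parse the TOC sequence at the start of each Ogg packet and work backwards or forwards from a packet with a known granule position
in order to assign a granule position to every packet, or even to every individual sample: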

The duration of an Opus packet as defined in [RFC6716] can be any
multiple of 2.5 ms, up to a maximum of 120 ms. This duration is
encoded in the TOC sequence at the beginning of each packet. The
number of samples returned by a decoder corresponds to this duration
exactly, even for the first few packets. For example, a 20 ms packet
fed to a decoder running at 48 kHz will always return 960 samples. A
demuxer can parse the TOC sequence at the beginning of each Ogg
packet to work backwards or forwards from a packet with a known
granule position (i.e., the last packet completed on some page) in
order to assign granule positions to every packet, or even every
individual sample. The one exception is the last page in the stream,
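as described below.

Every page with completed packets after the first must have a granule position equal to the number of samples in the packets that complete on that page, plus the granule position of the most recent page with completed packets: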
All other pages with completed packets after the first MUST have a
granule position equal to the number of samples contained in packets
that complete on that page plus the granule position of the most
recent page with completed packets.
This guarantees that a demuxer assigns each packet the same granule position whether it works forwards or backwards, which only holds if there are no gaps:
This guarantees that a demuxer
can assign individual packets the same granule position when working
forwards as when working backwards. For this to work, there cannot
be any gaps.
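
The accumulation rule quoted above can be sketched in a few lines. This is an illustration only: the function names are invented, packet durations are assumed to be already known in 48 kHz samples, and pre-skip and the end-trimming exception for the last page are ignored.

```c
#include <stdint.h>

/* Number of 48 kHz samples in a packet lasting duration_us microseconds.
   Opus durations are multiples of 2500 us, so the division is exact. */
static int opus_samples_at_48k(int duration_us)
{
    return duration_us * 48 / 1000;
}

/* Granule position to write on the next page: the granule position of the
   most recent page with completed packets plus the samples of all packets
   completing on this page, or -1 if no packet completes on it. */
static int64_t next_page_granule(int64_t prev_gp,
                                 const int *packet_samples,
                                 int n_completed)
{
    int64_t gp = prev_gp;

    if (n_completed == 0)
        return -1;
    for (int i = 0; i < n_completed; i++)
        gp += packet_samples[i];
    return gp;
}
```

With one 20 ms packet per page, successive audio pages under this sketch carry 960, 1920, 2880 and so on; a page that holds only the middle of a large packet carries -1.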

 More explanation: https://wiki.xiph.org/OggOpus

The Ogg framing documentation describes the field as follows. The granule position is packed in the same way the rest of Ogg data is packed: LSb of LSB first. Note that the 'position' data specifies a 'sample' number (for example, a CD-quality sample is four octets, 16 bits for left and 16 bits for right; in video it would likely be the frame number). It is up to the specific codec in use to define the semantic meaning of the granule position value. The position specified is the total samples encoded after including all packets finished on this page (packets begun on this page but continuing on to the next page do not count). The rationale here is that the position specified in the frame header of the last page tells how long the data coded by the bitstream is. A truncated stream will still return the proper number of samples that can be decoded fully.
A special value of '-1' (in two's complement) indicates that no packets finish on this page.

 byte  value

  6    0xXX  LSB
  7    0xXX
  8    0xXX
  9    0xXX
 10    0xXX
 11    0xXX
 12    0xXX
 13    0xXX  MSB
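
Writing the field according to this byte layout can be sketched as below. The helper names are made up for the example; `header` is assumed to point at the start of a page header (the "OggS" bytes), so the granule position occupies offsets 6 to 13.

```c
#include <stdint.h>

/* Stores a granule position at header bytes 6..13, least significant
   byte first. A value of -1 becomes eight 0xff bytes. */
static void ogg_store_granulepos(uint8_t *header, int64_t gp)
{
    uint64_t v = (uint64_t)gp;

    for (int i = 0; i < 8; i++) {
        header[6 + i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Returns 1 if the stored granule position is the special value -1
   ("no packet finishes on this page"), 0 otherwise. */
static int ogg_granulepos_is_unset(const uint8_t *header)
{
    for (int i = 0; i < 8; i++) {
        if (header[6 + i] != 0xff)
            return 0;
    }
    return 1;
}
```

For a granule position of 960 (0x03c0), byte 6 is 0xc0, byte 7 is 0x03 and bytes 8 to 13 are zero.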


https://www.xiph.org/ogg/doc/framing.html

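From PCM to Ogg pages, step by step:

  • 1 Capture raw PCM audio:
    sample: the unit of sampling. A rate of 48 kHz means 48000 samples per second.
    So at 48 kHz, one second of PCM holds 48000 samples (per channel).
    audio frame: at 50 frames per second, one frame lasts 20 ms; at 48 kHz one frame therefore holds 20 ms / 1000 ms * 48000 = 960 samples.

  • 2 Encode the frames into Opus packets
    The duration of an Opus packet can be any multiple of 2.5 ms, up to 120 ms. It is encoded in the TOC sequence at the start of each packet, and the number of samples a decoder returns matches it, even for the first few packets.
    For example, a 20 ms packet fed to a decoder running at 48 kHz returns 960 samples.
    The Opus packet format is described at https://datatracker.ietf.org/doc/html/rfc6716#section-3
    Every Opus packet starts with a TOC byte. A packet can hold one frame, two frames, or an arbitrary number of frames. Note that the duration selected by the config field of the TOC is the duration of each compressed frame in the packet. For a packet with several frames, the total
    number of samples is the per-frame count multiplied by the number of frames in the packet. The total must not exceed 120 ms, otherwise the packet is invalid:
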
    +-----------------------+-----------+-----------+-------------------+
    | Configuration         | Mode      | Bandwidth | Frame Sizes       |
    | Number(s)             |           |           |                   |
    +-----------------------+-----------+-----------+-------------------+
    | 0...3                 | SILK-only | NB        | 10, 20, 40, 60 ms |
    |                       |           |           |                   |
    | 4...7                 | SILK-only | MB        | 10, 20, 40, 60 ms |
    |                       |           |           |                   |
    | 8...11                | SILK-only | WB        | 10, 20, 40, 60 ms |
    |                       |           |           |                   |
    | 12...13               | Hybrid    | SWB       | 10, 20 ms         |
    |                       |           |           |                   |
    | 14...15               | Hybrid    | FB        | 10, 20 ms         |
    |                       |           |           |                   |
    | 16...19               | CELT-only | NB        | 2.5, 5, 10, 20 ms |
    |                       |           |           |                   |
    | 20...23               | CELT-only | WB        | 2.5, 5, 10, 20 ms |
    |                       |           |           |                   |
    | 24...27               | CELT-only | SWB       | 2.5, 5, 10, 20 ms |
    |                       |           |           |                   |
    | 28...31               | CELT-only | FB        | 2.5, 5, 10, 20 ms |
    +-----------------------+-----------+-----------+-------------------+
    For example, a code 2 packet (two frames with different compressed sizes):
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | config  |s|1|0| N1 (1-2 bytes):                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               :
    | Compressed frame 1 (N1 bytes)...                              |
    :                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               |                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
    | Compressed frame 2...                                         :
    :                                                               |
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    /*Returns the duration of the packet (in samples at 48 kHz), or a negative
       value on error.*/
    static int op_get_packet_duration(const unsigned char *_data,int _len){
      int nframes;
      int frame_size;
      int nsamples;
      nframes=opus_packet_get_nb_frames(_data,_len);
      if(OP_UNLIKELY(nframes<0))return OP_EBADPACKET;
      frame_size=opus_packet_get_samples_per_frame(_data,48000);
      nsamples=nframes*frame_size;
      if(OP_UNLIKELY(nsamples>120*48))return OP_EBADPACKET;
      return nsamples;
    }
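    See the Opus specification for the other packing modes.
  • 3 Pack the Opus packets into Ogg pages

When Ogg receives a packet from the Opus encoder, it splits it into segments. The segment table of a page has at most 255 entries and each segment is at most 255 bytes long, so a page carries at most 255 * 255 = 65025 bytes of payload, roughly 64 KB. The incoming packet is therefore cut into 255-byte chunks plus
one final shorter chunk, for example:
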
raw packet:
___________________________________________
|______________packet data__________________| 753 bytes

lacing values for page header segment table: 255,255,243
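
The 753-byte example can be reproduced with a short sketch. The function name `ogg_lacing_values` is hypothetical. The sketch also covers the case the diagram does not show: a packet whose size is an exact multiple of 255 needs a trailing lacing value of 0 so that the reader can tell the packet has ended.

```c
#include <stddef.h>
#include <stdint.h>

/* Fills lacing[] with the lacing values for one packet of packet_size
   bytes and returns how many values were written. Returns 0 if more than
   max_entries values would be needed (such a packet has to be spread over
   several pages, which this sketch does not handle). */
static size_t ogg_lacing_values(size_t packet_size,
                                uint8_t *lacing, size_t max_entries)
{
    size_t n = packet_size / 255 + 1;   /* full segments plus the final one */

    if (n > max_entries)
        return 0;
    for (size_t i = 0; i + 1 < n; i++)
        lacing[i] = 255;
    lacing[n - 1] = (uint8_t)(packet_size % 255);
    return n;
}
```

For 753 bytes this yields 255, 255, 243; for 510 bytes it yields 255, 255, 0.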

From Ogg’s perspective, packets can be of any arbitrary size. A
specific media mapping will define how to group or break up packets
from a specific media encoder. As Ogg pages have a maximum size of
about 64 kBytes, sometimes a packet has to be distributed over
several pages. To simplify that process, Ogg divides each packet
into 255 byte long chunks plus a final shorter chunk. These chunks
are called “Ogg Segments”. They are only a logical construct and do
not have a header for themselves.
Put differently, in the segmentation diagram shown earlier ("logical bitstream with packet boundaries"), each packet is one Opus packet with its own TOC header and possibly several compressed frames. The packet is split into segments, the segments are placed into pages, and each page is sent with its header.

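  + An Ogg page can hold up to 255 segments. If every segment is one complete Opus packet (each packet shorter than 255 bytes), a page can hold up to 255 packets. According to the Ogg Opus RFC an Opus packet lasts a multiple of 2.5 ms, up to 120 ms, so the longest stretch of audio a single page can cover is given by this comment in the opusfile source:
    This is allowed to be up to 32 bits to support the maximum duration of a single Ogg page (255 packets * 120 ms per
    packet == 1,468,800 samples at 48 kHz).  255 * 120 / 1000 * 48000 = 1468800. The "packet" here is what was loosely called an audio frame above.
  + A common layout is one segment per page.
  + The lacing value in the specification is the size of a segment. The segment table holds several lacing values, one per segment.
  + A lacing value is a single byte, so a segment is at most 255 bytes long: https://www.xiph.org/ogg/doc/framing.html
  • 4 Transport the encapsulated audio:
    1) On a network the data travels in packets whose size depends on the protocol. For UDP, the usable packet size is bounded by the datagram size, which in turn is constrained by the MTU.
    2) Depending on the relation between frames and packets, frames are packed one frame per packet, one frame across several packets, or several frames per packet.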

Addendum: how opusfile and libopus check the granule position

I ran into a problem while using these libraries: the granule position was not filled in correctly, and decoding failed.
Here is how the source code checks the field.
Background:
The granule position is the number of samples in all packets completed up to and including the current page.
With 20 ms frames, the granule position once the first frame completes is 20 / 1000 * 48000 = 960. So with one frame per segment and one segment per page, the first audio data page has granule position 960, the second 1920, and so on.
Libraries involved:
opusfile-0.11
opus-1.3
libopusenc-0.2.1

/*Starting from current cursor position, get the initial PCM offset of the next
   page.
  This also validates the granule position on the first page with a completed
   audio data packet, as required by the spec.
  If this link is completely empty (no pages with completed packets), then this
   function sets pcm_start=pcm_end=0 and returns the BOS page of the next link
   (if any).
  In the seekable case, we initialize pcm_end=-1 before calling this function,
   so that later we can detect that the link was empty before calling
   op_find_final_pcm_offset().
  [inout] _link: The link for which to find pcm_start.
  [out] _og:     Returns the BOS page of the next link if this link was empty.
                 In the unseekable case, we can then feed this to
                  op_fetch_headers() to start the next link.
                 The caller may pass NULL (e.g., for seekable streams), in
                  which case this page will be discarded.
  Return: 0 on success, 1 if there is a buffered BOS page available, or a
           negative value on unrecoverable error.*/
static int op_find_initial_pcm_offset(OggOpusFile *_of,
 OggOpusLink *_link,ogg_page *_og){
  ogg_page     og;
  opus_int64   page_offset;
  ogg_int64_t  pcm_start;
  ogg_int64_t  prev_packet_gp;
  ogg_int64_t  cur_page_gp;
  ogg_uint32_t serialno;
  opus_int32   total_duration;
  int          durations[255];
  int          cur_page_eos;
  int          op_count;
  int          pi;
  if(_og==NULL)_og=&og;
  serialno=_of->os.serialno;
  op_count=0;
  /*We shouldn't have to initialize total_duration, but gcc is too dumb to
     figure out that op_count>0 implies we've been through the whole loop at
     least once.*/
  total_duration=0;
  do{
    page_offset=op_get_next_page(_of,_og,_of->end);
    /*We should get a page unless the file is truncated or mangled.
      Otherwise there are no audio data packets in the whole logical stream.*/
    if(OP_UNLIKELY(page_offset<0)){
      /*Fail if there was a read error.*/
      if(page_offset<OP_FALSE)return (int)page_offset;
      /*Fail if the pre-skip is non-zero, since it's asking us to skip more
         samples than exist.*/
      if(_link->head.pre_skip>0)return OP_EBADTIMESTAMP;
      _link->pcm_file_offset=0;
      /*Set pcm_end and end_offset so we can skip the call to
         op_find_final_pcm_offset().*/
      _link->pcm_start=_link->pcm_end=0;
      _link->end_offset=_link->data_offset;
      return 0;
    }
    /*Similarly, if we hit the next link in the chain, we've gone too far.*/
    if(OP_UNLIKELY(ogg_page_bos(_og))){
      if(_link->head.pre_skip>0)return OP_EBADTIMESTAMP;
      /*Set pcm_end and end_offset so we can skip the call to
         op_find_final_pcm_offset().*/
      _link->pcm_file_offset=0;
      _link->pcm_start=_link->pcm_end=0;
      _link->end_offset=_link->data_offset;
      /*Tell the caller we've got a buffered page for them.*/
      return 1;
    }
    /*Ignore pages from other streams (not strictly necessary, because of the
       checks in ogg_stream_pagein(), but saves some work).*/
    if(serialno!=(ogg_uint32_t)ogg_page_serialno(_og))continue;
    ogg_stream_pagein(&_of->os,_og);
    /*Bitrate tracking: add the header's bytes here.
      The body bytes are counted when we consume the packets.*/
    _of->bytes_tracked+=_og->header_len;
    /*Count the durations of all packets in the page.*/
    do total_duration=op_collect_audio_packets(_of,durations);
    /*Ignore holes.*/
    while(OP_UNLIKELY(total_duration<0));
    op_count=_of->op_count;
  }
  while(op_count<=0);
  /*We found the first page with a completed audio data packet: actually look
     at the granule position.
    RFC 3533 says, "A special value of -1 (in two's complement) indicates that
     no packets finish on this page," which does not say that a granule
     position that is NOT -1 indicates that some packets DO finish on that page
     (even though this was the intention, libogg itself violated this intention
     for years before we fixed it).
    The Ogg Opus specification only imposes its start-time requirements
     on the granule position of the first page with completed packets,
     so we ignore any set granule positions until then.*/
  cur_page_gp=_of->op[op_count-1].granulepos;
  /*But getting a packet without a valid granule position on the page is not
     okay.*/
  if(cur_page_gp==-1)return OP_EBADTIMESTAMP;
  cur_page_eos=_of->op[op_count-1].e_o_s;
  if(OP_LIKELY(!cur_page_eos)){
    /*The EOS flag wasn't set.
      Work backwards from the provided granule position to get the starting PCM
       offset.*/
    /*Author's note: this is the check that bit me. The page's granule
       position is compared with the number of samples implied by the packet
       durations; if it is smaller on a non-EOS page, an error is returned.*/
    if(OP_UNLIKELY(op_granpos_add(&pcm_start,cur_page_gp,-total_duration)<0)){
      /*The starting granule position MUST not be smaller than the amount of
         audio on the first page with completed packets.*/
      return OP_EBADTIMESTAMP;
    }
  }
  else{
    /*The first page with completed packets was also the last.*/
    /*Author's note: same comparison as above, but on an EOS page a smaller
       granule position means end-trimming rather than an error.*/
    if(OP_LIKELY(op_granpos_add(&pcm_start,cur_page_gp,-total_duration)<0)){
      /*If there's less audio on the page than indicated by the granule
         position, then we're doing end-trimming, and the starting PCM offset
         is zero by spec mandate.*/
      pcm_start=0;
      /*However, the end-trimming MUST not ask us to trim more samples than
         exist after applying the pre-skip.*/
      if(OP_UNLIKELY(op_granpos_cmp(cur_page_gp,_link->head.pre_skip)<0)){
        return OP_EBADTIMESTAMP;
      }
    }
  }
  /*Timestamp the individual packets.*/
  prev_packet_gp=pcm_start;
  for(pi=0;pi<op_count;pi++){
    if(cur_page_eos){
      ogg_int64_t diff;
      OP_ALWAYS_TRUE(!op_granpos_diff(&diff,cur_page_gp,prev_packet_gp));
      diff=durations[pi]-diff;
      /*If we have samples to trim...*/
      if(diff>0){
        /*If we trimmed the entire packet, stop (the spec says encoders
           shouldn't do this, but we support it anyway).*/
        if(OP_UNLIKELY(diff>durations[pi]))break;
        _of->op[pi].granulepos=prev_packet_gp=cur_page_gp;
        /*Move the EOS flag to this packet, if necessary, so we'll trim the
           samples.*/
        _of->op[pi].e_o_s=1;
        continue;
      }
    }
    /*Update the granule position as normal.*/
    OP_ALWAYS_TRUE(!op_granpos_add(&_of->op[pi].granulepos,
     prev_packet_gp,durations[pi]));
    prev_packet_gp=_of->op[pi].granulepos;
  }
  /*Update the packet count after end-trimming.*/
  _of->op_count=pi;
  _of->cur_discard_count=_link->head.pre_skip;
  _link->pcm_file_offset=0;
  _of->prev_packet_gp=_link->pcm_start=pcm_start;
  _of->prev_page_offset=page_offset;
  return 0;
}
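
Stripped of I/O and bookkeeping, the start-of-stream check in the function above comes down to a comparison between the page's granule position and the total duration of the packets completed on it. The sketch below is a simplified reading of that logic, not opusfile code: the names are invented, `SKETCH_EBADTIMESTAMP` merely stands in for OP_EBADTIMESTAMP, and granule positions are assumed to be non-negative (opusfile's op_granpos_add() also handles wrapped values, which is not reproduced here).

```c
#include <stdint.h>

#define SKETCH_EBADTIMESTAMP (-1)  /* stand-in for opusfile's OP_EBADTIMESTAMP */

/* page_gp:        granule position of the first page with completed packets
   total_duration: samples (at 48 kHz) in the packets completed on that page
   is_eos:         non-zero if that page is also the last page
   pre_skip:       pre-skip value from the OpusHead header
   On success stores the starting PCM offset in *pcm_start and returns 0. */
static int first_page_pcm_start(int64_t page_gp, int32_t total_duration,
                                int is_eos, uint32_t pre_skip,
                                int64_t *pcm_start)
{
    if (page_gp == -1)
        return SKETCH_EBADTIMESTAMP;    /* packets completed, but no position */
    if (page_gp >= total_duration) {
        *pcm_start = page_gp - total_duration;
        return 0;
    }
    /* Fewer samples claimed than the packets contain. */
    if (!is_eos)
        return SKETCH_EBADTIMESTAMP;    /* only allowed on the last page */
    if (page_gp < (int64_t)pre_skip)
        return SKETCH_EBADTIMESTAMP;    /* would trim more than exists */
    *pcm_start = 0;                     /* end-trimming: start is zero */
    return 0;
}
```

Under this reading, a first audio page holding one 20 ms packet must carry a granule position of at least 960 unless it is also the EOS page, which matches the failure described at the start of this section.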

/*Author's note: collecting the durations of all packets.*/
/*Grab all the packets currently in the stream state, and compute their
   durations.
  _of->op_count is set to the number of packets collected.
  [out] _durations: Returns the durations of the individual packets.
  Return: The total duration of all packets, or OP_HOLE if there was a hole.*/
static opus_int32 op_collect_audio_packets(OggOpusFile *_of,
 int _durations[255]){
  opus_int32 total_duration;
  int        op_count;
  /*Count the durations of all packets in the page.*/
  op_count=0;
  total_duration=0;
  for(;;){
    int ret;
    /*This takes advantage of undocumented libogg behavior that returned
       ogg_packet buffers are valid at least until the next page is
       submitted.
      Relying on this is not too terrible, as _none_ of the Ogg memory
       ownership/lifetime rules are well-documented.
      But I can read its code and know this will work.*/
    ret=ogg_stream_packetout(&_of->os,_of->op+op_count);
    if(!ret)break;
    if(OP_UNLIKELY(ret<0)){
      /*We shouldn't get holes in the middle of pages.*/
      OP_ASSERT(op_count==0);
      /*Set the return value and break out of the loop.
        We want to make sure op_count gets set to 0, because we've ingested a
         page, so any previously loaded packets are now invalid.*/
      total_duration=OP_HOLE;
      break;
    }
    /*Unless libogg is broken, we can't get more than 255 packets from a
       single page.*/
    OP_ASSERT(op_count<255);
    /*Author's note: this is the duration of the whole packet; for a packet
       with several frames it is samples per frame times the frame count.*/
    _durations[op_count]=op_get_packet_duration(_of->op[op_count].packet,
     _of->op[op_count].bytes);
    if(OP_LIKELY(_durations[op_count]>0)){
      /*With at most 255 packets on a page, this can't overflow.*/
      total_duration+=_durations[op_count++];
    }
    /*Ignore packets with an invalid TOC sequence.*/
    else if(op_count>0){
      /*But save the granule position, if there was one.*/
      _of->op[op_count-1].granulepos=_of->op[op_count].granulepos;
    }
  }
  _of->op_pos=0;
  _of->op_count=op_count;
  return total_duration;
}

/*Author's note: parsing the TOC.*/
/*Returns the duration of the packet (in samples at 48 kHz), or a negative
   value on error.*/
static int op_get_packet_duration(const unsigned char *_data,int _len){
  int nframes;
  int frame_size;
  int nsamples;
  nframes=opus_packet_get_nb_frames(_data,_len);
  if(OP_UNLIKELY(nframes<0))return OP_EBADPACKET;
  frame_size=opus_packet_get_samples_per_frame(_data,48000);
  /*Author's note: samples in a packet = samples per frame (from the TOC
     config) times the number of frames in the packet.*/
  nsamples=nframes*frame_size;
  /*Author's note: a packet must not be longer than 120 ms.*/
  if(OP_UNLIKELY(nsamples>120*48))return OP_EBADPACKET;
  return nsamples;
}

/*Author's note: reads the frame count code from the TOC, i.e. how many frames
   one packet holds.*/
int opus_packet_get_nb_frames(const unsigned char packet[], opus_int32 len)
{
   int count;
   if (len<1)
      return OPUS_BAD_ARG;
   count = packet[0]&0x3;
   if (count==0)
      return 1;
   else if (count!=3)
      return 2;
   else if (len<2)
      return OPUS_INVALID_PACKET;
   else
      return packet[1]&0x3F;
}

/*Author's note: reads the config from the TOC to get the frame duration in
   ms, and from it the number of samples per frame.*/
int opus_packet_get_samples_per_frame(const unsigned char *data,
      opus_int32 Fs)
{
   int audiosize;
   if (data[0]&0x80)
   {
      audiosize = ((data[0]>>3)&0x3);
      audiosize = (Fs<<audiosize)/400;
   } else if ((data[0]&0x60) == 0x60)
   {
      audiosize = (data[0]&0x08) ? Fs/50 : Fs/100;
   } else {
      audiosize = ((data[0]>>3)&0x3);
      if (audiosize == 3)
         audiosize = Fs*60/1000;
      else
         audiosize = (Fs<<audiosize)/100;
   }
   return audiosize;
}
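
The bit tests in these functions can be cross-checked against the RFC 6716 configuration table with a table-driven sketch that needs no libopus. The names below are made up for the example, the rate is fixed at 48 kHz, and the return value of -1 is a simplification of the library's distinct error codes.

```c
#include <stddef.h>
#include <stdint.h>

/* Samples per frame at 48 kHz, looked up from the configuration number
   (the top five bits of the TOC byte). */
static int toc_samples_per_frame_48k(uint8_t toc)
{
    static const int silk[4]   = { 480, 960, 1920, 2880 }; /* 10, 20, 40, 60 ms */
    static const int hybrid[2] = { 480, 960 };             /* 10, 20 ms */
    static const int celt[4]   = { 120, 240, 480, 960 };   /* 2.5, 5, 10, 20 ms */
    int config = toc >> 3;

    if (config < 12)
        return silk[config & 3];
    if (config < 16)
        return hybrid[config & 1];
    return celt[config & 3];
}

/* Number of frames in the packet, from the frame count code (the low two
   bits of the TOC byte); -1 if the packet is too short. */
static int toc_frame_count(const uint8_t *packet, size_t len)
{
    if (len < 1)
        return -1;
    switch (packet[0] & 3) {
    case 0:
        return 1;
    case 1:
    case 2:
        return 2;
    default:
        return len < 2 ? -1 : (packet[1] & 0x3f);
    }
}

/* Total samples in the packet at 48 kHz, or -1 if the packet is too short
   or would last longer than 120 ms. */
static int toc_packet_samples_48k(const uint8_t *packet, size_t len)
{
    int frames = toc_frame_count(packet, len);
    int samples;

    if (frames < 0)
        return -1;
    samples = frames * toc_samples_per_frame_48k(packet[0]);
    return samples > 120 * 48 ? -1 : samples;
}
```

A TOC byte of 0x78 (config 15, Hybrid FB 20 ms, one frame) gives 960 samples, which is the per-page granule position increment used throughout this article.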

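Working with Ogg files in practice:

I have not found a ready-made parser online so far. The file can be opened with Audacity, or you can write a parser yourself.

Ogg libraries, official site and references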

https://www.xiph.org/ogg/doc/rfc3533.txt
https://www.xiph.org/ogg/doc/libogg/datastructures.html
https://rfc2cn.com/rfc3533.html