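Ogg Overview
RFC: https://datatracker.ietf.org/doc/html/rfc3533
Ogg is a multimedia container format that is most often seen carrying audio. Other names commonly seen on audio files include MP3, AAC and WAV (strictly speaking, MP3 and AAC are codecs rather than containers).
"Ogg" refers to the file format, which can carry a variety of free and open-source codecs for audio, video, text (such as subtitles) and metadata.
Ogg Vorbis is an audio compression format comparable to music formats such as MP3: Vorbis-coded audio carried in an Ogg container.
Ogg Vorbis is completely free, open and unencumbered by patents. Ogg Vorbis files use the extension .ogg.
The format is designed so that file size and audio quality can keep improving without breaking existing encoders or players. Ogg Vorbis also supports multi-channel audio.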
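Features of Ogg
- Ogg encapsulates binary data of many kinds. It can carry any type of video, audio, image, text or, generally speaking, any time-continuously sampled data.
- Raw codec packets can be used directly by transports that provide their own framing and packet separation (such as UDP or RTP). Ogg is the scheme for stream-oriented storage (such as files) and stream-oriented transport (such as TCP or pipes), where no packet boundaries exist. Either way, the stream has to indicate which codec it carries.
- Ogg can encapsulate several codec formats and presents them as separate logical streams. The Ogg transport bitstream is designed to provide framing, error protection and seeking structure for higher-level codec streams that consist of raw, unencapsulated data packets, such as the Vorbis audio codec or the Tarkin and Theora video codecs (both still upcoming when RFC 3533 was written).
- It can interleave different binary media and other time-continuous data streams that an encoder has prepared as a sequence of packets. Ogg provides enough information to split the data back into the original packets, at the boundaries the encoder created, without relying on decoding to find packet boundaries.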
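Physical and logical streams in Ogg
- The actual Ogg file is called the physical stream, and the one or more encoded streams encapsulated in it are called logical streams. A logical stream handed to the Ogg encapsulation process has a structure: it is split into a sequence of so-called packets. Packets are created by the encoder of that logical stream and are meaningful entities only to that encoder (for example, an uncompressed stream may use video frames as packets). They carry no boundary information; strung together they look like a stream of random bytes with no landmarks. (Note that these packets are not network packets.)
- The design idea behind Ogg is a generic, linear media transport format that supports both file-based storage and stream-based transmission of one or several interleaved media streams, independent of the encoding format of the media data. Such an encapsulation format needs to provide the following, which are in effect the features Ogg itself supports:
- framing for logical bitstreams.
- interleaving of different logical bitstreams.
- detection of corruption.
- recapture after a parsing error.
- position landmarks for direct random access of arbitrary positions in the bitstream.
- streaming capability (i.e., no seeking is needed to build a 100% complete bitstream).
- small overhead (i.e., use no more than approximately 1-2% of bitstream bandwidth for packet boundary marking, high-level framing, sync and seeking).
- simplicity to enable fast parsing.
- simple concatenation mechanism of several physical bitstreams.
Ogg supports logical streams and can encapsulate several of them; each logical stream has its own header pages and data pages.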
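Encapsulating logical streams in an Ogg physical stream: bos, eos and so on
A physical stream contains one or more logical streams and is built by interleaving their pages; ordering is maintained at page level.
Each logical stream is identified by a unique serial number stored in the header of every one of its pages. The number is arbitrary (usually random) and has nothing to do with the content or the encoder.
Several logical streams coexist. Pages of different streams need no particular order relative to each other; each stream only has to be in order within itself. On demultiplexing, the header fields are used to reassemble each logical stream in order.
Each logical stream carries only one type of data. Pages are of variable length, and every page has a header with encapsulation and error-recovery information.
Each logical stream starts with a bos page (beginning of stream) and ends with an eos page (end of stream). The bos page has to carry what is needed to identify the codec and set up decoding; for audio these are fields such as the sample rate and the number of channels. Its first bytes are usually a magic number identifying the codec.
Secondary (auxiliary) headers may follow the bos page. Ogg itself does not know where these headers end or how large they are, and they carry no payload data. A physical stream therefore starts with the bos pages, followed by the secondary headers, followed by the actual data.
The specification of how the logical streams of a given codec are encapsulated in Ogg is called a media mapping. One example is Ogg Vorbis: a Vorbis-coded audio stream encapsulated in Ogg, which can then be sent over TCP.
Ogg provides two ways of multiplexing, grouping and chaining. Grouping interleaves the pages of different logical streams and is used when several codecs must stay in sync, for example audio with video. Chaining is plain sequencing: one logical stream ends before the next one starts.
The basic rules of grouping: all bos pages come first, back to back; then the secondary header pages; then the data pages; finally the eos pages. The eos pages do not have to be adjacent (see the example below). Within the physical stream, every logical stream has a unique ID.
The two methods can be combined as long as each keeps its own rules; for example, a chained stream can follow directly after a group of streams has ended.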
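The grouping and chaining rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(serial_number, header_type)` tuples and the function name `split_chain_links` are made up here, and a real demuxer would read both values from each page header. The sketch also assumes that a single-page stream (bos and eos on the same page) is not part of a larger group.

```python
BOS, EOS = 0x02, 0x04  # header_type flag bits defined in RFC 3533

def split_chain_links(pages):
    """Split a page sequence into chain links (groups of concurrent streams).

    `pages` is a list of (serial_number, header_type) tuples in physical
    order (a hypothetical representation used only for this sketch).
    Returns one dict per link, mapping serial -> number of pages.
    Raises ValueError when the bos/eos ordering rules are violated.
    """
    links = []
    open_streams = set()   # serials that have seen bos but not yet eos
    current = None         # page counts of the link being read
    data_seen = False      # True once a non-bos page appeared in this link
    for serial, flags in pages:
        if flags & BOS:
            if current is None:
                current, data_seen = {}, False
            if data_seen or serial in current:
                raise ValueError("bos page after data in the same link")
            current[serial] = 0
            open_streams.add(serial)
        elif current is None or serial not in open_streams:
            raise ValueError("page for a stream that is not open")
        else:
            data_seen = True
        current[serial] += 1
        if flags & EOS:
            open_streams.discard(serial)
            if not open_streams:       # every grouped stream ended: link done
                links.append(current)
                current = None
    if current is not None:
        links.append(current)          # truncated stream: keep what was read
    return links
```

Feeding it three grouped streams followed by a fourth, chained one yields two links: the first with three serial numbers, the second with one.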
physical bitstream with pages of
different logical bitstreams grouped and chained
-------------------------------------------------------------
|*A*|*B*|*C*|A|A|C|B|A|B|#A#|C|...|B|C|#B#|#C#|*D*|D|...|#D#|
-------------------------------------------------------------
 bos bos bos             eos           eos eos bos       eos
Explanation:
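A, B and C are three different logical streams encapsulated in one physical stream. *A* is the bos page of stream A (and likewise for the others), and #A# is the eos page of stream A.
Streams A, B and C are grouped, while D is chained, because it comes after A, B and C have ended. (Secondary headers are not necessarily present.)
Ogg knows nothing about time, only order; timing and position come from the layer above, and Ogg does no audio/video synchronisation by itself
Ogg does not know any specifics about the codec data except that each logical bitstream belongs to a different codec, that the encoded data is written in order, and that it carries position markers (the granule position).
Ogg has no notion of time. It knows only about sequentially increasing, unitless position markers. An application can get timing information only from a higher layer that has access to the codec APIs and can assign and convert granule positions or time.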
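How Ogg encapsulates an encoded logical stream: a packet may span pages, and a page may hold several packets
- A packet is the data that comes out of the encoder; its content depends on the codec.
- From Ogg's perspective a packet can be of any size, and a specific media mapping defines how packets from a given encoder are grouped or broken up. Ogg pages have a maximum size of about 64 KB, so a packet sometimes has to be spread over several pages. To simplify this, Ogg divides each packet into 255-byte chunks plus one final, shorter chunk of packet_size % 255 bytes (which is 0 bytes long when the packet size is an exact multiple of 255). These chunks are called Ogg segments; they are only a logical construct and have no header of their own.
- A run of consecutive segments is packed into a variable-length page, and a header is placed in front of the page. The segment table in the page header gives the length of each segment. The header_type field in the page header tells whether the page continues a packet begun on the previous page. A segment length below 255 marks the last segment of a packet; if the last length on a page is 255, the packet continues on the next page.
- The encoding is optimised for speed and for the expected case that most packets are between 50 and 200 bytes; RFC 3533 calls this a design justification rather than a recommendation. A small packet costs only one byte of segmentation overhead, whereas a fixed two-byte length in front of every packet would double that cost for the typical small packet; large packets are simply spread over more segments.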
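The segmentation rule above is small enough to write down directly. The following is a minimal Python sketch; the function names `lacing_values` and `packet_sizes` are chosen for this note and are not part of any Ogg library.

```python
def lacing_values(packet_size):
    """Segment lengths for one packet: 255 for each full segment, then the
    remainder (0..254), which terminates the packet. The remainder is
    written even when it is 0, so the end of the packet stays visible."""
    return [255] * (packet_size // 255) + [packet_size % 255]

def packet_sizes(segment_table):
    """Recover packet sizes from the segment table of one page.

    Returns (sizes, pending), where `pending` is the byte count of a packet
    that is still unfinished at the end of the page, or None if the last
    packet ended on this page.
    """
    sizes, current, open_packet = [], 0, False
    for value in segment_table:
        current += value
        open_packet = True
        if value < 255:            # a value below 255 ends the packet
            sizes.append(current)
            current, open_packet = 0, False
    return sizes, (current if open_packet else None)
```

For example, a 753-byte packet is described by the three values 255, 255, 243. When a page continues a packet from the previous page, the first size returned by `packet_sizes` covers only the part of that packet stored on the current page.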
Here is an actual example:
          logical bitstream with packet boundaries
 -----------------------------------------------------------------
 > |       packet_1             | packet_2         | packet_3 |  <
 -----------------------------------------------------------------

                     |segmentation (logically only)
                     v

      packet_1 (5 segments)          packet_2 (4 segs)    p_3 (2 segs)
     ------------------------------ -------------------- ------------
 ..  |seg_1|seg_2|seg_3|seg_4|s_5 | |seg_1|seg_2|seg_3|s_4| |seg_1|s_2 | ..
     ------------------------------ -------------------- ------------

                     | page encapsulation
                     v

 page_1 (packet_1 data)   page_2 (pket_1 data)   page_3 (packet_2 data)
------------------------  ----------------  ------------------------
|H|------------------- |  |H|----------- |  |H|------------------- |
|D||seg_1|seg_2|seg_3| |  |D|seg_4|s_5 | |  |D||seg_1|seg_2|seg_3| | ...
|R|------------------- |  |R|----------- |  |R|------------------- |
------------------------  ----------------  ------------------------

                    |
pages of            |
other    --------|  |
logical         -------
bitstreams      | MUX |
                -------
                   |
                   v

              page_1  page_2          page_3
      ------  ------  -------  -----  -------
 ...  ||   |  ||   |  ||    |  ||  |  ||    |  ...
      ------  ------  -------  -----  -------
              physical Ogg bitstream
Ogg page header format:
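// Ogg page header
// ---------------------------------------------------------------------------
// Field                 Bytes   Description
// ---------------------------------------------------------------------------
// capture_pattern       4       Page marker, the ASCII characters "OggS": 4F 67 67 53
// structure_version     1       Version ID, currently always 0
// header_type_flag      1       Page header type flags
// granule_position      8       Granule position
// serial_number         4       Serial number of the logical stream
// page_sequence_number  4       Sequence number of the page within its logical stream; lets the decoder detect page loss
// CRC_checksum          4       Cyclic redundancy check (CRC) checksum
// number_page_segments  1       Number of segments in this page, i.e. the number of entries in the segment table (0 to 255)
// segment_table         ≤255    Segment length table, one byte per segment length
// ---------------------------------------------------------------------------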
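As an illustration of the field table, the fixed 27-byte part of the header maps directly onto a `struct` format string. This is a minimal sketch, not a replacement for libogg: the function name `parse_page_header` and the dictionary keys are chosen for this note, and the checksum is read but not verified. All multi-byte fields are little-endian.

```python
import struct

def parse_page_header(buf, offset=0):
    """Decode one Ogg page header that starts at buf[offset]."""
    # 4s capture, B version, B header_type, q granule (signed 64-bit),
    # I serial, I sequence, I crc, B number_page_segments: 27 bytes in total
    fixed = struct.unpack_from("<4sBBqIIIB", buf, offset)
    capture, version, header_type, granule, serial, seq, crc, nsegs = fixed
    if capture != b"OggS":
        raise ValueError("no capture pattern at this offset")
    table = buf[offset + 27 : offset + 27 + nsegs]
    if len(table) != nsegs:
        raise ValueError("truncated segment table")
    return {
        "version": version,
        "continued": bool(header_type & 0x01),
        "bos": bool(header_type & 0x02),
        "eos": bool(header_type & 0x04),
        "granule_position": granule,    # -1 means no packet ends on this page
        "serial_number": serial,
        "page_sequence_number": seq,
        "crc_checksum": crc,
        "segment_table": list(table),
        "header_size": 27 + nsegs,
        "page_size": 27 + nsegs + sum(table),
    }
```

Reading the granule position as a signed integer makes the special value -1 come out as -1 instead of 2^64 - 1.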
Details:
The Ogg page header has the following format:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| capture_pattern: Magic number for page start "OggS"           | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| version       | header_type   | granule_position (8 bytes)    | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               | bitstream_serial_number       | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               | page_sequence_number          | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               | CRC_checksum                  | 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |page_segments  | segment_table | 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ...                                                           | 28-
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The fields in the page header have the following meaning:
1. capture_pattern: a 4 Byte field that signifies the beginning of a
   page. It contains the magic numbers:
      0x4f 'O'
      0x67 'g'
      0x67 'g'
      0x53 'S'
   It helps a decoder to find the page boundaries and regain
   synchronisation after parsing a corrupted stream. Once the
   capture pattern is found, the decoder verifies page sync and
   integrity by computing and comparing the checksum.
2. stream_structure_version: 1 Byte signifying the version number of
   the Ogg file format used in this stream (this document specifies
   version 0).
3. header_type_flag: the bits in this 1 Byte field identify the
   specific type of this page.
   * bit 0x01
     set: page contains data of a packet continued from the previous
        page
     unset: page contains a fresh packet
   * bit 0x02
     set: this is the first page of a logical bitstream (bos)
     unset: this page is not a first page
   * bit 0x04
     set: this is the last page of a logical bitstream (eos)
     unset: this page is not a last page
4. granule_position: an 8 Byte field containing position information.
   For example, for an audio stream, it MAY contain the total number
   of PCM samples encoded after including all frames finished on this
   page. For a video stream it MAY contain the total number of video
   frames encoded after this page. This is a hint for the decoder
   and gives it some timing and position information. Its meaning is
   dependent on the codec for that logical bitstream and specified in
   a specific media mapping. A special value of -1 (in two's
   complement) indicates that no packets finish on this page.
   (Note: this field is the most involved one; see the Granule Position notes further down.)
5. bitstream_serial_number: a 4 Byte field containing the unique
   serial number by which the logical bitstream is identified.
   (Note: in other words, the ID of the logical stream.)
6. page_sequence_number: a 4 Byte field containing the sequence
   number of the page so the decoder can identify page loss. This
   sequence number is increasing on each logical bitstream
   separately.
7. CRC_checksum: a 4 Byte field containing a 32 bit CRC checksum of
   the page (including header with zero CRC field and page content).
   The generator polynomial is 0x04c11db7.
8. number_page_segments: 1 Byte giving the number of segment entries
   encoded in the segment table.
   (Note: a page is made up of several segments, and a packet may span several pages. In effect a packet is split into segments, which are then packed into pages, as shown above.)
9. segment_table: number_page_segments Bytes containing the lacing
   values of all segments in this page. Each Byte contains one
   lacing value.
The total header size in bytes is given by:
header_size = number_page_segments + 27 [Byte]
The total page size in Bytes is given by:
page_size = header_size + sum(lacing_values: 1..number_page_segments)
[Byte]
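The checksum in field 7 is not the zlib/PNG flavour of CRC-32. As far as I understand libogg's implementation, it uses the polynomial 0x04c11db7 with an initial value of 0, no bit reflection and no final XOR, computed over the whole page with the four CRC bytes (offsets 22 to 25) set to zero. A minimal sketch under those assumptions, with function names chosen for this note:

```python
def _make_table():
    """Build the 256-entry lookup table for polynomial 0x04c11db7."""
    table = []
    for i in range(256):
        r = i << 24
        for _ in range(8):
            r = ((r << 1) ^ 0x04C11DB7 if r & 0x80000000 else r << 1) & 0xFFFFFFFF
        table.append(r)
    return table

_CRC_TABLE = _make_table()

def ogg_crc(data):
    """CRC of `data`: initial value 0, MSB first, no final XOR."""
    crc = 0
    for byte in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ _CRC_TABLE[(crc >> 24) ^ byte]
    return crc

def page_crc_ok(page):
    """True if the checksum stored at bytes 22..25 matches the page."""
    stored = int.from_bytes(page[22:26], "little")
    zeroed = bytes(page[:22]) + b"\x00\x00\x00\x00" + bytes(page[26:])
    return ogg_crc(zeroed) == stored
```

`page` must be exactly one complete page (header, segment table and payload); its length is given by the page_size formula above.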
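An Ogg file and an example:
See the attached audio file in Ogg format; it starts with:
4f 67 67 53 00 02 00 00 00 00 00 00 00 00 5c 59
These 16 bytes are the capture pattern "OggS", version 0, header_type 0x02 (a bos page), an all-zero 8-byte granule position and the first two bytes of the serial number.
Ogg and its media mappings:
The first few bytes of the data in the first page are the ASCII code that identifies the codec;
for example:
| Codec Identifier | Codecs Parameter |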
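Identifying the codec from the first packet of a logical stream can be sketched as a simple prefix lookup. The table below lists only a few magic prefixes that I am confident about from the respective mapping specifications; it is not a complete registry, and the names `CODEC_MAGIC` and `identify_codec` are chosen for this note.

```python
# Magic prefixes at the start of the first packet (the bos page payload).
CODEC_MAGIC = {
    b"OpusHead": "Opus",        # Ogg Opus, RFC 7845
    b"\x01vorbis": "Vorbis",    # packet type 1 followed by "vorbis"
    b"\x80theora": "Theora",
    b"\x7fFLAC": "FLAC",
    b"Speex   ": "Speex",       # "Speex" padded with three spaces
}

def identify_codec(first_packet):
    """Return a codec name for the first packet of a stream, or None."""
    for magic, name in CODEC_MAGIC.items():
        if first_packet.startswith(magic):
            return name
    return None
```

A demuxer would call this once per serial number, on the payload of each bos page.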
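Important Ogg concepts explained:
- Granule Position
It is similar to a DTS (decoding timestamp), and some implementations write the DTS into this field.
"Granule" literally means a grain; here it is best read as a running count of samples rather than a time. The quotations that follow are from the Ogg Opus mapping (RFC 7845). On the first page and on the page where the comment header completes it must be 0, i.e. every header page before the audio data of the logical stream has granule position 0: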
The granule position MUST be zero for the ID header page and the page
where the comment header completes. That is, the first page in the
logical stream and the last header page before the first audio data
page both have a granule position of zero.
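In an audio data page, the granule position encodes the total number of PCM samples in the stream up to and including the last fully decodable sample of the last packet completed on that page, so it is normally greater than 0. (This raises the question of whether a DTS can be written here directly; presumably only when it is expressed in the same unit, a running sample count.)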
The granule position of an audio data page encodes the total number
of PCM samples in the stream up to and including the last fully
decodable sample from the last packet completed on that page. The
granule position of the first audio data page will usually be larger
than zero, as described in Section 4.5.
When a packet spans several pages, the pages in the middle carry -1:
A page that is entirely spanned by a single packet (that completes on
a subsequent page) has no granule position, and the granule position
field is set to the special value ‘-1’ in two’s complement.
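The granule position of an audio data page is counted in PCM samples at a fixed rate of 48 kHz. An Opus decoder may run at a different sampling rate, but every Opus packet encodes samples at a rate that evenly divides 48 kHz, so the granule position always counts samples as if decoding at 48 kHz: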
The granule position of an audio data page is in units of PCM audio
samples at a fixed rate of 48 kHz (per channel; a stereo stream’s
granule position does not increment at twice the speed of a mono
stream). It is possible to run an Opus decoder at other sampling
rates, but all Opus packets encode samples at a sampling rate that
evenly divides 48 kHz. Therefore, the value in the granule position
field always counts samples assuming a 48 kHz decoding rate, and the
rest of this specification makes the same assumption.
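The duration of an Opus packet can be any multiple of 2.5 ms up to a maximum of 120 ms, and it is encoded in the TOC sequence at the start of each packet. The decoder returns exactly the number of samples this duration implies, even for the first few packets.
For example, a 20 ms packet fed to a decoder running at 48 kHz yields 960 samples. A demuxer can parse the TOC sequence at the start of each Ogg packet and work backwards or forwards from a packet whose granule position is known,
in order to assign a granule position to every packet, or even to every individual sample: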
The duration of an Opus packet as defined in [RFC6716] can be any
multiple of 2.5 ms, up to a maximum of 120 ms. This duration is
encoded in the TOC sequence at the beginning of each packet. The
number of samples returned by a decoder corresponds to this duration
exactly, even for the first few packets. For example, a 20 ms packet
fed to a decoder running at 48 kHz will always return 960 samples. A
demuxer can parse the TOC sequence at the beginning of each Ogg
packet to work backwards or forwards from a packet with a known
granule position (i.e., the last packet completed on some page) in
order to assign granule positions to every packet, or even every
individual sample. The one exception is the last page in the stream,
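as described below.
Every other page with completed packets, after the first one, must have a granule position equal to the number of samples in the packets that complete on that page plus the granule position of the most recent page with completed packets: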
All other pages with completed packets after the first MUST have a
granule position equal to the number of samples contained in packets
that complete on that page plus the granule position of the most
recent page with completed packets.
This guarantees that a demuxer assigns each packet the same granule position whether it works forwards or backwards. For that to hold, there must be no gaps:
This guarantees that a demuxer
can assign individual packets the same granule position when working
forwards as when working backwards. For this to work, there cannot
be any gaps.
Further explanation: https://wiki.xiph.org/OggOpus
(This is packed in the same way the rest of Ogg data is packed; LSb of LSB first. Note that the 'position' data specifies a 'sample' number (eg, in a CD quality sample is four octets, 16 bits for left and 16 bits for right; in video it would likely be the frame number. It is up to the specific codec in use to define the semantic meaning of the granule position value). The position specified is the total samples encoded after including all packets finished on this page (packets begun on this page but continuing on to the next page do not count). The rationale here is that the position specified in the frame header of the last page tells how long the data coded by the bitstream is. A truncated stream will still return the proper number of samples that can be decoded fully.
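The accumulation rule for Ogg Opus granule positions can be sketched as follows. This is a simplified model, not the full RFC 7845 procedure: it ignores the special handling of the first and last audio pages, and the function names are chosen for this note. The `pre_skip` parameter comes from the Opus ID header described in RFC 7845 and is not discussed elsewhere in these notes; it defaults to 0 here.

```python
def page_granules(pages, start=0):
    """Granule position for each page of an Ogg Opus stream.

    `pages` lists, per page, the durations (in 48 kHz samples) of the
    packets that complete on that page. A page on which no packet
    completes gets -1; every other page gets the running sample total.
    """
    total, out = start, []
    for completed in pages:
        if completed:
            total += sum(completed)
            out.append(total)
        else:
            out.append(-1)
    return out

def granule_to_seconds(granule, pre_skip=0):
    """Playback time of a granule position; Opus granules tick at 48 kHz."""
    return (granule - pre_skip) / 48000
```

With one 20 ms packet per page the granule positions run 960, 1920, 2880 and so on, which is the pattern a writer has to reproduce when it fills the field itself.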
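From raw audio to Ogg pages, step by step:
1 Capture the raw audio as PCM:
sample: the unit of sampling. 48 kHz means 48000 samples per second,
so at 48 kHz one second of PCM holds 48000 samples.
audio frame: at 50 frames per second one frame lasts 20 ms, so at 48 kHz one frame holds 20 ms / 1000 ms * 48000 = 960 samples.
2 Encode the frames into Opus packets:
The duration of an Opus packet can be any multiple of 2.5 ms up to a maximum of 120 ms, and it is encoded in the TOC sequence at the start of each packet. The decoder returns exactly the number of samples this duration implies, even for the first few packets.
For example, a 20 ms packet fed to a decoder running at 48 kHz yields 960 samples.
The Opus packet format is described in: https://datatracker.ietf.org/doc/html/rfc6716#section-3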
Every Opus packet starts with a TOC byte, and there are several layouts: one frame per packet, two frames per packet, or an arbitrary number of frames per packet. Note that the duration selected by the config field of the TOC applies to each compressed frame in the packet. If a packet holds several frames, the total sample count of the packet is that duration multiplied by the number of frames. The total must not exceed 120 ms, otherwise the packet is invalid:
+-----------------------+-----------+-----------+-------------------+
| Configuration | Mode | Bandwidth | Frame Sizes |
| Number(s) | | | |
+-----------------------+-----------+-----------+-------------------+
| 0...3 | SILK-only | NB | 10, 20, 40, 60 ms |
| | | | |
| 4...7 | SILK-only | MB | 10, 20, 40, 60 ms |
| | | | |
| 8...11 | SILK-only | WB | 10, 20, 40, 60 ms |
| | | | |
| 12...13 | Hybrid | SWB | 10, 20 ms |
| | | | |
| 14...15 | Hybrid | FB | 10, 20 ms |
| | | | |
| 16...19 | CELT-only | NB | 2.5, 5, 10, 20 ms |
| | | | |
| 20...23 | CELT-only | WB | 2.5, 5, 10, 20 ms |
| | | | |
| 24...27 | CELT-only | SWB | 2.5, 5, 10, 20 ms |
| | | | |
| 28...31 | CELT-only | FB | 2.5, 5, 10, 20 ms |
+-----------------------+-----------+-----------+-------------------+
e.g. (a code 2 packet: two frames with different compressed sizes):
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| config  |s|1|0| N1 (1-2 bytes):                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               :
| Compressed frame 1 (N1 bytes)...                              |
:                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
| Compressed frame 2...                                         :
:                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
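/*Excerpt from opusfile: OP_UNLIKELY (branch hint) and OP_EBADPACKET (error
   code) are opusfile's own names; the opus_packet_* calls come from libopus.*/
/*Returns the duration of the packet (in samples at 48 kHz), or a negative
   value on error.*/
static int op_get_packet_duration(const unsigned char *_data,int _len){
  int nframes;
  int frame_size;
  int nsamples;
  /*Number of frames, taken from the frame count code in the TOC byte.*/
  nframes=opus_packet_get_nb_frames(_data,_len);
  if(OP_UNLIKELY(nframes<0))return OP_EBADPACKET;
  /*Samples per frame at 48 kHz, taken from the config field of the TOC.*/
  frame_size=opus_packet_get_samples_per_frame(_data,48000);
  nsamples=nframes*frame_size;
  /*A packet may not be longer than 120 ms (120*48 samples at 48 kHz).*/
  if(OP_UNLIKELY(nsamples>120*48))return OP_EBADPACKET;
  return nsamples;
}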
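To make the TOC arithmetic concrete without linking against libopus, the same duration check can be sketched in Python from the configuration table above. This is an approximation of what the C function computes, with function names chosen for this note; it looks only at the TOC byte and, for code 3 packets, the frame count byte, and does not validate the rest of the packet.

```python
def frame_samples(config):
    """Samples per frame at 48 kHz for a TOC config number (0..31)."""
    if config < 12:                            # SILK-only: 10, 20, 40, 60 ms
        return (480, 960, 1920, 2880)[config & 3]
    if config < 16:                            # Hybrid: 10, 20 ms
        return (480, 960)[config & 1]
    return (120, 240, 480, 960)[config & 3]    # CELT-only: 2.5, 5, 10, 20 ms

def packet_samples(packet):
    """Duration of an Opus packet in 48 kHz samples, or None if invalid."""
    if not packet:
        return None
    toc = packet[0]
    config, code = toc >> 3, toc & 0x03        # top 5 bits, bottom 2 bits
    if code == 0:
        frames = 1
    elif code in (1, 2):
        frames = 2
    else:                                      # code 3: count in the next byte
        if len(packet) < 2:
            return None
        frames = packet[1] & 0x3F
        if frames == 0:
            return None
    total = frames * frame_samples(config)
    return total if total <= 120 * 48 else None   # at most 120 ms
```

A packet whose TOC byte is 0x78 (config 15, one frame) lasts 960 samples, i.e. 20 ms; six 20 ms frames in a code 3 packet give 5760 samples, exactly the 120 ms limit.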
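For the other packing layouts see the Opus specification.
3 Pack the Opus packets into Ogg:
Once Ogg has received a packet from the Opus encoder, the packet has to be segmented. The segment table of a page holds at most 255 entries and each segment is at most 255 bytes long, so an Ogg page is limited to roughly 64 KB. The packet handed over by Opus is therefore divided into 255-byte chunks plus
one final, shorter chunk for whatever remains. For example:
raw packet: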
From Ogg’s perspective, packets can be of any arbitrary size. A
specific media mapping will define how to group or break up packets
from a specific media encoder. As Ogg pages have a maximum size of
about 64 kBytes, sometimes a packet has to be distributed over
several pages. To simplify that process, Ogg divides each packet
into 255 byte long chunks plus a final shorter chunk. These chunks
are called “Ogg Segments”. They are only a logical construct and do
not have a header for themselves.
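So this can be read as follows: in the figure below, each packet is one Opus packet with one TOC header, possibly containing several compressed frames. A packet is split into segments, which are packed into a page and sent out together with the page header.
logical bitstream with packet boundaries
+ An Ogg page holds at most 255 segments. When every packet fits in a single segment (under 255 bytes), a page therefore holds at most 255 packets. According to the Ogg Opus RFC an Opus packet lasts a multiple of 2.5 ms, up to 120 ms, which bounds the audio duration of one page:
This is allowed to be up to 32 bits to support the maximum duration of a single Ogg page (255 packets * 120 ms per packet == 1,468,800 samples at 48 kHz).
That is, 255 * 120 / 1000 * 48000 = 1468800. The packet here is the Opus packet (the audio frame in the sense used above).
+ One segment per page is a common layout.
+ The lacing value of the specification is the size of a segment; the segment table holds several lacing values, one for each segment.
+ A lacing value is a single byte, so a segment is at most 255 bytes long: https://www.xiph.org/ogg/doc/framing.html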
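The limits in this list follow from the one-byte fields of the page header, and the arithmetic can be checked with a few lines of Python. The constant and function names are chosen for this note.

```python
MAX_SEGMENTS = 255    # number_page_segments is one byte
MAX_LACING = 255      # each lacing value is one byte
FIXED_HEADER = 27     # header bytes before the segment table

max_payload = MAX_SEGMENTS * MAX_LACING                 # 65025 bytes
max_page = FIXED_HEADER + MAX_SEGMENTS + max_payload    # 65307 bytes

# Longest audio span of one page: 255 packets of 120 ms each at 48 kHz.
max_samples = 255 * 120 * 48000 // 1000                 # 1468800 samples

def pages_needed(packet_size, segments_per_page=MAX_SEGMENTS):
    """Pages touched by one packet that starts on a fresh page."""
    segments = packet_size // 255 + 1          # full segments plus the terminator
    return -(-segments // segments_per_page)   # ceiling division
```

Note the edge case: a packet of exactly 65025 bytes fills all 255 segments with the value 255, so its terminating zero-length segment lands on a second page.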
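- 4 Encapsulate the audio for transmission:
1) On a network, data travels in packets. The packet size depends on the network protocol; for UDP it is bounded by the UDP datagram size, which in turn is constrained by the MTU.
2) Depending on how frames relate to packets, frames are packed as one frame per packet, one frame spread over several packets, or several frames per packet.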
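Other: how the Opus libraries check the granule position
I ran into a pitfall while using the library: filling this field incorrectly made decoding fail.
Let us look at how the source code checks this field.
Background:
What the granule position means: the number of samples in all packets completed up to and including this page.
With 20 ms frames, the granule position once the first frame completes is 20 / 1000 * 48000 = 960. So with one frame per segment and one segment per page, the first audio data page has granule position 960, the second 1920, and so on.
Libraries involved: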
opusfile-0.11
opus-1.3
libopusenc-0.2.1
…
/*Starting from current cursor position, get the initial PCM offset of the next
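Working with Ogg files in practice:
I have not yet found a ready-made parser online. The file can be opened with Audacity, or you can write a parser yourself.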
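For the write-it-yourself route, walking the pages of a file takes little code. The following is a minimal Python sketch under stated assumptions: the function name `iter_pages` is chosen for this note, the whole file is assumed to fit in memory, and the checksum is not verified, so a stray "OggS" inside payload data could in principle be mistaken for a page.

```python
import struct

def iter_pages(data):
    """Yield (offset, header_type, granule, serial, sequence, payload)
    for every page found in `data` (the bytes of an Ogg file)."""
    pos = 0
    while True:
        pos = data.find(b"OggS", pos)
        if pos < 0 or pos + 27 > len(data):
            return
        # The 23 header bytes after the capture pattern, little-endian.
        version, header_type, granule, serial, seq, _crc, nsegs = \
            struct.unpack_from("<BBqIIIB", data, pos + 4)
        table = data[pos + 27 : pos + 27 + nsegs]
        end = pos + 27 + nsegs + sum(table)
        if version != 0 or len(table) < nsegs or end > len(data):
            pos += 1               # false sync or truncated page: keep scanning
            continue
        yield pos, header_type, granule, serial, seq, data[pos + 27 + nsegs : end]
        pos = end
```

To inspect a file, read it in binary mode, pass the bytes to `iter_pages` and print each tuple; the header_type and granule columns make it easy to spot a wrongly filled granule position.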
Ogg libraries, official site and references:
https://www.xiph.org/ogg/doc/rfc3533.txt
https://www.xiph.org/ogg/doc/libogg/datastructures.html
https://rfc2cn.com/rfc3533.html