
source: downloads/libogg-1.1.3/doc/framing.html @ 52

Last change on this file since 52 was 15, checked in by landauf, 17 years ago

added libogg

File size: 14.4 KB
Line 
1<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
2<html>
3<head>
4
5<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
6<title>Ogg Documentation</title>
7
8<style type="text/css">
9body {
10  margin: 0 18px 0 18px;
11  padding-bottom: 30px;
12  font-family: Verdana, Arial, Helvetica, sans-serif;
13  color: #333333;
14  font-size: .8em;
15}
16
17a {
18  color: #3366cc;
19}
20
21img {
22  border: 0;
23}
24
25#xiphlogo {
26  margin: 30px 0 16px 0;
27}
28
29#content p {
30  line-height: 1.4;
31}
32
33h1, h1 a, h2, h2 a, h3, h3 a, h4, h4 a {
34  font-weight: bold;
35  color: #ff9900;
36  margin: 1.3em 0 8px 0;
37}
38
39h1 {
40  font-size: 1.3em;
41}
42
43h2 {
44  font-size: 1.2em;
45}
46
47h3 {
48  font-size: 1.1em;
49}
50
51li {
52  line-height: 1.4;
53}
54
55#copyright {
56  margin-top: 30px;
57  line-height: 1.5em;
58  text-align: center;
59  font-size: .8em;
60  color: #888888;
61  clear: both;
62}
63</style>
64
65</head>
66
67<body>
68
69<div id="xiphlogo">
70  <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
71</div>
72
73<h1>Ogg logical bitstream framing</h1>
74
75<h2>Ogg bitstreams</h2>
76
77<p>The Ogg transport bitstream is designed to provide framing, error
78protection and seeking structure for higher-level codec streams that
79consist of raw, unencapsulated data packets, such as the Vorbis audio
80codec or Tarkin video codec.</p>
81
82<h2>Application example: Vorbis</h2>
83
84<p>Vorbis encodes short-time blocks of PCM data into raw packets of
85bit-packed data.  These raw packets may be used directly by transport
86mechanisms that provide their own framing and packet-separation
87mechanisms (such as UDP datagrams).  For stream based storage (such as
88files) and transport (such as TCP streams or pipes), Vorbis uses the
89Ogg bitstream format to provide framing/sync, sync recapture
90after error, landmarks during seeking, and enough information to
91properly separate data back into packets at the original packet
92boundaries without relying on decoding to find packet boundaries.</p>
93
94<h2>Design constraints for Ogg bitstreams</h2>
95
96<ol>
97<li>True streaming; we must not need to seek to build a 100% complete bitstream.</li>
98<li>Use no more than approximately 1-2% of bitstream bandwidth for packet
99  boundary marking, high-level framing, sync and seeking.</li>
100<li>Specification of absolute position within the original sample stream.</li>
101<li>Simple mechanism to ease limited editing, such as a simplified concatenation
102  mechanism.</li>
103<li>Detection of corruption, recapture after error and direct, random
104  access to data at arbitrary positions in the bitstream.</li>
105</ol>
106
107<h2>Logical and Physical Bitstreams</h2>
108
109<p>A <em>logical</em> Ogg bitstream is a contiguous stream of
110sequential pages belonging only to the logical bitstream.  A
111<em>physical</em> Ogg bitstream is constructed from one or more
112logical Ogg bitstreams (the simplest physical bitstream
113is simply a single logical bitstream).  We describe below the exact
114formatting of an Ogg logical bitstream.  Combining logical
115bitstreams into more complex physical bitstreams is described in the
116<a href="oggstream.html">Ogg bitstream overview</a>.  The exact
117mapping of raw Vorbis packets into a valid Ogg Vorbis physical
118bitstream is described in <a href="vorbis-stream.html">Vorbis
119bitstream mapping</a>.</p>
120
121<h2>Bitstream structure</h2>
122
123<p>An Ogg stream is structured by dividing incoming packets into
124segments of up to 255 bytes and then wrapping a group of contiguous
125packet segments into a variable length page preceded by a page
126header.  Both the header size and page size are variable; the page
127header contains sizing information and checksum data to determine
128header/page size and data integrity.</p>
129
130<p>The bitstream is captured (or recaptured) by looking for the beginning
131of a page, specifically the capture pattern.  Once the capture pattern
132is found, the decoder verifies page sync and integrity by computing
133and comparing the checksum. At that point, the decoder can extract the
134packets themselves.</p>
135
136<h3>Packet segmentation</h3>
137
138<p>Packets are logically divided into multiple segments before encoding
139into a page. Note that the segmentation and fragmentation process is a
140logical one; it's used to compute page header values and the original
141page data need not be disturbed, even when a packet spans page
142boundaries.</p>
143
144<p>The raw packet is logically divided into [n] 255-byte segments and a
145last fractional segment of &lt; 255 bytes.  A packet may well
146consist only of the trailing fractional segment, and a fractional
147segment may be zero length.  The segment sizes, called "lacing values", are
148then saved and placed into the header segment table.</p>
149
150<p>An example should make the basic concept clear:</p>
151
152<pre>
153<tt>
154raw packet:
155  ___________________________________________
156 |______________packet data__________________| 753 bytes
157
158lacing values for page header segment table: 255,255,243
159</tt>
160</pre>
161
162<p>We simply add the lacing values for the total size; the last lacing
163value for a packet is always the value that is less than 255. Note
164that this encoding both avoids imposing a maximum packet size and
165imposes minimum overhead on small packets (as opposed to, e.g.,
166simply using two bytes at the head of every packet and having a max
167packet size of 32k, where small packets (&lt;255 bytes, the typical case) are
168penalized with twice the segmentation overhead). Using the lacing
169values as suggested, small packets see the minimum possible
170byte-aligned overhead (1 byte) and large packets, over 512 bytes or
171so, see a fairly constant ~.5% overhead on encoding space.</p>
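<p>The overhead figures above can be checked with a little arithmetic. The following is a minimal sketch, not part of libogg; the function name <tt>lacing_overhead</tt> is hypothetical. It assumes one lacing byte per full 255-byte segment plus one terminating byte, as described above.</p>

```python
def lacing_overhead(packet_size):
    """Fraction of extra bytes spent on lacing values for one packet.

    Hypothetical helper for illustration; packet_size must be positive.
    """
    if packet_size <= 0:
        raise ValueError("packet size must be positive")
    # One lacing byte per full 255-byte segment, plus the terminating value.
    lacing_bytes = packet_size // 255 + 1
    return lacing_bytes / packet_size
```

<p>Under these assumptions a 100-byte packet costs one extra byte (1%), while a 10000-byte packet costs 40 bytes (0.4%), in line with the rough ~.5% figure quoted for large packets.</p>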
172
173<p>Note that a lacing value of 255 implies that a second lacing value
174follows in the packet, and a value of &lt; 255 marks the end of the
175packet after that many additional bytes.  A packet of 255 bytes (or a
176multiple of 255 bytes) is terminated by a lacing value of 0:</p>
177
178<pre><tt>
179raw packet:
180  _______________________________
181 |________packet data____________|          255 bytes
182
183lacing values: 255, 0
184</tt></pre>
185
186<p>Note also that a 'nil' (zero length) packet is not an error; it
187consists of nothing more than a lacing value of zero in the header.</p>
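<p>The segmentation rule can be sketched in a few lines of Python. This is an illustrative sketch only; the function names <tt>lacing_values</tt> and <tt>packet_size_from_lacing</tt> are assumptions and do not appear in libogg.</p>

```python
def lacing_values(packet_size):
    """Return the lacing values that describe a packet of packet_size bytes."""
    if packet_size < 0:
        raise ValueError("packet size must be non-negative")
    full_segments, remainder = divmod(packet_size, 255)
    # The final value is always < 255 (possibly 0) and terminates the packet.
    return [255] * full_segments + [remainder]


def packet_size_from_lacing(values):
    """Recover the packet size by summing its lacing values."""
    return sum(values)
```

<p>With this sketch, the 753-byte packet above yields 255, 255, 243; a 255-byte packet yields 255, 0; and a nil packet yields a single 0.</p>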
188
189<h3>Packets spanning pages</h3>
190
191<p>Packets are not restricted to beginning and ending within a page,
192although individual segments are, by definition, required to do so.
193Packets are not restricted to a maximum size, although excessively
194large packets in the data stream are discouraged; the Ogg
195bitstream specification strongly recommends nominal page size of
196approximately 4-8kB (large packets are foreseen as being useful for
197initialization data at the beginning of a logical bitstream).</p>
198
199<p>After segmenting a packet, the encoder may decide not to place all the
200resulting segments into the current page; to do so, the encoder places
201the lacing values of the segments it wishes to belong to the current
202page into the current segment table, then finishes the page.  The next
203page is begun with the first value in the segment table belonging to
204the next packet segment, thus continuing the packet (data in the
205packet body must also correspond properly to the lacing values in the
206spanned pages. The segment data in the first packet corresponding to
207the lacing values of the first page belong in that page; packet
208segments listed in the segment table of the following page must begin
209the page body of the subsequent page).</p>
210
211<p>The last step in spanning a page boundary is to set the header
212flag in the new page to indicate that the first lacing value in the
213segment table continues rather than begins a packet; a header flag of
2140x01 is set to indicate a continued packet.  Although mandatory, the flag
215is not actually algorithmically necessary; one could inspect the
216preceding segment table to determine if the packet is new or
217continued.  Adding the information to the header_type_flag allows a
218simpler design (with no overhead) that need only inspect the current
219page header after frame capture.  This also allows faster error
220recovery in the event that the packet originates in a corrupt
221preceding page, implying that the previous page's segment table
222cannot be trusted.</p>
223
224<p>Note that a packet can span an arbitrary number of pages; the above
225spanning process is repeated for each spanned page boundary.  Also a
226'zero termination' on a packet size that is an exact multiple of 255
227must appear even if the lacing value appears in the next page as a
228zero-length continuation of the current packet.  The header flag
229should be set to 0x01 to indicate that the packet spanned, even though
230the span is a nil case as far as data is concerned.</p>
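<p>As a rough illustration of the spanning rules, the sketch below walks the segment tables of consecutive pages and recovers packet sizes, carrying an unterminated packet across page boundaries. It is not the libogg implementation: the function name <tt>packet_sizes</tt> and the (flag, segment table) input shape are assumptions, and raising an error on a flag mismatch is a simplification (a real decoder would more likely discard the damaged packet and resynchronize).</p>

```python
def packet_sizes(pages):
    """Return sizes of completed packets.

    pages is an iterable of (header_type_flag, segment_table) pairs,
    a hypothetical input shape chosen for this sketch.
    """
    sizes = []
    pending = 0        # bytes accumulated for a packet not yet terminated
    in_packet = False  # True while the last lacing value seen was 255
    for flags, segment_table in pages:
        continued = bool(flags & 0x01)
        if continued != in_packet:
            raise ValueError("continuation flag does not match previous page")
        for lacing in segment_table:
            pending += lacing
            in_packet = True
            if lacing < 255:  # a value below 255 terminates the packet
                sizes.append(pending)
                pending = 0
                in_packet = False
    return sizes
```

<p>For example, a 510-byte packet whose terminating zero lands on the next page is described by the pages (0x00, [255, 255]) and (0x01, [0]), and the sketch reports a single packet of 510 bytes.</p>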
231
232<p>The encoding looks odd, but is properly optimized for speed and the
233expected case of the majority of packets being between 50 and 200
234bytes (note that it is designed such that packets of wildly different
235sizes can be handled within the model; placing packet size
236restrictions on the encoder would have only slightly simplified design
237in page generation and increased overall encoder complexity).</p>
238
239<p>The main point behind tracking individual packets (and packet
240segments) is to allow more flexible encoding tricks that require
241explicit knowledge of packet size. An example is simple bandwidth
242limiting, implemented by simply truncating packets in the nominal case
243if the packet is arranged so that the least sensitive portion of the
244data comes last.</p>
245
246<h3>Page header</h3>
247
248<p>The headering mechanism is designed to avoid copying and re-assembly
249of the packet data (i.e., making the packet segmentation process a
250logical one); the header can be generated directly from incoming
251packet data.  The encoder buffers packet data until it finishes a
252complete page at which point it writes the header followed by the
253buffered packet segments.</p>
254
255<h4>capture_pattern</h4>
256
257<p>A header begins with a capture pattern that simplifies identifying
258pages; once the decoder has found the capture pattern it can do a more
259intensive job of verifying that it has in fact found a page boundary
260(as opposed to an inadvertent coincidence in the byte stream).</p>
261
262<pre><tt>
263 byte value
264
265  0  0x4f 'O'
266  1  0x67 'g'
267  2  0x67 'g'
268  3  0x53 'S' 
269</tt></pre>
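<p>Locating a candidate page boundary amounts to a byte search for the four-byte pattern. The sketch below is illustrative only; <tt>find_capture_pattern</tt> is a hypothetical name. A match is only a candidate, since the same bytes can occur by coincidence inside packet data, so the checksum must still be verified.</p>

```python
CAPTURE_PATTERN = b"OggS"  # bytes 0x4f 0x67 0x67 0x53


def find_capture_pattern(data, start=0):
    """Return the offset of the next candidate page start, or -1 if none."""
    return data.find(CAPTURE_PATTERN, start)
```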
270
271<h4>stream_structure_version</h4>
272
273<p>The capture pattern is followed by the stream structure revision:</p>
274
275<pre><tt>
276 byte value
277
278  4  0x00
279</tt></pre>
280 
281<h4>header_type_flag</h4>
282 
283<p>The header type flag identifies this page's context in the bitstream:</p>
284
285<pre><tt>
286 byte value
287
288  5  bitflags: 0x01: unset = fresh packet
289                       set = continued packet
290               0x02: unset = not first page of logical bitstream
291                       set = first page of logical bitstream (bos)
292               0x04: unset = not last page of logical bitstream
293                       set = last page of logical bitstream (eos)
294</tt></pre>
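<p>The three bit flags can be tested with simple masks. The constant and function names below are assumptions made for this sketch, not identifiers from libogg.</p>

```python
CONTINUED_PACKET = 0x01  # first lacing value continues a packet from the previous page
FIRST_PAGE = 0x02        # first page of the logical bitstream (bos)
LAST_PAGE = 0x04         # last page of the logical bitstream (eos)


def decode_header_type(flag):
    """Split the header_type_flag byte into its three documented bits."""
    return {
        "continued": bool(flag & CONTINUED_PACKET),
        "bos": bool(flag & FIRST_PAGE),
        "eos": bool(flag & LAST_PAGE),
    }
```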
295
296<h4>absolute granule position</h4>
297
298<p>(This is packed in the same way the rest of Ogg data is packed; LSb
299of LSB first.)  Note that the 'position' data specifies a 'sample'
300number (e.g., in CD-quality audio a sample is four octets, 16 bits for left
301and 16 bits for right; in video it would likely be the frame number.
302It is up to the specific codec in use to define the semantic meaning
303of the granule position value).  The position specified is the total
304samples encoded after including all packets finished on this page
305(packets begun on this page but continuing on to the next page do not
306count).  The rationale here is that the position specified in the
307frame header of the last page tells how long the data coded by the
308bitstream is.  A truncated stream will still return the proper number
309of samples that can be decoded fully.</p>
310
311<p>A special value of '-1' (in two's complement) indicates that no packets
312finish on this page.</p>
313
314<pre><tt>
315 byte value
316
317  6  0xXX LSB
318  7  0xXX
319  8  0xXX
320  9  0xXX
321 10  0xXX
322 11  0xXX
323 12  0xXX
324 13  0xXX MSB
325</tt></pre>
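<p>Reading the field is a matter of unpacking eight little-endian bytes at offset 6 as a signed integer. The sketch below uses Python's standard <tt>struct</tt> module; the function name <tt>read_granule_position</tt> and the choice to return <tt>None</tt> for the special value -1 are assumptions made for illustration.</p>

```python
import struct


def read_granule_position(header):
    """Return the granule position from a page header, or None if no
    packet finishes on the page (stored value -1)."""
    # "<q" = little-endian signed 64-bit integer, bytes 6..13 of the header.
    (granule,) = struct.unpack_from("<q", header, 6)
    return None if granule == -1 else granule
```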
326
327<h4>stream serial number</h4>
328 
329<p>Ogg allows for separate logical bitstreams to be mixed at page
330granularity in a physical bitstream.  The most common case would be
331sequential arrangement, but it is possible to interleave pages for
332two separate bitstreams to be decoded concurrently.  The serial
333number is the means by which physical pages are associated with
334a particular logical stream.  Each logical stream must have a unique
335serial number within a physical stream:</p>
336
337<pre><tt>
338 byte value
339
340 14  0xXX LSB
341 15  0xXX
342 16  0xXX
343 17  0xXX MSB
344</tt></pre>
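<p>Demultiplexing interleaved pages therefore reduces to grouping them by the four bytes at offset 14. The following is a minimal sketch under the assumption that each page is already available as a bytes object starting at its capture pattern; both function names are hypothetical.</p>

```python
import struct
from collections import defaultdict


def stream_serial_number(page):
    """Read the little-endian 32-bit serial number at bytes 14..17."""
    (serial,) = struct.unpack_from("<I", page, 14)
    return serial


def group_pages_by_serial(pages):
    """Map each serial number to its pages, preserving page order."""
    streams = defaultdict(list)
    for page in pages:
        streams[stream_serial_number(page)].append(page)
    return dict(streams)
```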
345
346<h4>page sequence no</h4>
347
348<p>Page counter; lets us know if a page is lost (useful where packets
349span page boundaries).</p>
350
351<pre><tt>
352 byte value
353
354 18  0xXX LSB
355 19  0xXX
356 20  0xXX
357 21  0xXX MSB
358</tt></pre>
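<p>A decoder can detect lost pages by checking that the counter increases by one from page to page within a logical stream. The sketch below is illustrative; <tt>find_sequence_gaps</tt> is a hypothetical name, and treating the counter as wrapping modulo 2^32 is an assumption based only on the field being four bytes wide.</p>

```python
def find_sequence_gaps(sequence_numbers):
    """Return (expected, received) pairs wherever the page counter jumps."""
    gaps = []
    expected = None
    for number in sequence_numbers:
        if expected is not None and number != expected:
            gaps.append((expected, number))
        # Assumption: the 32-bit counter wraps around to zero.
        expected = (number + 1) % 2**32
    return gaps
```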
359
360<h4>page checksum</h4>
361     
362<p>32 bit CRC value (direct algorithm, initial val and final XOR = 0,
363generator polynomial=0x04c11db7).  The value is computed over the
364entire header (with the CRC field in the header set to zero) and then
365continued over the page.  The CRC field is then filled with the
366computed value.</p>
367
368<p>(A thorough discussion of CRC algorithms can be found in <a
369href="ftp://ftp.rocksoft.com/papers/crc_v3.txt">"A
370Painless Guide to CRC Error Detection Algorithms"</a> by Ross
371Williams <a
372href="mailto:ross@guest.adelaide.edu.au">ross@guest.adelaide.edu.au</a>.)</p>
373
374<pre><tt>
375 byte value
376
377 22  0xXX LSB
378 23  0xXX
379 24  0xXX
380 25  0xXX MSB
381</tt></pre>
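<p>The checksum described above can be sketched with a plain bitwise loop: most significant bit first, initial value 0, no final XOR, polynomial 0x04c11db7. This is an unoptimized illustration, not the table-driven code a real implementation would likely use; the names <tt>ogg_crc</tt> and <tt>page_checksum</tt> are assumptions.</p>

```python
POLYNOMIAL = 0x04C11DB7


def ogg_crc(data, crc=0):
    """Bitwise CRC-32, MSB first, initial value 0, no final XOR."""
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLYNOMIAL) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def page_checksum(page):
    """CRC over a whole page with the CRC field (bytes 22..25) zeroed."""
    zeroed = page[:22] + b"\x00\x00\x00\x00" + page[26:]
    return ogg_crc(zeroed)
```

<p>To verify a page under these assumptions, compare <tt>page_checksum(page)</tt> with the little-endian value stored in bytes 22 through 25.</p>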
382
383<h4>page_segments</h4>
384
385<p>The number of segment entries to appear in the segment table. The
386maximum number of 255 segments (255 bytes each) sets the maximum
387possible physical page size at 65307 bytes or just under 64kB (thus
388we know that a header corrupted so as to destroy sizing/alignment
389information will not cause a runaway bitstream.  We'll read in the
390page according to the corrupted size information that's guaranteed to
391be a reasonable size regardless, notice the checksum mismatch, drop
392sync and then look for recapture).</p>
393
394<pre><tt>
395 byte value
396
397 26 0x00-0xff (0-255)
398</tt></pre>
399
400<h4>segment_table (containing packet lacing values)</h4>
401
402<p>The lacing values for each packet segment physically appearing in
403this page are listed in contiguous order.</p>
404
405<pre><tt>
406 byte value
407
408 27 0x00-0xff (0-255)
409 [...]
410 n  0x00-0xff (0-255, n=page_segments+26)
411</tt></pre>
412
413<p>Total page size is calculated directly from the known header size and
414lacing values in the segment table. Packet data segments follow
415immediately after the header.</p>
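<p>Putting the field layout together, the sketch below unpacks the fixed 27-byte header, reads the segment table, and derives the header, body and total page sizes. It is an illustration built on Python's standard <tt>struct</tt> module; the function name <tt>parse_page_header</tt> and the dictionary keys are assumptions, and no checksum verification is attempted here.</p>

```python
import struct

# capture, version, flags, granule, serial, sequence, checksum, page_segments
FIXED_HEADER = struct.Struct("<4sBBqIIIB")  # 27 bytes, little-endian


def parse_page_header(data):
    """Parse a page header that starts at offset 0 of data."""
    (capture, version, flags, granule, serial,
     sequence, checksum, page_segments) = FIXED_HEADER.unpack_from(data, 0)
    if capture != b"OggS":
        raise ValueError("capture pattern not found")
    segment_table = data[27:27 + page_segments]
    if len(segment_table) < page_segments:
        raise ValueError("truncated segment table")
    header_size = 27 + page_segments
    body_size = sum(segment_table)  # iterating bytes yields the lacing values
    return {
        "version": version,
        "flags": flags,
        "granule_position": granule,
        "serial": serial,
        "sequence": sequence,
        "checksum": checksum,
        "segment_table": list(segment_table),
        "header_size": header_size,
        "body_size": body_size,
        "page_size": header_size + body_size,
    }
```

<p>With 255 lacing values of 255 each, the computed page size is 27 + 255 + 255 * 255 = 65307 bytes, matching the maximum stated for <tt>page_segments</tt>.</p>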
416
417<p>Page headers typically impose a flat .25-.5% space overhead assuming
418nominal ~8k page sizes.  The segmentation table needed for exact
419packet recovery in the streaming layer adds approximately .5-1%
420nominal assuming expected encoder behavior in the 44.1kHz, 128kbps
421stereo encodings.</p>
422
423<div id="copyright">
424  The Xiph Fish Logo is a
425  trademark (&trade;) of Xiph.Org.<br/>
426
427  These pages &copy; 1994 - 2005 Xiph.Org. All rights reserved.
428</div>
429
430</body>
431</html>