| 1 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> |
---|
| 2 | <html> |
---|
| 3 | <head> |
---|
| 4 | |
---|
| 5 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/> |
---|
| 6 | <title>Ogg Vorbis Documentation</title> |
---|
| 7 | |
---|
| 8 | <style type="text/css"> |
---|
| 9 | body { |
---|
| 10 | margin: 0 18px 0 18px; |
---|
| 11 | padding-bottom: 30px; |
---|
| 12 | font-family: Verdana, Arial, Helvetica, sans-serif; |
---|
| 13 | color: #333333; |
---|
| 14 | font-size: .8em; |
---|
| 15 | } |
---|
| 16 | |
---|
| 17 | a { |
---|
| 18 | color: #3366cc; |
---|
| 19 | } |
---|
| 20 | |
---|
| 21 | img { |
---|
| 22 | border: 0; |
---|
| 23 | } |
---|
| 24 | |
---|
| 25 | #xiphlogo { |
---|
| 26 | margin: 30px 0 16px 0; |
---|
| 27 | } |
---|
| 28 | |
---|
| 29 | #content p { |
---|
| 30 | line-height: 1.4; |
---|
| 31 | } |
---|
| 32 | |
---|
| 33 | h1, h1 a, h2, h2 a, h3, h3 a { |
---|
| 34 | font-weight: bold; |
---|
| 35 | color: #ff9900; |
---|
| 36 | margin: 1.3em 0 8px 0; |
---|
| 37 | } |
---|
| 38 | |
---|
| 39 | h1 { |
---|
| 40 | font-size: 1.3em; |
---|
| 41 | } |
---|
| 42 | |
---|
| 43 | h2 { |
---|
| 44 | font-size: 1.2em; |
---|
| 45 | } |
---|
| 46 | |
---|
| 47 | h3 { |
---|
| 48 | font-size: 1.1em; |
---|
| 49 | } |
---|
| 50 | |
---|
| 51 | li { |
---|
| 52 | line-height: 1.4; |
---|
| 53 | } |
---|
| 54 | |
---|
| 55 | #copyright { |
---|
| 56 | margin-top: 30px; |
---|
| 57 | line-height: 1.5em; |
---|
| 58 | text-align: center; |
---|
| 59 | font-size: .8em; |
---|
| 60 | color: #888888; |
---|
| 61 | clear: both; |
---|
| 62 | } |
---|
| 63 | </style> |
---|
| 64 | |
---|
| 65 | </head> |
---|
| 66 | |
---|
| 67 | <body> |
---|
| 68 | |
---|
| 69 | <div id="xiphlogo"> |
---|
| 70 | <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a> |
---|
| 71 | </div> |
---|
| 72 | |
---|
| 73 | <h1>Ogg logical bitstream framing</h1> |
---|
| 74 | |
---|
| 75 | <h2>Ogg bitstreams</h2> |
---|
| 76 | |
---|
| 77 | <p>The Ogg transport bitstream is designed to provide framing, error |
---|
| 78 | protection and seeking structure for higher-level codec streams that |
---|
| 79 | consist of raw, unencapsulated data packets, such as the Vorbis audio |
---|
| 80 | codec or Theora video codec.</p> |
---|
| 81 | |
---|
| 82 | <h2>Application example: Vorbis</h2> |
---|
| 83 | |
---|
| 84 | <p>Vorbis encodes short-time blocks of PCM data into raw packets of |
---|
| 85 | bit-packed data. These raw packets may be used directly by transport |
---|
| 86 | mechanisms that provide their own framing and packet-separation |
---|
| 87 | mechanisms (such as UDP datagrams). For stream based storage (such as |
---|
| 88 | files) and transport (such as TCP streams or pipes), Vorbis uses the |
---|
| 89 | Ogg bitstream format to provide framing/sync, sync recapture |
---|
| 90 | after error, landmarks during seeking, and enough information to |
---|
| 91 | properly separate data back into packets at the original packet |
---|
| 92 | boundaries without relying on decoding to find packet boundaries.</p> |
---|
| 93 | |
---|
| 94 | <h2>Design constraints for Ogg bitstreams</h2> |
---|
| 95 | |
---|
| 96 | <ol> |
---|
| 97 | <li>True streaming; we must not need to seek to build a 100% |
---|
| 98 | complete bitstream.</li> |
---|
| 99 | <li>Use no more than approximately 1-2% of bitstream bandwidth for |
---|
| 100 | packet boundary marking, high-level framing, sync and seeking.</li> |
---|
| 101 | <li>Specification of absolute position within the original sample |
---|
| 102 | stream.</li> |
---|
| 103 | <li>Simple mechanism to ease limited editing, such as a simplified |
---|
| 104 | concatenation mechanism.</li> |
---|
| 105 | <li>Detection of corruption, recapture after error and direct, random |
---|
| 106 | access to data at arbitrary positions in the bitstream.</li> |
---|
| 107 | </ol> |
---|
| 108 | |
---|
| 109 | <h2>Logical and Physical Bitstreams</h2> |
---|
| 110 | |
---|
| 111 | <p>A <em>logical</em> Ogg bitstream is a contiguous stream of |
---|
| 112 | sequential pages belonging only to the logical bitstream. A |
---|
| 113 | <em>physical</em> Ogg bitstream is constructed from one or more |
---|
| 114 | logical Ogg bitstreams (the simplest physical bitstream |
---|
| 115 | is simply a single logical bitstream). We describe below the exact |
---|
| 116 | formatting of an Ogg logical bitstream. Combining logical |
---|
| 117 | bitstreams into more complex physical bitstreams is described in the |
---|
| 118 | <a href="oggstream.html">Ogg bitstream overview</a>. The exact |
---|
| 119 | mapping of raw Vorbis packets into a valid Ogg Vorbis physical |
---|
| 120 | bitstream is described in the Vorbis I Specification.</p> |
---|
| 121 | |
---|
| 122 | <h2>Bitstream structure</h2> |
---|
| 123 | |
---|
| 124 | <p>An Ogg stream is structured by dividing incoming packets into |
---|
| 125 | segments of up to 255 bytes and then wrapping a group of contiguous |
---|
| 126 | packet segments into a variable length page preceded by a page |
---|
| 127 | header. Both the header size and page size are variable; the page |
---|
| 128 | header contains sizing information and checksum data to determine |
---|
| 129 | header/page size and data integrity.</p> |
---|
| 130 | |
---|
| 131 | <p>The bitstream is captured (or recaptured) by looking for the beginning |
---|
| 132 | of a page, specifically the capture pattern. Once the capture pattern |
---|
| 133 | is found, the decoder verifies page sync and integrity by computing |
---|
| 134 | and comparing the checksum. At that point, the decoder can extract the |
---|
| 135 | packets themselves.</p> |
---|
| 136 | |
---|
| 137 | <h3>Packet segmentation</h3> |
---|
| 138 | |
---|
| 139 | <p>Packets are logically divided into multiple segments before encoding |
---|
| 140 | into a page. Note that the segmentation and fragmentation process is a |
---|
| 141 | logical one; it's used to compute page header values and the original |
---|
| 142 | page data need not be disturbed, even when a packet spans page |
---|
| 143 | boundaries.</p> |
---|
| 144 | |
---|
| 145 | <p>The raw packet is logically divided into [n] 255 byte segments and a |
---|
| 146 | last fractional segment of < 255 bytes. A packet may well |
---|
| 147 | consist only of the trailing fractional segment, and a fractional |
---|
| 148 | segment may be zero length. The segment sizes, called "lacing values", are |
---|
| 149 | then saved and placed into the header segment table.</p> |
---|
| 150 | |
---|
| 151 | <p>An example should make the basic concept clear:</p> |
---|
| 152 | |
---|
| 153 | <pre> |
---|
| 154 | <tt> |
---|
| 155 | raw packet: |
---|
| 156 | ___________________________________________ |
---|
| 157 | |______________packet data__________________| 753 bytes |
---|
| 158 | |
---|
| 159 | lacing values for page header segment table: 255,255,243 |
---|
| 160 | </tt> |
---|
| 161 | </pre> |
---|
| 162 | |
---|
| 163 | <p>We simply add the lacing values for the total size; the last lacing |
---|
| 164 | value for a packet is always the value that is less than 255. Note |
---|
| 165 | that this encoding both avoids imposing a maximum packet size and |
---|
| 166 | imposes minimum overhead on small packets (as opposed to, e.g., |
---|
| 167 | simply using two bytes at the head of every packet and having a max |
---|
| 168 | packet size of 32k. Small packets (<255, the typical case) are |
---|
| 169 | penalized with twice the segmentation overhead). Using the lacing |
---|
| 170 | values as suggested, small packets see the minimum possible |
---|
| 171 | byte-aligned overhead (1 byte) and large packets, over 512 bytes or |
---|
| 172 | so, see a fairly constant ~.5% overhead on encoding space.</p> |
---|
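<p>The overhead figures above can be checked with a little arithmetic. The
following is a minimal sketch, not part of the specification; the function name
<tt>lacing_overhead</tt> is hypothetical and it assumes one lacing byte per
255 byte segment plus one terminating value, as described in this section.</p>

```python
def lacing_overhead(packet_size):
    """Fraction of extra bytes spent on lacing values for one packet."""
    if packet_size <= 0:
        raise ValueError("overhead is undefined for an empty packet")
    # One lacing value per full 255-byte segment, plus the terminating
    # value that is less than 255 (and may be zero).
    lacing_bytes = packet_size // 255 + 1
    return lacing_bytes / packet_size


if __name__ == "__main__":
    for size in (50, 200, 753, 4096):
        print(size, "bytes:", "{:.2%}".format(lacing_overhead(size)))
```

<p>For large packets the ratio approaches 1/255, which is consistent with the
roughly constant overhead quoted above.</p>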
| 173 | |
---|
| 174 | <p>Note that a lacing value of 255 implies that a second lacing value |
---|
| 175 | follows in the packet, and a value of < 255 marks the end of the |
---|
| 176 | packet after that many additional bytes. A packet of 255 bytes (or a |
---|
| 177 | multiple of 255 bytes) is terminated by a lacing value of 0:</p> |
---|
| 178 | |
---|
| 179 | <pre><tt> |
---|
| 180 | raw packet: |
---|
| 181 | _______________________________ |
---|
| 182 | |________packet data____________| 255 bytes |
---|
| 183 | |
---|
| 184 | lacing values: 255, 0 |
---|
| 185 | </tt></pre> |
---|
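<p>The segmentation rule can be sketched in a few lines. This is an
illustrative helper only (the name <tt>lacing_values</tt> is an assumption, not
a libogg function); it reproduces the two examples above and the zero length
case.</p>

```python
def lacing_values(packet_size):
    """Return the lacing values for a packet of the given size in bytes."""
    if packet_size < 0:
        raise ValueError("packet size cannot be negative")
    # n full segments of 255 bytes, then one final value below 255.
    # The final value is 0 when the size is an exact multiple of 255.
    full_segments, last = divmod(packet_size, 255)
    return [255] * full_segments + [last]


if __name__ == "__main__":
    for size in (753, 255, 0):
        print(size, lacing_values(size))
```

<p>Summing the returned values always gives back the packet size, which is how
a decoder recovers the packet boundary.</p>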
| 186 | |
---|
| 187 | <p>Note also that a 'nil' (zero length) packet is not an error; it |
---|
| 188 | consists of nothing more than a lacing value of zero in the header.</p> |
---|
| 189 | |
---|
| 190 | <h3>Packets spanning pages</h3> |
---|
| 191 | |
---|
| 192 | <p>Packets are not restricted to beginning and ending within a page, |
---|
| 193 | although individual segments are, by definition, required to do so. |
---|
| 194 | Packets are not restricted to a maximum size, although excessively |
---|
| 195 | large packets in the data stream are discouraged; the Ogg |
---|
| 196 | bitstream specification strongly recommends nominal page size of |
---|
| 197 | approximately 4-8kB (large packets are foreseen as being useful for |
---|
| 198 | initialization data at the beginning of a logical bitstream).</p> |
---|
| 199 | |
---|
| 200 | <p>After segmenting a packet, the encoder may decide not to place all the |
---|
| 201 | resulting segments into the current page; to do so, the encoder places |
---|
| 202 | the lacing values of the segments it wishes to belong to the current |
---|
| 203 | page into the current segment table, then finishes the page. The next |
---|
| 204 | page is begun with the first value in the segment table belonging to |
---|
| 205 | the next packet segment, thus continuing the packet (data in the |
---|
| 206 | packet body must also correspond properly to the lacing values in the |
---|
| 207 | spanned pages. The segment data in the first packet corresponding to |
---|
| 208 | the lacing values of the first page belong in that page; packet |
---|
| 209 | segments listed in the segment table of the following page must begin |
---|
| 210 | the page body of the subsequent page).</p> |
---|
| 211 | |
---|
| 212 | <p>The last step in spanning a page boundary is to set the header |
---|
| 213 | flag in the new page to indicate that the first lacing value in the |
---|
| 214 | segment table continues rather than begins a packet; a header flag of |
---|
| 215 | 0x01 is set to indicate a continued packet. Although mandatory, it |
---|
| 216 | is not actually algorithmically necessary; one could inspect the |
---|
| 217 | preceding segment table to determine if the packet is new or |
---|
| 218 | continued. Adding the information to the page header flag allows a |
---|
| 219 | simpler design (with no overhead) that need only inspect the current |
---|
| 220 | page header after frame capture. This also allows faster error |
---|
| 221 | recovery in the event that the packet originates in a corrupt |
---|
| 222 | preceding page, implying that the previous page's segment table |
---|
| 223 | cannot be trusted.</p> |
---|
| 224 | |
---|
| 225 | <p>Note that a packet can span an arbitrary number of pages; the above |
---|
| 226 | spanning process is repeated for each spanned page boundary. Also a |
---|
| 227 | 'zero termination' on a packet size that is an exact multiple of 255 |
---|
| 228 | must appear even if the lacing value appears in the next page as a |
---|
| 229 | zero-length continuation of the current packet. The header flag |
---|
| 230 | should be set to 0x01 to indicate that the packet spanned, even though |
---|
| 231 | the span is a nil case as far as data is concerned.</p> |
---|
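<p>The spanning rules can be illustrated from the decoder's side. The sketch
below is an assumption-laden example, not the reference implementation: it takes
each page as a hypothetical <tt>(header_type_flag, segment_table, body)</tt>
tuple, assumes an undamaged stream, and rebuilds packets by closing one whenever
a lacing value below 255 is seen.</p>

```python
def packets_from_pages(pages):
    """Rebuild packets from (header_type_flag, segment_table, body) tuples."""
    packets = []
    pending = None  # bytes of a packet left open by the previous page
    for flags, segment_table, body in pages:
        continued = bool(flags & 0x01)
        # The flag must agree with whether a packet is still open.
        if continued != (pending is not None):
            raise ValueError("continued-packet flag does not match stream state")
        offset = 0
        for lacing in segment_table:
            if pending is None:
                pending = bytearray()
            pending += body[offset:offset + lacing]
            offset += lacing
            if lacing < 255:
                # A value below 255 (including 0) terminates the packet.
                packets.append(bytes(pending))
                pending = None
    return packets


if __name__ == "__main__":
    first = (0x00, [255, 255], b"a" * 510)   # 510-byte packet left open
    second = (0x01, [0, 3], b"xyz")          # nil continuation, then a new packet
    print([len(p) for p in packets_from_pages([first, second])])
```

<p>The second page in the usage example shows the nil span described above: its
first lacing value is zero, yet the continued flag is still set.</p>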
| 232 | |
---|
| 233 | <p>The encoding looks odd, but is properly optimized for speed and the |
---|
| 234 | expected case of the majority of packets being between 50 and 200 |
---|
| 235 | bytes (note that it is designed such that packets of wildly different |
---|
| 236 | sizes can be handled within the model; placing packet size |
---|
| 237 | restrictions on the encoder would have only slightly simplified design |
---|
| 238 | in page generation and increased overall encoder complexity).</p> |
---|
| 239 | |
---|
| 240 | <p>The main point behind tracking individual packets (and packet |
---|
| 241 | segments) is to allow more flexible encoding tricks that require |
---|
| 242 | explicit knowledge of packet size. An example is simple bandwidth |
---|
| 243 | limiting, implemented by simply truncating packets in the nominal case |
---|
| 244 | if the packet is arranged so that the least sensitive portion of the |
---|
| 245 | data comes last.</p> |
---|
| 246 | |
---|
| 247 | <h3>Page header</h3> |
---|
| 248 | |
---|
| 249 | <p>The headering mechanism is designed to avoid copying and re-assembly |
---|
| 250 | of the packet data (i.e., making the packet segmentation process a |
---|
| 251 | logical one); the header can be generated directly from incoming |
---|
| 252 | packet data. The encoder buffers packet data until it finishes a |
---|
| 253 | complete page, at which point it writes the header followed by the |
---|
| 254 | buffered packet segments.</p> |
---|
| 255 | |
---|
| 256 | <h4>capture_pattern</h4> |
---|
| 257 | |
---|
| 258 | <p>A header begins with a capture pattern that simplifies identifying |
---|
| 259 | pages; once the decoder has found the capture pattern it can do a more |
---|
| 260 | intensive job of verifying that it has in fact found a page boundary |
---|
| 261 | (as opposed to an inadvertent coincidence in the byte stream).</p> |
---|
| 262 | |
---|
| 263 | <pre><tt> |
---|
| 264 | byte value |
---|
| 265 | |
---|
| 266 | 0 0x4f 'O' |
---|
| 267 | 1 0x67 'g' |
---|
| 268 | 2 0x67 'g' |
---|
| 269 | 3 0x53 'S' |
---|
| 270 | </tt></pre> |
---|
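<p>Finding the capture pattern is only the first step of capture. The sketch
below lists candidate page offsets in a buffer; the function name
<tt>candidate_page_offsets</tt> is hypothetical, and each candidate still has to
be confirmed by checking the page checksum.</p>

```python
CAPTURE_PATTERN = bytes([0x4F, 0x67, 0x67, 0x53])  # 'O' 'g' 'g' 'S'


def candidate_page_offsets(buffer):
    """Return every offset in buffer where the capture pattern occurs."""
    offsets = []
    start = buffer.find(CAPTURE_PATTERN)
    while start != -1:
        offsets.append(start)
        # Resume one byte later so no occurrence is skipped.
        start = buffer.find(CAPTURE_PATTERN, start + 1)
    return offsets


if __name__ == "__main__":
    print(candidate_page_offsets(b"xxOggS\x00junkOggS"))
```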
| 271 | |
---|
| 272 | <h4>stream_structure_version</h4> |
---|
| 273 | |
---|
| 274 | <p>The capture pattern is followed by the stream structure revision:</p> |
---|
| 275 | |
---|
| 276 | <pre><tt> |
---|
| 277 | byte value |
---|
| 278 | |
---|
| 279 | 4 0x00 |
---|
| 280 | </tt></pre> |
---|
| 281 | |
---|
| 282 | <h4>header_type_flag</h4> |
---|
| 283 | |
---|
| 284 | <p>The header type flag identifies this page's context in the bitstream:</p> |
---|
| 285 | |
---|
| 286 | <pre><tt> |
---|
| 287 | byte value |
---|
| 288 | |
---|
| 289 | 5 bitflags: 0x01: unset = fresh packet |
---|
| 290 | set = continued packet |
---|
| 291 | 0x02: unset = not first page of logical bitstream |
---|
| 292 | set = first page of logical bitstream (bos) |
---|
| 293 | 0x04: unset = not last page of logical bitstream |
---|
| 294 | set = last page of logical bitstream (eos) |
---|
| 295 | </tt></pre> |
---|
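<p>As an illustration, the three bit flags can be decoded with simple masks.
The constant and function names here are assumptions chosen for readability.</p>

```python
CONTINUED_PACKET = 0x01  # first lacing value continues a packet
FIRST_PAGE = 0x02        # beginning of logical bitstream (bos)
LAST_PAGE = 0x04         # end of logical bitstream (eos)


def describe_header_type(flag):
    """Decode the header_type_flag byte into named booleans."""
    return {
        "continued": bool(flag & CONTINUED_PACKET),
        "bos": bool(flag & FIRST_PAGE),
        "eos": bool(flag & LAST_PAGE),
    }


if __name__ == "__main__":
    print(describe_header_type(0x05))
```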
| 296 | |
---|
| 297 | <h4>absolute granule position</h4> |
---|
| 298 | |
---|
| 299 | <p>(This is packed in the same way the rest of Ogg data is packed; LSb |
---|
| 300 | of LSB first. Note that the 'position' data specifies a 'sample' |
---|
| 301 | number (e.g., a CD quality sample is four octets, 16 bits for left |
---|
| 302 | and 16 bits for right; in video it would likely be the frame number. |
---|
| 303 | It is up to the specific codec in use to define the semantic meaning |
---|
| 304 | of the granule position value). The position specified is the total |
---|
| 305 | samples encoded after including all packets finished on this page |
---|
| 306 | (packets begun on this page but continuing on to the next page do not |
---|
| 307 | count). The rationale here is that the position specified in the |
---|
| 308 | frame header of the last page tells how long the data coded by the |
---|
| 309 | bitstream is. A truncated stream will still return the proper number |
---|
| 310 | of samples that can be decoded fully.</p> |
---|
| 311 | |
---|
| 312 | <p>A special value of '-1' (in two's complement) indicates that no packets |
---|
| 313 | finish on this page.</p> |
---|
| 314 | |
---|
| 315 | <pre><tt> |
---|
| 316 | byte value |
---|
| 317 | |
---|
| 318 | 6 0xXX LSB |
---|
| 319 | 7 0xXX |
---|
| 320 | 8 0xXX |
---|
| 321 | 9 0xXX |
---|
| 322 | 10 0xXX |
---|
| 323 | 11 0xXX |
---|
| 324 | 12 0xXX |
---|
| 325 | 13 0xXX MSB |
---|
| 326 | </tt></pre> |
---|
| 327 | |
---|
| 328 | <h4>stream serial number</h4> |
---|
| 329 | |
---|
| 330 | <p>Ogg allows for separate logical bitstreams to be mixed at page |
---|
| 331 | granularity in a physical bitstream. The most common case would be |
---|
| 332 | sequential arrangement, but it is possible to interleave pages for |
---|
| 333 | two separate bitstreams to be decoded concurrently. The serial |
---|
| 334 | number is the means by which physical pages are associated with |
---|
| 335 | a particular logical stream. Each logical stream must have a unique |
---|
| 336 | serial number within a physical stream:</p> |
---|
| 337 | |
---|
| 338 | <pre><tt> |
---|
| 339 | byte value |
---|
| 340 | |
---|
| 341 | 14 0xXX LSB |
---|
| 342 | 15 0xXX |
---|
| 343 | 16 0xXX |
---|
| 344 | 17 0xXX MSB |
---|
| 345 | </tt></pre> |
---|
| 346 | |
---|
| 347 | <h4>page sequence no</h4> |
---|
| 348 | |
---|
| 349 | <p>Page counter; lets us know if a page is lost (useful where packets |
---|
| 350 | span page boundaries).</p> |
---|
| 351 | |
---|
| 352 | <pre><tt> |
---|
| 353 | byte value |
---|
| 354 | |
---|
| 355 | 18 0xXX LSB |
---|
| 356 | 19 0xXX |
---|
| 357 | 20 0xXX |
---|
| 358 | 21 0xXX MSB |
---|
| 359 | </tt></pre> |
---|
| 360 | |
---|
| 361 | <h4>page checksum</h4> |
---|
| 362 | |
---|
| 363 | <p>32 bit CRC value (direct algorithm, initial value and final XOR = 0, |
---|
| 364 | generator polynomial=0x04c11db7). The value is computed over the |
---|
| 365 | entire header (with the CRC field in the header set to zero) and then |
---|
| 366 | continued over the page. The CRC field is then filled with the |
---|
| 367 | computed value.</p> |
---|
| 368 | |
---|
| 369 | <p>(A thorough discussion of CRC algorithms can be found in <a |
---|
| 370 | href="http://www.ross.net/crc/download/crc_v3.txt">"A |
---|
| 371 | Painless Guide to CRC Error Detection Algorithms"</a> by Ross |
---|
| 372 | Williams <a href="mailto:ross@ross.net">ross@ross.net</a>.)</p> |
---|
| 373 | |
---|
| 374 | <pre><tt> |
---|
| 375 | byte value |
---|
| 376 | |
---|
| 377 | 22 0xXX LSB |
---|
| 378 | 23 0xXX |
---|
| 379 | 24 0xXX |
---|
| 380 | 25 0xXX MSB |
---|
| 381 | </tt></pre> |
---|
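<p>The checksum described above can be sketched as a table-driven CRC. This is
a minimal illustration under the stated parameters (generator polynomial
0x04c11db7, initial value 0, no final XOR, most significant bit first). The
names <tt>ogg_crc</tt> and <tt>page_checksum</tt> are hypothetical, and the
sketch has not been validated against real Ogg files.</p>

```python
POLYNOMIAL = 0x04C11DB7


def _build_table():
    table = []
    for byte in range(256):
        register = byte << 24
        for _ in range(8):
            if register & 0x80000000:
                register = (register << 1) ^ POLYNOMIAL
            else:
                register <<= 1
            register &= 0xFFFFFFFF
        table.append(register)
    return table


CRC_TABLE = _build_table()


def ogg_crc(data):
    """Direct CRC-32: initial value 0, no bit reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ CRC_TABLE[((crc >> 24) ^ byte) & 0xFF]
    return crc


def page_checksum(page):
    """CRC over the whole page with the CRC field (bytes 22-25) zeroed."""
    zeroed = page[:22] + b"\x00\x00\x00\x00" + page[26:]
    return ogg_crc(zeroed)


if __name__ == "__main__":
    print(hex(ogg_crc(b"\x01")))
```

<p>To verify a page, a decoder would compare <tt>page_checksum(page)</tt> with
the little-endian value stored in bytes 22 to 25.</p>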
| 382 | |
---|
| 383 | <h4>page_segments</h4> |
---|
| 384 | |
---|
| 385 | <p>The number of segment entries to appear in the segment table. The |
---|
| 386 | maximum number of 255 segments (255 bytes each) sets the maximum |
---|
| 387 | possible physical page size at 65307 bytes or just under 64kB (thus |
---|
| 388 | we know that a header corrupted so as to destroy sizing/alignment |
---|
| 389 | information will not cause a runaway bitstream. We'll read in the |
---|
| 390 | page according to the corrupted size information that's guaranteed to |
---|
| 391 | be a reasonable size regardless, notice the checksum mismatch, drop |
---|
| 392 | sync and then look for recapture).</p> |
---|
| 393 | |
---|
| 394 | <pre><tt> |
---|
| 395 | byte value |
---|
| 396 | |
---|
| 397 | 26 0x00-0xff (0-255) |
---|
| 398 | </tt></pre> |
---|
| 399 | |
---|
| 400 | <h4>segment_table (containing packet lacing values)</h4> |
---|
| 401 | |
---|
| 402 | <p>The lacing values for each packet segment physically appearing in |
---|
| 403 | this page are listed in contiguous order.</p> |
---|
| 404 | |
---|
| 405 | <pre><tt> |
---|
| 406 | byte value |
---|
| 407 | |
---|
| 408 | 27 0x00-0xff (0-255) |
---|
| 409 | [...] |
---|
| 410 | n 0x00-0xff (0-255, n=page_segments+26) |
---|
| 411 | </tt></pre> |
---|
| 412 | |
---|
| 413 | <p>Total page size is calculated directly from the known header size and |
---|
| 414 | lacing values in the segment table. Packet data segments follow |
---|
| 415 | immediately after the header.</p> |
---|
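<p>Putting the fields together, the fixed part of the header is 27 bytes and
the total page size follows from the segment table. The following is a sketch
only: the function name <tt>parse_page_header</tt> and the dictionary keys are
assumptions, and it does not verify the checksum.</p>

```python
import struct

# capture_pattern, version, header_type_flag, granule position (signed,
# so -1 is representable), serial number, page sequence no, checksum,
# page_segments; all multi-byte fields are little-endian.
HEADER_FORMAT = "<4sBBqIIIB"
FIXED_HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 27 bytes

# 27 header bytes + 255 lacing values + 255 segments of 255 bytes each.
MAX_PAGE_SIZE = FIXED_HEADER_SIZE + 255 + 255 * 255


def parse_page_header(data):
    """Parse one page header found at the start of data."""
    if len(data) < FIXED_HEADER_SIZE:
        raise ValueError("not enough data for a page header")
    (capture, version, flags, granule, serial,
     sequence, checksum, page_segments) = struct.unpack_from(HEADER_FORMAT, data, 0)
    if capture != b"OggS":
        raise ValueError("capture pattern not found")
    header_size = FIXED_HEADER_SIZE + page_segments
    if len(data) < header_size:
        raise ValueError("segment table is truncated")
    segment_table = list(data[FIXED_HEADER_SIZE:header_size])
    body_size = sum(segment_table)
    return {
        "version": version,
        "flags": flags,
        "granule_position": granule,
        "serial_number": serial,
        "sequence_number": sequence,
        "checksum": checksum,
        "segment_table": segment_table,
        "header_size": header_size,
        "body_size": body_size,
        "page_size": header_size + body_size,
    }


if __name__ == "__main__":
    header = struct.pack(HEADER_FORMAT, b"OggS", 0, 0x02, -1, 1234, 0, 0, 3)
    header += bytes([255, 255, 243])
    print(parse_page_header(header)["page_size"], MAX_PAGE_SIZE)
```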
| 416 | |
---|
| 417 | <p>Page headers typically impose a flat .25-.5% space overhead assuming |
---|
| 418 | nominal ~8k page sizes. The segmentation table needed for exact |
---|
| 419 | packet recovery in the streaming layer adds approximately .5-1% |
---|
| 420 | nominal assuming expected encoder behavior in the 44.1kHz, 128kbps |
---|
| 421 | stereo encodings.</p> |
---|
| 422 | |
---|
| 423 | <div id="copyright"> |
---|
| 424 | The Xiph Fish Logo is a |
---|
| 425 | trademark (™) of Xiph.Org.<br/> |
---|
| 426 | |
---|
| 427 | These pages © 1994 - 2005 Xiph.Org. All rights reserved. |
---|
| 428 | </div> |
---|
| 429 | |
---|
| 430 | </body> |
---|
| 431 | </html> |
---|