| 1 | |
---|
| 2 | |
---|
| 3 | |
---|
| 4 | |
---|
| 5 | |
---|
| 6 | |
---|
| 7 | Network Working Group S. Pfeiffer |
---|
| 8 | Request for Comments: 3533 CSIRO |
---|
| 9 | Category: Informational May 2003 |
---|
| 10 | |
---|
| 11 | |
---|
| 12 | The Ogg Encapsulation Format Version 0 |
---|
| 13 | |
---|
| 14 | Status of this Memo |
---|
| 15 | |
---|
| 16 | This memo provides information for the Internet community. It does |
---|
| 17 | not specify an Internet standard of any kind. Distribution of this |
---|
| 18 | memo is unlimited. |
---|
| 19 | |
---|
| 20 | Copyright Notice |
---|
| 21 | |
---|
| 22 | Copyright (C) The Internet Society (2003). All Rights Reserved. |
---|
| 23 | |
---|
| 24 | Abstract |
---|
| 25 | |
---|
| 26 | This document describes the Ogg bitstream format version 0, which is |
---|
| 27 | a general, freely-available encapsulation format for media streams. |
---|
| 28 | It is able to encapsulate any kind and number of video and audio |
---|
| 29 | encoding formats as well as other data streams in a single bitstream. |
---|
| 30 | |
---|
| 31 | Terminology |
---|
| 32 | |
---|
| 33 | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", |
---|
| 34 | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this |
---|
| 35 | document are to be interpreted as described in BCP 14, RFC 2119 [2]. |
---|
| 36 | |
---|
| 37 | Table of Contents |
---|
| 38 | |
---|
| 39 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2 |
---|
| 40 | 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 2 |
---|
| 41 | 3. Requirements for a generic encapsulation format . . . . . . . 3 |
---|
| 42 | 4. The Ogg bitstream format . . . . . . . . . . . . . . . . . . . 3 |
---|
| 43 | 5. The encapsulation process . . . . . . . . . . . . . . . . . . 6 |
---|
| 44 | 6. The Ogg page format . . . . . . . . . . . . . . . . . . . . . 9 |
---|
| 45 | 7. Security Considerations . . . . . . . . . . . . . . . . . . . 11 |
---|
| 46 | 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 12 |
---|
| 47 | A. Glossary of terms and abbreviations . . . . . . . . . . . . . 13 |
---|
| 48 | B. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 14 |
---|
| 49 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . 14 |
---|
| 50 | Full Copyright Statement . . . . . . . . . . . . . . . . . . . 15 |
---|
| 51 | |
---|
| 52 | |
---|
| 53 | |
---|
| 54 | |
---|
| 55 | |
---|
| 56 | |
---|
| 57 | |
---|
| 58 | Pfeiffer Informational [Page 1] |
---|
| 59 | |
---|
| 60 | RFC 3533 OGG May 2003 |
---|
| 61 | |
---|
| 62 | |
---|
| 63 | 1. Introduction |
---|
| 64 | |
---|
| 65 | The Ogg bitstream format has been developed as a part of a larger |
---|
| 66 | project aimed at creating a set of components for the coding and |
---|
| 67 | decoding of multimedia content (codecs) which are to be freely |
---|
| 68 | available and freely re-implementable, both in software and in |
---|
| 69 | hardware for the computing community at large, including the Internet |
---|
| 70 | community. It is the intention of the Ogg developers represented by |
---|
| 71 | Xiph.Org that it be usable without intellectual property concerns. |
---|
| 72 | |
---|
| 73 | This document describes the Ogg bitstream format and how to use it to |
---|
| 74 | encapsulate one or several media bitstreams created by one or several |
---|
| 75 | encoders. The Ogg transport bitstream is designed to provide |
---|
| 76 | framing, error protection and seeking structure for higher-level |
---|
| 77 | codec streams that consist of raw, unencapsulated data packets, such |
---|
| 78 | as the Vorbis audio codec or the upcoming Tarkin and Theora video |
---|
| 79 | codecs. It is capable of interleaving different binary media and |
---|
| 80 | other time-continuous data streams that are prepared by an encoder as |
---|
| 81 | a sequence of data packets. Ogg provides enough information to |
---|
| 82 | properly separate data back into such encoder created data packets at |
---|
| 83 | the original packet boundaries without relying on decoding to find |
---|
| 84 | packet boundaries. |
---|
| 85 | |
---|
| 86 | Please note that the MIME type application/ogg has been registered |
---|
| 87 | with the IANA [1]. |
---|
| 88 | |
---|
| 89 | 2. Definitions |
---|
| 90 | |
---|
| 91 | For describing the Ogg encapsulation process, a set of terms will be |
---|
| 92 | used whose meaning needs to be well understood. Therefore, some of |
---|
| 93 | the most fundamental terms are defined now before we start with the |
---|
| 94 | description of the requirements for a generic media stream |
---|
| 95 | encapsulation format, the process of encapsulation, and the concrete |
---|
| 96 | format of the Ogg bitstream. See the Appendix for a more complete |
---|
| 97 | glossary. |
---|
| 98 | |
---|
| 99 | The result of an Ogg encapsulation is called the "Physical (Ogg) |
---|
| 100 | Bitstream". It encapsulates one or several encoder-created |
---|
| 101 | bitstreams, which are called "Logical Bitstreams". A logical |
---|
| 102 | bitstream, provided to the Ogg encapsulation process, has a |
---|
| 103 | structure, i.e., it is split up into a sequence of so-called |
---|
| 104 | "Packets". The packets are created by the encoder of that logical |
---|
| 105 | bitstream and represent meaningful entities for that encoder only |
---|
| 106 | (e.g., an uncompressed stream may use video frames as packets). They |
---|
| 107 | do not contain boundary information - strung together they appear to |
---|
| 108 | be streams of random bytes with no landmarks. |
---|
| 109 | |
---|
| 119 | Please note that the term "packet" is not used in this document to |
---|
| 120 | signify entities for transport over a network. |
---|
| 121 | |
---|
| 122 | 3. Requirements for a generic encapsulation format |
---|
| 123 | |
---|
| 124 | The design idea behind Ogg was to provide a generic, linear media |
---|
| 125 | transport format to enable both file-based storage and stream-based |
---|
| 126 | transmission of one or several interleaved media streams independent |
---|
| 127 | of the encoding format of the media data. Such an encapsulation |
---|
| 128 | format needs to provide: |
---|
| 129 | |
---|
| 130 | o framing for logical bitstreams. |
---|
| 131 | |
---|
| 132 | o interleaving of different logical bitstreams. |
---|
| 133 | |
---|
| 134 | o detection of corruption. |
---|
| 135 | |
---|
| 136 | o recapture after a parsing error. |
---|
| 137 | |
---|
| 138 | o position landmarks for direct random access of arbitrary positions |
---|
| 139 | in the bitstream. |
---|
| 140 | |
---|
| 141 | o streaming capability (i.e., no seeking is needed to build a 100% |
---|
| 142 | complete bitstream). |
---|
| 143 | |
---|
| 144 | o small overhead (i.e., use no more than approximately 1-2% of |
---|
| 145 | bitstream bandwidth for packet boundary marking, high-level |
---|
| 146 | framing, sync and seeking). |
---|
| 147 | |
---|
| 148 | o simplicity to enable fast parsing. |
---|
| 149 | |
---|
| 150 | o simple concatenation mechanism of several physical bitstreams. |
---|
| 151 | |
---|
| 152 | All of these requirements have been taken into account in the |
---|
| 153 | design of Ogg. Ogg supports framing and interleaving of logical |
---|
| 154 | bitstreams, seeking landmarks, detection of corruption, and stream |
---|
| 155 | resynchronisation after a parsing error with no more than |
---|
| 156 | approximately 1-2% overhead. It is a generic framework to perform |
---|
| 157 | encapsulation of time-continuous bitstreams. It does not know any |
---|
| 158 | specifics about the codec data that it encapsulates and is thus |
---|
| 159 | independent of any media codec. |
---|
| 160 | |
---|
| 161 | 4. The Ogg bitstream format |
---|
| 162 | |
---|
| 163 | A physical Ogg bitstream consists of multiple logical bitstreams |
---|
| 164 | interleaved in so-called "Pages". Whole pages are taken in order |
---|
| 165 | from multiple logical bitstreams multiplexed at the page level. The |
---|
| 166 | logical bitstreams are identified by a unique serial number in the |
---|
| 175 | header of each page of the physical bitstream. This unique serial |
---|
| 176 | number is created randomly and does not have any connection to the |
---|
| 177 | content or encoder of the logical bitstream it represents. Pages of |
---|
| 178 | all logical bitstreams are concurrently interleaved, but they need |
---|
| 179 | not be in a regular order - they are only required to be consecutive |
---|
| 180 | within the logical bitstream. Ogg demultiplexing reconstructs the |
---|
| 181 | original logical bitstreams from the physical bitstream by taking the |
---|
| 182 | pages in order from the physical bitstream and redirecting them into |
---|
| 183 | the appropriate logical decoding entity. |
---|
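The demultiplexing step described above can be sketched as follows. This is a minimal illustration, not part of the format definition: it assumes pages have already been parsed into (serial number, payload) tuples, and the function name `demultiplex` is hypothetical.

```python
from collections import defaultdict


def demultiplex(pages):
    """Route pages to per-stream lists keyed by bitstream serial number.

    `pages` is an iterable of (serial_number, payload) tuples in the order
    in which they occur in the physical bitstream (an assumed, already
    parsed representation).
    """
    streams = defaultdict(list)
    for serial, payload in pages:
        # Pages of one logical bitstream stay in their original order.
        streams[serial].append(payload)
    return dict(streams)


# Two logical bitstreams interleaved in an irregular order.
physical = [(0xA, b"a0"), (0xB, b"b0"), (0xA, b"a1"), (0xA, b"a2"), (0xB, b"b1")]
logical = demultiplex(physical)
```

Because each page carries its serial number, no knowledge of the codec is needed to separate the streams.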
| 184 | |
---|
| 185 | Each Ogg page contains only one type of data as it belongs to one |
---|
| 186 | logical bitstream only. Pages are of variable size and have a page |
---|
| 187 | header containing encapsulation and error recovery information. Each |
---|
| 188 | logical bitstream in a physical Ogg bitstream starts with a special |
---|
| 189 | start page (bos=beginning of stream) and ends with a special page |
---|
| 190 | (eos=end of stream). |
---|
| 191 | |
---|
| 192 | The bos page contains information to uniquely identify the codec type |
---|
| 193 | and MAY contain information to set up the decoding process. The bos |
---|
| 194 | page SHOULD also contain information about the encoded media - for |
---|
| 195 | example, for audio, it should contain the sample rate and number of |
---|
| 196 | channels. By convention, the first bytes of the bos page contain |
---|
| 197 | magic data that uniquely identifies the required codec. It is the |
---|
| 198 | responsibility of anyone fielding a new codec to make sure it is |
---|
| 199 | possible to reliably distinguish his/her codec from all other codecs |
---|
| 200 | in use. There is no fixed way to detect the end of the codec- |
---|
| 201 | identifying marker. The format of the bos page is dependent on the |
---|
| 202 | codec and therefore MUST be given in the encapsulation specification |
---|
| 203 | of that logical bitstream type. Ogg also allows but does not require |
---|
| 204 | secondary header packets after the bos page for logical bitstreams |
---|
| 205 | and these must also precede any data packets in any logical |
---|
| 206 | bitstream. These subsequent header packets are framed into an |
---|
| 207 | integral number of pages, which will not contain any data packets. |
---|
| 208 | So, a physical bitstream begins with the bos pages of all logical |
---|
| 209 | bitstreams containing one initial header packet per page, followed by |
---|
| 210 | the subsidiary header packets of all streams, followed by pages |
---|
| 211 | containing data packets. |
---|
| 212 | |
---|
| 213 | The encapsulation specification for one or more logical bitstreams is |
---|
| 214 | called a "media mapping". An example of a media mapping is "Ogg |
---|
| 215 | Vorbis", which uses the Ogg framework to encapsulate Vorbis-encoded |
---|
| 216 | audio data for stream-based storage (such as files) and transport |
---|
| 217 | (such as TCP streams or pipes). Ogg Vorbis provides the name and |
---|
| 218 | revision of the Vorbis codec, the audio rate and the audio quality on |
---|
| 219 | the Ogg Vorbis bos page. It also uses two additional header pages |
---|
| 220 | per logical bitstream. The Ogg Vorbis bos page starts with the byte |
---|
| 221 | 0x01, followed by "vorbis" (a total of 7 bytes of identifier). |
---|
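As an illustration of codec identification by magic data, the following sketch checks the payload of a bos page against the 7-byte Ogg Vorbis identifier given above. The function name `identify_codec` and the "unknown" fallback are assumptions made for this example; a real demultiplexer would consult the media mappings of all codecs it supports.

```python
# Byte 0x01 followed by the ASCII string "vorbis" (7 bytes in total).
VORBIS_MAGIC = b"\x01vorbis"


def identify_codec(bos_payload: bytes) -> str:
    """Return a codec name for the payload of a bos page, or "unknown"."""
    if bos_payload.startswith(VORBIS_MAGIC):
        return "vorbis"
    return "unknown"
```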
| 222 | |
---|
| 231 | Ogg knows two types of multiplexing: concurrent multiplexing (so- |
---|
| 232 | called "Grouping") and sequential multiplexing (so-called |
---|
| 233 | "Chaining"). Grouping defines how to interleave several logical |
---|
| 234 | bitstreams page-wise in the same physical bitstream. Grouping is for |
---|
| 235 | example needed for interleaving a video stream with several |
---|
| 236 | synchronised audio tracks using different codecs in different logical |
---|
| 237 | bitstreams. Chaining, on the other hand, is defined to provide a |
---|
| 238 | simple mechanism to concatenate physical Ogg bitstreams, as is often |
---|
| 239 | needed for streaming applications. |
---|
| 240 | |
---|
| 241 | In grouping, all bos pages of all logical bitstreams MUST appear |
---|
| 242 | together at the beginning of the Ogg bitstream. The media mapping |
---|
| 243 | specifies the order of the initial pages. For example, the grouping |
---|
| 244 | of a specific Ogg video and Ogg audio bitstream may specify that the |
---|
| 245 | physical bitstream MUST begin with the bos page of the logical video |
---|
| 246 | bitstream, followed by the bos page of the audio bitstream. Unlike |
---|
| 247 | bos pages, eos pages for the logical bitstreams need not all occur |
---|
| 248 | contiguously. Eos pages may be 'nil' pages, that is, pages |
---|
| 249 | containing no content but simply a page header with position |
---|
| 250 | information and the eos flag set in the page header. Each grouped |
---|
| 251 | logical bitstream MUST have a unique serial number within the scope |
---|
| 252 | of the physical bitstream. |
---|
| 253 | |
---|
| 254 | In chaining, complete logical bitstreams are concatenated. The |
---|
| 255 | bitstreams do not overlap, i.e., the eos page of a given logical |
---|
| 256 | bitstream is immediately followed by the bos page of the next. Each |
---|
| 257 | chained logical bitstream MUST have a unique serial number within the |
---|
| 258 | scope of the physical bitstream. |
---|
| 259 | |
---|
| 260 | It is possible to consecutively chain groups of concurrently |
---|
| 261 | multiplexed bitstreams. The groups, when unchained, MUST stand on |
---|
| 262 | their own as a valid concurrently multiplexed bitstream. The |
---|
| 263 | following diagram shows a schematic example of such a physical |
---|
| 264 | bitstream that obeys all the rules of both grouped and chained |
---|
| 265 | multiplexed bitstreams. |
---|
| 266 | |
---|
| 267 | physical bitstream with pages of |
---|
| 268 | different logical bitstreams grouped and chained |
---|
| 269 | ------------------------------------------------------------- |
---|
| 270 | |*A*|*B*|*C*|A|A|C|B|A|B|#A#|C|...|B|C|#B#|#C#|*D*|D|...|#D#| |
---|
| 271 | ------------------------------------------------------------- |
---|
| 272 |  bos bos bos             eos           eos eos bos       eos |
---|
| 273 | |
---|
| 274 | In this example, there are two chained physical bitstreams, the first |
---|
| 275 | of which is a grouped stream of three logical bitstreams A, B, and C. |
---|
| 276 | The second physical bitstream is chained after the end of the grouped |
---|
| 277 | bitstream, which ends after the last eos page of all its grouped |
---|
| 278 | logical bitstreams. As can be seen, grouped bitstreams begin |
---|
| 287 | together - all of the bos pages MUST appear before any data pages. |
---|
| 288 | It can also be seen that pages of concurrently multiplexed bitstreams |
---|
| 289 | need not conform to a regular order. And it can be seen that a |
---|
| 290 | grouped bitstream can end long before the other bitstreams in the |
---|
| 291 | group end. |
---|
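The grouping and chaining rules can be checked mechanically. The sketch below is only an illustration: it reuses the notation of the diagram (`*X*` for a bos page, `#X#` for an eos page, a bare letter for any other page) and treats the letter as the serial number. The function name `check_multiplexing` is hypothetical.

```python
def check_multiplexing(diagram: str) -> bool:
    """Check a page sequence written in the diagram notation.

    Rules checked: serial numbers are unique within the physical
    bitstream, all bos pages of a group precede its other pages, and
    every logical bitstream is closed by an eos page.
    """
    tokens = [t for t in diagram.strip("|").split("|") if t != "..."]
    open_streams = set()  # streams that have a bos page but no eos page yet
    seen = set()          # every serial number used so far
    in_headers = True     # True while reading the bos pages of a group
    for tok in tokens:
        name = tok.strip("*#")
        if tok.startswith("*"):              # bos page
            if name in seen or not in_headers:
                return False
            seen.add(name)
            open_streams.add(name)
        else:                                # data page or eos page
            if name not in open_streams:
                return False
            in_headers = False
            if tok.startswith("#"):          # eos page
                open_streams.discard(name)
                if not open_streams:
                    in_headers = True        # a new chain link may start
    return not open_streams
```

Applied to the diagram above, the check succeeds; a sequence such as `|*A*|A|*B*|B|#A#|#B#|` fails because the bos page of B appears after a data page of A.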
| 292 | |
---|
| 293 | Ogg does not know any specifics about the codec data except that each |
---|
| 294 | logical bitstream belongs to a different codec, the data from the |
---|
| 295 | codec comes in order and has position markers (so-called "Granule |
---|
| 296 | positions"). Ogg does not have a concept of 'time': it only knows |
---|
| 297 | about sequentially increasing, unitless position markers. An |
---|
| 298 | application can only get temporal information through higher layers |
---|
| 299 | which have access to the codec APIs to assign and convert granule |
---|
| 300 | positions or time. |
---|
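Since Ogg itself has no notion of time, any conversion has to come from the media mapping. The following sketch assumes a hypothetical audio mapping in which the granule position counts PCM samples at a fixed sample rate; other mappings may define granule positions quite differently.

```python
def granule_to_seconds(granule_position: int, sample_rate: int) -> float:
    """Convert a granule position to seconds.

    Assumes (for illustration only) that the media mapping defines the
    granule position as a running count of PCM samples.
    """
    return granule_position / sample_rate
```

For example, under this assumption a granule position of 88200 in a 44100 Hz stream corresponds to 2.0 seconds.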
| 301 | |
---|
| 302 | A specific definition of a media mapping using Ogg may put further |
---|
| 303 | constraints on its specific use of the Ogg bitstream format. For |
---|
| 304 | example, a specific media mapping may require that all the eos pages |
---|
| 305 | for all grouped bitstreams need to appear in direct sequence. An |
---|
| 306 | example of a media mapping is the specification of "Ogg Vorbis". |
---|
| 307 | Another example is the upcoming "Ogg Theora" specification which |
---|
| 308 | encapsulates Theora-encoded video data and usually comes multiplexed |
---|
| 309 | with a Vorbis stream for an Ogg containing synchronised audio and |
---|
| 310 | video. As Ogg does not specify temporal relationships between the |
---|
| 311 | encapsulated concurrently multiplexed bitstreams, the temporal |
---|
| 312 | synchronisation between the audio and video stream will be specified |
---|
| 313 | in this media mapping. To enable streaming, pages from various |
---|
| 314 | logical bitstreams will typically be interleaved in chronological |
---|
| 315 | order. |
---|
| 316 | |
---|
| 317 | 5. The encapsulation process |
---|
| 318 | |
---|
| 319 | The process of multiplexing different logical bitstreams happens at |
---|
| 320 | the level of pages as described above. The bitstreams provided by |
---|
| 321 | encoders are however handed over to Ogg as so-called "Packets" with |
---|
| 322 | packet boundaries dependent on the encoding format. The process of |
---|
| 323 | encapsulating packets into pages will be described now. |
---|
| 324 | |
---|
| 325 | From Ogg's perspective, packets can be of any arbitrary size. A |
---|
| 326 | specific media mapping will define how to group or break up packets |
---|
| 327 | from a specific media encoder. As Ogg pages have a maximum size of |
---|
| 328 | about 64 kBytes, sometimes a packet has to be distributed over |
---|
| 329 | several pages. To simplify that process, Ogg divides each packet |
---|
| 330 | into 255 byte long chunks plus a final shorter chunk. These chunks |
---|
| 331 | are called "Ogg Segments". They are only a logical construct and do |
---|
| 332 | not have a header for themselves. |
---|
| 333 | |
---|
| 343 | A group of contiguous segments is wrapped into a variable length page |
---|
| 344 | preceded by a header. A segment table in the page header lists |
---|
| 345 | the "Lacing values" (sizes) of the segments included in the page. A |
---|
| 346 | flag in the page header tells whether a page contains a packet |
---|
| 347 | continued from a previous page. Note that a lacing value of 255 |
---|
| 348 | implies that a second lacing value follows in the packet, and a value |
---|
| 349 | of less than 255 marks the end of the packet after that many |
---|
| 350 | additional bytes. A packet of 255 bytes (or a multiple of 255 bytes) |
---|
| 351 | is terminated by a lacing value of 0. Note also that a 'nil' (zero |
---|
| 352 | length) packet is not an error; it consists of nothing more than a |
---|
| 353 | lacing value of zero in the header. |
---|
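The lacing rules above can be expressed in a few lines. This is a minimal sketch; the function names `lacing_values` and `packet_lengths` are chosen for this example and do not come from any particular library.

```python
def lacing_values(packet_length: int) -> list:
    """Lacing values for one packet: 255 per full segment, then a value
    below 255 (possibly 0) that terminates the packet."""
    full_segments, remainder = divmod(packet_length, 255)
    return [255] * full_segments + [remainder]


def packet_lengths(lacing: list) -> list:
    """Recover packet lengths from a sequence of lacing values that starts
    at a packet boundary. A trailing unterminated packet is ignored."""
    lengths, current = [], 0
    for value in lacing:
        current += value
        if value < 255:          # a value below 255 ends the packet
            lengths.append(current)
            current = 0
    return lengths
```

A 600 byte packet yields the lacing values 255, 255, 90; a 255 byte packet yields 255, 0; and a nil packet yields the single value 0.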
| 354 | |
---|
| 355 | The encoding is optimized for speed and the expected case of the |
---|
| 356 | majority of packets being between 50 and 200 bytes large. This is a |
---|
| 357 | design justification rather than a recommendation. This encoding |
---|
| 358 | both avoids imposing a maximum packet size and imposes only |
---|
| 359 | minimal overhead on small packets. In contrast, e.g., simply using |
---|
| 360 | two bytes at the head of every packet and having a max packet size of |
---|
| 361 | 32 kBytes would always penalize small packets (< 255 bytes, the |
---|
| 362 | typical case) with twice the segmentation overhead. Using the lacing |
---|
| 363 | values as suggested, small packets see the minimum possible byte- |
---|
| 364 | aligned overhead (1 byte) and large packets (>512 bytes) see a fairly |
---|
| 365 | constant ~0.5% overhead on encoding space. |
---|
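The overhead figures can be reproduced with a short calculation. The sketch below counts only the lacing bytes and deliberately ignores the fixed part of the page header, so it illustrates the segmentation overhead alone; the function name `lacing_overhead` is hypothetical.

```python
def lacing_overhead(packet_length: int) -> float:
    """Lacing bytes spent per payload byte for a single packet.

    One lacing value is needed per full 255 byte segment, plus one
    terminating value below 255.
    """
    lacing_bytes = packet_length // 255 + 1
    return lacing_bytes / packet_length
```

A 100 byte packet needs a single lacing byte (1% overhead), while a 10000 byte packet needs 40 lacing bytes (0.4% overhead), in line with the roughly constant overhead stated above for large packets.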
| 366 | |
---|
| 399 | The following diagram shows a schematic example of a media mapping |
---|
| 400 | using Ogg and grouped logical bitstreams: |
---|
| 401 | |
---|
| 402 | logical bitstream with packet boundaries |
---|
| 403 | ----------------------------------------------------------------- |
---|
| 404 | > | packet_1 | packet_2 | packet_3 | < |
---|
| 405 | ----------------------------------------------------------------- |
---|
| 406 | |
---|
| 407 | |segmentation (logically only) |
---|
| 408 | v |
---|
| 409 | |
---|
| 410 | packet_1 (5 segments) packet_2 (4 segs) p_3 (2 segs) |
---|
| 411 | ------------------------------ -------------------- ------------ |
---|
| 412 | .. |seg_1|seg_2|seg_3|seg_4|s_5 | |seg_1|seg_2|seg_3|| |seg_1|s_2 | .. |
---|
| 413 | ------------------------------ -------------------- ------------ |
---|
| 414 | |
---|
| 415 | | page encapsulation |
---|
| 416 | v |
---|
| 417 | |
---|
| 418 | page_1 (packet_1 data) page_2 (pket_1 data) page_3 (packet_2 data) |
---|
| 419 | ------------------------ ---------------- ------------------------ |
---|
| 420 | |H|------------------- | |H|----------- | |H|------------------- | |
---|
| 421 | |D||seg_1|seg_2|seg_3| | |D|seg_4|s_5 | | |D||seg_1|seg_2|seg_3| | ... |
---|
| 422 | |R|------------------- | |R|----------- | |R|------------------- | |
---|
| 423 | ------------------------ ---------------- ------------------------ |
---|
| 424 | |
---|
| 425 | | |
---|
| 426 | pages of | |
---|
| 427 | other --------| | |
---|
| 428 | logical ------- |
---|
| 429 | bitstreams | MUX | |
---|
| 430 | ------- |
---|
| 431 | | |
---|
| 432 | v |
---|
| 433 | |
---|
| 434 | page_1 page_2 page_3 |
---|
| 435 | ------ ------ ------- ----- ------- |
---|
| 436 | ... || | || | || | || | || | ... |
---|
| 437 | ------ ------ ------- ----- ------- |
---|
| 438 | physical Ogg bitstream |
---|
| 439 | |
---|
| 440 | In this example we take a snapshot of the encapsulation process of |
---|
| 441 | one logical bitstream. We can see part of that bitstream's |
---|
| 442 | subdivision into packets as provided by the codec. The Ogg |
---|
| 443 | encapsulation process chops up the packets into segments. The |
---|
| 444 | packets in this example are rather large such that packet_1 is split |
---|
| 445 | into 5 segments - 4 segments with 255 bytes and a final smaller one. |
---|
| 446 | Packet_2 is split into 4 segments - 3 segments with 255 bytes and a |
---|
| 455 | final very small one - and packet_3 is split into two segments. The |
---|
| 456 | encapsulation process then creates pages, which are quite small in |
---|
| 457 | this example. Page_1 consists of the first three segments of |
---|
| 458 | packet_1, page_2 contains the remaining 2 segments from packet_1, and |
---|
| 459 | page_3 contains the first three segments of packet_2. Finally, this |
---|
| 460 | logical bitstream is multiplexed into a physical Ogg bitstream with |
---|
| 461 | pages of other logical bitstreams. |
---|
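The page encapsulation shown in the diagram can be imitated with a small sketch. Several things here are assumptions made purely for illustration: the function name `paginate`, the policy of starting a new page for every packet (which mirrors the diagram but is not required by Ogg), the artificially small limit of three segments per page, and the concrete packet sizes.

```python
def paginate(packet_sizes, max_segments=255):
    """Distribute packets over pages.

    Returns a list of (continued, lacing_table) tuples, one per page.
    `continued` is True when the page starts with data of a packet that
    began on an earlier page.
    """
    pages = []
    for size in packet_sizes:
        full_segments, remainder = divmod(size, 255)
        lacing = [255] * full_segments + [remainder]
        for start in range(0, len(lacing), max_segments):
            pages.append((start > 0, lacing[start:start + max_segments]))
    return pages


# packet_1: 5 segments, packet_2: 4 segments, packet_3: 2 segments
pages = paginate([4 * 255 + 100, 3 * 255 + 10, 255 + 50], max_segments=3)
```

The first three pages produced correspond to page_1, page_2 and page_3 of the diagram: three segments of packet_1, the remaining two segments of packet_1 (a continued page), and the first three segments of packet_2.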
| 462 | |
---|
| 463 | 6. The Ogg page format |
---|
| 464 | |
---|
| 465 | A physical Ogg bitstream consists of a sequence of concatenated |
---|
| 466 | pages. Pages are of variable size, usually 4-8 kB, maximum 65307 |
---|
| 467 | bytes. A page header contains all the information needed to |
---|
| 468 | demultiplex the logical bitstreams out of the physical bitstream and |
---|
| 469 | to perform basic error recovery and landmarks for seeking. Each page |
---|
| 470 | is a self-contained entity such that the page decode mechanism can |
---|
| 471 | recognize, verify, and handle single pages at a time without |
---|
| 472 | requiring the overall bitstream. |
---|
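Because every page is self-contained, a decoder that has lost synchronisation can scan for the capture pattern to find possible page starts. The sketch below only lists candidate offsets; as the pattern may also occur by chance inside page content, each candidate still has to be confirmed by verifying the checksum. The function name `candidate_page_offsets` is hypothetical.

```python
CAPTURE_PATTERN = b"OggS"


def candidate_page_offsets(buffer: bytes) -> list:
    """Return every offset in `buffer` at which the capture pattern occurs."""
    offsets = []
    pos = buffer.find(CAPTURE_PATTERN)
    while pos != -1:
        offsets.append(pos)
        pos = buffer.find(CAPTURE_PATTERN, pos + 1)
    return offsets
```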
| 473 | |
---|
| 474 | The Ogg page header has the following format: |
---|
| 475 | |
---|
| 476 |  0                   1                   2                   3 |
---|
| 477 |  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte |
---|
| 478 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
---|
| 479 | | capture_pattern: Magic number for page start "OggS"           | 0-3 |
---|
| 480 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
---|
| 481 | | version       | header_type   | granule_position              | 4-7 |
---|
| 482 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
---|
| 483 | |                                                               | 8-11 |
---|
| 484 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
---|
| 485 | |                               | bitstream_serial_number       | 12-15 |
---|
| 486 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
---|
| 487 | |                               | page_sequence_number          | 16-19 |
---|
| 488 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
---|
| 489 | |                               | CRC_checksum                  | 20-23 |
---|
| 490 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
---|
| 491 | |                               |page_segments  | segment_table | 24-27 |
---|
| 492 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
---|
| 493 | | ...                                                           | 28- |
---|
| 494 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
---|
| 495 | |
---|
| 496 | The LSb (least significant bit) comes first in the Bytes. Fields |
---|
| 497 | with more than one byte length are encoded LSB (least significant |
---|
| 498 | byte) first. |
---|
| 499 | |
---|
| 511 | The fields in the page header have the following meaning: |
---|
| 512 | |
---|
| 513 | 1. capture_pattern: a 4 Byte field that signifies the beginning of a |
---|
| 514 | page. It contains the magic numbers: |
---|
| 515 | |
---|
| 516 | 0x4f 'O' |
---|
| 517 | |
---|
| 518 | 0x67 'g' |
---|
| 519 | |
---|
| 520 | 0x67 'g' |
---|
| 521 | |
---|
| 522 | 0x53 'S' |
---|
| 523 | |
---|
| 524 | It helps a decoder to find the page boundaries and regain |
---|
| 525 | synchronisation after parsing a corrupted stream. Once the |
---|
| 526 | capture pattern is found, the decoder verifies page sync and |
---|
| 527 | integrity by computing and comparing the checksum. |
---|
| 528 | |
---|
| 529 | 2. stream_structure_version: 1 Byte signifying the version number of |
---|
| 530 | the Ogg file format used in this stream (this document specifies |
---|
| 531 | version 0). |
---|
| 532 | |
---|
| 533 | 3. header_type_flag: the bits in this 1 Byte field identify the |
---|
| 534 | specific type of this page. |
---|
| 535 | |
---|
| 536 | * bit 0x01 |
---|
| 537 | |
---|
| 538 | set: page contains data of a packet continued from the previous |
---|
| 539 | page |
---|
| 540 | |
---|
| 541 | unset: page contains a fresh packet |
---|
| 542 | |
---|
| 543 | * bit 0x02 |
---|
| 544 | |
---|
| 545 | set: this is the first page of a logical bitstream (bos) |
---|
| 546 | |
---|
| 547 | unset: this page is not a first page |
---|
| 548 | |
---|
| 549 | * bit 0x04 |
---|
| 550 | |
---|
| 551 | set: this is the last page of a logical bitstream (eos) |
---|
| 552 | |
---|
| 553 | unset: this page is not a last page |
---|
| 554 | |
---|
| 555 | 4. granule_position: an 8 Byte field containing position information. |
---|
| 556 | For example, for an audio stream, it MAY contain the total number |
---|
| 557 | of PCM samples encoded after including all frames finished on this |
---|
| 558 | page. For a video stream it MAY contain the total number of video |
---|
| 567 | frames encoded after this page. This is a hint for the decoder |
---|
| 568 | and gives it some timing and position information. Its meaning is |
---|
| 569 | dependent on the codec for that logical bitstream and specified in |
---|
| 570 | a specific media mapping. A special value of -1 (in two's |
---|
| 571 | complement) indicates that no packets finish on this page. |
---|
| 572 | |
---|
| 573 | 5. bitstream_serial_number: a 4 Byte field containing the unique |
---|
| 574 | serial number by which the logical bitstream is identified. |
---|
| 575 | |
---|
| 576 | 6. page_sequence_number: a 4 Byte field containing the sequence |
---|
| 577 | number of the page so the decoder can identify page loss. This |
---|
| 578 | sequence number increases within each logical bitstream |
---|
| 579 | separately. |
---|
| 580 | |
---|
| 581 | 7. CRC_checksum: a 4 Byte field containing a 32 bit CRC checksum of |
---|
| 582 | the page (including header with zero CRC field and page content). |
---|
| 583 | The generator polynomial is 0x04c11db7. |
---|
| 584 | |
---|
| 585 | 8. number_page_segments: 1 Byte giving the number of segment entries |
---|
| 586 | encoded in the segment table. |
---|
| 587 | |
---|
| 588 | 9. segment_table: number_page_segments Bytes containing the lacing |
---|
| 589 | values of all segments in this page. Each Byte contains one |
---|
| 590 | lacing value. |
---|
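The checksum of field 7 can be sketched as a bitwise CRC. Note the assumptions: the text above gives only the generator polynomial, so the initial value of zero, the most-significant-bit-first processing and the absence of a final XOR are taken from common Ogg implementations rather than from this document. The byte offsets 22-25 of the CRC field follow from the sizes of the preceding fields. The function names are hypothetical.

```python
def ogg_crc(data: bytes) -> int:
    """Bitwise CRC-32 with generator polynomial 0x04c11db7.

    Assumed parameters: initial value 0, no bit reflection, no final XOR.
    """
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def page_checksum_ok(page: bytes) -> bool:
    """Verify a complete page (header plus content) against its CRC field."""
    stored = int.from_bytes(page[22:26], "little")
    # The checksum is computed with the CRC field itself set to zero.
    zeroed = page[:22] + b"\x00\x00\x00\x00" + page[26:]
    return ogg_crc(zeroed) == stored
```

A table-driven variant would be considerably faster; the bitwise form is used here only because it makes the role of the polynomial visible.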
| 591 | |
---|
| 592 | The total header size in bytes is given by: |
---|
| 593 | header_size = number_page_segments + 27 [Byte] |
---|
| 594 | |
---|
| 595 | The total page size in Bytes is given by: |
---|
| 596 | page_size = header_size + sum(lacing_values: 1..number_page_segments) |
---|
| 597 | [Byte] |
---|
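The header layout and the two size formulas can be put together in a small parser. This is a sketch under stated assumptions: the function name `parse_page_header` and the dictionary keys are chosen for this example, and no checksum verification is performed. The format string follows the field sizes listed above, with all multi-byte fields stored least significant byte first.

```python
import struct

# capture_pattern, version, header_type, granule_position,
# serial number, page sequence number, CRC, number_page_segments
HEADER_FORMAT = "<4sBBqIIIB"  # little-endian, 27 bytes in total

# Largest possible page: 27 byte fixed header, 255 lacing values,
# and 255 segments of 255 bytes each.
MAX_PAGE_SIZE = 27 + 255 + 255 * 255


def parse_page_header(data: bytes) -> dict:
    """Parse the page header at the start of `data`."""
    (capture, version, header_type, granule, serial,
     sequence, crc, n_segments) = struct.unpack_from(HEADER_FORMAT, data)
    if capture != b"OggS":
        raise ValueError("capture pattern not found")
    segment_table = list(data[27:27 + n_segments])
    if len(segment_table) != n_segments:
        raise ValueError("truncated segment table")
    header_size = n_segments + 27
    return {
        "version": version,
        "continued": bool(header_type & 0x01),
        "bos": bool(header_type & 0x02),
        "eos": bool(header_type & 0x04),
        "granule_position": granule,  # signed, so the special value reads as -1
        "serial": serial,
        "sequence": sequence,
        "crc": crc,
        "segment_table": segment_table,
        "header_size": header_size,
        "page_size": header_size + sum(segment_table),
    }
```

With the maximum of 255 lacing values of 255 each, the page size formula gives 27 + 255 + 65025 = 65307 bytes, which matches the maximum page size stated at the beginning of this section.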
| 598 | |
---|
| 599 | 7. Security Considerations |
---|
| 600 | |
---|
| 601 | The Ogg encapsulation format is a container format and only |
---|
| 602 | encapsulates content (such as Vorbis-encoded audio). It does not |
---|
| 603 | provide for any generic encryption or signing of itself or its |
---|
| 604 | contained content bitstreams. However, it encapsulates any kind of |
---|
| 605 | content bitstream as long as there is a codec for it, and is thus |
---|
| 606 | able to contain encrypted and signed content data. It is also |
---|
| 607 | possible to add an external security mechanism that encrypts or signs |
---|
| 608 | an Ogg physical bitstream and thus provides content confidentiality |
---|
| 609 | and authenticity. |
---|
| 610 | |
---|
| 611 | As Ogg encapsulates binary data, it is possible to include executable |
---|
| 612 | content in an Ogg bitstream. This can be an issue with applications |
---|
| 613 | that are implemented using the Ogg format, especially when Ogg is |
---|
| 614 | used for streaming or file transfer in a networking scenario. As |
---|
| 623 | such, Ogg does not pose a threat there. However, an application |
---|
| 624 | decoding Ogg and its encapsulated content bitstreams has to ensure |
---|
| 625 | correct handling of manipulated bitstreams and to guard against |
---|
| 626 | buffer overflows and the like. |
---|
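One way to approach the handling of manipulated bitstreams is to validate every length taken from a page header before using it. The Python sketch below is a minimal illustration, not a complete demultiplexer. It assumes the page header layout defined in section 6: the capture pattern "OggS" in Bytes 0..3, the stream structure version in Byte 4, and number_page_segments in Byte 26. The function name read_page is hypothetical.

```python
def read_page(buf: bytes, offset: int = 0):
    """Return (page, next_offset), or raise ValueError on malformed input."""
    fixed_end = offset + 27  # end of the fixed-size header fields
    if offset < 0 or fixed_end > len(buf):
        raise ValueError("truncated page header")
    if buf[offset:offset + 4] != b"OggS":
        raise ValueError("missing capture pattern")
    if buf[offset + 4] != 0:
        raise ValueError("unsupported stream structure version")

    # Check that the segment table is present before reading it.
    number_page_segments = buf[offset + 26]
    table_end = fixed_end + number_page_segments
    if table_end > len(buf):
        raise ValueError("truncated segment table")

    # Check that the body promised by the lacing values is present.
    page_end = table_end + sum(buf[fixed_end:table_end])
    if page_end > len(buf):
        raise ValueError("truncated page body")
    return buf[offset:page_end], page_end
```

A caller would typically also verify the CRC checksum of the returned page before handing its content to a codec.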
| 627 | |
---|
| 628 | 8. References |
---|
| 629 | |
---|
| 630 | [1] Walleij, L., "The application/ogg Media Type", RFC 3534, May |
---|
| 631 | 2003. |
---|
| 632 | |
---|
| 633 | [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement |
---|
| 634 | Levels", BCP 14, RFC 2119, March 1997. |
---|
| 635 | |
---|
| 679 | Appendix A. Glossary of terms and abbreviations |
---|
| 680 | |
---|
| 681 | bos page: The initial page (beginning of stream) of a logical |
---|
| 682 | bitstream which contains information to identify the codec type |
---|
| 683 | and other decoding-relevant information. |
---|
| 684 | |
---|
| 685 | chaining (or sequential multiplexing): Concatenation of two or more |
---|
| 686 | complete physical Ogg bitstreams. |
---|
| 687 | |
---|
| 688 | eos page: The final page (end of stream) of a logical bitstream. |
---|
| 689 | |
---|
| 690 | granule position: An increasing position number for a specific |
---|
| 691 | logical bitstream stored in the page header. Its meaning is |
---|
| 692 | dependent on the codec for that logical bitstream and specified in |
---|
| 693 | a specific media mapping. |
---|
| 694 | |
---|
| 695 | grouping (or concurrent multiplexing): Interleaving of pages of |
---|
| 696 | several logical bitstreams into one complete physical Ogg |
---|
| 697 | bitstream under the restriction that all bos pages of all grouped |
---|
| 698 | logical bitstreams MUST appear before any data pages. |
---|
| 699 | |
---|
| 700 | lacing value: An entry in the segment table of a page header |
---|
| 701 | representing the size of the related segment. |
---|
| 702 | |
---|
| 703 | logical bitstream: A sequence of bits that results from encoding a |
---|
| 704 | media stream. |
---|
| 705 | |
---|
| 706 | media mapping: A specific use of the Ogg encapsulation format |
---|
| 707 | together with a specific (set of) codec(s). |
---|
| 708 | |
---|
| 709 | (Ogg) packet: A subpart of a logical bitstream that is created by the |
---|
| 710 | encoder for that bitstream and represents a meaningful entity for |
---|
| 711 | the encoder, but only a sequence of bits to the Ogg encapsulation. |
---|
| 712 | |
---|
| 713 | (Ogg) page: A physical bitstream consists of a sequence of Ogg pages, |
---|
| 714 | each containing data of one logical bitstream only. A page usually |
---|
| 715 | contains a group of contiguous segments of one packet only, but |
---|
| 716 | sometimes packets are too large and need to be split over several |
---|
| 717 | pages. |
---|
| 718 | |
---|
| 719 | physical (Ogg) bitstream: The sequence of bits resulting from an Ogg |
---|
| 720 | encapsulation of one or several logical bitstreams. It consists |
---|
| 721 | of a sequence of pages from the logical bitstreams with the |
---|
| 722 | restriction that the pages of one logical bitstream MUST come in |
---|
| 723 | their correct temporal order. |
---|
| 724 | |
---|
| 735 | (Ogg) segment: The Ogg encapsulation process splits each packet into |
---|
| 736 | chunks of 255 bytes plus a last fractional chunk of less than 255 |
---|
| 737 | bytes. These chunks are called segments. |
---|
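The segmentation rule in the last entry can be sketched in Python as follows. The sketch assumes that the last fractional chunk may be empty, so that a packet whose length is an exact multiple of 255 ends with a lacing value of 0. Both function names are hypothetical.

```python
def lacing_values(packet_length: int) -> list:
    """Lacing values for one packet: full 255-Byte chunks plus a last chunk."""
    if packet_length < 0:
        raise ValueError("packet length must not be negative")
    full_chunks, last_chunk = divmod(packet_length, 255)
    return [255] * full_chunks + [last_chunk]


def packet_lengths(segment_table) -> list:
    """Lengths of the packets that end within the given lacing values."""
    lengths, current = [], 0
    for value in segment_table:
        current += value
        if value < 255:  # a value below 255 closes the packet
            lengths.append(current)
            current = 0
    # A trailing run of 255s belongs to a packet that continues on the
    # next page; it is deliberately not reported here.
    return lengths


print(lacing_values(600))  # [255, 255, 90]
print(lacing_values(510))  # [255, 255, 0]
print(packet_lengths([255, 255, 90, 255, 255, 0]))  # [600, 510]
```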
| 738 | |
---|
| 739 | Appendix B. Acknowledgements |
---|
| 740 | |
---|
| 741 | The author gratefully acknowledges the work that Christopher |
---|
| 742 | Montgomery and the Xiph.Org foundation have done in defining the Ogg |
---|
| 743 | multimedia project and as part of it the open file format described |
---|
| 744 | in this document. The author hopes that providing this document to |
---|
| 745 | the Internet community will help in promoting the Ogg multimedia |
---|
| 746 | project at http://www.xiph.org/. Many thanks also for the many |
---|
| 747 | technical and typo corrections that C. Montgomery and the Ogg |
---|
| 748 | community provided as feedback to this RFC. |
---|
| 749 | |
---|
| 750 | Author's Address |
---|
| 751 | |
---|
| 752 | Silvia Pfeiffer |
---|
| 753 | CSIRO, Australia |
---|
| 754 | Locked Bag 17 |
---|
| 755 | North Ryde, NSW 2113 |
---|
| 756 | Australia |
---|
| 757 | |
---|
| 758 | Phone: +61 2 9325 3141 |
---|
| 759 | EMail: Silvia.Pfeiffer@csiro.au |
---|
| 760 | URI: http://www.cmis.csiro.au/Silvia.Pfeiffer/ |
---|
| 761 | |
---|
| 791 | Full Copyright Statement |
---|
| 792 | |
---|
| 793 | Copyright (C) The Internet Society (2003). All Rights Reserved. |
---|
| 794 | |
---|
| 795 | This document and translations of it may be copied and furnished to |
---|
| 796 | others, and derivative works that comment on or otherwise explain it |
---|
| 797 | or assist in its implementation may be prepared, copied, published |
---|
| 798 | and distributed, in whole or in part, without restriction of any |
---|
| 799 | kind, provided that the above copyright notice and this paragraph are |
---|
| 800 | included on all such copies and derivative works. However, this |
---|
| 801 | document itself may not be modified in any way, such as by removing |
---|
| 802 | the copyright notice or references to the Internet Society or other |
---|
| 803 | Internet organizations, except as needed for the purpose of |
---|
| 804 | developing Internet standards in which case the procedures for |
---|
| 805 | copyrights defined in the Internet Standards process must be |
---|
| 806 | followed, or as required to translate it into languages other than |
---|
| 807 | English. |
---|
| 808 | |
---|
| 809 | The limited permissions granted above are perpetual and will not be |
---|
| 810 | revoked by the Internet Society or its successors or assigns. |
---|
| 811 | |
---|
| 812 | This document and the information contained herein is provided on an |
---|
| 813 | "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING |
---|
| 814 | TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING |
---|
| 815 | BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION |
---|
| 816 | HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF |
---|
| 817 | MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. |
---|
| 818 | |
---|
| 819 | Acknowledgement |
---|
| 820 | |
---|
| 821 | Funding for the RFC Editor function is currently provided by the |
---|
| 822 | Internet Society. |
---|