source: downloads/libvorbis-1.2.0/doc/stereo.html @ 16

Last change on this file since 16 was 16, checked in by landauf, 18 years ago
added libvorbis
File size: 16.2 KB

Rev	Line
[16]	1	<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
	2	<html>
	3	<head>
	4
	5	<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
	6	<title>Ogg Vorbis Documentation</title>
	7
	8	<style type="text/css">
	9	body {
	10	margin: 0 18px 0 18px;
	11	padding-bottom: 30px;
	12	font-family: Verdana, Arial, Helvetica, sans-serif;
	13	color: #333333;
	14	font-size: .8em;
	15	}
	16
	17	a {
	18	color: #3366cc;
	19	}
	20
	21	img {
	22	border: 0;
	23	}
	24
	25	#xiphlogo {
	26	margin: 30px 0 16px 0;
	27	}
	28
	29	#content p {
	30	line-height: 1.4;
	31	}
	32
	33	h1, h1 a, h2, h2 a, h3, h3 a, h4, h4 a {
	34	font-weight: bold;
	35	color: #ff9900;
	36	margin: 1.3em 0 8px 0;
	37	}
	38
	39	h1 {
	40	font-size: 1.3em;
	41	}
	42
	43	h2 {
	44	font-size: 1.2em;
	45	}
	46
	47	h3 {
	48	font-size: 1.1em;
	49	}
	50
	51	li {
	52	line-height: 1.4;
	53	}
	54
	55	#copyright {
	56	margin-top: 30px;
	57	line-height: 1.5em;
	58	text-align: center;
	59	font-size: .8em;
	60	color: #888888;
	61	clear: both;
	62	}
	63	</style>
	64
	65	</head>
	66
	67	<body>
	68
	69	<div id="xiphlogo">
	70	<a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
	71	</div>
	72
	73	<h1>Ogg Vorbis stereo-specific channel coupling discussion</h1>
	74
	75	<h2>Abstract</h2>
	76
	77	<p>The Vorbis audio CODEC provides channel coupling
	78	mechanisms designed to reduce effective bitrate by both eliminating
	79	interchannel redundancy and eliminating stereo image information
	80	labeled inaudible or undesirable according to spatial psychoacoustic
	81	models. This document describes the mechanical coupling
	82	mechanisms available within the Vorbis specification and the
	83	specific stereo coupling models used by the reference
	84	<tt>libvorbis</tt> codec provided by xiph.org.</p>
	85
	86	<h2>Mechanisms</h2>
	87
	88	<p>In encoder release beta 4 and earlier, Vorbis supported multiple
	89	channel encoding, but the channels were encoded entirely separately
	90	with no cross-analysis or redundancy elimination between channels.
	91	This multichannel strategy is very similar to mp3's <em>dual
	92	stereo</em> mode, and Vorbis uses the same name for its analogous
	93	uncoupled multichannel modes.</p>
	94
	95	<p>However, the Vorbis spec provides for, and Vorbis release 1.0 rc1 and
	96	later implement, a coupled channel strategy. Vorbis has two specific
	97	mechanisms that may be used alone or in conjunction to implement
	98	channel coupling. The first is <em>channel interleaving</em> via
	99	residue backend type 2, and the second is <em>square polar
	100	mapping</em>. These two general mechanisms are particularly well
	101	suited to coupling due to the structure of Vorbis encoding, as we'll
	102	explore below, and using both we can implement both totally
	103	<em>lossless stereo image coupling</em> [bit-for-bit decode-identical
	104	to uncoupled modes], as well as various lossy models that seek to
	105	eliminate inaudible or unimportant aspects of the stereo image in
	106	order to reduce bitrate. The exact coupling implementation is
	107	generalized to allow the encoder a great deal of flexibility in
	108	implementation of a stereo or surround model without requiring any
	109	significant complexity increase over the combinatorially simpler
	110	mid/side joint stereo of mp3 and other current audio codecs.</p>
	111
	112	<p>A particular Vorbis bitstream may apply channel coupling directly to
	113	more than a pair of channels; polar mapping is hierarchical such that
	114	polar coupling may be extrapolated to an arbitrary number of channels
	115	and is not restricted to only stereo, quadraphonics, ambisonics or 5.1
	116	surround. However, the scope of this document restricts itself to the
	117	stereo coupling case.</p>
	118
	119	<h3>Square Polar Mapping</h3>
	120
	121	<h4>maximal correlation</h4>
	122
	123	<p>Recall that the basic structure of a Vorbis I stream first generates
	124	from input audio a spectral 'floor' function that serves as an
	125	MDCT-domain whitening filter. This floor is meant to represent the
	126	rough envelope of the frequency spectrum, using whatever metric the
	127	encoder cares to define. This floor is subtracted from the log
	128	frequency spectrum, effectively normalizing the spectrum by frequency.
	129	Each input channel is associated with a unique floor function.</p>
	130
	131	<p>The basic idea behind any stereo coupling is that the left and right
	132	channels usually correlate. This correlation is even stronger if one
	133	first accounts for energy differences in any given frequency band
	134	across left and right; think for example of individual instruments
	135	mixed into different portions of the stereo image, or a stereo
	136	recording with a dominant feature not perfectly in the center. The
	137	floor functions, each specific to a channel, provide the perfect means
	138	of normalizing left and right energies across the spectrum to maximize
	139	correlation before coupling. This feature of the Vorbis format is not
	140	a convenient accident.</p>
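<p>As a toy illustration of why per-channel floors help, the sketch below
assumes, purely for illustration, that each channel's spectrum and floor are
held as arrays of dB values. The array layout and the name
<tt>whiten_log_spectrum</tt> are hypothetical and are not part of the
libvorbis API.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Subtract a channel's floor from its log (dB) spectrum, leaving the
 * whitened residue. Hypothetical helper for illustration only. */
void whiten_log_spectrum(const float *spectrum_db, const float *floor_db,
                         size_t n, float *residue_db)
{
    assert(spectrum_db && floor_db && residue_db);
    for (size_t i = 0; i < n; i++) {
        residue_db[i] = spectrum_db[i] - floor_db[i];
    }
}
```

<p>If the right channel carries the same material as the left but 6 dB
quieter, and each floor tracks its own channel, the two residues come out
identical, which is the correlation the coupling step goes on to
exploit.</p>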
	141
	142	<p>Because we strive to maximally correlate the left and right channels
	143	and generally succeed in doing so, left and right residue is typically
	144	nearly identical. We could use channel interleaving (discussed below)
	145	alone to efficiently remove the redundancy between the left and right
	146	channels as a side effect of entropy encoding, but a polar
	147	representation gives benefits when left/right correlation is
	148	strong.</p>
	149
	150	<h4>point and diffuse imaging</h4>
	151
	152	<p>The first advantage of a polar representation is that it effectively
	153	separates the spatial audio information into a 'point image'
	154	(magnitude) at a given frequency and located somewhere in the sound
	155	field, and a 'diffuse image' (angle) that fills a large amount of
	156	space simultaneously. Even if we preserve only the magnitude (point)
	157	data, a detailed and carefully chosen floor function in each channel
	158	provides us with a free, fine-grained, frequency relative intensity
	159	stereo*. Angle information represents diffuse sound fields, such as
	160	reverberation that fills the entire space simultaneously.</p>
	161
	162	<p>*<em>Because the Vorbis model supports a number of different possible
	163	stereo models and these models may be mixed, we do not use the term
	164	'intensity stereo' talking about Vorbis; instead we use the terms
	165	'point stereo', 'phase stereo' and subcategories of each.</em></p>
	166
	167	<p>The majority of a stereo image is representable by polar magnitude
	168	alone, as strong sounds tend to be produced at near-point sources;
	169	even non-diffuse, fast, sharp echoes track very accurately using
	170	magnitude representation almost alone (for those experimenting with
	171	Vorbis tuning, this strategy works much better with the precise,
	172	piecewise control of floor 1; the continuous approximation of floor 0
	173	results in unstable imaging). Reverberation and diffuse sounds tend
	174	to contain less energy and be psychoacoustically dominated by the
	175	point sources embedded in them. Thus, we again tend to concentrate
	176	more represented energy into a predictably smaller number of values.
	177	Separating representation of point and diffuse imaging also allows us
	178	to model and manipulate point and diffuse qualities separately.</p>
	179
	180	<h4>controlling bit leakage and symbol crosstalk</h4>
	181
	182	<p>Because polar
	183	representation concentrates represented energy into fewer large
	184	values, we reduce bit 'leakage' during cascading (multistage VQ
	185	encoding) as a secondary benefit. A single large, monolithic VQ
	186	codebook is more efficient than a cascaded book due to entropy
	187	'crosstalk' among symbols between different stages of a multistage cascade.
	188	Polar representation is a way of further concentrating entropy into
	189	predictable locations so that codebook design can take steps to
	190	improve multistage codebook efficiency. It also allows us to cascade
	191	various elements of the stereo image independently.</p>
	192
	193	<h4>eliminating trigonometry and rounding</h4>
	194
	195	<p>Rounding and computational complexity are potential problems with a
	196	polar representation. As our encoding process involves quantization,
	197	mixing a polar representation and quantization makes it potentially
	198	impossible, depending on implementation, to construct a coupled stereo
	199	mechanism that results in bit-identical decompressed output compared
	200	to an uncoupled encoding should the encoder desire it.</p>
	201
	202	<p>Vorbis uses a mapping that preserves the most useful qualities of
	203	polar representation, relies only on addition/subtraction (during
	204	decode; high quality encoding still requires some trig), and makes it
	205	trivial before or after quantization to represent an angle/magnitude
	206	through a one-to-one mapping from possible left/right value
	207	permutations. We do this by basing our polar representation on the
	208	unit square rather than the unit-circle.</p>
	209
	210	<p>Given a magnitude and angle, we recover left and right using the
	211	following function (note that A/B may be left/right or right/left
	212	depending on the coupling definition used by the encoder):</p>
	213
	214	<pre>
	215	if(magnitude>0){            /* positive magnitude */
	216	  if(angle>0){
	217	    A=magnitude;
	218	    B=magnitude-angle;
	219	  }else{
	220	    B=magnitude;
	221	    A=magnitude+angle;
	222	  }
	223	}else{                      /* zero or negative magnitude */
	224	  if(angle>0){
	225	    A=magnitude;
	226	    B=magnitude+angle;
	227	  }else{
	228	    B=magnitude;
	229	    A=magnitude-angle;
	230	  }
	231	}
	232	</pre>
	233
	234	<p>The function is antisymmetric for positive and negative magnitudes in
	235	order to eliminate a redundant value when quantizing. For example, if
	236	we're quantizing to integer values, we can visualize a magnitude of 5
	237	and an angle of -2 as follows:</p>
	238
	239	<p><img src="squarepolar.png" alt="square polar"/></p>
	240
	241	<p>This representation loses or replicates no values; if the range of A
	242	and B are integral -5 through 5, the number of possible Cartesian
	243	permutations is 121. Represented in square polar notation, the
	244	possible values are:</p>
	245
	246	<pre>
	247	0, 0
	248
	249	-1,-2 -1,-1 -1, 0 -1, 1
	250
	251	1,-2 1,-1 1, 0 1, 1
	252
	253	-2,-4 -2,-3 -2,-2 -2,-1 -2, 0 -2, 1 -2, 2 -2, 3
	254
	255	2,-4 2,-3 ... following the pattern ...
	256
	257	... 5, 1 5, 2 5, 3 5, 4 5, 5 5, 6 5, 7 5, 8 5, 9
	258
	259	</pre>
	260
	261	<p>...for a grand total of 121 possible values, the same number as in
	262	Cartesian representation (note that, for example, <tt>5,-10</tt> is
	263	the same as <tt>-5,10</tt>, so there's no reason to represent
	264	both. 2,10 cannot happen, and there's no reason to account for it.)
	265	It's also obvious that this mapping is exactly reversible.</p>
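<p>The reversibility claim can be checked mechanically. The C sketch below
pairs the decode mapping from the text with one possible integer inverse. The
inverse, the function names, and the round-trip counter are assumptions made
for this illustration; they are derived from the decode rule and are not
taken from libvorbis.</p>

```c
#include <assert.h>
#include <stdlib.h>  /* abs */

/* Decode: the square polar mapping given in the text. */
void square_polar_decode(int magnitude, int angle, int *a, int *b)
{
    if (magnitude > 0) {
        if (angle > 0) { *a = magnitude; *b = magnitude - angle; }
        else           { *b = magnitude; *a = magnitude + angle; }
    } else {
        if (angle > 0) { *a = magnitude; *b = magnitude + angle; }
        else           { *b = magnitude; *a = magnitude - angle; }
    }
}

/* Encode: an illustrative inverse. The value with the larger absolute
 * size becomes the magnitude (ties go to B); the angle is the signed
 * difference, oriented so that decode reproduces (a, b). */
void square_polar_encode(int a, int b, int *magnitude, int *angle)
{
    if (abs(a) > abs(b)) {
        *magnitude = a;
        *angle = (a > 0) ? a - b : b - a;   /* always positive here */
    } else {
        *magnitude = b;
        *angle = (b > 0) ? a - b : b - a;   /* always zero or negative */
    }
}

/* Count the (a, b) pairs in [-range, range]^2 that survive
 * encode followed by decode unchanged. */
int count_round_trips(int range)
{
    int ok = 0;
    assert(range >= 0);
    for (int a = -range; a <= range; a++) {
        for (int b = -range; b <= range; b++) {
            int m, g, ra, rb;
            square_polar_encode(a, b, &m, &g);
            square_polar_decode(m, g, &ra, &rb);
            if (ra == a && rb == b) ok++;
        }
    }
    return ok;
}
```

<p>With a range of 5, every one of the 121 Cartesian pairs should survive
the round trip, and decoding a magnitude of 5 with an angle of -2 gives
A=3, B=5.</p>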
	266
	267	<h3>Channel interleaving</h3>
	268
	269	<p>We can remap an A/B vector using polar mapping into a magnitude/angle
	270	vector, and it's clear that, in general, this concentrates energy in
	271	the magnitude vector and reduces the amount of information to encode
	272	in the angle vector. Encoding these vectors independently with
	273	residue backend #0 or residue backend #1 will result in bitrate
	274	savings. However, there are still implicit correlations between the
	275	magnitude and angle vectors. The most obvious is that the amplitude
	276	of the angle is bounded by its corresponding magnitude value.</p>
	277
	278	<p>Entropy coding the results, then, further benefits from the entropy
	279	model being able to compress magnitude and angle simultaneously. For
	280	this reason, Vorbis implements residue backend #2 which pre-interleaves
	281	a number of input vectors (in the stereo case, two, A and B) into a
	282	single output vector (with the elements in the order of
	283	A_0, B_0, A_1, B_1, A_2 ... A_n-1, B_n-1) before entropy encoding. Thus
	284	each vector to be coded by the vector quantization backend consists of
	285	matching magnitude and angle values.</p>
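<p>The element ordering described above can be sketched in a few lines of C.
This shows only the interleaving step for the two-vector (stereo) case; the
function names are hypothetical, and the real residue backend #2 also
involves partitioning and codebook selection that are not shown here.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Interleave two n-element vectors into one 2n-element vector in the
 * order A_0, B_0, A_1, B_1, ..., A_n-1, B_n-1. */
void interleave2(const float *a, const float *b, size_t n, float *out)
{
    assert(a && b && out);
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = a[i];
        out[2 * i + 1] = b[i];
    }
}

/* Inverse operation: split an interleaved 2n-element vector back
 * into its two n-element source vectors. */
void deinterleave2(const float *in, size_t n, float *a, float *b)
{
    assert(in && a && b);
    for (size_t i = 0; i < n; i++) {
        a[i] = in[2 * i];
        b[i] = in[2 * i + 1];
    }
}
```

<p>When A and B hold the magnitude and angle vectors, each adjacent pair in
the output is a matching magnitude/angle pair, so a codebook spanning an even
number of elements sees both values together.</p>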
	286
	287	<p>The astute reader, at this point, will notice that in the theoretical
	288	case in which we can use monolithic codebooks of arbitrarily large
	289	size, we can directly interleave and encode left and right without
	290	polar mapping; in fact, the polar mapping does not appear to lend any
	291	benefit whatsoever to the efficiency of the entropy coding. Indeed,
	292	it is perfectly possible and reasonable to build a Vorbis encoder that
	293	dispenses with polar mapping entirely and merely interleaves the
	294	channels. Libvorbis-based encoders may configure such an encoding and
	295	it will work as intended.</p>
	296
	297	<p>However, when we leave the ideal/theoretical domain, we notice that
	298	polar mapping does give additional practical benefits, as discussed in
	299	the above section on polar mapping and summarized again here:</p>
	300
	301	<ul>
	302	<li>Polar mapping aids in controlling entropy 'leakage' between stages
	303	of a cascaded codebook.</li>
	304	<li>Polar mapping separates the stereo image
	305	into point and diffuse components which may be analyzed and handled
	306	differently.</li>
	307	</ul>
	308
	309	<h2>Stereo Models</h2>
	310
	311	<h3>Dual Stereo</h3>
	312
	313	<p>Dual stereo refers to stereo encoding where the channels are entirely
	314	separate; they are analyzed and encoded as entirely distinct entities.
	315	This terminology is familiar from mp3.</p>
	316
	317	<h3>Lossless Stereo</h3>
	318
	319	<p>Using polar mapping and/or channel interleaving, it's possible to
	320	couple Vorbis channels losslessly, that is, construct a stereo
	321	coupling encoding that both saves space and decodes
	322	bit-identically to dual stereo. OggEnc 1.0 and later use this
	323	mode in all high-bitrate encoding.</p>
	324
	325	<p>Overall, this stereo mode is overkill; however, it offers a safe
	326	alternative to users concerned about the slightest possible
	327	degradation to the stereo image or archival quality audio.</p>
	328
	329	<h3>Phase Stereo</h3>
	330
	331	<p>Phase stereo is the least aggressive means of gracefully dropping
	332	resolution from the stereo image; it affects only diffuse imaging.</p>
	333
	334	<p>It's often quoted that the human ear is deaf to signal phase above
	335	about 4kHz; this is nearly true and a passable rule of thumb, but it
	336	can be demonstrated that even an average user can tell the difference
	337	between high frequency in-phase and out-of-phase noise. Obviously
	338	then, the statement is not entirely true. However, it's also the case
	339	that one must resort to nearly such an extreme demonstration before
	340	finding the counterexample.</p>
	341
	342	<p>'Phase stereo' is simply a more aggressive quantization of the polar
	343	angle vector; above 4kHz it's generally quite safe to quantize noise
	344	and noisy elements to only a handful of allowed phases, or to thin the
	345	phase with respect to the magnitude. The phases of high amplitude
	346	pure tones may or may not be preserved more carefully (they are
	347	relatively rare and L/R tend to be in phase, so there is generally
	348	little reason not to spend a few more bits on them).</p>
	349
	350	<h4>example: eight phase stereo</h4>
	351
	352	<p>Vorbis may implement phase stereo coupling by preserving the entirety
	353	of the magnitude vector (essential to fine amplitude and energy
	354	resolution overall) and quantizing the angle vector to one of only
	355	four possible values. Given that the magnitude vector may be positive
	356	or negative, this results in left and right phase having eight
	357	possible permutations, thus 'eight phase stereo':</p>
	358
	359	<p><img src="eightphase.png" alt="eight phase"/></p>
	360
	361	<p>Left and right may be in phase (positive or negative), the most common
	362	case by far, or out of phase by 90 or 180 degrees.</p>
	363
	364	<h4>example: four phase stereo</h4>
	365
	366	<p>Similarly, four phase stereo takes the quantization one step further;
	367	it allows only in-phase and 180 degree out-of-phase signals:</p>
	368
	369	<p><img src="fourphase.png" alt="four phase"/></p>
	370
	371	<h3>Point Stereo</h3>
	372
	373	<p>Point stereo eliminates the possibility of out-of-phase signal
	374	entirely. Any diffuse quality to a sound source tends to collapse
	375	inward to a point somewhere within the stereo image. A practical
	376	example would be balanced reverberations within a large, live space;
	377	normally the sound is diffuse and soft, giving a sonic impression of
	378	volume. In point stereo, the reverberations would still exist, but
	379	sound fairly firmly centered within the image (assuming the
	380	reverberation was centered overall; if the reverberation is stronger
	381	to the left, then the point of localization in point stereo would be
	382	to the left). This effect is most noticeable at low and mid
	383	frequencies and using headphones (which grant perfect stereo
	384	separation). Point stereo is a graceful but generally easy to
	385	detect degradation to the sound quality and is thus used in frequency
	386	ranges where it is least noticeable.</p>
	387
	388	<h3>Mixed Stereo</h3>
	389
	390	<p>Mixed stereo is the simultaneous use of more than one of the above
	391	stereo encoding models, generally using more aggressive modes in
	392	higher frequencies, lower amplitudes or 'nearly' in-phase sound.</p>
	393
	394	<p>It is also the case that near-DC frequencies should be encoded using
	395	lossless coupling to avoid frame blocking artifacts.</p>
	396
	397	<h3>Vorbis Stereo Modes</h3>
	398
	399	<p>Vorbis, as of 1.0, uses lossless stereo and a number of mixed modes
	400	constructed out of lossless and point stereo. Phase stereo was used
	401	in the rc2 encoder, but is not currently used for simplicity's sake. It
	402	will likely be re-added to the stereo model in the future.</p>
	403
	404	<div id="copyright">
	405	The Xiph Fish Logo is a
	406	trademark (™) of Xiph.Org.<br/>
	407
	408	These pages © 1994 - 2005 Xiph.Org. All rights reserved.
	409	</div>
	410
	411	</body>
	412	</html>
	413
	414
	415
	416
	417
	418
