Planet
navi homePPSaboutscreenshotsdownloaddevelopmentforum

source: downloads/boost_1_33_1/libs/numeric/interval/doc/rounding.htm @ 12

Last change on this file since 12 was 12, checked in by landauf, 17 years ago

added boost

File size: 25.5 KB
Line 
1<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
2    "http://www.w3.org/TR/html4/loose.dtd">
3<html>
4<head>
5  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
6  <link rel="stylesheet" type="text/css" href="../../../../boost.css">
7  <title>Rounding Policies</title>
8</head>
9
10<body lang="en">
11<h1>Rounding Policies</h1>
12
13<p>In order to be as general as possible, the library uses a class to compute
14all the necessary functions rounded upward or downward. This class is the
15first parameter of <code>policies</code>, it is also the type named
16<code>rounding</code> in the policy definition of <code>interval</code>.</p>
17
18<p>By default, it is <code>interval_lib::rounded_math&lt;T&gt;</code>. The
19class <code>interval_lib::rounded_math</code> is already specialized for the
20standard floating types (<code>float</code> , <code>double</code> and
21<code>long double</code>). So if the base type of your intervals is not one
22of these, a good solution would probably be to provide a specialization of
23this class. But if the default specialization of
24<code>rounded_math&lt;T&gt;</code> for <code>float</code>,
25<code>double</code>, or <code>long double</code> is not what you seek, or you
26do not want to specialize <code>interval_lib::rounded_math&lt;T&gt;</code>
27(say because you prefer to work in your own namespace) you can also define
28your own rounding policy and pass it directly to
29<code>interval_lib::policies</code>.</p>
30
31<h2>Requirements</h2>
32
33<p>Here comes what the class is supposed to provide. The domains are written
34next to their respective functions (as you can see, the functions do not have
35to worry about invalid values, but they have to handle infinite
36arguments).</p>
37<pre>/* Rounding requirements */
38struct rounding {
39  // defaut constructor, destructor
40  rounding();
41  ~rounding();
42  // mathematical operations
43  T add_down(T, T); // [-&#x221e;;+&#x221e;][-&#x221e;;+&#x221e;]
44  T add_up  (T, T); // [-&#x221e;;+&#x221e;][-&#x221e;;+&#x221e;]
45  T sub_down(T, T); // [-&#x221e;;+&#x221e;][-&#x221e;;+&#x221e;]
46  T sub_up  (T, T); // [-&#x221e;;+&#x221e;][-&#x221e;;+&#x221e;]
47  T mul_down(T, T); // [-&#x221e;;+&#x221e;][-&#x221e;;+&#x221e;]
48  T mul_up  (T, T); // [-&#x221e;;+&#x221e;][-&#x221e;;+&#x221e;]
49  T div_down(T, T); // [-&#x221e;;+&#x221e;]([-&#x221e;;+&#x221e;]-{0})
50  T div_up  (T, T); // [-&#x221e;;+&#x221e;]([-&#x221e;;+&#x221e;]-{0})
51  T sqrt_down(T);   // ]0;+&#x221e;]
52  T sqrt_up  (T);   // ]0;+&#x221e;]
53  T exp_down(T);    // [-&#x221e;;+&#x221e;]
54  T exp_up  (T);    // [-&#x221e;;+&#x221e;]
55  T log_down(T);    // ]0;+&#x221e;]
56  T log_up  (T);    // ]0;+&#x221e;]
57  T cos_down(T);    // [0;2&#x3c0;]
58  T cos_up  (T);    // [0;2&#x3c0;]
59  T tan_down(T);    // ]-&#x3c0;/2;&#x3c0;/2[
60  T tan_up  (T);    // ]-&#x3c0;/2;&#x3c0;/2[
61  T asin_down(T);   // [-1;1]
62  T asin_up  (T);   // [-1;1]
63  T acos_down(T);   // [-1;1]
64  T acos_up  (T);   // [-1;1]
65  T atan_down(T);   // [-&#x221e;;+&#x221e;]
66  T atan_up  (T);   // [-&#x221e;;+&#x221e;]
67  T sinh_down(T);   // [-&#x221e;;+&#x221e;]
68  T sinh_up  (T);   // [-&#x221e;;+&#x221e;]
69  T cosh_down(T);   // [-&#x221e;;+&#x221e;]
70  T cosh_up  (T);   // [-&#x221e;;+&#x221e;]
71  T tanh_down(T);   // [-&#x221e;;+&#x221e;]
72  T tanh_up  (T);   // [-&#x221e;;+&#x221e;]
73  T asinh_down(T);  // [-&#x221e;;+&#x221e;]
74  T asinh_up  (T);  // [-&#x221e;;+&#x221e;]
75  T acosh_down(T);  // [1;+&#x221e;]
76  T acosh_up  (T);  // [1;+&#x221e;]
77  T atanh_down(T);  // [-1;1]
78  T atanh_up  (T);  // [-1;1]
79  T median(T, T);   // [-&#x221e;;+&#x221e;][-&#x221e;;+&#x221e;]
80  T int_down(T);    // [-&#x221e;;+&#x221e;]
81  T int_up  (T);    // [-&#x221e;;+&#x221e;]
82  // conversion functions
83  T conv_down(U);
84  T conv_up  (U);
85  // unprotected rounding class
86  typedef ... unprotected_rounding;
87};</pre>
88
89<p>The constructor and destructor of the rounding class have a very important
90semantic requirement: they are responsible for setting and resetting the
91rounding modes of the computation on T. For instance, if T is a standard
92floating point type and floating point computation is performed according to
93the Standard IEEE 754, the constructor can save the current rounding state,
94each <code>_up</code> (resp. <code>_down</code>) function will round up
95(resp. down), and the destructor will restore the saved rounding state.
96Indeed this is the behavior of the default rounding policy.</p>
97
98<p>The meaning of all the mathematical functions up until
99<code>atanh_up</code> is clear: each function returns number representable in
100the type <code>T</code> which is a lower bound (for<code> _down</code>) or
101upper bound (for <code>_up</code>) on the true mathematical result of the
102corresponding function. The function <code>median</code> computes the average
103of its two arguments rounded to its nearest representable number. The
104functions <code>int_down</code> and <code>int_up</code> compute the nearest
105integer smaller or bigger than their argument. Finally,
106<code>conv_down</code> and <code>conv_up</code> are responsible of the
107conversions of values of other types to the base number type: the first one
108must round down the value and the second one must round it up.</p>
109
110<p>The type <code>unprotected_rounding</code> allows to remove all controls.
111For reasons why one might to do this, see the <a
112href="#Protection">protection</a> paragraph below.</p>
113
114<h2>Overview of the provided classes</h2>
115
116<p>A lot of classes are provided. The classes are organized by level. At the
117bottom is the class <code>rounding_control</code>. At the next level come
118<code>rounded_arith_exact</code>, <code>rounded_arith_std</code> and
119<code>rounded_arith_opp</code>. Then there are
120<code>rounded_transc_dummy</code>, <code>rounded_transc_exact</code>,
121<code>rounded_transc_std</code> and <code>rounded_transc_opp</code>. And
122finally are <code>save_state</code> and <code>save_state_nothing</code>. Each
123of these classes provide a set of members that are required by the classes of
124the next level. For example, a <code>rounded_transc_...</code> class needs
125the members of a <code>rounded_arith_...</code> class.</p>
126
127<p>When they exist in two versions <code>_std</code> and <code>_opp</code>,
128the first one does switch the rounding mode each time, and the second one
129tries to keep it oriented toward plus infinity. The main purpose of the
130<code>_opp</code> version is to speed up the computations through the use of
131the "opposite trick" (see the <a href="#perf">performance notes</a>). This
132version requires the rounding mode to be upward before entering any
133computation functions of the class. It guarantees that the rounding mode will
134still be upward at the exit of the functions.</p>
135
136<p>Please note that it is really a very bad idea to mix the <code>_opp</code>
137version with the <code>_std</code> since they do not have compatible
138properties.</p>
139
140<p>There is a third version named <code>_exact</code> which computes the
141functions without changing the rounding mode. It is an "exact" version
142because it is intended for a base type that produces exact results.</p>
143
144<p>The last version is the <code>_dummy</code> version. It does not do any
145computations but still produces compatible results.</p>
146
147<p>Please note that it is possible to use the "exact" version for an inexact
148base type, e.g. <code>float</code> or <code>double</code>. In that case, the
149inclusion property is no longer guaranteed, but this can be useful to speed
150up the computation when the inclusion property is not desired strictly. For
151instance, in computer graphics, a small error due to floating-point roundoff
152is acceptable as long as an approximate version of the inclusion property
153holds.</p>
154
155<p>Here comes what each class defines. Later, when they will be described
156more thoroughly, these members will not be repeated. Please come back here in
157order to see them. Inheritance is also used to avoid repetitions.</p>
158<pre>template &lt;class T&gt;
159struct rounding_control
160{
161  typedef ... rounding_mode;
162  void set_rounding_mode(rounding_mode);
163  void get_rounding_mode(rounding_mode&amp;);
164  void downward ();
165  void upward   ();
166  void to_nearest();
167  T to_int(T);
168  T force_rounding(T);
169};
170
171template &lt;class T, class Rounding&gt;
172struct rounded_arith_... : Rounding
173{
174  void init();
175  T add_down(T, T);
176  T add_up  (T, T);
177  T sub_down(T, T);
178  T sub_up  (T, T);
179  T mul_down(T, T);
180  T mul_up  (T, T);
181  T div_down(T, T);
182  T div_up  (T, T);
183  T sqrt_down(T);
184  T sqrt_up  (T);
185  T median(T, T);
186  T int_down(T);
187  T int_up  (T);
188};
189
190template &lt;class T, class Rounding&gt;
191struct rounded_transc_... : Rounding
192{
193  T exp_down(T);
194  T exp_up  (T);
195  T log_down(T);
196  T log_up  (T);
197  T cos_down(T);
198  T cos_up  (T);
199  T tan_down(T);
200  T tan_up  (T);
201  T asin_down(T);
202  T asin_up  (T);
203  T acos_down(T);
204  T acos_up  (T);
205  T atan_down(T);
206  T atan_up  (T);
207  T sinh_down(T);
208  T sinh_up  (T);
209  T cosh_down(T);
210  T cosh_up  (T);
211  T tanh_down(T);
212  T tanh_up  (T);
213  T asinh_down(T);
214  T asinh_up  (T);
215  T acosh_down(T);
216  T acosh_up  (T);
217  T atanh_down(T);
218  T atanh_up  (T);
219};
220
221template &lt;class Rounding&gt;
222struct save_state_... : Rounding
223{
224  save_state_...();
225  ~save_state_...();
226  typedef ... unprotected_rounding;
227};</pre>
228
229<h2>Synopsis.</h2>
230<pre>namespace boost {
231namespace numeric {
232namespace interval_lib {
233
234<span style="color: #FF0000">/* basic rounding control */</span>
235template &lt;class T&gt;  struct rounding_control;
236
237<span style="color: #FF0000">/* arithmetic functions rounding */</span>
238template &lt;class T, class Rounding = rounding_control&lt;T&gt; &gt; struct rounded_arith_exact;
239template &lt;class T, class Rounding = rounding_control&lt;T&gt; &gt; struct rounded_arith_std;
240template &lt;class T, class Rounding = rounding_control&lt;T&gt; &gt; struct rounded_arith_opp;
241
242<span style="color: #FF0000">/* transcendental functions rounding */</span>
243template &lt;class T, class Rounding&gt; struct rounded_transc_dummy;
244template &lt;class T, class Rounding = rounded_arith_exact&lt;T&gt; &gt; struct rounded_transc_exact;
245template &lt;class T, class Rounding = rounded_arith_std&lt;T&gt; &gt; struct rounded_transc_std;
246template &lt;class T, class Rounding = rounded_arith_opp&lt;T&gt; &gt; struct rounded_transc_opp;
247
248<span style="color: #FF0000">/* rounding-state-saving classes */</span>
249template &lt;class Rounding&gt; struct save_state;
250template &lt;class Rounding&gt; struct save_state_nothing;
251
252<span style="color: #FF0000">/* default policy for type T */</span>
253template &lt;class T&gt;  struct rounded_math;
254template &lt;&gt;  struct rounded_math&lt;float&gt;;
255template &lt;&gt;  struct rounded_math&lt;double&gt;;
256
257<span style="color: #FF0000">/* some metaprogramming to convert a protected to unprotected rounding */</span>
258template &lt;class I&gt; struct unprotect;
259
260} // namespace interval_lib
261} // namespace numeric
262} // namespace boost</pre>
263
264<h2>Description of the provided classes</h2>
265
266<p>We now describe each class in the order they appear in the definition of a
267rounding policy (this outermost-to-innermost order is the reverse order from
268the synopsis).</p>
269
270<h3 id="Protection">Protection control</h3>
271
272<p>Protection refers to the fact that the interval operations will be
273surrounded by rounding mode controls. Unprotecting a class means to remove
274all the rounding controls. Each rounding policy provides a type
275<code>unprotected_rounding</code>. The required type
276<code>unprotected_rounding</code> gives another rounding class that enables
277to work when nested inside rounding. For example, the first three lines below
278should all produce the same result (because the first operation is the
279rounding constructor, and the last is its destructor, which take care of
280setting the rounding modes); and the last line is allowed to have an
281undefined behavior (since no rounding constructor or destructor is ever
282called).</p>
283<pre>T c; { rounding rnd; c = rnd.add_down(a, b); }
284T c; { rounding rnd1; { rounding rnd2; c = rnd2.add_down(a, b); } }
285T c; { rounding rnd1; { rounding::unprotected_rounding rnd2; c = rnd2.add_down(a, b); } }
286T d; { rounding::unprotected_rounding rnd; d = rnd.add_down(a, b); }</pre>
287
288<p>Naturally <code>rounding::unprotected_rounding</code> may simply be
289<code>rounding</code> itself. But it can improve performance if it is a
290simplified version with empty constructor and destructor. In order to avoid
291undefined behaviors, in the library, an object of type
292<code>rounding::unprotected_rounding</code> is guaranteed to be created only
293when an object of type <code>rounding</code> is already alive. See the <a
294href="#perf">performance notes</a> for some additional details.</p>
295
296<p>The support library defines a metaprogramming class template
297<code>unprotect</code> which takes an interval type <code>I</code> and
298returns an interval type <code>unprotect&lt;I&gt;::type</code>  where the
299rounding policy has been unprotected. Some  information about the types:
300<code>interval&lt;T, interval_lib::policies&lt;Rounding, _&gt;
301&gt;::traits_type::rounding</code> <b>is</b> the same type as
302<code>Rounding</code>, and <code>unprotect&lt;interval&lt;T,
303interval_lib::policies&lt;Rounding, _&gt; &gt; &gt;::type</code> <b>is</b>
304the same type as <code>interval&lt;T,
305interval_lib::policies&lt;Rounding::unprotected, _&gt; &gt;</code>.</p>
306
307<h3>State saving</h3>
308
309<p>First comes <code>save_state</code>. This class is responsible for saving
310the current rounding mode and calling init in its constructor, and for
311restoring the saved rounding mode in its destructor. This class also defines
312the <code>unprotected_rounding</code> type.</p>
313
314<p>If the rounding mode does not require any state-saving or initialization,
315<code>save_state_nothing</code> can be used instead of
316<code>save_state</code>.</p>
317
318<h3>Transcendental functions</h3>
319
320<p>The classes <code>rounded_transc_exact</code>,
321<code>rounded_transc_std</code> and <code>rounded_transc_opp</code> expect
322the std namespace to provide the functions exp log cos tan acos asin atan
323cosh sinh tanh acosh asinh atanh. For the <code>_std</code> and
324<code>_opp</code> versions, all these functions should respect the current
325rounding mode fixed by a call to downward or upward.</p>
326
327<p><strong>Please note:</strong> Unfortunately, the latter is rarely the
328case. It is the reason why a class <code>rounded_transc_dummy</code>  is
329provided which does not depend on the functions from the std namespace. There
330is no magic, however. The functions of <code>rounded_transc_dummy</code> do
331not compute anything. They only return valid values. For example,
332<code>cos_down</code> always returns -1. In this way, we do verify the
333inclusion property for the default implementation, even if this has strictly
334no value for the user. In order to have useful values, another policy should
335be used explicitely, which will most likely lead to a violation of the
336inclusion property. In this way, we ensure that the violation is clearly
337pointed out to the user who then knows what he stands against. This class
338could have been used as the default transcendental rounding class, but it was
339decided it would be better for the compilation to fail due to missing
340declarations rather than succeed thanks to valid but unusable functions.</p>
341
342<h3>Basic arithmetic functions</h3>
343
344<p>The classes <code>rounded_arith_std</code> and
345<code>rounded_arith_opp</code> expect the operators + - * / and the function
346<code>std::sqrt</code> to respect the current rounding mode.</p>
347
348<p>The class <code>rounded_arith_exact</code> requires
349<code>std::floor</code> and <code>std::ceil</code> to be defined since it can
350not rely on <code>to_int</code>.</p>
351
352<h3>Rounding control</h3>
353
354<p>The functions defined by each of the previous classes did not need any
355explanation. For example, the behavior of <code>add_down</code> is to compute
356the sum of two numbers rounded downward. For <code>rounding_control</code>,
357the situation is a bit more complex.</p>
358
359<p>The basic function is <code>force_rounding</code> which returns its
360argument correctly rounded accordingly to the current rounding mode if it was
361not already the case. This function is necessary to handle delayed rounding.
362Indeed, depending on the way the computations are done, the intermediate
363results may be internaly stored in a more precise format and it can lead to a
364wrong rounding. So the function enforces the rounding. <a
365href="#extreg">Here</a> is an example of what happens when the rounding is
366not enforced.</p>
367
368<p>The function <code>get_rounding_mode</code> returns the current rounding
369mode, <code>set_rounding_mode</code> sets the rounding mode back to a
370previous value returned by <code>get_rounding_mode</code>.
371<code>downward</code>, <code>upward</code> and <code>to_nearest</code> sets
372the rounding mode in one of the three directions. This rounding mode should
373be global to all the functions that use the type <code>T</code>. For example,
374after a call to <code>downward</code>, <code>force_rounding(x+y)</code> is
375expected to return the sum rounded toward -&#x221e;.</p>
376
377<p>The function <code>to_int</code> computes the nearest integer accordingly
378to the current rounding mode.</p>
379
380<p>The non-specialized version of <code>rounding_control</code> does not do
381anything. The functions for the rounding mode are empty, and
382<code>to_int</code> and <code>force_rounding</code> are identity functions.
383The <code>pi_</code> constant functions return suitable integers (for
384example, <code>pi_up</code> returns <code>T(4)</code>).</p>
385
386<p>The class template <code>rounding_control</code> is specialized for
387<code>float</code>, <code>double</code> and <code>long double</code> in order
388to best use the floating point unit of the computer.</p>
389
390<h2>Template class <tt>rounded_math</tt></h2>
391
392<p>The default policy (aka <code>rounded_math&lt;T&gt;</code>) is simply
393defined as:</p>
394<pre>template &lt;class T&gt; struct rounded_math&lt;T&gt; : save_state_nothing&lt;rounded_arith_exact&lt;T&gt; &gt; {};</pre>
395
396<p>and the specializations for <code>float</code>, <code>double</code> and
397<code>long double</code> use <code>rounded_arith_opp</code>, as in:</p>
398<pre>template &lt;&gt; struct rounded_math&lt;float&gt;       : save_state&lt;rounded_arith_opp&lt;float&gt; &gt;       {};
399template &lt;&gt; struct rounded_math&lt;double&gt;      : save_state&lt;rounded_arith_opp&lt;double&gt; &gt;      {};
400template &lt;&gt; struct rounded_math&lt;long double&gt; : save_state&lt;rounded_arith_opp&lt;long double&gt; &gt; {};</pre>
401
402<h2 id="perf">Performance Issues</h2>
403
404<p>This paragraph deals mostly with the performance of the library with
405intervals using the floating-point unit (FPU) of the computer. Let's consider
406the sum of [<i>a</i>,<i>b</i>] and [<i>c</i>,<i>d</i>] as an example. The
407result is [<code>down</code>(<i>a</i>+<i>c</i>),
408<code>up</code>(<i>b</i>+<i>d</i>)], where <code>down</code> and
409<code>up</code> indicate the rounding mode needed.</p>
410
411<h3>Rounding Mode Switch</h3>
412
413<p>If the FPU is able to use a different rounding mode for each operation,
414there is no problem. For example, it's the case for the Alpha processor: each
415floating-point instruction can specify a different rounding mode. However,
416the IEEE-754 Standard does not require such a behavior. So most of the FPUs
417only provide some instructions to set the rounding mode for all subsequent
418operations. And generally, these instructions need to flush the pipeline of
419the FPU.</p>
420
421<p>In this situation, the time needed to sum [<i>a</i>,<i>b</i>] and
422[<i>c</i>,<i>d</i>] is far worse than the time needed to calculate
423<i>a</i>+<i>b</i> and <i>c</i>+<i>d</i> since the two additions cannot be
424parallelized. Consequently, the objective is to diminish the number of
425rounding mode switches.</p>
426
427<p>If this library is not used to provide exact computations, but only for
428pair arithmetic, the solution is quite simple: do not use rounding. In that
429case, doing the sum [<i>a</i>,<i>b</i>] and [<i>c</i>,<i>d</i>] will be as
430fast as computing <i>a</i>+<i>b</i> and <i>c</i>+<i>d</i>. Everything is
431perfect.</p>
432
433<p>However, if exact computations are required, such a solution is totally
434unthinkable. So, are we penniless? No, there is still a trick available.
435Indeed, down(<i>a</i>+<i>c</i>) = -up(-<i>a</i>-<i>c</i>) if the unary minus
436is an exact operation. It is now possible to calculate the whole sum with the
437same rounding mode. Generally, the cost of the mode switching is worse than
438the cost of the sign changes.</p>
439
440<p>Let's recapitulate. Before, when doing an addition, there were three
441rounding mode switches (down, up and restore). Now, with this little trick,
442there are only two switches (up and restore). It is better, but still a
443bottleneck when many operations are nested. Indeed, the generated code for
444[<i>a</i>,<i>b</i>] + [<i>c</i>,<i>d</i>] + [<i>e</i>,<i>f</i>] will probably
445look like:</p>
446<pre>up();
447t1 = -(-a - c);
448t2 = b + d;
449restore();
450up();
451x = -(-t1 - e);
452y = t2 + f;
453restore();</pre>
454
455<p>If you think it is possible to do much better, you are right. For example,
456this is better (and probably optimal):</p>
457<pre>up();
458x = -(-a - c - e);
459y = b + d + f;
460restore();</pre>
461
462<p>Such a code will be generated by a compiler if the computations are made
463without initialization and restoration of the rounding mode. However, it
464would be far too easy if there were no drawback: because the rounding mode is
465not restored in the meantime, operations on floating-point numbers must be
466prohibited. This method can only be used if all the operations are operations
467on intervals (or operations between an interval and a floating point
468number).</p>
469
470<h4 id="perfexp">Example</h4>
471
472<p>Here is an example of the Horner scheme to compute the value of a polynom.
473The rounding mode switches are disabled for the whole computation.</p>
474<pre>// I is an interval class, the polynom is a simple array
475template&lt;class I&gt;
476I horner(const I&amp; x, const I p[], int n) {
477
478  // save and initialize the rounding mode
479  typename I::traits_type::rounding rnd;
480
481  // define the unprotected version of the interval type
482  typedef typename boost::numeric::interval_lib::unprotect&lt;I&gt;::type R;
483 
484  const R&amp; a = x;
485  R y = p[n - 1];
486  for(int i = n - 2; i &gt;= 0; i--) {
487    y = y * a + (const R&amp;)(p[i]);
488  }
489  return y;
490
491  // restore the rounding mode with the destruction of rnd
492}</pre>
493
494<p>Please note that a rounding object is especially created in order to
495compensate for the protection loss. Each interval of type I is converted in
496an interval of type R before any operations. If this conversion is not done,
497the result is still correct, but the interest of this whole optimization has
498disappeared. Whenever possible, it is good to convert to <code>const
499R&amp;</code> instead of <code>R</code>: indeed, the function could already
500be called inside an unprotection block so the types <code>R</code> and
501<code>I</code> would be the same interval, no need for a conversion.</p>
502
503<h4>Uninteresting remark</h4>
504
505<p>It was said at the beginning that the Alpha processors can use a specific
506rounding mode for each operation. However, due to the instruction format, the
507rounding toward plus infinity is not available. Only the rounding toward
508minus infinity can be used. So the trick using the change of sign becomes
509essential, but there is no need to save and restore the rounding mode on both
510sides of an operation.</p>
511
512<h3 id="extreg">Extended Registers</h3>
513
514<p>There is another problem besides the cost of the rounding mode switch.
515Some FPUs use extended registers (for example, float computations will be
516done with double registers, or double computations with long double
517registers). Consequently, many problems can arise.</p>
518
519<p>The first one is due to to the extended precision of the mantissa. The
520rounding is also done on this extended precision. And consequently, we still
521have down(<i>a</i>+<i>b</i>) = -up(-<i>a</i>-<i>b</i>) in the extended
522registers. But back to the standard precision, we now have
523down(<i>a</i>+<i>b</i>) &lt; -up(-<i>a</i>-<i>b</i>) instead of an equality.
524A solution could be not to use this method. But there still are other
525problems, with the comparisons between numbers for example.</p>
526
527<p>Naturally, there is also a problem with the extended precision of the
528exponent. To illustrate this problem, let <i>m</i> be the biggest number
529before +<i>inf</i>. If we calculate 2*[<i>m</i>,<i>m</i>], the answer should
530be [<i>m</i>,<i>inf</i>]. But due to the extended registers, the FPU will
531first store [<i>2m</i>,<i>2m</i>] and then convert it to
532[<i>inf</i>,<i>inf</i>] at the end of the calculus (when the rounding mode is
533toward +<i>inf</i>). So the answer is no more accurate.</p>
534
535<p>There is only one solution: to force the FPU to convert the extended
536values back to standard precision after each operation. Some FPUs provide an
537instruction able to do this conversion (for example the PowerPC processors).
538But for the FPUs that do not provide it (the x86 processors), the only
539solution is to write the values to memory and read them back. Such an
540operation is obviously very expensive.</p>
541
542<h2>Some Examples</h2>
543
544<p>Here come several cases:</p>
545<ul>
546  <li>if you need precise computations with the <code>float</code> or
547    <code>double</code> types, use the default
548    <code>rounded_math&lt;T&gt;</code>;</li>
549  <li>for fast wide intervals without any rounding nor precision, use
550    <code>save_state_nothing&lt;rounded_transc_exact&lt;T&gt;
551  &gt;</code>;</li>
552  <li>for an exact type (like int or rational with a little help for infinite
553    and NaN values) without support for transcendental functions, the
554    solution could be <code>save_state_nothing&lt;rounded_transc_dummy&lt;T,
555    rounded_arith_exact&lt;T&gt; &gt; &gt;</code> or directly
556    <code>save_state_nothing&lt;rounded_arith_exact&lt;T&gt; &gt;</code>;</li>
557  <li>if it is a more complex case than the previous ones, the best thing is
558    probably to directly define your own policy.</li>
559</ul>
560<hr>
561
562<p>Revised: 2004-02-17<br>
563Copyright (c) Guillaume Melquiond, Sylvain Pion, Hervé Brönnimann, 2002.
564Polytechnic University.<br>
565Copyright (c) Guillaume Melquiond, 2004. ENS Lyon.</p>
566</body>
567</html>
Note: See TracBrowser for help on using the repository browser.