1 | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" |
---|
2 | "http://www.w3.org/TR/html4/loose.dtd"> |
---|
3 | <html> |
---|
4 | <head> |
---|
5 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
---|
6 | <link rel="stylesheet" type="text/css" href="../../../../boost.css"> |
---|
7 | <title>Rounding Policies</title> |
---|
8 | </head> |
---|
9 | |
---|
10 | <body lang="en"> |
---|
11 | <h1>Rounding Policies</h1> |
---|
12 | |
---|
13 | <p>In order to be as general as possible, the library uses a class to compute |
---|
14 | all the necessary functions rounded upward or downward. This class is the |
---|
15 | first parameter of <code>policies</code>, it is also the type named |
---|
16 | <code>rounding</code> in the policy definition of <code>interval</code>.</p> |
---|
17 | |
---|
18 | <p>By default, it is <code>interval_lib::rounded_math<T></code>. The |
---|
19 | class <code>interval_lib::rounded_math</code> is already specialized for the |
---|
20 | standard floating types (<code>float</code> , <code>double</code> and |
---|
21 | <code>long double</code>). So if the base type of your intervals is not one |
---|
22 | of these, a good solution would probably be to provide a specialization of |
---|
23 | this class. But if the default specialization of |
---|
24 | <code>rounded_math<T></code> for <code>float</code>, |
---|
25 | <code>double</code>, or <code>long double</code> is not what you seek, or you |
---|
26 | do not want to specialize <code>interval_lib::rounded_math<T></code> |
---|
27 | (say because you prefer to work in your own namespace) you can also define |
---|
28 | your own rounding policy and pass it directly to |
---|
29 | <code>interval_lib::policies</code>.</p> |
---|
30 | |
---|
31 | <h2>Requirements</h2> |
---|
32 | |
---|
33 | <p>Here comes what the class is supposed to provide. The domains are written |
---|
34 | next to their respective functions (as you can see, the functions do not have |
---|
35 | to worry about invalid values, but they have to handle infinite |
---|
36 | arguments).</p> |
---|
37 | <pre>/* Rounding requirements */ |
---|
38 | struct rounding { |
---|
39 | // defaut constructor, destructor |
---|
40 | rounding(); |
---|
41 | ~rounding(); |
---|
42 | // mathematical operations |
---|
43 | T add_down(T, T); // [-∞;+∞][-∞;+∞] |
---|
44 | T add_up (T, T); // [-∞;+∞][-∞;+∞] |
---|
45 | T sub_down(T, T); // [-∞;+∞][-∞;+∞] |
---|
46 | T sub_up (T, T); // [-∞;+∞][-∞;+∞] |
---|
47 | T mul_down(T, T); // [-∞;+∞][-∞;+∞] |
---|
48 | T mul_up (T, T); // [-∞;+∞][-∞;+∞] |
---|
49 | T div_down(T, T); // [-∞;+∞]([-∞;+∞]-{0}) |
---|
50 | T div_up (T, T); // [-∞;+∞]([-∞;+∞]-{0}) |
---|
51 | T sqrt_down(T); // ]0;+∞] |
---|
52 | T sqrt_up (T); // ]0;+∞] |
---|
53 | T exp_down(T); // [-∞;+∞] |
---|
54 | T exp_up (T); // [-∞;+∞] |
---|
55 | T log_down(T); // ]0;+∞] |
---|
56 | T log_up (T); // ]0;+∞] |
---|
57 | T cos_down(T); // [0;2π] |
---|
58 | T cos_up (T); // [0;2π] |
---|
59 | T tan_down(T); // ]-π/2;π/2[ |
---|
60 | T tan_up (T); // ]-π/2;π/2[ |
---|
61 | T asin_down(T); // [-1;1] |
---|
62 | T asin_up (T); // [-1;1] |
---|
63 | T acos_down(T); // [-1;1] |
---|
64 | T acos_up (T); // [-1;1] |
---|
65 | T atan_down(T); // [-∞;+∞] |
---|
66 | T atan_up (T); // [-∞;+∞] |
---|
67 | T sinh_down(T); // [-∞;+∞] |
---|
68 | T sinh_up (T); // [-∞;+∞] |
---|
69 | T cosh_down(T); // [-∞;+∞] |
---|
70 | T cosh_up (T); // [-∞;+∞] |
---|
71 | T tanh_down(T); // [-∞;+∞] |
---|
72 | T tanh_up (T); // [-∞;+∞] |
---|
73 | T asinh_down(T); // [-∞;+∞] |
---|
74 | T asinh_up (T); // [-∞;+∞] |
---|
75 | T acosh_down(T); // [1;+∞] |
---|
76 | T acosh_up (T); // [1;+∞] |
---|
77 | T atanh_down(T); // [-1;1] |
---|
78 | T atanh_up (T); // [-1;1] |
---|
79 | T median(T, T); // [-∞;+∞][-∞;+∞] |
---|
80 | T int_down(T); // [-∞;+∞] |
---|
81 | T int_up (T); // [-∞;+∞] |
---|
82 | // conversion functions |
---|
83 | T conv_down(U); |
---|
84 | T conv_up (U); |
---|
85 | // unprotected rounding class |
---|
86 | typedef ... unprotected_rounding; |
---|
87 | };</pre> |
---|
88 | |
---|
89 | <p>The constructor and destructor of the rounding class have a very important |
---|
90 | semantic requirement: they are responsible for setting and resetting the |
---|
91 | rounding modes of the computation on T. For instance, if T is a standard |
---|
92 | floating point type and floating point computation is performed according to |
---|
93 | the Standard IEEE 754, the constructor can save the current rounding state, |
---|
94 | each <code>_up</code> (resp. <code>_down</code>) function will round up |
---|
95 | (resp. down), and the destructor will restore the saved rounding state. |
---|
96 | Indeed this is the behavior of the default rounding policy.</p> |
---|
97 | |
---|
98 | <p>The meaning of all the mathematical functions up until |
---|
99 | <code>atanh_up</code> is clear: each function returns number representable in |
---|
100 | the type <code>T</code> which is a lower bound (for<code> _down</code>) or |
---|
101 | upper bound (for <code>_up</code>) on the true mathematical result of the |
---|
102 | corresponding function. The function <code>median</code> computes the average |
---|
103 | of its two arguments rounded to its nearest representable number. The |
---|
104 | functions <code>int_down</code> and <code>int_up</code> compute the nearest |
---|
105 | integer smaller or bigger than their argument. Finally, |
---|
106 | <code>conv_down</code> and <code>conv_up</code> are responsible of the |
---|
107 | conversions of values of other types to the base number type: the first one |
---|
108 | must round down the value and the second one must round it up.</p> |
---|
109 | |
---|
110 | <p>The type <code>unprotected_rounding</code> allows to remove all controls. |
---|
111 | For reasons why one might to do this, see the <a |
---|
112 | href="#Protection">protection</a> paragraph below.</p> |
---|
113 | |
---|
114 | <h2>Overview of the provided classes</h2> |
---|
115 | |
---|
116 | <p>A lot of classes are provided. The classes are organized by level. At the |
---|
117 | bottom is the class <code>rounding_control</code>. At the next level come |
---|
118 | <code>rounded_arith_exact</code>, <code>rounded_arith_std</code> and |
---|
119 | <code>rounded_arith_opp</code>. Then there are |
---|
120 | <code>rounded_transc_dummy</code>, <code>rounded_transc_exact</code>, |
---|
121 | <code>rounded_transc_std</code> and <code>rounded_transc_opp</code>. And |
---|
122 | finally are <code>save_state</code> and <code>save_state_nothing</code>. Each |
---|
123 | of these classes provide a set of members that are required by the classes of |
---|
124 | the next level. For example, a <code>rounded_transc_...</code> class needs |
---|
125 | the members of a <code>rounded_arith_...</code> class.</p> |
---|
126 | |
---|
127 | <p>When they exist in two versions <code>_std</code> and <code>_opp</code>, |
---|
128 | the first one does switch the rounding mode each time, and the second one |
---|
129 | tries to keep it oriented toward plus infinity. The main purpose of the |
---|
130 | <code>_opp</code> version is to speed up the computations through the use of |
---|
131 | the "opposite trick" (see the <a href="#perf">performance notes</a>). This |
---|
132 | version requires the rounding mode to be upward before entering any |
---|
133 | computation functions of the class. It guarantees that the rounding mode will |
---|
134 | still be upward at the exit of the functions.</p> |
---|
135 | |
---|
136 | <p>Please note that it is really a very bad idea to mix the <code>_opp</code> |
---|
137 | version with the <code>_std</code> since they do not have compatible |
---|
138 | properties.</p> |
---|
139 | |
---|
140 | <p>There is a third version named <code>_exact</code> which computes the |
---|
141 | functions without changing the rounding mode. It is an "exact" version |
---|
142 | because it is intended for a base type that produces exact results.</p> |
---|
143 | |
---|
144 | <p>The last version is the <code>_dummy</code> version. It does not do any |
---|
145 | computations but still produces compatible results.</p> |
---|
146 | |
---|
147 | <p>Please note that it is possible to use the "exact" version for an inexact |
---|
148 | base type, e.g. <code>float</code> or <code>double</code>. In that case, the |
---|
149 | inclusion property is no longer guaranteed, but this can be useful to speed |
---|
150 | up the computation when the inclusion property is not desired strictly. For |
---|
151 | instance, in computer graphics, a small error due to floating-point roundoff |
---|
152 | is acceptable as long as an approximate version of the inclusion property |
---|
153 | holds.</p> |
---|
154 | |
---|
155 | <p>Here comes what each class defines. Later, when they will be described |
---|
156 | more thoroughly, these members will not be repeated. Please come back here in |
---|
157 | order to see them. Inheritance is also used to avoid repetitions.</p> |
---|
158 | <pre>template <class T> |
---|
159 | struct rounding_control |
---|
160 | { |
---|
161 | typedef ... rounding_mode; |
---|
162 | void set_rounding_mode(rounding_mode); |
---|
163 | void get_rounding_mode(rounding_mode&); |
---|
164 | void downward (); |
---|
165 | void upward (); |
---|
166 | void to_nearest(); |
---|
167 | T to_int(T); |
---|
168 | T force_rounding(T); |
---|
169 | }; |
---|
170 | |
---|
171 | template <class T, class Rounding> |
---|
172 | struct rounded_arith_... : Rounding |
---|
173 | { |
---|
174 | void init(); |
---|
175 | T add_down(T, T); |
---|
176 | T add_up (T, T); |
---|
177 | T sub_down(T, T); |
---|
178 | T sub_up (T, T); |
---|
179 | T mul_down(T, T); |
---|
180 | T mul_up (T, T); |
---|
181 | T div_down(T, T); |
---|
182 | T div_up (T, T); |
---|
183 | T sqrt_down(T); |
---|
184 | T sqrt_up (T); |
---|
185 | T median(T, T); |
---|
186 | T int_down(T); |
---|
187 | T int_up (T); |
---|
188 | }; |
---|
189 | |
---|
190 | template <class T, class Rounding> |
---|
191 | struct rounded_transc_... : Rounding |
---|
192 | { |
---|
193 | T exp_down(T); |
---|
194 | T exp_up (T); |
---|
195 | T log_down(T); |
---|
196 | T log_up (T); |
---|
197 | T cos_down(T); |
---|
198 | T cos_up (T); |
---|
199 | T tan_down(T); |
---|
200 | T tan_up (T); |
---|
201 | T asin_down(T); |
---|
202 | T asin_up (T); |
---|
203 | T acos_down(T); |
---|
204 | T acos_up (T); |
---|
205 | T atan_down(T); |
---|
206 | T atan_up (T); |
---|
207 | T sinh_down(T); |
---|
208 | T sinh_up (T); |
---|
209 | T cosh_down(T); |
---|
210 | T cosh_up (T); |
---|
211 | T tanh_down(T); |
---|
212 | T tanh_up (T); |
---|
213 | T asinh_down(T); |
---|
214 | T asinh_up (T); |
---|
215 | T acosh_down(T); |
---|
216 | T acosh_up (T); |
---|
217 | T atanh_down(T); |
---|
218 | T atanh_up (T); |
---|
219 | }; |
---|
220 | |
---|
221 | template <class Rounding> |
---|
222 | struct save_state_... : Rounding |
---|
223 | { |
---|
224 | save_state_...(); |
---|
225 | ~save_state_...(); |
---|
226 | typedef ... unprotected_rounding; |
---|
227 | };</pre> |
---|
228 | |
---|
229 | <h2>Synopsis.</h2> |
---|
230 | <pre>namespace boost { |
---|
231 | namespace numeric { |
---|
232 | namespace interval_lib { |
---|
233 | |
---|
234 | <span style="color: #FF0000">/* basic rounding control */</span> |
---|
235 | template <class T> struct rounding_control; |
---|
236 | |
---|
237 | <span style="color: #FF0000">/* arithmetic functions rounding */</span> |
---|
238 | template <class T, class Rounding = rounding_control<T> > struct rounded_arith_exact; |
---|
239 | template <class T, class Rounding = rounding_control<T> > struct rounded_arith_std; |
---|
240 | template <class T, class Rounding = rounding_control<T> > struct rounded_arith_opp; |
---|
241 | |
---|
242 | <span style="color: #FF0000">/* transcendental functions rounding */</span> |
---|
243 | template <class T, class Rounding> struct rounded_transc_dummy; |
---|
244 | template <class T, class Rounding = rounded_arith_exact<T> > struct rounded_transc_exact; |
---|
245 | template <class T, class Rounding = rounded_arith_std<T> > struct rounded_transc_std; |
---|
246 | template <class T, class Rounding = rounded_arith_opp<T> > struct rounded_transc_opp; |
---|
247 | |
---|
248 | <span style="color: #FF0000">/* rounding-state-saving classes */</span> |
---|
249 | template <class Rounding> struct save_state; |
---|
250 | template <class Rounding> struct save_state_nothing; |
---|
251 | |
---|
252 | <span style="color: #FF0000">/* default policy for type T */</span> |
---|
253 | template <class T> struct rounded_math; |
---|
254 | template <> struct rounded_math<float>; |
---|
255 | template <> struct rounded_math<double>; |
---|
256 | |
---|
257 | <span style="color: #FF0000">/* some metaprogramming to convert a protected to unprotected rounding */</span> |
---|
258 | template <class I> struct unprotect; |
---|
259 | |
---|
260 | } // namespace interval_lib |
---|
261 | } // namespace numeric |
---|
262 | } // namespace boost</pre> |
---|
263 | |
---|
264 | <h2>Description of the provided classes</h2> |
---|
265 | |
---|
266 | <p>We now describe each class in the order they appear in the definition of a |
---|
267 | rounding policy (this outermost-to-innermost order is the reverse order from |
---|
268 | the synopsis).</p> |
---|
269 | |
---|
270 | <h3 id="Protection">Protection control</h3> |
---|
271 | |
---|
272 | <p>Protection refers to the fact that the interval operations will be |
---|
273 | surrounded by rounding mode controls. Unprotecting a class means to remove |
---|
274 | all the rounding controls. Each rounding policy provides a type |
---|
275 | <code>unprotected_rounding</code>. The required type |
---|
276 | <code>unprotected_rounding</code> gives another rounding class that enables |
---|
277 | to work when nested inside rounding. For example, the first three lines below |
---|
278 | should all produce the same result (because the first operation is the |
---|
279 | rounding constructor, and the last is its destructor, which take care of |
---|
280 | setting the rounding modes); and the last line is allowed to have an |
---|
281 | undefined behavior (since no rounding constructor or destructor is ever |
---|
282 | called).</p> |
---|
283 | <pre>T c; { rounding rnd; c = rnd.add_down(a, b); } |
---|
284 | T c; { rounding rnd1; { rounding rnd2; c = rnd2.add_down(a, b); } } |
---|
285 | T c; { rounding rnd1; { rounding::unprotected_rounding rnd2; c = rnd2.add_down(a, b); } } |
---|
286 | T d; { rounding::unprotected_rounding rnd; d = rnd.add_down(a, b); }</pre> |
---|
287 | |
---|
288 | <p>Naturally <code>rounding::unprotected_rounding</code> may simply be |
---|
289 | <code>rounding</code> itself. But it can improve performance if it is a |
---|
290 | simplified version with empty constructor and destructor. In order to avoid |
---|
291 | undefined behaviors, in the library, an object of type |
---|
292 | <code>rounding::unprotected_rounding</code> is guaranteed to be created only |
---|
293 | when an object of type <code>rounding</code> is already alive. See the <a |
---|
294 | href="#perf">performance notes</a> for some additional details.</p> |
---|
295 | |
---|
296 | <p>The support library defines a metaprogramming class template |
---|
297 | <code>unprotect</code> which takes an interval type <code>I</code> and |
---|
298 | returns an interval type <code>unprotect<I>::type</code> where the |
---|
299 | rounding policy has been unprotected. Some information about the types: |
---|
300 | <code>interval<T, interval_lib::policies<Rounding, _> |
---|
301 | >::traits_type::rounding</code> <b>is</b> the same type as |
---|
302 | <code>Rounding</code>, and <code>unprotect<interval<T, |
---|
303 | interval_lib::policies<Rounding, _> > >::type</code> <b>is</b> |
---|
304 | the same type as <code>interval<T, |
---|
305 | interval_lib::policies<Rounding::unprotected, _> ></code>.</p> |
---|
306 | |
---|
307 | <h3>State saving</h3> |
---|
308 | |
---|
309 | <p>First comes <code>save_state</code>. This class is responsible for saving |
---|
310 | the current rounding mode and calling init in its constructor, and for |
---|
311 | restoring the saved rounding mode in its destructor. This class also defines |
---|
312 | the <code>unprotected_rounding</code> type.</p> |
---|
313 | |
---|
314 | <p>If the rounding mode does not require any state-saving or initialization, |
---|
315 | <code>save_state_nothing</code> can be used instead of |
---|
316 | <code>save_state</code>.</p> |
---|
317 | |
---|
318 | <h3>Transcendental functions</h3> |
---|
319 | |
---|
320 | <p>The classes <code>rounded_transc_exact</code>, |
---|
321 | <code>rounded_transc_std</code> and <code>rounded_transc_opp</code> expect |
---|
322 | the std namespace to provide the functions exp log cos tan acos asin atan |
---|
323 | cosh sinh tanh acosh asinh atanh. For the <code>_std</code> and |
---|
324 | <code>_opp</code> versions, all these functions should respect the current |
---|
325 | rounding mode fixed by a call to downward or upward.</p> |
---|
326 | |
---|
327 | <p><strong>Please note:</strong> Unfortunately, the latter is rarely the |
---|
328 | case. It is the reason why a class <code>rounded_transc_dummy</code> is |
---|
329 | provided which does not depend on the functions from the std namespace. There |
---|
330 | is no magic, however. The functions of <code>rounded_transc_dummy</code> do |
---|
331 | not compute anything. They only return valid values. For example, |
---|
332 | <code>cos_down</code> always returns -1. In this way, we do verify the |
---|
333 | inclusion property for the default implementation, even if this has strictly |
---|
334 | no value for the user. In order to have useful values, another policy should |
---|
335 | be used explicitely, which will most likely lead to a violation of the |
---|
336 | inclusion property. In this way, we ensure that the violation is clearly |
---|
337 | pointed out to the user who then knows what he stands against. This class |
---|
338 | could have been used as the default transcendental rounding class, but it was |
---|
339 | decided it would be better for the compilation to fail due to missing |
---|
340 | declarations rather than succeed thanks to valid but unusable functions.</p> |
---|
341 | |
---|
342 | <h3>Basic arithmetic functions</h3> |
---|
343 | |
---|
344 | <p>The classes <code>rounded_arith_std</code> and |
---|
345 | <code>rounded_arith_opp</code> expect the operators + - * / and the function |
---|
346 | <code>std::sqrt</code> to respect the current rounding mode.</p> |
---|
347 | |
---|
348 | <p>The class <code>rounded_arith_exact</code> requires |
---|
349 | <code>std::floor</code> and <code>std::ceil</code> to be defined since it can |
---|
350 | not rely on <code>to_int</code>.</p> |
---|
351 | |
---|
352 | <h3>Rounding control</h3> |
---|
353 | |
---|
354 | <p>The functions defined by each of the previous classes did not need any |
---|
355 | explanation. For example, the behavior of <code>add_down</code> is to compute |
---|
356 | the sum of two numbers rounded downward. For <code>rounding_control</code>, |
---|
357 | the situation is a bit more complex.</p> |
---|
358 | |
---|
359 | <p>The basic function is <code>force_rounding</code> which returns its |
---|
360 | argument correctly rounded accordingly to the current rounding mode if it was |
---|
361 | not already the case. This function is necessary to handle delayed rounding. |
---|
362 | Indeed, depending on the way the computations are done, the intermediate |
---|
363 | results may be internaly stored in a more precise format and it can lead to a |
---|
364 | wrong rounding. So the function enforces the rounding. <a |
---|
365 | href="#extreg">Here</a> is an example of what happens when the rounding is |
---|
366 | not enforced.</p> |
---|
367 | |
---|
368 | <p>The function <code>get_rounding_mode</code> returns the current rounding |
---|
369 | mode, <code>set_rounding_mode</code> sets the rounding mode back to a |
---|
370 | previous value returned by <code>get_rounding_mode</code>. |
---|
371 | <code>downward</code>, <code>upward</code> and <code>to_nearest</code> sets |
---|
372 | the rounding mode in one of the three directions. This rounding mode should |
---|
373 | be global to all the functions that use the type <code>T</code>. For example, |
---|
374 | after a call to <code>downward</code>, <code>force_rounding(x+y)</code> is |
---|
375 | expected to return the sum rounded toward -∞.</p> |
---|
376 | |
---|
377 | <p>The function <code>to_int</code> computes the nearest integer accordingly |
---|
378 | to the current rounding mode.</p> |
---|
379 | |
---|
380 | <p>The non-specialized version of <code>rounding_control</code> does not do |
---|
381 | anything. The functions for the rounding mode are empty, and |
---|
382 | <code>to_int</code> and <code>force_rounding</code> are identity functions. |
---|
383 | The <code>pi_</code> constant functions return suitable integers (for |
---|
384 | example, <code>pi_up</code> returns <code>T(4)</code>).</p> |
---|
385 | |
---|
386 | <p>The class template <code>rounding_control</code> is specialized for |
---|
387 | <code>float</code>, <code>double</code> and <code>long double</code> in order |
---|
388 | to best use the floating point unit of the computer.</p> |
---|
389 | |
---|
390 | <h2>Template class <tt>rounded_math</tt></h2> |
---|
391 | |
---|
392 | <p>The default policy (aka <code>rounded_math<T></code>) is simply |
---|
393 | defined as:</p> |
---|
394 | <pre>template <class T> struct rounded_math<T> : save_state_nothing<rounded_arith_exact<T> > {};</pre> |
---|
395 | |
---|
396 | <p>and the specializations for <code>float</code>, <code>double</code> and |
---|
397 | <code>long double</code> use <code>rounded_arith_opp</code>, as in:</p> |
---|
398 | <pre>template <> struct rounded_math<float> : save_state<rounded_arith_opp<float> > {}; |
---|
399 | template <> struct rounded_math<double> : save_state<rounded_arith_opp<double> > {}; |
---|
400 | template <> struct rounded_math<long double> : save_state<rounded_arith_opp<long double> > {};</pre> |
---|
401 | |
---|
402 | <h2 id="perf">Performance Issues</h2> |
---|
403 | |
---|
404 | <p>This paragraph deals mostly with the performance of the library with |
---|
405 | intervals using the floating-point unit (FPU) of the computer. Let's consider |
---|
406 | the sum of [<i>a</i>,<i>b</i>] and [<i>c</i>,<i>d</i>] as an example. The |
---|
407 | result is [<code>down</code>(<i>a</i>+<i>c</i>), |
---|
408 | <code>up</code>(<i>b</i>+<i>d</i>)], where <code>down</code> and |
---|
409 | <code>up</code> indicate the rounding mode needed.</p> |
---|
410 | |
---|
411 | <h3>Rounding Mode Switch</h3> |
---|
412 | |
---|
413 | <p>If the FPU is able to use a different rounding mode for each operation, |
---|
414 | there is no problem. For example, it's the case for the Alpha processor: each |
---|
415 | floating-point instruction can specify a different rounding mode. However, |
---|
416 | the IEEE-754 Standard does not require such a behavior. So most of the FPUs |
---|
417 | only provide some instructions to set the rounding mode for all subsequent |
---|
418 | operations. And generally, these instructions need to flush the pipeline of |
---|
419 | the FPU.</p> |
---|
420 | |
---|
421 | <p>In this situation, the time needed to sum [<i>a</i>,<i>b</i>] and |
---|
422 | [<i>c</i>,<i>d</i>] is far worse than the time needed to calculate |
---|
423 | <i>a</i>+<i>b</i> and <i>c</i>+<i>d</i> since the two additions cannot be |
---|
424 | parallelized. Consequently, the objective is to diminish the number of |
---|
425 | rounding mode switches.</p> |
---|
426 | |
---|
427 | <p>If this library is not used to provide exact computations, but only for |
---|
428 | pair arithmetic, the solution is quite simple: do not use rounding. In that |
---|
429 | case, doing the sum [<i>a</i>,<i>b</i>] and [<i>c</i>,<i>d</i>] will be as |
---|
430 | fast as computing <i>a</i>+<i>b</i> and <i>c</i>+<i>d</i>. Everything is |
---|
431 | perfect.</p> |
---|
432 | |
---|
433 | <p>However, if exact computations are required, such a solution is totally |
---|
434 | unthinkable. So, are we penniless? No, there is still a trick available. |
---|
435 | Indeed, down(<i>a</i>+<i>c</i>) = -up(-<i>a</i>-<i>c</i>) if the unary minus |
---|
436 | is an exact operation. It is now possible to calculate the whole sum with the |
---|
437 | same rounding mode. Generally, the cost of the mode switching is worse than |
---|
438 | the cost of the sign changes.</p> |
---|
439 | |
---|
440 | <p>Let's recapitulate. Before, when doing an addition, there were three |
---|
441 | rounding mode switches (down, up and restore). Now, with this little trick, |
---|
442 | there are only two switches (up and restore). It is better, but still a |
---|
443 | bottleneck when many operations are nested. Indeed, the generated code for |
---|
444 | [<i>a</i>,<i>b</i>] + [<i>c</i>,<i>d</i>] + [<i>e</i>,<i>f</i>] will probably |
---|
445 | look like:</p> |
---|
446 | <pre>up(); |
---|
447 | t1 = -(-a - c); |
---|
448 | t2 = b + d; |
---|
449 | restore(); |
---|
450 | up(); |
---|
451 | x = -(-t1 - e); |
---|
452 | y = t2 + f; |
---|
453 | restore();</pre> |
---|
454 | |
---|
455 | <p>If you think it is possible to do much better, you are right. For example, |
---|
456 | this is better (and probably optimal):</p> |
---|
457 | <pre>up(); |
---|
458 | x = -(-a - c - e); |
---|
459 | y = b + d + f; |
---|
460 | restore();</pre> |
---|
461 | |
---|
462 | <p>Such a code will be generated by a compiler if the computations are made |
---|
463 | without initialization and restoration of the rounding mode. However, it |
---|
464 | would be far too easy if there were no drawback: because the rounding mode is |
---|
465 | not restored in the meantime, operations on floating-point numbers must be |
---|
466 | prohibited. This method can only be used if all the operations are operations |
---|
467 | on intervals (or operations between an interval and a floating point |
---|
468 | number).</p> |
---|
469 | |
---|
470 | <h4 id="perfexp">Example</h4> |
---|
471 | |
---|
472 | <p>Here is an example of the Horner scheme to compute the value of a polynom. |
---|
473 | The rounding mode switches are disabled for the whole computation.</p> |
---|
474 | <pre>// I is an interval class, the polynom is a simple array |
---|
475 | template<class I> |
---|
476 | I horner(const I& x, const I p[], int n) { |
---|
477 | |
---|
478 | // save and initialize the rounding mode |
---|
479 | typename I::traits_type::rounding rnd; |
---|
480 | |
---|
481 | // define the unprotected version of the interval type |
---|
482 | typedef typename boost::numeric::interval_lib::unprotect<I>::type R; |
---|
483 | |
---|
484 | const R& a = x; |
---|
485 | R y = p[n - 1]; |
---|
486 | for(int i = n - 2; i >= 0; i--) { |
---|
487 | y = y * a + (const R&)(p[i]); |
---|
488 | } |
---|
489 | return y; |
---|
490 | |
---|
491 | // restore the rounding mode with the destruction of rnd |
---|
492 | }</pre> |
---|
493 | |
---|
494 | <p>Please note that a rounding object is especially created in order to |
---|
495 | compensate for the protection loss. Each interval of type I is converted in |
---|
496 | an interval of type R before any operations. If this conversion is not done, |
---|
497 | the result is still correct, but the interest of this whole optimization has |
---|
498 | disappeared. Whenever possible, it is good to convert to <code>const |
---|
499 | R&</code> instead of <code>R</code>: indeed, the function could already |
---|
500 | be called inside an unprotection block so the types <code>R</code> and |
---|
501 | <code>I</code> would be the same interval, no need for a conversion.</p> |
---|
502 | |
---|
503 | <h4>Uninteresting remark</h4> |
---|
504 | |
---|
505 | <p>It was said at the beginning that the Alpha processors can use a specific |
---|
506 | rounding mode for each operation. However, due to the instruction format, the |
---|
507 | rounding toward plus infinity is not available. Only the rounding toward |
---|
508 | minus infinity can be used. So the trick using the change of sign becomes |
---|
509 | essential, but there is no need to save and restore the rounding mode on both |
---|
510 | sides of an operation.</p> |
---|
511 | |
---|
512 | <h3 id="extreg">Extended Registers</h3> |
---|
513 | |
---|
514 | <p>There is another problem besides the cost of the rounding mode switch. |
---|
515 | Some FPUs use extended registers (for example, float computations will be |
---|
516 | done with double registers, or double computations with long double |
---|
517 | registers). Consequently, many problems can arise.</p> |
---|
518 | |
---|
519 | <p>The first one is due to to the extended precision of the mantissa. The |
---|
520 | rounding is also done on this extended precision. And consequently, we still |
---|
521 | have down(<i>a</i>+<i>b</i>) = -up(-<i>a</i>-<i>b</i>) in the extended |
---|
522 | registers. But back to the standard precision, we now have |
---|
523 | down(<i>a</i>+<i>b</i>) < -up(-<i>a</i>-<i>b</i>) instead of an equality. |
---|
524 | A solution could be not to use this method. But there still are other |
---|
525 | problems, with the comparisons between numbers for example.</p> |
---|
526 | |
---|
527 | <p>Naturally, there is also a problem with the extended precision of the |
---|
528 | exponent. To illustrate this problem, let <i>m</i> be the biggest number |
---|
529 | before +<i>inf</i>. If we calculate 2*[<i>m</i>,<i>m</i>], the answer should |
---|
530 | be [<i>m</i>,<i>inf</i>]. But due to the extended registers, the FPU will |
---|
531 | first store [<i>2m</i>,<i>2m</i>] and then convert it to |
---|
532 | [<i>inf</i>,<i>inf</i>] at the end of the calculus (when the rounding mode is |
---|
533 | toward +<i>inf</i>). So the answer is no more accurate.</p> |
---|
534 | |
---|
535 | <p>There is only one solution: to force the FPU to convert the extended |
---|
536 | values back to standard precision after each operation. Some FPUs provide an |
---|
537 | instruction able to do this conversion (for example the PowerPC processors). |
---|
538 | But for the FPUs that do not provide it (the x86 processors), the only |
---|
539 | solution is to write the values to memory and read them back. Such an |
---|
540 | operation is obviously very expensive.</p> |
---|
541 | |
---|
542 | <h2>Some Examples</h2> |
---|
543 | |
---|
544 | <p>Here come several cases:</p> |
---|
545 | <ul> |
---|
546 | <li>if you need precise computations with the <code>float</code> or |
---|
547 | <code>double</code> types, use the default |
---|
548 | <code>rounded_math<T></code>;</li> |
---|
549 | <li>for fast wide intervals without any rounding nor precision, use |
---|
550 | <code>save_state_nothing<rounded_transc_exact<T> |
---|
551 | ></code>;</li> |
---|
552 | <li>for an exact type (like int or rational with a little help for infinite |
---|
553 | and NaN values) without support for transcendental functions, the |
---|
554 | solution could be <code>save_state_nothing<rounded_transc_dummy<T, |
---|
555 | rounded_arith_exact<T> > ></code> or directly |
---|
556 | <code>save_state_nothing<rounded_arith_exact<T> ></code>;</li> |
---|
557 | <li>if it is a more complex case than the previous ones, the best thing is |
---|
558 | probably to directly define your own policy.</li> |
---|
559 | </ul> |
---|
560 | <hr> |
---|
561 | |
---|
562 | <p>Revised: 2004-02-17<br> |
---|
563 | Copyright (c) Guillaume Melquiond, Sylvain Pion, Hervé Brönnimann, 2002. |
---|
564 | Polytechnic University.<br> |
---|
565 | Copyright (c) Guillaume Melquiond, 2004. ENS Lyon.</p> |
---|
566 | </body> |
---|
567 | </html> |
---|