
source: downloads/boost_1_34_1/libs/filesystem/doc/i18n.html @ 29

Last change on this file since 29 was 29, checked in by landauf, 17 years ago

updated boost from 1_33_1 to 1_34_1

1<html>
2
3<head>
4<meta http-equiv="Content-Language" content="en-us">
5<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
6<meta name="ProgId" content="FrontPage.Editor.Document">
7<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
8<title>1.34 (Internationalization) Changes</title>
9</head>
10
11<body bgcolor="#FFFFFF">
12
13<h1>1.34 (Internationalization) Changes</h1>
14<h2>Introduction</h2>
15<p>This release is a major upgrade for the Filesystem Library, in preparation
16for submission to the C++ Standards Committee. Features of this release
17include:</p>
18<ul>
19  <li><a href="#Internationalization">Internationalization</a>, provided by
20  class templates <i>basic_path</i>, <i>basic_filesystem_error</i>, <i>
21  basic_directory_iterator</i>, and <i>basic_directory_entry</i>.<br>
22&nbsp;</li>
23  <li><a href="#Simplification">Simplification</a> of the path interface,
24  including elimination of distinction between native and generic formats,
25  and separation of name checking functionality from general path functionality.
26  Also simplification of <i>basic_filesystem_error</i>.<br>
27&nbsp;</li>
28  <li><a href="#Rationalization">Rationalization</a> of predicate function
29  design, including the addition of several new functions.<br>
30&nbsp;</li>
31  <li>Clearer specification by reference to [<a href="design.htm#POSIX-01">POSIX-01</a>],
32  the ISO/IEEE Single Unix Standard, with provisions for Windows and other
33  operating systems.<br>
34&nbsp;</li>
35  <li><a href="#Preservation">Preservation</a> of existing user code whenever
36  possible.<br>
37&nbsp;</li>
38  <li><a href="#More_efficient">More efficient operations</a> when iterating over directories.<br>
39&nbsp;</li>
40  <li>A
41  <a href="tr2_proposal.html#Class-template-basic_recursive_directory_iterator">recursive
42  directory iterator</a> is now provided. </li>
43</ul>
44<p><a href="#Rationale">Rationale</a> for some of the changes is also provided.</p>
45<h2><a name="Internationalization">Internationalization</a></h2>
46<p>Class templates <i>basic_path</i>, <i>basic_filesystem_error</i>, and <i>
47basic_directory_iterator</i> provide the basic mechanisms for
48internationalization, in ways very similar to the C++ Standard Library's <i>
49basic_string</i> and similar class templates. The following typedefs are also
50provided:</p>
51<blockquote>
52  <pre>typedef basic_path&lt;std::string, ...&gt; path;
53typedef basic_path&lt;std::wstring, ...&gt; wpath;
54
55typedef basic_filesystem_error&lt;path&gt; filesystem_error;
56typedef basic_filesystem_error&lt;wpath&gt; wfilesystem_error;
57
58typedef basic_directory_iterator&lt;path&gt; directory_iterator;
59typedef basic_directory_iterator&lt;wpath&gt; wdirectory_iterator;</pre>
60</blockquote>
61<p>The string type used by Boost.Filesystem <i>basic_path</i> (std::string,
62std::wstring, or whatever) is called the <i>internal</i> string type. The string
63type used by the operating system for paths (often char*, sometimes wchar_t*) is
64called the <i>external</i> string type. Conversion between internal and external
65types is performed by path traits classes. The specific conversions for <i>path</i> 
66and <i>wpath</i> are implementation defined, with normative encouragement to use
67the operating system's preferred file system encoding. For many modern POSIX-based
68file systems the <i>wpath</i> external encoding is <a href="design.htm#Kuhn">
69UTF-8</a>, while for modern Windows file systems such as NTFS it is
70<a href="http://en.wikipedia.org/wiki/UTF-16">UTF-16</a>.</p>
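<p>As a rough illustration of the internal/external split described above, the
following sketch shows a traits-style class that converts a wide internal string
to a UTF-8 external string. The name <code>sketch_utf8_traits</code> and its
member signatures are assumptions made for this example; they are not the
library's own traits classes. The encoder also assumes that each
<code>wchar_t</code> holds one complete code point.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical traits class, for illustration only (not the library's own).
// Internal type: std::wstring. External type: UTF-8 bytes in a std::string.
struct sketch_utf8_traits {
    typedef std::wstring internal_string_type;
    typedef std::string  external_string_type;

    // Encode each wide character as UTF-8.
    // Assumption: every wchar_t is a whole code point (no surrogate pairs).
    static external_string_type to_external(const internal_string_type& src) {
        external_string_type out;
        for (internal_string_type::size_type i = 0; i < src.size(); ++i) {
            const unsigned long cp = static_cast<unsigned long>(src[i]);
            if (cp < 0x80) {
                out += static_cast<char>(cp);
            } else if (cp < 0x800) {
                out += static_cast<char>(0xC0 | (cp >> 6));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out += static_cast<char>(0xE0 | (cp >> 12));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (cp >> 18));
                out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
        return out;
    }
};
```

<p>A path class parameterized on such a traits class could call
<code>to_external()</code> just before handing a name to the operating system,
which keeps encoding decisions out of user code.</p>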
71<p>The <a href="tr2_proposal.html#Operations-functions">operational functions</a> in
72<a href="../../../boost/filesystem/operations.hpp">operations.hpp</a> are provided with overloads for
73<i>path</i>, <i>wpath</i>, and user-defined <i>basic_path</i>'s. A
74<a href="tr2_proposal.html#Requirements-on-implementations">&quot;do-the-right-thing&quot; rule</a> 
75applies to implementations, ensuring that the correct overload will be chosen.</p>
76<h2><a name="Simplification">Simplification</a> of path interface</h2>
77<p>Prior versions of the library required users of class <i>path</i> to identify
78the format (native or generic) and name error-checking policy, either via a
79second constructor argument or via a default mechanism. That approach caused
80complaints, particularly from users not needing the name checking features. The
81interface has now been simplified:</p>
82<ul>
83  <li>The distinction between native and generic formats has been eliminated.
84  See <a href="#distinction">rationale</a>. Two-argument forms of path
85  constructors are now deprecated, with the second argument having no effect.
86  These constructors are only provided to ease the transition of existing code.<br>
87&nbsp;</li>
88  <li>Path name checking functionality has been moved out of class path and into
89  separate free-functions. This still provides name checking for those who need
90  it, but with much less impact on those who don't need it.</li>
91</ul>
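<p>To show what name checking as a free function might look like, here is a
minimal sketch. The function name <code>sketch_portable_name</code> is
hypothetical, and the rule it applies (non-empty, and using only the POSIX
portable filename character set) is an assumption chosen for the example rather
than the library's exact policy.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical free function, for illustration only.
// Returns true if 'name' is non-empty and uses only characters from the
// POSIX portable filename character set: A-Z a-z 0-9 . _ -
inline bool sketch_portable_name(const std::string& name) {
    static const std::string allowed =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789._-";
    return !name.empty()
        && name.find_first_not_of(allowed) == std::string::npos;
}
```

<p>Because the check is a free function, code that never validates names does
not pay for it, and code that does can call it explicitly on each element.</p>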
92<p>Additionally,
93<a href="tr2_proposal.html#Class-template-basic_filesystem_error">basic_filesystem_error</a> has been put
94on a diet and generally simplified.</p>
95<p>Error codes have been simplified and aligned with [POSIX-01]. A supporting
96header <a href="../../../boost/filesystem/cerrno.hpp">
97&lt;boost/filesystem/cerrno.hpp&gt;</a> is also provided.</p>
98<p><code>&quot;//:&quot;</code> has been introduced as a path escape prefix to identify
99native paths. Rationale: it simplifies the basic_path constructor interfaces and
100is easier to use on platforms needing explicit native format identification.</p>
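<p>The sketch below shows one way a string could be tested for the
<code>&quot;//:&quot;</code> escape prefix. Both helper names are hypothetical,
and the assumption that the remainder of the string is the native path is made
only for this example; the library's actual handling may differ.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers, for illustration only.

// True if 's' begins with the "//:" escape prefix.
inline bool sketch_has_native_prefix(const std::string& s) {
    return s.compare(0, 3, "//:") == 0;
}

// Returns 's' without the prefix (assumed here to leave the native path),
// or 's' unchanged when no prefix is present.
inline std::string sketch_strip_native_prefix(const std::string& s) {
    return sketch_has_native_prefix(s) ? s.substr(3) : s;
}
```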
101<h2><a name="Rationalization">Rationalization</a> of predicate functions</h2>
102<p>In discussions and bug reports on the Boost developers mailing list, it
103became obvious that Boost.Filesystem's exists(), symbolic_link_exists(), and
104is_directory() predicate functions were poorly specified. There were suggestions
105to add an is_accessible() function, but Peter Dimov argued that this amounted to
106papering over the lack of a clear specification and would likely lead to future
107problems.</p>
108<p>Peter suggested that an interesting way to analyze the problem was to ask
109what the expectations were for true and false values of the various predicates.
110See the <a href="#table">table</a> below.</p>
111<h3>status()</h3>
112<p>As part of the predicate discussions, particularly with Rob Stewart, it
113became obvious that sometimes applications need access to raw status information
114without any possibility of an exception being thrown. The
115<a href="tr2_proposal.html#Status-functions">status()</a> function was added to meet this
116need. It also proved clearer to specify the semantics of predicate functions in
117terms of status().</p>
118<h3><a name="is_file">is_file</a>()</h3>
119<p>About the same time, Jeff Garland suggested that an
120<a href="tr2_proposal.html#Predicate-functions">is_file()</a> predicate would
121complement <a href="tr2_proposal.html#Predicate-functions">is_directory()</a>. In working on the analysis below, it became obvious
122that the expectations for is_file() were different from the expectations for !is_directory(),
123so is_file() was added. </p>
124<h3><a name="is_other">is_other</a>()</h3>
125<p>On some operating systems, it is possible to have a directory entry which is
126not for either a directory or a file. The
127<a href="tr2_proposal.html#Predicate-functions">is_other()</a> 
128function identifies such cases.</p>
129<h3>Should predicates throw on errors?</h3>
130<p>Some conditions reported by operating systems as errors (see
131<a href="#Footnote">footnote</a>) clearly simply indicate that the predicate is
132false, rather than indicating serious failure. But other errors represent
133serious hardware or network problems, or permissions problems.</p>
134<p>Some people, particularly Rob Stewart, argue that in a function like
135<a href="tr2_proposal.html#Predicate-functions">is_directory()</a>, any error should simply cause the function to return false. If
136there is actually an underlying problem, it will be detected in due course when
137a directory_iterator or fstream operation is attempted.</p>
138<p>That view was rejected because of the following considerations:</p>
139<ul>
140  <li>As a general principle, the earlier errors can be reported, the better.
141  The rationale is that it is often much cheaper to fix errors sooner rather
142  than later. I've also had a lot of negative experiences where failure to
143  detect errors early caused a lot of pain and unhappy customers. Some of these
144  were directly caused by ignoring error returns from file system operations.<br>
145  &nbsp;</li>
146  <li>Analysis of existing programs indicated that as many as 30% of the uses of
147  a predicate were not followed by directory_iterator or fstream operations on
148  the path in question. Instead, the applications performed reporting or
149  fall-back operations that would not fail, and thus were either misleading or
150  completely wrong if the <i>false</i> return value was in fact caused by
151  hardware or network failure, or permissions problems.</li>
152</ul>
153<p>However, the discussion did identify that there are valid cases where
154non-throwing behavior is a requirement, and a programmer may prefer to deal with
155file or directory attributes and errors at a very low, bit-mask, level. Function <a href="#status">status()</a> 
156was proposed to meet those needs.</p>
157<h3><a name="Expectations">Expectations</a> <a name="table">table</a></h3>
158<p>In the table below, <i>p</i> is a non-empty path.</p>
159<p>Unless otherwise specified, all functions throw on hardware or general
160failure errors, permission or access errors, symbolic link loop errors, and
161invalid path errors. If an O/S fails to distinguish between error types,
162predicate operations return false on such ambiguous errors.</p>
163<p><i><b>Expectations</b></i> identify operations that are expected to succeed
164or fail, assuming no hardware, permission, or access right errors, and no race
165conditions.</p>
166<table border="1" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">
167  <tr>
168    <td width="22%" align="center"><b><i>Expression</i></b></td>
169    <td width="48%" align="center"><b><i>Expectations</i></b></td>
170    <td width="108%" align="center"><b><i>Semantics</i></b></td>
171  </tr>
172  <tr>
173    <td width="22%">is_directory(p)</td>
174    <td width="48%">Returns true if p is found and is a directory, else false.<br>
175    If true, then directory_iterator(p) would succeed.<br>
176    If false, then directory_iterator(p) would fail.</td>
177    <td width="108%">Throws: if <a href="#status">status()</a> &amp; error_flag<br>
178    Returns: status() &amp; directory_flag</td>
179  </tr>
180  <tr>
181    <td width="22%">is_file(p)</td>
182    <td width="48%">Returns true if p is found and is not a directory, else
183    false.<br>
184    If true, then ifstream(p) would succeed.<br>
185    False, however, does not imply ifstream(p) would fail (because some
186    operating systems allow directories to be opened as files, but stat() does
187    not set the &quot;regular file&quot; flag for them).</td>
188    <td width="108%">Throws: if status() &amp; error_flag<br>
189    Returns: status() &amp; file_flag</td>
190  </tr>
191  <tr>
192    <td width="22%">exists(p) </td>
193    <td width="48%">Returns is_directory(p) || is_file(p) || is_other(p)</td>
194    <td width="108%">Throws: if status() &amp; error_flag<br>
195    Returns: status() &amp;&nbsp;&nbsp; (directory_flag|file_flag|other_flag)</td>
196  </tr>
197  <tr>
198    <td width="22%">is_symlink(p)</td>
199    <td width="48%">Returns true if p is found by shallow (non-transitive)
200    search, and is a symbolic link, else false.<br>
201    If true, and p points to q, then for any filesystem function f except those
202    specified as working shallowly on symlinks themselves, f(p) calls f(q), and
203    returns any value returned by f(q).</td>
204    <td width="108%">Throws: if <a href="#status">symlink_status</a>() &amp; 
205    error_flag<br>
206    Returns: symlink_status() &amp; symlink_flag</td>
207  </tr>
208  <tr>
209    <td width="22%">!exists(p) &amp;&amp; ((p.has_branch_path() &amp;&amp; exists( p.branch_path()))
210    || (!p.has_branch_path() &amp;&amp; !p.has_root_path()))<br>
211    <i>In other words, if the path does not exist, and (the branch does exist,
212    or (there is no branch and no root)).</i></td>
213    <td width="48%">If true, create_directory(p) would succeed.<br>
214    If true, ofstream(p) would succeed.<br>
215    &nbsp;</td>
216    <td width="108%">&nbsp;</td>
217  </tr>
218  <tr>
219    <td width="22%">directory_iterator it(p)</td>
220    <td width="48%">If it != directory_iterator(), assert(exists(*it)||is_symlink(*it)).
221    Note: exists(*it) may throw, and likewise status(*it) may return error_flag
222    - there is no guarantee of accessibility.</td>
223    <td width="108%">&nbsp;</td>
224  </tr>
225</table>
226<h3><a name="Conclusion">Conclusion</a></h3>
227<p>Predicate operations is_directory(), is_file(), is_symlink(), and exists()
228with the indicated semantics form a self-consistent set that meets expectations.</p>
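<p>The Semantics column of the table can be modelled in a few lines. In this
sketch the flag names follow the table, but the numeric values, the
<code>sketch</code> namespace, the exception type, and the choice to pass a flag
value instead of a path are all assumptions made so that the example does not
touch a real file system.</p>

```cpp
#include <cassert>
#include <stdexcept>

namespace sketch {

typedef unsigned status_flags;

// Flag names follow the table; the values are arbitrary for this sketch.
const status_flags error_flag     = 1;
const status_flags directory_flag = 2;
const status_flags file_flag      = 4;
const status_flags other_flag     = 8;

// "Throws: if status() & error_flag"
inline void throw_if_error(status_flags s) {
    if (s & error_flag) throw std::runtime_error("status reported an error");
}

// "Returns: status() & directory_flag"
inline bool is_directory(status_flags s) {
    throw_if_error(s);
    return (s & directory_flag) != 0;
}

// "Returns: status() & file_flag"
inline bool is_file(status_flags s) {
    throw_if_error(s);
    return (s & file_flag) != 0;
}

// "Returns: status() & (directory_flag|file_flag|other_flag)"
inline bool exists(status_flags s) {
    throw_if_error(s);
    return (s & (directory_flag | file_flag | other_flag)) != 0;
}

// Helper for demonstrations: reports whether a predicate throws for 's'.
inline bool throws(bool (*pred)(status_flags), status_flags s) {
    try { pred(s); } catch (const std::runtime_error&) { return true; }
    return false;
}

} // namespace sketch
```

<p>A caller that cannot tolerate exceptions would inspect the flag value
directly, which is the role the text assigns to status(); the predicates layer
the throwing behavior on top.</p>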
229<h2><a name="Preservation">Preservation</a> of existing user code</h2>
230<p>Although the change to a template based approach required a complete overhaul
231of the implementation code, the  interface as used by existing applications is mostly unchanged.
232Conversion problems which would
233otherwise affect user code have been reduced by providing deprecated
234functions to ease transition. The deprecated functions are:</p>
235<blockquote>
236  <pre>// class basic_path - 2nd constructor argument ignored:
237basic_path( const string_type &amp; str, name_check );
238basic_path( const typename string_type::value_type * s, name_check );
239
240// class basic_path - old names provided for renamed functions:
241string_type native_file_string() const;
242string_type native_directory_string() const;
243
244// class basic_path - now defined such that these no longer have any real effect:
245static bool default_name_check_writable() { return false; }
246static void default_name_check( name_check ) {}
247static name_check default_name_check() { return 0; }
248
249// non-deducible operations functions assume class path
250inline path current_path()
251inline const path &amp; initial_path()
252
253// the new basic_directory_entry provides leaf()
254// to cover the common existing use case itr-&gt;leaf()
255typename Path::string_type leaf() const;</pre>
256</blockquote>
257<p>If you do not want  the deprecated functions to be included, define the macro BOOST_FILESYSTEM_NO_DEPRECATED.</p>
258<p>The greatest impact on existing code is the change of directory iterator
259value type from <code>path</code> to <code>directory_entry</code>. To ease the
260most common directory iterator use case, <code>basic_directory_entry</code> 
261provides an automatic conversion to <code>basic_path</code>, and this also
262serves to prevent breakage of a lot of existing code. See the
263<a href="#More_efficient">next section</a> for discussion of rationale.</p>
264<blockquote>
265  <pre>// the new basic_directory_entry provides:
266operator const path_type &amp;() const;</pre>
267  </blockquote>
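<p>The effect of that conversion operator can be seen in a small model. Here
<code>std::string</code> stands in for a real path class, and the
<code>sketch</code> names are hypothetical; only the shape of the conversion
operator is taken from the declaration above.</p>

```cpp
#include <cassert>
#include <string>

namespace sketch {

typedef std::string path_type;  // stand-in for a real path class

// Simplified model of a directory entry that converts to its path.
class directory_entry {
public:
    explicit directory_entry(const path_type& p) : m_path(p) {}
    const path_type& path() const { return m_path; }
    operator const path_type&() const { return m_path; }
private:
    path_type m_path;
};

// "Existing" user code, written when the iterator's value type was a path.
inline path_type::size_type name_length(const path_type& p) {
    return p.size();
}

} // namespace sketch
```

<p>Passing a <code>directory_entry</code> where a path is expected compiles
unchanged because the conversion is implicit; new code can call
<code>path()</code> to make the intent explicit.</p>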
268<h2><a name="More_efficient">More efficient</a> operations when iterating over
269directories</h2>
270<p>Several common real-world operating systems (BSD derivatives, Linux, Windows)
271provide status information during directory iteration. Caching of this status
272information results in three to six times faster operation for typical predicate
273operations. (For a directory containing 15,047 files, iteration took 1 second versus 6
274seconds on a freshly booted system, and 0.3 seconds versus 0.9 seconds after prior use of
275the directory.)</p>
276<p>The efficiency gains from caching such status information were considered too
277significant to ignore. Because the possibility of race-conditions differs
278depending on whether the cached information is used or an actual system call is
279performed, it was considered necessary to provide explicit functions utilizing
280the cached information, rather than implicitly using the cache behind the
281scenes.</p>
282<p>Three options were explored for exposing the cached status information, with
283full implementations of each. After initial implementation of option 1 exposed
284the problems noted below, option 2 was tested as a possible engineering
285tradeoff. Option 3
286was finally chosen as the cleanest design.</p>
287<table border="1" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">
288  <tr>
289    <td width="8%" align="center"><b><i>Option</i></b></td>
290    <td width="25%" align="center"><i><b>How cache accessed</b></i></td>
291    <td width="94%" align="center"><i><b>Pros and Cons</b></i></td>
292  </tr>
293  <tr>
294    <td width="8%" valign="top" align="center"><i><b>1</b></i></td>
295    <td width="25%" valign="top">Predicate function overloads<br>
296    (basic_directory_iterator value_type is path)</td>
297    <td width="94%">
298    <ul>
299      <li>Very questionable design (friendship abuse, overload abuse, etc)</li>
300      <li>User cannot reuse cache</li>
301      <li>Readability problem; easy to miss difference between f(*it) and f(it)</li>
302      <li>Write-ability problem (error prone?)</li>
303      <li>Most common iterator use is brief: *it</li>
304      <li>Preserves existing code</li>
305    </ul>
306    </td>
307  </tr>
308  <tr>
309    <td width="8%" valign="top" align="center"><b><i>2</i></b></td>
310    <td width="25%" valign="top">Predicate member functions of basic_directory_<span style="background-color: #FFFF00">iterator</span><br>
311    (basic_directory_iterator value_type is path)</td>
312    <td width="94%">
313    <ul>
314      <li>Somewhat cleaner design (although adding member functions to an iterator is unusual)</li>
315      <li>User cannot reuse cache</li>
316      <li>Readability and write-ability are OK: f(*it) and it.f() are sufficiently
317      different</li>
318      <li>Most common iterator use is brief: *it</li>
319      <li>Preserves existing code</li>
320    </ul>
321    </td>
322  </tr>
323  <tr>
324    <td width="8%" valign="top" align="center"><b><i>3</i></b></td>
325    <td width="25%" valign="top">Predicate member functions of basic_directory_<span style="background-color: #FFFF00">entry</span><br>
326    (basic_directory_iterator value_type is basic_directory_entry)<br>
327&nbsp;</td>
328    <td width="94%">
329    <ul>
330      <li>Cleanest design.</li>
331      <li>User can reuse cache.</li>
332      <li>Readability and write-ability are OK: f(*it) and it-&gt;f() are sufficiently
333      different.</li>
334      <li>Most common iterator use is longer: it-&gt;path(), but by providing
335      &quot;operator const basic_path &amp;&quot; it is still possible to write a bare *it.</li>
336      <li>Breaks some existing code. The &quot;operator const basic_path &amp;&quot; 
337      conversion eliminates breakage of the most common use case, while
338      providing a (deprecated) leaf() prevents breakage of the second most
339      common use case.</li>
340    </ul>
341    </td>
342  </tr>
343  </table>
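<p>Option 3 can be illustrated with a toy model in which the directory read
already reports each entry's type, so a predicate on the entry needs no further
system call. Everything in the <code>sketch</code> namespace is hypothetical:
<code>fake_os</code> merely counts simulated status queries, and a trailing
slash is used as a stand-in for "is a directory".</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

// Counts simulated per-path status queries so caching is observable.
struct fake_os {
    int status_queries;
    fake_os() : status_queries(0) {}
    // Stand-in for a status system call; a trailing '/' marks a directory.
    bool is_directory(const std::string& p) {
        ++status_queries;
        return !p.empty() && p[p.size() - 1] == '/';
    }
};

// Entry that carries the type information obtained during iteration.
class entry {
public:
    entry(const std::string& p, bool dir) : m_path(p), m_dir(dir) {}
    const std::string& path() const { return m_path; }
    bool is_directory() const { return m_dir; }  // answered from the cache
private:
    std::string m_path;
    bool m_dir;
};

// Simulated directory read that reports the type along with each name.
inline std::vector<entry> read_directory() {
    std::vector<entry> v;
    v.push_back(entry("src/", true));
    v.push_back(entry("README", false));
    v.push_back(entry("doc/", true));
    return v;
}

// Counts directories, either from the cache or by querying the fake OS.
inline int count_directories(fake_os& os, bool use_cache) {
    const std::vector<entry> entries = read_directory();
    int n = 0;
    for (std::vector<entry>::size_type i = 0; i < entries.size(); ++i) {
        const bool dir = use_cache ? entries[i].is_directory()
                                   : os.is_directory(entries[i].path());
        if (dir) ++n;
    }
    return n;
}

inline int directories_found(bool use_cache) {
    fake_os os;
    return count_directories(os, use_cache);
}

inline int status_queries_used(bool use_cache) {
    fake_os os;
    count_directories(os, use_cache);
    return os.status_queries;
}

} // namespace sketch
```

<p>Both strategies find the same directories, but only the uncached one issues a
query per entry. Making the cached call a member of the entry also keeps the
race-condition trade-off visible at the call site.</p>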
344<h2><a name="Rationale">Rationale</a></h2>
345<h3>Elimination of the native versus generic <a name="distinction">distinction</a></h3>
346<p>Elimination of user confusion and general design simplification was the
347original motivation for elimination of the distinction between native and
348generic paths.</p>
349<p>During design work, a further technical argument was discovered. Consider the
350path <code>&quot;c:foo/bar&quot;</code>. On many POSIX systems, <code>&quot;c:foo&quot;</code> is a
351valid directory name, so we have a two element path and there is no issue of
352native versus generic format. On Windows systems, however, <code>&quot;c:&quot;</code> is a
353drive specification, so we have a three element path. All calls to the operating
354system will result in <code>&quot;c:&quot;</code> being considered a drive specification;
355there is no way that fact-of-life can be changed by claiming the format is
356generic. The native versus generic distinction is thus useless and misleading
357for POSIX, Windows, and probably most other operating systems.</p>
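<p>The two readings of <code>&quot;c:foo/bar&quot;</code> can be reproduced with
a simplified splitter. The function <code>sketch::elements</code> and its
<code>drive_letters</code> switch are assumptions for this example; real path
decomposition handles many more cases (root directories, repeated separators,
and so on).</p>

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

namespace sketch {

// Splits 's' on '/'. When 'drive_letters' is true, a leading "X:" is
// treated as a separate element, as a Windows-style parser would do.
inline std::vector<std::string> elements(const std::string& s,
                                         bool drive_letters) {
    std::vector<std::string> out;
    std::string::size_type pos = 0;
    if (drive_letters && s.size() >= 2 && s[1] == ':'
        && std::isalpha(static_cast<unsigned char>(s[0]))) {
        out.push_back(s.substr(0, 2));
        pos = 2;
    }
    while (pos < s.size()) {
        std::string::size_type slash = s.find('/', pos);
        if (slash == std::string::npos) slash = s.size();
        if (slash > pos) out.push_back(s.substr(pos, slash - pos));
        pos = slash + 1;
    }
    return out;
}

} // namespace sketch
```

<p>With <code>drive_letters</code> off the result is {"c:foo", "bar"}; with it
on the result is {"c:", "foo", "bar"}. The difference comes from the platform's
rules, not from any declared format.</p>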
358<p>If paths for a particular operating system did require a distinction be made,
359it could be done by requiring that native paths be prefixed with some unique
360implementation-defined identification. For example, <code>&quot;native-path:&quot;</code>.
361This would only be required for operating systems where (1) the distinction
362mattered, and (2) there was no lexical way to distinguish the two forms. For
363example, a native operating system that used the same syntax as the Filesystem
364Library's generic POSIX-like format, but processed the elements right-to-left
365instead of left-to-right.</p>
366<h3>Preservation of <a name="existing-code">existing code</a></h3>
367<p>Allowing existing user code to continue to work with the updated version of
368the library has obvious benefits in terms of preserving the effort users have
369applied to both learning the library and writing code which uses the library.</p>
370<p>There is an additional motivation; other than the name checking portion of
371class path,&nbsp; the existing interface has proven to be useful and robust, so
372there is no reason to fiddle with it.</p>
373<h3><a name="Single_path_design">Single path design</a></h3>
374<p>During preliminary internationalization discussion on the Boost developer's
375list, a design was considered for a single path class which could hold either
376narrow or wide character based paths. That design was rejected because:</p>
377<ul>
378  <li>The design was, for many applications, an over-generalization with runtime
379  memory and speed costs which would have to be paid for even when not needed.<br>
380&nbsp;</li>
381  <li>There was concern that the design would be confusing to users, given that
382  the standard library already uses single-value-type strings, rather than
383  strings which morph value types as needed.<br>
384&nbsp;</li>
385  <li>There were technical issues with conversions when a narrow path was
386  appended to a wide path, and vice versa. The concern was that double
387  conversions could cause incorrect results, that conversions best left to the
388  operating system would be performed, and that the technical complexity was too
389  great in relation to perceived benefits. User-defined types would only make
390  the problem worse.<br>
391&nbsp;</li>
392</ul>
393<h3>No versions of <a href="tr2_proposal.html#Status-functions">status()</a> which throw exceptions on
394errors</h3>
395<p>The rationale for not including versions of status()
396which throw exceptions on errors is that (1) the primary purpose of this
397function is to perform queries at a very low-level, where exceptions are usually
398unwanted, and (2) exceptions on errors are already provided by the predicate
399functions. There would be little or no efficiency gain from providing a throwing
400version of status().</p>
401<h3>Symlink identifying version of <a href="tr2_proposal.html#Status-functions">status()</a> function</h3>
402<p>A symlink identifying version of the status() function is distinguished by a
403second argument. Often separately named functions are more appropriate than
404overloading when behavior
405differs, which is the case here, while overloads are more appropriate when
406behavior is the same but argument types differ (Iain Hanson). Overloading was
407chosen in this particular case because of a subjective judgment that a single
408function name with an optional &quot;symlink&quot; second argument produced more
409understandable code. The original implementation of the function used the name &quot;symlink_status&quot;,
410but that just didn't read right in real code.</p>
411<h3>POSIX wpath_traits defaults to locale(&quot;&quot;), but allows imbuing of locale</h3>
412<p>Vladimir Prus pointed out that on Linux (and presumably other POSIX
413operating systems), which need to convert wide character paths to narrow
414characters, the default conversion should not depend on the operating system
415alone, but on the std::locale(&quot;&quot;) default. For example, the usual encoding
416for Russian on Linux (and Russian web sites) is KOI8-R (RFC1489). The ability to safely specify a different locale
417is also provided, to meet unforeseen needs.</p>
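<p>A locale-driven conversion of this kind can be sketched with the standard
<code>std::codecvt</code> facet. The function <code>sketch::narrow</code> is
hypothetical and is not the library's traits interface; it only shows how the
supplied locale, rather than the operating system alone, determines the narrow
encoding. A caller following the default described above would pass
<code>std::locale(&quot;&quot;)</code>.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

// Converts a wide string to a narrow one using the codecvt facet of 'loc'.
// Throws if some character cannot be represented in that locale's encoding.
inline std::string narrow(const std::wstring& src, const std::locale& loc) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> converter;
    if (src.empty()) return std::string();

    const converter& cvt = std::use_facet<converter>(loc);
    // max_length() bounds the narrow characters produced per wide character.
    std::vector<char> buf(src.size()
                          * static_cast<std::size_t>(cvt.max_length()));
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* from_next = 0;
    char* to_next = 0;
    const converter::result r =
        cvt.out(state, src.data(), src.data() + src.size(), from_next,
                &buf[0], &buf[0] + buf.size(), to_next);
    if (r != converter::ok)
        throw std::runtime_error("wide string not convertible in this locale");
    return std::string(&buf[0], to_next);
}

} // namespace sketch
```

<p>Taking the locale as a parameter mirrors the idea of imbuing: the default can
follow the user's environment, while code with special needs can supply a
different locale explicitly.</p>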
418<hr>
419<p>Revised
420<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->03 June, 2007<!--webbot bot="Timestamp" endspan i-checksum="19946" --></p>
421<p>© Copyright Beman Dawes, 2005</p>
422<p>Distributed under the Boost Software License, Version 1.0.
423(See accompanying file <a href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
424copy at <a href="http://www.boost.org/LICENSE_1_0.txt">www.boost.org/LICENSE_1_0.txt</a>)</p>
425
426</body>
427
428</html>