1 | <html> |
---|
2 | |
---|
3 | <head> |
---|
4 | <meta http-equiv="Content-Language" content="en-us"> |
---|
5 | <meta name="GENERATOR" content="Microsoft FrontPage 5.0"> |
---|
6 | <meta name="ProgId" content="FrontPage.Editor.Document"> |
---|
7 | <meta http-equiv="Content-Type" content="text/html; charset=windows-1252"> |
---|
8 | <title>Boost Filesystem Library Design</title> |
---|
9 | </head> |
---|
10 | |
---|
11 | <body bgcolor="#FFFFFF"> |
---|
12 | |
---|
13 | <h1> |
---|
14 | <img border="0" src="../../../boost.png" align="center" width="277" height="86">Filesystem |
---|
15 | Library Design</h1> |
---|
16 | |
---|
17 | <p><a href="#Introduction">Introduction</a><br> |
---|
18 | <a href="#Requirements">Requirements</a><br> |
---|
19 | <a href="#Realities">Realities</a><br> |
---|
20 | <a href="#Rationale">Rationale</a><br> |
---|
21 | <a href="#Abandoned_Designs">Abandoned Designs</a><br> |
---|
22 | <a href="#References">References</a></p> |
---|
23 | |
---|
24 | <h2><a name="Introduction">Introduction</a></h2> |
---|
25 | |
---|
26 | <p>The primary motivation for beginning work on the Filesystem Library was |
---|
27 | frustration with Boost administrative tools. Scripts were written in |
---|
28 | Python, Perl, Bash, and Windows command languages. There was no single |
---|
29 | scripting language familiar and acceptable to all Boost administrators. Yet they |
---|
30 | were all skilled C++ programmers - why couldn't C++ be used as the scripting |
---|
31 | language?</p> |
---|
32 | |
---|
33 | <p>The key feature C++ lacked for script-like applications was the ability to |
---|
34 | perform portable filesystem operations on directories and their contents. The |
---|
35 | Filesystem Library was developed to fill that void.</p> |
---|
36 | |
---|
37 | <p>The intent is not to compete with traditional scripting languages, but to |
---|
38 | provide a solution for situations where C++ is already the language |
---|
39 | of choice.</p> |
---|
40 | |
---|
41 | <h2><a name="Requirements">Requirements</a></h2> |
---|
42 | <ul> |
---|
43 | <li>Be able to write portable script-style filesystem operations in modern |
---|
44 | C++.<br> |
---|
45 | <br> |
---|
46 | Rationale: This is a common programming need. It is both an |
---|
47 | embarrassment and a hardship that this is not possible with either the current |
---|
48 | C++ or Boost libraries. The need is particularly acute |
---|
49 | when C++ is the only toolset allowed in the tool chain. File system |
---|
50 | operations are provided by many languages used on multiple platforms, |
---|
51 | such as Perl and Python, as well as by many platform specific scripting |
---|
52 | languages. All operating systems provide some form of API for filesystem |
---|
53 | operations, and the POSIX bindings are increasingly available even on |
---|
54 | operating systems not normally associated with POSIX, such as the Mac, z/OS, |
---|
55 | or OS/390.<br> |
---|
56 | </li> |
---|
57 | <li>Work within the <a href="#Realities">realities</a> described below.<br> |
---|
58 | <br> |
---|
59 | Rationale: This isn't a research project. The need is for something that works on |
---|
60 | today's platforms, including some of the embedded operating systems |
---|
61 | with limited file systems. Because of the emphasis on portability, such a |
---|
62 | library would be much more useful if standardized. That means being able to |
---|
63 | work with a much wider range of platforms than just Unix or Windows and their |
---|
64 | clones.<br> |
---|
65 | </li> |
---|
66 | <li>Avoid dangerous programming practices, particularly all-too-easy-to-ignore error notifications |
---|
67 | and the use of global variables. If a dangerous feature is provided, identify it as such.<br> |
---|
68 | <br> |
---|
69 | Rationale: Normally this would be covered by "the usual Boost requirements...", |
---|
70 | but it is mentioned explicitly because the equivalent native platform and |
---|
71 | scripting language interfaces often depend on all-too-easy-to-ignore error |
---|
72 | notifications and global variables like "current |
---|
73 | working directory".<br> |
---|
74 | </li> |
---|
75 | <li>Structure the library so that it is still useful even if some functionality |
---|
76 | does not map well onto a given platform or directory tree. Particularly, much |
---|
77 | useful functionality should be portable even to flat |
---|
78 | (non-hierarchical) filesystems.<br> |
---|
79 | <br> |
---|
80 | Rationale: Much functionality which does not |
---|
81 | require a hierarchical directory structure is still useful on flat-structure |
---|
82 | filesystems. There are many systems, particularly embedded systems, |
---|
83 | where even very limited functionality is still useful.</li> |
---|
84 | </ul> |
---|
85 | <ul> |
---|
86 | <li>Interface smoothly with current C++ Standard Library input/output |
---|
87 | facilities. For example, paths should be |
---|
88 | easy to use in std::basic_fstream constructors.<br> |
---|
89 | <br> |
---|
90 | Rationale: One of the most common uses of file system functionality is to |
---|
91 | manipulate paths for eventual use in input/output operations. |
---|
92 | Thus the need to interface smoothly with standard library I/O.<br> |
---|
93 | </li> |
---|
94 | <li>Be suitable for eventual standardization. The implication of this requirement |
---|
95 | is that the interface be close to minimal, and that great care be taken |
---|
96 | regarding portability.<br> |
---|
97 | <br> |
---|
98 | Rationale: The lack of file system operations is a serious hole |
---|
99 | in the current standard, with no other known candidates to fill that hole. |
---|
100 | Libraries with elaborate interfaces and difficult-to-port specifications are much less likely to be accepted for |
---|
101 | standardization.<br> |
---|
102 | </li> |
---|
103 | <li>The usual Boost <a href="../../../more/lib_guide.htm">requirements and |
---|
104 | guidelines</a> apply.<br> |
---|
105 | </li> |
---|
106 | <li>Encourage, but do not require, portability in path names.<br> |
---|
107 | <br> |
---|
108 | Rationale: For paths which originate from user input it is unreasonable to |
---|
109 | require portable path syntax.<br> |
---|
110 | </li> |
---|
111 | <li>Avoid giving the illusion of portability where portability in fact does not |
---|
112 | exist.<br> |
---|
113 | <br> |
---|
114 | Rationale: Leaving important behavior unspecified or "implementation defined" does a |
---|
115 | great disservice to programmers using a library because it makes it appear |
---|
116 | that code relying on the behavior is portable, when in fact there is nothing |
---|
117 | portable about it. The only case where such under-specification is acceptable is when both users and implementors know from |
---|
118 | other sources exactly what behavior is required, yet for some reason it isn't |
---|
119 | possible to specify it exactly.</li> |
---|
120 | </ul> |
---|
121 | <h2><a name="Realities">Realities</a></h2> |
---|
122 | <ul> |
---|
123 | <li>Some operating systems have a single directory tree root, others have |
---|
124 | multiple roots.<br> |
---|
125 | </li> |
---|
126 | <li>Some file systems provide both a long and short form of filenames.<br> |
---|
127 | </li> |
---|
128 | <li>Some file systems have different syntax for file paths and directory |
---|
129 | paths.<br> |
---|
130 | </li> |
---|
131 | <li>Some file systems have different rules for valid file names and valid |
---|
132 | directory names.<br> |
---|
133 | </li> |
---|
134 | <li>Some file systems (ISO-9660, level 1, for example) use very restricted |
---|
135 | (so-called 8.3) file names.<br> |
---|
136 | </li> |
---|
137 | <li>Some operating systems allow file systems with different |
---|
138 | characteristics to be "mounted" within a directory tree. Thus an |
---|
139 | ISO-9660 or Windows |
---|
140 | file system may end up as a sub-tree of a POSIX directory tree.<br> |
---|
141 | </li> |
---|
142 | <li>Wide-character versions of directory and file operations are available on some operating |
---|
143 | systems, and not available on others.<br> |
---|
144 | </li> |
---|
145 | <li>There is no law that says directory hierarchies have to be specified in |
---|
146 | terms of left-to-right descent from the root.<br> |
---|
147 | </li> |
---|
148 | <li>Some file systems have a concept of file "version number" or "generation |
---|
149 | number". Some don't.<br> |
---|
150 | </li> |
---|
151 | <li>Not all operating systems use single-character separators in path names. Some use |
---|
152 | paired notations. A typical fully-specified OpenVMS filename |
---|
153 | might look something like this:<br> |
---|
154 | <br> |
---|
155 | <code> DISK$SCRATCH:[GEORGE.PROJECT1.DAT]BIG_DATA_FILE.NTP;5<br> |
---|
156 | </code><br> |
---|
157 | The general OpenVMS format is:<br> |
---|
158 | <br> |
---|
159 | |
---|
160 | <i>Device:[directories.dot.separated]filename.extension;version_number</i><br> |
---|
161 | </li> |
---|
162 | <li>For common file systems, determining if two descriptors are for the same |
---|
163 | entity is extremely difficult or impossible. For example, the concept of |
---|
164 | equality can be different for each portion of a path - some portions may be |
---|
165 | case or locale sensitive, others not. Case sensitivity is a property of the |
---|
166 | pathname itself, and not the platform. Determining collating sequence is even |
---|
167 | worse.<br> |
---|
168 | </li> |
---|
169 | <li>Race conditions may occur. Directory trees, directories, files, and file attributes are in effect shared between all threads, processes, and computers which have access to the |
---|
170 | filesystem. That may well include computers on the other side of the |
---|
171 | world or in orbit around the world. This implies that file system operations |
---|
172 | may fail in unexpected ways. For example:<br> |
---|
173 | <br> |
---|
174 | <code> assert( exists("foo") == exists("foo") ); |
---|
175 | // may fail!<br> |
---|
176 | assert( is_directory("foo") == is_directory("foo") ); |
---|
177 | // may fail!<br> |
---|
178 | </code><br> |
---|
179 | In the first example, the file may have been deleted between calls to |
---|
180 | exists(). In the second example, the file may have been deleted and then |
---|
181 | replaced by a directory of the same name between the calls to is_directory().<br> |
---|
182 | </li> |
---|
183 | <li>Even though an application may be portable, it still will have to traffic |
---|
184 | in system-specific paths occasionally; user-provided input is a common |
---|
185 | example.<br> |
---|
186 | </li> |
---|
187 | <li><a name="symbolic-link-use-case">Symbolic</a> links cause the canonical and |
---|
188 | normal forms of some paths to represent different files or directories. For |
---|
189 | example, given the directory hierarchy <code>/a/b/c</code>, with a symbolic |
---|
190 | link in <code>/a</code> named <code>x</code> pointing to <code>b/c</code>, |
---|
191 | then under POSIX Pathname Resolution rules a path of <code>"/a/x/.."</code> |
---|
192 | should resolve to <code>"/a/b"</code>. If <code>"/a/x/.."</code> were first |
---|
193 | normalized to <code>"/a"</code>, it would resolve incorrectly. (Case supplied |
---|
194 | by Walter Landry.)</li> |
---|
195 | </ul> |
---|
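<p>The race-condition reality above suggests one defensive habit: attempt the
operation and inspect the result, instead of asking whether a file exists and
then trusting the answer. The following is only a minimal sketch using the
standard library's <code>std::ifstream</code>. The helper name
<code>read_first_line</code> is hypothetical and is not part of the Filesystem
Library.</p>

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper (not part of the Filesystem Library): read the first
// line of a file by attempting the open and checking the outcome, rather than
// calling an exists()-style query first and trusting a possibly stale answer.
bool read_first_line(const std::string& file_name, std::string& line)
{
    std::ifstream in(file_name.c_str());
    if (!in)
        return false;  // missing, unreadable, or removed by another process

    // getline() sets failbit when nothing could be extracted (for example,
    // an empty file), so the stream state is the single source of truth.
    return !std::getline(in, line).fail();
}
```

<p>A caller checks the return value and never assumes that an earlier query
about the same name still holds.</p>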
196 | |
---|
197 | <h2><a name="Rationale">Rationale</a></h2> |
---|
198 | |
---|
199 | <p>The <a href="#Requirements">Requirements</a> and <a href="#Realities"> |
---|
200 | Realities</a> above drove much of the C++ interface design. In particular, |
---|
201 | the desire to make script-like code straightforward caused a great deal of |
---|
202 | effort to go into ensuring that apparently simple expressions like <i>exists( "foo" |
---|
203 | )</i> work as expected.</p> |
---|
204 | |
---|
205 | <p>See the <a href="faq.htm">FAQ</a> for the rationale behind many detailed |
---|
206 | design decisions.</p> |
---|
207 | |
---|
208 | <p>Several key insights went into the <i>path</i> class design:</p> |
---|
209 | <ul> |
---|
210 | <li>Decoupling of the input formats, internal conceptual (<i>vector&lt;string&gt;</i> |
---|
211 | or other sequence) |
---|
212 | model, and output formats.</li> |
---|
213 | <li>Providing two input formats (generic and O/S specific) broke a major |
---|
214 | design deadlock.</li> |
---|
215 | <li>Providing several output formats solved another set of previously |
---|
216 | intractable problems.</li> |
---|
217 | <li>Several non-obvious functions (particularly decomposition and composition) |
---|
218 | are required to support portable code. (Peter Dimov, Thomas Witt, Glen |
---|
219 | Knowles, others.)</li> |
---|
220 | </ul> |
---|
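<p>The decoupling of input format, internal element sequence, and output format
can be illustrated with a small sketch. It is not the library's implementation:
the function names <code>parse_generic</code>, <code>normalize</code>, and
<code>to_generic</code> are hypothetical, and only a "/"-separated generic
format is assumed. The sketch also reproduces the symbolic link case from the
Realities section: purely lexical normalization turns <code>"/a/x/.."</code>
into <code>"/a"</code>, whereas POSIX pathname resolution would yield
<code>"/a/b"</code> when <code>x</code> is a link to <code>b/c</code>.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Input format -> conceptual model: split "/"-separated text into elements.
// A leading "/" is kept as its own root element.
std::vector<std::string> parse_generic(const std::string& text)
{
    std::vector<std::string> elements;
    if (!text.empty() && text[0] == '/')
        elements.push_back("/");
    std::istringstream in(text);
    std::string name;
    while (std::getline(in, name, '/'))
        if (!name.empty())          // skip empty pieces from repeated "/"
            elements.push_back(name);
    return elements;
}

// Purely lexical normalization: drop "." and let ".." cancel the preceding
// name. The filesystem is never consulted, so symbolic links are ignored.
std::vector<std::string> normalize(const std::vector<std::string>& elements)
{
    std::vector<std::string> result;
    for (std::size_t i = 0; i < elements.size(); ++i) {
        const std::string& e = elements[i];
        if (e == ".")
            continue;
        if (e == ".." && !result.empty()) {
            if (result.back() == "/")
                continue;           // ".." at the root stays at the root
            if (result.back() != "..") {
                result.pop_back();  // cancel the preceding name
                continue;
            }
        }
        result.push_back(e);
    }
    return result;
}

// Conceptual model -> output format: join elements with "/".
std::string to_generic(const std::vector<std::string>& elements)
{
    std::string text;
    for (std::size_t i = 0; i < elements.size(); ++i) {
        if (!text.empty() && text[text.size() - 1] != '/')
            text += '/';
        text += elements[i];
    }
    return text;
}
```

<p>Because parsing and formatting are separate steps, another input or output
format could be added without touching <code>normalize</code>. The lexical
result for <code>"/a/x/.."</code> shows why such normalization has to be an
explicit choice rather than something applied silently.</p>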
221 | |
---|
222 | <p>Error checking was a particularly difficult area. One key insight was that |
---|
223 | with file and directory names, portability isn't a universal truth. |
---|
224 | Rather, the programmer must think out the question "What operating systems do I |
---|
225 | want this path to be portable to?" By providing support for several |
---|
226 | answers to that question, the Filesystem Library alerts programmers to the need |
---|
227 | to ask it in the first place.</p> |
---|
228 | <h2><a name="Abandoned_Designs">Abandoned Designs</a></h2> |
---|
229 | <h3>operations.hpp</h3> |
---|
230 | <p>Dietmar Kühl's original dir_it design and implementation supported |
---|
231 | wide-character file and directory names. It was abandoned after extensive |
---|
232 | discussions among Library Working Group members failed to identify portable |
---|
233 | semantics for wide-character names on systems not providing native support. See |
---|
234 | <a href="faq.htm#wide-character_names">FAQ</a>.</p> |
---|
235 | <p>Previous iterations of the interface design used explicitly named functions providing a |
---|
236 | large number of convenience operations, with no compile-time or run-time |
---|
237 | options. There were so many function names that they were very confusing to use, |
---|
238 | and the interface was much larger. Any benefits seemed theoretical rather than |
---|
239 | real. </p> |
---|
240 | <p>Designs based on compile-time (rather than runtime) flag and option selection |
---|
241 | (via policy, enum, or int template parameters) became so complicated that they |
---|
242 | were abandoned, often after investing quite a bit of time and effort. The need |
---|
243 | to qualify attribute or option names with namespaces, even aliases, made use in |
---|
244 | template parameters ugly; that wasn't fully appreciated until actually writing |
---|
245 | real code.</p> |
---|
246 | <p>Yet another set of convenience functions (for example, <i>remove</i> with |
---|
247 | permissive, prune, recurse, and other options, plus predicate, and possibly |
---|
248 | other, filtering features) were abandoned because the details became both |
---|
249 | complex and contentious.</p> |
---|
250 | |
---|
251 | <p>What is left is a toolkit of low-level operations from which the user can |
---|
252 | create more complex convenience operations, plus a very small number of |
---|
253 | convenience functions which were found to be useful enough to justify inclusion.</p> |
---|
254 | |
---|
255 | <h3>path.hpp</h3> |
---|
256 | |
---|
257 | <p>There were so many abandoned path designs, I've lost track. Policy-based |
---|
258 | class templates in several flavors, constructor-supplied runtime policies, |
---|
259 | operation-specific runtime policies: they were all considered, often |
---|
260 | implemented, and ultimately abandoned as far too complicated for any small |
---|
261 | benefits observed.</p> |
---|
262 | |
---|
263 | <p>Additional design considerations apply to <a href="i18n.html"> |
---|
264 | Internationalization</a>. </p> |
---|
265 | |
---|
266 | <h3>error checking</h3> |
---|
267 | |
---|
268 | <p>A number of designs for the error checking machinery were abandoned, some |
---|
269 | after experiments with implementations. In particular, totally automatic error checking |
---|
270 | was attempted, but it tended to make the overall |
---|
271 | library design much more complicated.</p> |
---|
272 | |
---|
273 | <p>Some designs associated error checking mechanisms with paths, others with |
---|
274 | operations functions. A policy-based error checking template design was |
---|
275 | partially implemented, then abandoned as too complicated for everyday |
---|
276 | script-like programs.</p> |
---|
277 | |
---|
278 | <p>The final design, which depends partially on explicit error checking function |
---|
279 | calls, is much simpler and more straightforward, although it does depend to |
---|
280 | some extent on programmer discipline. But it should allow programmers who |
---|
281 | are concerned about portability to be reasonably sure that their programs will |
---|
282 | work correctly on their choice of target systems.</p> |
---|
283 | |
---|
284 | <h2><a name="References">References</a></h2> |
---|
285 | |
---|
286 | <table border="0" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> |
---|
287 | <tr> |
---|
288 | <td width="13%" valign="top">[<a name="IBM-01">IBM-01</a>]</td> |
---|
289 | <td width="87%">IBM Corporation, <i>z/OS V1R3.0 C/C++ Run-Time |
---|
290 | Library Reference</i>, SA22-7821-02, 2001, |
---|
291 | <a href="http://www-1.ibm.com/servers/eserver/zseries/zos/bkserv/"> |
---|
292 | www-1.ibm.com/servers/eserver/zseries/zos/bkserv/</a></td> |
---|
293 | </tr> |
---|
294 | <tr> |
---|
295 | <td width="13%" valign="top">[<a name="ISO-9660">ISO-9660</a>]</td> |
---|
296 | <td width="87%">International Organization for Standardization, 1988</td> |
---|
297 | </tr> |
---|
298 | <tr> |
---|
299 | <td width="13%" valign="top">[<a name="Kuhn">Kuhn</a>]</td> |
---|
300 | <td width="87%">UTF-8 and Unicode FAQ for Unix/Linux, |
---|
301 | <a href="http://www.cl.cam.ac.uk/~mgk25/unicode.html"> |
---|
302 | www.cl.cam.ac.uk/~mgk25/unicode.html</a></td> |
---|
303 | </tr> |
---|
304 | <tr> |
---|
305 | <td width="13%" valign="top">[<a name="MSDN">MSDN</a>] </td> |
---|
306 | <td width="87%">Microsoft Platform SDK for Windows, Storage Start |
---|
307 | Page, |
---|
308 | <a href="http://msdn.microsoft.com/library/en-us/fileio/base/storage_start_page.asp"> |
---|
309 | msdn.microsoft.com/library/en-us/fileio/base/storage_start_page.asp</a></td> |
---|
310 | </tr> |
---|
311 | <tr> |
---|
312 | <td width="13%" valign="top">[<a name="POSIX-01">POSIX-01</a>]</td> |
---|
313 | <td width="87%">IEEE Std 1003.1-2001, ISO/IEC 9945:2002, and The Open Group Base Specifications, Issue 6. Also known as The |
---|
314 | Single Unix<font face="Times New Roman">® Specification, Version 3. |
---|
315 | Available from each of the organizations involved in its creation. For |
---|
316 | example, read online or download from |
---|
317 | <a href="http://www.unix.org/single_unix_specification/"> |
---|
318 | www.unix.org/single_unix_specification/</a>.</font> The ISO JTC1/SC22/WG15 - POSIX |
---|
319 | homepage is <a href="http://www.open-std.org/jtc1/sc22/WG15/"> |
---|
320 | www.open-std.org/jtc1/sc22/WG15/</a></td> |
---|
321 | </tr> |
---|
322 | <tr> |
---|
323 | <td width="13%" valign="top">[<a name="URI">URI</a>]</td> |
---|
324 | <td width="87%">RFC-2396, Uniform Resource Identifiers (URI): Generic |
---|
325 | Syntax, <a href="http://www.ietf.org/rfc/rfc2396.txt"> |
---|
326 | www.ietf.org/rfc/rfc2396.txt</a></td> |
---|
327 | </tr> |
---|
328 | <tr> |
---|
329 | <td width="13%" valign="top">[<a name="UTF-16">UTF-16</a>]</td> |
---|
330 | <td width="87%">Wikipedia, UTF-16, |
---|
331 | <a href="http://en.wikipedia.org/wiki/UTF-16"> |
---|
332 | en.wikipedia.org/wiki/UTF-16</a></td> |
---|
333 | </tr> |
---|
334 | <tr> |
---|
335 | <td width="13%" valign="top">[<a name="Wulf-Shaw-73">Wulf-Shaw-73</a>]</td> |
---|
336 | <td width="87%">William Wulf, Mary Shaw, <i>Global |
---|
337 | Variable Considered Harmful</i>, ACM SIGPLAN Notices, 8, 2, 1973, pp. 23-34</td> |
---|
338 | </tr> |
---|
339 | </table> |
---|
340 | |
---|
341 | <hr> |
---|
342 | <p>Revised |
---|
343 | <!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->02 August, 2005<!--webbot bot="Timestamp" endspan i-checksum="34600" --></p> |
---|
344 | |
---|
345 | <p>© Copyright Beman Dawes, 2002</p> |
---|
346 | <p> Use, modification, and distribution are subject to the Boost Software |
---|
347 | License, Version 1.0. (See accompanying file <a href="../../../LICENSE_1_0.txt"> |
---|
348 | LICENSE_1_0.txt</a> or copy at <a href="http://www.boost.org/LICENSE_1_0.txt"> |
---|
349 | www.boost.org/LICENSE_1_0.txt</a>)</p> |
---|
350 | |
---|
351 | </body> |
---|
352 | |
---|
353 | </html> |
---|