1 | <html> |
---|
2 | <head> |
---|
3 | <title>Rationale</title> |
---|
4 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
---|
5 | <link rel="stylesheet" href="theme/style.css" type="text/css"> |
---|
6 | </head> |
---|
7 | |
---|
8 | <body> |
---|
9 | <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2"> |
---|
10 | <tr> |
---|
11 | <td width="10"> |
---|
12 | </td> |
---|
13 | <td width="85%"> <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Rationale</b></font></td> |
---|
14 | <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td> |
---|
15 | </tr> |
---|
16 | </table> |
---|
17 | <br> |
---|
18 | <table border="0"> |
---|
19 | <tr> |
---|
20 | <td width="10"></td> |
---|
21 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
---|
22 | <td width="30"><a href="faq.html"><img src="theme/l_arr.gif" border="0"></a></td> |
---|
23 | <td width="30"><a href="acknowledgments.html"><img src="theme/r_arr.gif" border="0"></a></td> |
---|
24 | </tr> |
---|
25 | </table> |
---|
26 | <p><img src="theme/lens.gif" width="15" height="16"> <strong>Virtual functions: |
---|
27 | From static to dynamic C++</strong></p> |
---|
28 | <p>Rules straddle the border between static and dynamic C++. In effect, a rule |
---|
29 | transforms compile-time polymorphism (using templates) into run-time polymorphism |
---|
30 | (using virtual functions). This is necessary due to C++'s inability to automatically |
---|
31 | declare a variable of a type deduced from an arbitrarily complex expression |
---|
32 | in the right-hand side (rhs) of an assignment. Basically, we want to do something |
---|
33 | like:</p> |
---|
34 | <pre><code><font color="#000000"> <span class=identifier>T </span><span class=identifier>rule </span><span class=special>= </span><span class=identifier>an_arbitrarily_complex_expression</span><span class=special>;</span></font></code></pre> |
---|
35 | <p>without having to know or care about the resulting type of the right-hand side |
---|
36 | (rhs) of the assignment expression. Apart from this, we also need a facility |
---|
37 | to forward declare an unknown type:</p> |
---|
38 | <pre><code><font color="#000000"><span class=special> </span><span class=identifier>T </span><span class=identifier>rule</span><span class=special>; |
---|
39 | </span><span class=special>... |
---|
40 | </span><span class=identifier>rule </span><span class=special>= </span><span class=identifier>a </span><span class=special>| </span><span class=identifier>b</span><span class=special>;</span></font></code></pre> |
---|
41 | <p>These limitations lead us to this implementation of rules. This comes at the |
---|
42 | expense of the overhead of a virtual function call, once through each invocation |
---|
43 | of a rule.</p> |
---|
44 | <p><img src="theme/lens.gif" width="15" height="16"> <strong>Multiple declaration |
---|
45 | </strong> </p> |
---|
46 | <p>Some BNF variants allow multiple declarations of a <tt>rule</tt>. The declarations |
---|
47 | are taken as alternatives. Example:</p> |
---|
48 | <pre> |
---|
49 | <span class=identifier><code>r </code></span><code><span class=special>= </span><span class=identifier>a</span><span class=special>; </span><span class=identifier> |
---|
50 | r </span><span class=special>= </span><span class=identifier>b</span><span class=special>;</span></code></pre> |
---|
51 | <p> is equivalent to: </p> |
---|
52 | <pre> |
---|
53 | <span class=identifier><code>r </code></span><code><span class=special>= </span><span class=identifier>a </span><span class=special>| </span><span class=identifier>b</span><span class=special>;</span></code></pre> |
---|
54 | <p>Spirit v1.3 allowed this behavior. However, the current version of Spirit <b>no |
---|
55 | longer</b> allows this because experience shows that this behavior leads to |
---|
56 | unwanted gotchas (for instance, it does not allow rules to be held in containers). |
---|
57 | In the current release of Spirit, a second assignment to a rule will simply |
---|
58 | redefine it. The old definition is destructed. This follows more closely C++ |
---|
59 | semantics and is more in line with what the user expects the rule to behave.</p> |
---|
60 | <p><img src="theme/lens.gif" width="15" height="16"> <b>Sequencing Syntax</b> |
---|
61 | <br> |
---|
62 | <br> |
---|
63 | The comma operator as in a, b seems to be a better candidate, syntax-wise. But |
---|
64 | then the problem is with its precedence. It has the lowest precedence in C/C++, |
---|
65 | which makes it virtually useless. <br> |
---|
66 | <br> |
---|
67 | Bjarne Stroustrup, in his article <a href="references.html#generalized_overloading">"Generalizing |
---|
68 | Overloading for C++2000"</a> talks about overloading whitespace. Such a |
---|
69 | feature would allow juxtapositioning of parser objects exactly as we do in (E)BNF |
---|
70 | (e.g. a b | c instead of a >> b | c). Unfortunately, the article was dated |
---|
71 | April 1, 1998. Oh well.</p> |
---|
72 | <p><img src="theme/lens.gif" width="15" height="16"> <b>Forward iterators</b><br> |
---|
73 | <br> |
---|
74 | In general, the scanner expects at least a standard conforming forward iterator. |
---|
75 | Forward iterators are needed for backtracking where the iterator needs to be |
---|
76 | saved and restored later. Generally speaking, Spirit is a backtracking parser. |
---|
77 | The implication of this is that at some point, the iterator position will have |
---|
78 | to be saved to allow the parser to backtrack to a previous point. Thus, for |
---|
79 | backtracking to work, the framework requires at least a forward iterator.<br> |
---|
80 | <br> |
---|
81 | Some parsers might require more specialized iterators (bi-directional or even |
---|
82 | random access). Perhaps in the future, deterministic parsers when added to the |
---|
83 | framework, will perform no backtracking and will need just a single token lookahead, |
---|
84 | hence will require input iterators only.</p> |
---|
85 | <p><img src="theme/lens.gif" width="15" height="16"><b> Why are subrules important?</b><br> |
---|
86 | <br> |
---|
87 | Subrules open up the oportunity to do aggressive meta programming as well because |
---|
88 | they do not rely on virtual functions. The virtual function is the meta-programmer's |
---|
89 | hell. Not only does it slow down the program due to the virtual function indirect |
---|
90 | call, it is also an opaque wall where no metaprogram can get past. It kills |
---|
91 | all meta-information beyond the virtual function call. Worse, the virtual function |
---|
92 | cannot be templated. Which means that its arguments have to be tied to a actual |
---|
93 | types. Many problems stem from this limitation. <br> |
---|
94 | <br> |
---|
95 | While Spirit is a currently classified as a non-deterministic recursive-descent |
---|
96 | parser, Doug Gregor first noted that other parsing techniques apart from top-down |
---|
97 | recursive descent may be applied. For instance, apart from non-deterministic |
---|
98 | recursive descent, deterministic LL(1) and LR(1) can theoretically be implemented |
---|
99 | using the same expression template front end. Spirit rules use virtual functions |
---|
100 | to encode the RHS parser expression in an opaque abstract parser type. While |
---|
101 | it serves its purpose well, the rule's virtual functions are the stumbling blocks |
---|
102 | to more advanced metaprogramming. Subrules are free from virtual functions.</p> |
---|
103 | <p><img src="theme/lens.gif" width="15" height="16"><b> <a name="exhaustive_rd"></a>Exhaustive |
---|
104 | backtracking and greedy RD</b></p> |
---|
105 | <p>Spirit doesn't do exhaustive backtracking like regular expressions are expected |
---|
106 | to. For example:</p> |
---|
107 | <pre> <span class="special">*</span>chlit_p<span class="special">(</span><span class="quotes">'a'</span><span class="special">) >></span> chlit_p<span class="special">(</span><span class="quotes">'a'</span><span class="special">);</span></pre> |
---|
108 | <p>will always fail to match because Spirit's Kleene star does not back off when |
---|
109 | the rest of the rule fails to match. </p> |
---|
110 | <p>Actually, there's a solution to this greedy RD problem. Such a scheme is discussed |
---|
111 | in section 6.6.2 of <a |
---|
112 | href="http://www.cs.vu.nl/%7Edick/PTAPG.html">Parsing Techniques: A Practical |
---|
113 | Guide</a>. The trick involves passing a <em>tail</em> parser (in addition to |
---|
114 | the scanner) to each parser. The start parser will then simply be: <tt>start |
---|
115 | >> end_p;</tt> (end_p is the start's tail). </p> |
---|
116 | <p>Spirit is greedy --using straight forward, naive RD. It is certainly possible |
---|
117 | to implement the fully backtracking scheme presented above, but there will be |
---|
118 | also certainly be a performance hit. The scheme will always try to match all |
---|
119 | possible parser paths (full parser hierarchy traversal) until it reaches a point |
---|
120 | of certainty, that the whole thing matches or fails to match. </p> |
---|
121 | <table border="0" width="80%" align="center"> |
---|
122 | <tr> |
---|
123 | <td class="note_box"><p><img src="theme/note.gif" width="16" height="16"><b>Backtracking |
---|
124 | and Greedy RD </b><br> |
---|
125 | <br> |
---|
126 | Spirit is quite consistent and intuitive about when it backtracks and |
---|
127 | to where, although it may not be obvious to those coming from different |
---|
128 | backgrounds. In general, any (sub)parser will, given the same input, always |
---|
129 | match the same portion of the input (or fail to match the input at all). |
---|
130 | This means that Spirit is inherently greedy. Spirit will only backtrack |
---|
131 | when a (sub)parser fails to match the input, and it will always backtrack |
---|
132 | to the next choice point upward (not backward) in the parser structure. |
---|
133 | In other words abb|ab will match "ab", as will a(bb|b), but |
---|
134 | (ab|a)b won't because the (ab|a) subparser will always match the 'b' after |
---|
135 | the 'a' if it is available.</p> |
---|
136 | <p>--Rainer Deyke</p> |
---|
137 | </td> |
---|
138 | </tr> |
---|
139 | </table> |
---|
140 | <p>There's a strong preference on "simplicity with all the knobs when you |
---|
141 | need them" approach, right now. On the other hand, the flexibility of Spirit |
---|
142 | makes it possible to have different optional schemes available. It might be |
---|
143 | possible to implement an exhaustive backtracking RD scheme as an optional feature |
---|
144 | in the future. </p> |
---|
145 | <table border="0"> |
---|
146 | <tr> |
---|
147 | <td width="10"></td> |
---|
148 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
---|
149 | <td width="30"><a href="faq.html"><img src="theme/l_arr.gif" border="0"></a></td> |
---|
150 | <td width="30"><a href="acknowledgments.html"><img src="theme/r_arr.gif" border="0"></a></td> |
---|
151 | </tr> |
---|
152 | </table> |
---|
153 | <br> |
---|
154 | <hr size="1"> |
---|
155 | <p class="copyright">Copyright © 1998-2003 Joel de Guzman<br> |
---|
156 | <br> |
---|
157 | <font size="2">Use, modification and distribution is subject to the Boost Software |
---|
158 | License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at |
---|
159 | http://www.boost.org/LICENSE_1_0.txt)</font></p> |
---|
160 | <p class="copyright"> </p> |
---|
161 | </body> |
---|
162 | </html> |
---|