1 | <html> |
---|
2 | <head> |
---|
3 | <title>In-depth: The Parser</title> |
---|
4 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
---|
5 | <link rel="stylesheet" href="theme/style.css" type="text/css"> |
---|
6 | </head> |
---|
7 | |
---|
8 | <body> |
---|
9 | <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2"> |
---|
10 | <tr> |
---|
11 | <td width="10"> |
---|
12 | </td> |
---|
13 | <td width="85%"> |
---|
14 | <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>In-depth: The Parser</b></font> |
---|
15 | </td> |
---|
16 | <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td> |
---|
17 | </tr> |
---|
18 | </table> |
---|
19 | <br> |
---|
20 | <table border="0"> |
---|
21 | <tr> |
---|
22 | <td width="10"></td> |
---|
23 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
---|
24 | <td width="30"><a href="semantic_actions.html"><img src="theme/l_arr.gif" border="0"></a></td> |
---|
25 | <td width="30"><a href="indepth_the_scanner.html"><img src="theme/r_arr.gif" border="0"></a></td> |
---|
26 | </tr> |
---|
27 | </table> |
---|
28 | <p>What makes Spirit tick? Now on to some details... The parser class is the most |
---|
29 | fundamental entity in the framework. A parser accepts a scanner comprised of |
---|
30 | a first-last iterator pair and returns a match object as its result. The iterators |
---|
31 | delimit the data currently being parsed. The match object evaluates to true |
---|
32 | if the parse succeeds, in which case the input is advanced accordingly. Each |
---|
33 | parser can represent a specific pattern or algorithm, or it can be a more complex |
---|
34 | parser formed as a composition of other parsers.</p> |
---|
35 | <p>All parsers inherit from the base template class, parser:</p> |
---|
36 | <pre> |
---|
37 | <span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>DerivedT</span><span class=special>> |
---|
38 | </span><span class=keyword>struct </span><span class=identifier>parser |
---|
39 | </span><span class=special>{ |
---|
40 | </span><span class=comment>/*...*/ |
---|
41 | |
---|
42 | </span><span class=identifier>DerivedT</span><span class=special>& </span><span class=identifier>derived</span><span class=special>(); |
---|
43 | </span><span class=identifier>DerivedT </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>derived</span><span class=special>() </span><span class=keyword>const</span><span class=special>; |
---|
44 | </span><span class=special>};</span></pre> |
---|
45 | <p>This class is a protocol base class for all parsers. The parser class does |
---|
46 | not really know how to parse anything but instead relies on the template parameter |
---|
47 | <tt>DerivedT</tt> to do the actual parsing. This technique is known as the <a href="references.html#curious_recurring">"Curiously |
---|
48 | Recurring Template Pattern"</a> in template meta-programming circles. This |
---|
49 | inheritance strategy gives us the power of polymorphism without the virtual |
---|
50 | function overhead. In essence this is a way to implement <a href="references.html#generic_patterns">compile |
---|
51 | time polymorphism</a>.</p> |
---|
52 | <h2> parser_category_t</h2> |
---|
53 | <p> Each derived parser has a typedef <tt>parser_category_t</tt> that defines |
---|
54 | its category. By default, if one is not specified, it will inherit from the |
---|
55 | base parser class which typedefs its parser_category_t as <tt>plain_parser_category</tt>. |
---|
56 | Some template classes are provided to distinguish different types of parsers. |
---|
57 | The following categories are the most generic. More specific types may inherit |
---|
58 | from these.</p> |
---|
59 | <table width="90%" border="0" align="center"> |
---|
60 | <tr> |
---|
61 | <td colspan="2" class="table_title">Parser categories</td> |
---|
62 | </tr> |
---|
63 | <tr> |
---|
64 | <td class="table_cells" width="33%"><tt>plain_parser_category</tt></td> |
---|
65 | <td class="table_cells" width="67%">Your plain vanilla parser</td> |
---|
66 | </tr> |
---|
67 | <tr> |
---|
68 | <td class="table_cells" width="33%"><tt>binary_parser_category</tt></td> |
---|
69 | <td class="table_cells" width="67%">A parser that has subject a and b (e.g. |
---|
70 | alternative)</td> |
---|
71 | </tr> |
---|
72 | <tr> |
---|
73 | <td class="table_cells" width="33%"><tt>unary_parser_category</tt></td> |
---|
74 | <td class="table_cells" width="67%">A parser that has single subject (e.g. |
---|
75 | kleene star)</td> |
---|
76 | </tr> |
---|
77 | <tr> |
---|
78 | <td class="table_cells" width="33%"><tt>action_parser_category</tt></td> |
---|
79 | <td class="table_cells" width="67%">A parser with an attached semantic action</td> |
---|
80 | </tr> |
---|
81 | </table> |
---|
82 | <pre><span class=identifier> </span><span class=keyword>struct </span><span class=identifier>plain_parser_category </span><span class=special>{}; |
---|
83 | </span><span class=keyword>struct </span><span class=identifier>binary_parser_category </span><span class=special>: </span><span class=identifier>plain_parser_category </span><span class=special>{}; |
---|
84 | </span><span class=keyword>struct </span><span class=identifier>unary_parser_category </span><span class=special>: </span><span class=identifier>plain_parser_category </span><span class=special>{}; |
---|
85 | </span><span class=keyword>struct </span><span class=identifier>action_parser_category </span><span class=special>: </span><span class=identifier>unary_parser_category </span><span class=special>{};</span></pre> |
---|
86 | <h2>embed_t</h2> |
---|
87 | <p>Each parser has a typedef <tt>embed_t</tt>. This typedef specifies how a parser |
---|
88 | is embedded in a composite. By default, if one is not specified, the parser |
---|
89 | will be embedded by value. That is, a copy of the parser is placed as a member |
---|
90 | variable of the composite. Most parsers are embedded by value. In certain situations |
---|
91 | however, this is not desirable or possible. One particular example is the <a href="rule.html">rule</a>. |
---|
92 | The rule, unlike other parsers is embedded by reference.</p> |
---|
93 | <h2><a name="match"></a>The match</h2> |
---|
94 | <p>The match holds the result of a parser. A match object evaluates to true when |
---|
95 | a succesful match is found, otherwise false. The length of the match is the |
---|
96 | number of characters (or tokens) that is successfully matched. This can be queried |
---|
97 | through its <tt>length()</tt> member function. A negative value means that the |
---|
98 | match is unsucessful. </p> |
---|
99 | <p> Each parser may have an associated attribute. This attribute is also returned |
---|
100 | back to the client on a successful parse through the match object. We can get |
---|
101 | this attribute via the match's <tt>value()</tt> member function. Be warned though |
---|
102 | that the match's attribute may be invalid, in which case, getting the attribute |
---|
103 | will result in an exception. The member function <tt>has_valid_attribute()</tt> |
---|
104 | can be queried to know if it is safe to get the match's attribute. The attribute |
---|
105 | may be set anytime through the member function <tt>value(v)</tt>where <tt>v</tt> |
---|
106 | is the new attribute value.<br> |
---|
107 | <br> |
---|
108 | A match attribute is valid:</p> |
---|
109 | <ul> |
---|
110 | <li> on a successful match</li> |
---|
111 | <li>when its value is set through the <tt>value(val)</tt> member function</li> |
---|
112 | <li> if it is assigned or copied from a compatible match object (e.g. <tt>match<double></tt> |
---|
113 | from <tt>match<int></tt>) with a valid attribute. A match object <tt>A</tt> |
---|
114 | is compatible with another match object <tt>B</tt> if the attribute type of |
---|
115 | <tt>A</tt> can be assigned from the attribute type of <tt></tt> <tt>B</tt> |
---|
116 | (i.e. <tt>a = b;</tt> must compile).</li> |
---|
117 | </ul> |
---|
118 | <p>The match attribute is undefined:</p> |
---|
119 | <ul> |
---|
120 | <li>on an unsuccessful match </li> |
---|
121 | <li>when an attempt to copy or assign from another match object with an incompatible |
---|
122 | attribute type (e.g. <tt>match<std::string></tt> from <tt>match<int></tt>).</li> |
---|
123 | </ul> |
---|
124 | <h3>The match class:</h3> |
---|
125 | <pre><span class=keyword> template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>T</span><span class=special>> |
---|
126 | </span><span class=keyword> class </span><span class=identifier>match |
---|
127 | </span><span class=keyword> </span><span class=special>{ |
---|
128 | </span><span class=keyword> public</span><span class=special>: |
---|
129 | |
---|
130 | </span><span class=keyword> </span><span class=comment>/*...*/ |
---|
131 | |
---|
132 | </span><span class=special> </span><span class=keyword> typedef</span><span class="identifier"> T attr_t</span><span class="special">;<br> |
---|
133 | </span><span class=keyword> </span><span class="special"> </span><span class=keyword>operator safe_bool</span><span class=special>() </span><span class=keyword>const</span>; <span class="comment">// convertible to a bool</span> |
---|
134 | <span class=keyword> int </span><span class=identifier>length</span><span class=special>() </span><span class=keyword>const</span>; |
---|
135 | <span class="keyword">bool</span> has_valid_attribute<span class="special">()</span> <span class="keyword">const</span><span class="special">;</span> |
---|
136 | <span class=keyword> </span> <span class=identifier>void</span><span class=special> </span><span class=identifier>value</span><span class=special>(</span><span class="identifier">T </span><span class="keyword">const</span><span class=special>&) </span><span class=keyword>const; |
---|
137 | </span><span class=identifier>T </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>value</span><span class=special>(); |
---|
138 | </span><span class=keyword> </span><span class=special>};</span></pre> |
---|
139 | <h2>match_result</h2> |
---|
140 | <p>It has been mentioned repeatedly that the parser returns a match object as |
---|
141 | its result. This is a simplification. Actually, for the sake of genericity, |
---|
142 | parsers are really not hard-coded to return a match object. More accurately, |
---|
143 | a parser returns an object that adheres to a conceptual interface, of which |
---|
144 | the match is an example. Nevertheless, we shall call the result type of a parser |
---|
145 | a match object regardless if it is actually a match class, a derivative or a |
---|
146 | totally unrelated type.</p> |
---|
147 | <table width="80%" border="0" align="center"> |
---|
148 | <tr> |
---|
149 | <td class="note_box"><img src="theme/lens.gif" width="15" height="16"> <b>Meta-functions</b><br> |
---|
150 | <br> |
---|
151 | What are meta-functions? We all know how functions look like. In simplest |
---|
152 | terms, a function accepts some arguments and returns a result. Here is the |
---|
153 | function we all love so much:<br> |
---|
154 | <br> |
---|
155 | <code><span class="keyword">int</span> identity_func<span class="special">(</span><span class="keyword">int</span> |
---|
156 | arg<span class="special">)</span><br> |
---|
157 | <span class="special">{</span> <span class="keyword">return</span> arg<span class="special">; |
---|
158 | }</span> <span class="comment">// return the argument arg</span><br> |
---|
159 | </code><br> |
---|
160 | Meta-functions are essentially the same. These beasts also accept arguments |
---|
161 | and return a result. However, while functions work at runtime on values, |
---|
162 | meta-functions work at compile time on types (or constants, but we shall |
---|
163 | deal only with types). The meta-function is a template class (or struct). |
---|
164 | The template parameters are the arguments to the meta-function and a typedef |
---|
165 | within the class is the meta-function's return type. Here is the corresponding |
---|
166 | meta-function:<code><br> |
---|
167 | <br> |
---|
168 | <span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> |
---|
169 | ArgT<span class="special">></span><br> |
---|
170 | <span class="keyword">struct</span> identity_meta_func<br> |
---|
171 | <span class="special">{</span> <span class="keyword">typedef</span> ArgT |
---|
172 | type<span class="special">; } </span><span class="comment">// return the |
---|
173 | argument ArgT</span><br> |
---|
174 | <br> |
---|
175 | </code>The meta-function above is invoked as:<br> |
---|
176 | <br> |
---|
177 | <code><span class="keyword">typename</span> identity_meta_func<span class="special"><</span>ArgT<span class="special">>::</span>type</code><br> |
---|
178 | <br> |
---|
179 | By convention, meta-functions return the result through the typedef <tt>type</tt>. |
---|
180 | Take note that <tt>typename</tt> is only required within templates.</td> |
---|
181 | </tr> |
---|
182 | </table> |
---|
183 | <p>The actual match type used by the parser depends on two types: the parser's |
---|
184 | attribute type and the scanner type. <tt>match_result</tt> is the meta-function |
---|
185 | that returns the desired match type given an attribute type and a scanner type. |
---|
186 | </p> |
---|
187 | <p>Usage:</p> |
---|
188 | <pre> <span class=keyword>typename </span><span class=identifier>match_result</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>, </span><span class=identifier>T</span><span class=special>>::</span><span class=identifier>type</span></pre> |
---|
189 | <p>The meta-function basically answers the question "given a scanner type |
---|
190 | <tt>ScannerT</tt> and an attribute type <tt>T</tt>, what is the desired match |
---|
191 | type?" [<img src="theme/note.gif" width="16" height="16"> <tt>typename</tt> |
---|
192 | is only required within templates ].</p> |
---|
193 | <h2>The parse member function</h2> |
---|
194 | <p>Concrete sub-classes inheriting from parser must have a corresponding member |
---|
195 | function <tt>parse(...)</tt> compatible with the conceptual Interface:<br> |
---|
196 | </p> |
---|
197 | <pre><span class=identifier> </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> |
---|
198 | </span><span class=identifier>RT |
---|
199 | </span><span class=identifier>parse</span><span class=special>(</span><span class=identifier>ScannerT</span><span class=special></span> const<span class=special>& </span>scan<span class=identifier></span><span class=special>) </span><span class=keyword>const</span><span class=special>;</span></pre> |
---|
200 | <p>where <tt>RT</tt> is the desired return type of the parser. </p> |
---|
201 | <h2>The parser result</h2> |
---|
202 | <p>Concrete sub-classes inheriting from parser in most cases need to have a nested |
---|
203 | meta-function <tt>result</tt> that returns the result <tt>type</tt> of the parser's |
---|
204 | parse member function, given a scanner type. The meta-function has the form:</p> |
---|
205 | <pre><span class=keyword> template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> |
---|
206 | </span><span class=keyword>struct </span><span class=identifier>result |
---|
207 | </span><span class=special>{ |
---|
208 | </span><span class=keyword>typedef </span>RT <span class=identifier></span><span class=identifier>type</span><span class=special>; |
---|
209 | </span><span class=special>};</span></pre> |
---|
210 | <p>where <tt>RT</tt> is the desired return type of the parser. This is usually, |
---|
211 | but not always, dependent on the template parameter <tt>ScannerT</tt>. For example, |
---|
212 | given an attribute type <tt>int</tt>, we can use the match_result metafunction:</p> |
---|
213 | <pre><span class=keyword> template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> |
---|
214 | </span><span class=keyword>struct </span><span class=identifier>result |
---|
215 | </span><span class=special>{ |
---|
216 | </span><span class=keyword>typedef typename </span><span class=identifier>match_result</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>, </span><span class="keyword">int</span><span class=special>>::</span><span class=identifier>type type</span><span class=special>; |
---|
217 | };</span></pre> |
---|
218 | <p>If a parser does not supply a result metafunction, a default is provided by |
---|
219 | the base parser class.<span class=special> </span>The default is declared as:</p> |
---|
220 | <pre><span class=keyword> template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> |
---|
221 | </span><span class=keyword>struct </span><span class=identifier>result |
---|
222 | </span><span class=special>{ |
---|
223 | </span><span class=keyword>typedef typename </span><span class=identifier>match_result</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>, </span><span class="identifier">nil_t</span><span class=special>>::</span><span class=identifier>type type</span><span class=special>; |
---|
224 | };</span></pre> |
---|
225 | <p>Without a result metafunction, notice that the parser's default attribute is |
---|
226 | <tt>nil_t</tt> (i.e. the parser has no attribute).</p> |
---|
227 | <h2><span class=special></span>parser_result</h2> |
---|
228 | <p>Given a a scanner type <tt>ScannerT</tt> and a parser type <tt>ParserT</tt>, |
---|
229 | what will be the actual result of the parser? The answer to this question is |
---|
230 | provided to by the <tt>parser_result</tt> meta-function.</p> |
---|
231 | <p>Usage:</p> |
---|
232 | <pre> <span class=keyword>typename </span><span class=identifier>parser_result</span><span class=special><</span><span class=identifier>ParserT, ScannerT</span><span class=special>>::</span><span class=identifier>type</span></pre> |
---|
233 | <p>In general, the meta-function just forwards the invocation to the parser's |
---|
234 | result meta-function:</p> |
---|
235 | <pre><span class=identifier> </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ParserT</span><span class=special>, </span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> |
---|
236 | </span><span class=keyword>struct </span><span class=identifier>parser_result |
---|
237 | </span><span class=special>{ |
---|
238 | </span><span class=keyword>typedef </span><span class=keyword>typename </span><span class=identifier>ParserT</span><span class=special>::</span><span class=keyword>template </span><span class=identifier>result</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>>::</span><span class=identifier>type </span><span class=identifier>type</span><span class=special>; |
---|
239 | </span><span class=special>};</span></pre> |
---|
240 | <p>This is similar to a global function calling a member function. Most of the |
---|
241 | time, the usage above is equivalent to:</p> |
---|
242 | <pre><span class=keyword> typename </span><span class=identifier>ParserT</span><span class=special>::</span><span class=keyword>template </span><span class=identifier>result</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>>::</span><span class=identifier>type</span></pre> |
---|
243 | <p>Yet, this should not be relied upon to be true all the time because the parser_result |
---|
244 | metafunction might be specialized for specific parser and/or scanner types.</p> |
---|
245 | <p>The parser_result metafunction makes the signature of the required parse member |
---|
246 | function almost canonical:</p> |
---|
247 | <pre><span class=identifier> </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> |
---|
248 | </span><span class=keyword>typename </span><span class=identifier>parser_result</span><span class=special><</span><span class=identifier>self_t, ScannerT</span><span class=special>>::</span><span class=identifier>type</span><br> <span class=identifier>parse</span><span class=special>(</span><span class=identifier>ScannerT</span><span class=special></span> const<span class=special>& </span>scan<span class=identifier></span><span class=special>) </span><span class=keyword>const</span><span class=special>;</span></pre> |
---|
249 | <p>where<span class=special></span> <tt>self_t</tt> is a typedef to the parser.</p> |
---|
250 | <h2>parser class declaration</h2> |
---|
251 | <pre><span class=identifier> </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>DerivedT</span><span class=special>> |
---|
252 | </span><span class=keyword>struct </span><span class=identifier>parser |
---|
253 | </span><span class=special>{ |
---|
254 | </span><span class=keyword>typedef </span><span class=identifier>DerivedT embed_t</span><span class=special>; |
---|
255 | </span><span class=keyword>typedef </span><span class=identifier>DerivedT derived_t</span><span class=special>; |
---|
256 | </span><span class=keyword>typedef </span><span class=identifier>plain_parser_category parser_category_t</span><span class=special>; |
---|
257 | |
---|
258 | </span><span class=keyword>template </span><span class=special><</span><span class="keyword">typename</span> ScannerT<span class=special>> |
---|
259 | </span><span class=keyword>struct </span><span class=identifier>result |
---|
260 | </span><span class=special>{ |
---|
261 | </span><span class=keyword>typedef typename </span><span class=identifier>match_result</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>, </span><span class=identifier>nil_t</span><span class=special>>::</span><span class=identifier>type type</span><span class=special>; |
---|
262 | }; |
---|
263 | |
---|
264 | </span><span class=identifier>DerivedT</span><span class=special>& </span><span class=identifier>derived</span><span class=special>(); |
---|
265 | </span><span class=identifier>DerivedT </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>derived</span><span class=special>() </span><span class=keyword>const</span><span class=special>; |
---|
266 | |
---|
267 | </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ActionT</span><span class=special>> |
---|
268 | </span><span class=identifier>action</span><span class=special><</span><span class=identifier>DerivedT</span><span class=special>, </span><span class=identifier>ActionT</span><span class=special>> |
---|
269 | </span><span class=keyword>operator</span><span class=special>[](</span><span class=identifier>ActionT </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>actor</span><span class=special>) </span><span class=keyword>const</span><span class=special>; |
---|
270 | };</span></pre> |
---|
271 | <table border="0"> |
---|
272 | <tr> |
---|
273 | <td width="10"></td> |
---|
274 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
---|
275 | <td width="30"><a href="semantic_actions.html"><img src="theme/l_arr.gif" border="0"></a></td> |
---|
276 | <td width="30"><a href="indepth_the_scanner.html"><img src="theme/r_arr.gif" border="0"></a></td> |
---|
277 | </tr> |
---|
278 | </table> |
---|
279 | <br> |
---|
280 | <hr size="1"> |
---|
281 | <p class="copyright">Copyright © 1998-2003 Joel de Guzman<br> |
---|
282 | <br> |
---|
283 | <font size="2">Use, modification and distribution is subject to the Boost Software |
---|
284 | License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at |
---|
285 | http://www.boost.org/LICENSE_1_0.txt) </font> </p> |
---|
286 | </body> |
---|
287 | </html> |
---|