1 | <html> |
---|
2 | <head> |
---|
3 | <title>The Lazy Parser</title> |
---|
4 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
---|
5 | <link rel="stylesheet" href="theme/style.css" type="text/css"> |
---|
6 | </head> |
---|
7 | |
---|
8 | <body> |
---|
9 | <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2"> |
---|
10 | <tr> |
---|
11 | <td width="10"> |
---|
12 | </td> |
---|
13 | <td width="85%"> <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>The |
---|
14 | Lazy Parser</b></font></td> |
---|
15 | <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td> |
---|
16 | </tr> |
---|
17 | </table> |
---|
18 | <br> |
---|
19 | <table border="0"> |
---|
20 | <tr> |
---|
21 | <td width="10"></td> |
---|
22 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
---|
23 | <td width="30"><a href="dynamic_parsers.html"><img src="theme/l_arr.gif" border="0"></a></td> |
---|
24 | <td width="30"><a href="select_parser.html"><img src="theme/r_arr.gif" border="0"></a></td> |
---|
25 | </tr> |
---|
26 | </table> |
---|
27 | <p>Closures are cool. They allow us to inject stack-based local variables anywhere |
---|
28 | in our parse descent hierarchy. Typically, we store temporary variables, generated |
---|
29 | by our semantic actions, in our closure variables, as a means to pass information |
---|
30 | up and down the recursive descent.</p> |
---|
31 | <p>Now imagine this... Having in mind that closure variables can be just about |
---|
32 | any type, we can store a parser, a rule, or a pointer to a parser or rule, in |
---|
33 | a closure variable. <em>Yeah, right, so what?...</em> Ok, hold on... What if |
---|
34 | we can use this closure variable to initiate a parse? Think about it for a second. |
---|
35 | Suddenly we'll have some powerful dynamic parsers! Suddenly we'll have a full |
---|
36 | round trip from <a href="../phoenix/index.html">Phoenix</a> to Spirit and |
---|
37 | back! <a href="../phoenix/index.html">Phoenix</a> semantic actions choose the |
---|
38 | right Spirit parser and Spirit parsers choose the right <a href="../phoenix/index.html">Phoenix</a> |
---|
39 | semantic action. Oh MAN, what a honky cool idea, I might say!!</p> |
---|
40 | <h2>lazy_p</h2> |
---|
41 | <p>This is the idea behind the <tt>lazy_p</tt> parser. The <tt>lazy_p</tt> syntax |
---|
42 | is:</p> |
---|
43 | <pre> lazy_p<span class="special">(</span>actor<span class="special">)</span></pre> |
---|
44 | <p>where actor is a <a href="../phoenix/index.html">Phoenix</a> expression that |
---|
45 | returns a Spirit parser. This returned parser is used in the parsing process. |
---|
46 | </p> |
---|
47 | <p>Example: </p> |
---|
48 | <pre> lazy_p<span class="special">(</span>phoenix<span class="special">::</span>val<span class="special">(</span>int_p<span class="special">))[</span>assign_a<span class="special">(</span>result<span class="special">)]</span> |
---|
49 | </pre> |
---|
50 | <p>Semantic actions attached to the <tt>lazy_p</tt> parser expect the same signature |
---|
51 | as that of the returned parser (<tt>int_p</tt>, in our example above).</p> |
---|
52 | <h2>lazy_p example</h2> |
---|
53 | <p>To give you a better glimpse (see the <tt><a href="../example/intermediate/lazy_parser.cpp">lazy_parser.cpp</a></tt>), |
---|
54 | say you want to parse inputs such as:</p> |
---|
55 | <pre> <span class=identifier>dec |
---|
56 | </span><span class="special">{</span><span class=identifier><br> 1 2 3<br> bin |
---|
57 | </span><span class="special">{</span><span class=identifier><br> 1 10 11<br> </span><span class="special">}</span><span class=identifier><br> 4 5 6<br> </span><span class="special">}</span></pre> |
---|
58 | <p>where <tt>bin {...}</tt> and <tt>dec {...}</tt> specify the numeric format |
---|
59 | (binary or decimal) that we are expecting to read. If we analyze the input, |
---|
60 | we want a grammar like:</p> |
---|
61 | <pre><code><font color="#000000"><span class=special> </span><span class=identifier>base </span><span class="special">=</span><span class=identifier> </span><span class="string">"bin"</span><span class=identifier> </span><span class="special">|</span><span class=identifier> </span><span class="string">"dec"</span><span class="special">;</span><span class=identifier> |
---|
62 | block </span><span class=special>= </span><span class="identifier">base</span><span class=special> >> </span><span class="literal">'{'</span><span class=special> >> *</span><span class="identifier">block_line</span><span class=special> >> </span><span class="literal">'}'</span><span class=special>; |
---|
63 | </span>block_line <span class=special>= </span><span class="identifier">number</span><span class=special> | </span><span class=identifier>block</span><span class=special>;</span></font></code></pre> |
---|
64 | <p>We intentionally left out the <code><font color="#000000"><span class="identifier"><tt>number</tt></span></font></code> |
---|
65 | rule. The tricky part is that the way the <tt>number</tt> rule behaves depends on |
---|
66 | the result of the <tt>base</tt> rule. If <tt>base</tt> got a <em>"bin"</em>, |
---|
67 | then number should parse binary numbers. If <tt>base</tt> got a <em>"dec"</em>, |
---|
68 | then number should parse decimal numbers. Typically we'll have to rewrite our |
---|
69 | grammar to accommodate the different parsing behavior:</p> |
---|
70 | <pre><code><font color="#000000"><span class=identifier> block </span><span class=special>= |
---|
71 | </span><span class=identifier>"bin"</span> <span class=special>>> </span><span class="literal">'{'</span><span class=special> >> *</span>bin_line<span class=special> >> </span><span class="literal">'}'</span><span class=special> |
---|
72 | | </span><span class=identifier>"dec"</span> <span class=special>>> </span><span class="literal">'{'</span><span class=special> >> *</span>dec_line<span class=special> >> </span><span class="literal">'}'</span><span class=special> |
---|
73 | ; |
---|
74 | </span>bin_line <span class=special>= </span><span class="identifier">bin_p</span><span class=special> | </span><span class=identifier>block</span><span class=special>; |
---|
75 | </span>dec_line <span class=special>= </span><span class="identifier">int_p</span><span class=special> | </span><span class=identifier>block</span><span class=special>;</span></font></code></pre> |
---|
76 | <p>While this is fine, the redundancy makes us want to find a better solution; |
---|
77 | after all, we'd want to make full use of Spirit's dynamic parsing capabilities. |
---|
78 | Apart from that, there will be cases where the set of parsing behaviors for |
---|
79 | our <tt>number</tt> rule is not known when the grammar is written. We'll only |
---|
80 | be given a map of string descriptors and corresponding rules [e.g. (("dec", |
---|
81 | int_p), ("bin", bin_p) ... etc...)].</p> |
---|
82 | <p>The basic idea is to have a rule for binary and decimal numbers. That's easy |
---|
83 | enough to do (see <a href="numerics.html">numerics</a>). When <tt>base</tt> |
---|
84 | is being parsed, in your semantic action, store a pointer to the rule for the selected base |
---|
85 | in a closure variable (e.g. <tt>block.int_rule</tt>). Here's an example:</p> |
---|
86 | <pre><code><font color="#000000"><span class=special> </span><span class=identifier>base |
---|
87 | </span><span class="special">=</span><span class=identifier> str_p</span><span class="special">(</span><span class="string">"bin"</span><span class="special">)[</span><span class=identifier>block.int_rule</span> = <span class="special">&</span>var<span class="special">(</span><span class="identifier">bin_rule</span><span class="special">)] |
---|
88 | | </span><span class=identifier>str_p</span><span class="special">(</span><span class="string">"dec"</span><span class="special">)[</span><span class=identifier>block.int_rule</span> = <span class="special">&</span>var<span class="special">(</span><span class="identifier">dec_rule</span><span class="special">)] |
---|
89 | ;</span></font></code></pre> |
---|
90 | <p>With this setup, your number rule will now look something like:</p> |
---|
91 | <pre><code><font color="#000000"><span class=special> </span><span class=identifier>number </span><span class="special">=</span><span class=identifier> lazy_p</span><span class="special">(*</span><span class=identifier>block.int_rule</span><span class="special">);</span></font></code></pre> |
---|
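<p>The pointer-in-a-closure idea can be modelled outside Spirit in plain C++. The sketch below is not Spirit code and does not use <tt>lazy_p</tt>; it is a hand-written recursive descent under stated assumptions. The <tt>NumberParser</tt> alias and the functions <tt>parse_radix</tt>, <tt>parse_base</tt> and <tt>parse_block</tt> are hypothetical names, and numbers are assumed to be unsigned. The local <tt>int_rule</tt> plays the role of <tt>block.int_rule</tt>: <tt>parse_base</tt> sets it, and the number step calls whatever it points to.</p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a Spirit rule: tries to read one number at `pos`,
// advancing `pos` and storing the value only on success.
using NumberParser = bool (*)(const std::string&, std::size_t&, long&);

void skip_space(const std::string& s, std::size_t& pos) {
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
}

// Reads a run of digits valid in `radix` (2..10). Unsigned only (assumption).
bool parse_radix(const std::string& s, std::size_t& pos, long& out, int radix) {
    std::size_t p = pos;
    long value = 0;
    while (p < s.size() && s[p] >= '0' && s[p] < '0' + radix) {
        value = value * radix + (s[p] - '0');
        ++p;
    }
    if (p == pos) return false;  // no digit consumed
    pos = p;
    out = value;
    return true;
}

bool bin_rule(const std::string& s, std::size_t& pos, long& out) { return parse_radix(s, pos, out, 2); }
bool dec_rule(const std::string& s, std::size_t& pos, long& out) { return parse_radix(s, pos, out, 10); }

// base = "bin" | "dec"; the "semantic action" stores a pointer to the chosen rule.
bool parse_base(const std::string& s, std::size_t& pos, NumberParser& int_rule) {
    skip_space(s, pos);
    if (s.compare(pos, 3, "bin") == 0) { int_rule = &bin_rule; pos += 3; return true; }
    if (s.compare(pos, 3, "dec") == 0) { int_rule = &dec_rule; pos += 3; return true; }
    return false;
}

// block = base >> '{' >> *(number | block) >> '}'
bool parse_block(const std::string& s, std::size_t& pos, std::vector<long>& out) {
    const std::size_t mark = out.size();  // for rollback if this block fails
    NumberParser int_rule = nullptr;      // local to this block, like a closure variable
    std::size_t p = pos;
    bool ok = parse_base(s, p, int_rule);
    if (ok) {
        skip_space(s, p);
        ok = p < s.size() && s[p] == '{';
    }
    if (ok) {
        ++p;
        assert(int_rule != nullptr);
        for (;;) {
            skip_space(s, p);
            long value = 0;
            // number: defer to whichever rule base selected for this block
            if (int_rule(s, p, value)) out.push_back(value);
            else if (!parse_block(s, p, out)) break;
        }
        ok = p < s.size() && s[p] == '}';
    }
    if (!ok) {
        out.resize(mark);  // discard values collected by the failed block
        return false;
    }
    pos = p + 1;
    return true;
}
```

<p>Calling <tt>parse_block</tt> on the sample input <tt>dec { 1 2 3 bin { 1 10 11 } 4 5 6 }</tt> collects the values 1 2 3 1 2 3 4 5 6, because the inner block reads its numbers as binary while the outer block keeps reading decimal.</p>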
92 | <p>The <tt><a href="../example/intermediate/lazy_parser.cpp">lazy_parser.cpp</a></tt> |
---|
93 | does it a bit differently, ingeniously using the <a href="symbols.html">symbol |
---|
94 | table</a> to dispatch the correct rule, but in essence, both strategies are |
---|
95 | similar. This technique, using the symbol table, is detailed in the Techniques section: <a href="techniques.html#nabialek_trick">nabialek_trick</a>. Admittedly, when you add up all the rules, the resulting grammar is |
---|
96 | more complex than the hard-coded grammar above. Yet, for more complex grammar |
---|
97 | patterns with a lot more rules to choose from, the additional setup is well |
---|
98 | worth it.</p> |
---|
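<p>When the set of number formats is only known at run time, the dispatch can be driven by a table keyed on the descriptor string. The following is a rough plain C++ model of that idea, not the code in <tt>lazy_parser.cpp</tt> and not Spirit's symbol table: <tt>std::map</tt> stands in for the table, and <tt>NumberParser</tt>, <tt>ParserTable</tt>, <tt>make_radix_parser</tt> and <tt>parse_tagged_number</tt> are hypothetical names chosen for illustration.</p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Hypothetical parser signature: read one number at `pos`, advance on success.
using NumberParser = std::function<bool(const std::string&, std::size_t&, long&)>;

// Table of descriptor -> parser, filled in at run time.
using ParserTable = std::map<std::string, NumberParser>;

// Builds a parser for unsigned digit runs in the given radix (2..10).
NumberParser make_radix_parser(int radix) {
    assert(radix >= 2 && radix <= 10);
    return [radix](const std::string& s, std::size_t& pos, long& out) {
        std::size_t p = pos;
        long value = 0;
        while (p < s.size() && s[p] >= '0' && s[p] < '0' + radix) {
            value = value * radix + (s[p] - '0');
            ++p;
        }
        if (p == pos) return false;  // no digit consumed
        pos = p;
        out = value;
        return true;
    };
}

// Parses "<descriptor> <number>": the descriptor is looked up in the table and
// the parser it maps to is then used for the rest of the input.
bool parse_tagged_number(const ParserTable& table, const std::string& s, long& out) {
    std::size_t pos = 0;
    while (pos < s.size() && std::isalpha(static_cast<unsigned char>(s[pos]))) ++pos;
    const ParserTable::const_iterator it = table.find(s.substr(0, pos));
    if (it == table.end()) return false;  // unknown descriptor
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
    return it->second(s, pos, out) && pos == s.size();
}
```

<p>Adding a new format is then a matter of adding a table entry, for example <tt>table["oct"] = make_radix_parser(8);</tt>, without touching the parsing function itself.</p>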
99 | <table border="0"> |
---|
100 | <tr> |
---|
101 | <td width="10"></td> |
---|
102 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
---|
103 | <td width="30"><a href="dynamic_parsers.html"><img src="theme/l_arr.gif" border="0"></a></td> |
---|
104 | <td width="30"><a href="select_parser.html"><img src="theme/r_arr.gif" border="0"></a></td> |
---|
105 | </tr> |
---|
106 | </table> |
---|
107 | <br> |
---|
108 | <hr size="1"> |
---|
109 | <p class="copyright">Copyright © 2003 Joel de Guzman<br> |
---|
110 | Copyright © 2003 Vaclav Vesely<br> |
---|
111 | <br> |
---|
112 | <font size="2">Use, modification and distribution is subject to the Boost Software |
---|
113 | License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at |
---|
114 | http://www.boost.org/LICENSE_1_0.txt)</font></p> |
---|
115 | <p class="copyright"> </p> |
---|
116 | </body> |
---|
117 | </html> |
---|