| 1 | <html> |
---|
| 2 | <head> |
---|
| 3 | <title>Techniques</title> |
---|
| 4 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
---|
| 5 | <link rel="stylesheet" href="theme/style.css" type="text/css"> |
---|
| 6 | </head> |
---|
| 7 | |
---|
| 8 | <body> |
---|
| 9 | <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2"> |
---|
| 10 | <tr> |
---|
| 11 | <td width="10"> |
---|
| 12 | </td> |
---|
| 13 | <td width="85%"> <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Techniques</b></font></td> |
---|
| 14 | <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td> |
---|
| 15 | </tr> |
---|
| 16 | </table> |
---|
| 17 | <br> |
---|
| 18 | <table border="0"> |
---|
| 19 | <tr> |
---|
| 20 | <td width="10"></td> |
---|
| 21 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
---|
| 22 | <td width="30"><a href="style_guide.html"><img src="theme/l_arr.gif" border="0"></a></td> |
---|
| 23 | <td width="30"><a href="faq.html"><img src="theme/r_arr.gif" border="0"></a></td> |
---|
| 24 | </tr> |
---|
| 25 | </table> |
---|
| 26 | <ul> |
---|
| 27 | <li><a href="#templatized_functors">Templatized Functors</a></li> |
---|
| 28 | <li><a href="#multiple_scanner_support">Rule With Multiple Scanners</a></li> |
---|
| 29 | <li><a href="#no_rules">Look Ma' No Rules!</a></li> |
---|
| 30 | <li><a href="#typeof">typeof</a></li> |
---|
| 31 | <li><a href="#nabialek_trick">Nabialek trick</a></li> |
---|
| 32 | </ul> |
---|
| 33 | <h3><a name="templatized_functors"></a> Templatized Functors</h3> |
---|
| 34 | <p>For the sake of genericity, it is often better to make the functor's member |
---|
| 35 | <tt>operator()</tt> a template. That way, we do not have to concern ourselves |
---|
| 36 | with the type of the argument to expect as long as the behavior is appropriate. |
---|
| 37 | For instance, rather than hard-coding <tt>char const*</tt> as the argument of |
---|
| 38 | a generic semantic action, it is better to make it a template member function. |
---|
| 39 | That way, it can accept any type of iterator:</p> |
---|
| 40 | <pre><code><font color="#000000"><span class=special> </span><span class=keyword>struct </span><span class=identifier>my_functor |
---|
| 41 | </span><span class=special>{ |
---|
| 42 | </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>IteratorT</span><span class=special>> |
---|
| 43 | </span><span class=keyword>void </span><span class=keyword>operator</span><span class=special>()(</span><span class=identifier>IteratorT </span><span class=identifier>first</span><span class=special>, </span><span class=identifier>IteratorT </span><span class=identifier>last</span><span class=special>) </span><span class=keyword>const</span><span class=special>; |
---|
| 44 | </span><span class=special>};</span></font></code></pre> |
---|
| 45 | <p>Take note that this is only possible with functors. It is not possible to pass |
---|
| 46 | a function template as a semantic action unless you cast it to the correct |
---|
| 47 | function signature, in which case you <em>monomorphize</em> the function. This |
---|
| 48 | is one respect in which functors are more flexible than plain functions.</p> |
---|
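<p>As a minimal, self-contained sketch of the point above (it does not use Spirit, and the names <tt>count_chars</tt>, <tt>count_fn</tt> and <tt>count_char_ptr</tt> are hypothetical), the functor below accepts any iterator type through its templated <tt>operator()</tt>, while the equivalent function template has to be cast to one concrete signature before it can be handed around as a plain function pointer:</p>

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical functor: counts the characters in every range it is called with.
struct count_chars
{
    explicit count_chars(std::size_t& total_) : total(total_) {}

    // Templated call operator: works with any iterator type.
    template <typename IteratorT>
    void operator()(IteratorT first, IteratorT last) const
    {
        total += static_cast<std::size_t>(std::distance(first, last));
    }

    std::size_t& total; // external counter, so copies of the functor share it
};

// The same idea written as a function template (hypothetical name).
template <typename IteratorT>
std::size_t count_fn(IteratorT first, IteratorT last)
{
    return static_cast<std::size_t>(std::distance(first, last));
}

// A function template has no single address. Casting it to one concrete
// signature selects (monomorphizes) exactly one instantiation.
typedef std::size_t (*char_ptr_fn)(char const*, char const*);
char_ptr_fn const count_char_ptr = static_cast<char_ptr_fn>(&count_fn);
```

<p>One <tt>count_chars</tt> object can be called with <tt>char const*</tt> and with <tt>std::list&lt;char&gt;::iterator</tt> alike; <tt>count_char_ptr</tt> only ever accepts <tt>char const*</tt>.</p>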
| 49 | <h3><b><a name="multiple_scanner_support" id="multiple_scanner_support"></a> Rule |
---|
| 50 | With Multiple Scanners</b></h3> |
---|
| 51 | <p>As of v1.8.0, rules can use one or more scanner types. There are cases, for |
---|
| 52 | instance, where we need a rule that can work on the phrase and character levels. |
---|
| 53 | Rule/scanner mismatch has been a source of confusion and is the no. 1 <a href="faq.html#scanner_business">FAQ</a>. |
---|
| 54 | To address this issue, we now have <a href="rule.html#multiple_scanner_support">multiple |
---|
| 55 | scanner support</a>. </p> |
---|
| 56 | <p>Here is an example of a grammar with a rule <tt>r</tt> that can be called with |
---|
| 57 | 3 types of scanners (phrase-level, lexeme, and lower-case). See the <a href="rule.html">rule</a>, |
---|
| 58 | <a href="grammar.html">grammar</a>, <a href="scanner.html#lexeme_scanner">lexeme_scanner</a> |
---|
| 59 | and <a href="scanner.html#as_lower_scanner">as_lower_scanner</a> for more information. |
---|
| 60 | </p> |
---|
| 61 | <p>Here's the grammar (see <a href="../example/techniques/multiple_scanners.cpp">multiple_scanners.cpp</a>): |
---|
| 62 | </p> |
---|
| 63 | <pre><span class=special> </span><span class=keyword>struct </span><span class=identifier>my_grammar </span><span class=special>: </span><span class=identifier>grammar</span><span class=special><</span><span class=identifier>my_grammar</span><span class=special>> |
---|
| 64 | </span><span class=special>{ |
---|
| 65 | </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> |
---|
| 66 | </span><span class=keyword>struct </span><span class=identifier>definition |
---|
| 67 | </span><span class=special>{ |
---|
| 68 | </span><span class=identifier>definition</span><span class=special>(</span><span class=identifier>my_grammar </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>self</span><span class=special>) |
---|
| 69 | </span><span class=special>{ |
---|
| 70 | </span><span class=identifier>r </span><span class=special>= </span><span class=identifier>lower_p</span><span class=special>; |
---|
| 71 | </span><span class=identifier>rr </span><span class=special>= </span><span class=special>+(</span><span class=identifier>lexeme_d</span><span class=special>[</span><span class=identifier>r</span><span class=special>] </span><span class=special>>> </span><span class=identifier>as_lower_d</span><span class=special>[</span><span class=identifier>r</span><span class=special>] </span><span class=special>>> </span><span class=identifier>r</span><span class=special>); |
---|
| 72 | </span><span class=special>} |
---|
| 73 | |
---|
| 74 | </span><span class=keyword>typedef </span><span class=identifier>scanner_list</span><span class=special>< |
---|
| 75 | </span><span class=identifier>ScannerT |
---|
| 76 | </span><span class=special>, </span><span class=keyword>typename </span><span class=identifier>lexeme_scanner</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>>::</span><span class=identifier>type |
---|
| 77 | </span><span class=special>, </span><span class=keyword>typename </span><span class=identifier>as_lower_scanner</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>>::</span><span class=identifier>type |
---|
| 78 | </span><span class=special>> </span><span class=identifier>scanners</span><span class=special>; |
---|
| 79 | |
---|
| 80 | </span><span class=identifier>rule</span><span class=special><</span><span class=identifier>scanners</span><span class=special>> </span><span class=identifier>r</span><span class=special>; |
---|
| 81 | </span><span class=identifier>rule</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>> </span><span class=identifier>rr</span><span class=special>; |
---|
| 82 | </span><span class=identifier>rule</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>> </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>start</span><span class=special>() </span><span class=keyword>const </span><span class=special>{ </span><span class=keyword>return </span><span class=identifier>rr</span><span class=special>; </span><span class=special>} |
---|
| 83 | </span><span class=special>}; |
---|
| 84 | </span><span class=special>};</span></pre> |
---|
| 85 | <p>By default, support for multiple scanners is disabled. The macro |
---|
| 86 | <tt>BOOST_SPIRIT_RULE_SCANNERTYPE_LIMIT</tt> must be defined to the |
---|
| 87 | maximum number of scanners allowed in a scanner_list. The value must |
---|
| 88 | be greater than 1 to enable multiple scanners. Given the |
---|
| 89 | example above, to define a limit of three scanners for the list, the |
---|
| 90 | following line must be inserted into the source file before the |
---|
| 91 | inclusion of Spirit headers: |
---|
| 92 | </p> |
---|
| 93 | <pre><span class=special> </span><span class=preprocessor>#define </span><span class=identifier>BOOST_SPIRIT_RULE_SCANNERTYPE_LIMIT</span> <span class=literal>3</span></pre> |
---|
| 94 | <h3><span class=special></span><b> <a name="no_rules" id="no_rules"></a> Look |
---|
| 95 | Ma' No Rules</b></h3> |
---|
| 96 | <p>You use grammars and you use lots of 'em? Want a fly-weight, no-cholesterol, |
---|
| 97 | super-optimized grammar? Read on...</p> |
---|
| 98 | <p>I have a love-hate relationship with rules. I guess you know the reasons why. |
---|
| 99 | A lot of problems stem from the limitation of rules. Dynamic polymorphism and |
---|
| 100 | static polymorphism in C++ do not mix well. There is no notion of virtual template |
---|
| 101 | functions in C++; at least not just yet. Thus, the <strong>rule is tied to a |
---|
| 102 | specific scanner type</strong>. This results in problems such as the <a href="faq.html#scanner_business">scanner |
---|
| 103 | business</a>, our no. 1 FAQ. Apart from that, the virtual functions in rules |
---|
| 104 | slow down parsing, kill all meta-information, and kill inlining, hence bloating |
---|
| 105 | the generated code, especially for very tiny rules such as:</p> |
---|
| 106 | <pre> r <span class="special">=</span> ch_p<span class="special">(</span><span class="quotes">'x'</span><span class="special">) >></span> uint_p<span class="special">;</span></pre> |
---|
| 107 | <p> The rule's limitation is the main reason why the grammar is designed the way |
---|
| 108 | it is now, with a nested template definition class. The rule's limitation is |
---|
| 109 | also the reason why subrules exist. But do we really need rules? Of course! |
---|
| 110 | Until C++ adopts some sort of auto-type deduction, such as that proposed by |
---|
| 111 | David Abrahams in clc++m:</p> |
---|
| 112 | <pre> |
---|
| 113 | <code><span class=keyword>auto </span><span class=identifier>r </span><span class=special>= ...</span><span class=identifier>definition </span><span class=special>...</span></code></pre> |
---|
| 114 | <p> we are tied to the rule as an RHS placeholder. However, on some occasions |
---|
| 115 | we can get by without rules! For instance, rather than writing:</p> |
---|
| 116 | <pre> |
---|
| 117 | <code><span class=identifier>rule</span><span class=special><> </span><span class=identifier>x </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'x'</span><span class=special>);</span></code></pre> |
---|
| 118 | <p> It's better to write:</p> |
---|
| 119 | <pre> |
---|
| 120 | <code><span class=identifier>chlit</span><span class=special><> </span><span class=identifier>x </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'x'</span><span class=special>);</span></code></pre> |
---|
| 121 | <p> That's trivial. But what if the rule is rather complicated? Ok, let's proceed |
---|
| 122 | stepwise... I'll investigate a simple skip_parser based on the C grammar from |
---|
| 123 | Hartmut Kaiser. Basically, the grammar is written as (see <a href="../example/techniques/no_rules/no_rule1.cpp">no_rule1.cpp</a>):</p> |
---|
| 124 | <pre><code> <span class=keyword>struct </span><span class=identifier>skip_grammar </span><span class=special>: </span><span class=identifier>grammar</span><span class=special><</span><span class=identifier>skip_grammar</span><span class=special>> |
---|
| 125 | { |
---|
| 126 | </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> |
---|
| 127 | </span><span class=keyword>struct </span><span class=identifier>definition |
---|
| 128 | </span><span class=special>{ |
---|
| 129 | </span><span class=identifier>definition</span><span class=special>(</span><span class=identifier>skip_grammar </span><span class=keyword>const</span><span class=special>& /*</span><span class=identifier>self</span><span class=special>*/) |
---|
| 130 | { |
---|
| 131 | </span><span class=identifier>skip |
---|
| 132 | </span><span class=special>= </span><span class=identifier>space_p |
---|
| 133 | </span><span class=special>| </span><span class=string>"//" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=literal>'\n'</span><span class=special>) >> </span><span class=literal>'\n' |
---|
| 134 | </span><span class=special>| </span><span class=string>"/*" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=string>"*/"</span><span class=special>) >> </span><span class=string>"*/" |
---|
| 135 | </span><span class=special>; |
---|
| 136 | } |
---|
| 137 | |
---|
| 138 | </span><span class=identifier>rule</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>> </span><span class=identifier>skip</span><span class=special>; |
---|
| 139 | |
---|
| 140 | </span><span class=identifier>rule</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>> </span><span class=keyword>const</span><span class=special>& |
---|
| 141 | </span><span class=identifier>start</span><span class=special>() </span><span class=keyword>const </span><span class=special>{ </span><span class=keyword>return </span><span class=identifier>skip</span><span class=special>; } |
---|
| 142 | }; |
---|
| 143 | };</span></code></pre> |
---|
| 144 | <p> Ok, so far so good. Can we do better? Well... since there are no recursive |
---|
| 145 | rules there (in fact there's only one rule), you can expand the type of rule's |
---|
| 146 | RHS as the rule type (see <a href="../example/techniques/no_rules/no_rule2.cpp">no_rule2.cpp</a>):</p> |
---|
| 147 | <pre><code><span class=special> </span><span class=keyword>struct </span><span class=identifier>skip_grammar </span><span class=special>: </span><span class=identifier>grammar</span><span class=special><</span><span class=identifier>skip_grammar</span><span class=special>> |
---|
| 148 | { |
---|
| 149 | </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> |
---|
| 150 | </span><span class=keyword>struct </span><span class=identifier>definition |
---|
| 151 | </span><span class=special>{ |
---|
| 152 | </span> <span class=identifier>definition</span><span class=special>(</span><span class=identifier>skip_grammar </span><span class=keyword>const</span><span class=special>& /*</span><span class=identifier>self</span><span class=special>*/) |
---|
| 153 | : </span><span class=identifier>skip</span><span class=special> |
---|
| 154 | ( </span><span class=identifier>space_p |
---|
| 155 | </span><span class=special>| </span><span class=string>"//" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=literal>'\n'</span><span class=special>) >> </span><span class=literal>'\n' |
---|
| 156 | </span><span class=special>| </span><span class=string>"/*" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=string>"*/"</span><span class=special>) >> </span><span class=string>"*/" |
---|
| 157 | </span><span class=special>) |
---|
| 158 | { |
---|
| 159 | } |
---|
| 160 | |
---|
| 161 | </span><span class=keyword>typedef |
---|
| 162 | </span><span class=identifier>alternative</span><span class=special><</span><span class=identifier>alternative</span><span class=special><</span><span class=identifier>space_parser</span><span class=special>, </span><span class=identifier>sequence</span><span class=special><</span><span class=identifier>sequence</span><span class=special>< |
---|
| 163 | </span><span class=identifier>strlit</span><span class=special><</span><span class=keyword>const </span><span class=keyword>char</span><span class=special>*>, </span><span class=identifier>kleene_star</span><span class=special><</span><span class=identifier>difference</span><span class=special><</span><span class=identifier>anychar_parser</span><span class=special>, |
---|
| 164 | </span><span class=identifier>chlit</span><span class=special><</span><span class=keyword>char</span><span class=special>> > > >, </span><span class=identifier>chlit</span><span class=special><</span><span class=keyword>char</span><span class=special>> > >, </span><span class=identifier>sequence</span><span class=special><</span><span class=identifier>sequence</span><span class=special>< |
---|
| 165 | </span><span class=identifier>strlit</span><span class=special><</span><span class=keyword>const </span><span class=keyword>char</span><span class=special>*>, </span><span class=identifier>kleene_star</span><span class=special><</span><span class=identifier>difference</span><span class=special><</span><span class=identifier>anychar_parser</span><span class=special>, |
---|
| 166 | </span><span class=identifier>strlit</span><span class=special><</span><span class=keyword>const </span><span class=keyword>char</span><span class=special>*> > > >, </span><span class=identifier>strlit</span><span class=special><</span><span class=keyword>const </span><span class=keyword>char</span><span class=special>*> > > |
---|
| 167 | </span><span class=identifier>skip_t</span><span class=special>; |
---|
| 168 | </span><span class=special> </span><span class=identifier>skip_t </span><span class=identifier>skip</span><span class=special>; |
---|
| 169 | |
---|
| 170 | </span><span class=identifier>skip_t </span><span class=keyword>const</span><span class=special>& |
---|
| 171 | </span><span class=identifier>start</span><span class=special>() </span><span class=keyword>const </span><span class=special>{ </span><span class=keyword>return </span><span class=identifier>skip</span><span class=special>; } |
---|
| 172 | }; |
---|
| 173 | };</span></code></pre> |
---|
| 174 | <p> Ughhh! How did I do that? How was I able to get at the complex typedef? Am |
---|
| 175 | I insane? Well, not really... there's a trick! What you do is define the typedef |
---|
| 176 | <tt>skip_t</tt> first as int:</p> |
---|
| 177 | <pre> |
---|
| 178 | <code><span class=keyword>typedef </span><span class=keyword>int </span><span class=identifier>skip_t</span><span class=special>;</span></code></pre> |
---|
| 179 | <p> Try to compile. Then, the compiler will generate an obnoxious error message |
---|
| 180 | such as:</p> |
---|
| 181 | <pre> |
---|
| 182 | <code><span class=string>"cannot convert boost::spirit::alternative<... blah blah...to int"</span><span class=special>.</span></code></pre> |
---|
| 183 | <p> <strong>THERE YOU GO!</strong> You got its type! I just copy and paste the |
---|
| 184 | correct type (removing explicit qualifications, if preferred).</p> |
---|
| 185 | <p> Can we still go further? Yes. Remember that the grammar was designed for rules. |
---|
| 186 | The nested template definition class is needed to get around the rule's limitations. |
---|
| 187 | Without rules, I propose a new class called <tt>sub_grammar</tt>, the grammar's |
---|
| 188 | low-fat counterpart:</p> |
---|
| 189 | <pre><code><span class=special> </span><span class=keyword>namespace </span><span class=identifier>boost </span><span class=special>{ </span><span class=keyword>namespace </span><span class=identifier>spirit |
---|
| 190 | </span><span class=special>{ |
---|
| 191 | </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>DerivedT</span><span class=special>> |
---|
| 192 | </span><span class=keyword>struct </span><span class=identifier>sub_grammar </span><span class=special>: </span><span class=identifier>parser</span><span class=special><</span><span class=identifier>DerivedT</span><span class=special>> |
---|
| 193 | { |
---|
| 194 | </span><span class=keyword>typedef </span><span class=identifier>sub_grammar </span><span class=identifier>self_t</span><span class=special>; |
---|
| 195 | </span><span class=keyword>typedef </span><span class=identifier>DerivedT </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>embed_t</span><span class=special>; |
---|
| 196 | |
---|
| 197 | </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> |
---|
| 198 | </span><span class=keyword>struct </span><span class=identifier>result |
---|
| 199 | </span><span class=special>{ |
---|
| 200 | </span><span class=keyword>typedef </span><span class=keyword>typename </span><span class=identifier>parser_result</span><span class=special>< |
---|
| 201 | </span><span class=keyword>typename </span><span class=identifier>DerivedT</span><span class=special>::</span><span class=identifier>start_t</span><span class=special>, </span><span class=identifier>ScannerT</span><span class=special>>::</span><span class=identifier>type |
---|
| 202 | </span><span class=identifier>type</span><span class=special>; |
---|
| 203 | }; |
---|
| 204 | |
---|
| 205 | </span><span class=identifier>DerivedT </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>derived</span><span class=special>() </span><span class=keyword>const |
---|
| 206 | </span><span class=special>{ </span><span class=keyword>return </span><span class=special>*</span><span class=keyword>static_cast</span><span class=special><</span><span class=identifier>DerivedT </span><span class=keyword>const</span><span class=special>*>(</span><span class=keyword>this</span><span class=special>); } |
---|
| 207 | |
---|
| 208 | </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> |
---|
| 209 | </span><span class=keyword>typename </span><span class=identifier>parser_result</span><span class=special><</span><span class=identifier>self_t</span><span class=special>, </span><span class=identifier>ScannerT</span><span class=special>>::</span><span class=identifier>type |
---|
| 210 | </span><span class=identifier>parse</span><span class=special>(</span><span class=identifier>ScannerT </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>scan</span><span class=special>) </span><span class=keyword>const |
---|
| 211 | </span><span class=special>{ |
---|
| 212 | </span><span class=keyword>return </span><span class=identifier>derived</span><span class=special>().</span><span class=identifier>start</span><span class=special>.</span><span class=identifier>parse</span><span class=special>(</span><span class=identifier>scan</span><span class=special>); |
---|
| 213 | } |
---|
| 214 | }; |
---|
| 215 | }}</span></code></pre> |
---|
| 216 | <p>With the <tt>sub_grammar</tt> class, we can define our skipper grammar this |
---|
| 217 | way (see <a href="../example/techniques/no_rules/no_rule3.cpp">no_rule3.cpp</a>):</p> |
---|
| 218 | <pre><code><span class=special> </span><span class=keyword>struct </span><span class=identifier>skip_grammar </span><span class=special>: </span><span class=identifier>sub_grammar</span><span class=special><</span><span class=identifier>skip_grammar</span><span class=special>> |
---|
| 219 | { |
---|
| 220 | </span><span class=keyword>typedef |
---|
| 221 | </span><span class=identifier>alternative</span><span class=special><</span><span class=identifier>alternative</span><span class=special><</span><span class=identifier>space_parser</span><span class=special>, </span><span class=identifier>sequence</span><span class=special><</span><span class=identifier>sequence</span><span class=special>< |
---|
| 222 | </span><span class=identifier>strlit</span><span class=special><</span><span class=keyword>const </span><span class=keyword>char</span><span class=special>*>, </span><span class=identifier>kleene_star</span><span class=special><</span><span class=identifier>difference</span><span class=special><</span><span class=identifier>anychar_parser</span><span class=special>, |
---|
| 223 | </span><span class=identifier>chlit</span><span class=special><</span><span class=keyword>char</span><span class=special>> > > >, </span><span class=identifier>chlit</span><span class=special><</span><span class=keyword>char</span><span class=special>> > >, </span><span class=identifier>sequence</span><span class=special><</span><span class=identifier>sequence</span><span class=special>< |
---|
| 224 | </span><span class=identifier>strlit</span><span class=special><</span><span class=keyword>const </span><span class=keyword>char</span><span class=special>*>, </span><span class=identifier>kleene_star</span><span class=special><</span><span class=identifier>difference</span><span class=special><</span><span class=identifier>anychar_parser</span><span class=special>, |
---|
| 225 | </span><span class=identifier>strlit</span><span class=special><</span><span class=keyword>const </span><span class=keyword>char</span><span class=special>*> > > >, </span><span class=identifier>strlit</span><span class=special><</span><span class=keyword>const </span><span class=keyword>char</span><span class=special>*> > > |
---|
| 226 | </span><span class=identifier>start_t</span><span class=special>; |
---|
| 227 | |
---|
| 228 | </span><span class=identifier>skip_grammar</span><span class=special>() |
---|
| 229 | : </span><span class=identifier>start |
---|
| 230 | </span><span class=special>( |
---|
| 231 | </span><span class=identifier>space_p |
---|
| 232 | </span><span class=special>| </span><span class=string>"//" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=literal>'\n'</span><span class=special>) >> </span><span class=literal>'\n' |
---|
| 233 | </span><span class=special>| </span><span class=string>"/*" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=string>"*/"</span><span class=special>) >> </span><span class=string>"*/" |
---|
| 234 | </span><span class=special>) |
---|
| 235 | {} |
---|
| 236 | |
---|
| 237 | </span><span class=identifier>start_t </span><span class=identifier>start</span><span class=special>; |
---|
| 238 | };</span></code></pre> |
---|
| 239 | <p>But what for, you ask? You can simply use the <tt>start_t</tt> type above as-is. |
---|
| 240 | It's already a parser! We can just type:</p> |
---|
| 241 | <pre> |
---|
| 242 | <code><span class=identifier>skipper_t </span><span class=identifier>skipper </span><span class=special>= |
---|
| 243 | </span><span class=identifier>space_p |
---|
| 244 | </span><span class=special>| </span><span class=string>"//" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=literal>'\n'</span><span class=special>) >> </span><span class=literal>'\n' </span><br> <span class=special>| </span><span class=string>"/*" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=string>"*/"</span><span class=special>) >> </span><span class=string>"*/"</span> |
---|
| 245 | <span class=special> ;</span></code></pre> |
---|
| 246 | <p> and use <tt>skipper</tt> just as we would any parser? Well, a subtle difference |
---|
| 247 | is that <tt>skipper</tt>, used this way, will be embedded <strong>by value</strong> when |
---|
| 248 | you compose more complex parsers using it. That is, if we use <tt>skipper</tt> |
---|
| 249 | inside another production, the whole thing will be stored in the composite. |
---|
| 250 | Heavy!</p> |
---|
| 251 | <p> The proposed <tt>sub_grammar</tt>, on the other hand, will be held by reference. Note:</p> |
---|
| 252 | <pre><code> <span class=keyword>typedef </span><span class=identifier>DerivedT </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>embed_t</span><span class=special>;</span></code></pre> |
---|
| 253 | <p>The proposed <tt>sub_grammar</tt> does not have the inherent limitations of |
---|
| 254 | rules, is very lightweight, and should be blazingly fast (can be fully inlined |
---|
| 255 | and does not use virtual functions). Perhaps this class will be part of a future |
---|
| 256 | Spirit release. </p> |
---|
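<p>The difference between the two embedding policies can be sketched without Spirit. In the toy code below, every name (<tt>value_parser</tt>, <tt>reference_parser</tt>, <tt>toy_sequence</tt>) is a hypothetical stand-in rather than a Spirit class; only the <tt>embed_t</tt> typedef mirrors the convention shown above. The composite stores each operand as that operand's <tt>embed_t</tt>, so a by-value operand is copied into the composite while a by-reference operand costs only a reference:</p>

```cpp
// Hypothetical stand-in for a parser that composites copy.
struct value_parser
{
    typedef value_parser embed_t;            // stored by value
    value_parser() : state() {}
    char state[256];                         // pretend this is a large expression tree
};

// Hypothetical stand-in for a sub_grammar-like parser.
struct reference_parser
{
    typedef reference_parser const& embed_t; // stored by reference
    reference_parser() : state() {}
    char state[256];
};

// Toy binary composite: holds each operand as that operand's embed_t.
template <typename A, typename B>
struct toy_sequence
{
    toy_sequence(A const& a, B const& b) : left(a), right(b) {}
    typename A::embed_t left;
    typename B::embed_t right;
};
```

<p>A <tt>toy_sequence</tt> of two <tt>value_parser</tt> objects is at least twice the size of one of them, whereas a <tt>toy_sequence</tt> of two <tt>reference_parser</tt> objects holds only two references. The usual caveat of reference embedding applies: the referenced object must outlive every composite that refers to it.</p>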
| 257 | <table width="80%" border="0" align="center"> |
---|
| 258 | <tr> |
---|
| 259 | <td class="note_box"><img src="theme/note.gif" width="16" height="16"> <strong>The |
---|
| 260 | no-rules result</strong><br> <br> |
---|
| 261 | So, how much did we save? On MSVC 7.1, the original code: <a href="../example/techniques/no_rules/no_rule1.cpp">no_rule1.cpp</a> |
---|
| 262 | compiles to <strong>28k</strong>. Eliding rules, <a href="../example/techniques/no_rules/no_rule2.cpp">no_rule2.cpp</a>, |
---|
| 263 | we got <strong>24k</strong>. Not bad, we shaved off 4k amounting to a 14% |
---|
| 264 | reduction. But you'll be in for a surprise. The last version, using the |
---|
| 265 | sub-grammar: <a href="../example/techniques/no_rules/no_rule3.cpp">no_rule3.cpp</a>, |
---|
| 266 | compiles to <strong>5.5k</strong>! That's a whopping 80% reduction.<br> |
---|
| 267 | <br> |
---|
| 268 | <table width="100%" border="1"> |
---|
| 269 | <tr> |
---|
| 270 | <td><a href="../example/techniques/no_rules/no_rule1.cpp">no_rule1.cpp</a></td> |
---|
| 271 | <td><strong>28k</strong></td> |
---|
| 272 | <td>standard rule and grammar</td> |
---|
| 273 | </tr> |
---|
| 274 | <tr> |
---|
| 275 | <td><a href="../example/techniques/no_rules/no_rule2.cpp">no_rule2.cpp</a></td> |
---|
| 276 | <td><strong>24k</strong></td> |
---|
| 277 | <td>standard grammar, no rule</td> |
---|
| 278 | </tr> |
---|
| 279 | <tr> |
---|
| 280 | <td><a href="../example/techniques/no_rules/no_rule3.cpp">no_rule3.cpp</a></td> |
---|
| 281 | <td><strong>5.5k</strong></td> |
---|
| 282 | <td>sub_grammar, no rule, no grammar</td> |
---|
| 283 | </tr> |
---|
| 284 | </table> </td> |
---|
| 285 | </tr> |
---|
| 286 | </table> |
---|
| 287 | <h3><b> <a name="typeof" id="typeof"></a> typeof</b></h3> |
---|
| 288 | <p>Some compilers already support the <tt>typeof</tt> keyword. Examples are g++ |
---|
| 289 | and Metrowerks CodeWarrior. Someday, <tt>typeof</tt> will become commonplace. |
---|
| 290 | It is worth noting that we can use <tt>typeof</tt> to define non-recursive rules |
---|
| 291 | without using the rule class. To give an example, we'll use the skipper example |
---|
| 292 | above; this time using <tt>typeof</tt>. First, to avoid redundancy, we'll introduce |
---|
| 293 | a macro <tt>RULE</tt>: </p> |
---|
| 294 | <pre><code> <span class=preprocessor>#define </span><span class=identifier>RULE</span><span class=special>(</span><span class=identifier>name</span><span class=special>, </span><span class=identifier>definition</span><span class=special>) </span><span class="keyword">typeof</span><span class=special>(</span><span class=identifier>definition</span><span class=special>) </span><span class=identifier>name </span><span class=special>= </span><span class=identifier>definition</span></code></pre> |
---|
| 295 | <p>Then, simply:</p> |
---|
| 296 | <pre><code><span class=identifier> </span><span class=identifier>RULE</span><span class=special>( |
---|
| 297 | </span><span class=identifier>skipper</span><span class=special>, |
---|
| 298 | ( </span><span class=identifier>space_p |
---|
| 299 | </span><span class=special>| </span><span class=string>"//" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=literal>'\n'</span><span class=special>) >> </span><span class=literal>'\n' |
---|
| 300 | </span><span class=special>| </span><span class=string>"/*" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=string>"*/"</span><span class=special>) >> </span><span class=string>"*/" |
---|
| 301 | </span><span class=special>) |
---|
| 302 | );</span></code></pre> |
---|
| 303 | <p>(see <a href="../example/techniques/typeof.cpp">typeof.cpp</a>)</p> |
---|
| 304 | <p>That's it! Now you can use skipper just as you would any parser. Be reminded, |
---|
| 305 | however, that <tt>skipper</tt> above will be embedded by value when |
---|
| 306 | you compose more complex parsers using it (see the <tt>sub_grammar</tt> rationale above). You can use the <tt>sub_grammar</tt> class to avoid this problem.</p> |
---|
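<p>As a rough illustration of the same idea in standard C++, the sketch below uses <tt>decltype</tt> (standardized in C++11) in place of <tt>typeof</tt>. The <tt>toy</tt> namespace, <tt>ch_parser</tt>, <tt>alternative</tt> and <tt>match_ab</tt> are hypothetical stand-ins invented for this example; they are not part of Spirit. The point is only that the macro deduces the full expression-template type, and that the operands end up stored by value:</p>

```cpp
#include <cassert>
#include <type_traits>

namespace toy {
    // Hypothetical primitive: matches one specific character.
    // Returns the match length, or -1 on failure.
    struct ch_parser {
        char c;
        int parse(const char* s) const { return *s == c ? 1 : -1; }
    };

    // a | b: try a first, then b. Both operands are embedded by value.
    template <typename A, typename B>
    struct alternative {
        A a;
        B b;
        int parse(const char* s) const {
            int n = a.parse(s);
            return n >= 0 ? n : b.parse(s);
        }
    };

    template <typename A, typename B>
    alternative<A, B> operator|(const A& a, const B& b) {
        return alternative<A, B>{a, b};
    }
}

// Same shape as the RULE macro above, with decltype standing in for typeof.
#define RULE(name, definition) decltype(definition) name = definition

// Builds a non-recursive "rule" without any rule class and uses it.
inline int match_ab(const char* s) {
    RULE(ab, (toy::ch_parser{'a'} | toy::ch_parser{'b'}));

    // The deduced type is the complete composite, not a type-erased rule.
    static_assert(
        std::is_same<decltype(ab),
                     toy::alternative<toy::ch_parser, toy::ch_parser>>::value,
        "RULE deduces the full composite type");

    return ab.parse(s);
}
```

<p>Because <tt>ab</tt> holds copies of both operands, composing it into a larger parser copies it again, which is the by-value embedding mentioned above.</p>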
| 307 | <h3><a name="nabialek_trick"></a> Nabialek trick</h3> |
---|
| 308 | <p>This technique, which I'll call the <strong><em>"Nabialek trick"</em></strong> (after its inventor, Sam Nabialek), can improve rule dispatch from linear and non-deterministic to deterministic. The trick applies to grammars where a keyword (an operator, etc.) precedes a production. Many grammars look similar to this:</p> |
---|
| 309 | <pre> <span class=identifier>r </span><span class=special>= |
---|
| 310 | </span><span class=identifier>keyword1 </span><span class=special>>> </span><span class=identifier>production1 |
---|
| 311 | </span><span class=special>| </span><span class=identifier>keyword2 </span><span class=special>>> </span><span class=identifier>production2 |
---|
| 312 | </span><span class=special>| </span><span class=identifier>keyword3 </span><span class=special>>> </span><span class=identifier>production3 |
---|
| 313 | </span><span class=special>| </span><span class=identifier>keyword4 </span><span class=special>>> </span><span class=identifier>production4 |
---|
| 314 | </span><span class=special>| </span><span class=identifier>keyword5 </span><span class=special>>> </span><span class=identifier>production5 |
---|
| 315 | </span><span class=comment>/*** etc ***/ |
---|
| 316 | </span><span class=special>;</span></pre> |
---|
| 317 | <p>The cascaded alternatives are tried one at a time through trial and error until something matches. The Nabialek trick takes advantage of the <a href="symbols.html">symbol table</a>'s search properties to optimize the dispatching of the alternatives. For an example, see <a href="../example/techniques/nabialek.cpp">nabialek.cpp</a>. The grammar works as follows. There are two rules (<tt>one</tt> and <tt>two</tt>). When "one" is recognized, rule <tt>one</tt> is invoked. When "two" is recognized, rule <tt>two</tt> is invoked. Here's the grammar:</p> |
---|
| 318 | <pre><span class=special> </span><span class=identifier>one </span><span class=special>= </span><span class=identifier>name</span><span class=special>; |
---|
| 319 | </span><span class=identifier>two </span><span class=special>= </span><span class=identifier>name </span><span class=special>>> </span><span class=literal>',' </span><span class=special>>> </span><span class=identifier>name</span><span class=special>; |
---|
| 320 | |
---|
| 321 | </span><span class=identifier>continuations</span><span class=special>.</span><span class=identifier>add |
---|
| 322 | </span><span class=special>(</span><span class=string>"one"</span><span class=special>, &</span><span class=identifier>one</span><span class=special>) |
---|
| 323 | </span><span class=special>(</span><span class=string>"two"</span><span class=special>, &</span><span class=identifier>two</span><span class=special>) |
---|
| 324 | </span><span class=special>; |
---|
| 325 | |
---|
| 326 | </span><span class=identifier>line </span><span class=special>= </span><span class=identifier>continuations</span><span class=special>[</span><span class=identifier>set_rest</span><span class=special><</span><span class=identifier>rule_t</span><span class=special>>(</span><span class=identifier>rest</span><span class=special>)] </span><span class=special>>> </span><span class=identifier>rest</span><span class=special>;</span></pre> |
---|
| 327 | <p>where <tt>continuations</tt> is a <a href="symbols.html">symbol table</a> whose slots hold pointers to <tt>rule_t</tt>, and <tt>one</tt>, <tt>two</tt>, <tt>name</tt>, <tt>line</tt> and <tt>rest</tt> are rules:</p> |
---|
| 328 | <pre><span class=special> </span><span class=identifier>rule_t </span><span class=identifier>name</span><span class=special>; |
---|
| 329 | </span><span class=identifier>rule_t </span><span class=identifier>line</span><span class=special>; |
---|
| 330 | </span><span class=identifier>rule_t </span><span class=identifier>rest</span><span class=special>; |
---|
| 331 | </span><span class=identifier>rule_t </span><span class=identifier>one</span><span class=special>; |
---|
| 332 | </span><span class=identifier>rule_t </span><span class=identifier>two</span><span class=special>; |
---|
| 333 | |
---|
| 334 | </span><span class=identifier>symbols</span><span class=special><</span><span class=identifier>rule_t</span><span class=special>*> </span><span class=identifier>continuations</span><span class=special>;</span></pre> |
---|
| 335 | <p><tt>set_rest</tt>, the semantic action attached to <tt>continuations</tt>, is:</p> |
---|
| 336 | <pre><span class=special> </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>Rule</span><span class=special>> |
---|
| 337 | </span><span class=keyword>struct </span><span class=identifier>set_rest |
---|
| 338 | </span><span class=special>{ |
---|
| 339 | </span><span class=identifier>set_rest</span><span class=special>(</span><span class=identifier>Rule</span><span class=special>& </span><span class=identifier>the_rule</span><span class=special>) |
---|
| 340 | </span><span class=special>: </span><span class=identifier>the_rule</span><span class=special>(</span><span class=identifier>the_rule</span><span class=special>) </span><span class=special>{} |
---|
| 341 | |
---|
| 342 | </span><span class=keyword>void </span><span class=keyword>operator</span><span class=special>()(</span><span class=identifier>Rule</span><span class=special>* </span><span class=identifier>newRule</span><span class=special>) </span><span class=keyword>const |
---|
| 343 | </span><span class=special>{ </span><span class=identifier>the_rule </span><span class=special>= </span><span class=special>*</span><span class=identifier>newRule</span><span class=special>; </span><span class=special>} |
---|
| 344 | |
---|
| 345 | </span><span class=identifier>Rule</span><span class=special>& </span><span class=identifier>the_rule</span><span class=special>; |
---|
| 346 | </span><span class=special>};</span></pre> |
---|
| 347 | <p>Notice how the <tt>rest</tt> rule gets set dynamically when the <tt>set_rest</tt> action is called. The dynamic grammar parses inputs such as:</p> |
---|
| 348 | <p> "one only"<br> |
---|
| 349 | "one again"<br> |
---|
| 350 | "two first, second"</p> |
---|
| 351 | <p>The cool part is that the <tt>rest</tt> rule is set (by the <tt>set_rest</tt> action) depending on which symbol the symbol table matched. If it matched <em>"one"</em>, then <tt>rest = one</tt>. If it matched <em>"two"</em>, then <tt>rest = two</tt>. Very nifty! This technique should be very fast, especially when there are lots of keywords. It would be nice to add special facilities to make this easy to use. I imagine:</p> |
---|
| 352 | <pre><span class=special> </span><span class=identifier>r </span><span class=special>= </span><span class=identifier>keywords </span><span class=special>>> </span><span class=identifier>rest</span><span class=special>;</span></pre> |
---|
| 353 | <p>where <tt>keywords</tt> is a special parser (based on the symbol table) that automatically sets its RHS (rest) depending on the acquired symbol. This, I think, is mighty cool! Someday perhaps... </p> |
---|
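<p>The dispatch described above can be sketched in plain C++, without Spirit, to show the moving parts. Everything here is an assumption made for illustration: <tt>rule_t</tt> is a <tt>std::function</tt> stand-in for a Spirit rule, <tt>std::map</tt> stands in for the symbol table, and <tt>skip_ws</tt>, <tt>parse_name</tt> and <tt>line_parser</tt> are hypothetical helpers. Only <tt>set_rest</tt> mirrors the functor shown earlier:</p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Stand-in for a Spirit rule: consumes input from pos, reports success.
using rule_t = std::function<bool(const std::string&, std::size_t&)>;

// Skips blanks, mimicking a whitespace skipper.
inline void skip_ws(const std::string& in, std::size_t& pos) {
    while (pos < in.size() &&
           std::isspace(static_cast<unsigned char>(in[pos]))) ++pos;
}

// name: one or more alphabetic characters.
inline bool parse_name(const std::string& in, std::size_t& pos) {
    skip_ws(in, pos);
    std::size_t start = pos;
    while (pos < in.size() &&
           std::isalpha(static_cast<unsigned char>(in[pos]))) ++pos;
    return pos > start;
}

// Semantic action: rebinds the referenced rule to the one found in the table.
template <typename Rule>
struct set_rest {
    explicit set_rest(Rule& the_rule) : the_rule(the_rule) {}
    void operator()(Rule* new_rule) const { the_rule = *new_rule; }
    Rule& the_rule;
};

struct line_parser {
    rule_t one, two, rest;
    std::map<std::string, rule_t*> continuations;  // symbol table stand-in

    line_parser() {
        one = parse_name;                           // one = name
        two = [](const std::string& in, std::size_t& pos) {
            if (!parse_name(in, pos)) return false; // two = name >> ',' >> name
            skip_ws(in, pos);
            if (pos >= in.size() || in[pos] != ',') return false;
            ++pos;
            return parse_name(in, pos);
        };
        continuations["one"] = &one;
        continuations["two"] = &two;
    }

    // The table stores pointers into this object, so copying is disabled.
    line_parser(const line_parser&) = delete;
    line_parser& operator=(const line_parser&) = delete;

    // line = continuations[set_rest<rule_t>(rest)] >> rest
    bool parse(const std::string& in) {
        std::size_t pos = 0;
        skip_ws(in, pos);
        std::size_t start = pos;
        if (!parse_name(in, pos)) return false;     // read the keyword
        auto it = continuations.find(in.substr(start, pos - start));
        if (it == continuations.end()) return false;
        set_rest<rule_t> action(rest);
        action(it->second);                         // one lookup picks the rule
        if (!rest(in, pos)) return false;
        skip_ws(in, pos);
        return pos == in.size();                    // require full match
    }
};
```

<p>A single table lookup selects the continuation, so no alternative is tried and then abandoned. With this sketch, <tt>line_parser().parse("two first, second")</tt> succeeds, while an unknown keyword fails at the lookup.</p>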
| 354 | <p><img src="theme/note.gif" width="16" height="16"> Also, see the <a href="switch_parser.html">switch parser</a> for another deterministic parsing trick for character/token prefixes. </p> |
---|
| 356 | <table border="0"> |
---|
| 357 | <tr> |
---|
| 358 | <td width="10"></td> |
---|
| 359 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
---|
| 360 | <td width="30"><a href="style_guide.html"><img src="theme/l_arr.gif" border="0"></a></td> |
---|
| 361 | <td width="30"><a href="faq.html"><img src="theme/r_arr.gif" border="0"></a></td> |
---|
| 362 | </tr> |
---|
| 363 | </table> |
---|
| 364 | <br> |
---|
| 365 | <hr size="1"> |
---|
| 366 | <p class="copyright">Copyright © 1998-2003 Joel de Guzman<br> |
---|
| 367 | <br> |
---|
| 368 | <font size="2">Use, modification and distribution is subject to the Boost Software |
---|
| 369 | License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at |
---|
| 370 | http://www.boost.org/LICENSE_1_0.txt)</font></p> |
---|
| 371 | <p class="copyright"> </p> |
---|
| 372 | </body> |
---|
| 373 | </html> |
---|