<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Confix Parsers</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="theme/style.css" type="text/css">
</head>

<body>
<table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
<tr>
<td width="10"> <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b> </b></font></td>
<td width="85%"> <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Confix Parsers</b></font></td>
<td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td>
</tr>
</table>
<br>
<table border="0">
<tr>
<td width="10"></td>
<td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
<td width="30"><a href="character_sets.html"><img src="theme/l_arr.gif" border="0"></a></td>
<td width="30"><a href="list_parsers.html"><img src="theme/r_arr.gif" border="0"></a></td>
</tr>
</table>
<p><a name="confix_parser"></a><b>Confix Parsers</b></p>
<p>Confix Parsers recognize a sequence consisting of three independent elements:
an opening, an expression and a closing. A simple example is a C comment:
</p>
<pre><code class="comment"> /* This is a C comment */</code></pre>
<p>which could be parsed through the following rule definition:</p>
<pre><span class=identifier> </span><span class=identifier>rule</span><span class=special><> </span><span class=identifier>c_comment_rule
</span><span class=special>= </span><span class=identifier>confix_p</span><span class=special>(</span><span class=literal>"/*"</span><span class=special>, </span><span class=special>*</span><span class=identifier>anychar_p</span><span class=special>, </span><span class=literal>"*/"</span><span class=special>)
</span><span class=special>;</span></pre>
<p>The <tt>confix_p</tt> parser generator is used to generate the required
Confix Parser. Each of its three parameters can be a single character, a string
(as above) or, if more complex parsing logic is required, an auxiliary parser.
Each argument is automatically converted to the parser type needed for
successful parsing.</p>
<p>The generated parser is equivalent to the following rule: </p>
<pre><code> <span class=identifier>open </span><span class=special>>> (</span><span class=identifier>expr </span><span class=special>- </span><span class=identifier>close</span><span class=special>) >> </span><span class=identifier>close</span></code></pre>
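<p>The effect of the <tt>expr - close</tt> part can be shown with a minimal, dependency-free C++ sketch. It does not use Spirit. <code>Parser</code>, <code>lit</code>, <code>anychar</code>, <code>seq</code>, <code>diff</code> and <code>confix</code> are hypothetical helpers written only for this illustration. <code>diff(a, b)</code> assumes the semantics of Spirit's difference operator <tt>a - b</tt>: match <tt>a</tt>, unless <tt>b</tt> matches at least as many characters at the same position. The expression used here is a plain single-character parser, so no refactoring is involved.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical mini-framework (not Spirit's API): a parser receives the
// remaining input and returns the number of characters it consumed, or
// std::nullopt if it fails.
using Parser = std::function<std::optional<std::size_t>(std::string_view)>;

// Matches the literal string s.
Parser lit(std::string s) {
    return [s](std::string_view in) -> std::optional<std::size_t> {
        if (in.substr(0, s.size()) == std::string_view(s)) return s.size();
        return std::nullopt;
    };
}

// Matches any single character (cf. anychar_p).
Parser anychar() {
    return [](std::string_view in) -> std::optional<std::size_t> {
        if (in.empty()) return std::nullopt;
        return std::size_t{1};
    };
}

// a >> b: match a, then b on the remaining input.
Parser seq(Parser a, Parser b) {
    return [a, b](std::string_view in) -> std::optional<std::size_t> {
        auto n = a(in);
        if (!n) return std::nullopt;
        auto m = b(in.substr(*n));
        if (!m) return std::nullopt;
        return *n + *m;
    };
}

// a - b: a must match, and b must not match as many characters or more
// at the same position (assumed to mirror Spirit's difference parser).
Parser diff(Parser a, Parser b) {
    return [a, b](std::string_view in) -> std::optional<std::size_t> {
        auto n = a(in);
        if (!n) return std::nullopt;
        auto m = b(in);
        if (m && *m >= *n) return std::nullopt;
        return n;
    };
}

// open >> (expr - close) >> close
Parser confix(Parser open, Parser expr, Parser close) {
    return seq(seq(open, diff(expr, close)), close);
}
```

<p>Suppose a single quote serves as both opening and closing, as in <tt>confix(lit("'"), anychar(), lit("'"))</tt>. This parser accepts <tt>'a'</tt> (3 characters) but rejects <tt>'''</tt>, because the expression may not consume the closing quote.</p>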
<p>If the expr parser is an <tt>action_parser_category</tt> type parser (a parser
with an attached semantic action), something special has to be done. This
happens if the user writes something like:</p>
<pre><code><span class=identifier> confix_p</span><span class=special>(</span><span class=identifier>open</span><span class=special>, </span><span class=identifier>expr</span><span class=special>[</span><span class=identifier>func</span><span class=special>], </span><span class=identifier>close</span><span class=special>)</span></code></pre>
<p>where <code>expr</code> is the parser matching the expression part of the confix
sequence and <code>func</code> is a functor to be called after matching the <code>expr</code>.
Without special treatment, the resulting code would parse the sequence as follows:</p>
<pre><code> <span class=identifier>open </span><span class=special>>> (</span><span class=identifier>expr</span><span class=special>[</span><span class=identifier>func</span><span class=special>] - </span><span class=identifier>close</span><span class=special>) >> </span><span class=identifier>close</span></code></pre>
<p>which in most cases is not what the user expects. For instance, <code>func</code>
is then invoked whenever <code>expr</code> matches, even if that match is
subsequently rejected because <code>close</code> matches as well. (If this <u>is</u>
what you expect, use the <tt>confix_p</tt> generator function <tt>direct()</tt>,
which inhibits the parser refactoring.) To make the confix parser behave as
expected, the sequence has to be parsed as</p>
<pre><code><span class=identifier> open </span><span class=special>>> (</span><span class=identifier>expr </span><span class=special>- </span><span class=identifier>close</span><span class=special>)[</span><span class=identifier>func</span><span class=special>] >> </span><span class=identifier>close</span></code></pre>
<p>i.e. the actor attached to the <code>expr</code> parser has to be re-attached to
the <code>(expr - close)</code> parser construct, which makes the resulting
confix parser 'do the right thing'. This refactoring is done with the help of
the <a href="refactoring.html">Refactoring Parsers</a>. Additionally, special
care must be taken if the expr parser is a <tt>unary_parser_category</tt> type
parser such as</p>
<pre><code><span class=identifier> confix_p</span><span class=special>(</span><span class=identifier>open</span><span class=special>, *</span><span class=identifier>anychar_p</span><span class=special>, </span><span class=identifier>close</span><span class=special>)</span></code></pre>
<p>which without any refactoring would result in </p>
<pre><code> <span class=identifier>open</span> <span class=special>>> (*</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=identifier>close</span><span class=special>) >> </span><span class=identifier>close</span></code></pre>
<p>and will not give the expected result (<tt>*anychar_p</tt> will eat up all the
input up to the end of the input stream). So this has to be refactored into:</p>
<pre><code><span class=identifier> open </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=identifier>close</span><span class=special>) >> </span><span class=identifier>close</span></code></pre>
<p>which gives the correct result. </p>
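<p>The difference between the two rules can be reproduced with a small, dependency-free C++ sketch. The helpers below (<code>lit</code>, <code>anychar</code>, <code>seq</code>, <code>diff</code>, <code>kleene</code>, <code>naive_confix</code>, <code>refactored_confix</code>) are hypothetical stand-ins written for this illustration, not Spirit's API. <code>kleene</code> is greedy like <tt>*p</tt>, and <code>diff</code> assumes Spirit's difference semantics.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical mini-framework (not Spirit's API): a parser returns the
// number of characters consumed, or std::nullopt on failure.
using Parser = std::function<std::optional<std::size_t>(std::string_view)>;

Parser lit(std::string s) {
    return [s](std::string_view in) -> std::optional<std::size_t> {
        if (in.substr(0, s.size()) == std::string_view(s)) return s.size();
        return std::nullopt;
    };
}

Parser anychar() {
    return [](std::string_view in) -> std::optional<std::size_t> {
        if (in.empty()) return std::nullopt;
        return std::size_t{1};
    };
}

// a >> b
Parser seq(Parser a, Parser b) {
    return [a, b](std::string_view in) -> std::optional<std::size_t> {
        auto n = a(in);
        if (!n) return std::nullopt;
        auto m = b(in.substr(*n));
        if (!m) return std::nullopt;
        return *n + *m;
    };
}

// a - b (assumed Spirit semantics: reject if b matches at least as much)
Parser diff(Parser a, Parser b) {
    return [a, b](std::string_view in) -> std::optional<std::size_t> {
        auto n = a(in);
        if (!n) return std::nullopt;
        auto m = b(in);
        if (m && *m >= *n) return std::nullopt;
        return n;
    };
}

// *a: repeat a greedily; zero repetitions is still a match.
Parser kleene(Parser a) {
    return [a](std::string_view in) -> std::optional<std::size_t> {
        std::size_t total = 0;
        while (auto n = a(in.substr(total))) {
            if (*n == 0) break;  // stop on empty matches to avoid looping forever
            total += *n;
        }
        return total;
    };
}

// Unrefactored: open >> (*anychar_p - close) >> close
Parser naive_confix(Parser open, Parser close) {
    return seq(seq(open, diff(kleene(anychar()), close)), close);
}

// Refactored: open >> *(anychar_p - close) >> close
Parser refactored_confix(Parser open, Parser close) {
    return seq(seq(open, kleene(diff(anychar(), close))), close);
}
```

<p>On the input <tt>/* a comment */ int x;</tt> the unrefactored rule fails. <tt>*anychar_p</tt> consumes the rest of the input, including <tt>*/</tt>, so the final <tt>close</tt> has nothing left to match. The refactored rule stops in front of <tt>*/</tt> and matches the 15-character comment.</p>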
<p>The case where the expr parser combines both problems (i.e. the expr parser
is a unary parser with an attached action) is handled accordingly, so </p>
<pre><code><span class=identifier> confix_p</span><span class=special>(</span><span class=identifier>open</span><span class=special>, (*</span><span class=identifier>anychar_p</span><span class=special>)[</span><span class=identifier>func</span><span class=special>], </span>close<span class=special>)</span></code></pre>
<p>will be parsed as expected: </p>
<pre><code> <span class=identifier>open</span> <span class=special>>> (*(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=identifier>close</span><span class=special>))[</span><span class=identifier>func</span><span class=special>] >> </span>close</code></pre>
<p>This refactoring is also implemented with the help of the <a href="refactoring.html">Refactoring
Parsers</a>.</p>
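<p>The reason for re-attaching the action can be seen in a further dependency-free C++ sketch. All helpers (<code>lit</code>, <code>anychar</code>, <code>seq</code>, <code>diff</code>, <code>kleene</code>, <code>action</code> and the two rule builders) are hypothetical illustrations, not Spirit's API. <code>action(p, f)</code> mimics <tt>p[f]</tt> by calling <code>f</code> with the text <code>p</code> matched, as soon as <code>p</code> succeeds.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical mini-framework (not Spirit's API).
using Parser = std::function<std::optional<std::size_t>(std::string_view)>;
using Action = std::function<void(std::string_view)>;

Parser lit(std::string s) {
    return [s](std::string_view in) -> std::optional<std::size_t> {
        if (in.substr(0, s.size()) == std::string_view(s)) return s.size();
        return std::nullopt;
    };
}

Parser anychar() {
    return [](std::string_view in) -> std::optional<std::size_t> {
        if (in.empty()) return std::nullopt;
        return std::size_t{1};
    };
}

Parser seq(Parser a, Parser b) {  // a >> b
    return [a, b](std::string_view in) -> std::optional<std::size_t> {
        auto n = a(in);
        if (!n) return std::nullopt;
        auto m = b(in.substr(*n));
        if (!m) return std::nullopt;
        return *n + *m;
    };
}

Parser diff(Parser a, Parser b) {  // a - b (assumed Spirit semantics)
    return [a, b](std::string_view in) -> std::optional<std::size_t> {
        auto n = a(in);
        if (!n) return std::nullopt;
        auto m = b(in);
        if (m && *m >= *n) return std::nullopt;
        return n;
    };
}

Parser kleene(Parser a) {  // *a, greedy
    return [a](std::string_view in) -> std::optional<std::size_t> {
        std::size_t total = 0;
        while (auto n = a(in.substr(total))) {
            if (*n == 0) break;
            total += *n;
        }
        return total;
    };
}

// p[f]: when p succeeds, call f with the matched text (even if an
// enclosing parser later fails; the call is not undone).
Parser action(Parser p, Action f) {
    return [p, f](std::string_view in) -> std::optional<std::size_t> {
        auto n = p(in);
        if (n) f(in.substr(0, *n));
        return n;
    };
}

// Unrefactored: open >> ((*anychar_p)[func] - close) >> close
Parser naive_confix(Parser open, Parser close, Action func) {
    return seq(seq(open, diff(action(kleene(anychar()), func), close)), close);
}

// Refactored: open >> (*(anychar_p - close))[func] >> close
Parser refactored_confix(Parser open, Parser close, Action func) {
    return seq(seq(open, action(kleene(diff(anychar(), close)), func)), close);
}
```

<p>For <tt>/* body */ rest</tt>, the unrefactored rule calls <code>func</code> with <tt>" body */ rest"</tt> and then fails. The refactored rule calls <code>func</code> exactly once with <tt>" body "</tt> and matches 10 characters.</p>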
<table width="90%" border="0" align="center">
<tr>
<td colspan="2" class="table_title"><b>Summary of Confix Parser refactorings</b></td>
</tr>
<tr class="table_title">
<td width="40%"><b>You write it as:</b></td>
<td width="60%"><b>It is refactored to:</b></td>
</tr>
<tr>
<td width="40%" class="table_cells"><code>confix_p<span class="special">(</span><span class=identifier>open</span><span class="special">,</span>
expr<span class="special">,</span> close<span class="special">)</span></code></td>
<td width="60%" class="table_cells"> <p><code>open <span class=special>>>
(</span>expr <span class=special>-</span> close<span class=special>)</span><font color="#0000FF">
</font><span class=special>>></span> close</code></p>
</td>
</tr>
<tr>
<td width="40%" class="table_cells"><code>confix_p<span class="special">(</span><span class=identifier>open</span><span class="special">,</span>
expr<span class="special">[</span>func<span class="special">],</span> close<span class="special">)</span></code></td>
<td width="60%" class="table_cells"> <p><code>open <span class=special>>>
(</span>expr <span class=special>-</span> close<span class="special">)[</span>func<span class="special">]
<font color="#0000FF" class="special">>></font></span> close</code></p>
</td>
</tr>
<tr>
<td width="40%" class="table_cells" height="9"><code>confix_p<span class="special">(</span><span class=identifier>open</span><span class="special">,
*</span>expr<span class="special">,</span> close<span class="special">)</span></code></td>
<td width="60%" class="table_cells" height="9"> <p><code>open <font color="#0000FF"><span class="special">>></span></font>
<span class="special"><font color="#0000FF" class="special">*</font>(</span>expr
<font color="#0000FF" class="special">-</font> close<span class="special">)
<font color="#0000FF" class="special">>></font></span> close</code></p>
</td>
</tr>
<tr>
<td width="40%" class="table_cells"><code>confix_p<span class="special">(</span><span class=identifier>open</span><span class="special">,
(*</span>expr<span class="special">)[</span>func<span class="special">],
close</span><span class="special">)</span></code></td>
<td width="60%" class="table_cells"> <p><code>open <font color="#0000FF"><span class="special">>></span></font><span class="special">
(<font color="#0000FF" class="special">*</font>(</span>expr <font color="#0000FF" class="special">-</font>
close<span class="special">))[</span>func<span class="special">] <font color="#0000FF" class="special">>></font></span>
close</code></p>
</td>
</tr>
</table>
<p><a name="comment_parsers"></a><b>Comment Parsers</b></p>
<p>The Comment Parser generator template <tt>comment_p</tt> is a helper for
generating a suitable <a href="#confix_parser">Confix Parser</a> from auxiliary
parameters. The generated parser recognizes comment constructs of the form:
</p>
<pre><code> StartCommentToken <span class="special">>></span> Comment text <span class="special">>></span> EndCommentToken</code></pre>
<p>The supported parameter types are parsers, single characters and strings
(see <tt>as_parser</tt>). If <tt>comment_p</tt> is used with one parameter, it
matches a comment that starts with the given parser parameter and extends to
the end of the line. For instance, the following parser matches C++ style
comments:</p>

<pre><code><span class=identifier> comment_p</span><span class=special>(</span><span class=string>"//"</span><span class=special>)</span></code></pre>
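<p>What a one-parameter comment parser matches can be sketched without Spirit as follows. <code>match_line_comment</code> is a hypothetical helper, not part of Spirit. The sketch assumes that the comment ends with, and includes, the next line ending, or ends at the end of the input if no line ending follows.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not Spirit's API): sketches what a one-argument
// comment parser such as comment_p("//") matches. Assumption: the comment
// runs from `start` up to and including the next line ending ("\n", "\r\n"
// or "\r"), or up to the end of the input if no line ending follows.
// Returns the number of characters matched, or std::nullopt.
std::optional<std::size_t> match_line_comment(std::string_view in,
                                              std::string_view start) {
    if (in.substr(0, start.size()) != start) return std::nullopt;
    std::size_t i = start.size();
    while (i < in.size() && in[i] != '\n' && in[i] != '\r') ++i;
    if (i == in.size()) return i;  // comment ends at end of input
    if (in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n')
        return i + 2;              // "\r\n" line ending
    return i + 1;                  // single-character line ending
}
```

<p>For example, given <tt>// note</tt> followed by a newline and <tt>int x;</tt>, the sketch matches the 8 characters up to and including the newline. Input that does not start with <tt>//</tt> is rejected.</p>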
<p>If it is used with two parameters, it matches a comment that starts with the
first parser parameter and extends up to the second parser parameter. For
instance, a C style comment parser could be constructed as:</p>
<pre><code> <span class=identifier>comment_p</span><span class=special>(</span><span class=string>"/*"</span><span class=special>, </span><span class=string>"*/"</span><span class=special>)</span></code></pre>
<p>The <tt>comment_p</tt> parser generator produces parsers for matching
non-nested comments (as for C/C++ comments). Sometimes it is necessary to parse
nested comments, as allowed for instance in Pascal.</p>
<pre><code class="comment"> { This is a { nested } PASCAL-comment }</code></pre>
<p>Such nested comments can be parsed with parsers generated by the
<tt>comment_nest_p</tt> generator template functor. The following example
shows a parser that handles both (nestable) Pascal comment styles:</p>
<pre><code> <span class=identifier>rule</span><span class=special><> </span><span class=identifier>pascal_comment
</span><span class=special>= </span><span class=identifier>comment_nest_p</span><span class=special>(</span><span class=string>"(*"</span><span class=special>, </span><span class=string>"*)"</span><span class=special>)
| </span><span class=identifier>comment_nest_p</span><span class=special>(</span><span class=literal>'{'</span><span class=special>, </span><span class=literal>'}'</span><span class=special>)
;</span></code></pre>
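<p>The idea behind nested comment parsing can be sketched with a depth counter. <code>match_nested_comment</code> is a hypothetical helper written for this illustration, not the implementation of <tt>comment_nest_p</tt>. It treats each opening token as the start of an inner comment, checking for it before the closing token, and succeeds once the outermost comment is closed.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not Spirit's API): sketches the idea behind a nested
// comment parser such as comment_nest_p("{", "}"). Every opening token
// increases the nesting depth, every closing token decreases it, and the
// comment ends when the depth returns to zero.
// Returns the number of characters matched, or std::nullopt.
std::optional<std::size_t> match_nested_comment(std::string_view in,
                                                std::string_view open,
                                                std::string_view close) {
    if (in.substr(0, open.size()) != open) return std::nullopt;
    std::size_t i = open.size();
    int depth = 1;
    while (i < in.size()) {
        if (in.substr(i, open.size()) == open) {
            i += open.size();
            ++depth;                     // a nested comment starts
        } else if (in.substr(i, close.size()) == close) {
            i += close.size();
            if (--depth == 0) return i;  // outermost comment closed
        } else {
            ++i;                         // ordinary comment text
        }
    }
    return std::nullopt;  // input ended inside an unterminated comment
}
```

<p>Applied to the Pascal example above with <tt>'{'</tt> and <tt>'}'</tt>, the sketch consumes the whole comment, including the inner <tt>{ nested }</tt>. A non-nesting parser such as <tt>comment_p('{', '}')</tt> would instead stop at the first <tt>}</tt>.</p>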
<p>Note that a comment is parsed as if the whole <tt>comment_p(...)</tt>
statement were embedded in a <tt>lexeme_d[]</tt> directive. In other words, no token
skipping occurs while a comment is being parsed, even if you have defined a
skip parser for the whole parsing process.</p>
<p> <img height="16" width="15" src="theme/lens.gif"> <a href="../example/fundamental/comments.cpp">comments.cpp</a> demonstrates various comment parsing schemes: </p>
<ol>
<li>Parsing of different comment styles
<ul>
<li>parsing C/C++-style comments</li>
<li>parsing C++-style comments</li>
<li>parsing PASCAL-style comments</li>
</ul>
</li>
<li>Parsing tagged data with the help of the confix_parser</li>
<li>Parsing tagged data with the help of the confix_parser, but with the semantic<br>
action directly attached to the body sequence parser</li>
</ol>
<p>This is part of the Spirit distribution.</p>
<table border="0">
<tr>
<td width="10"></td>
<td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
<td width="30"><a href="character_sets.html"><img src="theme/l_arr.gif" border="0"></a></td>
<td width="30"><a href="list_parsers.html"><img src="theme/r_arr.gif" border="0"></a></td>
</tr>
</table>
<br>
<hr size="1">
<p class="copyright">Copyright © 2001-2002 Hartmut Kaiser<br>
<br>
<font size="2">Use, modification and distribution is subject to the Boost Software
License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt) </font> </p>
</body>
</html>