1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
---|
2 | <html> |
---|
3 | <head> |
---|
4 | <title>Confix Parsers</title> |
---|
5 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
---|
6 | <link rel="stylesheet" href="theme/style.css" type="text/css"> |
---|
7 | </head> |
---|
8 | |
---|
9 | <body> |
---|
10 | <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2"> |
---|
11 | <tr> |
---|
12 | <td width="10"> <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b> </b></font></td> |
---|
13 | <td width="85%"> <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Confix Parsers</b></font></td> |
---|
14 | <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td> |
---|
15 | </tr> |
---|
16 | </table> |
---|
17 | <br> |
---|
18 | <table border="0"> |
---|
19 | <tr> |
---|
20 | <td width="10"></td> |
---|
21 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
---|
22 | <td width="30"><a href="character_sets.html"><img src="theme/l_arr.gif" border="0"></a></td> |
---|
23 | <td width="30"><a href="list_parsers.html"><img src="theme/r_arr.gif" border="0"></a></td> |
---|
24 | </tr> |
---|
25 | </table> |
---|
26 | <p><a name="confix_parser"></a><b>Confix Parsers</b></p> |
---|
27 | <p>Confix parsers recognize a sequence of three independent elements: an |
---|
28 | opening, an expression, and a closing. A simple example is a C comment: |
---|
29 | </p> |
---|
30 | <pre><code class="comment"> /* This is a C comment */</code></pre> |
---|
31 | <p>which could be parsed with the following rule definition: |
---|
32 | </p> |
---|
33 | <pre><span class=identifier> </span><span class=identifier>rule</span><span class=special><> </span><span class=identifier>c_comment_rule |
---|
34 | </span><span class=special>= </span><span class=identifier>confix_p</span><span class=special>(</span><span class=literal>"/*"</span><span class=special>, </span><span class=special>*</span><span class=identifier>anychar_p</span><span class=special>, </span><span class=literal>"*/"</span><span class=special>) |
---|
35 | </span><span class=special>;</span></pre> |
---|
36 | <p>The <tt>confix_p</tt> parser generator |
---|
37 | should be used to generate the required Confix Parser. The |
---|
38 | three parameters to <tt>confix_p</tt> can be single |
---|
39 | characters, strings (as above) or, if more complex parsing logic is required, |
---|
40 | auxiliary parsers. Each parameter is automatically converted to the corresponding |
---|
41 | parser type needed for successful parsing.</p> |
---|
42 | <p>The generated parser is equivalent to the following rule: </p> |
---|
43 | <pre><code> <span class=identifier>open </span><span class=special>>> (</span><span class=identifier>expr </span><span class=special>- </span><span class=identifier>close</span><span class=special>) >> </span><span class=identifier>close</span></code></pre> |
---|
44 | <p>If the expr parser is an <tt>action_parser_category</tt> type parser (a parser |
---|
45 | with an attached semantic action), special handling is required. This happens |
---|
46 | if the user writes something like:</p> |
---|
47 | <pre><code><span class=identifier> confix_p</span><span class=special>(</span><span class=identifier>open</span><span class=special>, </span><span class=identifier>expr</span><span class=special>[</span><span class=identifier>func</span><span class=special>], </span><span class=identifier>close</span><span class=special>)</span></code></pre> |
---|
48 | <p>where <code>expr</code> is the parser matching the expression part of the confix sequence |
---|
49 | and <code>func</code> is a functor to be called after matching <code>expr</code>. |
---|
50 | Without special handling, the resulting code would parse the sequence as follows:</p> |
---|
51 | <pre><code> <span class=identifier>open </span><span class=special>>> (</span><span class=identifier>expr</span><span class=special>[</span><span class=identifier>func</span><span class=special>] - </span><span class=identifier>close</span><span class=special>) >> </span><span class=identifier>close</span></code></pre> |
---|
52 | <p>which in most cases is not what the user expects. (If this <u>is</u> what you |
---|
53 | expect, use the <tt>confix_p</tt> generator |
---|
54 | function <tt>direct()</tt>, which inhibits the parser refactoring.) To make |
---|
55 | the confix parser behave as expected:</p> |
---|
56 | <pre><code><span class=identifier> open </span><span class=special>>> (</span><span class=identifier>expr </span><span class=special>- </span><span class=identifier>close</span><span class=special>)[</span><span class=identifier>func</span><span class=special>] >> </span><span class=identifier>close</span></code></pre> |
---|
57 | <p>the actor attached to the <code>expr</code> parser has to be re-attached to |
---|
58 | the <code>(expr - close)</code> parser construct, which makes the resulting |
---|
59 | confix parser 'do the right thing'. This refactoring is done with the help of |
---|
60 | the <a href="refactoring.html">Refactoring Parsers</a>. Additionally, special |
---|
61 | care must be taken if the expr parser is a <tt>unary_parser_category</tt> type |
---|
62 | parser, as in </p> |
---|
63 | <pre><code><span class=identifier> confix_p</span><span class=special>(</span><span class=identifier>open</span><span class=special>, *</span><span class=identifier>anychar_p</span><span class=special>, </span><span class=identifier>close</span><span class=special>)</span></code></pre> |
---|
64 | <p>which without any refactoring would result in </p> |
---|
65 | <pre><code> <span class=identifier>open</span> <span class=special>>> (*</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=identifier>close</span><span class=special>) >> </span><span class=identifier>close</span></code></pre> |
---|
66 | <p>and will not give the expected result (<code>*anychar_p</code> will eat up all the input up |
---|
67 | to the end of the input stream). So this has to be refactored into:</p> |
---|
68 | <pre><code><span class=identifier> open </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=identifier>close</span><span class=special>) >> </span><span class=identifier>close</span></code></pre> |
---|
69 | <p>which gives the correct result. </p> |
---|
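<p>To see why the position of the Kleene star matters, the two forms can be modelled
in plain C++ without Spirit. This is only a sketch under stated assumptions: the names
<code>token_at</code>, <code>match_unrefactored</code> and <code>match_refactored</code>
are hypothetical, and the first function simplifies the difference operator away, since
it cannot stop the greedy star.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: does `token` occur in `in` exactly at `pos`?
// Requires pos <= in.size().
inline bool token_at(const std::string& in, std::size_t pos,
                     const std::string& token) {
    return in.compare(pos, token.size(), token) == 0;
}

// Simplified model of  open >> (*anychar_p - close) >> close.
// The greedy star swallows the rest of the input, so nothing is left
// for the trailing `close`. Returns the match length, or npos on failure.
inline std::size_t match_unrefactored(const std::string& in,
                                      const std::string& open,
                                      const std::string& close) {
    assert(!open.empty() && !close.empty());
    if (!token_at(in, 0, open)) return std::string::npos;
    const std::size_t pos = in.size();  // *anychar_p consumed everything
    return token_at(in, pos, close) ? pos + close.size() : std::string::npos;
}

// Model of  open >> *(anychar_p - close) >> close.
// One character is consumed at a time, and only while `close` does not
// start at the current position.
inline std::size_t match_refactored(const std::string& in,
                                    const std::string& open,
                                    const std::string& close) {
    assert(!open.empty() && !close.empty());
    if (!token_at(in, 0, open)) return std::string::npos;
    std::size_t pos = open.size();
    while (pos < in.size() && !token_at(in, pos, close)) ++pos;
    if (!token_at(in, pos, close)) return std::string::npos;  // unterminated
    return pos + close.size();
}
```

<p>With the input <code>/* c */ x</code>, the unrefactored model fails because the closing
token is never reached, while the refactored model stops right after <code>*/</code>.</p>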
70 | <p>The case where the expr parser combines the two problems mentioned above |
---|
71 | (i.e. the expr parser is a unary parser with an attached action) is handled |
---|
72 | accordingly, so: </p> |
---|
73 | <pre><code><span class=identifier> confix_p</span><span class=special>(</span><span class=identifier>open</span><span class=special>, (*</span><span class=identifier>anychar_p</span><span class=special>)[</span><span class=identifier>func</span><span class=special>], </span>close<span class=special>)</span></code></pre> |
---|
74 | <p>will be parsed as expected: </p> |
---|
75 | <pre><code> <span class=identifier>open</span> <span class=special>>> (*(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=identifier>close</span><span class=special>))[</span><span class=identifier>func</span><span class=special>] >> </span>close</code></pre> |
---|
76 | <p>This refactoring is likewise implemented with the help of the <a href="refactoring.html">Refactoring |
---|
77 | Parsers</a>.</p> |
---|
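<p>The effect of re-attaching the action can also be modelled in plain C++. The sketch
below is an illustration only, not Spirit's implementation: <code>parse_confix_with_action</code>
is a hypothetical name, and the assumption that the callback runs as soon as the body has
been scanned, before the closing token is checked, is part of the model.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Model of  open >> (*(anychar_p - close))[func] >> close.
// `func` receives exactly the text between `open` and `close`, which is
// what attaching the action to the refactored body parser achieves.
// Returns true if the whole sequence matched at the start of `in`.
inline bool parse_confix_with_action(
        const std::string& in,
        const std::string& open,
        const std::string& close,
        const std::function<void(const std::string&)>& func) {
    assert(!open.empty() && !close.empty());
    if (in.compare(0, open.size(), open) != 0) return false;

    // *(anychar_p - close): advance until `close` starts here or input ends.
    std::size_t pos = open.size();
    while (pos < in.size() && in.compare(pos, close.size(), close) != 0) ++pos;

    // The action attached to the body fires with the scanned range.
    func(in.substr(open.size(), pos - open.size()));

    // Finally the closing token itself has to match.
    return in.compare(pos, close.size(), close) == 0;
}
```

<p>For the input <code>/*body*/</code> the callback sees <code>body</code>, without the
delimiters.</p>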
78 | <table width="90%" border="0" align="center"> |
---|
79 | <tr> |
---|
80 | <td colspan="2" class="table_title"><b>Summary of Confix Parser refactorings</b></td> |
---|
81 | </tr> |
---|
82 | <tr class="table_title"> |
---|
83 | <td width="40%"><b>You write it as:</b></td> |
---|
84 | <td width="60%"><b>It |
---|
85 | is refactored to:</b></td> |
---|
86 | </tr> |
---|
87 | <tr> |
---|
88 | <td width="40%" class="table_cells"><code>confix_p<span class="special">(</span><span class=identifier>open</span><span class="special">,</span> |
---|
89 | expr<span class="special">,</span> close<span class="special">)</span></code></td> |
---|
90 | <td width="60%" class="table_cells"> <p><code>open <span class=special>>> |
---|
91 | (</span>expr <span class=special>-</span> close<span class=special>)</span><font color="#0000FF"> |
---|
92 | </font><span class=special>>></span> close</code></p> |
---|
93 | </td> |
---|
94 | </tr> |
---|
95 | <tr> |
---|
96 | <td width="40%" class="table_cells"><code>confix_p<span class="special">(</span><span class=identifier>open</span><span class="special">,</span> |
---|
97 | expr<span class="special">[</span>func<span class="special">],</span> close<span class="special">)</span></code></td> |
---|
98 | <td width="60%" class="table_cells"> <p><code>open <span class=special>>> |
---|
99 | (</span>expr <span class=special>-</span> close<span class="special">)[</span>func<span class="special">] |
---|
100 | <font color="#0000FF" class="special">>></font></span> close</code></p> |
---|
101 | </td> |
---|
102 | </tr> |
---|
103 | <tr> |
---|
104 | <td width="40%" class="table_cells" height="9"><code>confix_p<span class="special">(</span><span class=identifier>open</span><span class="special">, |
---|
105 | *</span>expr<span class="special">,</span> close<span class="special">)</span></code></td> |
---|
106 | <td width="60%" class="table_cells" height="9"> <p><code>open <font color="#0000FF"><span class="special">>></span></font> |
---|
107 | <span class="special"><font color="#0000FF" class="special">*</font>(</span>expr |
---|
108 | <font color="#0000FF" class="special">-</font> close<span class="special">) |
---|
109 | <font color="#0000FF" class="special">>></font></span> close</code></p> |
---|
110 | </td> |
---|
111 | </tr> |
---|
112 | <tr> |
---|
113 | <td width="40%" class="table_cells"><code>confix_p<span class="special">(</span><span class=identifier>open</span><span class="special">, |
---|
114 | (*</span>expr<span class="special">)[</span>func<span class="special">], |
---|
115 | close</span><span class="special">)</span></code></td> |
---|
116 | <td width="60%" class="table_cells"> <p><code>open <font color="#0000FF"><span class="special">>></span></font><span class="special"> |
---|
117 | (<font color="#0000FF" class="special">*</font>(</span>expr <font color="#0000FF" class="special">-</font> |
---|
118 | close<span class="special">))[</span>func<span class="special">] <font color="#0000FF" class="special">>></font></span> |
---|
119 | close</code></p> |
---|
120 | </td> |
---|
121 | </tr> |
---|
122 | </table> |
---|
123 | <p><a name="comment_parsers"></a><b>Comment Parsers</b></p> |
---|
124 | <p>The Comment Parser generator template <tt>comment_p</tt> |
---|
125 | is a helper for generating a correct <a href="#confix_parser">Confix Parser</a> |
---|
126 | from auxiliary parameters. The generated parser is able to parse comment constructs of the following form: |
---|
127 | </p> |
---|
128 | <pre><code> StartCommentToken <span class="special">>></span> Comment text <span class="special">>></span> EndCommentToken</code></pre> |
---|
129 | <p>The following types are supported as parameters: parsers, single |
---|
130 | characters and strings (see as_parser). If <tt>comment_p</tt> |
---|
131 | is used with one parameter, it matches a comment starting with the given first parser |
---|
132 | parameter and running up to the end of the line. For instance, the following |
---|
133 | parser matches C++ style comments:</p> |
---|
134 | |
---|
135 | <pre><code><span class=identifier> comment_p</span><span class=special>(</span><span class=string>"//"</span><span class=special>)</span></code></pre> |
---|
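<p>The one-parameter form can be pictured with a small standard C++ model. This is a sketch,
not the library's code: <code>match_line_comment</code> is a hypothetical name, and whether
the line terminator is consumed and whether end of input also ends the comment are
assumptions of this model.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Model of comment_p(start): a comment that begins with `start` and runs
// to the end of the line. Returns the number of characters consumed
// (including the line terminator, if any), or npos if `in` does not
// begin with `start`.
inline std::size_t match_line_comment(const std::string& in,
                                      const std::string& start) {
    assert(!start.empty());
    if (in.compare(0, start.size(), start) != 0) return std::string::npos;

    const std::size_t eol = in.find_first_of("\r\n", start.size());
    if (eol == std::string::npos) return in.size();  // comment ends at end of input

    // Treat "\r\n" as a single line terminator.
    if (in[eol] == '\r' && eol + 1 < in.size() && in[eol + 1] == '\n')
        return eol + 2;
    return eol + 1;
}
```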
136 | <p>If it is used with two parameters, it matches a comment starting with the first parser |
---|
137 | parameter and running up to the second parser parameter. For instance, a C style |
---|
138 | comment parser could be constructed as:</p> |
---|
139 | <pre><code> <span class=identifier>comment_p</span><span class=special>(</span><span class=string>"/*"</span><span class=special>, </span><span class=string>"*/"</span><span class=special>)</span></code></pre> |
---|
140 | <p>The <tt>comment_p</tt> parser generator generates parsers for matching |
---|
141 | non-nested comments (such as C/C++ comments). Sometimes it is necessary to parse |
---|
142 | nested comments, as allowed for instance in Pascal:</p> |
---|
143 | <pre><code class="comment"> { This is a { nested } PASCAL-comment }</code></pre> |
---|
144 | <p>Such nested comments can be |
---|
145 | parsed by parsers generated by the <tt>comment_nest_p</tt> generator |
---|
146 | template functor. The following example shows a parser that can be used for |
---|
147 | parsing the two different (nestable) Pascal comment styles:</p> |
---|
148 | <pre><code> <span class=identifier>rule</span><span class=special><> </span><span class=identifier>pascal_comment |
---|
149 | </span><span class=special>= </span><span class=identifier>comment_nest_p</span><span class=special>(</span><span class=string>"(*"</span><span class=special>, </span><span class=string>"*)"</span><span class=special>) |
---|
150 | | </span><span class=identifier>comment_nest_p</span><span class=special>(</span><span class=literal>'{'</span><span class=special>, </span><span class=literal>'}'</span><span class=special>) |
---|
151 | ;</span></code></pre> |
---|
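<p>What makes nested comments different from the non-nested case is that every inner opening
token must be balanced by its own closing token. A minimal standard C++ sketch of that idea
uses a depth counter. The name <code>match_nested_comment</code> is hypothetical, and the
code models the behavior only; it is not how <tt>comment_nest_p</tt> is implemented.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Model of a nestable comment: `open`, then any mix of text and further
// nested comments, then the matching `close`. Returns the number of
// characters consumed, or npos if `in` does not begin with `open` or the
// comment is not properly closed. Assumes open != close.
inline std::size_t match_nested_comment(const std::string& in,
                                        const std::string& open,
                                        const std::string& close) {
    assert(!open.empty() && !close.empty());
    if (in.compare(0, open.size(), open) != 0) return std::string::npos;

    std::size_t depth = 1;            // one comment is currently open
    std::size_t pos = open.size();
    while (pos < in.size()) {
        if (in.compare(pos, open.size(), open) == 0) {
            ++depth;                  // a nested comment starts here
            pos += open.size();
        } else if (in.compare(pos, close.size(), close) == 0) {
            pos += close.size();
            if (--depth == 0) return pos;  // outermost comment closed
        } else {
            ++pos;                    // ordinary comment text
        }
    }
    return std::string::npos;         // input ended inside a comment
}
```

<p>Applied to the two Pascal styles, the same function would be called once with
<code>"(*"</code> and <code>"*)"</code> and once with <code>"{"</code> and <code>"}"</code>.</p>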
152 | <p>Please note that a comment is parsed implicitly as if the whole <tt>comment_p(...)</tt> |
---|
153 | statement were embedded into a <tt>lexeme_d[]</tt> directive, i.e. during parsing |
---|
154 | of a comment no token skipping occurs, even if you have defined a skip parser |
---|
155 | for your whole parsing process.</p> |
---|
156 | <p> <img height="16" width="15" src="theme/lens.gif"> <a href="../example/fundamental/comments.cpp">comments.cpp</a> demonstrates various comment parsing schemes: </p> |
---|
157 | <ol> |
---|
158 | <li>Parsing of different comment styles |
---|
159 | <ul> |
---|
160 | <li>parsing C/C++-style comment</li> |
---|
161 | <li>parsing C++-style comment</li> |
---|
162 | <li>parsing PASCAL-style comment</li> |
---|
163 | </ul></li> |
---|
164 | <li>Parsing tagged data with the help of the confix_parser</li> |
---|
165 | <li>Parsing tagged data with the help of the confix_parser but the semantic<br> |
---|
166 | action is directly attached to the body sequence parser</li> |
---|
167 | </ol> |
---|
168 | <p>This is part of the Spirit distribution.</p> |
---|
169 | <table border="0"> |
---|
170 | <tr> |
---|
171 | <td width="10"></td> |
---|
172 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
---|
173 | <td width="30"><a href="character_sets.html"><img src="theme/l_arr.gif" border="0"></a></td> |
---|
174 | <td width="30"><a href="list_parsers.html"><img src="theme/r_arr.gif" border="0"></a></td> |
---|
175 | </tr> |
---|
176 | </table> |
---|
177 | <br> |
---|
178 | <hr size="1"> |
---|
179 | <p class="copyright">Copyright © 2001-2002 Hartmut Kaiser<br> |
---|
180 | <br> |
---|
181 | <font size="2">Use, modification and distribution is subject to the Boost Software |
---|
182 | License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at |
---|
183 | http://www.boost.org/LICENSE_1_0.txt) </font> </p> |
---|
184 | </body> |
---|
185 | </html> |
---|