1 | <html> |
---|
2 | <head> |
---|
3 | <title>Primitives</title> |
---|
4 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
---|
5 | <link rel="stylesheet" href="theme/style.css" type="text/css"> |
---|
6 | </head> |
---|
7 | |
---|
8 | <body> |
---|
9 | <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2"> |
---|
10 | <tr> |
---|
11 | <td width="10"> |
---|
12 | </td> |
---|
13 | <td width="85%"> |
---|
14 | <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Primitives</b></font> |
---|
15 | </td> |
---|
16 | <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td> |
---|
17 | </tr> |
---|
18 | </table> |
---|
19 | <br> |
---|
20 | <table border="0"> |
---|
21 | <tr> |
---|
22 | <td width="10"></td> |
---|
23 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
---|
24 | <td width="30"><a href="organization.html"><img src="theme/l_arr.gif" border="0"></a></td> |
---|
25 | <td width="30"><a href="operators.html"><img src="theme/r_arr.gif" border="0"></a></td> |
---|
26 | </tr> |
---|
27 | </table> |
---|
28 | <p>The framework predefines some parser primitives. These are the most basic building |
---|
29 | blocks that the client uses to build more complex parsers. These primitive parsers |
---|
30 | are template classes, making them very flexible.</p> |
---|
31 | <p>These primitive parsers can be instantiated directly or through a templatized |
---|
32 | helper function. Generally, the helper function is far simpler to deal with |
---|
33 | as it involves less typing.</p> |
---|
<p>We have seen the character literal parser before through the generator function
  <tt>ch_p</tt>, which is not really a parser but, rather, a parser generator.
  Class <tt>chlit&lt;CharT&gt;</tt> is the actual template class behind the character
  literal parser. When instantiating a <tt>chlit</tt> object directly, you supply
  the character type, <tt>CharT</tt>, as a template parameter. This type typically
  corresponds to the input type, usually <tt>char</tt> or <tt>wchar_t</tt>. The
  following expression creates a temporary parser object which will recognize the
  single letter <span class="quotes">'X'</span>.</p>
---|
42 | <pre><code><font color="#000000"><span class=identifier> </span><span class=identifier>chlit</span><span class=special><</span><span class=keyword>char</span><span class=special>>(</span><span class=literal>'X'</span><span class=special>);</span></font></code></pre> |
---|
<p>Using <tt>chlit</tt>'s generator function <tt>ch_p</tt> simplifies the usage
  of the <tt>chlit&lt;&gt;</tt> class (this is true of most Spirit parser classes,
  since most have corresponding generator functions). Calling the function is
  convenient because the compiler deduces the template type from the function
  argument. The example above can be expressed less verbosely using
  the <tt>ch_p</tt> helper function.</p>
---|
49 | <pre><code><font color="#000000"><span class=special> </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'X'</span><span class=special>) </span><span class=comment>// equivalent to chlit<char>('X') object</span></font></code></pre> |
---|
50 | <table width="80%" border="0" align="center"> |
---|
51 | <tr> |
---|
52 | <td class="note_box"><img src="theme/lens.gif" width="15" height="16"> <b>Parser |
---|
53 | generators</b><br> |
---|
54 | <br> |
---|
55 | Whenever you see an invocation of the parser generator function, it is equivalent |
---|
56 | to the parser itself. Therefore, we often call <tt>ch_p</tt> a character |
---|
57 | parser, even if, technically speaking, it is a function that generates a |
---|
58 | character parser.</td> |
---|
59 | </tr> |
---|
60 | </table> |
---|
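<p>To make the relationship between a parser class and its generator function
  concrete, here is a minimal, self-contained sketch. It is not Spirit's implementation:
  the <tt>sketch</tt> namespace, the simplified <tt>parse</tt> signature (it returns
  the match length, or -1 on failure) and the <tt>matches_first</tt> helper are
  assumptions made purely for illustration. The only point it demonstrates is how
  a function template lets the compiler deduce <tt>CharT</tt>.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch { // hypothetical namespace; not part of Spirit

// Simplified stand-in for a character literal parser: matches one character.
template <typename CharT = char>
struct chlit {
    explicit chlit(CharT ch_) : ch(ch_) {}

    // Assumed interface: on a match, advance `first` and return the match
    // length (1); otherwise leave `first` untouched and return -1.
    template <typename IteratorT>
    std::ptrdiff_t parse(IteratorT& first, IteratorT last) const {
        if (first != last && *first == ch) {
            ++first;
            return 1;
        }
        return -1;
    }

    CharT ch;
};

// Generator function: CharT is deduced from the argument, so the caller
// never has to spell out the template parameter.
template <typename CharT>
chlit<CharT> ch_p(CharT ch) {
    return chlit<CharT>(ch);
}

// Hypothetical helper: does `p` match the first character of `input`?
template <typename ParserT, typename StringT>
bool matches_first(const ParserT& p, const StringT& input) {
    typename StringT::const_iterator first = input.begin();
    return p.parse(first, input.end()) == 1;
}

inline void demo() {
    // Both spellings build the same parser type, chlit<char>.
    assert(matches_first(chlit<char>('X'), std::string("Xyz")));
    assert(matches_first(ch_p('X'), std::string("Xyz")));
    // A wide character literal deduces chlit<wchar_t> instead.
    assert(matches_first(ch_p(L'X'), std::wstring(L"X")));
}

} // namespace sketch
```

<p>In this sketch, <tt>ch_p('X')</tt> and <tt>chlit&lt;char&gt;('X')</tt> produce
  objects of the same type; the generator merely saves the explicit template argument.</p>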
61 | <p>The following grammar snippet shows these forms in action:</p> |
---|
<pre><code><span class=comment> </span><span class=comment>// a rule can "store" a parser object. They're covered
</span><span class=comment>// later, but for now just consider a rule as an opaque type
</span><span class=identifier>rule</span><span class=special>&lt;&gt; </span><span class=identifier>r1</span><span class=special>, </span><span class=identifier>r2</span><span class=special>, </span><span class=identifier>r3</span><span class=special>;

</span><span class=identifier>chlit</span><span class=special>&lt;</span><span class=keyword>char</span><span class=special>&gt; </span><span class=identifier>x</span><span class=special>(</span><span class=literal>'X'</span><span class=special>); </span><span class=comment>// declare a parser named x

</span><span class=identifier>r1 </span><span class=special>= </span><span class=identifier>chlit</span><span class=special>&lt;</span><span class=keyword>char</span><span class=special>&gt;(</span><span class=literal>'X'</span><span class=special>); </span><span class=comment>// explicit declaration
</span><span class=identifier>r2 </span><span class=special>= </span><span class=identifier>x</span><span class=special>; </span><span class=comment>// using x
</span><span class=identifier>r3 </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'X'</span><span class=special>); </span><span class=comment>// using the generator</span></code></pre>
---|
71 | <h2> chlit and ch_p</h2> |
---|
<p>Matches a single character literal. <tt>chlit</tt> has a single template type
  parameter which defaults to <tt>char</tt> (i.e. <tt>chlit&lt;&gt;</tt> is equivalent
  to <tt>chlit&lt;char&gt;</tt>). This type parameter is the character type that
  <tt>chlit</tt> will recognize when parsing. The generator function, <tt>ch_p</tt>,
  deduces the template type parameter from its actual argument. The <tt>chlit</tt>
  class constructor accepts a single parameter: the character it will match the
  input against. Examples:</p>
---|
79 | <pre><code><span class=comment> </span><span class=identifier>r1 </span><span class=special>= </span><span class=identifier>chlit</span><span class=special><>(</span><span class=literal>'X'</span><span class=special>); |
---|
80 | </span><span class=identifier>r2 </span><span class=special>= </span><span class=identifier>chlit</span><span class=special><</span><span class=keyword>wchar_t</span><span class=special>>(</span><span class=identifier>L</span><span class=literal>'X'</span><span class=special>); |
---|
81 | </span><span class=identifier>r3 </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'X'</span><span class=special>);</span></code></pre> |
---|
82 | <p>Going back to our original example:</p> |
---|
83 | <pre><code><span class=special> </span><span class=identifier>group </span><span class=special>= </span><span class=literal>'(' </span><span class=special>>> </span><span class=identifier>expr </span><span class=special>>> </span><span class=literal>')'</span><span class=special>; |
---|
84 | </span><span class=identifier>expr1 </span><span class=special>= </span><span class=identifier>integer </span><span class=special>| </span><span class=identifier>group</span><span class=special>; |
---|
85 | </span><span class=identifier>expr2 </span><span class=special>= </span><span class=identifier>expr1 </span><span class=special>>> </span><span class=special>*((</span><span class=literal>'*' </span><span class=special>>> </span><span class=identifier>expr1</span><span class=special>) </span><span class=special>| </span><span class=special>(</span><span class=literal>'/' </span><span class=special>>> </span><span class=identifier>expr1</span><span class=special>)); |
---|
86 | </span><span class=identifier>expr </span><span class=special>= </span><span class=identifier>expr2 </span><span class=special>>> </span><span class=special>*((</span><span class=literal>'+' </span><span class=special>>> </span><span class=identifier>expr2</span><span class=special>) </span><span class=special>| </span><span class=special>(</span><span class=literal>'-' </span><span class=special>>> </span><span class=identifier>expr2</span><span class=special>));</span></code></pre> |
---|
<p>Here, the character literals <tt class="quotes">'('</tt>, <tt class="quotes">')'</tt>,
  <tt class="quotes">'+'</tt>, <tt class="quotes">'-'</tt>, <tt class="quotes">'*'</tt>
  and <tt class="quotes">'/'</tt> in the grammar declaration are <tt>chlit</tt>
  objects that are implicitly created behind the scenes.</p>
---|
92 | <table width="80%" border="0" align="center"> |
---|
93 | <tr> |
---|
94 | <td class="note_box"><img src="theme/lens.gif" width="15" height="16"> <b>char |
---|
95 | operands</b> <br> |
---|
96 | <br> |
---|
      This works because of two special templatized overloads of <tt>operator<span class="operators">&gt;&gt;</span></tt>
      that take a (<tt>char</tt>, <tt>ParserT</tt>) or a (<tt>ParserT</tt>, <tt>char</tt>) argument pair.
      These functions convert the character into a <tt>chlit</tt> object.</td>
---|
100 | </tr> |
---|
101 | </table> |
---|
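<p>The overload mechanism described in the note can be sketched as follows. This
  is a simplified illustration under stated assumptions, not Spirit's source: the
  <tt>sketch</tt> namespace, the <tt>parser</tt> base class, the <tt>sequence</tt>
  class, the <tt>char const*</tt>-only <tt>parse</tt> signature and the
  <tt>full_match</tt> helper are all hypothetical names chosen for this example.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch { // hypothetical namespace; not part of Spirit

// CRTP base class so the operators below apply only to parser types.
template <typename DerivedT>
struct parser {
    const DerivedT& derived() const {
        return *static_cast<const DerivedT*>(this);
    }
};

// Simplified character literal parser (char input only).
struct chlit : parser<chlit> {
    explicit chlit(char ch_) : ch(ch_) {}
    std::ptrdiff_t parse(const char*& first, const char* last) const {
        if (first != last && *first == ch) { ++first; return 1; }
        return -1;
    }
    char ch;
};

// Matches `a` followed by `b`; restores the input position on failure.
template <typename A, typename B>
struct sequence : parser<sequence<A, B> > {
    sequence(const A& a_, const B& b_) : a(a_), b(b_) {}
    std::ptrdiff_t parse(const char*& first, const char* last) const {
        const char* save = first;
        std::ptrdiff_t la = a.parse(first, last);
        std::ptrdiff_t lb = (la >= 0) ? b.parse(first, last) : -1;
        if (lb >= 0) return la + lb;
        first = save;
        return -1;
    }
    A a;
    B b;
};

// parser >> parser
template <typename A, typename B>
sequence<A, B> operator>>(const parser<A>& a, const parser<B>& b) {
    return sequence<A, B>(a.derived(), b.derived());
}

// char >> parser: the character is wrapped in a chlit.
template <typename B>
sequence<chlit, B> operator>>(char a, const parser<B>& b) {
    return sequence<chlit, B>(chlit(a), b.derived());
}

// parser >> char: the character is wrapped in a chlit.
template <typename A>
sequence<A, chlit> operator>>(const parser<A>& a, char b) {
    return sequence<A, chlit>(a.derived(), chlit(b));
}

// Hypothetical helper: true when `p` consumes the whole of `input`.
template <typename P>
bool full_match(const parser<P>& p, const std::string& input) {
    const char* first = input.data();
    const char* last = first + input.size();
    return p.derived().parse(first, last) >= 0 && first == last;
}

inline void demo() {
    chlit x('x');
    assert(full_match('(' >> x >> ')', "(x)"));  // both chars become chlits
    assert(!full_match('(' >> x >> ')', "(x"));  // closing paren missing
}

} // namespace sketch
```

<p>Note that at least one operand must already be a parser: an expression such as
  <tt>'a' &gt;&gt; 'b'</tt> involves only built-in types and would select the
  built-in shift operator instead.</p>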
102 | <p> One may prefer to declare these explicitly as:</p> |
---|
103 | <pre><code><span class=special> </span><span class=identifier>chlit</span><span class=special><> </span><span class=identifier>plus</span><span class=special>(</span><span class=literal>'+'</span><span class=special>); |
---|
104 | </span><span class=identifier>chlit</span><span class=special><> </span><span class=identifier>minus</span><span class=special>(</span><span class=literal>'-'</span><span class=special>); |
---|
105 | </span><span class=identifier>chlit</span><span class=special><> </span><span class=identifier>times</span><span class=special>(</span><span class=literal>'*'</span><span class=special>); |
---|
106 | </span><span class=identifier>chlit</span><span class=special><> </span><span class=identifier>divide</span><span class=special>(</span><span class=literal>'/'</span><span class=special>); |
---|
107 | </span><span class=identifier>chlit</span><span class=special><> </span><span class=identifier>oppar</span><span class=special>(</span><span class=literal>'('</span><span class=special>); |
---|
108 | </span><span class=identifier>chlit</span><span class=special><> </span><span class=identifier>clpar</span><span class=special>(</span><span class=literal>')'</span><span class=special>);</span></code></pre> |
---|
109 | <h2>range and range_p</h2> |
---|
110 | <p>A <tt>range</tt> of characters is created from a low/high character pair. Such |
---|
111 | a parser matches a single character that is in the <tt>range</tt>, including |
---|
112 | both endpoints. Like <tt>chlit</tt>, <tt>range</tt> has a single template type |
---|
113 | parameter which defaults to <tt>char</tt>. The <tt>range</tt> class constructor |
---|
114 | accepts two parameters: the character range (<I>from</I> and <I>to</I>, inclusive) |
---|
115 | it will match the input against. The function generator version is <tt>range_p</tt>. |
---|
116 | Examples:</p> |
---|
117 | <pre><code><span class=special> </span><span class=identifier>range</span><span class=special><>(</span><span class=literal>'A'</span><span class=special>,</span><span class=literal>'Z'</span><span class=special>) </span><span class=comment>// matches 'A'..'Z' |
---|
118 | </span><span class=identifier>range_p</span><span class=special>(</span><span class=literal>'a'</span><span class=special>,</span><span class=literal>'z'</span><span class=special>) </span><span class=comment>// matches 'a'..'z'</span></code></pre> |
---|
<p>Note that the first character must be "before" the second, according
  to the underlying character encoding. Like <tt>chlit</tt>, <tt>range</tt> is a
  single character parser.</p>
---|
122 | <table border="0" align="center" width="80%"> |
---|
123 | <tr> |
---|
124 | <td class="note_box"><img src="theme/alert.gif" width="16" height="16"><b> |
---|
125 | Character mapping</b><br> |
---|
126 | <br> |
---|
      Character mapping is inherently platform dependent. The standard does not
      guarantee, for example, that 'A' &lt; 'Z'. In many cases, however,
      we know the character set we are using, such as ASCII, ISO-8859-1
      or Unicode. Take care, though, when porting to another platform.</td>
---|
131 | </tr> |
---|
132 | </table> |
---|
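<p>The inclusive-endpoint behaviour of a range can be illustrated with a small
  sketch. The <tt>sketch</tt> namespace and the <tt>test</tt> member function are
  assumptions made for this example rather than a statement of Spirit's interface,
  and the checks in <tt>demo</tt> assume an encoding in which the letters involved
  are ordered as in ASCII.</p>

```cpp
#include <cassert>

namespace sketch { // hypothetical namespace; not part of Spirit

// Simplified stand-in for a character range parser.
template <typename CharT = char>
struct range {
    range(CharT first_, CharT last_) : first(first_), last(last_) {
        // The low endpoint must not come after the high endpoint
        // in the underlying character encoding.
        assert(!(last < first));
    }

    // True when `ch` lies in [first, last], both endpoints included.
    bool test(CharT ch) const {
        return !(ch < first) && !(last < ch);
    }

    CharT first;
    CharT last;
};

// Generator function: deduces CharT from the arguments.
template <typename CharT>
range<CharT> range_p(CharT first, CharT last) {
    return range<CharT>(first, last);
}

inline void demo() {
    assert(range_p('a', 'z').test('a'));   // low endpoint is included
    assert(range_p('a', 'z').test('z'));   // high endpoint is included
    assert(!range_p('a', 'z').test('A'));  // outside the range
}

} // namespace sketch
```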
133 | <h2> strlit and str_p</h2> |
---|
134 | <p>This parser matches a string literal. <tt>strlit</tt> has a single template |
---|
135 | type parameter: an iterator type. Internally, <tt>strlit</tt> holds a begin/end |
---|
136 | iterator pair pointing to a string or a container of characters. The <tt>strlit</tt> |
---|
137 | attempts to match the current input stream with this string. The template type |
---|
138 | parameter defaults to <tt>char const<span class="operators">*</span></tt>. <tt>strlit</tt> |
---|
139 | has two constructors. The first accepts a null-terminated character pointer. |
---|
140 | This constructor may be used to build <tt>strlits</tt> from quoted string literals. |
---|
141 | The second constructor takes in a first/last iterator pair. The function generator |
---|
142 | version is <tt>str_p</tt>. Examples:</p> |
---|
143 | <pre><code><span class=comment> </span><span class=identifier>strlit</span><span class=special><>(</span><span class=string>"Hello World"</span><span class=special>) |
---|
144 | </span><span class=identifier>str_p</span><span class=special>(</span><span class=string>"Hello World"</span><span class=special>) |
---|
145 | |
---|
146 | </span><span class=identifier>std</span><span class=special>::</span><span class=identifier>string </span><span class=identifier>msg</span><span class=special>(</span><span class=string>"Hello World"</span><span class=special>); |
---|
147 | </span><span class=identifier>strlit</span><span class=special><</span><span class=identifier>std</span><span class=special>::</span><span class=identifier>string</span><span class=special>::</span><span class=identifier>const_iterator</span><span class=special>>(</span><span class=identifier>msg</span><span class=special>.</span><span class=identifier>begin</span><span class=special>(), </span><span class=identifier>msg</span><span class=special>.</span><span class=identifier>end</span><span class=special>());</span></code></pre> |
---|
148 | <table width="80%" border="0" align="center"> |
---|
149 | <tr> |
---|
150 | <td class="note_box"><img src="theme/note.gif" width="16" height="16"> <b>Character |
---|
151 | and phrase level parsing</b><br> |
---|
152 | <br> |
---|
153 | Typical parsers regard the processing of characters (symbols that form words |
---|
154 | or lexemes) and phrases (words that form sentences) as separate domains. |
---|
155 | Entities such as reserved words, operators, literal strings, numerical constants, |
---|
156 | etc., which constitute the terminals of a grammar are usually extracted |
---|
157 | first in a separate lexical analysis stage.<br> |
---|
158 | <br> |
---|
159 | At this point, as evident in the examples we have so far, it is important |
---|
160 | to note that, contrary to standard practice, the Spirit framework handles |
---|
161 | parsing tasks at both the character level as well as the phrase level. One |
---|
162 | may consider that a lexical analyzer is seamlessly integrated in the Spirit |
---|
163 | framework.<br> |
---|
164 | <br> |
---|
165 | Although the Spirit parser library does not need a separate lexical analyzer, |
---|
166 | there is no reason why we cannot have one. One can always have as many parser |
---|
167 | layers as needed. In theory, one may create a preprocessor, a lexical analyzer |
---|
168 | and a parser proper, all using the same framework.</td> |
---|
169 | </tr> |
---|
170 | </table> |
---|
171 | <h2>chseq and chseq_p</h2> |
---|
172 | <p>Matches a character sequence. <tt>chseq</tt> has the same template type parameters |
---|
173 | and constructor parameters as strlit. The function generator version is <tt>chseq_p</tt>. |
---|
174 | Examples:</p> |
---|
175 | <pre><code><span class=special> </span><span class=identifier>chseq</span><span class=special><>(</span><span class=string>"ABCDEFG"</span><span class=special>) |
---|
176 | </span><span class=identifier>chseq_p</span><span class=special>(</span><span class=string>"ABCDEFG"</span><span class=special>)</span></code></pre> |
---|
177 | <p><tt>strlit</tt> is an implicit lexeme. That is, it works solely on the character |
---|
178 | level. <tt>chseq</tt>, <tt>strlit</tt>'s twin, on the other hand, can work on |
---|
179 | both the character and phrase levels. What this simply means is that it can |
---|
180 | ignore white spaces in between the string characters. For example:</p> |
---|
181 | <pre><code><span class=special> </span><span class=identifier>chseq</span><span class=special><>(</span><span class=string>"ABCDEFG"</span><span class=special>)</span></code></pre> |
---|
182 | <p>can parse:</p> |
---|
183 | <pre><span class=special> </span><span class=identifier>ABCDEFG |
---|
184 | </span><span class=identifier>A </span><span class=identifier>B </span><span class=identifier>C </span><span class=identifier>D </span><span class=identifier>E </span><span class=identifier>F </span><span class=identifier>G |
---|
185 | </span><span class=identifier>AB </span><span class=identifier>CD </span><span class=identifier>EFG</span></pre> |
---|
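<p>The difference between the two can be sketched with two plain functions. These
  are hypothetical helpers (<tt>match_strlit</tt> and <tt>match_chseq</tt> are
  not Spirit names), and the sketch assumes that the white space to ignore is whatever
  <tt>std::isspace</tt> accepts and that the whole input must be consumed.</p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

namespace sketch { // hypothetical namespace; not part of Spirit

// strlit-like behaviour: the characters must appear contiguously.
inline bool match_strlit(const std::string& lit, const std::string& input) {
    return input == lit;
}

// chseq-like behaviour at the phrase level: white space in front of each
// expected character is skipped before the character is compared.
inline bool match_chseq(const std::string& seq, const std::string& input) {
    std::size_t pos = 0;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        while (pos < input.size() &&
               std::isspace(static_cast<unsigned char>(input[pos]))) {
            ++pos;
        }
        if (pos == input.size() || input[pos] != seq[i]) {
            return false;
        }
        ++pos;
    }
    return pos == input.size();  // the whole input must be consumed
}

inline void demo() {
    assert(match_chseq("ABCDEFG", "ABCDEFG"));
    assert(match_chseq("ABCDEFG", "A B C D E F G"));
    assert(match_chseq("ABCDEFG", "AB CD EFG"));
    assert(!match_strlit("ABCDEFG", "AB CD EFG"));  // lexeme: no gaps allowed
}

} // namespace sketch
```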
186 | <h2>More character parsers</h2> |
---|
187 | <p>The framework also predefines the full repertoire of single character parsers:</p> |
---|
188 | <table width="90%" border="0" align="center"> |
---|
189 | <tr> |
---|
190 | <td class="table_title" colspan="2">Single character parsers</td> |
---|
191 | </tr> |
---|
192 | <tr> |
---|
193 | <td class="table_cells" width="30%"><b>anychar_p</b></td> |
---|
194 | <td class="table_cells" width="70%">Matches any single character (including |
---|
195 | the null terminator: '\0')</td> |
---|
196 | </tr> |
---|
197 | <tr> |
---|
198 | <td class="table_cells" width="30%"><b>alnum_p</b></td> |
---|
199 | <td class="table_cells" width="70%">Matches alpha-numeric characters</td> |
---|
200 | </tr> |
---|
201 | <tr> |
---|
202 | <td class="table_cells" width="30%"><b>alpha_p</b></td> |
---|
203 | <td class="table_cells" width="70%">Matches alphabetic characters</td> |
---|
204 | </tr> |
---|
205 | <tr> |
---|
206 | <td class="table_cells" width="30%"><b>blank_p</b></td> |
---|
207 | <td class="table_cells" width="70%">Matches spaces or tabs</td> |
---|
208 | </tr> |
---|
209 | <tr> |
---|
210 | <td class="table_cells" width="30%"><b>cntrl_p</b></td> |
---|
211 | <td class="table_cells" width="70%">Matches control characters</td> |
---|
212 | </tr> |
---|
213 | <tr> |
---|
214 | <td class="table_cells" width="30%"><b>digit_p</b></td> |
---|
215 | <td class="table_cells" width="70%">Matches numeric digits</td> |
---|
216 | </tr> |
---|
217 | <tr> |
---|
218 | <td class="table_cells" width="30%"><b>graph_p</b></td> |
---|
219 | <td class="table_cells" width="70%">Matches non-space printing characters</td> |
---|
220 | </tr> |
---|
221 | <tr> |
---|
222 | <td class="table_cells" width="30%"><b>lower_p</b></td> |
---|
223 | <td class="table_cells" width="70%">Matches lower case letters</td> |
---|
224 | </tr> |
---|
225 | <tr> |
---|
226 | <td class="table_cells" width="30%"><b>print_p</b></td> |
---|
227 | <td class="table_cells" width="70%">Matches printable characters</td> |
---|
228 | </tr> |
---|
229 | <tr> |
---|
230 | <td class="table_cells" width="30%"><b>punct_p</b></td> |
---|
231 | <td class="table_cells" width="70%">Matches punctuation symbols</td> |
---|
232 | </tr> |
---|
233 | <tr> |
---|
234 | <td class="table_cells" width="30%"><b>space_p</b></td> |
---|
235 | <td class="table_cells" width="70%">Matches spaces, tabs, returns, and newlines</td> |
---|
236 | </tr> |
---|
237 | <tr> |
---|
238 | <td class="table_cells" width="30%"><b>upper_p</b></td> |
---|
239 | <td class="table_cells" width="70%">Matches upper case letters</td> |
---|
240 | </tr> |
---|
241 | <tr> |
---|
242 | <td class="table_cells" width="30%"><b>xdigit_p</b></td> |
---|
243 | <td class="table_cells" width="70%">Matches hexadecimal digits</td> |
---|
244 | </tr> |
---|
245 | </table> |
---|
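<p>As a rough guide to what some of these parsers accept, the sketch below expresses
  a few of the classifications with the standard <tt>&lt;cctype&gt;</tt> functions.
  It assumes that the classifications follow the C library in the default "C" locale;
  the function names are hypothetical, and the results for characters outside the
  basic character set depend on the locale and platform.</p>

```cpp
#include <cassert>
#include <cctype>

namespace sketch { // hypothetical namespace; not part of Spirit

// The <cctype> functions require a value representable as unsigned char.
inline int to_int(char c) { return static_cast<unsigned char>(c); }

inline bool is_alnum(char c)  { return std::isalnum(to_int(c)) != 0; }   // cf. alnum_p
inline bool is_alpha(char c)  { return std::isalpha(to_int(c)) != 0; }   // cf. alpha_p
inline bool is_blank(char c)  { return c == ' ' || c == '\t'; }          // cf. blank_p
inline bool is_digit(char c)  { return std::isdigit(to_int(c)) != 0; }   // cf. digit_p
inline bool is_space(char c)  { return std::isspace(to_int(c)) != 0; }   // cf. space_p
inline bool is_xdigit(char c) { return std::isxdigit(to_int(c)) != 0; }  // cf. xdigit_p

inline void demo() {
    assert(is_alnum('7') && !is_alnum('_'));
    assert(is_blank('\t') && !is_blank('\n'));  // blank: space or tab only
    assert(is_space('\n'));                     // space also covers newlines
    assert(is_xdigit('f') && !is_xdigit('g'));
}

} // namespace sketch
```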
246 | <h2><a name="negation"></a>negation ~</h2> |
---|
247 | <p>Single character parsers such as the <tt>chlit</tt>, <tt>range</tt>, <tt>anychar_p</tt>, |
---|
248 | <tt>alnum_p</tt> etc. can be negated. For example:</p> |
---|
249 | <pre><code><span class=special> ~</span><span class=identifier>ch_p</span><span class="special">(</span><span class="literal">'x'</span><span class="special">)</span></code></pre> |
---|
250 | <p>matches any character except <tt>'x'</tt>. Double negation of a character parser |
---|
251 | cancels out the negation. <tt>~~alpha_p</tt> is equivalent to <tt>alpha_p</tt>.</p> |
---|
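<p>One way to picture negation, including the cancelling of a double negation, is
  the sketch below. The <tt>sketch</tt> namespace, the <tt>negated</tt> wrapper
  and the <tt>test</tt> member are assumptions made for illustration; the sketch
  does not claim to reproduce how Spirit implements the <tt>~</tt> operator.</p>

```cpp
#include <cassert>

namespace sketch { // hypothetical namespace; not part of Spirit

// Simplified single character parser: accepts exactly one character value.
template <typename CharT = char>
struct chlit {
    explicit chlit(CharT ch_) : ch(ch_) {}
    bool test(CharT c) const { return c == ch; }
    CharT ch;
};

template <typename CharT>
chlit<CharT> ch_p(CharT ch) {
    return chlit<CharT>(ch);
}

// Wrapper that accepts every character the wrapped parser rejects.
template <typename ParserT>
struct negated {
    explicit negated(const ParserT& p) : positive(p) {}
    template <typename CharT>
    bool test(CharT c) const { return !positive.test(c); }
    ParserT positive;
};

// ~p wraps a character parser in a negated<> object.
template <typename CharT>
negated<chlit<CharT> > operator~(const chlit<CharT>& p) {
    return negated<chlit<CharT> >(p);
}

// ~~p hands back the original parser, so the two negations cancel out.
template <typename ParserT>
ParserT operator~(const negated<ParserT>& n) {
    return n.positive;
}

inline void demo() {
    assert((~ch_p('x')).test('y'));    // anything except 'x' matches
    assert(!(~ch_p('x')).test('x'));
    assert((~~ch_p('x')).test('x'));   // same behaviour as ch_p('x')
}

} // namespace sketch
```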
252 | <h2>eol_p</h2> |
---|
253 | <p>Matches the end of line (CR/LF and combinations thereof).</p> |
---|
<h2>nothing_p</h2>
---|
255 | <p>Never matches anything and always fails.</p> |
---|
256 | <h2>end_p</h2> |
---|
<p>Matches the end of input (returns a successful match with 0 length when the
  input is exhausted).</p>
---|
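<p>The behaviour of these three parsers can be summarised in a short sketch. It
  rests on assumptions that the text above does not spell out: that "combinations"
  of CR and LF means a lone CR, a lone LF, or a CR followed by an LF, and that a
  match is reported as a length with -1 meaning failure. All function names are
  hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch { // hypothetical namespace; not part of Spirit

// eol_p-like: accepts "\r", "\n" or "\r\n" (an assumption about what
// "combinations" covers). Returns the match length, or -1 on failure.
inline std::ptrdiff_t parse_eol(const char*& first, const char* last) {
    std::ptrdiff_t len = 0;
    if (first != last && *first == '\r') { ++first; ++len; }
    if (first != last && *first == '\n') { ++first; ++len; }
    return len > 0 ? len : -1;
}

// end_p-like: a zero-length match, but only when the input is exhausted.
inline std::ptrdiff_t parse_end(const char* first, const char* last) {
    return first == last ? 0 : -1;
}

// nothing_p-like: never matches, whatever the input.
inline std::ptrdiff_t parse_nothing(const char*, const char*) {
    return -1;
}

// Hypothetical convenience wrappers that parse from the start of a string.
inline std::ptrdiff_t eol_length(const std::string& s) {
    const char* first = s.data();
    return parse_eol(first, s.data() + s.size());
}

inline std::ptrdiff_t end_length(const std::string& s) {
    return parse_end(s.data(), s.data() + s.size());
}

inline void demo() {
    assert(eol_length("\r\n") == 2);  // CR LF is consumed as one line end
    assert(eol_length("\n") == 1);
    assert(eol_length("x") == -1);
    assert(end_length("") == 0);      // success, but nothing is consumed
    assert(end_length("x") == -1);
}

} // namespace sketch
```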
259 | <table border="0"> |
---|
260 | <tr> |
---|
261 | <td width="10"></td> |
---|
262 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
---|
263 | <td width="30"><a href="organization.html"><img src="theme/l_arr.gif" border="0"></a></td> |
---|
264 | <td width="30"><a href="operators.html"><img src="theme/r_arr.gif" border="0"></a></td> |
---|
265 | </tr> |
---|
266 | </table> |
---|
267 | <br> |
---|
268 | <hr size="1"> |
---|
269 | <p class="copyright">Copyright © 1998-2003 Joel de Guzman<br> |
---|
270 | Copyright © 2003 Martin Wille<br> |
---|
271 | <br> |
---|
272 | <font size="2">Use, modification and distribution is subject to the Boost Software |
---|
273 | License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at |
---|
274 | http://www.boost.org/LICENSE_1_0.txt) </font> </p> |
---|
275 | <p> </p> |
---|
276 | </body> |
---|
277 | </html> |
---|