source: downloads/boost_1_34_1/libs/spirit/doc/primitives.html @ 29

Last change on this file since 29 was 29, checked in by landauf, 16 years ago

updated boost from 1_33_1 to 1_34_1

File size: 19.8 KB
1<html>
2<head>
3<title>Primitives</title>
4<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
5<link rel="stylesheet" href="theme/style.css" type="text/css">
6</head>
7
8<body>
9<table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
10  <tr> 
11    <td width="10"> 
12    </td>
13    <td width="85%"> 
14      <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Primitives</b></font>
15    </td>
16    <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td>
17  </tr>
18</table>
19<br>
20<table border="0">
21  <tr>
22    <td width="10"></td>
23    <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
24    <td width="30"><a href="organization.html"><img src="theme/l_arr.gif" border="0"></a></td>
25    <td width="30"><a href="operators.html"><img src="theme/r_arr.gif" border="0"></a></td>
26   </tr>
27</table>
28<p>The framework predefines some parser primitives. These are the most basic building
29  blocks that the client uses to build more complex parsers. These primitive parsers
30  are template classes, making them very flexible.</p>
31<p>These primitive parsers can be instantiated directly or through a templatized
32  helper function. Generally, the helper function is far simpler to deal with
33  as it involves less typing.</p>
34<p>We have seen the character literal parser before through the generator function
35  <tt>ch_p</tt>, which is not really a parser but, rather, a parser generator.
36  Class <tt>chlit&lt;CharT&gt;</tt> is the actual template class behind the character
37  literal parser. To instantiate a <tt>chlit</tt> object, you must explicitly
38  provide the character type, <tt>CharT</tt>, as a template parameter.
39  This type typically corresponds to the input type,
40  usually <tt>char</tt> or <tt>wchar_t</tt>. The following expression creates
41  a temporary parser object that recognizes the single letter <span class="quotes">'X'</span>.</p>
42<pre><code><font color="#000000"><span class=identifier>    </span><span class=identifier>chlit</span><span class=special>&lt;</span><span class=keyword>char</span><span class=special>&gt;(</span><span class=literal>'X'</span><span class=special>);</span></font></code></pre>
43<p>Using <tt>chlit</tt>'s generator function <tt>ch_p</tt> simplifies the usage
44  of the <tt>chlit&lt;&gt;</tt> class (this is true of most Spirit parser classes
45  since most have corresponding generator functions). It is convenient to call
46  the function because the compiler will deduce the template type through argument
47  deduction for us. The example above could be expressed less verbosely using
48  the <tt>ch_p </tt>helper function. </p>
49<pre><code><font color="#000000"><span class=special>    </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'X'</span><span class=special>)  </span><span class=comment>// equivalent to chlit&lt;char&gt;('X') object</span></font></code></pre>
50<table width="80%" border="0" align="center">
51  <tr> 
52    <td class="note_box"><img src="theme/lens.gif" width="15" height="16"> <b>Parser
53      generators</b><br>
54      <br>
55      Whenever you see an invocation of the parser generator function, it is equivalent
56      to the parser itself. Therefore, we often call <tt>ch_p</tt> a character
57      parser, even if, technically speaking, it is a function that generates a
58      character parser.</td>
59  </tr>
60</table>
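<p>The way a generator function spares us the template argument can be shown without
  Spirit itself. The sketch below is a standalone model: <tt>char_literal</tt> and
  <tt>make_char_literal</tt> are hypothetical names standing in for <tt>chlit</tt> and
  <tt>ch_p</tt>, not the library's own code.</p>

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a character literal parser (not Spirit's chlit).
template <typename CharT = char>
struct char_literal {
    explicit char_literal(CharT c) : ch(c) {}
    // True when the input character equals the stored literal.
    bool matches(CharT input) const { return input == ch; }
    CharT ch;
};

// Hypothetical generator function: CharT is deduced from the argument,
// so the caller never spells out the template parameter.
template <typename CharT>
char_literal<CharT> make_char_literal(CharT c) {
    return char_literal<CharT>(c);
}

// A wide character literal yields the wchar_t instantiation automatically.
static_assert(std::is_same<decltype(make_char_literal(L'X')),
                           char_literal<wchar_t> >::value,
              "CharT is deduced as wchar_t");

// Usage: the explicit and the generated object behave the same way.
inline void demo_generator() {
    assert(char_literal<char>('X').matches('X'));
    assert(make_char_literal('X').matches('X'));
}
```

<p>The explicit form stays useful when a named object of a known type is needed; the
  generator is the shorter spelling for temporaries inside an expression.</p>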
61<p>The following grammar snippet shows these forms in action:</p>
62<pre><code><span class=comment>    </span><span class=comment>// a rule can "store" a parser object.  They're covered
63    </span><span class=comment>// later, but for now just consider a rule as an opaque type
64    </span><span class=identifier>rule</span><span class=special>&lt;&gt; </span><span class=identifier>r1</span><span class=special>, </span><span class=identifier>r2</span><span class=special>, </span><span class=identifier>r3</span><span class=special>;
65
66    </span><span class=identifier>chlit</span><span class=special>&lt;</span><span class=keyword>char</span><span class=special>&gt; </span><span class=identifier>x</span><span class=special>(</span><span class=literal>'X'</span><span class=special>);     </span><span class=comment>// declare a parser named x
67
68    </span><span class=identifier>r1 </span><span class=special>= </span><span class=identifier>chlit</span><span class=special>&lt;</span><span class=keyword>char</span><span class=special>&gt;(</span><span class=literal>'X'</span><span class=special>);  </span><span class=comment>//  explicit declaration
69    </span><span class=identifier>r2 </span><span class=special>= </span><span class=identifier>x</span><span class=special>;                 </span><span class=comment>//  using x
70    </span><span class=identifier>r3 </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'X'</span><span class=special>);         </span><span class=comment>//  using the generator</span></code></pre>
71<h2> chlit and ch_p</h2>
72<p>Matches a single character literal. <tt>chlit</tt> has a single template type
73  parameter which defaults to <tt>char</tt> (i.e. <tt>chlit&lt;&gt;</tt> is equivalent
74  to <tt>chlit&lt;char&gt;</tt>). This type parameter is the character type that
75  <tt>chlit</tt> will recognize when parsing. The function generator version deduces
76  the template type parameters from the actual function arguments. The <tt>chlit</tt> 
77  class constructor accepts a single parameter: the character it will match the
78  input against. Examples:</p>
79<pre><code><span class=comment>    </span><span class=identifier>r1 </span><span class=special>= </span><span class=identifier>chlit</span><span class=special>&lt;&gt;(</span><span class=literal>'X'</span><span class=special>);
80    </span><span class=identifier>r2 </span><span class=special>= </span><span class=identifier>chlit</span><span class=special>&lt;</span><span class=keyword>wchar_t</span><span class=special>&gt;(</span><span class=identifier>L</span><span class=literal>'X'</span><span class=special>);
81    </span><span class=identifier>r3 </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'X'</span><span class=special>);</span></code></pre>
82<p>Going back to our original example:</p>
83<pre><code><span class=special>    </span><span class=identifier>group </span><span class=special>= </span><span class=literal>'(' </span><span class=special>&gt;&gt; </span><span class=identifier>expr </span><span class=special>&gt;&gt; </span><span class=literal>')'</span><span class=special>;
84    </span><span class=identifier>expr1 </span><span class=special>= </span><span class=identifier>integer </span><span class=special>| </span><span class=identifier>group</span><span class=special>;
85    </span><span class=identifier>expr2 </span><span class=special>= </span><span class=identifier>expr1 </span><span class=special>&gt;&gt; </span><span class=special>*((</span><span class=literal>'*' </span><span class=special>&gt;&gt; </span><span class=identifier>expr1</span><span class=special>) </span><span class=special>| </span><span class=special>(</span><span class=literal>'/' </span><span class=special>&gt;&gt; </span><span class=identifier>expr1</span><span class=special>));
86    </span><span class=identifier>expr  </span><span class=special>= </span><span class=identifier>expr2 </span><span class=special>&gt;&gt; </span><span class=special>*((</span><span class=literal>'+' </span><span class=special>&gt;&gt; </span><span class=identifier>expr2</span><span class=special>) </span><span class=special>| </span><span class=special>(</span><span class=literal>'-' </span><span class=special>&gt;&gt; </span><span class=identifier>expr2</span><span class=special>));</span></code></pre>
88<p>The character literals <tt class="quotes">'('</tt>, <tt class="quotes">')'</tt>,
89  <tt class="quotes">'+'</tt>, <tt class="quotes">'-'</tt>, <tt class="quotes">'*'</tt> 
90  and <tt class="quotes">'/'</tt> in the grammar declaration are <tt>chlit</tt> 
91  objects that are implicitly created behind the scenes.</p>
92<table width="80%" border="0" align="center">
93  <tr> 
94    <td class="note_box"><img src="theme/lens.gif" width="15" height="16"> <b>char
95      operands</b> <br>
96      <br>
97      This works because of two special templatized overloads of <tt>operator<span class="operators">&gt;&gt;</span></tt> 
98      that take (<tt>char</tt>, <tt>ParserT</tt>) or (<tt>ParserT</tt>, <tt>char</tt>).
99      These functions convert the character into a <tt>chlit</tt> object.</td>
100  </tr>
101</table>
102<p> One may prefer to declare these explicitly as:</p>
103<pre><code><span class=special>    </span><span class=identifier>chlit</span><span class=special>&lt;&gt; </span><span class=identifier>plus</span><span class=special>(</span><span class=literal>'+'</span><span class=special>);
104    </span><span class=identifier>chlit</span><span class=special>&lt;&gt; </span><span class=identifier>minus</span><span class=special>(</span><span class=literal>'-'</span><span class=special>);
105    </span><span class=identifier>chlit</span><span class=special>&lt;&gt; </span><span class=identifier>times</span><span class=special>(</span><span class=literal>'*'</span><span class=special>);
106    </span><span class=identifier>chlit</span><span class=special>&lt;&gt; </span><span class=identifier>divide</span><span class=special>(</span><span class=literal>'/'</span><span class=special>);
107    </span><span class=identifier>chlit</span><span class=special>&lt;&gt; </span><span class=identifier>oppar</span><span class=special>(</span><span class=literal>'('</span><span class=special>);
108    </span><span class=identifier>chlit</span><span class=special>&lt;&gt; </span><span class=identifier>clpar</span><span class=special>(</span><span class=literal>')'</span><span class=special>);</span></code></pre>
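<p>The implicit conversion of <tt>char</tt> operands described in the note above can be
  modelled in a few lines. This is a simplified sketch under our own assumptions, not
  Spirit's implementation: <tt>parser_base</tt>, <tt>char_literal</tt> and
  <tt>sequence</tt> are hypothetical names, and <tt>parse</tt> simply returns the next
  position or <tt>std::string::npos</tt> on failure.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical CRTP base that marks a type as a parser.
template <typename Derived>
struct parser_base {
    const Derived& derived() const { return static_cast<const Derived&>(*this); }
};

// Matches one specific character; returns the next position or npos.
struct char_literal : parser_base<char_literal> {
    explicit char_literal(char c) : ch(c) {}
    std::size_t parse(const std::string& in, std::size_t pos) const {
        return (pos < in.size() && in[pos] == ch) ? pos + 1 : std::string::npos;
    }
    char ch;
};

// Runs the left parser, then the right parser where the left one stopped.
template <typename A, typename B>
struct sequence : parser_base<sequence<A, B> > {
    sequence(const A& a, const B& b) : left(a), right(b) {}
    std::size_t parse(const std::string& in, std::size_t pos) const {
        std::size_t mid = left.parse(in, pos);
        return mid == std::string::npos ? mid : right.parse(in, mid);
    }
    A left;
    B right;
};

// parser >> parser
template <typename A, typename B>
sequence<A, B> operator>>(const parser_base<A>& a, const parser_base<B>& b) {
    return sequence<A, B>(a.derived(), b.derived());
}

// char >> parser: the character is wrapped in a char_literal first.
template <typename B>
sequence<char_literal, B> operator>>(char a, const parser_base<B>& b) {
    return sequence<char_literal, B>(char_literal(a), b.derived());
}

// parser >> char: same conversion on the right-hand side.
template <typename A>
sequence<A, char_literal> operator>>(const parser_base<A>& a, char b) {
    return sequence<A, char_literal>(a.derived(), char_literal(b));
}

// Usage: the bare '(' and ')' become char_literal objects behind the scenes.
inline void demo_char_operands() {
    assert(('(' >> char_literal('x') >> ')').parse("(x)", 0) == 3);
}
```

<p>Note that at least one operand must already be a parser: an expression such as
  <tt>'(' &gt;&gt; ')'</tt> involves two plain <tt>char</tt> values, so none of the
  overloads applies and the built-in shift operator is used instead.</p>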
109<h2>range and range_p</h2>
110<p>A <tt>range</tt> of characters is created from a low/high character pair. Such
111  a parser matches a single character that is in the <tt>range</tt>, including
112  both endpoints. Like <tt>chlit</tt>, <tt>range</tt> has a single template type
113  parameter which defaults to <tt>char</tt>. The <tt>range</tt> class constructor
114  accepts two parameters: the character range (<I>from</I> and <I>to</I>, inclusive)
115  it will match the input against. The function generator version is <tt>range_p</tt>.
116  Examples:</p>
117<pre><code><span class=special>    </span><span class=identifier>range</span><span class=special>&lt;&gt;(</span><span class=literal>'A'</span><span class=special>,</span><span class=literal>'Z'</span><span class=special>)    </span><span class=comment>// matches 'A'..'Z'
118    </span><span class=identifier>range_p</span><span class=special>(</span><span class=literal>'a'</span><span class=special>,</span><span class=literal>'z'</span><span class=special>)    </span><span class=comment>// matches 'a'..'z'</span></code></pre>
119<p>Note that the first character must be &quot;before&quot; the second, according
120  to the underlying character encoding. The <tt>range</tt>, like <tt>chlit</tt>, is a
121  single character parser.</p>
122<table border="0" align="center" width="80%">
123  <tr>
124    <td class="note_box"><img src="theme/alert.gif" width="16" height="16"><b> 
125      Character mapping</b><br>
126      <br>
127      Character mapping is inherently platform dependent. The standard does not
128      guarantee, for example, that 'A' &lt; 'Z'. In many cases, however,
129      we are well aware of the character set we are using, such as ASCII, ISO-8859-1
130      or Unicode. Take care, though, when porting to another platform.</td>
131  </tr>
132</table>
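<p>One way to make the encoding assumption behind a range explicit is to state it at
  compile time. The following is a minimal sketch that assumes an ASCII-compatible
  execution character set; <tt>in_char_range</tt> is a hypothetical helper that only
  models the inclusive test described above and is not part of Spirit.</p>

```cpp
#include <cassert>

// Compile-time documentation of what a range such as 'A'..'Z' relies on.
static_assert('A' < 'Z', "range 'A'..'Z' needs 'A' to sort before 'Z'");
static_assert('Z' - 'A' == 25, "letters assumed contiguous (not true in EBCDIC)");

// Hypothetical model of the range test: both endpoints are included.
template <typename CharT>
bool in_char_range(CharT ch, CharT first, CharT last) {
    return first <= ch && ch <= last;
}

// Usage: the endpoints themselves are part of the range.
inline void demo_range() {
    assert(in_char_range('A', 'A', 'Z'));
    assert(in_char_range('Z', 'A', 'Z'));
}
```

<p>If either assertion fails on a new platform, the build stops where the assumption
  is stated rather than producing a parser that silently accepts the wrong characters.</p>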
133<h2> strlit and str_p</h2>
134<p>This parser matches a string literal. <tt>strlit</tt> has a single template
135  type parameter: an iterator type. Internally, <tt>strlit</tt> holds a begin/end
136  iterator pair pointing to a string or a container of characters. The <tt>strlit</tt> 
137  attempts to match the current input stream with this string. The template type
138  parameter defaults to <tt>char const<span class="operators">*</span></tt>. <tt>strlit</tt> 
139  has two constructors. The first accepts a null-terminated character pointer.
140  This constructor may be used to build <tt>strlits</tt> from quoted string literals.
141  The second constructor takes in a first/last iterator pair. The function generator
142  version is <tt>str_p</tt>. Examples:</p>
143<pre><code><span class=comment>    </span><span class=identifier>strlit</span><span class=special>&lt;&gt;(</span><span class=string>"Hello World"</span><span class=special>)
144    </span><span class=identifier>str_p</span><span class=special>(</span><span class=string>"Hello World"</span><span class=special>)
145
146    </span><span class=identifier>std</span><span class=special>::</span><span class=identifier>string </span><span class=identifier>msg</span><span class=special>(</span><span class=string>"Hello World"</span><span class=special>);
147    </span><span class=identifier>strlit</span><span class=special>&lt;</span><span class=identifier>std</span><span class=special>::</span><span class=identifier>string</span><span class=special>::</span><span class=identifier>const_iterator</span><span class=special>&gt;(</span><span class=identifier>msg</span><span class=special>.</span><span class=identifier>begin</span><span class=special>(), </span><span class=identifier>msg</span><span class=special>.</span><span class=identifier>end</span><span class=special>());</span></code></pre>
148<table width="80%" border="0" align="center">
149  <tr>
150    <td class="note_box"><img src="theme/note.gif" width="16" height="16"> <b>Character
151      and phrase level parsing</b><br>
152      <br>
153      Typical parsers regard the processing of characters (symbols that form words
154      or lexemes) and phrases (words that form sentences) as separate domains.
155      Entities such as reserved words, operators, literal strings, numerical constants,
156      etc., which constitute the terminals of a grammar are usually extracted
157      first in a separate lexical analysis stage.<br>
158      <br>
159      At this point, as evident in the examples we have so far, it is important
160      to note that, contrary to standard practice, the Spirit framework handles
161      parsing tasks at both the character level as well as the phrase level. One
162      may consider that a lexical analyzer is seamlessly integrated in the Spirit
163      framework.<br>
164      <br>
165      Although the Spirit parser library does not need a separate lexical analyzer,
166      there is no reason why we cannot have one. One can always have as many parser
167      layers as needed. In theory, one may create a preprocessor, a lexical analyzer
168      and a parser proper, all using the same framework.</td>
169  </tr>
170</table>
171<h2>chseq and chseq_p</h2>
172<p>Matches a character sequence. <tt>chseq</tt> has the same template type parameters
173  and constructor parameters as strlit. The function generator version is <tt>chseq_p</tt>.
174  Examples:</p>
175<pre><code><span class=special>    </span><span class=identifier>chseq</span><span class=special>&lt;&gt;(</span><span class=string>"ABCDEFG"</span><span class=special>)
176    </span><span class=identifier>chseq_p</span><span class=special>(</span><span class=string>"ABCDEFG"</span><span class=special>)</span></code></pre>
177<p><tt>strlit</tt> is an implicit lexeme. That is, it works solely on the character
178  level. <tt>chseq</tt>, <tt>strlit</tt>'s twin, on the other hand, can work on
179  both the character and phrase levels. This simply means that it can
180  ignore white space between the string characters. For example:</p>
181<pre><code><span class=special>    </span><span class=identifier>chseq</span><span class=special>&lt;&gt;(</span><span class=string>"ABCDEFG"</span><span class=special>)</span></code></pre>
182<p>can parse:</p>
183<pre><span class=special>    </span><span class=identifier>ABCDEFG
184    </span><span class=identifier>A </span><span class=identifier>B </span><span class=identifier>C </span><span class=identifier>D </span><span class=identifier>E </span><span class=identifier>F </span><span class=identifier>G
185    </span><span class=identifier>AB </span><span class=identifier>CD </span><span class=identifier>EFG</span></pre>
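<p>The difference between the two can be modelled in plain C++. The sketch below is a
  standalone illustration, assuming that white space is what gets skipped at the phrase
  level; <tt>match_contiguous</tt> and <tt>match_skipping</tt> are hypothetical names
  that mimic <tt>strlit</tt> and <tt>chseq</tt> respectively, not the library's code.</p>

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Model of strlit: the literal must appear contiguously at the start of the input.
inline bool match_contiguous(const std::string& input, const std::string& literal) {
    return input.compare(0, literal.size(), literal) == 0;
}

// Model of chseq at the phrase level: white space is skipped before each character.
inline bool match_skipping(const std::string& input, const std::string& literal) {
    std::string::size_type pos = 0;
    for (std::string::size_type i = 0; i < literal.size(); ++i) {
        while (pos < input.size() &&
               std::isspace(static_cast<unsigned char>(input[pos])))
            ++pos;
        if (pos == input.size() || input[pos] != literal[i])
            return false;
        ++pos;
    }
    return true;
}

// Usage: only the skipping variant accepts the spaced-out spelling.
inline void demo_chseq() {
    assert(match_skipping("AB CD EFG", "ABCDEFG"));
    assert(!match_contiguous("AB CD EFG", "ABCDEFG"));
}
```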
186<h2>More character parsers</h2>
187<p>The framework also predefines the full repertoire of single character parsers:</p>
188<table width="90%" border="0" align="center">
189  <tr> 
190    <td class="table_title" colspan="2">Single character parsers</td>
191  </tr>
192  <tr> 
193    <td class="table_cells" width="30%"><b>anychar_p</b></td>
194    <td class="table_cells" width="70%">Matches any single character (including
195      the null terminator: '\0')</td>
196  </tr>
197  <tr> 
198    <td class="table_cells" width="30%"><b>alnum_p</b></td>
199    <td class="table_cells" width="70%">Matches alpha-numeric characters</td>
200  </tr>
201  <tr> 
202    <td class="table_cells" width="30%"><b>alpha_p</b></td>
203    <td class="table_cells" width="70%">Matches alphabetic characters</td>
204  </tr>
205  <tr> 
206    <td class="table_cells" width="30%"><b>blank_p</b></td>
207    <td class="table_cells" width="70%">Matches spaces or tabs</td>
208  </tr>
209  <tr> 
210    <td class="table_cells" width="30%"><b>cntrl_p</b></td>
211    <td class="table_cells" width="70%">Matches control characters</td>
212  </tr>
213  <tr> 
214    <td class="table_cells" width="30%"><b>digit_p</b></td>
215    <td class="table_cells" width="70%">Matches numeric digits</td>
216  </tr>
217  <tr> 
218    <td class="table_cells" width="30%"><b>graph_p</b></td>
219    <td class="table_cells" width="70%">Matches non-space printing characters</td>
220  </tr>
221  <tr> 
222    <td class="table_cells" width="30%"><b>lower_p</b></td>
223    <td class="table_cells" width="70%">Matches lower case letters</td>
224  </tr>
225  <tr> 
226    <td class="table_cells" width="30%"><b>print_p</b></td>
227    <td class="table_cells" width="70%">Matches printable characters</td>
228  </tr>
229  <tr> 
230    <td class="table_cells" width="30%"><b>punct_p</b></td>
231    <td class="table_cells" width="70%">Matches punctuation symbols</td>
232  </tr>
233  <tr> 
234    <td class="table_cells" width="30%"><b>space_p</b></td>
235    <td class="table_cells" width="70%">Matches spaces, tabs, returns, and newlines</td>
236  </tr>
237  <tr> 
238    <td class="table_cells" width="30%"><b>upper_p</b></td>
239    <td class="table_cells" width="70%">Matches upper case letters</td>
240  </tr>
241  <tr> 
242    <td class="table_cells" width="30%"><b>xdigit_p</b></td>
243    <td class="table_cells" width="70%">Matches hexadecimal digits</td>
244  </tr>
245</table>
246<h2><a name="negation"></a>negation ~</h2>
247<p>Single character parsers such as the <tt>chlit</tt>, <tt>range</tt>, <tt>anychar_p</tt>,
248  <tt>alnum_p</tt> etc. can be negated. For example:</p>
249<pre><code><span class=special>    ~</span><span class=identifier>ch_p</span><span class="special">(</span><span class="literal">'x'</span><span class="special">)</span></code></pre>
250<p>matches any character except <tt>'x'</tt>. Double negation of a character parser
251  cancels out the negation. <tt>~~alpha_p</tt> is equivalent to <tt>alpha_p</tt>.</p>
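<p>The cancelling of a double negation can be illustrated with a small model. This is
  a sketch under our own assumptions rather than Spirit's code: <tt>is_alpha</tt> is a
  hypothetical predicate standing in for <tt>alpha_p</tt>, and <tt>negated</tt> is a
  hypothetical wrapper that inverts it.</p>

```cpp
#include <cassert>
#include <cctype>
#include <type_traits>

// Hypothetical single-character predicate standing in for alpha_p.
struct is_alpha {
    bool operator()(char c) const {
        return std::isalpha(static_cast<unsigned char>(c)) != 0;
    }
};

// Wraps a predicate and inverts its result.
template <typename P>
struct negated {
    explicit negated(const P& p) : inner(p) {}
    bool operator()(char c) const { return !inner(c); }
    P inner;
};

// ~p builds the negated predicate...
inline negated<is_alpha> operator~(const is_alpha& p) {
    return negated<is_alpha>(p);
}

// ...and ~~p hands back the original, so no doubly wrapped type is created.
inline is_alpha operator~(const negated<is_alpha>& n) {
    return n.inner;
}

static_assert(std::is_same<decltype(~~is_alpha()), is_alpha>::value,
              "double negation yields the original type");

// Usage: the negated predicate accepts exactly what the original rejects.
inline void demo_negation() {
    assert((~is_alpha())('1'));
    assert(!(~is_alpha())('a'));
}
```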
252<h2>eol_p</h2>
253<p>Matches the end of line (CR/LF and combinations thereof).</p>
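<p>As an illustration, and assuming that &quot;combinations&quot; means a lone CR, a
  lone LF, or the pair CR LF, the matching can be modelled as follows.
  <tt>eol_length</tt> is a hypothetical helper, not a Spirit function.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns how many characters at pos form a line ending: 2 for "\r\n",
// 1 for a lone '\r' or '\n', and 0 when there is no line ending at pos.
inline std::size_t eol_length(const std::string& in, std::size_t pos) {
    std::size_t len = 0;
    if (pos + len < in.size() && in[pos + len] == '\r') ++len;
    if (pos + len < in.size() && in[pos + len] == '\n') ++len;
    return len;
}

// Usage: a CR LF pair counts as one line ending of length 2.
inline void demo_eol() {
    assert(eol_length("a\r\nb", 1) == 2);
}
```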
254<h2>nothing_p</h2>
255<p>Never matches anything and always fails.</p>
256<h2>end_p</h2>
257<p>Matches the end of input (returns a successful match with 0 length when the
258  input is exhausted).</p>
259<table border="0">
260  <tr> 
261    <td width="10"></td>
262    <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
263    <td width="30"><a href="organization.html"><img src="theme/l_arr.gif" border="0"></a></td>
264    <td width="30"><a href="operators.html"><img src="theme/r_arr.gif" border="0"></a></td>
265  </tr>
266</table>
267<br>
268<hr size="1">
269<p class="copyright">Copyright &copy; 1998-2003 Joel de Guzman<br>
270  Copyright &copy; 2003 Martin Wille<br>
271  <br>
272  <font size="2">Use, modification and distribution is subject to the Boost Software
273    License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
274    http://www.boost.org/LICENSE_1_0.txt) </font> </p>
275<p>&nbsp;</p>
276</body>
277</html>