1 | <html> |
---|
2 | <head> |
---|
3 | <title>Operators</title> |
---|
4 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
---|
5 | <link rel="stylesheet" href="theme/style.css" type="text/css"> |
---|
6 | </head> |
---|
7 | |
---|
8 | <body> |
---|
9 | <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2"> |
---|
10 | <tr> |
---|
11 | <td width="10"> |
---|
12 | </td> |
---|
13 | <td width="85%"> |
---|
14 | <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Operators</b></font> |
---|
15 | </td> |
---|
16 | <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td> |
---|
17 | </tr> |
---|
18 | </table> |
---|
19 | <br> |
---|
20 | <table border="0"> |
---|
21 | <tr> |
---|
22 | <td width="10"></td> |
---|
23 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
---|
24 | <td width="30"><a href="primitives.html"><img src="theme/l_arr.gif" border="0"></a></td> |
---|
25 | <td width="30"><a href="numerics.html"><img src="theme/r_arr.gif" border="0"></a></td> |
---|
26 | </tr> |
---|
27 | </table> |
---|
28 | <p>Operators are used as a means for object composition and embedding. Simple |
---|
29 | parsers may be composed to form composites through operator overloading, crafted |
---|
30 | to approximate the syntax of an Extended Backus-Normal Form (EBNF) variant. |
---|
31 | An expression such as:</p> |
---|
32 | <pre><code><font color="#000000"> <span class=identifier>a </span><span class=special>| </span><span class=identifier>b</span></font></code></pre> |
---|
33 | <p>actually yields a new parser type which is a composite of its operands, a and |
---|
34 | b. Taking this example further, if a and b were of type <tt>chlit</tt><>, |
---|
35 | the result would have the composite type:</p> |
---|
36 | <pre><code><font color="#000000"> <span class=identifier>alternative</span><span class=special><</span><span class=identifier>chlit</span><span class=special><>, </span><span class=identifier>chlit</span><span class=special><> </span><span class=special>></span></font></code></pre> |
---|
37 | <p> In general, for any binary operator, it will take its two arguments, parser1 |
---|
38 | and parser2, and create a new composed parser of the form</p> |
---|
39 | <pre><code><font color="#000000"> <span class=identifier>op</span><span class=special><</span><span class=identifier>parser1</span><span class=special>, </span><span class=identifier>parser2</span><span class=special>></span></font></code></pre> |
---|
40 | <p>where parser1 and parser2 can be arbitrarily complex parsers themselves, with |
---|
41 | the only limitations being what your compiler imposes. </p> |
---|
42 | <h3>Set Operators</h3> |
---|
43 | <table width="90%" border="0" align="center"> |
---|
44 | <tr> |
---|
45 | <td class="table_title" colspan="3">Set operators</td> |
---|
46 | </tr> |
---|
47 | <tr> |
---|
48 | <td class="table_cells" width="20%"><code><span class=identifier>a </span><span class=special>| |
---|
49 | </span><span class=identifier>b</span></code></td> |
---|
50 | <td class="table_cells" width="24%">Union</td> |
---|
51 | <td class="table_cells" width="56%">Match a or b. Also referred to as alternative</td> |
---|
52 | </tr> |
---|
53 | <tr> |
---|
54 | <td class="table_cells" width="20%"><code><span class=identifier>a </span><span class=special>& |
---|
55 | </span><span class=identifier>b</span></code></td> |
---|
56 | <td class="table_cells" width="24%">Intersection</td> |
---|
57 | <td class="table_cells" width="56%">Match a and b</td> |
---|
58 | </tr> |
---|
59 | <tr> |
---|
60 | <td class="table_cells" width="20%"><code><span class=identifier>a </span><span class=special>- |
---|
61 | </span><span class=identifier>b</span></code></td> |
---|
62 | <td class="table_cells" width="24%">Difference</td> |
---|
63 | <td class="table_cells" width="56%">Match a but not b. If both match and b's |
---|
64 | matched text is shorter than a's matched text, a successful match is made</td> |
---|
65 | </tr> |
---|
66 | <tr> |
---|
67 | <td class="table_cells" width="20%"><code><span class=identifier>a </span><span class=special>^ |
---|
68 | </span><span class=identifier>b</span></code></td> |
---|
69 | <td class="table_cells" width="24%">XOR</td> |
---|
70 | <td class="table_cells" width="56%">Match a or b, but not both</td> |
---|
71 | </tr> |
---|
72 | </table> |
---|
73 | <p><b>Short-circuiting</b></p> |
---|
74 | <p>Alternative operands are tried one by one on a first come first served basis |
---|
75 | starting from the leftmost operand. After a successfully matched alternative |
---|
76 | is found, the parser concludes its search, essentially short-circuiting the |
---|
77 | search for other potentially viable candidates. This short-circuiting implicitly |
---|
78 | gives the highest priority to the leftmost alternative.</p> |
---|
79 | <p>Short-circuiting is done in the same manner as C or C++'s logical expressions; |
---|
80 | e.g. <tt>if</tt> <tt><span class="operators">(</span>x <span class="operators"><</span> |
---|
81 | 3 <span class="operators">||</span> y <span class="operators"><</span> 2<span class="operators">)</span></tt> |
---|
82 | where, if <tt>x</tt> evaluates to be less than 3, the <tt>y <span class="operators"><</span> |
---|
83 | 2</tt> test is not done at all. In addition to providing an implicit priority |
---|
84 | rule for alternatives which is necessary, given the non-deterministic nature |
---|
85 | of the Spirit parser compiler, short-circuiting improves the execution time. |
---|
86 | If the order of your alternatives is logically irrelevant, strive to put the |
---|
87 | (expected) most common choice first for maximum efficiency.</p> |
---|
88 | <table width="80%" border="0" align="center"> |
---|
89 | <tr> |
---|
90 | <td class="note_box"><img src="theme/lens.gif" width="15" height="16"> <b>Intersections</b><br> |
---|
91 | <br> |
---|
92 | Some researchers assert that the intersections (e.g. <tt>a & b</tt>) |
---|
93 | let us define context sensitive languages (<a href="references.html#intersections">"XBNF"</a> |
---|
94 | [citing Leu-Weiner, 1973]). "The theory of defining a language as the |
---|
95 | intersection of a finite number of context free languages was developed |
---|
96 | by Leu and Weiner in 1973".<br> |
---|
97 | <br> |
---|
98 | <b><img src="theme/lens.gif" width="15" height="16"> <b></b>~ Operator</b><br> |
---|
99 | <br> |
---|
100 | The complement operator <tt>~</tt> was originally put into consideration. |
---|
101 | Further understanding of its value and meaning leads us to uncertainty. |
---|
102 | The basic problem stems from the fact that <tt>~a</tt> will yield <tt>U-a</tt>, |
---|
103 | where <tt>U</tt> is the universal set of all strings. However, where it |
---|
104 | makes sense, some parsers can be complemented (see the <a href="primitives.html#negation">primitive |
---|
105 | character parsers</a> for examples).</td> |
---|
106 | </tr> |
---|
107 | </table> |
---|
108 | <h3>Sequencing Operators</h3> |
---|
109 | <table width="90%" border="0" align="center"> |
---|
110 | <tr> |
---|
111 | <td class="table_title" colspan="3">Sequencing operators</td> |
---|
112 | </tr> |
---|
113 | <tr> |
---|
114 | <td class="table_cells" width="21%"><code><span class=identifier>a </span><span class=special>>> |
---|
115 | </span><span class=identifier>b</span></code></td> |
---|
116 | <td class="table_cells" width="23%">Sequence</td> |
---|
117 | <td class="table_cells" width="56%">Match a and b in sequence</td> |
---|
118 | </tr> |
---|
119 | <tr> |
---|
120 | <td class="table_cells" width="21%"><code><span class=identifier>a </span><span class=special>&& |
---|
121 | </span><span class=identifier>b</span></code></td> |
---|
122 | <td class="table_cells" width="23%">Sequential-and</td> |
---|
123 | <td class="table_cells" width="56%">Sequential-and. Same as above, match a |
---|
124 | and b in sequence</td> |
---|
125 | </tr> |
---|
126 | <tr> |
---|
127 | <td class="table_cells" width="21%"><code><span class=identifier>a </span><span class=special>|| |
---|
128 | </span><span class=identifier>b</span></code></td> |
---|
129 | <td class="table_cells" width="23%">Sequential-or</td> |
---|
130 | <td class="table_cells" width="56%">Match a or b in sequence</td> |
---|
131 | </tr> |
---|
132 | </table> |
---|
133 | <p>The sequencing operator <tt class="operators">>></tt> can alternatively |
---|
134 | be thought of as the sequential-and operator. The expression <tt>a <span class="operators">&&</span> |
---|
135 | b</tt> reads as match a and b in sequence. Continuing this logic, we can also |
---|
136 | have a sequential-or operator where the expression <tt>a <span class="operators">||</span> |
---|
137 | b</tt> reads as match a or b and in sequence. That is, if both a and b match, |
---|
138 | it must be in sequence; this is equivalent to <tt>a >> !b | b</tt>. </p> |
---|
139 | <h3>Optional and Loops</h3> |
---|
140 | <table width="90%" border="0" align="center"> |
---|
141 | <tr> |
---|
142 | <td class="table_title" colspan="3">Optional and Loops</td> |
---|
143 | </tr> |
---|
144 | <tr> |
---|
145 | <td class="table_cells" width="21%"><code><span class=special>*</span><span class=identifier>a</span></code></td> |
---|
146 | <td class="table_cells" width="23%">Kleene star</td> |
---|
147 | <td class="table_cells" width="56%">Match a zero (0) or more times</td> |
---|
148 | </tr> |
---|
149 | <tr> |
---|
150 | <td class="table_cells" width="21%"><code><span class=special>+</span><span class=identifier>a</span></code></td> |
---|
151 | <td class="table_cells" width="23%">Positive</td> |
---|
152 | <td class="table_cells" width="56%">Match a one (1) or more times</td> |
---|
153 | </tr> |
---|
154 | <tr> |
---|
155 | <td class="table_cells" width="21%"><code><span class=special>!</span><span class=identifier>a</span></code></td> |
---|
156 | <td class="table_cells" width="23%">Optional</td> |
---|
157 | <td class="table_cells" width="56%">Match a zero (0) or one (1) time</td> |
---|
158 | </tr> |
---|
159 | <tr> |
---|
160 | <td class="table_cells" width="21%"><code><span class=identifier>a </span><span class=special>% |
---|
161 | </span><span class=identifier>b</span></code></td> |
---|
162 | <td class="table_cells" width="23%">List</td> |
---|
163 | <td class="table_cells" width="56%">Match a list of one or more repetitions |
---|
164 | of a separated by occurrences of b. This is the same as <tt>a >> *(b |
---|
165 | >> a)</tt>. Note that <tt>a</tt> must not also match <tt>b</tt></td> |
---|
166 | </tr> |
---|
167 | </table> |
---|
168 | <p><img src="theme/note.gif" width="16" height="16"> If we look more closely, |
---|
169 | take note that we generalized the optional expression of the form <tt>!a</tt> |
---|
170 | in the same category as loops. This is logical, considering that the optional |
---|
171 | matches the expression following it zero (0) or one (1) time. </p> |
---|
172 | <p><b>Primitive type operands</b></p> |
---|
173 | <p> For binary operators, one of the operands but not both may be a <tt>char</tt>, |
---|
174 | <tt> wchar_t</tt>, <tt>char const<span class="operators">*</span></tt> or <tt>wchar_t |
---|
175 | const<span class="operators">*</span></tt>. Where P is a parser object, here |
---|
176 | are some examples:</p> |
---|
177 | <pre><code><span class=identifier> </span><span class=identifier>P </span><span class=special>| </span><span class=literal>'x' |
---|
178 | </span><span class=identifier>P </span><span class=special>- </span><span class=identifier>L</span><span class=string>"Hello World" |
---|
179 | </span><span class=literal>'x' </span><span class=special>>> </span><span class=identifier>P |
---|
180 | </span><span class=string>"bebop" </span><span class=special>>> </span><span class=identifier>P</span></code></pre> |
---|
181 | <p>It is important to emphasize that C++ mandates that operators may only be overloaded |
---|
182 | if at least one argument is a user-defined type. Typically, in an expression |
---|
183 | involving multiple operators, explicitly typing the leftmost operand as a parser |
---|
184 | is enough to cause propagation to all the rest of the operands to its right |
---|
185 | to be regarded as parsers. Examples:</p> |
---|
186 | <pre><code><font color="#000000"><span class=identifier> </span><span class=identifier>r </span><span class=special>= </span><span class=literal>'a' </span><span class=special>| </span><span class=literal>'b' </span><span class=special>| </span><span class=literal>'c' </span><span class=special>| </span><span class=literal>'d'</span><span class=special>; </span><span class=comment>// ill formed |
---|
187 | </span><span class=identifier>r </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'a'</span><span class=special>) </span><span class=special>| </span><span class=literal>'b' </span><span class=special>| </span><span class=literal>'c' </span><span class=special>| </span><span class=literal>'d'</span><span class=special>; </span><span class=comment>// OK</span></font></code></pre> |
---|
188 | <p>The second case is parsed as follows:</p> |
---|
189 | <pre><code><font color="#000000"> r <font color="#0000ff"><img src="theme/arrow.gif"> <span class=special>(((</span><span class=identifier>chlit</span><span class=special><</span><span class=keyword>char</span><span class=special>> </span><span class=special>| </span><span class=keyword>char</span><span class=special>) </span><span class=special>| </span><span class=keyword>char</span><span class=special>) </span><span class=special>| </span><span class=keyword>char</span><span class=special>)</span></font> |
---|
190 | |
---|
191 | a <font color="#0000ff"><img src="theme/arrow.gif"> <span class=special>(</span><span class=identifier>chlit</span><span class=special><</span><span class=keyword>char</span><span class=special>> </span><span class=special>| </span><span class=keyword>char</span><span class=special>)</span></font> |
---|
192 | r <font color="#0000ff"><img src="theme/arrow.gif"> <span class=special>(((</span><span class=identifier>a</span><span class=special>) </span><span class=special>| </span><span class=keyword>char</span><span class=special>) </span><span class=special>| </span><span class=keyword>char</span><span class=special>)</span></font> |
---|
193 | |
---|
194 | b <font color="#0000ff"><img src="theme/arrow.gif"> <span class=special>(</span><span class=identifier>a </span><span class=special>| </span><span class=keyword>char</span><span class=special>)</span></font> |
---|
195 | r <font color="#0000ff"><img src="theme/arrow.gif"> <span class=special>(((</span><span class=identifier>b</span><span class=special>)) </span><span class=special>| </span><span class=keyword>char</span><span class=special>)</span></font> |
---|
196 | |
---|
197 | c <font color="#0000ff"><img src="theme/arrow.gif"> <span class=special>(</span><span class=identifier>b </span><span class=special>| </span><span class=keyword>char</span><span class=special>)</span></font> |
---|
198 | r <font color="#0000ff"><img src="theme/arrow.gif"> <span class=special>(((</span><span class=identifier>c</span><span class=special>)))</span></font></font></code></pre> |
---|
199 | <p><b>Operator precedence and grouping</b></p> |
---|
200 | <p>Since we are defining our meta-language in C++, we follow C/C++'s operator |
---|
201 | precedence rules. Grouping expressions inside the parentheses override this |
---|
202 | (e.g., <tt><span class="operators">*(</span>a <span class="operators">|</span> |
---|
203 | b<span class="operators">)</span></tt> reads: match a or b zero (0) or more |
---|
204 | times). </p> |
---|
205 | <table border="0"> |
---|
206 | <tr> |
---|
207 | <td width="10"></td> |
---|
208 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
---|
209 | <td width="30"><a href="primitives.html"><img src="theme/l_arr.gif" border="0"></a></td> |
---|
210 | <td width="30"><a href="numerics.html"><img src="theme/r_arr.gif" border="0"></a></td> |
---|
211 | </tr> |
---|
212 | </table> |
---|
213 | <br> |
---|
214 | <hr size="1"> |
---|
215 | <p class="copyright">Copyright © 1998-2003 Joel de Guzman<br> |
---|
216 | <br> |
---|
217 | <font size="2">Use, modification and distribution is subject to the Boost Software |
---|
218 | License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at |
---|
219 | http://www.boost.org/LICENSE_1_0.txt) </font> </p> |
---|
220 | <p> </p> |
---|
221 | </body> |
---|
222 | </html> |
---|