source: downloads/boost_1_34_1/libs/spirit/doc/character_sets.html @ 29

Last change on this file since 29 was 29, checked in by landauf, 16 years ago

updated boost from 1_33_1 to 1_34_1

<html>
<head>
<title>Character Sets</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="theme/style.css" type="text/css">
</head>

<body>
<table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
  <tr>
    <td width="10">
    </td>
    <td width="85%">
      <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Character Sets</b></font>
    </td>
    <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td>
  </tr>
</table>
<br>
<table border="0">
  <tr>
    <td width="10"></td>
    <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
    <td width="30"><a href="loops.html"><img src="theme/l_arr.gif" border="0"></a></td>
    <td width="30"><a href="confix.html"><img src="theme/r_arr.gif" border="0"></a></td>
   </tr>
</table>
<p>The character set <tt>chset</tt> matches a set of characters over a finite
  range bounded by the limits of its template parameter <tt>CharT</tt>. This class
  is an optimization of a parser that acts on a set of single characters. The
  template class is parameterized by the character type <tt>CharT</tt> and can
  work efficiently with 8-, 16-, 32- and even 64-bit characters.</p>
<pre><span class=identifier>    </span><span class=keyword>template </span><span class=special>&lt;</span><span class=keyword>typename </span><span class=identifier>CharT </span><span class=special>= </span><span class=keyword>char</span><span class=special>&gt;
    </span><span class=keyword>class </span><span class=identifier>chset</span><span class=special>;</span></pre>
<p>The <tt>chset</tt> is constructed from literals (e.g. <tt>'x'</tt>), <tt>ch_p</tt>
  or <tt>chlit&lt;&gt;</tt>, <tt>range_p</tt> or <tt>range&lt;&gt;</tt>, <tt>anychar_p</tt>
  and <tt>nothing_p</tt> (see <a href="primitives.html">primitives</a>) or copy-constructed
  from another <tt>chset</tt>. The <tt>chset</tt> class uses a copy-on-write scheme
  that enables instances to be passed along easily by value.</p>
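<p>The copy-on-write idea can be illustrated with a minimal, standalone sketch.
  It is not Spirit's implementation: the class <tt>cow_char_set</tt> and its
  members are hypothetical, it assumes a C++11 compiler for
  <tt>std::shared_ptr</tt>, and it assumes single-threaded use. Copies share one
  bit set until one of them is modified, at which point the writer takes a
  private copy.</p>

```cpp
#include <bitset>
#include <cassert>
#include <memory>

// Hypothetical illustration of copy-on-write; Spirit's chset may differ.
class cow_char_set {
public:
    cow_char_set() : bits_(std::make_shared<std::bitset<256>>()) {}

    // Reading never copies: all copies look at the shared storage.
    bool test(unsigned char c) const { return bits_->test(c); }

    // Writing first makes sure this instance owns its storage exclusively.
    void set(unsigned char c) {
        detach();
        bits_->set(c);
    }

    // True while both instances still point at the same storage.
    bool shares_storage_with(const cow_char_set& other) const {
        return bits_ == other.bits_;
    }

private:
    // Clone the storage if another instance is still sharing it.
    void detach() {
        if (bits_.use_count() > 1)
            bits_ = std::make_shared<std::bitset<256>>(*bits_);
        assert(bits_.use_count() == 1);  // holds for single-threaded use
    }

    std::shared_ptr<std::bitset<256>> bits_;
};
```

<p>Copying a <tt>cow_char_set</tt> copies only a pointer, which is what makes
  passing such an object by value cheap in this sketch.</p>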
<table width="80%" border="0" align="center">
  <tr>
    <td class="note_box"><img src="theme/lens.gif" width="15" height="16"> <b>Sparse
      bit vectors</b><br>
      <br>
      To accommodate 16-, 32- and 64-bit characters, the <tt>chset</tt> class
      statically switches from a <tt>std::bitset</tt> implementation, used when the
      character type is no wider than 8 bits, to a sparse bit/boolean set that
      uses a sorted vector of disjoint ranges (<tt>range_run</tt>). The set is
      constructed from ranges such that adjacent or overlapping ranges are coalesced.<br>
      <br>
      <tt>range_run</tt>s are very space-economical in situations where there are lots
      of ranges and a few individual disjoint values. Searching is O(log n) where
      n is the number of ranges.</td>
  </tr>
</table>
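<p>The following standalone sketch shows one way a sorted vector of disjoint
  ranges can coalesce on insertion and answer membership queries with a binary
  search. The names <tt>char_range</tt>, <tt>range_set</tt>, <tt>insert</tt> and
  <tt>contains</tt> are hypothetical and chosen for illustration; this is not
  the code of Spirit's <tt>range_run</tt>.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; not Spirit's range_run.
template <typename CharT>
struct char_range {
    CharT first;  // inclusive lower bound
    CharT last;   // inclusive upper bound
};

template <typename CharT>
class range_set {
public:
    // Add [lo, hi], merging it with every overlapping or adjacent range.
    void insert(CharT lo, CharT hi) {
        assert(lo <= hi);
        // Skip ranges that end before lo with at least one value in between.
        auto first = runs_.begin();
        while (first != runs_.end() && gap_between(first->last, lo)) ++first;
        // Absorb ranges that overlap or touch the (growing) new range.
        auto last = first;
        while (last != runs_.end() && !gap_between(hi, last->first)) {
            lo = std::min(lo, last->first);
            hi = std::max(hi, last->last);
            ++last;
        }
        first = runs_.erase(first, last);
        runs_.insert(first, char_range<CharT>{lo, hi});
    }

    // Binary search over the ranges: O(log n) in the number of ranges.
    bool contains(CharT c) const {
        auto it = std::lower_bound(
            runs_.begin(), runs_.end(), c,
            [](const char_range<CharT>& r, CharT value) { return r.last < value; });
        return it != runs_.end() && it->first <= c;
    }

    std::size_t size() const { return runs_.size(); }  // number of ranges

private:
    // True if a < b and at least one value lies strictly between them.
    static bool gap_between(CharT a, CharT b) { return a < b && a + 1 < b; }

    std::vector<char_range<CharT>> runs_;  // sorted, disjoint, non-adjacent
};
```

<p>In this sketch insertion is linear in the number of ranges, while lookup
  uses <tt>std::lower_bound</tt> and is logarithmic, which matches the search
  cost stated in the note above.</p>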
<p>Examples:<br>
</p>
<pre><span class=identifier>    </span><span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s1</span><span class=special>(</span><span class=literal>'x'</span><span class=special>);
    </span><span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s2</span><span class=special>(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=identifier>s1</span><span class=special>);</span></pre>
<p>Optionally, character sets may also be constructed using a definition string
  following a syntax that resembles POSIX-style regular expression character sets,
  except that double quotes delimit the set elements instead of square brackets
  and there is no special negation <tt>^</tt> character.</p>
<pre>    <span class=identifier>range </span><span class=special>= </span><span class=identifier>anychar_p </span><span class=special>&gt;&gt; </span><span class=literal>'-' </span><span class=special>&gt;&gt; </span><span class=identifier>anychar_p</span><span class=special>;
    </span><span class=identifier>set </span><span class=special>= *(</span><span class=identifier>range </span><span class=special>| </span><span class=identifier>anychar_p</span><span class=special>);</span></pre>
<p>Since we are defining the set using a C string, the usual C/C++ literal string
  syntax rules apply. Examples:<br>
</p>
<pre>    <span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s1</span><span class=special>(</span><span class=string>&quot;a-zA-Z&quot;</span><span class=special>);       </span><span class=comment>// alphabetic characters
    </span><span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s2</span><span class=special>(</span><span class=string>&quot;0-9a-fA-F&quot;</span><span class=special>);    </span><span class=comment>// hexadecimal characters
    </span><span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s3</span><span class=special>(</span><span class=string>&quot;actgACTG&quot;</span><span class=special>);     </span><span class=comment>// DNA identifiers
    </span><span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s4</span><span class=special>(</span><span class=string>&quot;\x7f\x7e&quot;</span><span class=special>);     </span><span class=comment>// Hexadecimal 0x7F and 0x7E</span></pre>
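<p>To make the definition-string syntax concrete, here is a small standalone
  sketch that expands such a string into a <tt>std::bitset</tt> for 8-bit
  characters. The function <tt>make_char_set</tt> is hypothetical and is not
  part of Spirit. It assumes that a <tt>-</tt> with a character on each side
  denotes a range, that a leading or trailing <tt>-</tt> is a literal, and that
  range bounds are given in ascending order; Spirit's own handling of these
  edge cases may differ.</p>

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Spirit: expands a definition string such
// as "a-zA-Z" into a 256-entry bit set, one bit per unsigned char value.
inline std::bitset<256> make_char_set(const std::string& definition) {
    std::bitset<256> set;
    std::size_t i = 0;
    while (i < definition.size()) {
        const unsigned char first = static_cast<unsigned char>(definition[i]);
        // "x-y" with a character on both sides of '-' is treated as a range.
        if (i + 2 < definition.size() && definition[i + 1] == '-') {
            const unsigned char last =
                static_cast<unsigned char>(definition[i + 2]);
            assert(first <= last);  // assumption: bounds are ascending
            for (unsigned c = first; c <= last; ++c) set.set(c);
            i += 3;
        } else {
            // Any other character, including a stray '-', is a single element.
            set.set(first);
            i += 1;
        }
    }
    return set;
}
```

<p>Because the argument is an ordinary string literal, escapes such as
  <tt>\x7f</tt> are resolved by the compiler before the helper ever sees the
  characters.</p>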
<p>The standard Spirit set operators apply (see <a href="operators.html">operators</a>)
  plus an additional character-set-specific inverse (negation <tt>~</tt>) operator:</p>

<table width="90%" border="0" align="center">
  <tr>
    <td class="table_title" colspan="2">Character set operators</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>~a</b></td>
    <td class="table_cells" width="72%">Set inverse</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>a | b</b></td>
    <td class="table_cells" width="72%">Set union</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>a &amp; b</b></td>
    <td class="table_cells" width="72%">Set intersection</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>a - b</b></td>
    <td class="table_cells" width="72%">Set difference</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>a ^ b</b></td>
    <td class="table_cells" width="72%">Set xor</td>
  </tr>
</table>
<p>where operands a and b are both <tt>chset</tt>s, or one of the operands is
  a literal character, <tt>ch_p</tt> or <tt>chlit</tt>, <tt>range_p</tt> or <tt>range</tt>,
  <tt>anychar_p</tt> or <tt>nothing_p</tt>. Special optimized overloads are provided
  for <tt>anychar_p</tt> and <tt>nothing_p</tt> operands. A <tt>nothing_p</tt>
  operand is converted to an empty set, while an <tt>anychar_p</tt> operand is
  converted to a set having elements of the full range of the character type used
  (e.g. 0-255 for unsigned 8 bit chars).</p>
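<p>The semantics of these operators can be sketched with plain
  <tt>std::bitset</tt> objects standing in for 8-bit character sets. All names
  below (<tt>char_bits</tt>, <tt>range_bits</tt>, <tt>set_union</tt> and so on)
  are hypothetical; with Spirit itself one would write the operators directly
  on <tt>chset</tt> objects as listed in the table. The full set plays the role
  of a converted <tt>anychar_p</tt> operand and the empty set that of a
  converted <tt>nothing_p</tt> operand.</p>

```cpp
#include <bitset>
#include <cassert>

// Hypothetical stand-in for an 8-bit character set: one bit per character
// value. These names are illustrative and are not Spirit's API.
using char_bits = std::bitset<256>;

// All characters in the inclusive range [lo, hi].
inline char_bits range_bits(unsigned char lo, unsigned char hi) {
    assert(lo <= hi);
    char_bits bits;
    for (unsigned c = lo; c <= hi; ++c) bits.set(c);
    return bits;
}

// Counterparts of a converted anychar_p (full range) and nothing_p (empty).
inline char_bits anychar_bits() { return char_bits().set(); }
inline char_bits nothing_bits() { return char_bits(); }

// The five operators from the table, expressed as bit operations.
inline char_bits set_inverse(const char_bits& a) { return ~a; }
inline char_bits set_union(const char_bits& a, const char_bits& b) { return a | b; }
inline char_bits set_intersection(const char_bits& a, const char_bits& b) { return a & b; }
inline char_bits set_difference(const char_bits& a, const char_bits& b) { return a & ~b; }
inline char_bits set_xor(const char_bits& a, const char_bits& b) { return a ^ b; }
```

<p>In this model, subtracting a set from the full set gives the same result as
  inverting it, which mirrors the earlier <tt>anychar_p - s1</tt> example.</p>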
<p>A special case is <tt>~anychar_p</tt> which yields <tt>nothing_p</tt>, but
  <tt>~nothing_p</tt> is illegal. Inversion of <tt>anychar_p</tt> is asymmetrical,
  a one-way trip comparable to converting a <tt>T*</tt> to a <tt>void*</tt>.</p>
<table width="90%" border="0" align="center">
  <tr>
    <td class="table_title" colspan="2">Special conversions</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>chset&lt;CharT&gt;(nothing_p)</b></td>
    <td class="table_cells" width="72%">empty set</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>chset&lt;CharT&gt;(anychar_p)</b></td>
    <td class="table_cells" width="72%">full range of CharT (e.g. 0-255 for unsigned
      8 bit chars)</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>~anychar_p</b></td>
    <td class="table_cells" width="72%">nothing_p</td>
  </tr>
  <tr>
    <td class="table_cells" width="28%"><b>~nothing_p</b></td>
    <td class="table_cells" width="72%">illegal</td>
  </tr>
</table>

<table border="0">
  <tr>
    <td width="10"></td>
    <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
    <td width="30"><a href="loops.html"><img src="theme/l_arr.gif" border="0"></a></td>
    <td width="30"><a href="confix.html"><img src="theme/r_arr.gif" border="0"></a></td>
  </tr>
</table>
<br>
<hr size="1">
<p class="copyright">Copyright &copy; 1998-2003 Joel de Guzman<br>
  <br>
<font size="2">Use, modification and distribution is subject to the Boost Software
    License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
    http://www.boost.org/LICENSE_1_0.txt) </font> </p>
</body>
</html>