1 | <html> |
---|
2 | <head> |
---|
3 | <title>Symbols</title> |
---|
4 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
---|
5 | <link rel="stylesheet" href="theme/style.css" type="text/css"> |
---|
6 | </head> |
---|
7 | |
---|
8 | <body> |
---|
9 | <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2"> |
---|
10 | <tr> |
---|
11 | <td width="10"> |
---|
12 | </td> |
---|
13 | <td width="85%"> |
---|
14 | <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Symbols</b></font> |
---|
15 | </td> |
---|
16 | <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td> |
---|
17 | </tr> |
---|
18 | </table> |
---|
19 | <br> |
---|
20 | <table border="0"> |
---|
21 | <tr> |
---|
22 | <td width="10"></td> |
---|
23 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
---|
24 | <td width="30"><a href="distinct.html"><img src="theme/l_arr.gif" border="0"></a></td> |
---|
25 | <td width="30"><a href="trees.html"><img src="theme/r_arr.gif" border="0"></a></td> |
---|
26 | </tr> |
---|
27 | </table> |
---|
28 | <p>This class symbols implements a symbol table. The symbol table holds a dictionary |
---|
29 | of symbols where each symbol is a sequence of CharTs (a <tt>char</tt>, <tt>wchar_t</tt>, |
---|
30 | <tt>int</tt>, enumeration etc.) . The template class, parameterized by the character |
---|
31 | type (CharT), can work efficiently with 8, 16, 32 and even 64 bit characters. |
---|
32 | Mutable data of type T is associated with each symbol.<br> |
---|
33 | </p> |
---|
34 | <p>Traditionally, symbol table management is maintained seperately outside the |
---|
35 | BNF grammar through semantic actions. Contrary to standard practice, the Spirit |
---|
36 | symbol table class <tt>symbols</tt> is-a parser. An instance of which may be |
---|
37 | used anywhere in the EBNF grammar specification. It is an example of a dynamic |
---|
38 | parser. A dynamic parser is characterized by its ability to modify its behavior |
---|
39 | at run time. Initially, an empty symbols object matches nothing. At any time, |
---|
40 | symbols may be added, thus, dynamically altering its behavior.</p> |
---|
41 | <p>Each entry in a symbol table has an associated mutable data slot. In this regard, |
---|
42 | one can view the symbol table as an associative container (or map) of key-value |
---|
43 | pairs where the keys are strings. </p> |
---|
44 | <p>The symbols class expects two template parameters (actually there is a third, |
---|
45 | see detail box). The first parameter <tt>T</tt> specifies the data type associated |
---|
46 | with each symbol (defaults to <tt>int</tt>) and the second parameter <tt>CharT</tt> |
---|
47 | specifies the character type of the symbols (defaults to <tt>char</tt>). </p> |
---|
48 | <pre><span class=identifier> </span><span class=keyword>template |
---|
49 | </span><span class=special>< |
---|
50 | </span><span class=keyword>typename </span><span class=identifier>T </span><span class=special>= </span><span class=keyword>int</span><span class=special>, |
---|
51 | </span><span class=keyword>typename </span><span class=identifier>CharT </span><span class=special>= </span><span class=keyword>char</span><span class=special>, |
---|
52 | </span><span class=keyword>typename </span><span class=identifier>SetT </span><span class=special>= </span><span class=identifier>impl</span><span class=special>::</span><span class=identifier>tst</span><span class=special><</span><span class=identifier>T</span><span class=special>, </span><span class=identifier>CharT</span><span class=special>> |
---|
53 | </span><span class=special>> |
---|
54 | </span><span class=keyword>class </span><span class=identifier>symbols</span><span class=special>;</span></pre> |
---|
55 | <table width="80%" border="0" align="center"> |
---|
56 | <tr> |
---|
57 | <td class="note_box"><img src="theme/lens.gif" width="15" height="16"> <b>Ternary |
---|
58 | State Trees</b><br> |
---|
59 | <br> |
---|
60 | The actual set implementation is supplied by the SetT template parameter |
---|
61 | (3rd template parameter of the symbols class) . By default, this uses the |
---|
62 | tst class which is an implementation of the Ternary Search Tree. <br> |
---|
63 | <br> |
---|
64 | Ternary Search Trees are faster than hashing for many typical search problems |
---|
65 | especially when the search interface is iterator based. Searching for a |
---|
66 | string of length k in a ternary search tree with n strings will require |
---|
67 | at most O(log n+k) character comparisons. TSTs are many times faster than |
---|
68 | hash tables for unsuccessful searches since mismatches are discovered earlier |
---|
69 | after examining only a few characters. Hash tables always examine an entire |
---|
70 | key when searching.<br> |
---|
71 | <br> |
---|
72 | For details see <a href="http://www.cs.princeton.edu/%7Ers/strings/">http://www.cs.princeton.edu/~rs/strings/</a>.</td> |
---|
73 | </tr> |
---|
74 | </table> |
---|
75 | <p>Here are some sample declarations:</p> |
---|
76 | <pre><span class=identifier> </span><span class=identifier>symbols</span><span class=special><> </span><span class=identifier>sym</span><span class=special>; |
---|
77 | </span><span class=identifier>symbols</span><span class=special><</span><span class=keyword>short</span><span class=special>, </span><span class=keyword>wchar_t</span><span class=special>> </span><span class=identifier>sym2</span><span class=special>; |
---|
78 | |
---|
79 | </span><span class=keyword>struct </span><span class=identifier>my_info |
---|
80 | </span><span class=special>{ |
---|
81 | </span><span class=keyword>int </span><span class=identifier>id</span><span class=special>; |
---|
82 | </span><span class=keyword>double </span><span class=identifier>value</span><span class=special>; |
---|
83 | </span><span class=special>}; |
---|
84 | |
---|
85 | </span><span class=identifier>symbols</span><span class=special><</span><span class=identifier>my_info</span><span class=special>> </span><span class=identifier>sym3</span><span class=special>;</span></pre> |
---|
86 | <p>After having declared our symbol tables, symbols may be added statically using |
---|
87 | the construct:</p> |
---|
88 | <pre><span class=identifier> sym </span><span class=special>= </span><span class=identifier>a</span><span class=special>, </span><span class=identifier>b</span><span class=special>, </span><span class=identifier>c</span><span class=special>, </span><span class=identifier>d </span><span class=special>...;</span></pre> |
---|
89 | <p>where <tt>sym</tt> is a symbol table and <tt>a..d</tt> etc. are strings. <img src="theme/note.gif" width="16" height="16">Note |
---|
90 | that the comma operator is separating the items being added to the symbol table, |
---|
91 | through an assignment. Due to operator overloading this is possible and correct |
---|
92 | (though it may take a little getting used to) and is a concise way to initialize |
---|
93 | the symbol table with many symbols. Also, it is perfectly valid to make multiple |
---|
94 | assignments to a symbol table to iteratively add symbols (or groups of symbols) |
---|
95 | at different times.</p> |
---|
96 | <p>Simple example:<br> |
---|
97 | </p> |
---|
98 | <pre><span class=identifier> sym </span><span class=special>= </span><span class=string>"pineapple"</span><span class=special>, </span><span class=string>"orange"</span><span class=special>, </span><span class=string>"banana"</span><span class=special>, </span><span class=string>"apple"</span><span class=special>, </span><span class=string>"mango"</span><span class=special>;</span></pre> |
---|
99 | <p>Note that it is invalid to add the same symbol multiple times to a symbol table, |
---|
100 | though you may modify the value associated with a symbol artibrarily many times.</p> |
---|
101 | <p>Now, we may use sym in the grammar. Example:</p> |
---|
102 | <pre><span class=identifier> fruits </span><span class=special>= </span><span class=identifier>sym </span><span class=special>>> </span><span class=special>*(</span><span class=literal>',' </span><span class=special>>> </span><span class=identifier>sym</span><span class=special>);</span></pre> |
---|
103 | <p>Alternatively, symbols may be added dynamically through the member functor |
---|
104 | <tt>add</tt> (see <tt><a href="#symbol_inserter">symbol_inserter</a></tt> below). |
---|
105 | The member functor <tt>add</tt> may be attached to a parser as a semantic action |
---|
106 | taking in a begin/end pair:</p> |
---|
107 | <pre><span class=identifier> p</span><span class=special>[</span><span class=identifier>sym</span><span class=special>.</span><span class=identifier>add</span><span class=special>]</span></pre> |
---|
108 | <p>where p is a parser (and sym is a symbol table). On success, the matching portion |
---|
109 | of the input is added to the symbol table.</p> |
---|
110 | <p><tt>add</tt> may also be used to directly initialize data. Examples:</p> |
---|
111 | <pre><span class=identifier> sym</span><span class=special>.</span><span class=identifier>add</span><span class=special>(</span><span class=string>"hello"</span><span class=special>, </span><span class=number>1</span><span class=special>)(</span><span class=string>"crazy"</span><span class=special>, </span><span class=number>2</span><span class=special>)(</span><span class=string>"world"</span><span class=special>, </span><span class=number>3</span><span class=special>);</span></pre> |
---|
112 | <p>Assuming of course that the data slot associated with <tt>sym</tt> is an integer.</p> |
---|
113 | <p>The data associated with each symbol may be modified any time. The most obvious |
---|
114 | way of course is through <a href="semantic_actions.html">semantic actions</a>. |
---|
115 | A function or functor, as usual, may be attached to the symbol table. The symbol |
---|
116 | table expects a function or functor compatible with the signature:</p> |
---|
117 | <p><b>Signature for functions:</b></p> |
---|
118 | <pre><code><font color="#000000"><span class=identifier> </span><span class=keyword>void </span><span class=identifier>func</span><span class=special>(</span><span class=identifier>T</span><span class="special">&</span><span class=identifier> data</span><span class=special>);</span></font></code></pre> |
---|
119 | <p><b>Signature for functors:</b><br> |
---|
120 | </p> |
---|
121 | <pre><code><font color="#000000"><span class=special> </span><span class=keyword>struct </span><span class=identifier>ftor |
---|
122 | </span><span class=special>{ |
---|
123 | </span><span class=keyword>void </span><span class=keyword>operator</span><span class=special>()(</span><span class=identifier>T</span><span class="special">&</span><span class=identifier> data</span><span class=special>) </span><span class=keyword>const</span><span class=special>; |
---|
124 | </span><span class=special>};</span></font></code></pre> |
---|
125 | <p>Where <tt>T</tt> is the data type of the symbol table (the <tt>T</tt> in its |
---|
126 | template parameter list). When the symbol table successfully matches something |
---|
127 | from the input, the data associated with the matching entry in the symbol table |
---|
128 | is reported to the semantic action.</p> |
---|
129 | <h2>Symbol table utilities</h2> |
---|
130 | <p>Sometimes, one may wish to deal with the symbol table directly. Provided are |
---|
131 | some symbol table utilities.</p> |
---|
132 | <p><b>add</b></p> |
---|
133 | <pre><span class=identifier> </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>T</span><span class=special>, </span><span class=keyword>typename </span><span class=identifier>CharT</span><span class=special>, </span><span class=keyword>typename </span><span class=identifier>SetT</span><span class=special>> |
---|
134 | </span><span class=identifier>T</span><span class=special>* </span><span class=identifier>add</span><span class=special>(</span><span class=identifier>symbols</span><span class=special><</span><span class=identifier>T</span><span class=special>, </span><span class=identifier>CharT</span><span class=special>, </span><span class=identifier>SetT</span><span class=special>>& </span><span class=identifier>table</span><span class=special>, </span><span class=identifier>CharT </span><span class=keyword>const</span><span class=special>* </span><span class=identifier>sym</span><span class=special>, </span><span class=identifier>T </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>data </span><span class=special>= </span><span class=identifier>T</span><span class=special>());</span></pre> |
---|
135 | <p>adds a symbol <tt>sym</tt> (C string) to a symbol table <tt>table</tt> plus |
---|
136 | an optional data <tt>data</tt> associated with the symbol. Returns a pointer |
---|
137 | to the data associated with the symbol or <tt>NULL</tt> if add failed (e.g. |
---|
138 | when the symbol is already added before).<br> |
---|
139 | <br> |
---|
140 | <b>find</b></p> |
---|
141 | <pre><span class=special> </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>T</span><span class=special>, </span><span class=keyword>typename </span><span class=identifier>CharT</span><span class=special>, </span><span class=keyword>typename </span><span class=identifier>SetT</span><span class=special>> |
---|
142 | </span><span class=identifier>T</span><span class=special>* </span><span class=identifier>find</span><span class=special>(</span><span class=identifier>symbols</span><span class=special><</span><span class=identifier>T</span><span class=special>, </span><span class=identifier>CharT</span><span class=special>, </span><span class=identifier>SetT</span><span class=special>> </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>table</span><span class=special>, </span><span class=identifier>CharT </span><span class=keyword>const</span><span class=special>* </span><span class=identifier>sym</span><span class=special>);</span></pre> |
---|
143 | <p>finds a symbol <tt>sym</tt> (C string) from a symbol table <tt>table</tt>. |
---|
144 | Returns a pointer to the data associated with the symbol or <tt>NULL</tt> if |
---|
145 | not found</p> |
---|
146 | <h2><a name="symbol_inserter"></a>symbol_inserter</h2> |
---|
147 | <p>The symbols class holds an instance of this class named <tt>add</tt>. This |
---|
148 | can be called directly just like a member function, passing in a first/last |
---|
149 | iterator and optional data:<br> |
---|
150 | <br> |
---|
151 | </p> |
---|
152 | <pre><span class=identifier> sym</span><span class=special>.</span><span class=identifier>add</span><span class=special>(</span><span class=identifier>first</span><span class=special>, </span><span class=identifier>last</span><span class=special>, </span><span class=identifier>data</span><span class=special>);</span></pre> |
---|
153 | <p>Or, passing in a C string and optional data:<br> |
---|
154 | </p> |
---|
155 | <pre><span class=identifier> sym</span><span class=special>.</span><span class=identifier>add</span><span class=special>(</span><span class=identifier>c_string</span><span class=special>, </span><span class=identifier>data</span><span class=special>);</span></pre> |
---|
156 | <p>where <tt>sym</tt> is a symbol table. The <tt>data</tt> argument is optional. |
---|
157 | The nice thing about this scheme is that it can be cascaded. We've seen this |
---|
158 | applied above. Here's a snippet from the roman numerals parser</p> |
---|
159 | <pre> <span class=comment>// Parse roman numerals (1..9) using the symbol table. |
---|
160 | |
---|
161 | </span> <span class=keyword>struct </span><span class=identifier>ones </span><span class=special>: </span><span class=identifier>symbols</span><span class=special><</span><span class=keyword>unsigned</span><span class=special>> |
---|
162 | </span><span class=special>{ |
---|
163 | </span><span class=identifier>ones</span><span class=special>() |
---|
164 | </span><span class=special>{ |
---|
165 | </span><span class=identifier>add |
---|
166 | </span><span class=special>(</span><span class=string>"I" </span><span class=special>, </span><span class=number>1</span><span class=special>) |
---|
167 | </span><span class=special>(</span><span class=string>"II" </span><span class=special>, </span><span class=number>2</span><span class=special>) |
---|
168 | </span><span class=special>(</span><span class=string>"III" </span><span class=special>, </span><span class=number>3</span><span class=special>) |
---|
169 | </span><span class=special>(</span><span class=string>"IV" </span><span class=special>, </span><span class=number>4</span><span class=special>) |
---|
170 | </span><span class=special>(</span><span class=string>"V" </span><span class=special>, </span><span class=number>5</span><span class=special>) |
---|
171 | </span><span class=special>(</span><span class=string>"VI" </span><span class=special>, </span><span class=number>6</span><span class=special>) |
---|
172 | </span><span class=special>(</span><span class=string>"VII" </span><span class=special>, </span><span class=number>7</span><span class=special>) |
---|
173 | </span><span class=special>(</span><span class=string>"VIII" </span><span class=special>, </span><span class=number>8</span><span class=special>) |
---|
174 | </span><span class=special>(</span><span class=string>"IX" </span><span class=special>, </span><span class=number>9</span><span class=special>) |
---|
175 | </span><span class=special>; |
---|
176 | </span><span class=special>} |
---|
177 | |
---|
178 | </span><span class=special>} </span><span class=identifier>ones_p</span><span class=special>;</span></pre> |
---|
179 | <p>Notice that a user defined struct <tt>ones</tt> is subclassed from <tt>symbols</tt>. |
---|
180 | Then at construction time, we added all the symbols using the <tt>add</tt> symbol_inserter.</p> |
---|
181 | <p> <img height="16" width="15" src="theme/lens.gif"> The full source code can be <a href="../example/fundamental/roman_numerals.cpp">viewed here</a>. This is part of the Spirit distribution.</p> |
---|
182 | <p>Again, <tt>add</tt> may also be used as a semantic action since it conforms |
---|
183 | to the action interface (see semantic actions):<br> |
---|
184 | </p> |
---|
185 | <pre><span class=special></span><span class=identifier> p</span><span class=special>[</span><span class=identifier>sym</span><span class=special>.</span><span class=identifier>add</span><span class=special>]</span></pre> |
---|
186 | <p>where p is a parser of course.<span class=special><br> |
---|
187 | </span></p> |
---|
188 | <table border="0"> |
---|
189 | <tr> |
---|
190 | <td width="10"></td> |
---|
191 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
---|
192 | <td width="30"><a href="distinct.html"><img src="theme/l_arr.gif" border="0"></a></td> |
---|
193 | <td width="30"><a href="trees.html"><img src="theme/r_arr.gif" border="0"></a></td> |
---|
194 | </tr> |
---|
195 | </table> |
---|
196 | <br> |
---|
197 | <hr size="1"> |
---|
198 | <p class="copyright">Copyright © 1998-2003 Joel de Guzman<br> |
---|
199 | <br> |
---|
200 | <font size="2">Use, modification and distribution is subject to the Boost Software |
---|
201 | License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at |
---|
202 | http://www.boost.org/LICENSE_1_0.txt)</font></p> |
---|
203 | </body> |
---|
204 | </html> |
---|