<html>

<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>Boost Char Separator</title>
<!--
  -- Copyright © Jeremy Siek and John Bandela 2001-2002
  --
  -- Permission to use, copy, modify, distribute and sell this software
  -- and its documentation for any purpose is hereby granted without fee,
  -- provided that the above copyright notice appears in all copies and
  -- that both that copyright notice and this permission notice appear
  -- in supporting documentation.  Jeremy Siek makes no
  -- representations about the suitability of this software for any
  -- purpose.  It is provided "as is" without express or implied warranty.
  -->
</head>

<body bgcolor="#FFFFFF" text="#000000" link="#0000EE"
vlink="#551A8B" alink="#FF0000">

<p><img src="../../boost.png" alt="C++ Boost" width="277"
height="86"> <br>
</p>

<h1>
char_separator&lt;Char, Traits&gt;
</h1>

<p>
The <tt>char_separator</tt> class breaks a sequence of characters into
tokens based on character delimiters, much as <tt>strtok()</tt> does,
but without the non-reentrancy of <tt>strtok()</tt> and without
modifying the input sequence.
</p>

<p>
The <tt>char_separator</tt> class is used in conjunction with the <a
href="token_iterator.htm"><tt>token_iterator</tt></a> or <a
href="tokenizer.htm"><tt>tokenizer</tt></a> to perform tokenizing.
</p>

<h2>Definitions</h2>

<p>
The <tt>strtok()</tt> function does not include the delimiter
characters it matches in the output sequence of tokens. Sometimes,
however, it is useful to have the delimiters show up in the output
sequence, so <tt>char_separator</tt> provides this as an
option.  We refer to delimiters that show up as output tokens as
<b><i>kept delimiters</i></b> and delimiters that do not show up as
output tokens as <b><i>dropped delimiters</i></b>.
</p>

<p>
When two delimiters appear next to each other in the input sequence,
there is the question of whether to output an <b><i>empty
token</i></b> or to skip ahead. The behaviour of <tt>strtok()</tt> is
to skip ahead. The <tt>char_separator</tt> class provides both
options.
</p>
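
<p>
As a point of comparison, the skip-ahead behaviour of <tt>strtok()</tt>
can be sketched with the standard library alone. The helper name
<tt>strtok_split</tt> is hypothetical and not part of Boost. It scans a
private copy of the input because <tt>strtok()</tt> overwrites the
delimiters in the buffer it is given.
</p>

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): collects the tokens that
// std::strtok() yields for the given input and delimiter set.
std::vector<std::string> strtok_split(const std::string& input,
                                      const char* delims)
{
  assert(delims != 0);
  std::vector<std::string> tokens;
  // strtok() writes '\0' over delimiters, so scan a mutable copy
  // rather than the caller's string.
  std::vector<char> buffer(input.begin(), input.end());
  buffer.push_back('\0');
  // Note: strtok() keeps hidden static state, so this is not reentrant.
  for (char* tok = std::strtok(&buffer[0], delims); tok != 0;
       tok = std::strtok(0, delims))
    tokens.push_back(tok);
  return tokens;
}
```

<p>
With this sketch, <tt>strtok_split("a,,b;", ",;")</tt> yields the two
tokens <tt>a</tt> and <tt>b</tt>: the adjacent delimiters produce no
empty token, and the delimiters themselves are never reported.
</p>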


<h2>Examples</h2>

<p>
This first example shows how to use <tt>char_separator</tt> as a
replacement for the <tt>strtok()</tt> function. We specify three
dropped delimiters ('-', ';' and '|'), which do not show up as output
tokens.  We do not specify any kept delimiters, and by default empty
tokens are dropped.
</p>

<blockquote>
<pre>// char_sep_example_1.cpp
#include &lt;cstdlib&gt;   // EXIT_SUCCESS
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;

int main()
{
  std::string str = &quot;;;Hello|world||-foo--bar;yow;baz|&quot;;
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt;
    tokenizer;
  // '-', ';' and '|' are all dropped delimiters
  boost::char_separator&lt;char&gt; sep(&quot;-;|&quot;);
  tokenizer tokens(str, sep);
  for (tokenizer::iterator tok_iter = tokens.begin();
       tok_iter != tokens.end(); ++tok_iter)
    std::cout &lt;&lt; &quot;&lt;&quot; &lt;&lt; *tok_iter &lt;&lt; &quot;&gt; &quot;;
  std::cout &lt;&lt; &quot;\n&quot;;
  return EXIT_SUCCESS;
}
</pre>
</blockquote>
The output is:
<blockquote>
<pre>
&lt;Hello&gt; &lt;world&gt; &lt;foo&gt; &lt;bar&gt; &lt;yow&gt; &lt;baz&gt; 
</pre>
</blockquote>


<p>
The next example shows tokenizing with two dropped delimiters, '-' and
';', and a single kept delimiter, '|'. We also specify that empty tokens
should show up in the output when two delimiters are next to each
other.
</p>

<blockquote>
<pre>// char_sep_example_2.cpp
#include &lt;cstdlib&gt;   // EXIT_SUCCESS
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;

int main()
{
    std::string str = &quot;;;Hello|world||-foo--bar;yow;baz|&quot;;
    typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt;
        tokenizer;
    // '-' and ';' are dropped, '|' is kept, and empty tokens are reported
    boost::char_separator&lt;char&gt; sep(&quot;-;&quot;, &quot;|&quot;, boost::keep_empty_tokens);
    tokenizer tokens(str, sep);
    for (tokenizer::iterator tok_iter = tokens.begin();
         tok_iter != tokens.end(); ++tok_iter)
      std::cout &lt;&lt; &quot;&lt;&quot; &lt;&lt; *tok_iter &lt;&lt; &quot;&gt; &quot;;
    std::cout &lt;&lt; &quot;\n&quot;;
    return EXIT_SUCCESS;
}
</pre>
</blockquote>
The output is:
<blockquote>
<pre>
&lt;&gt; &lt;&gt; &lt;Hello&gt; &lt;|&gt; &lt;world&gt; &lt;|&gt; &lt;&gt; &lt;|&gt; &lt;&gt; &lt;foo&gt; &lt;&gt; &lt;bar&gt; &lt;yow&gt; &lt;baz&gt; &lt;|&gt; &lt;&gt;
</pre>
</blockquote>

<p>
The final example shows tokenizing with a default-constructed
<tt>char_separator</tt>, which drops whitespace characters and keeps
punctuation characters as tokens.
</p>

<blockquote>
<pre>// char_sep_example_3.cpp
#include &lt;cstdlib&gt;   // EXIT_SUCCESS
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;

int main()
{
   std::string str = "This is,  a test";
   typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt; Tok;
   boost::char_separator&lt;char&gt; sep; // default constructed
   Tok tok(str, sep);
   for(Tok::iterator tok_iter = tok.begin(); tok_iter != tok.end(); ++tok_iter)
     std::cout &lt;&lt; "&lt;" &lt;&lt; *tok_iter &lt;&lt; "&gt; ";
   std::cout &lt;&lt; "\n";
   return EXIT_SUCCESS;
}
</pre>
</blockquote>
The output is:
<blockquote>
<pre>
&lt;This&gt; &lt;is&gt; &lt;,&gt; &lt;a&gt; &lt;test&gt; 
</pre>
</blockquote>

<h2>Template parameters</h2>

<P>
<table border>
<TR>
<th>Parameter</th><th>Description</th><th>Default</th>
</tr>

<TR><TD><TT>Char</TT></TD>
<TD>The type of elements within a token, typically <tt>char</tt>.</TD>
<TD>&nbsp;</TD>
</TR>

<TR><TD><TT>Traits</TT></TD>
<TD>The <tt>char_traits</tt> for the character type.</TD>
<TD><tt>std::char_traits&lt;Char&gt;</tt></TD>
</TR>

</table>

<h2>Model of</h2>

<a href="tokenizerfunction.htm">Tokenizer Function</a>


<h2>Members</h2>

<hr>
<pre>
explicit char_separator(const Char* dropped_delims,
                        const Char* kept_delims = &quot;&quot;,
                        empty_token_policy empty_tokens = drop_empty_tokens)
</pre>

<p>
This creates a <tt>char_separator</tt> object, which can then be used
to create a <a href="token_iterator.htm"><tt>token_iterator</tt></a>
or <a href="tokenizer.htm"><tt>tokenizer</tt></a> to perform
tokenizing. The <tt>dropped_delims</tt> and <tt>kept_delims</tt>
arguments are strings in which each character is used as a delimiter
during tokenizing. Whenever a delimiter is seen in the input sequence,
the current token is finished and a new token begins.
</p>

<p>
The delimiters in <tt>dropped_delims</tt> do not show up as tokens in
the output, whereas the delimiters in <tt>kept_delims</tt> do show up
as tokens.  If <tt>empty_tokens</tt> is <tt>drop_empty_tokens</tt>,
then empty tokens will not show up in the output. If
<tt>empty_tokens</tt> is <tt>keep_empty_tokens</tt>, then empty tokens
will show up in the output.
</p>

<hr>

<pre>
explicit char_separator()
</pre>
<p>
With the default constructor, <tt>std::isspace()</tt> is used to
identify dropped delimiters and <tt>std::ispunct()</tt> is used to
identify kept delimiters. In addition, empty tokens are dropped.
</p>

<hr>

<pre>
template &lt;typename InputIterator, typename Token&gt;
bool operator()(InputIterator&amp; next, InputIterator end, Token&amp; tok)
</pre>

<p>
This function is called by the <a
href="token_iterator.htm"><tt>token_iterator</tt></a> to perform
tokenizing. The user typically does not call this function directly.
</p>


<hr>

<p>© Copyright Jeremy Siek and John R. Bandela 2001-2002. Permission
to copy, use, modify, sell and distribute this document is granted
provided this copyright notice appears in all copies. This document is
provided &quot;as is&quot; without express or implied warranty, and
with no claim as to its suitability for any purpose.</p>
</body>
</html>