<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>
<head>
  <meta http-equiv="Content-Language" content="en-us">
  <meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
  <meta name="GENERATOR" content="Microsoft FrontPage 6.0">
  <meta name="ProgId" content="FrontPage.Editor.Document">

  <title>Boost Char Separator</title>
</head>

<body bgcolor="#FFFFFF" text="#000000" link="#0000EE" vlink="#551A8B" alink=
"#FF0000">
  <p><img src="../../boost.png" alt="C++ Boost" width="277" height=
  "86"><br></p>

  <h1>char_separator&lt;Char, Traits&gt;</h1>

  <p>The <tt>char_separator</tt> class breaks a sequence of characters into
  tokens based on character delimiters, much as <tt>strtok()</tt> does, but
  without the problems of non-reentrancy and destruction of the input
  sequence.</p>

  <p>The <tt>char_separator</tt> class is used in conjunction with the
  <a href="token_iterator.htm"><tt>token_iterator</tt></a> or <a href=
  "tokenizer.htm"><tt>tokenizer</tt></a> to perform tokenizing.</p>

  <h2>Definitions</h2>

  <p>The <tt>strtok()</tt> function does not include the delimiter
  characters in its output sequence of tokens. However, it is sometimes
  useful to have the delimiters show up in the output sequence, so
  <tt>char_separator</tt> provides this as an option. We refer to
  delimiters that show up as output tokens as <b><i>kept delimiters</i></b>
  and delimiters that do not show up as output tokens as <b><i>dropped
  delimiters</i></b>.</p>

  <p>When two delimiters appear next to each other in the input sequence,
  there is the question of whether to output an <b><i>empty token</i></b> or
  to skip ahead. The behaviour of <tt>strtok()</tt> is to skip ahead. The
  <tt>char_separator</tt> class provides both options.</p>

  <h2>Examples</h2>

  <p>This first example shows how to use <tt>char_separator</tt> as a
  replacement for the <tt>strtok()</tt> function. We specify three
  character delimiters, and they will not show up as output tokens. We do
  not specify any kept delimiters, and by default any empty tokens are
  ignored.</p>

  <blockquote>
    <pre>
// char_sep_example_1.cpp
#include &lt;cstdlib&gt;
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;

int main()
{
  std::string str = ";;Hello|world||-foo--bar;yow;baz|";
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt; 
    tokenizer;
  boost::char_separator&lt;char&gt; sep("-;|");
  tokenizer tokens(str, sep);
  for (tokenizer::iterator tok_iter = tokens.begin();
       tok_iter != tokens.end(); ++tok_iter)
    std::cout &lt;&lt; "&lt;" &lt;&lt; *tok_iter &lt;&lt; "&gt; ";
  std::cout &lt;&lt; "\n";
  return EXIT_SUCCESS;
}
</pre>
  </blockquote>The output is:

  <blockquote>
    <pre>
&lt;Hello&gt; &lt;world&gt; &lt;foo&gt; &lt;bar&gt; &lt;yow&gt; &lt;baz&gt; 
</pre>
  </blockquote>

  <p>The next example shows tokenizing with two dropped delimiters, '-' and
  ';', and a single kept delimiter, '|'. We also specify that empty tokens
  should show up in the output when two delimiters are next to each
  other.</p>

  <blockquote>
    <pre>
// char_sep_example_2.cpp
#include &lt;cstdlib&gt;
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;

int main()
{
    std::string str = ";;Hello|world||-foo--bar;yow;baz|";
    typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt; 
        tokenizer;
    boost::char_separator&lt;char&gt; sep("-;", "|", boost::keep_empty_tokens);
    tokenizer tokens(str, sep);
    for (tokenizer::iterator tok_iter = tokens.begin();
         tok_iter != tokens.end(); ++tok_iter)
      std::cout &lt;&lt; "&lt;" &lt;&lt; *tok_iter &lt;&lt; "&gt; ";
    std::cout &lt;&lt; "\n";
    return EXIT_SUCCESS;
}
</pre>
  </blockquote>The output is:

  <blockquote>
    <pre>
&lt;&gt; &lt;&gt; &lt;Hello&gt; &lt;|&gt; &lt;world&gt; &lt;|&gt; &lt;&gt; &lt;|&gt; &lt;&gt; &lt;foo&gt; &lt;&gt; &lt;bar&gt; &lt;yow&gt; &lt;baz&gt; &lt;|&gt; &lt;&gt;
</pre>
  </blockquote>

  <p>The final example shows tokenizing on punctuation and whitespace
  characters using the default constructor of
  <tt>char_separator</tt>.</p>

  <blockquote>
    <pre>
// char_sep_example_3.cpp
#include &lt;cstdlib&gt;
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;

int main()
{
   std::string str = "This is,  a test";
   typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt; Tok;
   boost::char_separator&lt;char&gt; sep; // default constructed
   Tok tok(str, sep);
   for(Tok::iterator tok_iter = tok.begin(); tok_iter != tok.end(); ++tok_iter)
     std::cout &lt;&lt; "&lt;" &lt;&lt; *tok_iter &lt;&lt; "&gt; ";
   std::cout &lt;&lt; "\n";
   return EXIT_SUCCESS;
}
</pre>
  </blockquote>The output is:

  <blockquote>
    <pre>
&lt;This&gt; &lt;is&gt; &lt;,&gt; &lt;a&gt; &lt;test&gt; 
</pre>
  </blockquote>

  <h2>Template parameters</h2>

  <table border summary="">
    <tr>
      <th>Parameter</th>

      <th>Description</th>

      <th>Default</th>
    </tr>

    <tr>
      <td><tt>Char</tt></td>

      <td>The type of elements within a token, typically <tt>char</tt>.</td>

      <td>&nbsp;</td>
    </tr>

    <tr>
      <td><tt>Traits</tt></td>

      <td>The <tt>char_traits</tt> for the character type.</td>

      <td><tt>std::char_traits&lt;Char&gt;</tt></td>
    </tr>
  </table>

  <h2>Model of</h2><a href="tokenizerfunction.htm">Tokenizer Function</a>

  <h2>Members</h2>
  <hr>
  <pre>
explicit char_separator(const Char* dropped_delims,
                        const Char* kept_delims = "",
                        empty_token_policy empty_tokens = drop_empty_tokens)
</pre>

  <p>This creates a <tt>char_separator</tt> object, which can then be used to
  create a <a href="token_iterator.htm"><tt>token_iterator</tt></a> or
  <a href="tokenizer.htm"><tt>tokenizer</tt></a> to perform tokenizing. The
  <tt>dropped_delims</tt> and <tt>kept_delims</tt> are strings of characters
  where each character is used as a delimiter during tokenizing. Whenever a
  delimiter is seen in the input sequence, the current token is finished, and
  a new token begins. The delimiters in <tt>dropped_delims</tt> do not show
  up as tokens in the output, whereas the delimiters in <tt>kept_delims</tt>
  do show up as tokens. If <tt>empty_tokens</tt> is
  <tt>drop_empty_tokens</tt>, then empty tokens will not show up in the
  output. If <tt>empty_tokens</tt> is <tt>keep_empty_tokens</tt>, then empty
  tokens will show up in the output.</p>
  <hr>
  <pre>
explicit char_separator()
</pre>

  <p>The default constructor uses <tt>std::isspace()</tt> to identify dropped
  delimiters and <tt>std::ispunct()</tt> to identify kept delimiters.
  In addition, empty tokens are dropped.</p>
  <hr>
  <pre>
template &lt;typename InputIterator, typename Token&gt;
bool operator()(InputIterator&amp; next, InputIterator end, Token&amp; tok)
</pre>

  <p>This function is called by the <a href=
  "token_iterator.htm"><tt>token_iterator</tt></a> to perform tokenizing. The
  user typically does not call this function directly.</p>
  <hr>

  <p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src=
  "http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01 Transitional"
  height="31" width="88"></a></p>

  <p>Revised
  <!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->25
  December, 2006<!--webbot bot="Timestamp" endspan i-checksum="38518" --></p>

  <p><i>Copyright &copy; 2001-2002 Jeremy Siek and John R. Bandela</i></p>

  <p><i>Distributed under the Boost Software License, Version 1.0. (See
  accompanying file <a href="../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
  copy at <a href=
  "http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>
</body>
</html>