<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
   <head>
      <title>Boost.Regex: Partial Matches</title>
      <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
      <link rel="stylesheet" type="text/css" href="../../../boost.css">
   </head>
   <body>
      <P>
         <TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
            <TR>
               <td valign="top" width="300">
                  <h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../boost.png" border="0"></a></h3>
               </td>
               <TD width="353">
                  <H1 align="center">Boost.Regex</H1>
                  <H2 align="center">Partial Matches</H2>
               </TD>
               <td width="50">
                  <h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
               </td>
            </TR>
         </TABLE>
      </P>
      <HR>
      <p></p>
      <P>The <A href="match_flag_type.html">match-flag</A> <CODE>match_partial</CODE> can
         be passed to the algorithms <A href="regex_match.html">regex_match</A>,
         <A href="regex_search.html">regex_search</A>, and <A href="regex_grep.html">regex_grep</A>,
         and used with the iterator <A href="regex_iterator.html">regex_iterator</A>.
         When set, it indicates that partial as well as full matches should be found. A
         partial match is one that matched one or more characters at the end of the text
         input, but did not match all of the regular expression (although it might have
         done so had more input been available). Partial matches are typically used
         either when validating data input (checking each character as it is entered on
         the keyboard), or when searching texts that are too long to load into memory
         (or even into a memory-mapped file) or are of indeterminate length (for
         example, the source may be a socket or similar). Partial and full matches can be
         differentiated as shown in the following table (the variable <CODE>M</CODE> represents an
         instance of <A href="match_results.html">match_results&lt;&gt;</A> as filled in
         by <CODE>regex_match</CODE>, <CODE>regex_search</CODE> or <CODE>regex_grep</CODE>):<BR>
      </P>
      <P>
         <TABLE id="Table2" cellSpacing="0" cellPadding="7" width="100%" border="0">
            <TR>
               <TD vAlign="top" width="20%">&nbsp;</TD>
               <TD vAlign="top" width="20%">Result</TD>
               <TD vAlign="top" width="20%">M[0].matched</TD>
               <TD vAlign="top" width="20%">M[0].first</TD>
               <TD vAlign="top" width="20%">M[0].second</TD>
            </TR>
            <TR>
               <TD vAlign="top" width="20%">No match</TD>
               <TD vAlign="top" width="20%">False</TD>
               <TD vAlign="top" width="20%">Undefined</TD>
               <TD vAlign="top" width="20%">Undefined</TD>
               <TD vAlign="top" width="20%">Undefined</TD>
            </TR>
            <TR>
               <TD vAlign="top" width="20%">Partial match</TD>
               <TD vAlign="top" width="20%">True</TD>
               <TD vAlign="top" width="20%">False</TD>
               <TD vAlign="top" width="20%">Start of partial match.</TD>
               <TD vAlign="top" width="20%">End of partial match (end of text).</TD>
            </TR>
            <TR>
               <TD vAlign="top" width="20%">Full match</TD>
               <TD vAlign="top" width="20%">True</TD>
               <TD vAlign="top" width="20%">True</TD>
               <TD vAlign="top" width="20%">Start of full match.</TD>
               <TD vAlign="top" width="20%">End of full match.</TD>
            </TR>
         </TABLE>
      </P>
      <P>Be aware that using partial matches can sometimes produce surprising
         behavior:</P>
      <UL>
         <LI>
            Some expressions, such as ".*abc", will always produce a partial
            match.&nbsp; This problem can be reduced by careful construction of the regular
            expressions used, or by setting flags like <CODE>match_not_dot_newline</CODE> so that
            expressions like <CODE>.*</CODE> can't match past line boundaries.</LI>
         <LI>
            Boost.Regex currently prefers leftmost matches to full matches, so, for example,
            matching "abc|b" against "ab" produces a partial match&nbsp;against the "ab"
            rather than a full match against "b".&nbsp; Working this way is more efficient,
            but may not be the behavior you want in all situations.</LI></UL>
      <P>The following <A href="../example/snippets/partial_regex_match.cpp">example</A>
         tests whether the text could be a valid credit card number. As the user
         presses a key, the character entered is added to the string being built
         up, and the string is passed to <CODE>is_possible_card_number</CODE>. If this returns true,
         then the text has the form of a complete card number, so the user interface's OK button
         would be enabled. If it returns false, then this is not yet a valid card
         number, but could be with more input, so the user interface would disable the
         OK button. Finally, if the procedure throws an exception, the input could never
         become a valid number, so the character just entered must be discarded and a
         suitable error indication displayed to the user.</P>
      <PRE>#include &lt;stdexcept&gt;
#include &lt;string&gt;
#include &lt;iostream&gt;
#include &lt;boost/regex.hpp&gt;

boost::regex e("(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})");

bool is_possible_card_number(const std::string&amp; input)
{
   //
   // return false for partial match, true for full match, or throw for
   // impossible match based on what we have so far...
   boost::match_results&lt;std::string::const_iterator&gt; what;
   if(!boost::regex_match(input, what, e, boost::match_default | boost::match_partial))
   {
      // the input so far could not possibly be valid so reject it:
      throw std::runtime_error("Invalid data entered - this could not possibly be a valid card number");
   }
   // OK so far so good, but have we finished?
   if(what[0].matched)
   {
      // excellent, we have a result:
      return true;
   }
   // what we have so far is only a partial match...
   return false;
}</PRE>
      <P>In the following <A href="../example/snippets/partial_regex_grep.cpp">example</A>,
         text input is taken from a stream containing an unknown amount of text; this
         example simply counts the number of HTML tags encountered in the stream. The
         text is loaded into a buffer and searched a part at a time. If a partial match
         is encountered, the partially matched text is copied forward and searched a
         second time as the start of the next batch of text:</P>
      <PRE>#include &lt;cstddef&gt;
#include &lt;cstring&gt;
#include &lt;iostream&gt;
#include &lt;fstream&gt;
#include &lt;sstream&gt;
#include &lt;string&gt;
#include &lt;boost/regex.hpp&gt;

// match some kind of html tag:
boost::regex e("&lt;[^&gt;]*&gt;");
// count how many:
unsigned int tags = 0;
// saved position of partial match:
char* next_pos = 0;

bool grep_callback(const boost::match_results&lt;char*&gt;&amp; m)
{
   if(m[0].matched == false)
   {
      // partial match: save position and return:
      next_pos = m[0].first;
   }
   else
      ++tags;
   return true;
}

void search(std::istream&amp; is)
{
   char buf[4096];
   next_pos = buf + sizeof(buf);
   bool have_more = true;
   while(have_more)
   {
      // how much do we copy forward from last try:
      std::ptrdiff_t leftover = (buf + sizeof(buf)) - next_pos;
      // and how much is left to fill:
      std::ptrdiff_t size = next_pos - buf;
      // copy forward whatever we have left; the source and destination
      // may overlap, so use memmove rather than memcpy:
      std::memmove(buf, next_pos, leftover);
      // fill the rest from the stream; read() keeps going until the
      // request is satisfied or the stream ends, whereas readsome()
      // may return early with data still to come:
      is.read(buf + leftover, size);
      std::streamsize read = is.gcount();
      // check to see if we've run out of text:
      have_more = read == size;
      // reset next_pos:
      next_pos = buf + sizeof(buf);
      // and then grep:
      boost::regex_grep(grep_callback,
                        buf,
                        buf + read + leftover,
                        e,
                        boost::match_default | boost::match_partial);
   }
}</PRE>
      <P>
         <HR>
      <P></P>
      <p>Revised
         <!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan --> 
         24 Oct 2003
         <!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
      <p><i>© Copyright John Maddock&nbsp;1998-
            <!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->  2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
      <P><I>Use, modification and distribution are subject to the Boost Software License,
            Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
            or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
   </body>
</html>