<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: Partial Matches</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../boost.png" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">Partial Matches</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<p></p>
<P>The <A href="match_flag_type.html">match-flag</A> <CODE>match_partial</CODE> can
be passed to the following algorithms: <A href="regex_match.html">regex_match</A>,
<A href="regex_search.html">regex_search</A>, and <A href="regex_grep.html">regex_grep</A>,
and used with the iterator <A href="regex_iterator.html">regex_iterator</A>.
When used, it indicates that partial as well as full matches should be found. A
partial match is one that matched one or more characters at the end of the text
input, but did not match all of the regular expression (although it might have
done so had more input been available). Partial matches are typically used
either when validating data input (checking each character as it is entered on the
keyboard), or when searching texts that are too long to load into memory
(or even into a memory-mapped file) or are of indeterminate length (for
example, the source may be a socket or similar). Partial and full matches can be
differentiated as shown in the following table (the variable M represents an
instance of <A href="match_results.html">match_results<></A> as filled in
by regex_match, regex_search or regex_grep):<BR>
</P>
<P>
<TABLE id="Table2" cellSpacing="0" cellPadding="7" width="100%" border="0">
<TR>
<TD vAlign="top" width="20%"> </TD>
<TD vAlign="top" width="20%">Result</TD>
<TD vAlign="top" width="20%">M[0].matched</TD>
<TD vAlign="top" width="20%">M[0].first</TD>
<TD vAlign="top" width="20%">M[0].second</TD>
</TR>
<TR>
<TD vAlign="top" width="20%">No match</TD>
<TD vAlign="top" width="20%">False</TD>
<TD vAlign="top" width="20%">Undefined</TD>
<TD vAlign="top" width="20%">Undefined</TD>
<TD vAlign="top" width="20%">Undefined</TD>
</TR>
<TR>
<TD vAlign="top" width="20%">Partial match</TD>
<TD vAlign="top" width="20%">True</TD>
<TD vAlign="top" width="20%">False</TD>
<TD vAlign="top" width="20%">Start of partial match.</TD>
<TD vAlign="top" width="20%">End of partial match (end of text).</TD>
</TR>
<TR>
<TD vAlign="top" width="20%">Full match</TD>
<TD vAlign="top" width="20%">True</TD>
<TD vAlign="top" width="20%">True</TD>
<TD vAlign="top" width="20%">Start of full match.</TD>
<TD vAlign="top" width="20%">End of full match.</TD>
</TR>
</TABLE>
</P>
<P>Be aware that partial matching has some limitations:</P>
<UL>
<LI>
There are some expressions, such as ".*abc", that will always produce a partial
match. This problem can be reduced by careful construction of the regular
expressions used, or by setting flags like match_not_dot_newline so that
expressions like .* can't match past line boundaries.</LI>
<LI>
Boost.Regex currently prefers leftmost matches to full matches, so for example
matching "abc|b" against "ab" produces a partial match against the "ab"
rather than a full match against "b". It's more efficient to work this
way, but it may not be the behavior you want in all situations.</LI></UL>
<P>The following <A href="../example/snippets/partial_regex_match.cpp">example</A>
tests whether the text could be a valid credit card number. As the user
presses a key, the character entered is added to the string being built
up and passed to <CODE>is_possible_card_number</CODE>. If this returns true,
then the text could be a valid card number, so the user interface's OK button
would be enabled. If it returns false, then this is not yet a valid card
number but could become one with more input, so the user interface would disable the
OK button. Finally, if the procedure throws an exception, the input could never
become a valid number, so the character just entered must be discarded and a
suitable error indication displayed to the user.</P>
<PRE>#include <string>
#include <iostream>
#include <stdexcept>
#include <boost/regex.hpp>

boost::regex e("(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})");

bool is_possible_card_number(const std::string& input)
{
   //
   // return false for partial match, true for full match, or throw for
   // impossible match based on what we have so far...
   boost::match_results<std::string::const_iterator> what;
   if(!boost::regex_match(input, what, e, boost::match_default | boost::match_partial))
   {
      // the input so far could not possibly be valid so reject it:
      throw std::runtime_error("Invalid data entered - this could not possibly be a valid card number");
   }
   // OK so far so good, but have we finished?
   if(what[0].matched)
   {
      // excellent, we have a result:
      return true;
   }
   // what we have so far is only a partial match...
   return false;
}</PRE>
<P>In the following <A href="../example/snippets/partial_regex_grep.cpp">example</A>,
text input is taken from a stream containing an unknown amount of text; this
example simply counts the number of HTML tags encountered in the stream. The
text is loaded into a buffer and searched a part at a time. If a partial match
is encountered, the partially matched text is carried over and searched again as the
start of the next batch of text:</P>
<PRE>#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <cstddef>
#include <cstring>
#include <boost/regex.hpp>

// match some kind of html tag:
boost::regex e("<[^>]*>");
// count how many:
unsigned int tags = 0;
// saved position of partial match:
char* next_pos = 0;

bool grep_callback(const boost::match_results<char*>& m)
{
   if(m[0].matched == false)
   {
      // save position and return:
      next_pos = m[0].first;
   }
   else
      ++tags;
   return true;
}

void search(std::istream& is)
{
   char buf[4096];
   next_pos = buf + sizeof(buf);
   bool have_more = true;
   while(have_more)
   {
      // how much do we copy forward from last try:
      std::ptrdiff_t leftover = (buf + sizeof(buf)) - next_pos;
      // and how much is left to fill:
      std::ptrdiff_t size = next_pos - buf;
      // copy forward whatever we have left (the regions may overlap,
      // so use memmove rather than memcpy):
      std::memmove(buf, next_pos, leftover);
      // fill the rest from the stream (read blocks until the data arrives,
      // whereas readsome may return nothing):
      is.read(buf + leftover, size);
      std::streamsize read = is.gcount();
      // check to see if we've run out of text:
      have_more = read == size;
      // reset next_pos:
      next_pos = buf + sizeof(buf);
      // and then grep:
      boost::regex_grep(grep_callback,
         buf,
         buf + read + leftover,
         e,
         boost::match_default | boost::match_partial);
   }
}</PRE>
<P>
<HR>
<P></P>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
24 Oct 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
<p><i>© Copyright John Maddock 1998-
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
<P><I>Use, modification and distribution are subject to the Boost Software License,
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
</body>
</html>