source: downloads/boost_1_34_1/libs/regex/doc/captures.html @ 45

Last change on this file since 45 was 29, checked in by landauf, 16 years ago
updated boost from 1_33_1 to 1_34_1
File size: 10.7 KB

Line
1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
2	<html>
3	<head>
4	<title>Boost.Regex: Understanding Captures</title>
5	<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
6	<link rel="stylesheet" type="text/css" href="../../../boost.css">
7	</head>
8	<body>
9	<P>
10	<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
11	<TR>
12	<td valign="top" width="300">
13	<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../boost.png" border="0"></a></h3>
14	</td>
15	<TD width="353">
16	<H1 align="center">Boost.Regex</H1>
17	<H2 align="center">Understanding Captures</H2>
18	</TD>
19	<td width="50">
20	<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
21	</td>
22	</TR>
23	</TABLE>
24	</P>
25	<HR>
26	<p></p>
27	<P>Captures are the iterator ranges that are "captured" by marked sub-expressions
28	as a regular expression gets matched.  Each marked sub-expression can
29	result in more than one capture, if it is matched more than once.  This
30	document explains how captures and marked sub-expressions in Boost.Regex are
31	represented and accessed.</P>
32	<H2>Marked sub-expressions</H2>
33	<P>Every time a Perl regular expression contains a parenthesis group (), it
34	produces an extra field, known as a marked sub-expression. For example, the
35	expression:</P>
36	<PRE>(\w+)\W+(\w+)</PRE>
37	<P>
38	has two marked sub-expressions (known as $1 and $2 respectively). In addition,
39	the complete match is known as $&, everything before the match as $`,
40	and everything after the match as $'.  So if the above expression is
41	searched for within "@abc def--", then we obtain:</P>
42	<BLOCKQUOTE dir="ltr" style="MARGIN-RIGHT: 0px">
43	<P>
44	<TABLE id="Table2" cellSpacing="1" cellPadding="1" width="300" border="0">
45	<TR>
46	<TD>
47	<P dir="ltr" style="MARGIN-RIGHT: 0px">$`</P>
48	</TD>
49	<TD>"@"</TD>
50	</TR>
51	<TR>
52	<TD>$&</TD>
53	<TD>"abc def"</TD>
54	</TR>
55	<TR>
56	<TD>$1</TD>
57	<TD>"abc"</TD>
58	</TR>
59	<TR>
60	<TD>$2</TD>
61	<TD>"def"</TD>
62	</TR>
63	<TR>
64	<TD>$'</TD>
65	<TD>"--"</TD>
66	</TR>
67	</TABLE>
68	</P>
69	</BLOCKQUOTE>
70	<P>In Boost.Regex all of these are accessible via the <A href="match_results.html">match_results</A>
71	class, which gets filled in when calling one of the matching algorithms (<A href="regex_search.html">regex_search</A>,
72	<A href="regex_match.html">regex_match</A>, or <A href="regex_iterator.html">regex_iterator</A>).
73	So given:</P>
74	<PRE>boost::match_results<IteratorType> m;</PRE>
75	<P>The Perl and Boost.Regex equivalents are as follows:</P>
76	<BLOCKQUOTE dir="ltr" style="MARGIN-RIGHT: 0px">
77	<P>
78	<TABLE id="Table3" cellSpacing="1" cellPadding="1" width="300" border="0">
79	<TR>
80	<TD><STRONG>Perl</STRONG></TD>
81	<TD><STRONG>Boost.Regex</STRONG></TD>
82	</TR>
83	<TR>
84	<TD>$`</TD>
85	<TD>m.prefix()</TD>
86	</TR>
87	<TR>
88	<TD>$&</TD>
89	<TD>m[0]</TD>
90	</TR>
91	<TR>
92	<TD>$n</TD>
93	<TD>m[n]</TD>
94	</TR>
95	<TR>
96	<TD>$'</TD>
97	<TD>m.suffix()</TD>
98	</TR>
99	</TABLE>
100	</P>
101	</BLOCKQUOTE>
102	<P>
103	<P>In Boost.Regex each sub-expression match is represented by a <A href="sub_match.html">
104	sub_match</A> object. This is basically just a pair of iterators denoting
105	the start and end position of the sub-expression match, but some
106	additional operators are provided so that objects of type sub_match behave a lot
107	like a std::basic_string: for example they are implicitly <A href="sub_match.html#m3">
108	convertible to a basic_string</A>, they can be <A href="sub_match.html#o21">compared
109	to a string</A>, <A href="sub_match.html#o81">added to a string</A>, or <A href="sub_match.html#oi">
110	streamed out to an output stream</A>.</P>
111	<H2>Unmatched Sub-Expressions</H2>
112	<P>When a regular expression match is found, not all of the marked
113	sub-expressions need to have participated in the match. For example, the expression:</P>
114	<P>(abc)|(def)</P>
115	<P>can match either $1 or $2, but never both at the same time.  In
116	Boost.Regex you can determine which sub-expressions matched by accessing the <A href="sub_match.html#m1">
117	sub_match::matched</A> data member.</P>
118	<H2>Repeated Captures</H2>
119	<P>When a marked sub-expression is repeated, the sub-expression gets
120	"captured" multiple times. Normally, however, only the final capture is
121	available. For example, if</P>
122	<PRE>(?:(\w+)\W+)+</PRE>
123	<P>is matched against</P>
124	<PRE>one fine day</PRE>
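125	<P>Then $1 will contain only the last string captured, and all the previous captures will have
126	been forgotten. Because every repetition must end with \W+, that last string is "day" only when the text continues with a non-word character (for example "one fine day."); for the text exactly as shown, a search stops after "fine ", leaving $1 as "fine".</P>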
127	<P>However, Boost.Regex has an experimental feature that allows all the capture
128	information to be retained - this is accessed either via the <A href="match_results.html#m17">
129	match_results::captures</A> member function or the <A href="sub_match.html#m8">sub_match::captures</A>
130	member function.  These functions return a container that contains a
131	sequence of all the captures obtained during the regular expression
132	matching.  The following example program shows how this information may be
133	used:</P>
134	<PRE>#include <boost/regex.hpp>
135	#include <iostream>
136	#include <string>
137
138	void print_captures(const std::string& regx, const std::string& text)
139	{
140	   boost::regex e(regx);
141	   boost::smatch what;
142	   std::cout << "Expression: \"" << regx << "\"\n";
143	   std::cout << "Text: \"" << text << "\"\n";
144	   if(boost::regex_match(text, what, e, boost::match_extra)) // match_extra retains every capture
145	   {
146	      unsigned i, j;
147	      std::cout << " Match found \n Sub-Expressions:\n";
148	      for(i = 0; i < what.size(); ++i) // final capture of each sub-expression
149	         std::cout << " $" << i << " = \"" << what[i] << "\"\n";
150	      std::cout << " Captures:\n";
151	      for(i = 0; i < what.size(); ++i) // full capture history of each sub-expression
152	      {
153	         std::cout << " $" << i << " = {";
154	         for(j = 0; j < what.captures(i).size(); ++j)
155	         {
156	            if(j)
157	               std::cout << ", ";
158	            else
159	               std::cout << " ";
160	            std::cout << "\"" << what.captures(i)[j] << "\"";
161	         }
162	         std::cout << " }\n";
163	      }
164	   }
165	   else
166	   {
167	      std::cout << " No Match found \n";
168	   }
169	}
170
171	int main(int , char* [])
172	{
173	   print_captures("(([[:lower:]]+)|([[:upper:]]+))+", "aBBcccDDDDDeeeeeeee");
174	   print_captures("(.*)bar|(.*)bah", "abcbar");
175	   print_captures("(.*)bar|(.*)bah", "abcbah");
176	   print_captures("^(?:(\\w+)|(?>\\W+))*$", "now is the time for all good men to come to the aid of the party");
177	   return 0;
178	}</PRE>
179	<P>Which produces the following output:</P>
180	<PRE>Expression: "(([[:lower:]]+)|([[:upper:]]+))+"
181	Text: "aBBcccDDDDDeeeeeeee"
182	Match found
183	Sub-Expressions:
184	$0 = "aBBcccDDDDDeeeeeeee"
185	$1 = "eeeeeeee"
186	$2 = "eeeeeeee"
187	$3 = "DDDDD"
188	Captures:
189	$0 = { "aBBcccDDDDDeeeeeeee" }
190	$1 = { "a", "BB", "ccc", "DDDDD", "eeeeeeee" }
191	$2 = { "a", "ccc", "eeeeeeee" }
192	$3 = { "BB", "DDDDD" }
193	Expression: "(.*)bar|(.*)bah"
194	Text: "abcbar"
195	Match found
196	Sub-Expressions:
197	$0 = "abcbar"
198	$1 = "abc"
199	$2 = ""
200	Captures:
201	$0 = { "abcbar" }
202	$1 = { "abc" }
203	$2 = { }
204	Expression: "(.*)bar|(.*)bah"
205	Text: "abcbah"
206	Match found
207	Sub-Expressions:
208	$0 = "abcbah"
209	$1 = ""
210	$2 = "abc"
211	Captures:
212	$0 = { "abcbah" }
213	$1 = { }
214	$2 = { "abc" }
215	Expression: "^(?:(\w+)|(?>\W+))*$"
216	Text: "now is the time for all good men to come to the aid of the party"
217	Match found
218	Sub-Expressions:
219	$0 = "now is the time for all good men to come to the aid of the party"
220	$1 = "party"
221	Captures:
222	$0 = { "now is the time for all good men to come to the aid of the party" }
223	$1 = { "now", "is", "the", "time", "for", "all", "good", "men", "to", "come", "to", "the", "aid", "of", "the", "party" }
224	</PRE>
225	<P>Unfortunately, enabling this feature has an impact on performance even if you
226	don't use it, and a much bigger impact if you do. Therefore, to use this
227	feature you need to:</P>
228	<UL>
229	<LI>
230	Define BOOST_REGEX_MATCH_EXTRA for all translation units, including the library
231	source (the best way to do this is to uncomment this define in <A href="../../../boost/regex/user.hpp">
232	boost/regex/user.hpp</A>
233	and then rebuild everything).</LI>
234	<LI>
235	Pass the <A href="match_flag_type.html">match_extra flag</A> to the particular
236	algorithms where you actually need the captures information (<A href="regex_search.html">regex_search</A>,
237	<A href="regex_match.html">regex_match</A>, or <A href="regex_iterator.html">regex_iterator</A>).
238	</LI>
239	</UL>
240	<P>
241	<HR>
242	<P></P>
243	<P></P>
244	<p>Revised
245	<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
246	12 Dec 2003
247	<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
248	<p><i>© Copyright John Maddock
249	<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
250	<P><I>Use, modification and distribution are subject to the Boost Software License,
251	Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
252	or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
253	</body>
254	</html>
