source: downloads/boost_1_34_1/libs/regex/doc/captures.html @ 45

Last change on this file since 45 was 29, checked in by landauf, 16 years ago
updated boost from 1_33_1 to 1_34_1
File size: 10.7 KB

Rev	Line
[29]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
	2	<html>
	3	<head>
	4	<title>Boost.Regex: Understanding Captures</title>
	5	<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
	6	<link rel="stylesheet" type="text/css" href="../../../boost.css">
	7	</head>
	8	<body>
	9	<P>
	10	<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
	11	<TR>
	12	<td valign="top" width="300">
	13	<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../boost.png" border="0"></a></h3>
	14	</td>
	15	<TD width="353">
	16	<H1 align="center">Boost.Regex</H1>
	17	<H2 align="center">Understanding Captures</H2>
	18	</TD>
	19	<td width="50">
	20	<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
	21	</td>
	22	</TR>
	23	</TABLE>
	24	</P>
	25	<HR>
	26	<p></p>
	27	<P>Captures are the iterator ranges that are "captured" by marked sub-expressions
	28	as a regular expression gets matched.  Each marked sub-expression can
	29	result in more than one capture, if it is matched more than once.  This
	30	document explains how captures and marked sub-expressions in Boost.Regex are
	31	represented and accessed.</P>
	32	<H2>Marked sub-expressions</H2>
	33	<P>Every time a Perl regular expression contains a parenthesized group (), it
	34	produces an extra field, known as a marked sub-expression. For example, the
	35	expression:</P>
	36	<PRE>(\w+)\W+(\w+)</PRE>
	37	<P>
	38	has two marked sub-expressions (known as $1 and $2 respectively). In addition,
	39	the complete match is known as $&, everything before the match as $`,
	40	and everything after the match as $'.  So if the above expression is
	41	searched for within "@abc def--", then we obtain:</P>
	42	<BLOCKQUOTE dir="ltr" style="MARGIN-RIGHT: 0px">
	43	<P>
	44	<TABLE id="Table2" cellSpacing="1" cellPadding="1" width="300" border="0">
	45	<TR>
	46	<TD>
	47	<P dir="ltr" style="MARGIN-RIGHT: 0px">$`</P>
	48	</TD>
	49	<TD>"@"</TD>
	50	</TR>
	51	<TR>
	52	<TD>$&</TD>
	53	<TD>"abc def"</TD>
	54	</TR>
	55	<TR>
	56	<TD>$1</TD>
	57	<TD>"abc"</TD>
	58	</TR>
	59	<TR>
	60	<TD>$2</TD>
	61	<TD>"def"</TD>
	62	</TR>
	63	<TR>
	64	<TD>$'</TD>
	65	<TD>"--"</TD>
	66	</TR>
	67	</TABLE>
	68	</P>
	69	</BLOCKQUOTE>
	70	<P>In Boost.Regex all these are accessible via the <A href="match_results.html">match_results</A>
	71	class that gets filled in when calling one of the matching algorithms (<A href="regex_search.html">regex_search</A>,
	72	<A href="regex_match.html">regex_match</A>, or <A href="regex_iterator.html">regex_iterator</A>).
	73	So given:</P>
	74	<PRE>boost::match_results<IteratorType> m;</PRE>
	75	<P>The Perl and Boost.Regex equivalents are as follows:</P>
	76	<BLOCKQUOTE dir="ltr" style="MARGIN-RIGHT: 0px">
	77	<P>
	78	<TABLE id="Table3" cellSpacing="1" cellPadding="1" width="300" border="0">
	79	<TR>
	80	<TD><STRONG>Perl</STRONG></TD>
	81	<TD><STRONG>Boost.Regex</STRONG></TD>
	82	</TR>
	83	<TR>
	84	<TD>$`</TD>
	85	<TD>m.prefix()</TD>
	86	</TR>
	87	<TR>
	88	<TD>$&</TD>
	89	<TD>m[0]</TD>
	90	</TR>
	91	<TR>
	92	<TD>$n</TD>
	93	<TD>m[n]</TD>
	94	</TR>
	95	<TR>
	96	<TD>$'</TD>
	97	<TD>m.suffix()</TD>
	98	</TR>
	99	</TABLE>
	100	</P>
	101	</BLOCKQUOTE>
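<P>The correspondence in the table can be sketched in code. This is an illustration only: it uses std::regex from the C++11 standard library, whose match_results interface was modelled on Boost.Regex, so that it builds without an installed Boost. Replacing std:: with boost:: and including boost/regex.hpp is assumed to behave the same way. The MatchParts struct and the split_match function are hypothetical names introduced for this sketch.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical holder for the Perl-style pieces of a single match.
struct MatchParts
{
   bool found;
   std::string prefix;  // Perl $`
   std::string whole;   // Perl $&
   std::string first;   // Perl $1
   std::string second;  // Perl $2
   std::string suffix;  // Perl $'
};

// Searches text for two words separated by non-word characters
// and copies each part of the match out of the match_results object.
MatchParts split_match(const std::string& text)
{
   static const std::regex e("(\\w+)\\W+(\\w+)");
   std::smatch m;
   MatchParts parts;
   parts.found = std::regex_search(text, m, e);
   if(parts.found)
   {
      parts.prefix = m.prefix().str();
      parts.whole  = m[0].str();
      parts.first  = m[1].str();
      parts.second = m[2].str();
      parts.suffix = m.suffix().str();
   }
   return parts;
}
```

<P>Calling split_match("@abc def--") fills the struct with "@", "abc def", "abc", "def" and "--", the same values as in the first table above.</P>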
	102	<P>
	103	<P>In Boost.Regex each sub-expression match is represented by a <A href="sub_match.html">
	104	sub_match</A> object. This is essentially a pair of iterators denoting
	105	the start and end positions of the sub-expression match, but there are some
	106	additional operators provided so that objects of type sub_match behave a lot
	107	like a std::basic_string: for example they are implicitly <A href="sub_match.html#m3">
	108	convertible to a basic_string</A>, they can be <A href="sub_match.html#o21">compared
	109	to a string</A>, <A href="sub_match.html#o81">added to a string</A>, or <A href="sub_match.html#oi">
	110	streamed out to an output stream</A>.</P>
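<P>A minimal sketch of this string-like behaviour follows. It is written against std::regex (C++11), whose sub_match type was modelled on the Boost one, so that it compiles without Boost; the same code with boost:: in place of std:: is assumed to work equally. The addition operator is left out because the standard library version of sub_match does not provide it. The function name first_word_report is hypothetical.</P>

```cpp
#include <regex>
#include <sstream>
#include <string>

// Finds the first word in text and describes it using only the
// string-like operations of sub_match.
std::string first_word_report(const std::string& text)
{
   static const std::regex e("(\\w+)");
   std::smatch m;
   if(!std::regex_search(text, m, e))
      return "no word";

   const std::ssub_match& word = m[1];
   std::string copy = word;   // implicit conversion to std::string

   std::ostringstream os;
   os << word;                // streamed out like a string
   os << (word == "abc" ? " is abc" : " is not abc");   // compared to a string
   os << ", length " << copy.size();
   return os.str();
}
```

<P>For the input "@abc def--" the function returns "abc is abc, length 3".</P>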
	111	<H2>Unmatched Sub-Expressions</H2>
	112	<P>When a regular expression match is found, not all of the marked
	113	sub-expressions need to have participated in the match. For example, the expression:</P>
	114	<P>(abc)|(def)</P>
	115	<P>can match either $1 or $2, but never both at the same time.  In
	116	Boost.Regex you can determine which sub-expressions matched by accessing the <A href="sub_match.html#m1">
	117	sub_match::matched</A> data member.</P>
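<P>The check on the matched member might look like the sketch below. It uses std::regex (C++11) as a stand-in so that it builds without Boost; std::sub_match exposes the same matched data member, and the boost:: equivalent is assumed to behave identically. The function name matched_alternative is hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Returns 1 if (abc) took part in the match, 2 if (def) did,
// and 0 if the expression was not found in text at all.
int matched_alternative(const std::string& text)
{
   static const std::regex e("(abc)|(def)");
   std::smatch m;
   if(!std::regex_search(text, m, e))
      return 0;
   if(m[1].matched)   // $1 participated in the match
      return 1;
   if(m[2].matched)   // $2 participated in the match
      return 2;
   return 0;
}
```

<P>A sub-expression that did not participate has matched set to false and converts to an empty string.</P>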
	118	<H2>Repeated Captures</H2>
	119	<P>When a marked sub-expression is repeated, the sub-expression gets
	120	"captured" multiple times. Normally, however, only the final capture is
	121	available. For example, if</P>
	122	<PRE>(?:(\w+)\W*)+</PRE>
	123	<P>is matched against</P>
	124	<PRE>one fine day</PRE>
	125	<P>then $1 will contain the string "day", and all the previous captures will have
	126	been forgotten.</P>
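<P>The "last capture wins" behaviour can be observed with a short sketch. It is written with std::regex (C++11) so that it builds without Boost; the boost:: equivalent is assumed to give the same result. The pattern here ends each repetition with \W* so that the last word need not be followed by a separator. The function name last_word_capture is hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Matches the whole of text as a sequence of words and returns what
// $1 holds afterwards, or an empty string if text does not match.
std::string last_word_capture(const std::string& text)
{
   static const std::regex e("(?:(\\w+)\\W*)+");
   std::smatch m;
   if(!std::regex_match(text, m, e))
      return std::string();
   return m[1].str();   // only the capture from the final repetition survives
}
```

<P>For "one fine day" the function returns "day"; the earlier captures "one" and "fine" are not recoverable from the match_results object.</P>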
	127	<P>However, Boost.Regex has an experimental feature that allows all the capture
	128	information to be retained - this is accessed either via the <A href="match_results.html#m17">
	129	match_results::captures</A> member function or the <A href="sub_match.html#m8">sub_match::captures</A>
	130	member function.  These functions return a container that contains a
	131	sequence of all the captures obtained during the regular expression
	132	matching.  The following example program shows how this information may be
	133	used:</P>
	134	<PRE>#include <boost/regex.hpp>
	135	#include <iostream>
	136	#include <string>
	137	
	138	void print_captures(const std::string& regx, const std::string& text)
	139	{
	140	   boost::regex e(regx);
	141	   boost::smatch what;
	142	   std::cout << "Expression: \"" << regx << "\"\n";
	143	   std::cout << "Text: \"" << text << "\"\n";
	144	   if(boost::regex_match(text, what, e, boost::match_extra)) // match_extra: retain every capture
	145	   {
	146	      unsigned i, j;
	147	      std::cout << " Match found \n Sub-Expressions:\n";
	148	      for(i = 0; i < what.size(); ++i) // final capture of each sub-expression
	149	         std::cout << " $" << i << " = \"" << what[i] << "\"\n";
	150	      std::cout << " Captures:\n";
	151	      for(i = 0; i < what.size(); ++i) // full capture history of each sub-expression
	152	      {
	153	         std::cout << " $" << i << " = {";
	154	         for(j = 0; j < what.captures(i).size(); ++j)
	155	         {
	156	            if(j)
	157	               std::cout << ", ";
	158	            else
	159	               std::cout << " ";
	160	            std::cout << "\"" << what.captures(i)[j] << "\"";
	161	         }
	162	         std::cout << " }\n";
	163	      }
	164	   }
	165	   else
	166	   {
	167	      std::cout << " No Match found \n";
	168	   }
	169	}
	170	
	171	int main(int , char* [])
	172	{
	173	   print_captures("(([[:lower:]]+)|([[:upper:]]+))+", "aBBcccDDDDDeeeeeeee");
	174	   print_captures("(.*)bar|(.*)bah", "abcbar");
	175	   print_captures("(.*)bar|(.*)bah", "abcbah");
	176	   print_captures("^(?:(\\w+)|(?>\\W+))*$", "now is the time for all good men to come to the aid of the party");
	177	   return 0;
	178	}</PRE>
	179	<P>Which produces the following output:</P>
	180	<PRE>Expression: "(([[:lower:]]+)|([[:upper:]]+))+"
	181	Text: "aBBcccDDDDDeeeeeeee"
	182	Match found
	183	Sub-Expressions:
	184	$0 = "aBBcccDDDDDeeeeeeee"
	185	$1 = "eeeeeeee"
	186	$2 = "eeeeeeee"
	187	$3 = "DDDDD"
	188	Captures:
	189	$0 = { "aBBcccDDDDDeeeeeeee" }
	190	$1 = { "a", "BB", "ccc", "DDDDD", "eeeeeeee" }
	191	$2 = { "a", "ccc", "eeeeeeee" }
	192	$3 = { "BB", "DDDDD" }
	193	Expression: "(.*)bar|(.*)bah"
	194	Text: "abcbar"
	195	Match found
	196	Sub-Expressions:
	197	$0 = "abcbar"
	198	$1 = "abc"
	199	$2 = ""
	200	Captures:
	201	$0 = { "abcbar" }
	202	$1 = { "abc" }
	203	$2 = { }
	204	Expression: "(.*)bar|(.*)bah"
	205	Text: "abcbah"
	206	Match found
	207	Sub-Expressions:
	208	$0 = "abcbah"
	209	$1 = ""
	210	$2 = "abc"
	211	Captures:
	212	$0 = { "abcbah" }
	213	$1 = { }
	214	$2 = { "abc" }
	215	Expression: "^(?:(\w+)|(?>\W+))*$"
	216	Text: "now is the time for all good men to come to the aid of the party"
	217	Match found
	218	Sub-Expressions:
	219	$0 = "now is the time for all good men to come to the aid of the party"
	220	$1 = "party"
	221	Captures:
	222	$0 = { "now is the time for all good men to come to the aid of the party" }
	223	$1 = { "now", "is", "the", "time", "for", "all", "good", "men", "to", "come", "to", "the", "aid", "of", "the", "party" }
	224	</PRE>
	225	<P>Unfortunately, enabling this feature has an impact on performance (even if you
	226	don't use it), and a much bigger impact if you do use it. Therefore, to use this
	227	feature you need to:</P>
	228	<UL>
	229	<LI>
	230	Define BOOST_REGEX_MATCH_EXTRA for all translation units, including the library
	231	source (the best way to do this is to uncomment this define in <A href="../../../boost/regex/user.hpp">
	232	boost/regex/user.hpp</A>
	233	and then rebuild everything).</LI>
	234	<LI>
	235	Pass the <A href="match_flag_type.html">match_extra flag</A> to the particular
	236	algorithms where you actually need the captures information (<A href="regex_search.html">regex_search</A>,
	237	<A href="regex_match.html">regex_match</A>, or <A href="regex_iterator.html">regex_iterator</A>).
	238	</LI>
	239	</UL>
	240	<P>
	241	<HR>
	242	<P></P>
	243	<P></P>
	244	<p>Revised
	245	<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
	246	12 Dec 2003
	247	<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
	248	<p><i>© Copyright John Maddock
	249	<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
	250	<P><I>Use, modification and distribution are subject to the Boost Software License,
	251	Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
	252	or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
	253	</body>
	254	</html>
