1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> |
---|
2 | <html> |
---|
3 | <head> |
---|
4 | <title>Boost.Regex: POSIX API Compatibility Functions</title> |
---|
5 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
---|
6 | <link rel="stylesheet" type="text/css" href="../../../boost.css"> |
---|
7 | </head> |
---|
8 | <body> |
---|
9 | <P> |
---|
10 | <TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0"> |
---|
11 | <TR> |
---|
12 | <td valign="top" width="300"> |
---|
13 | <h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../boost.png" border="0"></a></h3> |
---|
14 | </td> |
---|
15 | <TD width="353"> |
---|
16 | <H1 align="center">Boost.Regex</H1> |
---|
17 | <H2 align="center">POSIX API Compatibility Functions</H2> |
---|
18 | </TD> |
---|
19 | <td width="50"> |
---|
20 | <h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3> |
---|
21 | </td> |
---|
22 | </TR> |
---|
23 | </TABLE> |
---|
24 | </P> |
---|
25 | <HR> |
---|
26 | <p></p> |
---|
27 | <PRE>#include <boost/cregex.hpp> |
---|
28 | <I>or</I>: |
---|
29 | #include <boost/regex.h></PRE> |
---|
30 | <P>The following functions are available for users who need a POSIX-compatible C |
---|
31 | library. They are provided in both Unicode and narrow character versions; the |
---|
32 | standard POSIX API names are macros that expand to one version or the other, |
---|
33 | depending on whether UNICODE is defined. |
---|
34 | </P> |
---|
35 | <P><B>Important</B>: All the symbols defined here are enclosed inside |
---|
36 | namespace <I>boost</I> when used in C++ programs. If you use #include |
---|
37 | <boost/regex.h> instead, the symbols are still defined in |
---|
38 | namespace boost, but are also made available in the global namespace.</P> |
---|
39 | <P>The functions are defined as: |
---|
40 | </P> |
---|
41 | <PRE>extern "C" { |
---|
42 | <B>int</B> regcompA(regex_tA*, <B>const</B> <B>char</B>*, <B>int</B>); |
---|
43 | <B>unsigned</B> <B>int</B> regerrorA(<B>int</B>, <B>const</B> regex_tA*, <B>char</B>*, <B>unsigned</B> <B>int</B>); |
---|
44 | <B>int</B> regexecA(<B>const</B> regex_tA*, <B>const</B> <B>char</B>*, <B>unsigned</B> <B>int</B>, regmatch_t*, <B>int</B>); |
---|
45 | <B>void</B> regfreeA(regex_tA*); |
---|
46 | |
---|
47 | <B>int</B> regcompW(regex_tW*, <B>const</B> <B>wchar_t</B>*, <B>int</B>); |
---|
48 | <B>unsigned</B> <B>int</B> regerrorW(<B>int</B>, <B>const</B> regex_tW*, <B>wchar_t</B>*, <B>unsigned</B> <B>int</B>); |
---|
49 | <B>int</B> regexecW(<B>const</B> regex_tW*, <B>const</B> <B>wchar_t</B>*, <B>unsigned</B> <B>int</B>, regmatch_t*, <B>int</B>); |
---|
50 | <B>void</B> regfreeW(regex_tW*); |
---|
51 | |
---|
52 | #ifdef UNICODE |
---|
53 | #define regcomp regcompW |
---|
54 | #define regerror regerrorW |
---|
55 | #define regexec regexecW |
---|
56 | #define regfree regfreeW |
---|
57 | #define regex_t regex_tW |
---|
58 | #else |
---|
59 | #define regcomp regcompA |
---|
60 | #define regerror regerrorA |
---|
61 | #define regexec regexecA |
---|
62 | #define regfree regfreeA |
---|
63 | #define regex_t regex_tA |
---|
64 | #endif |
---|
65 | }</PRE> |
---|
66 | <P>All the functions operate on structure <B>regex_t</B>, which exposes two public |
---|
67 | members: |
---|
68 | </P> |
---|
69 | <P><B>unsigned int re_nsub</B> is filled in by <B>regcomp</B> and holds |
---|
70 | the number of sub-expressions contained in the regular expression. |
---|
71 | </P> |
---|
72 | <P><B>const TCHAR* re_endp</B> points to the end of the expression to compile when |
---|
73 | the flag REG_PEND is set. |
---|
74 | </P> |
---|
75 | <P><I>Footnote: regex_t is actually a #define; it is either regex_tA or regex_tW |
---|
76 | depending on whether UNICODE is defined. Likewise, TCHAR is either char or |
---|
77 | wchar_t, again depending on the macro UNICODE.</I> |
---|
78 | </P> |
---|
79 | <H3>regcomp</H3> |
---|
80 | <P><B>regcomp</B> takes a pointer to a <B>regex_t</B>, a pointer to the expression |
---|
81 | to compile and a flags parameter which can be a combination of: |
---|
82 | <BR> |
---|
83 | |
---|
84 | </P> |
---|
85 | <P> |
---|
86 | <TABLE id="Table2" cellSpacing="0" cellPadding="7" width="100%" border="0"> |
---|
87 | <TR> |
---|
88 | <TD width="5%"> </TD> |
---|
89 | <TD vAlign="top" width="45%">REG_EXTENDED</TD> |
---|
90 | <TD vAlign="top" width="45%">Compiles modern regular expressions. Equivalent to |
---|
91 | regbase::char_classes | regbase::intervals | regbase::bk_refs.</TD> |
---|
92 | <TD width="5%"> </TD> |
---|
93 | </TR> |
---|
94 | <TR> |
---|
95 | <TD width="5%"> </TD> |
---|
96 | <TD vAlign="top" width="45%">REG_BASIC</TD> |
---|
97 | <TD vAlign="top" width="45%">Compiles basic (obsolete) regular expression syntax. |
---|
98 | Equivalent to regbase::char_classes | regbase::intervals | regbase::limited_ops |
---|
99 | | regbase::bk_braces | regbase::bk_parens | regbase::bk_refs.</TD> |
---|
100 | <TD width="5%"> </TD> |
---|
101 | </TR> |
---|
102 | <TR> |
---|
103 | <TD width="5%"> </TD> |
---|
104 | <TD vAlign="top" width="45%">REG_NOSPEC</TD> |
---|
105 | <TD vAlign="top" width="45%">All characters are ordinary; the expression is a |
---|
106 | literal string.</TD> |
---|
107 | <TD width="5%"> </TD> |
---|
108 | </TR> |
---|
109 | <TR> |
---|
110 | <TD width="5%"> </TD> |
---|
111 | <TD vAlign="top" width="45%">REG_ICASE</TD> |
---|
112 | <TD vAlign="top" width="45%">Compiles for matching that ignores character case.</TD> |
---|
113 | <TD width="5%"> </TD> |
---|
114 | </TR> |
---|
115 | <TR> |
---|
116 | <TD width="5%"> </TD> |
---|
117 | <TD vAlign="top" width="45%">REG_NOSUB</TD> |
---|
118 | <TD vAlign="top" width="45%">Has no effect in this library.</TD> |
---|
119 | <TD width="5%"> </TD> |
---|
120 | </TR> |
---|
121 | <TR> |
---|
122 | <TD width="5%"> </TD> |
---|
123 | <TD vAlign="top" width="45%">REG_NEWLINE</TD> |
---|
124 | <TD vAlign="top" width="45%">When this flag is set a dot does not match the |
---|
125 | newline character.</TD> |
---|
126 | <TD width="5%"> </TD> |
---|
127 | </TR> |
---|
128 | <TR> |
---|
129 | <TD width="5%"> </TD> |
---|
130 | <TD vAlign="top" width="45%">REG_PEND</TD> |
---|
131 | <TD vAlign="top" width="45%">When this flag is set the re_endp member of the |
---|
132 | regex_t structure must point to the end of the regular expression to compile.</TD> |
---|
133 | <TD width="5%"> </TD> |
---|
134 | </TR> |
---|
135 | <TR> |
---|
136 | <TD width="5%"> </TD> |
---|
137 | <TD vAlign="top" width="45%">REG_NOCOLLATE</TD> |
---|
138 | <TD vAlign="top" width="45%">When this flag is set then locale dependent collation |
---|
139 | for character ranges is turned off.</TD> |
---|
140 | <TD width="5%"> </TD> |
---|
141 | </TR> |
---|
142 | <TR> |
---|
143 | <TD width="5%"> </TD> |
---|
144 | <TD vAlign="top" width="45%">REG_ESCAPE_IN_LISTS</TD> |
---|
147 | <TD vAlign="top" width="45%">When this flag is set, then escape sequences are |
---|
148 | permitted in bracket expressions (character sets).</TD> |
---|
149 | <TD width="5%"> </TD> |
---|
150 | </TR> |
---|
151 | <TR> |
---|
152 | <TD width="5%"> </TD> |
---|
153 | <TD vAlign="top" width="45%">REG_NEWLINE_ALT </TD> |
---|
154 | <TD vAlign="top" width="45%">When this flag is set then the newline character is |
---|
155 | equivalent to the alternation operator |.</TD> |
---|
156 | <TD width="5%"> </TD> |
---|
157 | </TR> |
---|
158 | <TR> |
---|
159 | <TD width="5%"> </TD> |
---|
160 | <TD vAlign="top" width="45%">REG_PERL </TD> |
---|
161 | <TD vAlign="top" width="45%">Compiles Perl like regular expressions.</TD> |
---|
162 | <TD width="5%"> </TD> |
---|
163 | </TR> |
---|
164 | <TR> |
---|
165 | <TD width="5%"> </TD> |
---|
166 | <TD vAlign="top" width="45%">REG_AWK</TD> |
---|
167 | <TD vAlign="top" width="45%">A shortcut for awk-like behavior: REG_EXTENDED | |
---|
168 | REG_ESCAPE_IN_LISTS</TD> |
---|
169 | <TD width="5%"> </TD> |
---|
170 | </TR> |
---|
171 | <TR> |
---|
172 | <TD width="5%"> </TD> |
---|
173 | <TD vAlign="top" width="45%">REG_GREP</TD> |
---|
174 | <TD vAlign="top" width="45%">A shortcut for grep like behavior: REG_BASIC | |
---|
175 | REG_NEWLINE_ALT</TD> |
---|
176 | <TD width="5%"> </TD> |
---|
177 | </TR> |
---|
178 | <TR> |
---|
179 | <TD width="5%"> </TD> |
---|
180 | <TD vAlign="top" width="45%">REG_EGREP</TD> |
---|
181 | <TD vAlign="top" width="45%"> A shortcut for egrep like behavior: |
---|
182 | REG_EXTENDED | REG_NEWLINE_ALT</TD> |
---|
183 | <TD width="5%"> </TD> |
---|
184 | </TR> |
---|
185 | </TABLE> |
---|
186 | </P> |
---|
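<P>As a rough illustration of the compile step described above, the sketch below compiles a pattern, reads <B>re_nsub</B>, and releases the expression with <B>regfree</B>. To stay self-contained it assumes the system's POSIX <regex.h> header; with this library the include would be <boost/regex.h> instead. The helper name count_subexpressions is hypothetical, and only the standard flags REG_EXTENDED and REG_ICASE are exercised.</P>

```c
#include <regex.h>  /* assumption: system POSIX header; with Boost use <boost/regex.h> */

/* Hypothetical helper: compiles `pattern` with `cflags` and returns the
   number of marked sub-expressions, or -1 if the pattern does not compile. */
int count_subexpressions(const char *pattern, int cflags)
{
    regex_t re;
    int nsub;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;              /* compilation failed */

    nsub = (int)re.re_nsub;     /* filled in by regcomp */
    regfree(&re);               /* release the memory allocated by regcomp */
    return nsub;
}
```

<P>A call such as count_subexpressions("(a)(b(c))", REG_EXTENDED) compiles the pattern once and discards it; real code would normally keep the compiled regex_t around for repeated matching.</P>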
187 | <H3>regerror</H3> |
---|
188 | <P><B>regerror</B> maps an error code to a human-readable |
---|
189 | string. It takes the following parameters: |
---|
190 | <BR> |
---|
191 | </P> |
---|
192 | <P> |
---|
193 | <TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0"> |
---|
194 | <TR> |
---|
195 | <TD width="5%"> </TD> |
---|
196 | <TD vAlign="top" width="50%">int code</TD> |
---|
197 | <TD vAlign="top" width="50%">The error code.</TD> |
---|
198 | <TD width="5%"> </TD> |
---|
199 | </TR> |
---|
200 | <TR> |
---|
201 | <TD> </TD> |
---|
202 | <TD vAlign="top" width="50%">const regex_t* e</TD> |
---|
203 | <TD vAlign="top" width="50%">The regular expression (can be null).</TD> |
---|
204 | <TD> </TD> |
---|
205 | </TR> |
---|
206 | <TR> |
---|
207 | <TD> </TD> |
---|
208 | <TD vAlign="top" width="50%">char* buf</TD> |
---|
209 | <TD vAlign="top" width="50%">The buffer to fill in with the error message.</TD> |
---|
210 | <TD> </TD> |
---|
211 | </TR> |
---|
212 | <TR> |
---|
213 | <TD> </TD> |
---|
214 | <TD vAlign="top" width="50%">unsigned int buf_size</TD> |
---|
215 | <TD vAlign="top" width="50%">The length of buf.</TD> |
---|
216 | <TD> </TD> |
---|
217 | </TR> |
---|
218 | </TABLE> |
---|
219 | </P> |
---|
220 | <P>If the error code is OR'ed with REG_ITOA then the result is the |
---|
221 | printable name of the code rather than a message, for example "REG_BADPAT". If |
---|
222 | the code is REG_ATOI then <B>e</B> must not be null and <B>e->re_endp</B> must |
---|
223 | point to the printable name of an error code; the return value is then the |
---|
224 | value of the error code. For any other value of <B>code</B>, the return value |
---|
225 | is the number of characters in the error message. If the return value is |
---|
226 | greater than or equal to <B>buf_size</B> then <B>regerror</B> must be |
---|
227 | called again with a larger buffer.</P> |
---|
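<P>The "call again with a larger buffer" rule can be sketched roughly as follows. This is an illustration under assumptions, not part of the library: it includes the system's POSIX <regex.h> (with this library, <boost/regex.h>), the helper name compile_error_text is hypothetical, and the 16-byte first buffer is deliberately small so that the retry path is likely to be taken.</P>

```c
#include <regex.h>   /* assumption: system POSIX header; with Boost use <boost/regex.h> */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns NULL if `pattern` compiles, otherwise a
   malloc'ed copy of the error message (the caller must free it). */
char *compile_error_text(const char *pattern)
{
    regex_t re;
    char small[16];
    char *msg;
    size_t n;
    int code = regcomp(&re, pattern, REG_EXTENDED);

    if (code == 0) {
        regfree(&re);
        return NULL;
    }

    /* First attempt with a small buffer; the return value tells us
       whether that buffer was large enough. */
    n = regerror(code, NULL, small, sizeof small);
    if (n >= sizeof small) {
        /* Too small: retry with a buffer sized from the return value. */
        msg = malloc(n + 1);
        if (msg != NULL)
            regerror(code, NULL, msg, n + 1);
        return msg;
    }

    /* The message fitted; hand back a heap copy for a uniform interface. */
    msg = malloc(strlen(small) + 1);
    if (msg != NULL)
        strcpy(msg, small);
    return msg;
}
```

<P>The regular expression argument is passed as null here, which the parameter table above permits. The exact message text depends on the implementation, so callers should not compare it against fixed strings.</P>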
228 | <H3>regexec</H3> |
---|
229 | <P><B>regexec</B> finds the first occurrence of expression <B>e</B> within string <B>buf</B>. |
---|
230 | If <B>len</B> is non-zero then *<B>m</B> is filled in with what matched the |
---|
231 | regular expression: <B>m[0]</B> describes what matched the whole expression, <B>m[1]</B> |
---|
232 | the first sub-expression, and so on; see <B>regmatch_t</B> in the header file |
---|
233 | declaration for more details. The <B>eflags</B> parameter can be a combination |
---|
234 | of: |
---|
235 | <BR> |
---|
236 | |
---|
237 | </P> |
---|
238 | <P> |
---|
239 | <TABLE id="Table4" cellSpacing="0" cellPadding="7" width="100%" border="0"> |
---|
240 | <TR> |
---|
241 | <TD width="5%"> </TD> |
---|
242 | <TD vAlign="top" width="50%">REG_NOTBOL</TD> |
---|
243 | <TD vAlign="top" width="50%">Parameter <B>buf </B>does not represent the start of |
---|
244 | a line.</TD> |
---|
245 | <TD width="5%"> </TD> |
---|
246 | </TR> |
---|
247 | <TR> |
---|
248 | <TD> </TD> |
---|
249 | <TD vAlign="top" width="50%">REG_NOTEOL</TD> |
---|
250 | <TD vAlign="top" width="50%">Parameter <B>buf</B> does not terminate at the end of |
---|
251 | a line.</TD> |
---|
252 | <TD> </TD> |
---|
253 | </TR> |
---|
254 | <TR> |
---|
255 | <TD> </TD> |
---|
256 | <TD vAlign="top" width="50%">REG_STARTEND</TD> |
---|
257 | <TD vAlign="top" width="50%">The string searched starts at buf + m[0].rm_so |
---|
258 | and ends at buf + m[0].rm_eo.</TD> |
---|
259 | <TD> </TD> |
---|
260 | </TR> |
---|
261 | </TABLE> |
---|
262 | </P> |
---|
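<P>To show how the match array is typically consumed, here is a minimal sketch that searches for a "key=value" pair and copies out the two sub-expressions using the rm_so and rm_eo offsets. It assumes the system's POSIX <regex.h> header (with this library, <boost/regex.h>); the helper name find_pair and the pattern are purely illustrative.</P>

```c
#include <regex.h>   /* assumption: system POSIX header; with Boost use <boost/regex.h> */
#include <stdio.h>

/* Hypothetical helper: finds the first "key=value" pair in `text`
   (lower-case key, numeric value) and copies both parts into buffers of
   `cap` bytes each. Returns 1 on a match and 0 otherwise. */
int find_pair(const char *text, char *key, char *value, size_t cap)
{
    regex_t re;
    regmatch_t m[3];   /* m[0]: whole match, m[1]: key, m[2]: value */
    int found = 0;

    if (regcomp(&re, "([a-z]+)=([0-9]+)", REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, text, 3, m, 0) == 0) {
        /* rm_so and rm_eo are offsets from the start of `text`. */
        snprintf(key, cap, "%.*s",
                 (int)(m[1].rm_eo - m[1].rm_so), text + m[1].rm_so);
        snprintf(value, cap, "%.*s",
                 (int)(m[2].rm_eo - m[2].rm_so), text + m[2].rm_so);
        found = 1;
    }

    regfree(&re);
    return found;
}
```

<P>Passing 3 as the array length asks for the whole match plus two sub-expressions; a smaller length would simply leave the later entries unfilled.</P>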
263 | <H3>regfree</H3> |
---|
264 | <P>Finally <B>regfree</B> frees all the memory that was allocated by regcomp. |
---|
265 | </P> |
---|
266 | <P><I>Footnote: this is an abridged reference to the POSIX API functions. It is |
---|
267 | provided for compatibility with other libraries, rather than as an API to be used |
---|
268 | in new code (unless you need access from a language other than C++). This |
---|
269 | version of these functions should also coexist with other versions, as |
---|
270 | the names used are macros that expand to the actual function names.</I> |
---|
271 | </P> |
---|
272 | <HR> |
---|
273 | <P></P> |
---|
274 | <p>Revised |
---|
275 | <!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan --> |
---|
276 | 24 Oct 2003 |
---|
277 | <!--webbot bot="Timestamp" endspan i-checksum="39359" --></p> |
---|
278 | <p><i>© Copyright John Maddock 1998- |
---|
279 | <!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> |
---|
280 | 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p> |
---|
281 | <P><I>Use, modification and distribution are subject to the Boost Software License, |
---|
282 | Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A> |
---|
283 | or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P> |
---|
284 | </body> |
---|
285 | </html> |
---|
286 | |
---|