| 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> |
---|
| 2 | <html> |
---|
| 3 | <head> |
---|
| 4 | <title>Boost.Regex: FAQ</title> |
---|
| 5 | <meta name="generator" content="HTML Tidy, see www.w3.org"> |
---|
| 6 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
---|
| 7 | <link rel="stylesheet" type="text/css" href="../../../boost.css"> |
---|
| 8 | </head> |
---|
| 9 | <body> |
---|
| 10 | <p></p> |
---|
| 11 | <table id="Table1" cellspacing="1" cellpadding="1" width="100%" border="0"> |
---|
| 12 | <tr> |
---|
| 13 | <td valign="top" width="300"> |
---|
| 14 | <h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../boost.png" border="0"></a></h3> |
---|
| 15 | </td> |
---|
| 16 | <td width="353"> |
---|
| 17 | <h1 align="center">Boost.Regex</h1> |
---|
| 18 | <h2 align="center">FAQ</h2> |
---|
| 19 | </td> |
---|
| 20 | <td width="50"> |
---|
| 21 | <h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3> |
---|
| 22 | </td> |
---|
| 23 | </tr> |
---|
| 24 | </table> |
---|
| 25 | <br> |
---|
| 26 | <br> |
---|
| 27 | <hr> |
---|
| 28 | <font color="#ff0000"><font color="#ff0000"></font></font> |
---|
| 29 | <p><font color="#ff0000"><font color="#ff0000"><font color="#ff0000"> Q. Why can't I |
---|
| 30 | use the "convenience" versions of regex_match / regex_search / regex_grep / |
---|
| 31 | regex_format / regex_merge?</font></font></font></p> |
---|
| 32 | <p>A. These versions may or may not be available, depending upon the capabilities |
---|
| 33 | of your compiler. The rules determining the format of these functions are quite |
---|
| 34 | complex, and only the versions visible to a standard-compliant compiler are |
---|
| 35 | given in the help. To find out what your compiler supports, run |
---|
| 36 | <boost/regex.hpp> through your C++ pre-processor, and search the output |
---|
| 37 | file for the function that you are interested in.<font color="#ff0000"><font color="#ff0000"></font></font></p> |
---|
| 38 | <p><font color="#ff0000"><font color="#ff0000">Q. I can't get regex++ to work with |
---|
| 39 | escape characters, what's going on?</font></font></p> |
---|
| 40 | <p>A. If you embed regular expressions in C++ code, then remember that escape |
---|
| 41 | characters are processed twice: once by the C++ compiler, and once by the |
---|
| 42 | regex++ expression compiler, so to pass the regular expression \d+ to regex++, |
---|
| 43 | you need to embed "\\d+" in your code. Likewise to match a literal backslash |
---|
| 44 | you will need to embed "\\\\" in your code. <font color="#ff0000"></font> |
---|
| 45 | </p> |
---|
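<p>The double processing described above can be sketched as follows. This is a minimal illustration, not library code: it assumes std::regex from the standard regex header as a stand-in for boost::regex (the string-literal escaping is handled by the C++ compiler and is the same for both), and the helper names is_all_digits and is_single_backslash are hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex. Either engine only sees
// the pattern after the C++ compiler has processed the string literal.

// Returns true if `text` consists entirely of one or more digits.
bool is_all_digits(const std::string& text) {
    // "\\d+" in source code reaches the regex engine as the three characters \d+
    static const std::regex digits("\\d+");
    return std::regex_match(text, digits);
}

// Returns true if `text` is exactly one backslash character.
bool is_single_backslash(const std::string& text) {
    // "\\\\" in source code reaches the regex engine as \\,
    // which the engine reads as one literal backslash.
    static const std::regex backslash("\\\\");
    return std::regex_match(text, backslash);
}
```

<p>Note that the subject string follows the same C++ rule: the literal "\\" passed to is_single_backslash is a single backslash character at run time.</p>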
| 46 | <p><font color="#ff0000">Q. Why does using parentheses in a POSIX regular expression |
---|
| 47 | change the result of a match?</font></p> |
---|
| 48 | <p>A. For POSIX (extended and basic) regular expressions, but not for Perl regexes, |
---|
| 49 | parentheses don't only mark sub-expressions; they also determine what the best match is. |
---|
| 50 | When the expression is compiled as a POSIX basic or extended regex, |
---|
| 51 | Boost.Regex follows the POSIX standard leftmost-longest rule for determining |
---|
| 52 | what matched. If there is more than one possible match after considering the |
---|
| 53 | whole expression, it looks next at the first sub-expression, then the second |
---|
| 54 | sub-expression, and so on. For example:</p> |
---|
| 55 | <pre> |
---|
| 56 | "(0*)([0-9]*)" against "00123" would produce |
---|
| 57 | $1 = "00" |
---|
| 58 | $2 = "123" |
---|
| 59 | </pre> |
---|
| 60 | <p>whereas</p> |
---|
| 61 | <pre> |
---|
| 62 | "0*([0-9]*)" against "00123" would produce |
---|
| 63 | $1 = "00123" |
---|
| 64 | </pre> |
---|
| 65 | <p>If you think about it, had $1 only matched the "123", this would be "less good" |
---|
| 66 | than the match "00123" which is both further to the left and longer. If you |
---|
| 67 | want $1 to match only the "123" part, then you need to use something like:</p> |
---|
| 68 | <pre> |
---|
| 69 | "0*([1-9][0-9]*)" |
---|
| 70 | </pre> |
---|
| 71 | <p>as the expression.</p> |
---|
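<p>The suggested expression can be tried out with a short sketch. It assumes std::regex with its default grammar as a stand-in for boost::regex, so it does not exercise the POSIX leftmost-longest rule itself; it only shows that "0*([1-9][0-9]*)" leaves a single way to split the input, because [1-9] cannot consume a leading zero. The helper name significant_digits is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text captured by $1 when the whole of
// `text` matches "0*([1-9][0-9]*)", or an empty string when it does not match.
std::string significant_digits(const std::string& text) {
    // Assumption: std::regex (default grammar) stands in for boost::regex.
    static const std::regex expr("0*([1-9][0-9]*)");
    std::smatch what;
    if (std::regex_match(text, what, expr)) {
        return what[1].str();  // leading zeros are consumed by 0*, not by $1
    }
    return std::string();
}
```

<p>With the input "00123", the leading zeros can only be matched by 0*, so $1 holds "123".</p>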
| 72 | <p><font color="#ff0000">Q. Why don't character ranges work properly (POSIX mode |
---|
| 73 | only)?</font><br> |
---|
| 74 | A. The POSIX standard specifies that character range expressions are locale |
---|
| 75 | sensitive - so for example the expression [A-Z] will match any collating |
---|
| 76 | element that collates between 'A' and 'Z'. That means that for most locales |
---|
| 77 | other than "C" or "POSIX", [A-Z] would match the single character 't' for |
---|
| 78 | example, which is not what most people expect - or at least not what most |
---|
| 79 | people have come to expect from regular expression engines. For this reason, |
---|
| 80 | the default behaviour of Boost.Regex (Perl mode) is to turn locale-sensitive |
---|
| 81 | collation off by not setting the regex_constants::collate compile time flag. |
---|
| 82 | However, if you set a non-default compile time flag, for example |
---|
| 83 | regex_constants::extended or regex_constants::basic, then locale-dependent |
---|
| 84 | collation will be enabled. This also applies to the POSIX API functions, which |
---|
| 85 | use either regex_constants::extended or regex_constants::basic internally. <i>[Note |
---|
| 86 | - when regex_constants::nocollate is in effect, the library behaves "as if" the |
---|
| 87 | LC_COLLATE locale category were always "C", regardless of what it is actually set |
---|
| 88 | to - end note]</i>.</p> |
---|
| 89 | <p><font color="#ff0000">Q. Why are there no throw specifications on any of the |
---|
| 90 | functions? What exceptions can the library throw?</font></p> |
---|
| 91 | <p>A. Not all compilers support (or honor) throw specifications; others support |
---|
| 92 | them but with reduced efficiency. Throw specifications may be added at a later |
---|
| 93 | date as compilers begin to handle this better. The library should throw only |
---|
| 94 | three types of exception: boost::bad_expression can be thrown by basic_regex |
---|
| 95 | when compiling a regular expression; std::runtime_error can be thrown when a |
---|
| 96 | call to basic_regex::imbue tries to open a message catalogue that doesn't |
---|
| 97 | exist, when a call to regex_search or regex_match results in an |
---|
| 98 | "everlasting" search, or when a call to RegEx::GrepFiles or |
---|
| 99 | RegEx::FindFiles tries to open a file that cannot be opened; finally, |
---|
| 100 | std::bad_alloc can be thrown by just about any of the functions in this |
---|
| 101 | library.</p> |
---|
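<p>Handling the first of these exceptions might look like the sketch below. It is an illustration under stated assumptions: std::regex and std::regex_error from the standard library stand in for basic_regex and boost::bad_expression, and the helper name compiles is hypothetical. With Boost.Regex the catch clause would name the exception type given in the answer above.</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: reports whether `pattern` can be compiled.
// Assumption: std::regex_error plays the role that boost::bad_expression
// plays when basic_regex fails to compile an expression.
bool compiles(const std::string& pattern) {
    try {
        std::regex expr(pattern);  // compilation happens in the constructor
        (void)expr;                // only the success of construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;              // malformed expression, e.g. an unmatched "("
    }
}
```

<p>Catching by const reference around the constructor keeps the failure local to the point where the expression is compiled. Memory exhaustion (std::bad_alloc) is deliberately not caught here.</p>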
| 102 | <p></p> |
---|
| 103 | <hr> |
---|
| 104 | <p>Revised |
---|
| 105 | <!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan --> |
---|
| 106 | 24 Oct 2003 |
---|
| 107 | <!--webbot bot="Timestamp" endspan i-checksum="39359" --></p> |
---|
| 108 | <p><i>© Copyright John Maddock 1998- |
---|
| 109 | <!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p> |
---|
| 110 | <P><I>Use, modification and distribution are subject to the Boost Software License, |
---|
| 111 | Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A> |
---|
| 112 | or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P> |
---|
| 113 | </body> |
---|
| 114 | </html> |
---|