<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: Index</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../boost.png" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">Unicode Regular Expressions.</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<p></p>
<P>There are two ways to use Boost.Regex with Unicode strings:</P>
<H3>Rely on wchar_t</H3>
<P>If your platform's wchar_t type can hold Unicode characters, <EM>and</EM> your
platform's C/C++ runtime correctly handles wide characters (when they are
passed to std::iswspace, std::iswlower, etc.), then you can use boost::wregex to
process Unicode. However, there are several disadvantages to this
approach:</P>
<UL>
<LI>
It's not portable: there's no guarantee on the width of wchar_t, or even
whether the runtime treats wide characters as Unicode at all. Most Windows
compilers do so, but many Unix systems do not.</LI>
<LI>
There's no support for Unicode-specific character classes such as [[:Nd:]] or
[[:Po:]].</LI>
<LI>
You can only search strings that are encoded as sequences of wide characters;
it is not possible to search UTF-8, or even UTF-16 on many platforms.</LI></UL>
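<P>The portability concern in the list above can be probed with a few lines of
standard C++. The following is only a minimal sketch: it does not use
Boost.Regex, and the helper names (fits_in_wchar, wchar_holds_all_of_unicode,
wchar_is_documented_as_unicode, runtime_reports_lower) are hypothetical names
invented for this illustration. A positive result does not guarantee that
boost::wregex will handle all Unicode text correctly on a given platform.</P>

```cpp
#include <cassert>
#include <cwctype>
#include <limits>

// Hypothetical helper (not part of Boost.Regex): can a single wchar_t
// store the given Unicode code point?
bool fits_in_wchar(unsigned long code_point)
{
    assert(code_point <= 0x10FFFFul); // U+10FFFF is the largest code point
    return code_point <=
        static_cast<unsigned long>(std::numeric_limits<wchar_t>::max());
}

// True when every Unicode code point fits in one wchar_t. This is typically
// false where wchar_t is 16 bits wide and true where it is 32 bits wide.
bool wchar_holds_all_of_unicode()
{
    return fits_in_wchar(0x10FFFFul);
}

// True when the implementation defines __STDC_ISO_10646__, that is, when it
// documents that wchar_t values are ISO/IEC 10646 (Unicode) code points.
bool wchar_is_documented_as_unicode()
{
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}

// Asks the C runtime how it classifies a wide character. For characters
// outside ASCII the answer depends on the runtime and the current locale.
bool runtime_reports_lower(std::wint_t c)
{
    return std::iswlower(c) != 0;
}
```

<P>A program could call wchar_holds_all_of_unicode() and
wchar_is_documented_as_unicode() at start-up and fall back to a different
strategy when either returns false. Calling runtime_reports_lower with a few
non-ASCII characters gives a rough indication of how the runtime classifies
them, but it is not an exhaustive test.</P>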
<H3>Use a Unicode-Aware Regular Expression Type</H3>
<P>If you have the <A href="http://www.ibm.com/software/globalization/icu/">ICU
library</A>, then Boost.Regex can be <A href="install.html#unicode">configured
to make use of it</A> and to provide a distinct regular expression type
(boost::u32regex) that supports both Unicode-specific character properties
and the searching of text that is encoded in UTF-8, UTF-16, or
UTF-32. See: <A href="icu_strings.html">ICU string class support</A>.</P>
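<P>To make the difference between the three encodings concrete, the sketch
below stores the same two characters, U+00E9 and U+1D11E, as UTF-8, UTF-16 and
UTF-32, then counts code units and code points. It uses only standard C++11
and neither Boost.Regex nor ICU; the variable and function names are
hypothetical and chosen for this illustration only. It shows why an
expression type that works on code points can treat all three encodings
alike, whereas a type that works on raw code units cannot.</P>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The same text in three encodings. The UTF-8 bytes are written as escapes
// so that the example does not depend on the encoding of the source file.
const std::string    utf8_sample  = "\xC3\xA9\xF0\x9D\x84\x9E";
const std::u16string utf16_sample = u"\u00E9\U0001D11E";
const std::u32string utf32_sample = U"\u00E9\U0001D11E";

// Counts code points in UTF-8 by skipping continuation bytes (10xxxxxx).
// Assumes well-formed input; a real decoder would validate it.
std::size_t count_code_points_utf8(const std::string& s)
{
    std::size_t n = 0;
    for (char c : s)
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80)
            ++n;
    return n;
}

// Counts code points in UTF-16 by skipping low surrogates (0xDC00-0xDFFF).
// Assumes well-formed input with correctly paired surrogates.
std::size_t count_code_points_utf16(const std::u16string& s)
{
    std::size_t n = 0;
    for (char16_t u : s)
        if ((u & 0xFC00u) != 0xDC00u)
            ++n;
    return n;
}

// The strings differ in length but hold the same two code points.
void verify_samples()
{
    assert(utf8_sample.size() == 6);   // 2 bytes + 4 bytes
    assert(utf16_sample.size() == 3);  // 1 unit + a surrogate pair
    assert(utf32_sample.size() == 2);  // 1 unit per code point
    assert(count_code_points_utf8(utf8_sample) == utf32_sample.size());
    assert(count_code_points_utf16(utf16_sample) == utf32_sample.size());
}
```

<P>Calling verify_samples() checks that the three strings differ in length
while containing the same number of code points.</P>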
<P>
<HR>
</P>
<P></P>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
04 Jan 2005
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
<p><i>© Copyright John Maddock 2005</i></p>
<P><I>Use, modification and distribution are subject to the Boost Software License,
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
</body>
</html>