<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: Unicode Regular Expressions</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../boost.png" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">Unicode Regular Expressions.</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<p></p>
<P>There are two ways to use Boost.Regex with Unicode strings:</P>
<H3>Rely on wchar_t</H3>
<P>If your platform's wchar_t type can hold Unicode strings, <EM>and</EM> your
platform's C/C++ runtime correctly handles wide character constants (when
passed to std::iswspace, std::iswlower, etc.), then you can use boost::wregex to
process Unicode. However, this approach has several disadvantages:</P>
<UL>
<LI>
It's not portable: there's no guarantee about the width of wchar_t, or even
about whether the runtime treats wide characters as Unicode at all. Most Windows
compilers do so, but many Unix systems do not.</LI>
<LI>
There's no support for Unicode-specific character classes such as [[:Nd:]] and
[[:Po:]].</LI>
<LI>
You can only search strings that are encoded as sequences of wide characters;
it is not possible to search UTF-8, or even UTF-16 on many platforms.</LI></UL>
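<P>The preconditions described above can be partly checked in code. The following is a minimal sketch, not part of Boost.Regex: the helper names wchar_t_holds_any_code_point and wide_search are hypothetical, and the sketch uses the standard library's std::wregex instead of boost::wregex so that it builds without Boost (boost::wregex and boost::regex_search from &lt;boost/regex.hpp&gt; follow the same pattern). Only the width of wchar_t is tested; whether std::iswlower and similar functions classify non-ASCII characters correctly depends on the runtime and the current locale, and cannot be decided at compile time.</P>

```cpp
#include <cwchar>   // WCHAR_MAX
#include <regex>    // std::wregex, std::regex_search
#include <string>

// Hypothetical helper: true if a single wchar_t can represent every Unicode
// code point (U+0000..U+10FFFF), i.e. wchar_t is wide enough for UTF-32.
// This is typically false where wchar_t is 16 bits wide (e.g. Windows).
constexpr bool wchar_t_holds_any_code_point() {
    return WCHAR_MAX >= 0x10FFFF;
}

// Hypothetical helper: searches wide text for a pattern. Each wchar_t is
// treated as one character; classes such as [[:lower:]] depend on the
// runtime and the current locale, not on Unicode properties.
bool wide_search(const std::wstring& text, const std::wstring& pattern) {
    const std::wregex re(pattern);
    return std::regex_search(text, re);
}
```

<P>On a platform where wchar_t_holds_any_code_point() is false, characters outside the Basic Multilingual Plane occupy two wchar_t units (a surrogate pair), so a pattern such as "." matches only half of such a character.</P>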
<H3>Use a Unicode-Aware Regular Expression Type</H3>
<P>If you have the <A href="http://www.ibm.com/software/globalization/icu/">ICU
library</A>, then Boost.Regex can be <A href="install.html#unicode">configured
to make use of it</A>, in which case it provides a distinct regular expression
type, boost::u32regex, that supports both Unicode-specific character properties
and the searching of text encoded in UTF-8, UTF-16, or UTF-32. See
<A href="icu_strings.html">ICU string class support</A>.</P>
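<P>boost::u32regex matches against 32-bit code points (ICU's UChar32), so UTF-8 and UTF-16 input is, conceptually, viewed as a sequence of code points before matching. To illustrate what that involves, here is a minimal standard-library sketch of UTF-8 to UTF-32 decoding. It is not Boost's implementation and is not needed when using u32regex; the function name utf8_to_utf32 is hypothetical, and the validation is deliberately basic.</P>

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into Unicode code points (UTF-32).
// Throws std::invalid_argument on malformed input; overlong forms, surrogate
// code points and values above U+10FFFF are rejected. A sketch, not a full
// production-grade validator.
std::u32string utf8_to_utf32(const std::string& in) {
    // Smallest code point that may legally use a sequence of each length.
    const char32_t min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte has the form 10xxxxxx and adds 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }

        if (cp < min_for_len[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::invalid_argument("invalid UTF-8 code point");

        out.push_back(cp);
        i += len;
    }
    return out;
}
```

<P>For example, the four bytes F0 9F 98 80 decode to the single code point U+1F600, whereas the same character occupies two units in UTF-16.</P>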
<P>
<HR>
</P>
<P></P>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
04 Jan 2005
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
<p><i>© Copyright John Maddock 2005</i></p>
<P><I>Use, modification and distribution are subject to the Boost Software License,
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
</body>
</html>