source: downloads/boost_1_34_1/libs/regex/doc/unicode.html @ 45

Last change on this file since 45 was 29, checked in by landauf, 16 years ago

updated boost from 1_33_1 to 1_34_1

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
   <head>
      <title>Boost.Regex: Index</title>
      <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
      <link rel="stylesheet" type="text/css" href="../../../boost.css">
   </head>
   <body>
      <P>
         <TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
            <TR>
               <td valign="top" width="300">
                  <h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../boost.png" border="0"></a></h3>
               </td>
               <TD width="353">
                  <H1 align="center">Boost.Regex</H1>
                  <H2 align="center">Unicode Regular Expressions.</H2>
               </TD>
               <td width="50">
                  <h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
               </td>
            </TR>
         </TABLE>
      </P>
      <HR>
      <p></p>
      <P>There are two ways to use Boost.Regex with Unicode strings:</P>
      <H3>Rely on wchar_t</H3>
      <P>If your platform's wchar_t type can hold Unicode strings, <EM>and</EM> your
         platform's C/C++ runtime correctly handles wide character constants (when
         passed to std::iswspace, std::iswlower, etc.), then you can use boost::wregex to
         process Unicode.&nbsp; However, there are several disadvantages to this
         approach:</P>
      <UL>
         <LI>
            It's not portable: there's no guarantee on the width of wchar_t, or even
            that the runtime treats wide characters as Unicode at all. Most Windows
            compilers do so, but many Unix systems do not.</LI>
         <LI>
            There's no support for Unicode-specific character classes ([[:Nd:]], [[:Po:]],
            etc.).</LI>
         <LI>
            You can only search strings that are encoded as sequences of wide characters;
            it is not possible to search UTF-8, or even UTF-16 on many platforms.</LI></UL>
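      <P>The portability concern in the first point can be probed with the standard
         library alone. The following is a minimal sketch, not part of Boost.Regex: the
         three helper names are hypothetical, and the checks only indicate whether
         wchar_t is wide enough and whether the runtime classifies a few basic
         characters as expected in the current locale. They do not prove full Unicode
         support.</P>

```cpp
#include <cwchar>    // WCHAR_MAX
#include <cwctype>   // std::iswspace, std::iswlower

// Hypothetical helper: true when a single wchar_t can represent every
// Unicode code point (U+0000 through U+10FFFF) without surrogate pairs.
inline bool wchar_holds_any_code_point() {
    return static_cast<unsigned long long>(WCHAR_MAX) >= 0x10FFFFULL;
}

// Hypothetical helper: true when the implementation defines
// __STDC_ISO_10646__, i.e. documents that wchar_t values are ISO 10646
// (Unicode) code points. A false result is inconclusive: some platforms
// use Unicode wide characters without defining the macro.
inline bool wchar_documented_as_unicode() {
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: sanity-checks the runtime's wide character
// classification on a few basic Latin characters in the current locale.
inline bool runtime_classifies_basic_latin() {
    return std::iswspace(L' ') != 0
        && std::iswlower(L'a') != 0
        && std::iswlower(L'A') == 0;
}
```

      <P>If wchar_holds_any_code_point() returns false (as is typical where wchar_t is
         16 bits wide), characters outside the Basic Multilingual Plane cannot be held
         in a single wchar_t, which is one reason to prefer the approach described
         next.</P>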
      <H3>Use a Unicode-Aware Regular Expression Type</H3>
      <P>If you have the <A href="http://www.ibm.com/software/globalization/icu/">ICU
            library</A>, then Boost.Regex can be <A href="install.html#unicode">configured
            to make use of it</A> and provide a distinct regular expression type
         (boost::u32regex) that supports both Unicode-specific character properties
         and the searching of text that is encoded in UTF-8, UTF-16, or
         UTF-32.&nbsp; See: <A href="icu_strings.html">ICU string class support</A>.</P>
      <P>
         <HR>
      </P>
      <P></P>
      <p>Revised&nbsp; 
         <!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan --> 
         04 Jan 2005&nbsp; 
         <!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
      <p><i>© Copyright John Maddock&nbsp;2005</i></p>
      <P><I>Use, modification and distribution are subject to the Boost Software License,
            Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
            or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
   </body>
</html>