source: downloads/boost_1_33_1/libs/serialization/doc/codecvt.html @ 12

Last change on this file since 12 was 12, checked in by landauf, 17 years ago

added boost

File size: 5.3 KB
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!--
  == Copyright (c) 2001 Ronald Garcia
  ==
  == Permission to use, copy, modify, distribute and sell this software
  == and its documentation for any purpose is hereby granted without fee,
  == provided that the above copyright notice appears in all copies and
  == that both that copyright notice and this permission notice appear
  == in supporting documentation.  Ronald Garcia makes no
  == representations about the suitability of this software for any
  == purpose.  It is provided "as is" without express or implied warranty.
  -->
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
<link rel="stylesheet" type="text/css" href="style.css">
<head>
<title>UTF-8 Codecvt Facet</title>

</head>

<body bgcolor="#ffffff" link="#0000ee" text="#000000" 
      vlink="#551a8b" alink="#ff0000">
<img src="../../../boost.png" alt="C++ Boost" 
width="277" height="86"> <br clear="all">


<a name="sec:utf8-codecvt-facet-class"></a>


<h1>UTF-8 Codecvt Facet</h1>


<pre>
template&lt;
    typename InternType = wchar_t,
    typename ExternType = char
&gt; utf8_codecvt_facet
</pre>


<h2>Rationale</h2>


    UTF-8 is a method of encoding Unicode text in environments
    where data is stored as 8-bit characters and some ASCII characters
    are considered special (e.g. in Unix filenames) and tend
    to appear more commonly than other characters.  While
    UTF-8 is convenient and efficient for storing data on filesystems,
    it was not meant to be manipulated in memory by
    applications. While some applications (such as Unix's 'cat') can
    simply ignore the encoding of data, others should convert
    from UTF-8 to UCS-4 (a fixed-width representation of Unicode in
    which every character occupies a single 32-bit unit)
    on reading from a file, and reverse the process on writing out to
    a file.
   
    <p>The C++ Standard IOStreams library provides the <tt>std::codecvt</tt>
    facet to handle specifically these cases.  On reading from or
    writing to a file, the <tt>std::basic_filebuf</tt> can call out to
    the codecvt facet to convert data representations from external
    format (e.g. UTF-8) to internal format (e.g. UCS-4) and
    vice versa. <tt>utf8_codecvt_facet</tt> is a
    <tt>std::codecvt</tt> facet specifically designed to handle the case
    of translating between UTF-8 and UCS-4.

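    <p>To make the external-to-internal direction concrete, the following is
    a minimal sketch of the kind of conversion such a facet performs on
    reading. The function name <tt>utf8_to_ucs4</tt> and the use of
    <tt>char32_t</tt> (C++11) are assumptions made for illustration; this is
    not the facet's own implementation. The sketch handles only the one- to
    four-octet forms that cover the Unicode range up to U+10FFFF, and it does
    not reject overlong encodings.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of the facet's interface: decode a UTF-8
// octet sequence into UCS-4 code points. Overlong forms and surrogate
// values are not rejected, to keep the sketch short.
std::vector<char32_t> utf8_to_ucs4(const std::string& in) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation octets expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead octet");
        assert(extra <= 3);
        if (i + extra >= in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        // Each continuation octet has the form 10xxxxxx and adds six bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation octet");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

    <p>Each element of the result is one whole character, so the decoded
    text can be indexed and searched per character, which is the reason for
    converting to UCS-4 before working with the data in memory.</p>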

<h2>Template Parameters</h2>

<table border summary="template parameters">
<tr>
<th>Parameter</th><th>Description</th><th>Default</th>
</tr>

<tr>
<td><tt>InternType</tt></td>
<td>The internal type used to represent UCS-4 characters.</td>
<td><tt>wchar_t</tt></td>
</tr>

<tr>
<td><tt>ExternType</tt></td>
<td>The external type used to represent UTF-8 octets.</td>
<td><tt>char</tt></td>
</tr>
</table>


<h2>Requirements</h2>

    <tt>utf8_codecvt_facet</tt> defaults to using <tt>char</tt> as
    its external data type and <tt>wchar_t</tt> as its internal
    data type, but on some architectures <tt>wchar_t</tt> is
    not large enough to hold UCS-4 characters.  To use
    another internal type, you must also specialize <tt>std::codecvt</tt>
    to handle your internal and external types
    (<tt>std::codecvt&lt;wchar_t,char,std::mbstate_t&gt;</tt> is required to be
    supplied by any standard-conforming implementation).

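    <p>Whether <tt>wchar_t</tt> is wide enough can be checked at compile
    time. The following is a minimal sketch, assuming a C++11 compiler; the
    names <tt>wchar_holds_ucs4</tt> and <tt>ucs4_t</tt> are hypothetical and
    chosen for illustration only. It selects <tt>wchar_t</tt> when it
    provides at least 32 bits and falls back to <tt>char32_t</tt>
    otherwise.</p>

```cpp
#include <climits>      // CHAR_BIT
#include <type_traits>  // std::conditional

// True when wchar_t has at least 32 bits of storage, which is enough for
// the 31-bit UCS-4 code space even if wchar_t is a signed type.
constexpr bool wchar_holds_ucs4 = sizeof(wchar_t) * CHAR_BIT >= 32;

// Hypothetical alias: wchar_t where it is wide enough, char32_t otherwise.
typedef std::conditional<wchar_holds_ucs4, wchar_t, char32_t>::type ucs4_t;

// Fail the build early if the chosen type is still too narrow.
static_assert(sizeof(ucs4_t) * CHAR_BIT >= 32, "ucs4_t cannot hold UCS-4");
```

    <p>When the fallback type is selected, the requirement described above
    still applies: a matching <tt>std::codecvt</tt> specialization for that
    internal type has to be available before the facet can be used with
    it.</p>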

<h2>Example Use</h2>
    The following is a simple example of using this facet:

<pre>
  // Standard headers used by this fragment; the header that declares
  // utf8_codecvt_facet must be included as well.
  #include &lt;algorithm&gt;
  #include &lt;fstream&gt;
  #include &lt;iterator&gt;
  #include &lt;locale&gt;
  #include &lt;vector&gt;

  //...
  // My encoding type
  typedef wchar_t ucs4_t;

  // Copy the current global locale, replacing its codecvt facet
  // with the UTF-8 facet
  std::locale old_locale;
  std::locale utf8_locale(old_locale,new utf8_codecvt_facet&lt;ucs4_t&gt;);

  // Set a new global locale
  std::locale::global(utf8_locale);

  // Send the UCS-4 data out, converting to UTF-8
  // (ucs4_data is a container of ucs4_t filled in elsewhere)
  {
    std::wofstream ofs("data.ucd");
    ofs.imbue(utf8_locale);
    std::copy(ucs4_data.begin(),ucs4_data.end(),
          std::ostream_iterator&lt;ucs4_t,ucs4_t&gt;(ofs));
  }

  // Read the UTF-8 data back in, converting to UCS-4 on the way in.
  // Note that operator&gt;&gt; skips whitespace characters by default.
  std::vector&lt;ucs4_t&gt; from_file;
  {
    std::wifstream ifs("data.ucd");
    ifs.imbue(utf8_locale);
    ucs4_t item = 0;
    while (ifs &gt;&gt; item) from_file.push_back(item);
  }
  //...
</pre>

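    <p>For readers who want to see what "converting to UTF-8" means for a
    single character on the way out to the file, the following is a small
    sketch of the encoding step. The function name <tt>ucs4_to_utf8</tt> is
    hypothetical and the code is an illustration under the assumption of a
    C++11 compiler, not the facet's implementation. It covers code points up
    to U+10FFFF only.</p>

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode one UCS-4 code point as a UTF-8 octet
// sequence. Code points above U+10FFFF are rejected.
std::string ucs4_to_utf8(char32_t cp) {
    std::string out;
    // Append the low eight bits of v as a single octet.
    auto put = [&out](char32_t v) {
        out += static_cast<char>(static_cast<unsigned char>(v));
    };
    if (cp < 0x80) {                       // 0xxxxxxx
        put(cp);
    } else if (cp < 0x800) {               // 110xxxxx 10xxxxxx
        put(0xC0 | (cp >> 6));
        put(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 1110xxxx 10xxxxxx 10xxxxxx
        put(0xE0 | (cp >> 12));
        put(0x80 | ((cp >> 6) & 0x3F));
        put(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {           // 11110xxx followed by three 10xxxxxx
        put(0xF0 | (cp >> 18));
        put(0x80 | ((cp >> 12) & 0x3F));
        put(0x80 | ((cp >> 6) & 0x3F));
        put(0x80 | (cp & 0x3F));
    } else {
        throw std::range_error("code point outside the Unicode range");
    }
    assert(!out.empty() && out.size() <= 4);
    return out;
}
```

    <p>ASCII characters pass through as a single unchanged octet, which is
    why UTF-8 files remain readable by tools that ignore the encoding.</p>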

<h2>History</h2>

    This code was originally written as an iterator adaptor over
    containers for use with UTF-8 encoded strings in memory.
    Dietmar Kuehl suggested that it would be better provided as a
    codecvt facet.

<h2>Resources</h2>

<ul>
<li> <a href="http://www.unicode.org">Unicode Homepage</a>
<li> <a href="http://home.CameloT.de/langer/iostreams.htm">Standard
      C++ IOStreams and Locales</a>
<li> <a href="http://www.research.att.com/~bs/3rd.html">The C++
      Programming Language Special Edition, Appendix D.</a> 
</ul>

<br>
<hr>
<table summary="Copyright information">
<tr valign="top">
<td nowrap>Copyright &copy; 2001</td>
<td><a href="http://www.osl.iu.edu/~garcia">Ronald Garcia</a>,
Indiana University
(<a href="mailto:garcia@cs.indiana.edu">garcia@osl.iu.edu</a>)<br>
<a href="http://www.osl.iu.edu/~lums">Andrew Lumsdaine</a>,
Indiana University
(<a href="mailto:lums@osl.iu.edu">lums@osl.iu.edu</a>)</td>
</tr>
</table>
<p><i>&copy; Copyright <a href="http://www.rrsd.com">Robert Ramey</a> 2002-2004.
Distributed under the Boost Software License, Version 1.0. (See
accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
</i></p>
</body>
</html>