'\"
'\" Copyright (c) 1998 by Scriptics Corporation.
'\"
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\"
'\" RCS: @(#) $Id: encoding.n,v 1.15 2007/12/13 15:22:32 dgp Exp $
'\"
.so man.macros
.TH encoding n "8.1" Tcl "Tcl Built-In Commands"
.BS
.SH NAME
encoding \- Manipulate encodings
.SH SYNOPSIS
\fBencoding \fIoption\fR ?\fIarg arg ...\fR?
.BE

.SH INTRODUCTION
.PP
Strings in Tcl are encoded using 16-bit Unicode characters. Different
operating system interfaces or applications may generate strings in
other encodings such as Shift-JIS. The \fBencoding\fR command helps
to bridge the gap between Unicode and these other formats.
.SH DESCRIPTION
.PP
Performs one of several encoding-related operations, depending on
\fIoption\fR. The legal \fIoption\fRs are:
.TP
\fBencoding convertfrom\fR ?\fIencoding\fR? \fIdata\fR
Convert \fIdata\fR to Unicode from the specified \fIencoding\fR. The
characters in \fIdata\fR are treated as binary data where the lower
8 bits of each character are taken as a single byte. The resulting
sequence of bytes is treated as a string in the specified
\fIencoding\fR. If \fIencoding\fR is not specified, the current
system encoding is used.
.TP
\fBencoding convertto\fR ?\fIencoding\fR? \fIstring\fR
Convert \fIstring\fR from Unicode to the specified \fIencoding\fR.
The result is a sequence of bytes that represents the converted
string. Each byte is stored in the lower 8 bits of a Unicode
character. If \fIencoding\fR is not specified, the current
system encoding is used.
.TP
\fBencoding dirs\fR ?\fIdirectoryList\fR?
.VS 8.5
Tcl can load encoding data files from the file system that describe
additional encodings for it to work with. This command sets the search
path for \fB*.enc\fR encoding data files to the list of directories
\fIdirectoryList\fR. If \fIdirectoryList\fR is omitted then the
command returns the current list of directories that make up the
search path. It is an error if \fIdirectoryList\fR is not a valid
list. If an element of \fIdirectoryList\fR does not refer to a
readable, searchable directory when a search for an encoding data file
takes place, that element is ignored.
.VE 8.5
.TP
\fBencoding names\fR
Returns a list containing the names of all of the encodings that are
currently available.
.TP
\fBencoding system\fR ?\fIencoding\fR?
Set the system encoding to \fIencoding\fR. If \fIencoding\fR is
omitted then the command returns the current system encoding. The
system encoding is used whenever Tcl passes strings to system calls.
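.PP
As a minimal sketch of the query forms described above (the variable
name \fIwanted\fR is purely illustrative, and the output depends on the
local installation), a script might check that an encoding is available
and report the current settings before converting any data:
.CS
# Hypothetical variable: the encoding this script would like to use.
set wanted euc-jp
if {[lsearch -exact [\fBencoding names\fR] $wanted] < 0} {
    puts "encoding $wanted is not available"
}
puts "system encoding: [\fBencoding system\fR]"
# Requires Tcl 8.5 or later, where \fBencoding dirs\fR was added.
foreach dir [\fBencoding dirs\fR] {
    puts "search directory: $dir"
}
.CE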
.SH EXAMPLE
.PP
It is common practice to write script files using a text editor that
produces output in the euc-jp encoding, which represents the ASCII
characters as single bytes and Japanese characters as two bytes. This
makes it easy to embed literal strings that correspond to non-ASCII
characters by simply typing the strings in place in the script.
However, because the \fBsource\fR command always reads files using the
current system encoding, Tcl will only source such files correctly
when the file was written in that same encoding. This tends not
to be true in an internationalized setting. For example, if such a
file was sourced in North America (where the ISO8859-1 encoding is
normally used), each byte in the file would be treated as a separate
character that maps to the 00 page in Unicode. The resulting Tcl
strings will not contain the expected Japanese characters. Instead,
they will contain a sequence of Latin-1 characters that correspond to
the bytes of the original string. The \fBencoding\fR command can be
used to convert this string to the expected Japanese Unicode
characters. For example,
.CS
set s [\fBencoding convertfrom\fR euc-jp "\exA4\exCF"]
.CE
would return the Unicode string
.QW "\eu306F" ,
which is the Hiragana letter HA.
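.PP
The conversion can also be checked in the other direction. The
following sketch, which assumes that the euc-jp encoding is available
and uses the illustrative variable names \fIha\fR, \fIbytes\fR,
\fIhex\fR and \fIback\fR, converts the same character to euc-jp and
back again:
.CS
set ha "\eu306F"
set bytes [\fBencoding convertto\fR euc-jp $ha]
# Show the resulting bytes as hexadecimal digits.
binary scan $bytes H* hex
puts $hex
# Converting the bytes back should give the original string.
set back [\fBencoding convertfrom\fR euc-jp $bytes]
puts [string equal $back $ha]
.CE
Given the mapping shown above, this would be expected to print
.QW a4cf
followed by
.QW 1 .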

.SH "SEE ALSO"
Tcl_GetEncoding(3)

.SH KEYWORDS
encoding