Context Navigation

← Previous Change
Ticket History
Next Change →

Changes between Initial Version and Version 2 of Ticket #379

Timestamp:: May 12, 2011, 6:51:31 PM (14 years ago)
Author:: rgrieder
Comment:

Legend:

: Unmodified
: Added
: Removed
: Modified

Ticket #379
- Property Keywords utf western 1252 codepage cegui added
- Property Priority changed from critical to minor

Ticket #379 – Description

-                      initial
+                      v2
 When starting Orxonox in a directory like 'ásdf' on Windows 7, the CEGUI logger will not accept the logging file, leading to an exception. [[br]]
 We need to investigate whether this is a just a communication problem between Orxonox and CEGUI or whether we have serious issues with international characters in paths.
+'''EDIT''' [[br]]
+It turns out that it was mostly a Problem in the CEGUI::DefaultLogger. However that's not all. So I have to make a little detour (for Windows only!):
+On Windows, characters are encoded using the Microsoft codepage currently in use, which could be any codepage on different systems. Codepages are simply 8 bit ASCII characters extended by another 128 characters to support whatever is needed. On systems in the US and Western Europe, codepage 1252 is the standard.
+CEGUI on the other hand uses UTF-32 (4 bytes) for their strings and converts them to UTF-8 when calling c_str(). That is of course different from the 1252 Western codepage used by Windows, so whatever we get from CEGUI might not be useful directly for the Windows API. [[br]]
+That's why for all the Windows API functions related to strings, there is a second function with a 'W' suffix (or prefix, don't remember) that accepts wchar_t. However, the usual standard is 4 bytes for that type (UNIX), but Microsoft decided to go for 2 bytes and UTF-16 encoding. [[br]]
+That's exactly where the bug occurred: CEGUI converted to UTF-8 and fed that to ofstream::open, which in turn was interpreted as a codepage 1252 character sequence. [[br]]
+[[br]]
+There is one more subtle detail left: How does CEGUI::String convert from 1252 to UTF-32 when assigning our std::string to it? Simple: according to the documentation, the characters are interpreted as unencoded 8-bit values. So a simple cast from 8 bit to 32 bit values is done. [[br]]
+And how on earth could that ever be correct (it actually was)? It turns out that 1252 is mostly identical to UTF-32 for the first 256 characters. [[br]]
+=== TODO ===
+Not every user will have the 1252 codepage and therefore a lot of things can go wrong. We somehow have to deal with this. [[br]]
+On the other hand, the CEGUI problem, that this ticket was issued for, is just a bug and not a general behaviour. CEGUI 0.6.2 might still have the issues though. But since that only concerns Windows where we use CEGUI 0.7.5, we're safe. [[br]]
+The other TODO is making a correct conversion from UTF-8 (standard Linux encoding if I'm not wrong) to CEGUI::String because that's just a simple cast and not a decoding.