| 3 | |
| 4 | '''EDIT''' [[br]] |
| 5 | It turns out that it was mostly a Problem in the CEGUI::DefaultLogger. However that's not all. So I have to make a little detour (for Windows only!): |
| 6 | |
| 7 | On Windows, characters are encoded using the Microsoft codepage currently in use, which could be any codepage on different systems. Codepages are simply 8 bit ASCII characters extended by another 128 characters to support whatever is needed. On systems in the US and Western Europe, codepage 1252 is the standard. |
| 8 | CEGUI on the other hand uses UTF-32 (4 bytes) for their strings and converts them to UTF-8 when calling c_str(). That is of course different from the 1252 Western codepage used by Windows, so whatever we get from CEGUI might not be useful directly for the Windows API. [[br]] |
| 9 | That's why for all the Windows API functions related to strings, there is a second function with a 'W' suffix (or prefix, don't remember) that accepts wchar_t. However, the usual standard is 4 bytes for that type (UNIX), but Microsoft decided to go for 2 bytes and UTF-16 encoding. [[br]] |
| 10 | That's exactly where the bug occurred: CEGUI converted to UTF-8 and fed that to ofstream::open, which in turn was interpreted as a codepage 1252 character sequence. [[br]] |
| 11 | [[br]] |
| 12 | There is one more subtle detail left: How does CEGUI::String convert from 1252 to UTF-32 when assigning our std::string to it? Simple: according to the documentation, the characters are interpreted as unencoded 8-bit values. So a simple cast from 8 bit to 32 bit values is done. [[br]] |
| 13 | And how on earth could that ever be correct (it actually was)? It turns out that 1252 is mostly identical to UTF-32 for the first 256 characters. [[br]] |
| 14 | |
| 15 | === TODO === |
| 16 | Not every user will have the 1252 codepage and therefore a lot of things can go wrong. We somehow have to deal with this. [[br]] |
| 17 | On the other hand, the CEGUI problem, that this ticket was issued for, is just a bug and not a general behaviour. CEGUI 0.6.2 might still have the issues though. But since that only concerns Windows where we use CEGUI 0.7.5, we're safe. [[br]] |
| 18 | The other TODO is making a correct conversion from UTF-8 (standard Linux encoding if I'm not wrong) to CEGUI::String because that's just a simple cast and not a decoding. |