
Changes between Initial Version and Version 2 of Ticket #379


Timestamp: May 12, 2011, 6:51:31 PM
Author: rgrieder
  • Ticket #379

    • Property Keywords utf western 1252 codepage cegui added
    • Property Priority changed from critical to minor
  • Ticket #379 – Description

    When starting Orxonox in a directory like 'ásdf' on Windows 7, the CEGUI logger cannot open its log file, which leads to an exception. [[br]]
    We need to investigate whether this is just a communication problem between Orxonox and CEGUI or whether we have serious issues with international characters in paths.

    '''EDIT''' [[br]]
    It turns out that it was mostly a problem in CEGUI::DefaultLogger. However, that's not all, so I have to make a little detour (Windows only!):

    On Windows, narrow (char) strings are encoded using the Microsoft codepage currently in use, which can be a different one on every system. A codepage is simply an 8-bit encoding that extends 7-bit ASCII by another 128 characters to support whatever is needed. On systems in the US and Western Europe, codepage 1252 is the standard.
    CEGUI, on the other hand, uses UTF-32 (4 bytes per character) for its strings and converts them to UTF-8 when c_str() is called. That is of course different from the Western codepage 1252 used by Windows, so whatever we get from CEGUI cannot be passed directly to the narrow Windows API as soon as it contains non-ASCII characters. [[br]]
    For this reason, the Windows API functions that take strings come in two variants: an 'A' version that uses the current codepage and a 'W' version (e.g. CreateFileA and CreateFileW) that accepts wchar_t. While wchar_t is usually 4 bytes (UTF-32) on UNIX systems, Microsoft decided on 2 bytes and the UTF-16 encoding. [[br]]
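    For illustration, a minimal sketch of the UTF-32 to UTF-16 step that the 'W' functions require (`utf32ToUtf16` is a hypothetical name). It uses char16_t and std::u16string so that it behaves the same on every platform; on Windows, where wchar_t is 16 bits, the units could be copied into a std::wstring. Code points above U+FFFF need two 16-bit units (a surrogate pair).

```cpp
#include <cassert>
#include <string>

// Converts UTF-32 (CEGUI's internal representation, per this ticket) to
// UTF-16, the encoding the Windows 'W' functions expect.
// Invalid code points (surrogates, values above U+10FFFF) become U+FFFD.
std::u16string utf32ToUtf16(const std::u32string& in)
{
    std::u16string out;
    out.reserve(in.size());
    for (char32_t cp : in)
    {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;
        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));                   // BMP: one unit
        }
        else
        {
            cp -= 0x10000;                                              // 20 bits remain
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
        }
    }
    return out;
}
```

    For example, U+1F600 becomes the pair 0xD83D 0xDE00, while 'á' (U+00E1) stays a single unit.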
    The bug was exactly such a mismatch: CEGUI converted the path to UTF-8 and passed it to ofstream::open, which in turn interpreted the bytes as a codepage 1252 character sequence. [[br]]
    [[br]]
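    The failure can be reproduced without Windows. In this sketch, `toUtf8` and `readAsCp1252` are hypothetical helpers: `toUtf8` only mimics what CEGUI::String::c_str() hands out and is not CEGUI code, and `readAsCp1252` simulates the narrow API reading the bytes under codepage 1252. 'ásdf' is encoded as UTF-8 and then read back as 1252.

```cpp
#include <cassert>
#include <string>

// Encodes UTF-32 to UTF-8, roughly what CEGUI::String::c_str() returns.
// Assumes valid input (no surrogates, nothing above U+10FFFF).
std::string toUtf8(const std::u32string& in)
{
    std::string out;
    for (char32_t cp : in)
    {
        assert(cp <= 0x10FFFF);
        if (cp < 0x80)
            out += static_cast<char>(cp);
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Simulates the narrow Windows API reading those bytes as codepage 1252.
// A byte-to-code-point cast is exact for 0x00-0x7F and 0xA0-0xFF, which is
// all this example needs; 0x80-0x9F would require a lookup table.
std::u32string readAsCp1252(const std::string& bytes)
{
    std::u32string out;
    for (char c : bytes)
        out += static_cast<char32_t>(static_cast<unsigned char>(c));
    return out;
}
```

    Reading the UTF-8 form of 'ásdf' as 1252 yields 'Ã¡sdf': the two UTF-8 bytes 0xC3 0xA1 of 'á' turn into two separate 1252 characters, so the path names a different directory.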
    There is one more subtle detail left: how does CEGUI::String convert from 1252 to UTF-32 when we assign our std::string to it? Simple: according to the documentation, the characters are interpreted as unencoded 8-bit values, so each 8-bit value is simply cast to a 32-bit value. [[br]]
    And how on earth could that ever be correct (it actually was)? It turns out that codepage 1252 agrees with the first 256 Unicode code points (and therefore with UTF-32) everywhere except in the range 0x80 to 0x9F, where 1252 places characters such as € and the typographic quotes. [[br]]

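    The following sketch contrasts the plain widening cast (which, per the CEGUI documentation mentioned above, is what CEGUI::String does with a std::string) with a full codepage 1252 decoder. `castDecode` and `cp1252Decode` are hypothetical names. The two only differ for bytes 0x80 to 0x9F, which is why the cast happened to work for 'ásdf'.

```cpp
#include <cassert>
#include <string>

// Codepage 1252 bytes 0x80-0x9F and their Unicode code points. The five bytes
// 1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are mapped here to the
// C1 control with the same value, i.e. exactly what the plain cast yields.
const char32_t kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
};

// Mimics the behaviour described for CEGUI::String: every byte is widened unchanged.
std::u32string castDecode(const std::string& bytes)
{
    std::u32string out;
    for (char c : bytes)
        out.push_back(static_cast<unsigned char>(c));
    return out;
}

// Proper codepage 1252 decoding: identical to the cast except for 0x80-0x9F.
std::u32string cp1252Decode(const std::string& bytes)
{
    std::u32string out;
    for (char c : bytes)
    {
        const unsigned char b = static_cast<unsigned char>(c);
        out.push_back(b >= 0x80 && b <= 0x9F ? kCp1252High[b - 0x80] : char32_t(b));
    }
    return out;
}
```

    For 'ásdf' (bytes 0xE1 0x73 0x64 0x66) both functions agree, but a 1252 euro sign (byte 0x80) comes out as the control character U+0080 with the cast instead of U+20AC.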
    === TODO ===
    Not every user has codepage 1252 active, so a lot of things can go wrong. We have to deal with this somehow. [[br]]
    On the other hand, the CEGUI problem this ticket was opened for is just a bug, not general behaviour. CEGUI 0.6.2 might still have the issue, but since it only concerns Windows, where we use CEGUI 0.7.5, we're safe. [[br]]
    The other TODO is a correct conversion from UTF-8 (the standard encoding on Linux, if I'm not mistaken) to CEGUI::String, because at the moment that is just a simple cast, not a decoding.
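    For the UTF-8 TODO, a real decoder is needed instead of the cast. Below is a minimal sketch of a UTF-8 to UTF-32 decoder; `decodeUtf8` is a hypothetical name, not an existing Orxonox or CEGUI function, and how its result is handed to CEGUI::String is left open. Replacing every offending byte with U+FFFD is one possible error policy, chosen here for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decodes UTF-8 into UTF-32. Malformed input (invalid lead byte, missing or
// wrong continuation byte, overlong form, surrogate, value above U+10FFFF)
// yields U+FFFD and decoding resumes at the next byte.
std::u32string decodeUtf8(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        if (lead < 0x80)
        {
            out.push_back(lead); // ASCII: one byte, one code point
            ++i;
            continue;
        }

        int extra = 0;           // number of continuation bytes expected
        char32_t cp = 0;
        if (lead >= 0xC2 && lead <= 0xDF)      { extra = 1; cp = lead & 0x1F; }
        else if (lead >= 0xE0 && lead <= 0xEF) { extra = 2; cp = lead & 0x0F; }
        else if (lead >= 0xF0 && lead <= 0xF4) { extra = 3; cp = lead & 0x07; }
        // Any other lead byte (0x80-0xC1, 0xF5-0xFF) is invalid: extra stays 0.

        bool ok = extra > 0;
        for (int k = 1; ok && k <= extra; ++k)
        {
            if (i + k >= in.size())
            {
                ok = false;      // truncated sequence
                break;
            }
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                ok = false;      // not a continuation byte
            else
                cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong 3- and 4-byte forms, UTF-16 surrogates and values above U+10FFFF.
        if (ok && ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
                   (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF))
            ok = false;

        if (ok)
        {
            out.push_back(cp);
            i += extra + 1;
        }
        else
        {
            out.push_back(0xFFFD); // replacement character
            ++i;
        }
    }
    return out;
}
```

    Decoding the bytes 0xC3 0xA1 yields the single code point U+00E1 ('á'), where the byte-wise cast produces the two characters 'Ã¡'.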