If I write a string literal containing '£' in C++ under Visual Studio, what will its encoding be?
Under the Visual Studio project settings I have set the "Character Set" to "Not Set".

-----------------------------

Setting the character set to "Not Set" simply means that neither of the preprocessor macros _UNICODE and _MBCS will be defined. This has no effect on which character sets the compiler uses.

Two settings determine how the bytes of your source are converted to a string literal in the program: the source character set and the execution character set. The compiler converts string literals from the source encoding to the execution encoding.

Source encoding: the encoding the compiler uses to interpret the bytes of the source file. It applies not just to string and character literals but to everything else in the source as well, including identifiers. If Visual Studio's compiler detects a Unicode signature (a byte order mark) in a source file, it uses the corresponding Unicode encoding as the source encoding. Otherwise it uses the system's code page as the source encoding.

Execution encoding: the encoding in which the compiler stores string and character literals, so the character data created by literals is encoded in the execution encoding. Visual Studio's compiler uses the system's code page as the execution encoding.

When Visual Studio converts string and character literal data from the source encoding to the execution encoding, it replaces any character that cannot be represented in the execution character set with '?'. So for your example:
Assuming that your source is saved in Microsoft's "UTF-8 with signature" format and your system uses CP1252, as most systems in the West do, the string literal will be converted to the bytes 0xA3 0x00 ('£' in CP1252, followed by the terminating null).
On the other hand, if the execution character set is one that does not include '£', such as CP1251 (Cyrillic, used by Windows in the Russian locale), the string literal will end up as the bytes 0x3F 0x00, i.e. "?".
If you want to avoid depending on the source code encoding, you can use Universal Character Names (UCNs), e.g. "\u00A3" for '£':
If you want to guarantee a UTF-8 representation, you'll also need to avoid depending on the execution encoding. You can do that by encoding the character manually with hex escapes: "\xC2\xA3".
C++11 introduces UTF-8 string literals, which will be better when your compiler supports them: u8"£", or u8"\u00A3" to also avoid depending on the source encoding.