
Char * encoding

 蘭亭文藝 2019-12-17

If I write the statement below in C++ under Visual Studio, what will the encoding be?

const char *c = "£";

In the Visual Studio project settings I have set "Character Set" to "Not Set".

-----------------------------

Setting the charset to 'Not Set' simply means that neither of the preprocessor macros _UNICODE and _MBCS will be set. This has no effect on what character sets are used by the compiler.

The two settings that determine how the bytes of your source are converted to a string literal in the program are the 'source character set' and the 'execution character set'. The compiler will convert string literals from the source encoding to the execution encoding.

Source encoding:

The source encoding is the encoding used by the compiler to interpret the source file's bytes. It applies not just to string and character literals, but also to everything else in source including, for example, identifiers.

If Visual Studio's compiler detects a Unicode 'signature' in a source file then it will use the corresponding Unicode encoding as the source encoding. Otherwise it will use the system's codepage encoding as the source encoding.
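
To check which of the two cases applies to a given file, one can look for the UTF-8 signature (the bytes EF BB BF) at its start. A minimal sketch, assuming the file's contents have already been read into a std::string; has_utf8_bom is a hypothetical helper name, not part of any Visual Studio API, and UTF-16 signatures are not covered:

```cpp
#include <string>

// Hypothetical helper: returns true if `bytes` begins with the UTF-8
// signature (byte order mark) EF BB BF.
bool has_utf8_bom(const std::string& bytes) {
    static const char kBom[] = "\xEF\xBB\xBF";
    // compare() looks at no more than the first 3 bytes, so inputs
    // shorter than the signature simply compare unequal.
    return bytes.compare(0, 3, kBom) == 0;
}
```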

Execution encoding:

The execution encoding is the encoding in which the compiler stores string and character literals in the compiled program, so the data those literals produce at run time is encoded in the execution encoding.

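Visual Studio's compiler uses the system's codepage as the execution encoding by default. (Visual Studio 2015 Update 2 and later also accept the /source-charset, /execution-charset and /utf-8 compiler options, which override the defaults for the source and execution encodings.)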


When Visual Studio converts string and character literal data from the source encoding to the execution encoding, it replaces any character that cannot be represented in the execution encoding with '?'.

So for your example:

const char *c = "£";

Assuming that your source is saved using Microsoft's "UTF-8 with signature" format and your system uses CP1252 as most systems in the West do, the string literal will be converted to:

0xA3 0x00

On the other hand, if the execution charset is something that doesn't include '£', such as cp1251 (Cyrillic, used in Windows' Russian locale), then the string literal will end up as:

0x3F 0x00
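
To see which of these outcomes a particular build actually produced, the bytes of the literal can be dumped at run time. A minimal sketch; hex_bytes is a hypothetical helper introduced here for illustration, not something from the compiler or the standard library:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats the first `count` bytes at `data` as
// "0xNN" tokens separated by single spaces.
std::string hex_bytes(const char* data, std::size_t count) {
    assert(data != nullptr);
    std::string out;
    for (std::size_t i = 0; i < count; ++i) {
        char buf[8];
        // Go through unsigned char so bytes >= 0x80 are not sign-extended
        // on platforms where plain char is signed.
        std::snprintf(buf, sizeof buf, "0x%02X",
                      static_cast<unsigned>(static_cast<unsigned char>(data[i])));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling hex_bytes(c, std::strlen(c) + 1) on the pointer from the example (with <cstring> included) includes the terminating 0x00 in the output, which matches the byte listings above.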

If you want to avoid depending on the source code encoding you can use Universal Character Names (UCNs):

const char *c = "\u00A3"; // "£"

If you want to guarantee a UTF-8 representation you'll also need to avoid dependence on the execution encoding. You can do that by manually encoding it:

const char *c = "\xC2\xA3"; // UTF-8 encoding of "£"
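
The two bytes C2 A3 follow from the standard UTF-8 bit layout applied to the code point U+00A3. As a rough sketch of that arithmetic, assuming the input is a valid Unicode scalar value (utf8_encode is a hypothetical name, and the function does no validation of surrogates or out-of-range values):

```cpp
#include <string>

// Hypothetical helper: encodes one code point as UTF-8.
// Assumes `cp` is a valid Unicode scalar value; nothing is checked.
std::string utf8_encode(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                       // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+00A3 the two-byte branch applies: 0xA3 >> 6 is 2, giving 0xC2, and 0xA3 & 0x3F is 0x23, giving 0xA3.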

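C++11 introduces UTF-8 string literals, which are the better option when your compiler supports them (note that since C++20 these literals have element type char8_t, so the assignments to const char * shown here compile only up to C++17):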

const char *c = u8"£";

or

const char *c = u8"\u00A3"; // "£"
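
One way to confirm that a u8 literal really holds the UTF-8 bytes, independent of the execution encoding, is to check its size at compile time and compare its contents at run time. A minimal sketch, assuming a compiler with C++11 u8 literal support; pound_utf8 is a hypothetical helper name:

```cpp
#include <string>

// The literal occupies 3 bytes (C2 A3 plus the terminating 00) whether its
// element type is char (C++11 to C++17) or char8_t (C++20 and later).
static_assert(sizeof(u8"\u00A3") == 3,
              "U+00A3 should encode to two UTF-8 bytes plus the terminator");

// Hypothetical helper: copies the UTF-8 bytes of the literal into a std::string.
std::string pound_utf8() {
    const auto& lit = u8"\u00A3";
    // The cast changes nothing before C++20; in C++20 it converts from
    // const char8_t*, which is permitted because char may alias any object.
    return std::string(reinterpret_cast<const char*>(lit), sizeof(lit) - 1);
}
```

Using the UCN inside the u8 literal keeps the example independent of how the source file itself is saved.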
