Character encoding in web pages:
1. Encoding conversion (to Unicode)
(The code samples below were collected from the web.)
JavaScript version
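<script>
// Convert each UTF-16 code unit of the string to a \uXXXX escape.
var test = "你好abc";
var str = "";
var temp;
for (var i = 0; i < test.length; i++)
{
    temp = test.charCodeAt(i).toString(16);
    // Left-pad the hex value with zeros to four digits.
    str += "\\u" + new Array(5 - temp.length).join("0") + temp;
}
document.write(str);
</script>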
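The same conversion can be wrapped in a reusable function and paired with its inverse. The following is a minimal sketch rather than part of the original script: the names toUnicodeEscapes and fromUnicodeEscapes are hypothetical, and padStart assumes an ES2017-capable engine.

```javascript
// Hypothetical helper names; they do not appear in the original script.

// Encode every UTF-16 code unit as a \uXXXX escape (lowercase hex).
function toUnicodeEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    // charCodeAt returns one UTF-16 code unit in the range 0..65535.
    out += "\\u" + text.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return out;
}

// Decode every \uXXXX escape back to the code unit it names.
function fromUnicodeEscapes(escaped) {
  return escaped.replace(/\\u([0-9a-fA-F]{4})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

console.log(toUnicodeEscapes("你好abc"));
console.log(fromUnicodeEscapes("\\u4f60\\u597d"));
```

Because both functions work on code units, a character outside the Basic Multilingual Plane is written as two escapes (a surrogate pair) and is reassembled correctly when decoded.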
VBScript version
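Function Unicode(str1)
    Dim str, temp, i
    str = ""
    For i = 1 To Len(str1)
        ' AscW gives the UTF-16 code unit (possibly as a negative number).
        ' Keeping the last four hex digits pads short values and trims sign extension.
        temp = Right("0000" & Hex(AscW(Mid(str1, i, 1))), 4)
        str = str & "\u" & temp
    Next
    Unicode = str
End Function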
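Function htmlentities(str)
    Dim i, char, code
    htmlentities = ""
    For i = 1 To Len(str)
        char = Mid(str, i, 1)
        code = AscW(char)
        ' AscW returns a signed 16-bit value; map negatives back to 0..65535.
        If code < 0 Then code = code + 65536
        If code > 127 Then
            ' Non-ASCII: emit a decimal numeric character reference.
            htmlentities = htmlentities & "&#" & code & ";"
        Else
            htmlentities = htmlentities & char
        End If
    Next
End Function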
ColdFusion version
function nochaoscode(str)
{
    var new_str = "";
    var i = 1;
    var ch = "";
    for(i=1; i lte len(str); i=i+1){
        ch = mid(str,i,1);
        if(asc(ch) lt 128){
            new_str = new_str & ch;
        }else{
            // "##" is an escaped literal # inside a CFML string
            new_str = new_str & "&##" & asc(ch) & ";";
        }
    }
    return new_str;
}
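For comparison, the numeric-reference conversion that the VBScript and ColdFusion functions perform can also be sketched in JavaScript. This is an illustration only; the function name toNumericReferences is hypothetical. It assumes an ES2015 engine and iterates by code point, so a character stored as a surrogate pair produces a single reference.

```javascript
// Hypothetical helper; not part of the original article.
// Replaces every non-ASCII character with a decimal numeric character reference.
function toNumericReferences(text) {
  let out = "";
  // for...of walks the string by code point, keeping surrogate pairs together.
  for (const ch of text) {
    const code = ch.codePointAt(0);
    out += code < 128 ? ch : "&#" + code + ";";
  }
  return out;
}

console.log(toNumericReferences("你好abc"));
```

Like the versions above, this sketch leaves ASCII untouched, so characters such as & and < still need separate escaping if the input is not already HTML.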
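Appendix:
In PHP, the mbstring function mb_convert_encoding handles this conversion in both directions. For example:
mb_convert_encoding("你好", "HTML-ENTITIES", "gb2312"); // output: &#20320;&#22909;
mb_convert_encoding("&#20320;&#22909;", "gb2312", "HTML-ENTITIES"); // output: 你好
To convert an entire page, add these three lines at the top of the PHP file:
mb_internal_encoding("gb2312"); // gb2312 here is the site's original encoding
mb_http_output("HTML-ENTITIES");
ob_start('mb_output_handler');
If the mbstring extension is not enabled, see these two articles: "Displaying web pages correctly under any character set" and "Displaying web pages correctly under any character set (continued)".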
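The reverse direction (numeric references back to characters) can be sketched in JavaScript as well. This is a minimal illustration under stated assumptions, not a replacement for mbstring: the name fromNumericReferences is hypothetical, only numeric references are handled (named entities are left alone), and references beyond U+10FFFF are kept as-is.

```javascript
// Hypothetical helper; decodes decimal (&#20320;) and hexadecimal (&#x4F60;) references.
function fromNumericReferences(text) {
  // A single pass avoids decoding the output of an earlier replacement again.
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const code = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // String.fromCodePoint throws above U+10FFFF, so leave such input untouched.
    return code <= 0x10ffff ? String.fromCodePoint(code) : match;
  });
}

console.log(fromNumericReferences("&#20320;&#22909;"));
```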
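2. HTML entities
HTML 4.01 supports the ISO 8859-1 (Latin-1) character set.
Tip: entity names are case-sensitive (for example, &Agrave; is À while &agrave; is à).
Note: the same symbol can be referenced either by entity name or by entity number. Entity names are easier to remember, but not every browser is guaranteed to recognize every name; entity numbers do not have that problem, but they are hard to remember.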
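As a concrete case of the name-versus-number trade-off, here is a minimal JavaScript sketch that escapes the five markup-significant ASCII characters (", ', &, < and >). The names ESCAPES and escapeHtml are hypothetical. The apostrophe is written as the numeric reference &#39; on the assumption that the output may be read by older browsers that do not recognize &apos;.

```javascript
// Hypothetical helper names; the mapping uses standard HTML references.
const ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;", // numeric form, since &apos; is not recognized everywhere
};

function escapeHtml(text) {
  // Each special character is replaced exactly once, in a single pass.
  return text.replace(/[&<>"']/g, (ch) => ESCAPES[ch]);
}

console.log(escapeHtml("<a href=\"x\">Tom & 'Jerry'</a>"));
```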
Entity names for some ASCII characters
Display | Description | Entity name | Entity number
" | quotation mark | &quot; | &#34;
' | apostrophe | &apos; (does not work in IE) | &#39;
& | ampersand | &amp; | &#38;
< | less-than | &lt; | &#60;
> | greater-than | &gt; | &#62;
ISO 8859-1 symbol entities
Display | Description | Entity name | Entity number
(invisible) | non-breaking space | &nbsp; | &#160;
¡ | inverted exclamation mark | &iexcl; | &#161;
¤ | currency | &curren; | &#164;
¢ | cent | &cent; | &#162;
£ | pound | &pound; | &#163;
¥ | yen | &yen; | &#165;
¦ | broken vertical bar | &brvbar; | &#166;
§ | section | &sect; | &#167;
¨ | spacing diaeresis | &uml; | &#168;
© | copyright | &copy; | &#169;
ª | feminine ordinal indicator | &ordf; | &#170;
« | angle quotation mark (left) | &laquo; | &#171;
¬ | negation | &not; | &#172;
- | soft hyphen | &shy; | &#173;
® | registered trademark | &reg; | &#174;
™ | trademark | &trade; | &#8482;
¯ | spacing macron | &macr; | &#175;
° | degree | &deg; | &#176;
± | plus-or-minus | &plusmn; | &#177;
² | superscript 2 | &sup2; | &#178;
³ | superscript 3 | &sup3; | &#179;
´ | spacing acute | &acute; | &#180;
µ | micro | &micro; | &#181;
¶ | paragraph | &para; | &#182;
· | middle dot | &middot; | &#183;
¸ | spacing cedilla | &cedil; | &#184;
¹ | superscript 1 | &sup1; | &#185;
º | masculine ordinal indicator | &ordm; | &#186;
» | angle quotation mark (right) | &raquo; | &#187;
¼ | fraction 1/4 | &frac14; | &#188;
½ | fraction 1/2 | &frac12; | &#189;
¾ | fraction 3/4 | &frac34; | &#190;
¿ | inverted question mark | &iquest; | &#191;
× | multiplication | &times; | &#215;
÷ | division | &divide; | &#247;
ISO 8859-1 character entities
Display | Description | Entity name | Entity number
À | capital a, grave accent | &Agrave; | &#192;
Á | capital a, acute accent | &Aacute; | &#193;
 | capital a, circumflex accent | &Acirc; | &#194;
à | capital a, tilde | &Atilde; | &#195;
Ä | capital a, umlaut mark | &Auml; | &#196;
Å | capital a, ring | &Aring; | &#197;
Æ | capital ae | &AElig; | &#198;
Ç | capital c, cedilla | &Ccedil; | &#199;
È | capital e, grave accent | &Egrave; | &#200;
É | capital e, acute accent | &Eacute; | &#201;
Ê | capital e, circumflex accent | &Ecirc; | &#202;
Ë | capital e, umlaut mark | &Euml; | &#203;
Ì | capital i, grave accent | &Igrave; | &#204;
Í | capital i, acute accent | &Iacute; | &#205;
Î | capital i, circumflex accent | &Icirc; | &#206;
Ï | capital i, umlaut mark | &Iuml; | &#207;
Ð | capital eth, Icelandic | &ETH; | &#208;
Ñ | capital n, tilde | &Ntilde; | &#209;
Ò | capital o, grave accent | &Ograve; | &#210;
Ó | capital o, acute accent | &Oacute; | &#211;
Ô | capital o, circumflex accent | &Ocirc; | &#212;
Õ | capital o, tilde | &Otilde; | &#213;
Ö | capital o, umlaut mark | &Ouml; | &#214;
Ø | capital o, slash | &Oslash; | &#216;
Ù | capital u, grave accent | &Ugrave; | &#217;
Ú | capital u, acute accent | &Uacute; | &#218;
Û | capital u, circumflex accent | &Ucirc; | &#219;
Ü | capital u, umlaut mark | &Uuml; | &#220;
Ý | capital y, acute accent | &Yacute; | &#221;
Þ | capital THORN, Icelandic | &THORN; | &#222;
ß | small sharp s, German | &szlig; | &#223;
à | small a, grave accent | &agrave; | &#224;
á | small a, acute accent | &aacute; | &#225;
â | small a, circumflex accent | &acirc; | &#226;
ã | small a, tilde | &atilde; | &#227;
ä | small a, umlaut mark | &auml; | &#228;
å | small a, ring | &aring; | &#229;
æ | small ae | &aelig; | &#230;
ç | small c, cedilla | &ccedil; | &#231;
è | small e, grave accent | &egrave; | &#232;
é | small e, acute accent | &eacute; | &#233;
ê | small e, circumflex accent | &ecirc; | &#234;
ë | small e, umlaut mark | &euml; | &#235;
ì | small i, grave accent | &igrave; | &#236;
í | small i, acute accent | &iacute; | &#237;
î | small i, circumflex accent | &icirc; | &#238;
ï | small i, umlaut mark | &iuml; | &#239;
ð | small eth, Icelandic | &eth; | &#240;
ñ | small n, tilde | &ntilde; | &#241;
ò | small o, grave accent | &ograve; | &#242;
ó | small o, acute accent | &oacute; | &#243;
ô | small o, circumflex accent | &ocirc; | &#244;
õ | small o, tilde | &otilde; | &#245;
ö | small o, umlaut mark | &ouml; | &#246;
ø | small o, slash | &oslash; | &#248;
ù | small u, grave accent | &ugrave; | &#249;
ú | small u, acute accent | &uacute; | &#250;
û | small u, circumflex accent | &ucirc; | &#251;
ü | small u, umlaut mark | &uuml; | &#252;
ý | small y, acute accent | &yacute; | &#253;
þ | small thorn, Icelandic | &thorn; | &#254;
ÿ | small y, umlaut mark | &yuml; | &#255;
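For the ISO 8859-1 range, the entity number is simply the character's code point written in decimal, which is why À (code 192) is &#192;. A minimal JavaScript sketch of that relationship follows; the function name latin1Reference is hypothetical, and the 160 to 255 bounds are an assumption chosen to match the two ISO 8859-1 tables.

```javascript
// Hypothetical helper; returns the numeric reference for one Latin-1 character.
function latin1Reference(ch) {
  const code = ch.charCodeAt(0);
  // Accept exactly one character from the upper half of ISO 8859-1.
  if (ch.length !== 1 || code < 160 || code > 255) {
    throw new RangeError("expected one character in the range U+00A0..U+00FF");
  }
  return "&#" + code + ";";
}

console.log(latin1Reference("À"), latin1Reference("ÿ"));
```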
Other entities supported by HTML
Display | Description | Entity name | Entity number
Œ | capital ligature OE | &OElig; | &#338;
œ | small ligature oe | &oelig; | &#339;
Š | capital S with caron | &Scaron; | &#352;
š | small s with caron | &scaron; | &#353;
Ÿ | capital Y with diaeresis | &Yuml; | &#376;
ˆ | modifier letter circumflex accent | &circ; | &#710;
˜ | small tilde | &tilde; | &#732;
(invisible) | en space | &ensp; | &#8194;
(invisible) | em space | &emsp; | &#8195;
(invisible) | thin space | &thinsp; | &#8201;
(invisible) | zero width non-joiner | &zwnj; | &#8204;
(invisible) | zero width joiner | &zwj; | &#8205;
(invisible) | left-to-right mark | &lrm; | &#8206;
(invisible) | right-to-left mark | &rlm; | &#8207;
– | en dash | &ndash; | &#8211;
— | em dash | &mdash; | &#8212;
‘ | left single quotation mark | &lsquo; | &#8216;
’ | right single quotation mark | &rsquo; | &#8217;
‚ | single low-9 quotation mark | &sbquo; | &#8218;
“ | left double quotation mark | &ldquo; | &#8220;
” | right double quotation mark | &rdquo; | &#8221;
„ | double low-9 quotation mark | &bdquo; | &#8222;
† | dagger | &dagger; | &#8224;
‡ | double dagger | &Dagger; | &#8225;
… | horizontal ellipsis | &hellip; | &#8230;
‰ | per mille | &permil; | &#8240;
‹ | single left-pointing angle quotation | &lsaquo; | &#8249;
› | single right-pointing angle quotation | &rsaquo; | &#8250;
€ | euro | &euro; | &#8364;
References:
http://www./?p=72
http://www./forum/read.php?tid=258