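[C] Using TCHAR across platforms: making Linux and other platforms support tchar.h, solving the format specifier problem in cross-platform code, and displaying several languages at once (compatible with VC/GCC/BCB; supports Windows/Linux/Mac)...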

Published: 2019-06-19

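When porting a Windows program to Linux or another platform, two problems come up again and again: the missing tchar.h, and string format specifiers (mixed output of char strings, wchar_t strings, and TCHAR strings). This article looks at how to solve both.

I. Background

1.1 History

Traditional C programs use char strings and rely on the ANSI + DBCS scheme to support the local language, so they cannot display several languages at the same time.

When Microsoft designed Windows NT, it took internationalization into account and decided that the kernel would support Unicode, represented by the wchar_t type. Unicode was only 16 bits wide at the time, so wchar_t is 16 bits on Windows.

To stay compatible with old programs, string-related APIs generally come in two versions: names ending in A are the ANSI version and take char strings, and names ending in W are the Unicode version and take wchar_t strings.

Two sets of APIs are inconvenient to use, so Microsoft designed tchar.h, which defines the TCHAR type and uses macros to switch between the two. You write the code once and compile it either as an ANSI build or as a Unicode build, targeting old systems (Win9x) and new systems (WinNT) respectively.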
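The switching mechanism can be sketched as follows. This is only an illustration, not the real tchar.h: the names `demo_tchar`, `DEMO_T`, and `demo_tcslen` are hypothetical stand-ins for `TCHAR`, `_T`, and `_tcslen`, chosen so that the sketch does not clash with a real header.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for TCHAR, _T and _tcslen (illustration only). */
#ifdef _UNICODE
typedef wchar_t demo_tchar;
#define DEMO_T_(x) L##x
#define demo_tcslen wcslen
#else
typedef char demo_tchar;
#define DEMO_T_(x) x
#define demo_tcslen strlen
#endif
#define DEMO_T(x) DEMO_T_(x)

/* The same source line compiles against either character type. */
static size_t demo_greeting_length(void)
{
    const demo_tchar *greeting = DEMO_T("Welcome");
    return demo_tcslen(greeting);
}
```

Compiling with or without `-D_UNICODE` selects the wide or the narrow variant; the calling code stays the same.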

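Linux and similar platforms adopted Unicode relatively late. By then the mature UTF-8 encoding was available, and it is compatible with the traditional char type. These platforms therefore made UTF-8 the default encoding. This supports multilingual Unicode text while the traditional C standard library, POSIX, and other APIs keep working. With the best of both worlds, there is no need for two sets of APIs, and so no need for tchar.h.

UTF-8 is a variable-length encoding in which one character may take 1 to 4 bytes, which is not always convenient to process. Linux and similar platforms therefore also provide wchar_t, but there it is 32 bits wide.

Why 32 bits? The answer lies in how Unicode evolved. Unicode had to cover far more characters than first expected, and 16 bits stopped being enough long ago.

UCS-4 proposed a 31-bit code space, along with encodings such as UTF-32 and 6-byte UTF-8. That approach is expensive, however.

As a compromise, the Unicode Consortium extended the code space from the 16-bit range 0 to FFFF to the 21-bit range 0 to 10FFFF. The traditional 16-bit encoding became UTF-16, which adds surrogate pairs: two 16-bit code units encode one character that does not fit in 16 bits.

In other words, if wchar_t is 16 bits wide, it actually holds UTF-16: a character between U+0000 and U+FFFF takes one wchar_t, and a character between U+10000 and U+10FFFF takes two.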
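The surrogate pair rule can be sketched in a few lines of C. The function name `utf16_encode` is a hypothetical helper for this illustration, not part of any library discussed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16 code units.
 * Returns the number of units written (1 or 2), or 0 if cp is not a
 * valid scalar value (a surrogate, or above U+10FFFF). */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                  /* fits in one unit */
        return 1;
    }
    cp -= 0x10000;                              /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+6C49 stays a single unit 0x6C49, while U+1F600 becomes the pair 0xD83D 0xDE00.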

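To guarantee that every character takes exactly one wchar_t, wchar_t has to be 32 bits wide. That is the UTF-32 encoding.

The UTF-8 scheme itself can express a very large code space (6-byte UTF-8 can encode 31 bits, for example), but for the sake of standardization RFC 3629 limits UTF-8 to at most 4 bytes, that is, at most 21 bits. Code points above 10FFFF are invalid.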
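The 4-byte limit can be sketched as an encoder that rejects anything outside the RFC 3629 range. `utf8_encode` is a hypothetical helper name used only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 under the RFC 3629 limits.
 * Returns the number of bytes written (1 to 4), or 0 if cp is a
 * surrogate or lies above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {                      /* 7 bits: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 11 bits: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 16 bits: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```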

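1.2 Why make Linux and other platforms support tchar.h?

Many people think Linux and similar platforms have no need for tchar.h, mainly because of several problems with wchar_t:

1. The char type with UTF-8 encoding already meets the needs of Unicode internationalization.

2. The char type is easier to port. The wide-character library (wchar.h) was only added in the C95 amendment, and reasonably complete support arrived with C99, so some old compilers support wchar_t poorly or not at all.

3. The width of wchar_t is not fixed. It is 16 bits on Windows and 32 bits on Linux and similar platforms. The C99 standard does not strictly specify its width.

4. The wchar_t functions and the char functions are not symmetric. In the C99 standard library only some string functions have a wchar_t version. Windows has the two symmetric API sets A and W, but other platforms have only one set of APIs.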
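Point 3 can be checked at build time or run time. The sketch below uses only `sizeof`, `CHAR_BIT`, and `WCHAR_MAX` from the standard headers; the two function names are hypothetical helpers.

```c
#include <limits.h>
#include <wchar.h>

/* Width of wchar_t in bits on the current platform. */
static int wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* Non-zero if one wchar_t can hold every code point up to U+10FFFF,
 * i.e. no surrogate pairs are needed. */
static int wchar_holds_any_code_point(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}
```

Code that must treat one wchar_t as one character can test `wchar_holds_any_code_point()` (or the equivalent `#if WCHAR_MAX >= 0x10FFFF`) and fall back to surrogate handling otherwise.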

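I used to agree with these points, but I now think having a tchar.h is much more convenient, for these reasons:

1. It makes porting Windows programs easier. Many console programs perform only very simple string operations and never run into the weaknesses of wchar_t. Rewriting code merely because tchar.h is missing costs too much.

2. It has no side effects. On Linux and other platforms with only one set of APIs, leave the Unicode macros (_UNICODE/UNICODE) undefined; tchar.h then maps TCHAR to char and uses the traditional narrow string functions.

3. It avoids the bug caused by mixing printf and wprintf. The two must not be mixed on the same stream (the C standard gives a stream a byte or wide orientation on first use), and mixing them causes bugs. Using TCHAR consistently avoids this.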
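In C99 terms, the printf/wprintf problem in point 3 is the stream orientation rule: the first narrow or wide operation fixes the orientation of a stream. The sketch below queries it with the standard `fwide` function on a scratch file; `orientation_after_write` is a hypothetical helper name.

```c
#include <stdio.h>
#include <wchar.h>

/* Write once to a fresh temporary stream, then report its orientation:
 * negative = byte-oriented, positive = wide-oriented,
 * 0 = not set (here only if the scratch stream could not be created). */
static int orientation_after_write(int wide)
{
    FILE *fp = tmpfile();
    int orientation;

    if (fp == NULL)
        return 0;               /* could not create a scratch stream */
    if (wide)
        fputws(L"wide", fp);
    else
        fputs("narrow", fp);
    orientation = fwide(fp, 0); /* mode 0 only queries the orientation */
    fclose(fp);
    return orientation;
}
```

Once stdout has become byte-oriented through printf, a later wprintf on it is not guaranteed to work, which is why a single TCHAR-based family of calls is safer.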

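1.3 The string format specifier problem

Besides tchar.h, cross-platform string handling also runs into format specifier problems, for example:

1. Which format specifier prints a char character/string in printf?

2. Which format specifier prints a wchar_t character/string in printf?

3. Which format specifier prints a TCHAR character/string in printf?

4. Which format specifier prints a char character/string in wprintf?

5. Which format specifier prints a wchar_t character/string in wprintf?

6. Which format specifier prints a TCHAR character/string in wprintf?

The C99 standard is conservative and does not fully answer these questions. For c and s it defines only the "l" length modifier: without "l" the argument is a char string, and with "l" it is a wchar_t string. See "7.24.2.1 The fwprintf function" in the C99 standard.

Because VC++ has to deal with two sets of string APIs, its support here is very complete. In VC++ the answers to the six questions above are:

1. hc/hs.

2. lc/ls.

3. c/s.

4. hc/hs.

5. lc/ls.

6. c/s.

BCB, MinGW, and other compilers on Windows follow VC++ and support these format specifiers as well.

gcc on Linux and similar platforms sticks closely to the C99 standard and does not support as many format specifiers.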
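The C99 rule can be sketched as follows, assuming a C99-conforming library such as glibc (on VC++-style runtimes, "%s" in a wide function means a wide string instead, which is exactly the difference described above). The two wrapper names are hypothetical.

```c
#include <stdio.h>
#include <wchar.h>

/* Narrow function: "%s" takes a char string, "%ls" a wchar_t string. */
static int format_narrow(char *buf, size_t size,
                         const char *a, const wchar_t *w)
{
    return snprintf(buf, size, "%s|%ls", a, w);
}

/* Wide function: under C99 the same two specifiers keep the same meaning. */
static int format_wide(wchar_t *buf, size_t count,
                       const char *a, const wchar_t *w)
{
    return swprintf(buf, count, L"%s|%ls", a, w);
}
```

Both calls convert between narrow and wide text using the current locale, so non-ASCII text additionally depends on `setlocale`.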

I tested this earlier; see:

"[C] Format specifiers for wchar_t (VC, BCB, GCC, and the C99 standard)"

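1.4 The _tmain entry point problem

Standard C uses the main function as the program entry point, declared as:

int main(int argc, char* argv[])

To support command-line arguments of type TCHAR, VC++ also defines the _tmain entry point, declared as:

int _tmain(int argc, TCHAR* argv[])

At present VC++ supports _tmain well, while MinGW and some other compilers support it poorly; some accept only the standard C main.

II. Solution

2.1 auto_tchar.h: making every compiler compatible with tchar.h

I wrote auto_tchar.h, which uses preprocessor checks to decide whether the compiler provides tchar.h. If it does, the compiler's own tchar.h is included; if not, auto_tchar.h supplies its own implementation, based on MinGW's tchar.h.

During testing I found that BCB6's tchar.h does not define TCHAR, only _TCHAR; TCHAR is defined in winnt.h. So I added the following fix: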

    // Fix: BCB6's tchar.h defines only _TCHAR, not TCHAR.
    #if defined(__BORLANDC__) && !defined(_TCHAR_DEFINED)
        typedef _TCHAR    TCHAR, *PTCHAR;
        typedef _TCHAR    TBYTE, *PTBYTE;
        #define _TCHAR_DEFINED
    #endif    // #if defined(__BORLANDC__) && !defined(_TCHAR_DEFINED)

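Usage:

1. Put "auto_tchar.h" in the project's include directory.

2. Change the original "#include <tchar.h>" to "#include "auto_tchar.h"".

2.2 prichar.h: solving the string format specifier problem

How can the differences in format specifiers between compilers be handled?

I found inspiration in C99's inttypes.h, which defines a series of macros starting with PRI that solve the format specifier problem for the various integer types.

The same approach works here: write a header that defines a series of PRI macros for strings, and use preprocessor checks on the compiler to define suitable constants.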
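For reference, this is how the inttypes.h macros that inspired the idea are used. `PRId64` and `int64_t` are standard C99; `format_i64` is a hypothetical wrapper for this illustration.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format a 64-bit integer portably. PRId64 expands to the right length
 * modifier and conversion for the platform, and adjacent string
 * literals are concatenated into one format string. */
static int format_i64(char *buf, size_t size, int64_t value)
{
    return snprintf(buf, size, "%" PRId64, value);
}
```

The string macros in prichar.h follow the same pattern: the macro supplies everything after the "%".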

I wrote prichar.h, which defines these macros:

SCNcA

SCNsA

SCNcW

SCNsW

SCNcT

SCNsT

PRIcA

PRIsA

PRIcW

PRIsW

PRIcT

PRIsT

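Prefix:

PRI: print, output.

SCN: scan, input.

Infix:

c: char, a character.

s: string, a string.

Suffix:

A: char, narrow-character version.

W: wchar_t, wide-character version.

T: TCHAR, TCHAR version.

Usage:

1. Put "prichar.h" in the project's include directory.

2. Include the header (#include "prichar.h").

3. Code example: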

char* psa = "A汉字ABC_Welcome_歡迎_ようこそ_환영.";
wchar_t* psw = L"W汉字ABC_Welcome_歡迎_ようこそ_환영.";
TCHAR* pst = _T("T汉字ABC_Welcome_歡迎_ようこそ_환영.");

    _tprintf(_T("%")_T(PRIsA)_T("\n"), psa);    // Print the narrow string.
    _tprintf(_T("%")_T(PRIsW)_T("\n"), psw);    // Print the wide string.
    _tprintf(_T("%")_T(PRIsT)_T("\n"), pst);    // Print the TCHAR string.

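Note: the "_T" macro must be applied to every piece; none may be omitted. If the format string is written as "_T("%"PRIsA"\n")", then in a Unicode build the compiler expands it to "L"%" "hs" "\n"" and reports that a wide string cannot be concatenated with a narrow string (VC++, for example, reports "error C2308: concatenating mismatched strings").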
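The piece-by-piece rule can be sketched without the real headers. `FMT_T` and `FMT_PRIsA` below are hypothetical stand-ins for `_T` and `PRIsA`; the two-level macro is what lets `FMT_T(FMT_PRIsA)` expand the inner macro before the `L` prefix is pasted on.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

#define FMT_PRIsA "s"            /* hypothetical stand-in for PRIsA */

#ifdef _UNICODE
#define FMT_T_(x) L##x
#else
#define FMT_T_(x) x
#endif
#define FMT_T(x) FMT_T_(x)       /* extra level: expand x first, then paste */

/* Every piece gets its own FMT_T, so a wide build sees L"%" L"s" L"\n"
 * and a narrow build sees "%" "s" "\n"; both concatenate cleanly. */
#define FMT_NARROW_LINE FMT_T("%") FMT_T(FMT_PRIsA) FMT_T("\n")

/* Length of the assembled format string, in characters. */
static size_t fmt_narrow_line_length(void)
{
#ifdef _UNICODE
    return wcslen(FMT_NARROW_LINE);
#else
    return strlen(FMT_NARROW_LINE);
#endif
}
```

C99 itself allows a wide literal next to a narrow one (the result is wide), but wrapping every piece keeps the code valid for compilers that reject the mix.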

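2.3 auto_tmain.h: solving the _tmain entry point problem

This header uses preprocessor checks to decide whether the compiler supports _tmain. If it does, nothing extra is done; if not, the header adds what is needed to make _tmain work.

It is based on https://github.com/coderforlife/mingw-unicode-main/blob/master/mingw-unicode.c

Usage:

1. Put "auto_tmain.h" in the project's include directory.

2. Include the header in the main source file (#include "auto_tmain.h").

3. _tmain can now be used normally (int _tmain(int argc, TCHAR* argv[])).

III. Module source code

3.1 auto_tchar.h

Full code: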

auto_tchar.h

/*
auto_tchar.h: make various compilers compatible with tchar.h.
Author: zyl910
Blog: http://www.cnblogs.com/zyl910
URL: http://www.cnblogs.com/zyl910/archive/2013/01/17/tcharall.html
Version: V1.00
Updated: 2013-01-17

Tested compilers:
VC: 6, 2003, 2005, 2008, 2010, 2012.
BCB: 6.
GCC: 4.7.1(MinGW-w64), 4.7.0(Fedora 17), 4.6.2(MinGW), llvm-gcc-4.2(Mac OS X Lion 10.7.4, Xcode 4.4.1).

Update
~~~~~~

[2013-01-17] V1.00
* V1.0 released.
* Renamed to auto_tchar.h (formerly tchar.h) to avoid include-directory problems.
* Fixed the BCB6 TCHAR problem (its tchar.h defines only _TCHAR, not TCHAR; TCHAR is defined in winnt.h).

[2012-11-08] V0.01
* Initial version.
* Based on MinGW's tchar.h. http://www.mingw.org/
*/

#ifndef __AUTO_TCHAR_H_INCLUDED
#define __AUTO_TCHAR_H_INCLUDED

// __AUTO_TCHAR_H_USESYS: whether the compiler provides <tchar.h>.
#undef __AUTO_TCHAR_H_USESYS
#if defined(_MSC_VER)
    // MSVC.
    #define __AUTO_TCHAR_H_USESYS
#elif defined(__BORLANDC__)
    // BCB.
    #define __AUTO_TCHAR_H_USESYS
#elif defined(_WIN32)||defined(_WIN64)||defined(__MINGW32__)||defined(__MINGW64__)||defined(__CYGWIN__)
    // Assume every compiler on Windows provides <tchar.h>.
    #define __AUTO_TCHAR_H_USESYS
#else
    // Assume other compilers do not provide <tchar.h>.
#endif    // __AUTO_TCHAR_H_USESYS

#ifdef __AUTO_TCHAR_H_USESYS
// Use the tchar.h provided by the compiler.
    #include <tchar.h>

    // Fix: BCB6's tchar.h defines only _TCHAR, not TCHAR.
    #if defined(__BORLANDC__) && !defined(_TCHAR_DEFINED)
        typedef _TCHAR    TCHAR, *PTCHAR;
        typedef _TCHAR    TBYTE, *PTBYTE;
        #define _TCHAR_DEFINED
    #endif    // #if defined(__BORLANDC__) && !defined(_TCHAR_DEFINED)
#else
// Use our own tchar.h, based on MinGW's tchar.h. http://www.mingw.org/

#ifndef    _TCHAR_H_
#define _TCHAR_H_

///* All the headers include this file. */
//#include <_mingw.h>

/*
 * NOTE: This tests _UNICODE, which is different from the UNICODE define
 *       used to differentiate Win32 API calls.
 */
#ifdef    _UNICODE

/*
 * Include <wchar.h> for wchar_t and WEOF if _UNICODE.
 */
#include <wchar.h>

/*
 * Use TCHAR instead of char or wchar_t. It will be appropriately translated
 * if _UNICODE is correctly defined (or not).
 */
#ifndef _TCHAR_DEFINED
#ifndef RC_INVOKED
typedef wchar_t TCHAR;
typedef wchar_t _TCHAR;
#endif /* Not RC_INVOKED */
#define _TCHAR_DEFINED
#endif

/*
 * Use _TEOF instead of EOF or WEOF. It will be appropriately translated if
 * _UNICODE is correctly defined (or not).
 */
#define _TEOF WEOF

/*
 * __TEXT is a private macro whose specific use is to force the expansion of a
 * macro passed as an argument to the macros _T or _TEXT. DO NOT use this
 * macro within your programs. Its name and function could change without
 * notice.
 */
#define __TEXT(q) L##q

/* for porting from other Windows compilers */
#if 0 /* no wide startup module */
#define _tmain wmain
#define _tWinMain wWinMain
#define _tenviron _wenviron
#define __targv __wargv
#endif

/*
 * Unicode functions
 */
#define _tprintf wprintf
#define _ftprintf fwprintf
#define _stprintf swprintf
#define _sntprintf _snwprintf
#define _vtprintf vwprintf
#define _vftprintf vfwprintf
#define _vstprintf vswprintf
#define _vsntprintf _vsnwprintf
#define _vsctprintf _vscwprintf
#define _tscanf wscanf
#define _ftscanf fwscanf
#define _stscanf swscanf
#define _fgettc fgetwc
#define _fgettchar _fgetwchar
#define _fgetts fgetws
#define _fputtc fputwc
#define _fputtchar _fputwchar
#define _fputts fputws
#define _gettc getwc
#define _getts _getws
#define _puttc putwc
#define _puttchar putwchar
#define _putts _putws
#define _ungettc ungetwc
#define _tcstod wcstod
#define _tcstol wcstol
#define _tcstoul wcstoul
#define _itot _itow
#define _ltot _ltow
#define _ultot _ultow
#define _ttoi _wtoi
#define _ttol _wtol
#define _tcscat wcscat
#define _tcschr wcschr
#define _tcscmp wcscmp
#define _tcscpy wcscpy
#define _tcscspn wcscspn
#define _tcslen wcslen
#define _tcsncat wcsncat
#define _tcsncmp wcsncmp
#define _tcsncpy wcsncpy
#define _tcspbrk wcspbrk
#define _tcsrchr wcsrchr
#define _tcsspn wcsspn
#define _tcsstr wcsstr
#define _tcstok wcstok
#define _tcsdup _wcsdup
#define _tcsicmp _wcsicmp
#define _tcsnicmp _wcsnicmp
#define _tcsnset _wcsnset
#define _tcsrev _wcsrev
#define _tcsset _wcsset
#define _tcslwr _wcslwr
#define _tcsupr _wcsupr
#define _tcsxfrm wcsxfrm
#define _tcscoll wcscoll
#define _tcsicoll _wcsicoll
#define _istalpha iswalpha
#define _istupper iswupper
#define _istlower iswlower
#define _istdigit iswdigit
#define _istxdigit iswxdigit
#define _istspace iswspace
#define _istpunct iswpunct
#define _istalnum iswalnum
#define _istprint iswprint
#define _istgraph iswgraph
#define _istcntrl iswcntrl
#define _istascii iswascii
#define _totupper towupper
#define _totlower towlower
#define _tcsftime wcsftime

/* Macro functions */
#define _tcsdec _wcsdec
#define _tcsinc _wcsinc
#define _tcsnbcnt _wcsncnt
#define _tcsnccnt _wcsncnt
#define _tcsnextc _wcsnextc
#define _tcsninc _wcsninc
#define _tcsspnp _wcsspnp
#define _wcsdec(_wcs1, _wcs2) ((_wcs1)>=(_wcs2) ? NULL : (_wcs2)-1)
#define _wcsinc(_wcs) ((_wcs)+1)
#define _wcsnextc(_wcs) ((unsigned int) *(_wcs))
#define _wcsninc(_wcs, _inc) (((_wcs)+(_inc)))
#define _wcsncnt(_wcs, _cnt) ((wcslen(_wcs)>_cnt) ? _cnt : wcslen(_wcs))
#define _wcsspnp(_wcs1, _wcs2) ((*((_wcs1)+wcsspn(_wcs1,_wcs2))) ? ((_wcs1)+wcsspn(_wcs1,_wcs2)) : NULL)

#if 1 /* defined __MSVCRT__ */
/*
 * These wide functions not in crtdll.dll.
 * Define macros anyway so that _wfoo rather than _tfoo is undefined
 */
#define _ttoi64 _wtoi64
#define _i64tot _i64tow
#define _ui64tot _ui64tow
#define _tasctime _wasctime
#define _tctime _wctime
#if __MSVCRT_VERSION__ >= 0x0800
#define _tctime32 _wctime32
#define _tctime64 _wctime64
#endif /* __MSVCRT_VERSION__ >= 0x0800 */
#define _tstrdate _wstrdate
#define _tstrtime _wstrtime
#define _tutime _wutime
#if __MSVCRT_VERSION__ >= 0x0800
#define _tutime64 _wutime64
#define _tutime32 _wutime32
#endif /* __MSVCRT_VERSION__ > 0x0800 */
#define _tcsnccoll _wcsncoll
#define _tcsncoll _wcsncoll
#define _tcsncicoll _wcsnicoll
#define _tcsnicoll _wcsnicoll
#define _taccess _waccess
#define _tchmod _wchmod
#define _tcreat _wcreat
#define _tfindfirst _wfindfirst
#define _tfindnext _wfindnext
#if __MSVCRT_VERSION__ >= 0x0800
#define _tfindfirst64 _wfindfirst64
#define _tfindfirst32 _wfindfirst32
#define _tfindnext64 _wfindnext64
#define _tfindnext32 _wfindnext32
#endif /* __MSVCRT_VERSION__ > 0x0800 */
#define _tfdopen _wfdopen
#define _tfopen _wfopen
#define _tfreopen _wfreopen
#define _tfsopen _wfsopen
#define _tgetenv _wgetenv
#define _tputenv _wputenv
#define _tsearchenv _wsearchenv
#define _tsystem _wsystem
#define _tmakepath _wmakepath
#define _tsplitpath _wsplitpath
#define _tfullpath _wfullpath
#define _tmktemp _wmktemp
#define _topen _wopen
#define _tremove _wremove
#define _trename _wrename
#define _tsopen _wsopen
#define _tsetlocale _wsetlocale
#define _tunlink _wunlink
#define _tfinddata_t _wfinddata_t
#define _tfindfirsti64 _wfindfirsti64
#define _tfindnexti64 _wfindnexti64
#define _tfinddatai64_t _wfinddatai64_t
#if __MSVCRT_VERSION__ >= 0x0601
#define _tfinddata64_t _wfinddata64_t
#endif
#if __MSVCRT_VERSION__ >= 0x0800
#define _tfinddata32_t _wfinddata32_t
#define _tfinddata32i64_t _wfinddata32i64_t
#define _tfinddata64i32_t _wfinddata64i32_t
#define _tfindfirst32i64 _wfindfirst32i64
#define _tfindfirst64i32 _wfindfirst64i32
#define _tfindnext32i64 _wfindnext32i64
#define _tfindnext64i32 _wfindnext64i32
#endif /* __MSVCRT_VERSION__ > 0x0800 */
#define _tchdir _wchdir
#define _tgetcwd _wgetcwd
#define _tgetdcwd _wgetdcwd
#define _tmkdir _wmkdir
#define _trmdir _wrmdir
#define _tstat _wstat
#define _tstati64 _wstati64
#define _tstat64 _wstat64
#if __MSVCRT_VERSION__ >= 0x0800
#define _tstat32 _wstat32
#define _tstat32i64 _wstat32i64
#define _tstat64i32 _wstat64i32
#endif /* __MSVCRT_VERSION__ > 0x0800 */
#endif /* __MSVCRT__ */

/* dirent structures and functions */
#define _tdirent _wdirent
#define _TDIR _WDIR
#define _topendir _wopendir
#define _tclosedir _wclosedir
#define _treaddir _wreaddir
#define _trewinddir _wrewinddir
#define _ttelldir _wtelldir
#define _tseekdir _wseekdir

#else /* Not _UNICODE */

/*
 * TCHAR, the type you should use instead of char.
 */
#ifndef _TCHAR_DEFINED
#ifndef RC_INVOKED
typedef char TCHAR;
typedef char _TCHAR;
#endif
#define _TCHAR_DEFINED
#endif

/*
 * _TEOF, the constant you should use instead of EOF.
 */
#define _TEOF EOF

/*
 * __TEXT is a private macro whose specific use is to force the expansion of a
 * macro passed as an argument to the macros _T or _TEXT. DO NOT use this
 * macro within your programs. Its name and function could change without
 * notice.
 */
#define __TEXT(q) q

/* for porting from other Windows compilers */
#define _tmain main
#define _tWinMain WinMain
#define _tenviron _environ
#define __targv __argv

/*
 * Non-unicode (standard) functions
 */
#define _tprintf printf
#define _ftprintf fprintf
#define _stprintf sprintf
#define _sntprintf _snprintf
#define _vtprintf vprintf
#define _vftprintf vfprintf
#define _vstprintf vsprintf
#define _vsntprintf _vsnprintf
#define _vsctprintf _vscprintf
#define _tscanf scanf
#define _ftscanf fscanf
#define _stscanf sscanf
#define _fgettc fgetc
#define _fgettchar _fgetchar
#define _fgetts fgets
#define _fputtc fputc
#define _fputtchar _fputchar
#define _fputts fputs
#define _tfdopen _fdopen
#define _tfopen fopen
#define _tfreopen freopen
#define _tfsopen _fsopen
#define _tgetenv getenv
#define _tputenv _putenv
#define _tsearchenv _searchenv
#define _tsystem system
#define _tmakepath _makepath
#define _tsplitpath _splitpath
#define _tfullpath _fullpath
#define _gettc getc
#define _getts gets
#define _puttc putc
#define _puttchar putchar
#define _putts puts
#define _ungettc ungetc
#define _tcstod strtod
#define _tcstol strtol
#define _tcstoul strtoul
#define _itot _itoa
#define _ltot _ltoa
#define _ultot _ultoa
#define _ttoi atoi
#define _ttol atol
#define _tcscat strcat
#define _tcschr strchr
#define _tcscmp strcmp
#define _tcscpy strcpy
#define _tcscspn strcspn
#define _tcslen strlen
#define _tcsncat strncat
#define _tcsncmp strncmp
#define _tcsncpy strncpy
#define _tcspbrk strpbrk
#define _tcsrchr strrchr
#define _tcsspn strspn
#define _tcsstr strstr
#define _tcstok strtok
#define _tcsdup _strdup
#define _tcsicmp _stricmp
#define _tcsnicmp _strnicmp
#define _tcsnset _strnset
#define _tcsrev _strrev
#define _tcsset _strset
#define _tcslwr _strlwr
#define _tcsupr _strupr
#define _tcsxfrm strxfrm
#define _tcscoll strcoll
#define _tcsicoll _stricoll
#define _istalpha isalpha
#define _istupper isupper
#define _istlower islower
#define _istdigit isdigit
#define _istxdigit isxdigit
#define _istspace isspace
#define _istpunct ispunct
#define _istalnum isalnum
#define _istprint isprint
#define _istgraph isgraph
#define _istcntrl iscntrl
#define _istascii isascii
#define _totupper toupper
#define _totlower tolower
#define _tasctime asctime
#define _tctime ctime
#if __MSVCRT_VERSION__ >= 0x0800
#define _tctime32 _ctime32
#define _tctime64 _ctime64
#endif /* __MSVCRT_VERSION__ >= 0x0800 */
#define _tstrdate _strdate
#define _tstrtime _strtime
#define _tutime _utime
#if __MSVCRT_VERSION__ >= 0x0800
#define _tutime64 _utime64
#define _tutime32 _utime32
#endif /* __MSVCRT_VERSION__ > 0x0800 */
#define _tcsftime strftime

/* Macro functions */
#define _tcsdec _strdec
#define _tcsinc _strinc
#define _tcsnbcnt _strncnt
#define _tcsnccnt _strncnt
#define _tcsnextc _strnextc
#define _tcsninc _strninc
#define _tcsspnp _strspnp
#define _strdec(_str1, _str2) ((_str1)>=(_str2) ? NULL : (_str2)-1)
#define _strinc(_str) ((_str)+1)
#define _strnextc(_str) ((unsigned int) *(_str))
#define _strninc(_str, _inc) (((_str)+(_inc)))
#define _strncnt(_str, _cnt) ((strlen(_str)>_cnt) ? _cnt : strlen(_str))
#define _strspnp(_str1, _str2) ((*((_str1)+strspn(_str1,_str2))) ? ((_str1)+strspn(_str1,_str2)) : NULL)

#define _tchmod _chmod
#define _tcreat _creat
#define _tfindfirst _findfirst
#define _tfindnext _findnext
#if __MSVCRT_VERSION__ >= 0x0800
#define _tfindfirst64 _findfirst64
#define _tfindfirst32 _findfirst32
#define _tfindnext64 _findnext64
#define _tfindnext32 _findnext32
#endif /* __MSVCRT_VERSION__ > 0x0800 */
#define _tmktemp _mktemp
#define _topen _open
#define _taccess _access
#define _tremove remove
#define _trename rename
#define _tsopen _sopen
#define _tsetlocale setlocale
#define _tunlink _unlink
#define _tfinddata_t _finddata_t
#define _tchdir _chdir
#define _tgetcwd _getcwd
#define _tgetdcwd _getdcwd
#define _tmkdir _mkdir
#define _trmdir _rmdir
#define _tstat _stat

#if 1 /* defined __MSVCRT__ */
/* Not in crtdll.dll. Define macros anyway? */
#define _ttoi64 _atoi64
#define _i64tot _i64toa
#define _ui64tot _ui64toa
#define _tcsnccoll _strncoll
#define _tcsncoll _strncoll
#define _tcsncicoll _strnicoll
#define _tcsnicoll _strnicoll
#define _tfindfirsti64 _findfirsti64
#define _tfindnexti64 _findnexti64
#define _tfinddatai64_t _finddatai64_t
#if __MSVCRT_VERSION__ >= 0x0601
#define _tfinddata64_t _finddata64_t
#endif
#if __MSVCRT_VERSION__ >= 0x0800
#define _tfinddata32_t _finddata32_t
#define _tfinddata32i64_t _finddata32i64_t
#define _tfinddata64i32_t _finddata64i32_t
#define _tfindfirst32i64 _findfirst32i64
#define _tfindfirst64i32 _findfirst64i32
#define _tfindnext32i64 _findnext32i64
#define _tfindnext64i32 _findnext64i32
#endif /* __MSVCRT_VERSION__ > 0x0800 */
#define _tstati64 _stati64
#define _tstat64 _stat64
#if __MSVCRT_VERSION__ >= 0x0800
#define _tstat32 _stat32
#define _tstat32i64 _stat32i64
#define _tstat64i32 _stat64i32
#endif /* __MSVCRT_VERSION__ > 0x0800 */
#endif /* __MSVCRT__ */

/* dirent structures and functions */
#define _tdirent dirent
#define _TDIR DIR
#define _topendir opendir
#define _tclosedir closedir
#define _treaddir readdir
#define _trewinddir rewinddir
#define _ttelldir telldir
#define _tseekdir seekdir

#endif /* Not _UNICODE */

/*
 * UNICODE a constant string when _UNICODE is defined else returns the string
 * unmodified. Also defined in w32api/winnt.h.
 */
#define _TEXT(x) __TEXT(x)
#define _T(x) __TEXT(x)

#endif /* Not _TCHAR_H_ */

#endif // #ifdef __AUTO_TCHAR_H_USESYS

#endif // #ifndef __AUTO_TCHAR_H_INCLUDED

3.2 prichar.h

Full code:

prichar.h

/*
prichar.h : format specifiers for strings.
Author: zyl910
Blog: http://www.cnblogs.com/zyl910
URL: http://www.cnblogs.com/zyl910/archive/2013/01/17/tcharall.html
Version: V1.00
Updated: 2013-01-17

Tested compilers:
VC: 6, 2003, 2005, 2008, 2010, 2012.
BCB: 6.
GCC: 4.7.1(MinGW-w64), 4.7.0(Fedora 17), 4.6.2(MinGW), llvm-gcc-4.2(Mac OS X Lion 10.7.4, Xcode 4.4.1).

Update
~~~~~~

[2013-01-17] V1.00
* V1.0 released.

Manual
~~~~~~

Format strings for character types, modeled on C99's "inttypes.h".

Prefix:
PRI: print, output.
SCN: scan, input.

Infix:
c: char, a character.
s: string, a string.

Suffix:
A: char, narrow-character version.
W: wchar_t, wide-character version.
T: TCHAR, TCHAR version.
*/

#ifndef __PRICHAR_H_INCLUDED
#define __PRICHAR_H_INCLUDED

//#include "tchar.h"

#if defined __cplusplus
extern "C" {
#endif

////
// char
////

#if defined(_MSC_VER)||defined(__BORLANDC__)
    // VC and BCB: hc/hs always mean narrow characters.
    #define PRIcA    "hc"
    #define PRIsA    "hs"
#elif defined(__GNUC__)||defined(_WIN32)||defined(_WIN64)
    // GCC: the narrow functions sometimes fail to recognize hc/hs, while the wide functions always support them.
    // Assume other compilers on Windows behave the same way.
    #if defined(_UNICODE)
        #define PRIcA    "hc"
        #define PRIsA    "hs"
    #else
        #define PRIcA    "c"
        #define PRIsA    "s"
    #endif
#else
    // Assume other platforms support only c/s.
    #define PRIcA    "c"
    #define PRIsA    "s"
#endif

////
// wchar_t
////

// The C99 standard specifies that lc/ls always mean wide characters.
#define PRIcW    "lc"
#define PRIsW    "ls"

////
// TCHAR
////

#if defined(_WIN32)||defined(_WIN64)||defined(_MSC_VER)
    // VC, BCB, MinGW and other compilers on Windows treat c as adaptive:
    // char in narrow functions, wchar_t in wide functions.
    #define PRIcT    "c"
    #define PRIsT    "s"
#else
    // Other platforms.
    #if defined(_UNICODE)
        #define PRIcT    PRIcW
        #define PRIsT    PRIsW
    #else
        #define PRIcT    PRIcA
        #define PRIsT    PRIsA
    #endif
#endif

////
// SCN
////

#define SCNcA    PRIcA
#define SCNsA    PRIsA
#define SCNcW    PRIcW
#define SCNsW    PRIsW
#define SCNcT    PRIcT
#define SCNsT    PRIsT

#if defined __cplusplus
}
#endif

#endif    // #ifndef __PRICHAR_H_INCLUDED

3.3 auto_tmain.h

Full code:

auto_tmain.h

/*
auto_tmain.h : make various compilers compatible with _tmain.
Author: zyl910
Blog: http://www.cnblogs.com/zyl910
URL: http://www.cnblogs.com/zyl910/archive/2013/01/17/tcharall.html
Version: V1.00
Updated: 2013-01-17

Update
~~~~~~

[2013-01-17] V1.00
* V1.0 released.

Manual
~~~~~~

Makes _tmain usable automatically.
Just add one line to the main source file:
#include "auto_tmain.h"

Compatible with VC, GCC, and BCB.
Based on https://github.com/coderforlife/mingw-unicode-main/blob/master/mingw-unicode.c
*/

#ifndef __AUTO_TMAIN_H_INCLUDED
#define __AUTO_TMAIN_H_INCLUDED

#if defined(__GNUC__) && defined(_UNICODE)

#ifndef __MSVCRT__
#error Unicode main function requires linking to MSVCRT
#endif

/* The two system header names were lost in extraction; <wchar.h> and
 * <stdlib.h> are assumed here, matching the referenced mingw-unicode.c. */
#include <wchar.h>
#include <stdlib.h>
#include "tchar.h"

#undef _tmain
#ifdef _UNICODE
#define _tmain wmain
#else
#define _tmain main
#endif

extern int _CRT_glob;
extern
#ifdef __cplusplus
"C"
#endif
void __wgetmainargs(int*,wchar_t***,wchar_t***,int,int*);

#ifdef MAIN_USE_ENVP
int wmain(int argc, wchar_t *argv[], wchar_t *envp[]);
#else
int wmain(int argc, wchar_t *argv[]);
#endif

int main(void)
{
    wchar_t **enpv, **argv;
    int argc=0, si = 0;
    __wgetmainargs(&argc, &argv, &enpv, _CRT_glob, &si); // this also creates the global variable __wargv
#ifdef MAIN_USE_ENVP
    return wmain(argc, argv, enpv);
#else
    return wmain(argc, argv);
#endif    // #ifdef MAIN_USE_ENVP
}

#endif    // #if defined(__GNUC__) && defined(_UNICODE)

#endif    // #ifndef __AUTO_TMAIN_H_INCLUDED

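IV. Testing with UTF-8 encoding

4.1 Notes

For portability, I recommend saving source files in UTF-8.

Linux and other Unix-like platforms now default to UTF-8, and so do compilers such as gcc. They accept both UTF-8 without a BOM (byte order mark) and UTF-8 with a BOM.

VC++ 2003 and later accept source files saved as UTF-8 with a BOM. They do not handle UTF-8 without a BOM: such files are misread as the system default encoding (GBK on a Simplified Chinese system, for example).

To keep source files compatible with as many compilers as possible, I suggest the following:

1. For source files (c, cpp), use UTF-8 with a BOM, so that VC++, gcc, and other compilers all compile them correctly. If you are sure that every string literal in the program stays within ASCII, UTF-8 without a BOM is also worth trying.

2. For header files (h, hpp), use UTF-8 without a BOM. Header files are pasted into the source during preprocessing, and a stray BOM may make compilation fail.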
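Whether a file starts with a BOM is easy to check, since the UTF-8 BOM is the fixed byte sequence EF BB BF. The helper name `has_utf8_bom` below is hypothetical and the sketch only inspects a buffer that the caller has already read.

```c
#include <stddef.h>

/* Returns 1 if the buffer starts with the UTF-8 byte order mark
 * (EF BB BF), 0 otherwise. */
static int has_utf8_bom(const unsigned char *data, size_t len)
{
    return len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
}
```

A small build script could apply such a check to the first three bytes of each header to enforce rule 2 above.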

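In VC++, to change the encoding of a source file, choose "File" -> "Advanced Save Options", pick the encoding in the "Encoding" drop-down list, and click "OK".

4.2 Test code

File list: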

auto_tchar.h

auto_tmain.h

makefile

prichar.h

Release

tcharall.c

tcharall_2003.sln

tcharall_2003.vcproj

tcharall_2005.sln

tcharall_2005.vcproj

tcharall_2008.sln

tcharall_2008.vcproj

tcharall_2010.sln

tcharall_2010.vcxproj

tcharall_2010.vcxproj.filters

tcharall_2010.vcxproj.user

tcharall_2012.sln

tcharall_2012.vcxproj

tcharall_2012.vcxproj.filters

tcharall.c is saved as UTF-8 with a BOM, and the three header files are saved as UTF-8 without a BOM.

tcharall.c:

/*
tcharall.c : test tchar with various compilers (UTF-8 encoding).
Author: zyl910
Blog: http://www.cnblogs.com/zyl910
URL: http://www.cnblogs.com/zyl910/archive/2013/01/17/tcharall.html
Version: V1.00
Updated: 2013-01-17

Update
~~~~~~

[2013-01-17] V1.00
* V1.0 released.

[2012-11-08] V0.01
* Initial version.
*/

/* The three system header names were lost in extraction; the headers
 * below are assumed from the functions the program uses. */
#include <stdio.h>
#include <locale.h>
#include <wchar.h>

#include "auto_tchar.h"
#include "prichar.h"
#include "auto_tmain.h"

// Compiler name
#define MACTOSTR(x)    #x
#define MACROVALUESTR(x)    MACTOSTR(x)
#if defined(__ICL)    // Intel C++
#  if defined(__VERSION__)
#    define COMPILER_NAME    "Intel C++ " __VERSION__
#  elif defined(__INTEL_COMPILER_BUILD_DATE)
#    define COMPILER_NAME    "Intel C++ (" MACROVALUESTR(__INTEL_COMPILER_BUILD_DATE) ")"
#  else
#    define COMPILER_NAME    "Intel C++"
#  endif    // #  if defined(__VERSION__)
#elif defined(_MSC_VER)    // Microsoft VC++
#  if defined(_MSC_FULL_VER)
#    define COMPILER_NAME    "Microsoft VC++ (" MACROVALUESTR(_MSC_FULL_VER) ")"
#  elif defined(_MSC_VER)
#    define COMPILER_NAME    "Microsoft VC++ (" MACROVALUESTR(_MSC_VER) ")"
#  else
#    define COMPILER_NAME    "Microsoft VC++"
#  endif    // #  if defined(_MSC_FULL_VER)
#elif defined(__GNUC__)    // GCC
#  if defined(__CYGWIN__)
#    define COMPILER_NAME    "GCC(Cygwin) " __VERSION__
#  elif defined(__MINGW32__)
#    define COMPILER_NAME    "GCC(MinGW) " __VERSION__
#  else
#    define COMPILER_NAME    "GCC " __VERSION__
#  endif    // #  if defined(__CYGWIN__)
#elif defined(__TURBOC__)    // Borland C++
#  if defined(__BCPLUSPLUS__)
#    define COMPILER_NAME    "Borland C++ (" MACROVALUESTR(__BCPLUSPLUS__) ")"
#  elif defined(__BORLANDC__)
#    define COMPILER_NAME    "Borland C (" MACROVALUESTR(__BORLANDC__) ")"
#  else
#    define COMPILER_NAME    "Turbo C (" MACROVALUESTR(__TURBOC__) ")"
#  endif    // #  if defined(_MSC_FULL_VER)
#else
#  define COMPILER_NAME    "Unknown Compiler"
#endif    // #if defined(__ICL)    // Intel C++

char* psa = "A汉字ABC_Welcome_歡迎_ようこそ_환영.";    // The second half contains "welcome" in Traditional Chinese, Japanese, and Korean.
wchar_t* psw = L"W汉字ABC_Welcome_歡迎_ようこそ_환영.";
TCHAR* pst = _T("T汉字ABC_Welcome_歡迎_ようこそ_환영.");

int _tmain(int argc, TCHAR* argv[])
{
    // init.
    setlocale(LC_ALL, "");    // Use the default locale of the user's environment.

    // title.
    _tprintf(_T("tcharall v1.00 (%dbit)\n"), (int)(8*sizeof(int*)));
    _tprintf(_T("Compiler: %")_T(PRIsA)_T("\n"), COMPILER_NAME);
    _tprintf(_T("\n"));

    // show
    _tprintf(_T("%")_T(PRIsA)_T("\n"), psa);    // Print the narrow string.
    _tprintf(_T("%")_T(PRIsW)_T("\n"), psw);    // Print the wide string.
    _tprintf(_T("%")_T(PRIsT)_T("\n"), pst);    // Print the TCHAR string.

    return 0;
}

makefile:

# flags
CC = gcc
CFS = -Wall

# args
RELEASE =0
UNICODE =0
BITS =
CFLAGS =

# [args] Build mode. 0 = debug, 1 = release. make RELEASE=1.
ifeq ($(RELEASE),0)
    # debug
    CFS += -g
else
    # release
    CFS += -O3 -DNDEBUG
    #CFS += -O3 -g -DNDEBUG
endif

# [args] UNICODE mode. 0 = ansi, 1 = unicode. make UNICODE=1.
ifeq ($(UNICODE),0)
    # ansi
    CFS +=
else
    # unicode
    CFS += -D_UNICODE -DUNICODE
endif

# [args] Program bitness. 32 = 32-bit program, 64 = 64-bit program, anything else = default. make BITS=32.
ifeq ($(BITS),32)
    CFS += -m32
else
    ifeq ($(BITS),64)
        CFS += -m64
    else
    endif
endif

# [args] Use CFLAGS to add extra options. make CFLAGS="-mavx".
CFS += $(CFLAGS)

.PHONY : all clean

# files
TARGETS = tcharall
OBJS = tcharall.o

all : $(TARGETS)

tcharall : $(OBJS)
	$(CC) -o $@ $^ $(CFS)

tcharall.o : tcharall.c
	$(CC) -c $< $(CFS)

clean :
	rm -f $(OBJS) $(TARGETS) $(addsuffix .exe,$(TARGETS))

4.3 Test results

The code compiled successfully with the following compilers:

VC2003: x86 build. Unicode=0.

VC2005: x86 and x64 builds. Unicode=1.

VC2008: x86 build. Unicode=1.

VC2010: x86 and x64 builds. Unicode=1.

VC2012: x86 and x64 builds. Unicode=1.

GCC 4.6.2 (MinGW (20120426)): x86 build. Unicode=0 and Unicode=1.

GCC 4.7.1 (TDM-GCC (MinGW-w64)): x64 build. Unicode=0 and Unicode=1.

GCC 4.7.0 (Fedora 17 x64): x86 and x64 builds. Unicode=0.

llvm-gcc-4.2 (Mac OS X Lion 10.7.4, Xcode 4.4.1): x86 and x64 builds. Unicode=0.

Test output:

[VC2003, Unicode=0]
tcharall v1.00 (32bit)
Compiler: Microsoft VC++ (13106030)

A姹夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.
W汉字ABC_Welcome_歡迎_ようこそ_T姹夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.

[VC2005, Unicode=1]
tcharall v1.00 (32bit)
Compiler: Microsoft VC++ (140050727)

A汉字ABC_Welcome_歡迎_ようこそ_??.
W汉字ABC_Welcome_歡迎_ようこそ_??.
T汉字ABC_Welcome_歡迎_ようこそ_??.

[VC2008, Unicode=1]
tcharall v1.00 (64bit)
Compiler: Microsoft VC++ (160040219)

A汉字ABC_Welcome_歡迎_ようこそ_??.
W汉字ABC_Welcome_歡迎_ようこそ_??.
T汉字ABC_Welcome_歡迎_ようこそ_??.

[VC2010, Unicode=1]
tcharall v1.00 (64bit)
Compiler: Microsoft VC++ (160040219)

A汉字ABC_Welcome_歡迎_ようこそ_??.
W汉字ABC_Welcome_歡迎_ようこそ_??.
T汉字ABC_Welcome_歡迎_ようこそ_??.

[VC2012, Unicode=1]
tcharall v1.00 (64bit)
Compiler: Microsoft VC++ (170051106)

A汉字ABC_Welcome_歡迎_ようこそ_??.
W汉字ABC_Welcome_歡迎_ようこそ_??.
T汉字ABC_Welcome_歡迎_ようこそ_??.

[GCC 4.6.2 (MinGW (20120426)), Unicode=0]
tcharall v1.00 (32bit)
Compiler: GCC(MinGW) 4.6.2

A姹夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.
W汉字ABC_Welcome_歡迎_ようこそ_
T姹夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.

[GCC 4.6.2 (MinGW (20120426)), Unicode=1]
tcharall v1.00 (32bit)
Compiler: GCC(MinGW) 4.6.2

A姹夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.
W汉字ABC_Welcome_歡迎_ようこそ_    T汉字ABC_Welcome_歡迎_ようこそ_

[GCC 4.7.1 (TDM-GCC (MinGW-w64)), Unicode=0]
tcharall v1.00 (64bit)
Compiler: GCC(MinGW) 4.7.1

A姹夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.
W汉字ABC_Welcome_歡迎_ようこそ_T姹夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.

[GCC 4.7.1 (TDM-GCC (MinGW-w64)), Unicode=1]
tcharall v1.00 (64bit)
Compiler: GCC(MinGW) 4.7.1

A姹夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.
W汉字ABC_Welcome_歡迎_ようこそ_.
T汉字ABC_Welcome_歡迎_ようこそ_.

[GCC 4.7.0 (Fedora 17 x64), Unicode=0]
tcharall v1.00 (64bit)
Compiler: GCC 4.7.0 20120507 (Red Hat 4.7.0-5)

A汉字ABC_Welcome_歡迎_ようこそ_환영.
W汉字ABC_Welcome_歡迎_ようこそ_환영.
T汉字ABC_Welcome_歡迎_ようこそ_환영.

[llvm-gcc-4.2 (Mac OS X Lion 10.7.4, Xcode 4.4.1), Unicode=0]
tcharall v1.00 (64bit)
Compiler: GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)

A汉字ABC_Welcome_歡迎_ようこそ_환영.
W汉字ABC_Welcome_歡迎_ようこそ_환영.
T汉字ABC_Welcome_歡迎_ようこそ_환영.

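4.4 Analysis of the test results

VC2003 does not convert to the execution character set. For narrow string literals it uses the UTF-8 bytes from the source file directly, while the system default character set is GBK (Simplified Chinese system), which produces mojibake such as "A姹夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.".

Starting with VC2005, conversion to the execution character set is supported. For narrow string literals, the compiler converts the UTF-8 strings in the source file into the execution character set (GBK on a Simplified Chinese system), so narrow strings containing Chinese display correctly.

Because Simplified Chinese Windows uses GBK by default, the Korean text "환영" cannot be converted to GBK, so "??" is printed.

MinGW and MinGW-w64 also show mojibake for narrow strings, because their execution character set defaults to UTF-8. The next section discusses this in detail.

Now look at wide string output. With MinGW and MinGW-w64, text that can be converted to a narrow string prints correctly ("W汉字ABC_Welcome_歡迎_ようこそ_" can be converted to GBK). At text that cannot be converted (the Korean "환영" has no GBK encoding), output stops, and here MinGW and MinGW-w64 differ slightly:

a) Compiled as a narrow build (UNICODE macro not defined, using printf and the other narrow functions): MinGW stops output but still prints the newline correctly. MinGW-w64 stops output and also fails to print the newline.

b) Compiled as a wide build (UNICODE macro defined, using wprintf and the other wide functions): MinGW stops output and also fails to print the newline. MinGW-w64 stops output but still prints the newline correctly.

Linux and Mac default to UTF-8, so English, Chinese, Japanese, and Korean all display together without problems. In detail:

a) Narrow string literals. The source file is saved as UTF-8, so the narrow string literals are UTF-8 as well. When the program prints a narrow string, the terminal is also UTF-8, so the encodings match and the output is correct.

b) Wide string literals. The compiler converts the UTF-8 text to UTF-32 to produce the wide string literals. When the program prints a wide string, the terminal is UTF-8, so the C standard library converts the UTF-32 wide string to a UTF-8 narrow string before writing it. The encodings match and the output is correct.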
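The library-side conversion in case b) can be sketched with the standard `wcstombs` function, which converts a wide string to the multibyte encoding of the current locale. `wide_to_locale_bytes` is a hypothetical wrapper name; which characters convert depends on the locale selected with `setlocale`, and the sketch makes no claim about how any particular runtime reacts to a failure while printing.

```c
#include <stdlib.h>
#include <wchar.h>

/* Convert a wide string to the current locale's multibyte encoding.
 * Returns the number of bytes written (excluding the terminator), or
 * (size_t)-1 if some character cannot be represented in that encoding. */
static size_t wide_to_locale_bytes(const wchar_t *ws, char *out, size_t size)
{
    return wcstombs(out, ws, size);
}
```

Under a UTF-8 locale every character of the test string has a multibyte form; under a locale whose encoding lacks a character, the function reports failure instead.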

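Summary:

1. Linux, Mac, and similar Unix-like platforms default to UTF-8 and display multilingual text correctly in the terminal.

2. Console programs on Windows default to the local encoding (GBK on a Simplified Chinese system), so they can display only text within that encoding. For text outside it, VC++'s library functions print "?", while MinGW's library functions stop output.

4.5 Fixing the MinGW narrow string mojibake

As mentioned above, the execution character set of MinGW and MinGW-w64 defaults to UTF-8, while the system default character set on Windows is GBK (Simplified Chinese system), so narrow strings come out as mojibake.

There are two ways to fix this:

1. Change the Command Prompt's encoding to UTF-8.

2. Make MinGW generate GBK-encoded narrow strings.

4.5.1 Changing the Command Prompt's encoding to UTF-8

Open a Command Prompt and run:

chcp 65001

Note: the chcp command changes the Command Prompt's code page. 65001 is the UTF-8 code page.

After setting the encoding, the font must also be set so that the text displays correctly.

Right-click the Command Prompt's title bar and choose "Properties" from the context menu to open the properties dialog.

Switch to the "Font" tab, select the "Lucida Console" font, and click "OK" to save the setting. If another dialog appears, click "OK" again.

The UTF-8 Command Prompt environment is now ready. Running the executable built earlier with MinGW gives this result: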

tcharall v1.00 (32bit)
Compiler: GCC(MinGW) 4.6.2

A汉字ABC_Welcome_歡迎_ようこそ_환영.
WººؖABC_Welcome_gӭ_¤褦¤³¤½_
T汉字ABC_Welcome_歡迎_ようこそ_환영.

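As shown, the narrow strings print every character; the Korean text appears as boxes only because the "Lucida Console" font has no Korean glyphs.

Unexpectedly, however, the wide string turned into mojibake. This is because the C library still converts wide characters to GBK-encoded narrow strings, while the console now expects UTF-8-encoded narrow strings. The next subsection analyzes this in depth.

After testing, run "chcp 936" to switch the Command Prompt's code page back to GBK.

4.5.1.1 A closer look at the mojibake in the UTF-8 Command Prompt

When chcp changes the Command Prompt's code page, it calls the Windows APIs SetConsoleCP and SetConsoleOutputCP to set the code pages for console input and output respectively (65001: UTF-8).

However, the system ANSI code page (ACP) does not change: GetACP still returns the original value (936: Simplified Chinese GBK).

When a wide string is printed, the C library converts it to a narrow string. Because the program called "setlocale(LC_ALL, "")" to use the default locale of the user's environment, the C library calls the Windows API WideCharToMultiByte for the conversion with the code page CP_ACP, that is, the value returned by GetACP (936: Simplified Chinese GBK). The wide string is therefore converted to a GBK-encoded narrow string.

But console input and output now use UTF-8 (GetConsoleCP and GetConsoleOutputCP return 65001). The encodings do not match, which produces mojibake.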
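The mismatch can be detected at run time by comparing the two values. The sketch below uses the Windows APIs `GetACP` and `GetConsoleOutputCP` named in the analysis; the function name `console_cp_differs_from_acp` is hypothetical, and on non-Windows builds the function simply reports that the check does not apply.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Returns 1 if the console output code page differs from the ANSI
 * code page (the situation after "chcp 65001" on a GBK system),
 * 0 if they match, and -1 when not built for Windows. */
static int console_cp_differs_from_acp(void)
{
#ifdef _WIN32
    return GetConsoleOutputCP() != GetACP();
#else
    return -1;
#endif
}
```

A program could print a warning when this returns 1, since wide output converted with CP_ACP will then not match what the console expects.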

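4.5.2 Making MinGW generate GBK-encoded narrow strings

Passing "-fexec-charset=<charset>" to gcc sets the execution character set.

A Simplified Chinese system defaults to GBK, so the option should be "-fexec-charset=GBK".

In practice, however, gcc reports compile errors: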

gcc -c tcharall.c -Wall -g -fexec-charset=GBK

tcharall.c:74:13: error: converting to execution character set: Illegal byte sequence

tcharall.c:76:65: error: converting to execution character set: Illegal byte sequence

make: *** [tcharall.o] Error 1

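This happens because the source contains Korean characters, which lie outside GBK, so gcc cannot convert them. An encoding with a larger repertoire is needed.

A brief overview of the Chinese character encoding standards:

GB2312: the earliest national standard for Chinese characters. It uses a double-byte encoding and contains 6763 Simplified Chinese characters.

GB13000.1: this standard is equivalent to the CJK (Chinese, Japanese, and Korean unified ideographs) subset of the international standard ISO/IEC 10646.1:1993, "Information technology: Universal Multiple-Octet Coded Character Set (UCS), Part 1: Architecture and Basic Multilingual Plane". It focuses on Chinese characters and contains 20902 of them (a unified collection of Simplified, Traditional, Japanese, and Korean common-use characters).

GBK: a concrete encoding of the GB13000.1 repertoire. It is backward compatible with GB2312 and still uses double bytes, but enlarges the code space to hold more than 20,000 characters. Simplified Chinese Windows uses GBK, which is why it can handle Simplified and Traditional characters together.

GB18030: the latest Chinese character encoding standard. It is backward compatible with GBK and GB2312, and in addition to the traditional double-byte encoding it adds a four-byte encoding that provides roughly 1.6 million more code positions. It also includes the CJK Extension A and CJK Extension B characters, for a current total of 70244 Chinese characters. Beyond Chinese characters, it also maps the non-Chinese characters of Unicode, so Korean is supported, for example.

So we can use GB18030 by passing "-fexec-charset=GB18030" to gcc.

In the test result, because a Simplified Chinese system defaults to GBK, the four-byte GB18030 sequences cannot be displayed and turn into "?".

Text usually stays within the GBK range, so this approach works in practice.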

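V. Testing with GBK encoding

5.1 Notes

Some old compilers do not support UTF-8, which leaves only the local default encoding. I use Simplified Chinese Windows, so the default encoding of source files is GBK.

When source files are not UTF-8, the input character set and the execution character set must be configured correctly to avoid mojibake:

Input character set: the encoding the compiler uses to convert the contents of a source file to Unicode. VC (VC2005 or later) identifies the input character set from the BOM; without a BOM it uses the local encoding (936: GBK). gcc defaults to UTF-8 and is configured with "-finput-charset=<charset>".

Execution character set: the encoding the compiler uses to convert Unicode strings to narrow strings. VC defaults to the local encoding (936: GBK); VC2010 (or later) can be configured by writing "#pragma execution_character_set("utf-8")" in the source. gcc defaults to UTF-8 and is configured with "-fexec-charset=<charset>".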

　　对于VC++，只需将代码文件保存为本地默认编码就行了。这正是VC++保存代码文件时的默认行为。若编码不符，可点击菜单“文件”->“高级保存选项”改变编码。

　　对于gcc，因它的输入字符集、执行字符集都是UTF-8，所以都要设置。即给gcc加上“-finput-charset=gbk -fexec-charset=gbk”参数。

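　　Note that the source files and header files must all use the same encoding; otherwise compilation may fail because of the mismatch. For example, gcc reports errors like these: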

tcharall_gbk.c:22:19: error: failure to convert gbk to UTF-8

tcharall_gbk.c:24:24: error: failure to convert gbk to UTF-8

tcharall_gbk.c:62:1: error: unknown type name 'TCHAR'

　　When the "\u" escape is used, it is advisable to pass "-std=c99" to gcc; otherwise the following warning appears:

tcharall_gbk.c:61:16: warning: universal character names are only valid in C++ and C99 [enabled by default]

5.2 Test code

　　File list:

auto_tchar.h

auto_tmain.h

makefile

prichar.h

tcharall_gbk.c

tcharall_gbk.dsp

tcharall_gbk.dsw

tcharall_gbk_2003.sln

tcharall_gbk_2003.vcproj

tcharall_gbk_2005.sln

tcharall_gbk_2005.vcproj

tcharall_gbk_bcb6.bpf

tcharall_gbk_bcb6.bpr

tcharall_gbk_bcb6.res

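　　Of these, tcharall_gbk.c and the three header files are GBK-encoded.

　　tcharall_gbk.c (because GBK does not support Korean characters, the string constants are slightly modified):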

/*
tcharall.c : Test tchar with various compilers (GBK encoding).
Author: zyl910
Blog: http://www.cnblogs.com/zyl910
URL: http://www.cnblogs.com/zyl910/archive/2013/01/17/tcharall.html
Version: V1.00
Updated: 2013-01-17

Update
~~~~~~

[2013-01-17] V1.00
* V1.0 released.

[2012-11-08] V0.01
* Initial version.

*/

#include <stdio.h>
#include <locale.h>
#include <wchar.h>

#include "auto_tchar.h"

#include "prichar.h"
#include "auto_tmain.h"

// Compiler name
#define MACTOSTR(x)    #x
#define MACROVALUESTR(x)    MACTOSTR(x)
#if defined(__ICL)    // Intel C++
#  if defined(__VERSION__)
#    define COMPILER_NAME    "Intel C++ " __VERSION__
#  elif defined(__INTEL_COMPILER_BUILD_DATE)
#    define COMPILER_NAME    "Intel C++ (" MACROVALUESTR(__INTEL_COMPILER_BUILD_DATE) ")"
#  else
#    define COMPILER_NAME    "Intel C++"
#  endif    // #  if defined(__VERSION__)
#elif defined(_MSC_VER)    // Microsoft VC++
#  if defined(_MSC_FULL_VER)
#    define COMPILER_NAME    "Microsoft VC++ (" MACROVALUESTR(_MSC_FULL_VER) ")"
#  elif defined(_MSC_VER)
#    define COMPILER_NAME    "Microsoft VC++ (" MACROVALUESTR(_MSC_VER) ")"
#  else
#    define COMPILER_NAME    "Microsoft VC++"
#  endif    // #  if defined(_MSC_FULL_VER)
#elif defined(__GNUC__)    // GCC
#  if defined(__CYGWIN__)
#    define COMPILER_NAME    "GCC(Cygwin) " __VERSION__
#  elif defined(__MINGW32__)
#    define COMPILER_NAME    "GCC(MinGW) " __VERSION__
#  else
#    define COMPILER_NAME    "GCC " __VERSION__
#  endif    // #  if defined(__CYGWIN__)
#elif defined(__TURBOC__)    // Borland C++
#  if defined(__BCPLUSPLUS__)
#    define COMPILER_NAME    "Borland C++ (" MACROVALUESTR(__BCPLUSPLUS__) ")"
#  elif defined(__BORLANDC__)
#    define COMPILER_NAME    "Borland C (" MACROVALUESTR(__BORLANDC__) ")"
#  else
#    define COMPILER_NAME    "Turbo C (" MACROVALUESTR(__TURBOC__) ")"
#  endif    // #  if defined(__BCPLUSPLUS__)
#else
#  define COMPILER_NAME    "Unknown Compiler"
#endif    // #if defined(__ICL)    // Intel C++

char* psa = "A汉字ABC_Welcome_歡迎_ようこそ.";
wchar_t* psw = L"W汉字ABC_Welcome_歡迎_ようこそ_\uD658\uC601.";    // \uD658\uC601 is "welcome" in Korean.
TCHAR* pst = _T("T汉字ABC_Welcome_歡迎_ようこそ.");

int _tmain(int argc, TCHAR* argv[])
{
    // init.
    setlocale(LC_ALL, "");    // Use the environment's default locale.

    _tprintf(_T("tcharall_gbk v1.00 (%dbit)\n"), (int)(8*sizeof(int*)));
    _tprintf(_T("Compiler: %")_T(PRIsA)_T("\n"), COMPILER_NAME);
    _tprintf(_T("\n"));

    // show
    _tprintf(_T("%")_T(PRIsA)_T("\n"), psa);    // Print the narrow string.
    _tprintf(_T("%")_T(PRIsW)_T("\n"), psw);    // Print the wide string.
    _tprintf(_T("%")_T(PRIsT)_T("\n"), pst);    // Print the TCHAR string.

    return 0;
}

　　makefile:

# flags
CC = gcc
CFS = -Wall -std=c99 -finput-charset=gbk -fexec-charset=gbk

# args
RELEASE =0
UNICODE =0
BITS =
CFLAGS =

# [args] Build mode. 0 = debug, 1 = release. make RELEASE=1.
ifeq ($(RELEASE),0)
    # debug
    CFS += -g
else
    # release
    CFS += -static -O3 -DNDEBUG
    #CFS += -O3 -g -DNDEBUG
endif

# [args] UNICODE mode. 0 = ansi, 1 = unicode. make UNICODE=1.
ifeq ($(UNICODE),0)
    # ansi
    CFS +=
else
    # unicode
    CFS += -D_UNICODE -DUNICODE
endif

# [args] Program bitness. 32 = 32-bit, 64 = 64-bit, anything else = default. make BITS=32.
ifeq ($(BITS),32)
    CFS += -m32
else
    ifeq ($(BITS),64)
        CFS += -m64
    else
    endif
endif

# [args] Use CFLAGS to add extra options. make CFLAGS="-mavx".
CFS += $(CFLAGS)


.PHONY : all clean

# files
TARGETS = tcharall_gbk
OBJS = tcharall_gbk.o

all : $(TARGETS)

tcharall_gbk : $(OBJS)
	$(CC) -o $@ $^ $(CFS)

tcharall_gbk.o : tcharall_gbk.c
	$(CC) -c $< $(CFS)


clean :
	rm -f $(OBJS) $(TARGETS) $(addsuffix .exe,$(TARGETS))

5.3 Test results

　　The code compiled successfully with the following compilers:

VC6: x86. Unicode=0.

VC2003: x86. Unicode=0.

VC2005: x86 and x64. Unicode=1.

BCB6: x86. Unicode=0.

GCC 4.6.2 (MinGW (20120426)): x86. Unicode=0 and Unicode=1.

GCC 4.7.1 (TDM-GCC (MinGW-w64)): x86 and x64. Unicode=0 and Unicode=1.

GCC 4.7.0 (Fedora 17 x64): x64. Unicode=0.

llvm-gcc-4.2 (Mac OS X Lion 10.7.4, Xcode 4.4.1): x64. Unicode=0.

　　Program output:

[VC6, Unicode=0]
tcharall v1.00 (32bit)
Compiler: Microsoft VC++ (12008804)

A汉字ABC_Welcome_歡迎_ようこそ.
W汉字ABC_Welcome_歡迎_ようこそ_uD658uC601.
T汉字ABC_Welcome_歡迎_ようこそ.

[VC2003, Unicode=0]
tcharall v1.00 (32bit)
Compiler: Microsoft VC++ (13106030)

A汉字ABC_Welcome_歡迎_ようこそ.
W汉字ABC_Welcome_歡迎_ようこそ_T汉字ABC_Welcome_歡迎_ようこそ.

[VC2005, Unicode=1]
tcharall_gbk v1.00 (32bit)
Compiler: Microsoft VC++ (140050727)

A汉字ABC_Welcome_歡迎_ようこそ.
W汉字ABC_Welcome_歡迎_ようこそ_??.
T汉字ABC_Welcome_歡迎_ようこそ.

[BCB6, Unicode=0]
tcharall_gbk v1.00 (32bit)
Compiler: Borland C (0x0564)

A汉字ABC_Welcome_歡迎_ようこそ.
W汉字ABC_Welcome_歡迎_ようこそ_T汉字ABC_Welcome_歡迎_ようこそ.

[GCC 4.6.2 (MinGW (20120426)), Unicode=0]
tcharall_gbk v1.00 (32bit)
Compiler: GCC(MinGW) 4.6.2

A汉字ABC_Welcome_歡迎_ようこそ.
W汉字ABC_Welcome_歡迎_ようこそ_T汉字ABC_Welcome_歡迎_ようこそ.

[GCC 4.6.2 (MinGW (20120426)), Unicode=1]
tcharall_gbk v1.00 (32bit)
Compiler: GCC(MinGW) 4.6.2

A汉字ABC_Welcome_歡迎_ようこそ.
W汉字ABC_Welcome_歡迎_ようこそ_T汉字ABC_Welcome_歡迎_ようこそ.

[GCC 4.7.1 (TDM-GCC (MinGW-w64)), Unicode=0]
tcharall_gbk v1.00 (64bit)
Compiler: GCC(MinGW) 4.7.1

A汉字ABC_Welcome_歡迎_ようこそ.
W汉字ABC_Welcome_歡迎_ようこそ_T汉字ABC_Welcome_歡迎_ようこそ.

[GCC 4.7.1 (TDM-GCC (MinGW-w64)), Unicode=1]
tcharall_gbk v1.00 (64bit)
Compiler: GCC(MinGW) 4.7.1

A汉字ABC_Welcome_歡迎_ようこそ.
W汉字ABC_Welcome_歡迎_ようこそ_.
T汉字ABC_Welcome_歡迎_ようこそ.

[GCC 4.7.0 (Fedora 17 x64), Unicode=0]
tcharall_gbk v1.00 (64bit)
Compiler: GCC 4.7.0 20120507 (Red Hat 4.7.0-5)

A����ABC_Welcome_�gӭ_�褦����.
W汉字ABC_Welcome_歡迎_ようこそ_환영.
T����ABC_Welcome_�gӭ_�褦����.

[llvm-gcc-4.2 (Mac OS X Lion 10.7.4, Xcode 4.4.1), Unicode=0]
tcharall_gbk v1.00 (64bit)
Compiler: GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)

A????ABC_Welcome_?gӭ_?褦????.
W汉字ABC_Welcome_歡迎_ようこそ_환영.
T????ABC_Welcome_?gӭ_?褦????.