To generate or modify mapping headers
-------------------------------------
The mapping headers are imported from CJKCodecs in pre-generated form.
If you need to tweak them or add something to them, please look at the
tools/ subdirectory of the CJKCodecs distribution.



Notes on implementation characteristics of each codec
-----------------------------------------------------

1) Big5 codec

  The big5 codec maps the following characters as cp950 does, rather
  than following Unicode.org's mapping, which maps them to 0xFFFD.

    BIG5        Unicode     Description

    0xA15A      0x2574      SPACING UNDERSCORE
    0xA1C3      0xFFE3      SPACING HEAVY OVERSCORE
    0xA1C5      0x02CD      SPACING HEAVY UNDERSCORE
    0xA1FE      0xFF0F      LT DIAG UP RIGHT TO LOW LEFT
    0xA240      0xFF3C      LT DIAG UP LEFT TO LOW RIGHT
    0xA2CC      0x5341      HANGZHOU NUMERAL TEN
    0xA2CE      0x5345      HANGZHOU NUMERAL THIRTY

  Because Unicode 0x5341, 0x5345, 0xFF0F and 0xFF3C are already mapped
  to other big5 codes, round-trip compatibility is not guaranteed for
  them.
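
  The round-trip caveat can be observed from Python. The following is
  a minimal sketch, assuming the interpreter's built-in "big5" codec
  behaves as described in this section; the helper name big5_roundtrip
  and the list SHARED_CODES are illustrative only and are not part of
  CJKCodecs.

```python
# Sketch only: decode the four Big5 codes whose Unicode targets are, per
# the note above, already mapped to other Big5 codes, then encode the
# result again and compare it with the original bytes.
SHARED_CODES = [0xA1FE, 0xA240, 0xA2CC, 0xA2CE]


def big5_roundtrip(code):
    """Decode a two-byte Big5 code and encode the resulting character."""
    raw = code.to_bytes(2, "big")
    char = raw.decode("big5")
    return char, char.encode("big5")


for code in SHARED_CODES:
    char, encoded = big5_roundtrip(code)
    same = encoded == code.to_bytes(2, "big")
    print(f"0x{code:04X} -> U+{ord(char):04X} -> 0x{encoded.hex().upper()}"
          f" (round trip: {same})")
```

  Data that must survive a decode/encode cycle byte for byte should
  therefore avoid these codes or be compared after normalization.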


2) cp932 codec

  To conform to the mapping that Windows actually uses, the cp932
  codec maps the following code points in addition to the official
  cp932 mapping.

    CP932     Unicode     Description

    0x80      0x80        UNDEFINED
    0xA0      0xF8F0      UNDEFINED
    0xFD      0xF8F1      UNDEFINED
    0xFE      0xF8F2      UNDEFINED
    0xFF      0xF8F3      UNDEFINED
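
  The table above can be checked against a running interpreter. This
  is a minimal sketch, assuming Python's built-in "cp932" codec; the
  names CP932_EXTRAS and decode_cp932_extras are hypothetical helpers
  introduced for illustration.

```python
# Sketch only: the Windows-compatible single-byte extras from the table
# above, keyed by cp932 byte value with the expected code point as value.
CP932_EXTRAS = {
    0x80: 0x0080,
    0xA0: 0xF8F0,
    0xFD: 0xF8F1,
    0xFE: 0xF8F2,
    0xFF: 0xF8F3,
}


def decode_cp932_extras():
    """Decode each extra byte and return {byte: decoded code point}."""
    return {
        byte: ord(bytes([byte]).decode("cp932"))
        for byte in CP932_EXTRAS
    }


for byte, code_point in decode_cp932_extras().items():
    print(f"0x{byte:02X} -> U+{code_point:04X}")
```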


3) euc-jisx0213 codec

  The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 to
  Unicode U+FF3C instead of U+005C as in unicode.org's mapping.
  Because euc-jisx0213 already has REVERSE SOLIDUS at 0x5c and the
  code (0xA1C0 in EUC form) is displayed as a full-width character,
  mapping it to U+FF3C makes more sense.

  The euc-jisx0213 codec is also able to decode JIS X 0212 codes on
  codeset 2. Because JIS X 0212 and JIS X 0213 Plane 2 do not overlap
  each other, this does not break standard conformance (and JIS X 0213
  Plane 2 is intended to be used this way). When encoding, the codec
  tries to encode kanji characters in this order:

    JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212
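
  The 0x2140 mapping described at the start of this section can be
  sketched as follows, assuming Python's built-in "euc_jisx0213"
  codec. JIS code 0x2140 is the byte pair A1 C0 in EUC form; the
  helper name decode_euc_jisx0213 is illustrative only.

```python
# Sketch only: compare how the full-width and the ASCII backslash
# positions decode under the euc_jisx0213 codec.
def decode_euc_jisx0213(raw):
    """Decode bytes with the euc_jisx0213 codec."""
    return raw.decode("euc_jisx0213")


wide = decode_euc_jisx0213(b"\xa1\xc0")   # JIS X 0213 Plane 1 code 0x2140
narrow = decode_euc_jisx0213(b"\x5c")     # single-byte REVERSE SOLIDUS

print(f"A1C0 -> U+{ord(wide):04X}")
print(f"5C   -> U+{ord(narrow):04X}")
```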


4) euc-jp codec

  For compatibility, the euc-jp codec deviates from the standard
  mapping on these points:
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP 0xA1C0.
     (both ways)
   - U+00A5 YEN SIGN is mapped to EUC-JP 0x5c. (one way)
   - U+203E OVERLINE is mapped to EUC-JP 0x7e. (one way)
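
  The difference between the two-way and one-way mappings above can be
  seen by encoding and then decoding each character. This is a minimal
  sketch, assuming Python's built-in "euc_jp" codec; euc_jp_roundtrip
  is a hypothetical helper name.

```python
# Sketch only: encode a character with euc_jp and decode the bytes again.
# A one-way mapping comes back as a different character.
def euc_jp_roundtrip(char):
    """Return (encoded bytes, character obtained by decoding them)."""
    encoded = char.encode("euc_jp")
    return encoded, encoded.decode("euc_jp")


for char in ("\uff3c", "\u00a5", "\u203e"):
    encoded, decoded = euc_jp_roundtrip(char)
    print(f"U+{ord(char):04X} -> 0x{encoded.hex().upper()}"
          f" -> U+{ord(decoded):04X}")
```

  With the one-way mappings, YEN SIGN and OVERLINE are expected to
  come back as the ASCII characters at 0x5c and 0x7e.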


5) shift-jis codec

  For compatibility, the shift-jis codec maps the 0x20-0x7e area
  directly to U+20-U+7E instead of using JIS X 0201. The differences
  are:
   - U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5c.
   - U+007E TILDE is mapped to SHIFT-JIS 0x7e.
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 0x815f.
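
  The three mappings above can be checked in both directions. This is
  a minimal sketch, assuming Python's built-in "shift_jis" codec; the
  names SHIFT_JIS_CASES and check_shift_jis are illustrative only.

```python
# Sketch only: each character listed above, with the Shift-JIS bytes this
# section says it maps to.
SHIFT_JIS_CASES = {
    "\u005c": b"\x5c",      # REVERSE SOLIDUS
    "\u007e": b"\x7e",      # TILDE
    "\uff3c": b"\x81\x5f",  # FULLWIDTH REVERSE SOLIDUS
}


def check_shift_jis():
    """Return True if every case encodes and decodes as listed."""
    return all(
        char.encode("shift_jis") == raw and raw.decode("shift_jis") == char
        for char, raw in SHIFT_JIS_CASES.items()
    )


print("shift_jis matches the listed mappings:", check_shift_jis())
```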