mirror of https://github.com/python/cpython.git, synced 2025-11-04 07:31:38 +00:00
							
To generate or modify mapping headers
-------------------------------------
Mapping headers are imported from CJKCodecs in pre-generated form.
If you need to tweak them or add something to them, please look at the
tools/ subdirectory of the CJKCodecs distribution.
							
								
Notes on implementation characteristics of each codec
-----------------------------------------------------
								
1) Big5 codec

  The big5 codec maps the following characters the way cp950 does,
  rather than following Unicode.org's mapping, which maps them to
  0xFFFD.

    BIG5        Unicode     Description

    0xA15A      0x2574      SPACING UNDERSCORE
    0xA1C3      0xFFE3      SPACING HEAVY OVERSCORE
    0xA1C5      0x02CD      SPACING HEAVY UNDERSCORE
    0xA1FE      0xFF0F      LT DIAG UP RIGHT TO LOW LEFT
    0xA240      0xFF3C      LT DIAG UP LEFT TO LOW RIGHT
    0xA2CC      0x5341      HANGZHOU NUMERAL TEN
    0xA2CE      0x5345      HANGZHOU NUMERAL THIRTY

  Because Unicode 0x5341, 0x5345, 0xFF0F and 0xFF3C are already mapped
  to other big5 codes, roundtrip compatibility is not guaranteed for
  them.
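
  The roundtrip caveat can be checked from Python. The following is a
  minimal sketch, assuming an interpreter that ships the standard "big5"
  codec; the helper names reencode() and roundtrips() are made up for
  this illustration and are not part of any API.

```python
def reencode(raw, codec="big5"):
    """Decode raw and encode it again; return None if either step fails."""
    try:
        return raw.decode(codec).encode(codec)
    except UnicodeError:
        return None


def roundtrips(raw, codec="big5"):
    """Return True if raw survives a decode/encode cycle unchanged."""
    return reencode(raw, codec) == raw


# Codes from the table whose Unicode targets are, per the note above,
# already mapped to other big5 codes.
for raw in (b"\xa1\xfe", b"\xa2\x40", b"\xa2\xcc", b"\xa2\xce"):
    again = reencode(raw)
    print(raw.hex(), "->", again.hex() if again else None, roundtrips(raw))
```

  A code that is the usual big5 position of its character, such as A451
  for U+5341, is expected to survive the same cycle.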
							
								
2) cp932 codec

  To conform to Windows's actual mapping, the cp932 codec maps the
  following codepoints in addition to the official cp932 mapping.

    CP932     Unicode     Description

    0x80      0x80        UNDEFINED
    0xA0      0xF8F0      UNDEFINED
    0xFD      0xF8F1      UNDEFINED
    0xFE      0xF8F2      UNDEFINED
    0xFF      0xF8F3      UNDEFINED
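
  As a quick check of the table above, the sketch below decodes and
  re-encodes each listed byte with Python's "cp932" codec. The names
  EXTRA and check_cp932_extras() are hypothetical and exist only for
  this example.

```python
# Single bytes that the table lists as mapped in addition to the
# official cp932 mapping, with their Unicode targets.
EXTRA = {
    b"\x80": "\x80",
    b"\xa0": "\uf8f0",
    b"\xfd": "\uf8f1",
    b"\xfe": "\uf8f2",
    b"\xff": "\uf8f3",
}


def check_cp932_extras():
    """Return the bytes whose decode or encode result differs from EXTRA."""
    mismatches = []
    for raw, expected in EXTRA.items():
        if raw.decode("cp932") != expected or expected.encode("cp932") != raw:
            mismatches.append(raw)
    return mismatches


# An empty list means the codec agrees with the table in both directions.
print(check_cp932_extras())
```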
							
								
3) euc-jisx0213 codec

  The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 to
  Unicode U+FF3C instead of U+005C as in unicode.org's mapping.
  Because euc-jisx0213 already has REVERSE SOLIDUS at 0x5c and A1C0
  (the EUC form of 0x2140) is shown as a full-width character,
  mapping it to U+FF3C makes more sense.

  The euc-jisx0213 codec can also decode JIS X 0212 codes on
  codeset 2. Because JIS X 0212 and JIS X 0213 Plane 2 do not
  overlap each other, this does not break standard conformance
  (and JIS X 0213 Plane 2 is intended to be used this way.) When
  encoding, the codec tries to encode kanji characters in this
  order:

    JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212
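
  The points above can be observed from Python. This is only a sketch,
  assuming the standard "euc_jisx0213" codec; encoded_form() is a
  hypothetical helper that guesses the character set from the byte
  layout alone, so it cannot tell JIS X 0213 Plane 2 apart from
  JIS X 0212, which share the same prefix byte.

```python
CODEC = "euc_jisx0213"

# JIS X 0213 Plane 1 code 0x2140 is the byte pair A1 C0 in EUC form
# (each byte has its high bit set).
fullwidth = b"\xa1\xc0".decode(CODEC)
ascii_backslash = b"\x5c".decode(CODEC)
print("U+%04X" % ord(fullwidth), "U+%04X" % ord(ascii_backslash))


def encoded_form(ch):
    """Describe, from the byte layout alone, how ch was encoded."""
    data = ch.encode(CODEC)
    if len(data) == 1:
        return "single byte"
    if data[0] == 0x8E:
        return "0x8E-prefixed (half-width katakana)"
    if data[0] == 0x8F:
        # Plane 2 and JIS X 0212 codes both use this prefix.
        return "0x8F-prefixed (JIS X 0213 Plane 2 or JIS X 0212)"
    return "two bytes (JIS X 0213 Plane 1)"


print(encoded_form("\u4e9c"))  # a kanji that JIS X 0208 already contains
```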
							
								
4) euc-jp codec

  The euc-jp codec favors compatibility over the strict mapping on
  these points:
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP A1C0 (and vice versa)
   - U+00A5 YEN SIGN is mapped to EUC-JP 0x5c. (one way)
   - U+203E OVERLINE is mapped to EUC-JP 0x7e. (one way)
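
  The difference between the two-way and one-way mappings shows up when
  a character is encoded and then decoded again. A minimal sketch,
  assuming the standard "euc_jp" codec; trip() is a hypothetical helper
  used only here.

```python
CODEC = "euc_jp"


def trip(ch):
    """Encode ch, decode the result, and return both steps."""
    raw = ch.encode(CODEC)
    return raw, raw.decode(CODEC)


# U+FF3C is expected to come back unchanged. U+00A5 and U+203E are
# expected to come back as the ASCII characters that own the bytes
# they were folded onto.
for ch in ("\uff3c", "\u00a5", "\u203e"):
    raw, back = trip(ch)
    print("U+%04X -> %s -> U+%04X" % (ord(ch), raw.hex(), ord(back)))
```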
							
								
5) shift-jis codec

  For compatibility, the shift-jis codec maps the 0x20-0x7e area
  directly to U+20-U+7E instead of using JIS X 0201. The differences are:
   - U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5c.
   - U+007E TILDE is mapped to SHIFT-JIS 0x7e.
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 815f.
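
  The direct mapping of the 0x20-0x7e area can be confirmed with a few
  lines of Python. This sketch assumes the standard "shift_jis" codec;
  decode_byte() is a hypothetical helper for the example.

```python
CODEC = "shift_jis"


def decode_byte(value):
    """Decode a single byte value with the shift_jis codec."""
    return bytes([value]).decode(CODEC)


# Under JIS X 0201 these two bytes would be YEN SIGN and OVERLINE.
print("U+%04X" % ord(decode_byte(0x5C)))
print("U+%04X" % ord(decode_byte(0x7E)))

# Every byte in the 0x20-0x7e area should decode to the code point
# with the same numeric value.
identity = all(ord(decode_byte(b)) == b for b in range(0x20, 0x7F))
print(identity)

# The full-width backslash has its own two-byte code.
print("\uff3c".encode(CODEC).hex())
```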