Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								6099a03202 
								
							 
						 
						
							
							
								
								Issue #13624: Write a specialized UTF-8 encoder to allow more optimization
							
							The main bottleneck was the PyUnicode_READ() macro.
							
						 
						
							2011-12-18 14:22:26 +01:00 
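
The idea of a specialized encoder can be sketched in pure Python. This is only an illustration of dispatching on the widest code point so that the narrow loop carries fewer branches; it is not the C code from the commit, and the helper names (`_encode_ucs1`, `_encode_generic`, `encode_utf8`) are hypothetical.

```python
def _encode_ucs1(text):
    # Hypothetical narrow path: code points <= 0xFF need at most two bytes.
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        else:
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)


def _encode_generic(text):
    # Hypothetical wide path: handles every code point up to U+10FFFF.
    out = bytearray()
    for index, ch in enumerate(text):
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:
            if 0xD800 <= cp <= 0xDFFF:
                # Lone surrogates are not valid in strict UTF-8.
                raise UnicodeEncodeError(
                    "utf-8", text, index, index + 1, "surrogates not allowed")
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)


def encode_utf8(text):
    # Pick the loop once per string instead of re-checking the width per character.
    if not text or max(text) <= "\xff":
        return _encode_ucs1(text)
    return _encode_generic(text)
```

In C the gain would come from reading a typed buffer directly rather than going through a per-character width switch; the Python version only mirrors the control flow.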
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								73f53b57d1 
								
							 
						 
						
							
							
								
								Optimize str * n for len(str)==1 and UCS-2 or UCS-4  
							
							
							
						 
						
							2011-12-18 03:26:31 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								f644110816 
								
							 
						 
						
							
							
								
								Issue #13621: Optimize str.replace(char1, char2)
							
							Use findchar(), which is better optimized than a naive loop over
PyUnicode_READ().  PyUnicode_READ() is a complex and slow macro.
							
						 
						
							2011-12-18 02:43:08 +01:00 
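
A rough Python analogue of the approach, assuming both arguments are single characters: locate matches with an optimized search primitive (`str.find` stands in for findchar() here) instead of inspecting every character in a hand-written loop. The function name `replace_char` is hypothetical.

```python
def replace_char(text, old, new):
    # Fast exit: if the character never occurs, return the input unchanged.
    first = text.find(old)
    if first < 0:
        return text
    chars = list(text)
    pos = first
    while pos >= 0:
        chars[pos] = new
        # Jump straight to the next match rather than scanning char by char.
        pos = text.find(old, pos + 1)
    return "".join(chars)
```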
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								ab870218e3 
								
							 
						 
						
							
							
								
								Issue #10951: Fix compiler warnings in timemodule.c and unicodeobject.c
							
							Thanks Jérémy Anger for the fix.
							
						 
						
							2011-12-17 22:39:43 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								2f197078fb 
								
							 
						 
						
							
							
								
								The locale decoder raises a UnicodeDecodeError instead of an OSError
							
							Search for the invalid character using mbrtowc().
							
						 
						
							2011-12-17 07:08:30 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								1b57967b96 
								
							 
						 
						
							
							
								
								Issue #13560: Locale codec functions use the classic "errors" parameter instead of surrogateescape
							
							This makes it possible to support more error handlers later.
							
						 
						
							2011-12-17 05:47:23 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								ab59594326 
								
							 
						 
						
							
							
								
								What's New in Python 3.3: complete the deprecation list
							
							Also add FIXMEs in unicodeobject.c
							
						 
						
							2011-12-17 04:59:06 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								1f33f2b0c3 
								
							 
						 
						
							
							
								
								Issue #13560: os.strerror() now uses the current locale encoding instead of UTF-8
							
							
							
						 
						
							2011-12-17 04:45:09 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								f2ea71fcc8 
								
							 
						 
						
							
							
								
								Issue #13560: Add PyUnicode_EncodeLocale()
							
							* Use PyUnicode_EncodeLocale() in time.strftime() if wcsftime() is not
   available
 * Document my last changes in Misc/NEWS
							
						 
						
							2011-12-17 04:13:41 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								af02e1c85a 
								
							 
						 
						
							
							
								
								Add PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale()  
							
							
							
							
							* PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() decode a string
   from the current locale encoding
 * _Py_char2wchar() writes an "error code" in the size argument to indicate
   if the function failed because of memory allocation failure or because of a
   decoding error. The function doesn't write the error message directly to
   stderr.
 * Fix time.strftime() (if wcsftime() is missing): decode strftime() result
   from the current locale encoding, not from the filesystem encoding. 
							
						 
						
							2011-12-16 23:56:01 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								16e6a80923 
								
							 
						 
						
							
							
								
								PyUnicode_Resize(): warn about canonical representation
							
							Also call unicode_resize() directly in unicodeobject.c
							
						 
						
							2011-12-12 13:24:15 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								b0a82a6a7f 
								
							 
						 
						
							
							
								
								Fix PyUnicode_Resize() for compact strings: leave the string unchanged on error
							
							Also fix the PyUnicode_Resize() doc
							
						 
						
							2011-12-12 13:08:33 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								bf6e560d0c 
								
							 
						 
						
							
							
								
								Make PyUnicode_Copy() private => _PyUnicode_Copy()
							
							Undocument the function.
Also make decode_utf8_errors() private (static).
							
						 
						
							2011-12-12 01:53:47 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								7a9105a380 
								
							 
						 
						
							
							
								
								resize_copy() now supports legacy ready strings  
							
							
							
						 
						
							2011-12-12 00:13:42 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								488fa49acf 
								
							 
						 
						
							
							
								
								Rewrite PyUnicode_Append(); unicode_modifiable() is more strict
							
							* Rename unicode_resizable() to unicode_modifiable()
 * Rename _PyUnicode_Dirty() to unicode_check_modifiable() to make it clear
   that the function is private
 * Inline PyUnicode_Concat() and unicode_append_inplace() in PyUnicode_Append()
   to simplify the code
 * unicode_modifiable() returns 0 if the hash has been computed or if the string
   is not an exact unicode string
 * Remove _PyUnicode_DIRTY(): no need to reset the hash anymore, because if the
   hash has already been computed, you cannot modify a string in place anymore
 * PyUnicode_Concat() checks for integer overflow
							
						 
						
							2011-12-12 00:01:39 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								c4b495497a 
								
							 
						 
						
							
							
								
								Create unicode_result_unchanged() subfunction  
							
							
							
						 
						
							2011-12-11 22:44:26 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								eaab604829 
								
							 
						 
						
							
							
								
								Fix fixup() for unchanged unicode subtype  
							
							
							
							
							If maxchar_new == 0 and self is a unicode subtype, return u instead of duplicating u. 
							
						 
						
							2011-12-11 22:22:39 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								e6b2d4407a 
								
							 
						 
						
							
							
								
								unicode_fromascii() doesn't check string content twice in debug mode  
							
							
							
							
							_PyUnicode_CheckConsistency() also checks string content. 
							
						 
						
							2011-12-11 21:54:30 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								a1d12bb119 
								
							 
						 
						
							
							
								
								Call PyUnicode_DecodeUTF8Stateful() directly instead of PyUnicode_DecodeUTF8()
							
							* Remove micro-optimization from PyUnicode_FromStringAndSize():
   PyUnicode_DecodeUTF8Stateful() already has these optimizations (for size=0
   and one ASCII char).
 * Rename utf8_max_char_size_and_char_count() to utf8_scanner(), and remove a
   useless variable
							
						 
						
							2011-12-11 21:53:09 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								382955ff4e 
								
							 
						 
						
							
							
								
								Use directly unicode_empty instead of PyUnicode_New(0, 0)  
							
							
							
						 
						
							2011-12-11 21:44:00 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								785938eebd 
								
							 
						 
						
							
							
								
								Move the slowest UTF-8 decoder to its own subfunction  
							
							
							
							
							* Create decode_utf8_errors()
 * Reuse unicode_fromascii()
 * decode_utf8_errors() doesn't refit at the beginning
 * Remove refit_partial_string(), use unicode_adjust_maxchar() instead 
							
						 
						
							2011-12-11 20:09:03 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								84def3774d 
								
							 
						 
						
							
							
								
								Fix error handling in resize_compact()  
							
							
							
						 
						
							2011-12-11 20:04:56 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								8faf8216e4 
								
							 
						 
						
							
							
								
								PyUnicode_FromWideChar() and PyUnicode_FromUnicode() raise a ValueError if a character is not in range [U+0000; U+10ffff].
							
						 
						
							2011-12-08 22:14:11 +01:00 
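
The same range rule is visible from Python, where `chr()` rejects values above U+10FFFF with a ValueError. A minimal sketch of such a check follows; `build_string` and `MAX_UNICODE` are hypothetical names used only for illustration, not part of the C API.

```python
MAX_UNICODE = 0x10FFFF  # highest valid Unicode code point


def build_string(code_points):
    # Validate every value before building the string, as a strict
    # constructor would.
    for cp in code_points:
        if not 0 <= cp <= MAX_UNICODE:
            raise ValueError(
                "character U+%x is not in range [U+0000; U+10ffff]" % cp)
    return "".join(map(chr, code_points))
```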
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								551ac95733 
								
							 
						 
						
							
							
								
								Py_UNICODE_HIGH_SURROGATE() and Py_UNICODE_LOW_SURROGATE() macros
							
							Also use the surrogate macros everywhere in unicodeobject.c
							
						 
						
							2011-11-29 22:58:13 +01:00 
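
The arithmetic behind such macros is the standard UTF-16 surrogate split. The Python functions below are a sketch of that arithmetic, not the macro definitions themselves; `join_surrogates` is added only to show the inverse operation.

```python
def high_surrogate(cp):
    # Top 10 bits of (cp - 0x10000), offset into the D800..DBFF range.
    return 0xD800 - (0x10000 >> 10) + (cp >> 10)


def low_surrogate(cp):
    # Bottom 10 bits, offset into the DC00..DFFF range.
    return 0xDC00 + (cp & 0x3FF)


def join_surrogates(high, low):
    # Inverse operation: rebuild the code point from a surrogate pair.
    return 0x10000 + (((high & 0x3FF) << 10) | (low & 0x3FF))
```

Both split functions assume a code point in the range U+10000 to U+10FFFF; values in the Basic Multilingual Plane have no surrogate pair.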
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								6345be9a14 
								
							 
						 
						
							
							
								
								Close #13093: PyUnicode_EncodeDecimal() doesn't support error handlers other than "strict" anymore.
							
							The caller was unable to compute the size of the output buffer: it depends
on the error handler.
							
						 
						
							2011-11-25 20:09:01 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Benjamin Peterson 
								
							 
						 
						
							
							
							
							
								
							
							
								1518e8713d 
								
							 
						 
						
							
							
								
								and back to the "magic" formula (with a comment) it is  
							
							
							
						 
						
							2011-11-23 10:44:52 -06:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Benjamin Peterson 
								
							 
						 
						
							
							
							
							
								
							
							
								5944c36931 
								
							 
						 
						
							
							
								
								cave to those who like readable code  
							
							
							
						 
						
							2011-11-22 19:05:49 -06:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Benjamin Peterson 
								
							 
						 
						
							
							
							
							
								
							
							
								0268675193 
								
							 
						 
						
							
							
								
								fix compiler warning by implementing this more cleverly  
							
							
							
						 
						
							2011-11-22 15:29:32 -05:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								ca4f20782e 
								
							 
						 
						
							
							
								
								find_maxchar_surrogates() reuses surrogate macros  
							
							
							
						 
						
							2011-11-22 03:38:40 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								0d3721d986 
								
							 
						 
						
							
							
								
								Issue #13441: Temporarily disable the check on the maximum character until the Solaris issue is solved.
							
							But add assertions on the maximum character in various encoders: UTF-7, UTF-8,
wide character (wchar_t*, Py_UNICODE*), unicode-escape, raw-unicode-escape.
Also fix unicode_encode_ucs1() for the backslashreplace error handler: Python is
now always "wide".
							
						 
						
							2011-11-22 03:27:53 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								f8facacf30 
								
							 
						 
						
							
							
								
								Fix compiler warnings  
							
							
							
						 
						
							2011-11-22 02:30:47 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								b84d723509 
								
							 
						 
						
							
							
								
								(Merge 3.2) Issue #13093: Fix error handling in PyUnicode_EncodeDecimal()
							
							
							
						 
						
							2011-11-22 01:50:07 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								cfed46e00a 
								
							 
						 
						
							
							
								
								PyUnicode_FromKindAndData() fails with a ValueError if size < 0  
							
							
							
						 
						
							2011-11-22 01:29:14 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								42885206ec 
								
							 
						 
						
							
							
								
								UTF-8 decoder: set consumed value in the latin1 fast-path  
							
							
							
						 
						
							2011-11-22 01:23:02 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								d3df8ab377 
								
							 
						 
						
							
							
								
								Replace _PyUnicode_READY_REPLACE() and _PyUnicode_ReadyReplace() with unicode_ready()
							
							* unicode_ready() has a simpler API
 * Try to reuse the unicode_empty and latin1_char singletons everywhere
 * Fix a reference leak in _PyUnicode_TranslateCharmap()
 * PyUnicode_InternInPlace() doesn't try to get a singleton anymore, to avoid
   having to handle a failure
							
						 
						
							2011-11-22 01:22:34 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								f01245067a 
								
							 
						 
						
							
							
								
								Rewrite PyUnicode_TransformDecimalToASCII() to use the new Unicode API  
							
							
							
						 
						
							2011-11-21 23:12:56 +01:00 
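
What a decimal-to-ASCII transform does can be approximated in Python with `unicodedata.decimal()`. This is a behavioural sketch under that assumption, not the C implementation; the function name `transform_decimal_to_ascii` is hypothetical.

```python
import unicodedata


def transform_decimal_to_ascii(text):
    out = []
    for ch in text:
        # decimal() returns the digit value, or the default for non-digits.
        value = unicodedata.decimal(ch, None)
        if value is None:
            out.append(ch)  # leave non-digit characters untouched
        else:
            out.append(chr(ord("0") + value))
    return "".join(out)
```

For example, Arabic-Indic digits such as U+0661 map to their ASCII counterparts while surrounding text is preserved.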
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								2d718f39a5 
								
							 
						 
						
							
							
								
								Remove an unused variable from PyUnicode_Copy()  
							
							
							
						 
						
							2011-11-21 23:11:52 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								87af4f2f3a 
								
							 
						 
						
							
							
								
								Simplify PyUnicode_Copy()
							
							Use PyUnicode_Copy() in fixup()
							
						 
						
							2011-11-21 23:03:47 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								5bbe5e7c85 
								
							 
						 
						
							
							
								
								Fix a compiler warning in _PyUnicode_CheckConsistency()  
							
							
							
						 
						
							2011-11-21 22:54:05 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								42bf77537e 
								
							 
						 
						
							
							
								
								Rewrite PyUnicode_EncodeDecimal() to use the new Unicode API  
							
							
							
							
							Add tests for PyUnicode_EncodeDecimal() and
PyUnicode_TransformDecimalToASCII(). 
							
						 
						
							2011-11-21 22:52:58 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Antoine Pitrou 
								
							 
						 
						
							
							
							
							
								
							
							
								0a3229de6b 
								
							 
						 
						
							
							
								
								Issue #13417: speed up utf-8 decoding by around 2x for the non-fully-ASCII case.
							
							This almost catches up with pre-PEP 393 performance, when decoding needed
only one pass.
							
						 
						
							2011-11-21 20:39:13 +01:00 
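
With PEP 393 the decoder has to know the widest character before it can allocate the result, which is why an extra pass over the input can be needed. The sketch below shows one way such a scanning pass could work, assuming valid UTF-8 input; `utf8_scan` is a hypothetical name and the lead-byte thresholds are an illustration, not the logic of the commit.

```python
def utf8_scan(data):
    """Return (bytes per character needed, character count) for valid UTF-8."""
    kind = 1   # 1 byte per character is enough up to U+00FF
    count = 0
    for byte in data:
        if byte & 0xC0 == 0x80:
            continue  # continuation byte: belongs to the previous character
        count += 1
        if byte >= 0xF0:
            kind = 4  # 4-byte sequence: code point above U+FFFF
        elif byte >= 0xC4:
            kind = max(kind, 2)  # lead bytes C4..EF: code point in U+0100..U+FFFF
    return kind, count
```

A second pass would then decode into a buffer of exactly `count` units of `kind` bytes each.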
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								da29cc36aa 
								
							 
						 
						
							
							
								
								Issue #13441: _PyUnicode_CheckConsistency() dumps the string if the maximum character is bigger than U+10FFFF and locale.localeconv() dumps the string before decoding it.
							
							Temporary hack to debug issue #13441.
							
						 
						
							2011-11-21 14:31:41 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								9e30aa52fd 
								
							 
						 
						
							
							
								
								Fix misuse of PyUnicode_GET_SIZE() => PyUnicode_GET_LENGTH()  
							
							
							
							
							And PyUnicode_GetSize() => PyUnicode_GetLength() 
							
						 
						
							2011-11-21 02:49:52 +01:00 
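
The distinction matters because a length in code points and a size in legacy code units can differ. As an assumption for illustration, the Python sketch below models a 16-bit code unit representation with UTF-16; `utf16_code_units` is a hypothetical helper.

```python
def utf16_code_units(text):
    # Each UTF-16 code unit is two bytes; non-BMP characters take two units.
    return len(text.encode("utf-16-le")) // 2


text = "a\U0001F600"
code_points = len(text)           # length in code points
units = utf16_code_units(text)    # size in 16-bit code units
```

Here `code_points` is 2 while `units` is 3, so using one in place of the other would miscount any string containing characters outside the Basic Multilingual Plane.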
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								4ead7c7be8 
								
							 
						 
						
							
							
								
								PyObject_Str() ensures that the result string is ready and checks the string consistency.
							
							_PyUnicode_CheckConsistency() doesn't check the hash anymore. It should be
possible to call this function even if hash(str) was already called.
							
						 
						
							2011-11-20 19:48:36 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								b960b34577 
								
							 
						 
						
							
							
								
								PyUnicode_AsUTF32String() calls _PyUnicode_EncodeUTF32() directly, instead of calling the deprecated PyUnicode_EncodeUTF32() function
							
						 
						
							2011-11-20 19:12:52 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								77faf69ca1 
								
							 
						 
						
							
							
								
								_PyUnicode_CheckConsistency() also checks maxchar maximum value, not only its minimum value
							
						 
						
							2011-11-20 18:56:05 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								d5c4022d2a 
								
							 
						 
						
							
							
								
								Remove the two ugly and unused WRITE_ASCII_OR_WSTR and WRITE_WSTR macros  
							
							
							
						 
						
							2011-11-20 18:41:31 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								2e9cfadd7c 
								
							 
						 
						
							
							
								
								Reuse surrogate macros in UTF-16 decoder  
							
							
							
						 
						
							2011-11-20 18:40:27 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								ae4f7c8e59 
								
							 
						 
						
							
							
								
								charmap_encoding_error() uses the new Unicode API  
							
							
							
						 
						
							2011-11-20 18:28:55 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Victor Stinner 
								
							 
						 
						
							
							
							
							
								
							
							
								ac931b1e5b 
								
							 
						 
						
							
							
								
								Use PyUnicode_EncodeCodePage() instead of PyUnicode_EncodeMBCS() with PyUnicode_AsUnicodeAndSize()
							
						 
						
							2011-11-20 18:27:03 +01:00