stringbench is a set of performance tests comparing byte string
operations with unicode operations.  The two string implementations
are loosely based on each other, and sometimes the algorithm for one
is faster than the other.

This test set was started at the Need For Speed sprint in Reykjavik
to identify which string methods could be sped up quickly and to
find obvious places for improvement.

Here is an example of a benchmark:

@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
def startswith_single(STR):
    # STR builds the test strings for the string type being measured
    s1 = STR("Andrew")
    s2 = STR("A")
    # Look up the bound method once so the loop times only the call
    s1_startswith = s1.startswith
    for x in _RANGE_1000:
        s1_startswith(s2)

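Because the string type is passed in as STR, the same benchmark body can be run once per string type and the two timings compared. A minimal sketch, assuming str for the unicode side and ASCII-encoded bytes for the byte side; the to_bytes and time_both helpers and the _RANGE_1000 stand-in are hypothetical, the @bench decorator is omitted, and this is not necessarily how stringbench itself drives its benchmarks:

```python
import timeit

# Hypothetical stand-in for the module-level loop range used above.
_RANGE_1000 = range(1000)


def to_bytes(s):
    # Hypothetical byte-string constructor: encode the text as ASCII.
    return s.encode("ascii")


def startswith_single(STR):
    s1 = STR("Andrew")
    s2 = STR("A")
    s1_startswith = s1.startswith
    for x in _RANGE_1000:
        s1_startswith(s2)


def time_both(func, number=100):
    # Average time of one benchmark call (one full loop) for each type.
    byte_time = timeit.timeit(lambda: func(to_bytes), number=number) / number
    uni_time = timeit.timeit(lambda: func(str), number=number) / number
    return byte_time, uni_time


if __name__ == "__main__":
    byte_time, uni_time = time_both(startswith_single)
    print(f"bytes: {byte_time * 1000:.3f} ms  str: {uni_time * 1000:.3f} ms")
```
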
The bench decorator takes three parameters.  The first is a short
description of what the code does.  In most cases this is a Python code
snippet.  It is not the code that is actually run, because the real
code is hand-optimized to focus on the method being tested.

The second parameter is a group title.  All benchmarks with the same
group title are listed together.  This lets you compare different
implementations of the same algorithm, such as "t in s"
vs. "s.find(t)".

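A sketch of how benchmarks sharing a group title could be listed together, assuming each benchmark is recorded as a (group title, description) pair. The RECORDS data and the list_by_group helper are hypothetical, and the real script may store and order its benchmarks differently. itertools.groupby only merges adjacent items, so the records are sorted by title first:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (group title, description) records for three benchmarks.
RECORDS = [
    ("early match, single character", '("A"*1000).find("A")'),
    ("startswith single character", '"Andrew".startswith("A")'),
    ("early match, single character", '"A" in "A"*1000'),
]


def list_by_group(records):
    # groupby merges only adjacent items, so sort by group title first
    # (sorted() is stable, so order within a group is preserved).
    ordered = sorted(records, key=itemgetter(0))
    return {title: [desc for _, desc in items]
            for title, items in groupby(ordered, key=itemgetter(0))}


for title, descs in list_by_group(RECORDS).items():
    print("==========", title)
    for desc in descs:
        print("   ", desc)
```
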
The last is a count.  Each benchmark loops over the algorithm either
100 or 1000 times, depending on how fast the algorithm is.  The output
time is the time per benchmark call (one full loop), so the reader
needs the count to know how to scale the performance.

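Scaling is a single division by the count. As a worked example with a hypothetical helper, the count-newlines row in the sample output below reports 38.54 ms for a call whose loop runs 100 times, which is about 0.385 ms per count("\n") call:

```python
def per_iteration_ms(call_ms, count):
    # The reported time covers the whole loop, so divide by the loop count.
    return call_ms / count


# 38.54 ms per call with a loop count of 100
print(round(per_iteration_ms(38.54, 100), 4))  # 0.3854
```
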
These parameters become function attributes.

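A minimal sketch of a decorator that does this, assuming the attribute names comment, group and repeat_count; these names are illustrative and not necessarily the ones stringbench uses:

```python
def bench(comment, group, repeat_count):
    """Attach benchmark metadata to the decorated function as attributes."""
    def decorate(func):
        func.comment = comment            # short description shown in the output
        func.group = group                # group title used to list benchmarks together
        func.repeat_count = repeat_count  # iterations of the internal loop
        return func
    return decorate


@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
def startswith_single(STR):
    s1 = STR("Andrew")
    s2 = STR("A")
    s1_startswith = s1.startswith
    for x in range(1000):
        s1_startswith(s2)


print(startswith_single.group, startswith_single.repeat_count)
```

Because the decorator returns the function unchanged, a driver can later collect every benchmark and read its metadata without calling it.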
Here is an example of the output:

========== count newlines
38.54   41.60   92.7    ...text.with.2000.newlines.count("\n") (*100)
========== early match, single character
1.14    1.18    96.8    ("A"*1000).find("A") (*1000)
0.44    0.41    105.6   "A" in "A"*1000 (*1000)
1.15    1.17    98.1    ("A"*1000).index("A") (*1000)

The first column is the run time in milliseconds for byte strings.
The second is the run time for unicode strings.  The third is the
ratio byte time / unicode time, expressed as a percentage.  Values
above 100 mean unicode is faster than byte strings; values below 100
mean it is slower.

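The third column can be recomputed from the first two. A sketch with a hypothetical ratio_percent helper: applied to the rounded times shown for count newlines it gives 92.6 rather than the printed 92.7, presumably because the script works from unrounded timings:

```python
def ratio_percent(byte_ms, uni_ms):
    # byte time / unicode time as a percentage; above 100 means unicode was faster
    return 100.0 * byte_ms / uni_ms


print(f"{ratio_percent(38.54, 41.60):.1f}")  # 92.6 from the rounded sample times
print(f"{ratio_percent(0.44, 0.41):.1f}")    # the '"A" in "A"*1000' row, above 100
```
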
The last column contains the code snippet and the repeat count for the
internal benchmark loop.

The times are computed with timeit.py, which repeats the test more
and more times until the total time exceeds 0.2 seconds, and returns
the best time for a single iteration.

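The standard library's timeit.Timer.autorange() implements the same growth rule: it raises the loop count until one measurement takes at least 0.2 seconds. A sketch of a best-of-N measurement built on it; the best_time name and the repeat=3 default are assumptions, and this is not necessarily how stringbench calls timeit:

```python
import timeit


def best_time(stmt, repeat=3):
    """Return the best time, in seconds, for a single execution of stmt."""
    timer = timeit.Timer(stmt)
    # Grow the loop count until one measurement takes at least 0.2 s.
    number, _ = timer.autorange()
    # Repeat that measurement, keep the fastest, and divide per execution.
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    print(best_time('("A"*1000).find("A")'))
```

Taking the minimum rather than the mean is the usual timeit practice, since slower runs mostly reflect interference from other processes.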
The final line of the output is the cumulative time for byte and
unicode strings, and the overall performance of unicode relative to
bytes.  For example:

4079.83 5432.25 75.1    TOTAL

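However, this total has no real meaning: it simply adds up every test,
with no weighting for how often each operation occurs in real code, so
the longest-running tests dominate.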
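The TOTAL percentage is the same ratio applied to the summed times: 100 * 4079.83 / 5432.25 is about 75.1. The sketch below (helper names are hypothetical) applies this to the four sample rows above and, for contrast, computes a geometric mean of the per-test ratios, a common alternative summary that stringbench does not use. The summed ratio lands near the count-newlines row only because that test takes the most time; the geometric mean removes that effect, but still says nothing about real-world usage:

```python
from statistics import geometric_mean

# (byte ms, unicode ms) pairs copied from the sample output above.
ROWS = [(38.54, 41.60), (1.14, 1.18), (0.44, 0.41), (1.15, 1.17)]


def total_ratio(rows):
    # Ratio of summed times, as in the TOTAL line: long tests dominate.
    return 100.0 * sum(b for b, _ in rows) / sum(u for _, u in rows)


def geo_mean_ratio(rows):
    # Alternative summary (not used by stringbench): each test's ratio counts once.
    return 100.0 * geometric_mean([b / u for b, u in rows])


print(f"{100.0 * 4079.83 / 5432.25:.1f}")  # 75.1, as in the sample TOTAL line
print(f"{total_ratio(ROWS):.1f}")           # 93.0, close to the count-newlines ratio
print(f"{geo_mean_ratio(ROWS):.1f}")
```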