stringbench is a set of performance tests comparing byte string
operations with unicode operations.  The two string implementations
are loosely based on each other and sometimes the algorithm for one is
faster than the other.

This test set was started at the Need For Speed sprint in Reykjavik
to identify which string methods could be sped up quickly and to
identify obvious places for improvement.

Here is an example of a benchmark:


@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
def startswith_single(STR):
    s1 = STR("Andrew")
    s2 = STR("A")
    s1_startswith = s1.startswith
    for x in _RANGE_1000:
        s1_startswith(s2)

The bench decorator takes three parameters.  The first is a short
description of how the code works.  In most cases this is a Python code
snippet.  It is not the code which is actually run because the real
code is hand-optimized to focus on the method being tested.

The second parameter is a group title.  All benchmarks with the same
group title are listed together.  This lets you compare different
implementations of the same algorithm, such as "t in s"
vs. "s.find(t)".

The last is a count.  Each benchmark loops over the algorithm either
100 or 1000 times, depending on the algorithm performance.  The output
time is the time per benchmark call, which covers the whole loop, so
the reader needs the count to scale the time to a single operation.

These parameters become function attributes.

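As a rough illustration of how a decorator can attach its three
arguments as function attributes, here is a minimal sketch.  It is not
the actual stringbench implementation: the attribute names (comment,
group, repeat_count), the definition of _RANGE_1000, and the BYTES and
UNICODE helper functions are assumptions chosen for this example.

```python
# Hypothetical sketch of a bench-style decorator; names are illustrative only.
_RANGE_1000 = range(1000)


def bench(snippet, group, repeat_count):
    """Attach the three decorator arguments to the function as attributes."""
    def decorator(func):
        func.comment = snippet            # assumed attribute name
        func.group = group                # assumed attribute name
        func.repeat_count = repeat_count  # assumed attribute name
        return func
    return decorator


def BYTES(text):
    """Assumed helper: build a byte string from an ASCII literal."""
    return text.encode("ascii")


def UNICODE(text):
    """Assumed helper: return the unicode string unchanged."""
    return text


@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
def startswith_single(STR):
    s1 = STR("Andrew")
    s2 = STR("A")
    s1_startswith = s1.startswith  # hoist the attribute lookup out of the loop
    for x in _RANGE_1000:
        s1_startswith(s2)


# The same benchmark body runs against both string types.
startswith_single(BYTES)
startswith_single(UNICODE)
print(startswith_single.group, startswith_single.repeat_count)
```

Passing the string constructor in as STR is what lets one benchmark
body be timed once for byte strings and once for unicode strings.
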
Here is an example of the output:


========== count newlines
38.54   41.60   92.7    ...text.with.2000.newlines.count("\n") (*100)
========== early match, single character
1.14    1.18    96.8    ("A"*1000).find("A") (*1000)
0.44    0.41    105.6   "A" in "A"*1000 (*1000)
1.15    1.17    98.1    ("A"*1000).index("A") (*1000)

The first column is the run time in milliseconds for byte strings.
The second is the run time for unicode strings.  The third is the
byte time as a percentage of the unicode time (100 * byte time /
unicode time).  Values above 100 mean the unicode operation was
faster; values below 100 mean the byte string operation was faster.

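To make the third column concrete, the following sketch recomputes it
from the first two.  The function name relative_percent is
hypothetical.  The recomputed values can differ slightly from the
printed ones, presumably because the printed times are already rounded
to two decimal places.

```python
def relative_percent(bytes_ms, unicode_ms):
    """Return the byte string time as a percentage of the unicode time."""
    return 100.0 * bytes_ms / unicode_ms


# Times taken from the sample output above.
count_newlines = relative_percent(38.54, 41.60)  # below 100: bytes were faster
in_operator = relative_percent(0.44, 0.41)       # above 100: unicode was faster
print("%.1f %.1f" % (count_newlines, in_operator))
```
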
The last column contains the code snippet and the repeat count for the
internal benchmark loop.

The times are computed with 'timeit.py', which repeats the test more
and more times until the total time exceeds 0.2 seconds, returning
the best time for a single iteration.

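The timing strategy described above can be approximated with the
standard timeit module.  The sketch below is an assumption about how
such a loop might look, not the code stringbench uses: the function
name best_time, the doubling of the iteration count, and the repeat
parameter are all illustrative.

```python
import timeit


def best_time(func, min_total=0.2, repeat=3):
    """Grow the iteration count until one run exceeds min_total seconds,
    then return the best observed time for a single iteration."""
    timer = timeit.Timer(func)
    number = 1
    while True:
        total = timer.timeit(number)
        if total > min_total:
            break
        number *= 2  # assumed growth factor
    # Re-run at the chosen iteration count and keep the fastest run.
    times = [total]
    for _ in range(repeat - 1):
        times.append(timer.timeit(number))
    return min(times) / number


def sample():
    return "A" in "A" * 1000


per_call = best_time(sample)
print("%.3f microseconds per call" % (per_call * 1e6))
```

Taking the minimum rather than the mean follows the usual timeit
advice that slower runs mostly reflect interference from other
processes rather than the code under test.
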
The final line of the output is the cumulative time for byte and
unicode strings, and the overall performance of unicode relative to
bytes.  For example:

4079.83 5432.25 75.1    TOTAL

However, this figure is not meaningful because it weights every test
evenly.
