\documentclass{howto}

\title{Sorting Mini-HOWTO}

% Increment the release number whenever significant changes are made.
% The author and/or editor can define 'significant' however they like.
\release{0.01}

\author{Andrew Dalke}
\authoraddress{\email{dalke@bioreason.com}}

\begin{document}
\maketitle

\begin{abstract}
\noindent
This document is a little tutorial
showing a half dozen ways to sort a list with the built-in
\method{sort()} method.

This document is available from the Python HOWTO page at
\url{http://www.python.org/doc/howto}.
\end{abstract}

\tableofcontents

Python lists have a built-in \method{sort()} method.  There are many
ways to use it to sort a list and there doesn't appear to be a single,
central place in the various manuals describing them, so I'll do so
here.

\section{Sorting basic data types}

A simple ascending sort is easy; just call the \method{sort()} method of a list.

\begin{verbatim}
>>> a = [5, 2, 3, 1, 4]
>>> a.sort()
>>> print a
[1, 2, 3, 4, 5]
\end{verbatim}

The \method{sort()} method takes an optional function that it calls to do
the comparisons.  The default sort routine is equivalent to

\begin{verbatim}
>>> a = [5, 2, 3, 1, 4]
>>> a.sort(cmp)
>>> print a
[1, 2, 3, 4, 5]
\end{verbatim}

where \function{cmp} is the built-in function which compares two objects, \code{x} and
\code{y}, and returns -1, 0 or 1 depending on whether $x<y$, $x==y$, or $x>y$.  During
the course of the sort the relationships must stay the same for the
final list to make sense.
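
The contract described above can be sketched for readers running Python 3,
where the built-in \function{cmp} and the comparison-function argument to
\method{sort()} are no longer available.  This is only an illustration under
that assumption: the local \code{cmp} below is a stand-in I define here, and
\code{functools.cmp_to_key} is the standard-library adapter that turns a
two-argument comparison into a sort key.

```python
from functools import cmp_to_key


def cmp(x, y):
    """Local stand-in for the old built-in: returns -1, 0 or 1."""
    # Booleans subtract as integers, so this yields -1, 0 or 1.
    return (x > y) - (x < y)


a = [5, 2, 3, 1, 4]
# cmp_to_key wraps the two-argument comparison so sort() can use it.
a.sort(key=cmp_to_key(cmp))
print(a)
```

Because the stand-in compares instead of subtracting, it also avoids the
out-of-range problem that a subtraction-based comparison can run into.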

If you want, you can define your own function for the comparison.  For
integers (and numbers in general) we can do:

\begin{verbatim}
>>> def numeric_compare(x, y):
...     return x-y
...
>>> a = [5, 2, 3, 1, 4]
>>> a.sort(numeric_compare)
>>> print a
[1, 2, 3, 4, 5]
\end{verbatim}

By the way, this function won't work if the result of the subtraction
is out of range, as in \code{sys.maxint - (-1)}.

Or, if you don't want to define a new named function, you can create an
anonymous one using \keyword{lambda}, as in:

\begin{verbatim}
>>> a = [5, 2, 3, 1, 4]
>>> a.sort(lambda x, y: x-y)
>>> print a
[1, 2, 3, 4, 5]
\end{verbatim}

If you want the numbers sorted in reverse you can do

\begin{verbatim}
>>> a = [5, 2, 3, 1, 4]
>>> def reverse_numeric(x, y):
...     return y-x
...
>>> a.sort(reverse_numeric)
>>> print a
[5, 4, 3, 2, 1]
\end{verbatim}

(A more general implementation could return \code{cmp(y,x)} or \code{-cmp(x,y)}.)

However, it's faster if Python doesn't have to call a function for
every comparison, so if you want a reverse-sorted list of basic data
types, do the forward sort first, then use the \method{reverse()} method.

\begin{verbatim}
>>> a = [5, 2, 3, 1, 4]
>>> a.sort()
>>> a.reverse()
>>> print a
[5, 4, 3, 2, 1]
\end{verbatim}
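
As a rough comparison, here is the two-step form next to a single call that
uses the \code{reverse} flag.  The flag and the \function{sorted} built-in
are assumptions about the interpreter: they exist only in Python 2.4 and
later, and the snippet is written for Python 3.

```python
a = [5, 2, 3, 1, 4]

# Two-step form from the text: forward sort, then reverse in place.
two_step = list(a)
two_step.sort()
two_step.reverse()

# Single call using the reverse flag (assumes Python 2.4 or later).
one_step = sorted(a, reverse=True)

print(two_step)
print(one_step)
```

For plain numbers the two results are identical.  They can differ in the
order of items that compare equal, because \method{reverse()} also flips
the order of ties.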

Here's a case-insensitive string comparison using a \keyword{lambda} function:

\begin{verbatim}
>>> import string
>>> a = string.split("This is a test string from Andrew.")
>>> a.sort(lambda x, y: cmp(string.lower(x), string.lower(y)))
>>> print a
['a', 'Andrew.', 'from', 'is', 'string', 'test', 'This']
\end{verbatim}

This goes through the overhead of converting a word to lower case
every time it must be compared.  At times it may be faster to compute
these once and use those values, and the following example shows how.

\begin{verbatim}
>>> words = string.split("This is a test string from Andrew.")
>>> offsets = []
>>> for i in range(len(words)):
...     offsets.append( (string.lower(words[i]), i) )
...
>>> offsets.sort()
>>> new_words = []
>>> for dontcare, i in offsets:
...     new_words.append(words[i])
...
>>> print new_words
['a', 'Andrew.', 'from', 'is', 'string', 'test', 'This']
\end{verbatim}

Each entry in the \code{offsets} list is a tuple of the lower-case string
and its position in the \code{words} list.  The list is then sorted.  Python's
sort method sorts tuples by comparing terms; given \code{x} and \code{y}, compare
\code{x[0]} to \code{y[0]}, then \code{x[1]} to \code{y[1]}, etc. until there is a difference.

The result is that the \code{offsets} list is ordered by its first
term, and the second term can be used to figure out where the original
data was stored.  (The \code{for} loop assigns \code{dontcare} and
\code{i} to the two fields of each term in the list, but we only need the
index value.)
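
The same decorate, sort, undecorate steps can be written more compactly in
Python 3.  This is a sketch under that assumption, not a replacement for the
session above; the final check relies on the \code{key} argument of
\function{sorted}, which is not available in the older interpreters this
document was written for.

```python
words = "This is a test string from Andrew.".split()

# Decorate: pair each lower-cased word with its original position.
offsets = [(word.lower(), i) for i, word in enumerate(words)]
# Sort: tuples compare on the lower-cased word first, then on the index.
offsets.sort()
# Undecorate: use the stored index to recover the original word.
new_words = [words[i] for _, i in offsets]

# A key function also lower-cases each word only once.
assert new_words == sorted(words, key=str.lower)
print(new_words)
```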

Another way to implement this is to store the original data as the
second term in the \code{offsets} list, as in:

\begin{verbatim}
>>> words = string.split("This is a test string from Andrew.")
>>> offsets = []
>>> for word in words:
...     offsets.append( (string.lower(word), word) )
...
>>> offsets.sort()
>>> new_words = []
>>> for word in offsets:
...     new_words.append(word[1])
...
>>> print new_words
['a', 'Andrew.', 'from', 'is', 'string', 'test', 'This']
\end{verbatim}

This isn't always appropriate because the second terms in the list
(the word, in this example) will be compared when the first terms are
the same.  If this happens many times, then there will be the unneeded
performance hit of comparing the two objects.  The cost can be large
if most first terms are the same and the objects define their own
\method{__cmp__} method; even if they don't, there is still some
overhead to determine whether \method{__cmp__} is defined.

Still, for large lists, or for lists where the comparison information
is expensive to calculate, the last two examples are likely to be the
fastest way to sort a list.  It will not work on weakly sorted data,
like complex numbers, but if you don't know what that means, you
probably don't need to worry about it.
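
One way around both concerns, sketched here for Python 3, is to put the
original position between the sort value and the data.  The index is unique,
so ties on the first term are settled before the data itself is ever
compared.  The sample values and the choice of magnitude as the sort value
are my own assumptions for illustration.

```python
values = [3 + 4j, 1 + 0j, -5 + 0j, 0 + 1j]

# Decorate with (magnitude, index, value).  Equal magnitudes are settled
# by the unique index, so two complex values are never compared.
decorated = [(abs(z), i, z) for i, z in enumerate(values)]
decorated.sort()

# Undecorate: keep only the original value from each tuple.
by_magnitude = [z for _, _, z in decorated]
print(by_magnitude)
```

Items with equal magnitude keep their original relative order, since the
index decides between them.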

\section{Comparing classes}

The comparison for two basic data types, like ints to ints or strings to
strings, is built into Python and makes sense.  There is a default way
to compare class instances, but the default manner isn't usually very
useful.  You can define your own comparison with the \method{__cmp__} method,
as in:

\begin{verbatim}
>>> class Spam:
...     def __init__(self, spam, eggs):
...         self.spam = spam
...         self.eggs = eggs
...     def __cmp__(self, other):
...         return cmp(self.spam+self.eggs, other.spam+other.eggs)
...     def __str__(self):
...         return str(self.spam + self.eggs)
...
>>> a = [Spam(1, 4), Spam(9, 3), Spam(4,6)]
>>> a.sort()
>>> for spam in a:
...     print str(spam)
...
5
10
12
\end{verbatim}
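
Python 3 no longer calls \method{__cmp__}, so a class there has to supply
the rich comparison methods instead.  A minimal sketch of the same
\class{Spam} ordering under that assumption follows; the \code{_total}
helper is a hypothetical name introduced only for this example, and
\code{functools.total_ordering} fills in the remaining comparison methods
from \method{__eq__} and \method{__lt__}.

```python
from functools import total_ordering


@total_ordering
class Spam:
    def __init__(self, spam, eggs):
        self.spam = spam
        self.eggs = eggs

    def _total(self):
        # Hypothetical helper: the value the original __cmp__ compared.
        return self.spam + self.eggs

    def __eq__(self, other):
        return self._total() == other._total()

    def __lt__(self, other):
        # list.sort() only needs "less than" to order the instances.
        return self._total() < other._total()

    def __str__(self):
        return str(self._total())


a = [Spam(1, 4), Spam(9, 3), Spam(4, 6)]
a.sort()
print([str(s) for s in a])
```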

Sometimes you may want to sort by a specific attribute of a class.  If
appropriate you should just define the \method{__cmp__} method to compare
those values, but you cannot do this if you want to compare between
different attributes at different times.  Instead, you'll need to go
back to passing a comparison function to sort, as in:

\begin{verbatim}
>>> a = [Spam(1, 4), Spam(9, 3), Spam(4,6)]
>>> a.sort(lambda x, y: cmp(x.eggs, y.eggs))
>>> for spam in a:
...     print spam.eggs, str(spam)
...
3 12
4 5
6 10
\end{verbatim}
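
In Python 2.4 and later the attribute can be named through a key function
rather than a comparison function.  The sketch below assumes Python 3 and
uses \code{operator.attrgetter} from the standard library; the cut-down
\class{Spam} class is repeated only so that the example runs on its own.

```python
from operator import attrgetter


class Spam:
    def __init__(self, spam, eggs):
        self.spam = spam
        self.eggs = eggs


a = [Spam(1, 4), Spam(9, 3), Spam(4, 6)]
# attrgetter("eggs") builds a function that returns obj.eggs.
a.sort(key=attrgetter("eggs"))
print([(s.eggs, s.spam + s.eggs) for s in a])
```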

If you want to compare on arbitrary attributes (and aren't overly
concerned about performance) you can even define your own comparison
function object.  This uses the ability of a class instance to emulate
a function by defining the \method{__call__} method, as in:

\begin{verbatim}
>>> class CmpAttr:
...     def __init__(self, attr):
...         self.attr = attr
...     def __call__(self, x, y):
...         return cmp(getattr(x, self.attr), getattr(y, self.attr))
...
>>> a = [Spam(1, 4), Spam(9, 3), Spam(4,6)]
>>> a.sort(CmpAttr("spam"))  # sort by the "spam" attribute
>>> for spam in a:
...     print spam.spam, spam.eggs, str(spam)
...
1 4 5
4 6 10
9 3 12
>>> a.sort(CmpAttr("eggs"))   # re-sort by the "eggs" attribute
>>> for spam in a:
...     print spam.spam, spam.eggs, str(spam)
...
9 3 12
1 4 5
4 6 10
\end{verbatim}
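
A callable comparison object like this can still be used from Python 3 by
wrapping it with \code{functools.cmp_to_key}.  The version below is a sketch
under that assumption; it replaces the call to \function{cmp} with an
explicit comparison because the built-in is not available there.

```python
from functools import cmp_to_key


class Spam:
    def __init__(self, spam, eggs):
        self.spam = spam
        self.eggs = eggs


class CmpAttr:
    """Callable object that compares two instances on one attribute."""

    def __init__(self, attr):
        self.attr = attr

    def __call__(self, x, y):
        left = getattr(x, self.attr)
        right = getattr(y, self.attr)
        # Same result as the old cmp(left, right): -1, 0 or 1.
        return (left > right) - (left < right)


a = [Spam(1, 4), Spam(9, 3), Spam(4, 6)]

a.sort(key=cmp_to_key(CmpAttr("spam")))  # sort by the "spam" attribute
by_spam = [(s.spam, s.eggs) for s in a]

a.sort(key=cmp_to_key(CmpAttr("eggs")))  # re-sort by the "eggs" attribute
by_eggs = [(s.spam, s.eggs) for s in a]

print(by_spam)
print(by_eggs)
```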

Of course, if you want a faster sort you can extract the attributes
into an intermediate list and sort that list.
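
A possible shape for that intermediate list, written for Python 3, is shown
here.  The function name \code{sort_by_attr} is hypothetical, and the index
stored in the middle of each tuple is my own addition so that the instances
themselves are never compared when two attribute values are equal.

```python
class Spam:
    def __init__(self, spam, eggs):
        self.spam = spam
        self.eggs = eggs


def sort_by_attr(items, attr):
    """Return a new list of items ordered by the named attribute."""
    # Intermediate list of (attribute value, original index, object).
    decorated = [(getattr(obj, attr), i, obj) for i, obj in enumerate(items)]
    decorated.sort()
    # Drop the attribute value and index, keeping only the objects.
    return [obj for _, _, obj in decorated]


a = [Spam(1, 4), Spam(9, 3), Spam(4, 6)]
by_eggs = sort_by_attr(a, "eggs")
print([(s.spam, s.eggs) for s in by_eggs])
```

The original list is left untouched; the function returns a new, sorted one.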


So, there you have it; about a half-dozen different ways to define how
to sort a list:
\begin{itemize}
 \item sort using the default method
 \item sort using a comparison function
 \item reverse sort not using a comparison function
 \item sort on an intermediate list (two forms)
 \item sort using a class-defined \method{__cmp__} method
 \item sort using a sort function object
\end{itemize}

\end{document}
% LocalWords:  maxint