% Mirror of https://github.com/python/cpython.git (synced 2025-11-04 07:31:38 +00:00)
\documentclass{howto}

\title{Socket Programming HOWTO}

\release{0.00}

\author{Gordon McMillan}
\authoraddress{\email{gmcm@hypernet.com}}

\begin{document}
\maketitle

\begin{abstract}
\noindent
Sockets are used nearly everywhere, but are one of the most severely
misunderstood technologies around. This is a 10,000 foot overview of
sockets. It's not really a tutorial - you'll still have work to do in
getting things operational. It doesn't cover the fine points (and there
are a lot of them), but I hope it will give you enough background to
begin using them decently.

This document is available from the Python HOWTO page at
\url{http://www.python.org/doc/howto}.

\end{abstract}

\tableofcontents

\section{Sockets}

Sockets are used nearly everywhere, but are one of the most severely
misunderstood technologies around. This is a 10,000 foot overview of
sockets. It's not really a tutorial - you'll still have work to do in
getting things working. It doesn't cover the fine points (and there
are a lot of them), but I hope it will give you enough background to
begin using them decently.

I'm only going to talk about INET sockets, but they account for at
least 99\% of the sockets in use. And I'll only talk about STREAM
sockets - unless you really know what you're doing (in which case this
HOWTO isn't for you!), you'll get better behavior and performance from
a STREAM socket than anything else. I will try to clear up the mystery
of what a socket is, and give some hints on how to work with
blocking and non-blocking sockets. But I'll start by talking about
blocking sockets. You'll need to know how they work before dealing
with non-blocking sockets.

Part of the trouble with understanding these things is that "socket"
can mean a number of subtly different things, depending on context. So
first, let's make a distinction between a "client" socket, which is an
endpoint of a conversation, and a "server" socket, which is more like
a switchboard operator. The client application (your browser, for
example) uses "client" sockets exclusively; the web server it's
talking to uses both "server" sockets and "client" sockets.


\subsection{History}

Of the various forms of IPC (\emph{Inter Process Communication}),
sockets are by far the most popular.  On any given platform, there are
likely to be other forms of IPC that are faster, but for
cross-platform communication, sockets are about the only game in town.

They were invented at Berkeley as part of the BSD flavor of Unix. They
spread like wildfire with the Internet, and with good reason: the
combination of sockets with INET makes talking to arbitrary machines
around the world unbelievably easy (at least compared to other
schemes).

\section{Creating a Socket}

Roughly speaking, when you clicked on the link that brought you to
this page, your browser did something like the following:

\begin{verbatim}
    import socket

    # create an INET, STREAMing socket
    s = socket.socket(
        socket.AF_INET, socket.SOCK_STREAM)
    # now connect to the web server on port 80
    # - the normal http port
    s.connect(("www.mcmillan-inc.com", 80))
\end{verbatim}

When the \code{connect} completes, the socket \code{s} can
now be used to send in a request for the text of this page. The same
socket will read the reply, and then be destroyed. That's right -
destroyed. Client sockets are normally only used for one exchange (or
a small set of sequential exchanges).

What happens in the web server is a bit more complex. First, the web
server creates a "server socket".

\begin{verbatim}
    # create an INET, STREAMing socket
    serversocket = socket.socket(
        socket.AF_INET, socket.SOCK_STREAM)
    # bind the socket to a public host,
    # and a well-known port
    serversocket.bind((socket.gethostname(), 80))
    # become a server socket
    serversocket.listen(5)
\end{verbatim}

A couple of things to notice: we used \code{socket.gethostname()}
so that the socket would be visible to the outside world. If we had
used \code{serversocket.bind(('localhost', 80))} or
\code{serversocket.bind(('127.0.0.1', 80))} we would still
have a "server" socket, but one that was only visible within the same
machine. \code{serversocket.bind(('', 80))}, on the other hand,
specifies that the socket is reachable by any address the machine
happens to have.

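The effect of the bind address can be checked by asking a socket where it ended up. The following is a minimal sketch, not part of the original text: \code{make\_listener} is a hypothetical helper, and port 0 (which asks the operating system for any free port) is an assumption made so the example runs without special privileges.

```python
import socket

def make_listener(host, port=0, backlog=5):
    """Create a listening INET, STREAMing socket bound to (host, port).

    port=0 asks the operating system to pick any free port.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen(backlog)
    return srv

# Loopback address: reachable from this machine and nowhere else.
local_only = make_listener("127.0.0.1")
local_addr = local_only.getsockname()   # (address, port) actually bound
local_only.close()

# Empty string: any address the machine happens to have.
any_sock = make_listener("")
any_addr = any_sock.getsockname()       # the address part reads 0.0.0.0
any_sock.close()
```
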
A second thing to note: low number ports (those below 1024) are usually
reserved for "well known" services (HTTP, SNMP etc). If you're playing
around, use a nice high number (4 digits).

Finally, the argument to \code{listen} tells the socket library that
we want it to queue up as many as 5 connect requests (the normal max)
before refusing outside connections. If the rest of the code is
written properly, that should be plenty.

OK, now we have a "server" socket, listening on port 80. Now we enter
the mainloop of the web server:

\begin{verbatim}
    while True:
        # accept connections from outside
        (clientsocket, address) = serversocket.accept()
        # now do something with the clientsocket
        # in this case, we'll pretend this is a threaded server;
        # client_thread stands for a threading.Thread subclass
        ct = client_thread(clientsocket)
        ct.start()
\end{verbatim}

There are actually three general ways in which this loop could work:
dispatching a thread to handle \code{clientsocket}, creating a new
process to handle \code{clientsocket}, or restructuring this app
to use non-blocking sockets and multiplexing between our "server" socket
and any active \code{clientsocket}s using
\code{select}. More about that later. The important thing to
understand now is this: this is \emph{all} a "server" socket
does. It doesn't send any data. It doesn't receive any data. It just
produces "client" sockets. Each \code{clientsocket} is created
in response to some \emph{other} "client" socket doing a
\code{connect()} to the host and port we're bound to. As soon as
we've created that \code{clientsocket}, we go back to listening
for more connections. The two "clients" are free to chat it up - the
connecting side is using some dynamically allocated port which will be
recycled when the conversation ends.

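The first of those three options, one thread per client, might look roughly like the sketch below. It is an illustration under assumptions rather than a real web server: \code{handle\_client} and \code{serve} are hypothetical names, the handler simply echoes what it receives, and the accept loop is bounded by \code{max\_clients} only so that the example terminates.

```python
import socket
import threading

def handle_client(clientsocket):
    """Echo whatever one client sends until it closes its side."""
    with clientsocket:
        while True:
            data = clientsocket.recv(4096)
            if not data:            # 0 bytes: the peer closed the connection
                break
            clientsocket.sendall(data)

def serve(serversocket, max_clients):
    """Accept max_clients connections, handing each to its own thread."""
    for _ in range(max_clients):
        clientsocket, address = serversocket.accept()
        # start() runs handle_client in a new thread; calling run()
        # directly would execute it here and stall the accept loop.
        worker = threading.Thread(target=handle_client,
                                  args=(clientsocket,), daemon=True)
        worker.start()
```

Note that the "server" socket itself never calls \code{send} or \code{recv}; only the per-connection sockets returned by \code{accept} do.
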
\subsection{IPC}

If you need fast IPC between two processes
on one machine, you should look into whatever form of shared memory
the platform offers. A simple protocol based around shared memory and
locks or semaphores is by far the fastest technique.

If you do decide to use sockets, bind the "server" socket to
\code{'localhost'}. On most platforms, this will take a shortcut
around a couple of layers of network code and be quite a bit faster.


\section{Using a Socket}

The first thing to note is that the web browser's "client" socket and
the web server's "client" socket are identical beasts. That is, this
is a "peer to peer" conversation. Or to put it another way, \emph{as the
designer, you will have to decide what the rules of etiquette are for
a conversation}. Normally, the \code{connect}ing socket
starts the conversation, by sending in a request, or perhaps a
sign-on. But that's a design decision - it's not a rule of sockets.

Now there are two sets of verbs to use for communication. You can use
\code{send} and \code{recv}, or you can transform your
client socket into a file-like beast and use \code{read} and
\code{write}. The latter is the way Java presents its
sockets. I'm not going to talk about it here, except to warn you that
you need to use \code{flush} on sockets. These are buffered
"files", and a common mistake is to \code{write} something, and
then \code{read} for a reply. Without a \code{flush} in
there, you may wait forever for the reply, because the request may
still be in your output buffer.

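The \code{flush} pitfall can be reproduced in a few lines. This sketch assumes the file-like wrapper comes from the socket's \code{makefile} method, and it uses \code{socket.socketpair()} purely as a stand-in for a real client and server so that it runs on its own; the \code{PING} request is made up for illustration.

```python
import socket

# Two already-connected sockets, standing in for a client and a server.
client, server = socket.socketpair()

# Wrap each end in a buffered, file-like object.
client_out = client.makefile("wb")
server_in = server.makefile("rb")

client_out.write(b"PING\n")   # the bytes sit in client_out's buffer for now
client_out.flush()            # this is what actually hands them to the socket

# Without the flush above, this readline would wait forever.
request = server_in.readline()

client_out.close()
server_in.close()
client.close()
server.close()
```
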
Now we come to the major stumbling block of sockets - \code{send}
and \code{recv} operate on the network buffers. They do not
necessarily handle all the bytes you hand them (or expect from them),
because their major focus is handling the network buffers. In general,
they return when the associated network buffers have been filled
(\code{send}) or emptied (\code{recv}). They then tell you
how many bytes they handled. It is \emph{your} responsibility to call
them again until your message has been completely dealt with.

When a \code{recv} returns 0 bytes, it means the other side has
closed (or is in the process of closing) the connection.  You will not
receive any more data on this connection. Ever.  You may be able to
send data successfully; I'll talk more about that later.

In its simplest form, a protocol like HTTP uses a socket for only one
transfer. The client sends a request, then reads a reply.  That's it.
The socket is discarded. This means that a client can detect the end of
the reply by receiving 0 bytes.

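Reading a reply that ends when the other side closes the connection comes down to a loop that stops on the first empty \code{recv}. The following is a minimal sketch; the helper name \code{read\_until\_closed} and the 4096-byte chunk size are assumptions for illustration.

```python
import socket

def read_until_closed(sock, chunk_size=4096):
    """Collect everything the peer sends until it closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(chunk_size)
        if chunk == b"":        # 0 bytes: peer closed, nothing more will come
            break
        chunks.append(chunk)
    return b"".join(chunks)
```

This only works when the sender really does close (or shut down) its side once the reply is complete; on a connection that stays open, the loop would simply block.
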
But if you plan to reuse your socket for further transfers, you need
to realize that \emph{there is no "EOT" (End of Transfer) on a
socket.} I repeat: if a socket \code{send} or
\code{recv} returns after handling 0 bytes, the connection has
been broken.  If the connection has \emph{not} been broken, you may
wait on a \code{recv} forever, because the socket will
\emph{not} tell you that there's nothing more to read (for now).  Now
if you think about that a bit, you'll come to realize a fundamental
truth of sockets: \emph{messages must either be fixed length} (yuck),
\emph{or be delimited} (shrug), \emph{or indicate how long they are}
(much better), \emph{or end by shutting down the connection}. The
choice is entirely yours (but some ways are righter than others).

Assuming you don't want to end the connection, the simplest solution
is a fixed length message:

\begin{verbatim}
    import socket

    MSGLEN = 4096  # fixed message length; the value is an arbitrary assumption

    class mysocket:
        '''demonstration class only
          - coded for clarity, not efficiency'''

        def __init__(self, sock=None):
            if sock is None:
                self.sock = socket.socket(
                    socket.AF_INET, socket.SOCK_STREAM)
            else:
                self.sock = sock

        def connect(self, host, port):
            self.sock.connect((host, port))

        def mysend(self, msg):
            # msg is a byte string of exactly MSGLEN bytes
            totalsent = 0
            while totalsent < MSGLEN:
                sent = self.sock.send(msg[totalsent:])
                if sent == 0:
                    raise RuntimeError("socket connection broken")
                totalsent = totalsent + sent

        def myreceive(self):
            msg = b''
            while len(msg) < MSGLEN:
                chunk = self.sock.recv(MSGLEN-len(msg))
                if chunk == b'':
                    raise RuntimeError("socket connection broken")
                msg = msg + chunk
            return msg
\end{verbatim}

The sending code here is usable for almost any messaging scheme - in
Python you send byte strings, and you can use \code{len()} to
determine their length (even if they have embedded \code{\e 0}
characters). It's mostly the receiving code that gets more
complex. (And in C, it's not much worse, except you can't use
\code{strlen} if the message has embedded \code{\e 0}s.)

The easiest enhancement is to make the first character of the message
an indicator of message type, and have the type determine the
length. Now you have two \code{recv}s - the first to get (at
least) that first character so you can look up the length, and the
second in a loop to get the rest. If you decide to go the delimited
route, you'll be receiving in some arbitrary chunk size (4096 or 8192
is frequently a good match for network buffer sizes), and scanning
what you've received for a delimiter.

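The type-character scheme can be sketched as follows. Everything specific here is made up for illustration: the two message types, their body lengths in \code{BODY\_LENGTHS}, and the helper names \code{recv\_exactly} and \code{recv\_typed} are all assumptions, not part of any real protocol.

```python
import socket

# Hypothetical message types: one type byte, then a body whose
# length depends only on the type.
BODY_LENGTHS = {b"A": 4, b"B": 10}

def recv_exactly(sock, count):
    """Loop on recv until exactly count bytes have arrived."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if chunk == b"":
            raise RuntimeError("socket connection broken")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_typed(sock):
    """Read the type byte, look up the body length, then read the body."""
    kind = recv_exactly(sock, 1)
    body = recv_exactly(sock, BODY_LENGTHS[kind])
    return kind, body
```

Because each \code{recv} asks only for the bytes still missing from the current message, this version never reads into the message that follows.
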
One complication to be aware of: if your conversational protocol
allows multiple messages to be sent back to back (without some kind of
reply), and you pass \code{recv} an arbitrary chunk size, you
may end up reading the start of a following message. You'll need to
put that aside and hold onto it, until it's needed.

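One way to "put aside" the surplus is to keep a buffer alongside the socket. The sketch below handles the delimited case; the class name \code{LineReader}, the newline delimiter and the chunk size are assumptions chosen for illustration.

```python
import socket

class LineReader:
    """Receive delimited messages, holding on to any surplus bytes."""

    def __init__(self, sock, delimiter=b"\n", chunk_size=4096):
        self.sock = sock
        self.delimiter = delimiter
        self.chunk_size = chunk_size
        self.pending = b""      # bytes received but not yet handed out

    def next_message(self):
        # Keep receiving until the buffer holds at least one delimiter.
        while self.delimiter not in self.pending:
            chunk = self.sock.recv(self.chunk_size)
            if chunk == b"":
                raise RuntimeError("socket connection broken")
            self.pending += chunk
        # Split off one message; whatever follows stays in self.pending
        # and becomes the start of the next message.
        message, _, self.pending = self.pending.partition(self.delimiter)
        return message
```
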
Prefixing the message with its length (say, as 5 numeric characters)
gets more complex, because (believe it or not), you may not get all 5
characters in one \code{recv}. In playing around, you'll get
away with it; but in high network loads, your code will very quickly
break unless you use two \code{recv} loops - the first to
determine the length, the second to get the data part of the
message. Nasty. This is also when you'll discover that
\code{send} does not always manage to get rid of everything in
one pass. And despite having read this, you will eventually get bitten
by it!

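The two \code{recv} loops for a length-prefixed message might be written as below. This is one possible sketch, not the author's solution: the helper names are hypothetical, the prefix is taken to be 5 ASCII digits as in the text, and \code{sendall} is used on the sending side because it repeats \code{send} until everything has gone out.

```python
import socket

PREFIX_LEN = 5   # message length sent as 5 ASCII digits, e.g. b"00012"

def recv_exactly(sock, count):
    """Loop on recv until exactly count bytes have arrived."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if chunk == b"":
            raise RuntimeError("socket connection broken")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def send_message(sock, payload):
    """Send payload preceded by its zero-padded decimal length."""
    if len(payload) >= 10 ** PREFIX_LEN:
        raise ValueError("payload too long for the length prefix")
    header = str(len(payload)).zfill(PREFIX_LEN).encode("ascii")
    sock.sendall(header + payload)

def recv_message(sock):
    """First loop reads the length prefix, second loop reads the data."""
    length = int(recv_exactly(sock, PREFIX_LEN).decode("ascii"))
    return recv_exactly(sock, length)
```
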
In the interests of space, building your character (and preserving my
competitive position), these enhancements are left as an exercise for
the reader. Let's move on to cleaning up.

\subsection{Binary Data}

It is perfectly possible to send binary data over a socket. The major
problem is that not all machines use the same formats for binary
data. For example, a Motorola chip will represent a 16-bit integer
with the value 1 as the two hex bytes 00 01. Intel and DEC, however,
are byte-reversed - that same 1 is 01 00. Socket libraries have calls
for converting 16 and 32 bit integers - \code{ntohl, htonl, ntohs,
htons} where "n" means \emph{network} and "h" means \emph{host},
"s" means \emph{short} and "l" means \emph{long}. Where network order
is host order, these do nothing, but where the machine is
byte-reversed, these swap the bytes around appropriately.

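In Python the conversion calls live in the \code{socket} module. The short sketch below also uses the standard \code{struct} module, which the text above does not mention; it is shown here as an alternative way to produce the bytes in network order directly.

```python
import socket
import struct

# htons/ntohs convert a 16-bit integer between host and network order.
# The intermediate value depends on the machine; the round trip does not.
round_trip = socket.ntohs(socket.htons(1))

# "!" selects network (big-endian) order, "H" an unsigned 16-bit integer.
wire_bytes = struct.pack("!H", 1)           # the two bytes 00 01
(value,) = struct.unpack("!H", wire_bytes)  # back to the integer 1
```
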
In these days of 32 bit machines, the ASCII representation of binary
data is frequently smaller than the binary representation. That's
because a surprising amount of the time, all those longs have the
value 0, or maybe 1. The string "0" would be two bytes, while binary
is four. Of course, this doesn't fit well with fixed-length
messages. Decisions, decisions.

\section{Disconnecting}

Strictly speaking, you're supposed to use \code{shutdown} on a
socket before you \code{close} it.  The \code{shutdown} is
an advisory to the socket at the other end.  Depending on the argument
you pass it, it can mean "I'm not going to send any more, but I'll
still listen", or "I'm not listening, good riddance!".  Most socket
libraries, however, are so used to programmers neglecting to use this
piece of etiquette that normally a \code{close} is the same as
\code{shutdown(); close()}.  So in most situations, an explicit
\code{shutdown} is not needed.

One way to use \code{shutdown} effectively is in an HTTP-like
exchange. The client sends a request and then does a
\code{shutdown(1)}. This tells the server "This client is done
sending, but can still receive."  The server can detect "EOF" by a
receive of 0 bytes. It can assume it has the complete request.  The
server sends a reply. If the \code{send} completes successfully
then, indeed, the client was still receiving.

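That exchange can be acted out with a pair of connected sockets. In this sketch \code{socket.socketpair()} stands in for a real client and server, the request and reply contents are made up, and \code{socket.SHUT\_WR} is the named constant for the value 1 passed to \code{shutdown}.

```python
import socket

client, server = socket.socketpair()   # stand-in for a real connection

client.sendall(b"GET /something")
client.shutdown(socket.SHUT_WR)        # done sending, can still receive

# Server side: read until 0 bytes, which marks the end of the request.
request = b""
while True:
    chunk = server.recv(4096)
    if chunk == b"":
        break
    request += chunk

server.sendall(b"the reply")           # the client is still listening
server.close()

# Client side: read the reply until the server closes.
reply = b""
while True:
    chunk = client.recv(4096)
    if chunk == b"":
        break
    reply += chunk
client.close()
```
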
Python takes the automatic shutdown a step further, and says that when
a socket is garbage collected, it will automatically do a \code{close}
if it's needed. But relying on this is a very bad habit. If your socket
just disappears without doing a \code{close}, the socket at the other
end may hang indefinitely, thinking you're just being slow.
\emph{Please} \code{close} your sockets when you're done.


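One way to make sure the \code{close} always happens is to use the socket as a context manager, which current Python versions support. The sketch below is illustrative only and again uses \code{socket.socketpair()} so that it runs on its own.

```python
import socket

a, b = socket.socketpair()

# Leaving the with block calls a.close(), even if an exception is raised.
with a:
    a.sendall(b"bye")

received = b.recv(4096)    # the data sent before the close
at_eof = b.recv(4096)      # empty: the other end has closed
b.close()
```
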
| 
								 | 
							
								\subsection{When Sockets Die}
							 | 
						||
| 
								 | 
							
								
Probably the worst thing about using blocking sockets is what happens
when the other side comes down hard (without doing a
\code{close}). Your socket is likely to hang. \code{SOCK_STREAM} is a
reliable protocol, and it will wait a long, long time before giving up
on a connection. If you're using threads, the entire thread is
essentially dead. There's not much you can do about it. As long as you
aren't doing something dumb, like holding a lock while doing a
blocking read, the thread isn't really consuming much in the way of
resources. Do \emph{not} try to kill the thread - part of the reason
that threads are more efficient than processes is that they avoid the
overhead associated with the automatic recycling of resources. In
other words, if you do manage to kill the thread, your whole process
is likely to be screwed up.
								
\section{Non-blocking Sockets}
								
If you've understood the preceding, you already know most of what you
need to know about the mechanics of using sockets. You'll still use
the same calls, in much the same ways. It's just that, if you do it
right, your app will be almost inside-out.
								
In Python, you use \code{socket.setblocking(0)} to make a socket
non-blocking. In C, it's more complex (for one thing, you'll need to
choose between the POSIX flavor \code{O_NONBLOCK} and the almost
indistinguishable older flavor \code{O_NDELAY}, which is
completely different from \code{TCP_NODELAY}), but it's the
exact same idea. You do this after creating the socket, but before
using it. (Actually, if you're nuts, you can switch back and forth.)
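
To see what that looks like, here's a small sketch, assuming Python 3,
where a \code{recv} that would have blocked raises
\code{BlockingIOError} instead. The helpers \code{try_recv} and
\code{demo_nonblocking} are made-up names, and the demo relies on a
local socket pair delivering the bytes as soon as they are sent.

```python
import socket


def try_recv(sock, bufsize=4096):
    """Return whatever bytes are waiting, or None if recv would have blocked."""
    try:
        return sock.recv(bufsize)
    except BlockingIOError:
        return None


def demo_nonblocking():
    a, b = socket.socketpair()
    with a, b:
        b.setblocking(False)      # same effect as setblocking(0)
        before = try_recv(b)      # nothing has been sent yet
        a.sendall(b"ping")
        after = try_recv(b)       # now there is data in the buffer
    return before, after
```

The first call comes back empty-handed right away instead of hanging;
the second returns the bytes.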
								
The major mechanical difference is that \code{send},
\code{recv}, \code{connect} and \code{accept} can
return without having done anything. You have (of course) a number of
choices. You can check return codes and error codes and generally drive
yourself crazy. If you don't believe me, try it sometime. Your app
will grow large, buggy and suck CPU. So let's skip the brain-dead
solutions and do it right.
								
Use \code{select}.
								
In C, coding \code{select} is fairly complex. In Python, it's a
piece of cake, but it's close enough to the C version that if you
understand \code{select} in Python, you'll have little trouble
with it in C.
								
\begin{verbatim}
    ready_to_read, ready_to_write, in_error = \
                   select.select(
                      potential_readers,
                      potential_writers,
                      potential_errs,
                      timeout)
\end{verbatim}
								
You pass \code{select} three lists: the first contains all
sockets that you might want to try reading; the second all the sockets
you might want to try writing to, and the last (normally left empty)
those that you want to check for errors.  You should note that a
socket can go into more than one list. The \code{select} call is
blocking, but you can give it a timeout. This is generally a sensible
thing to do - give it a nice long timeout (say a minute) unless you
have good reason to do otherwise.
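
As a rough illustration of the input side, assuming Python 3, the
sketch below hands \code{select} two idle sockets and a short timeout.
The function name \code{poll_idle_pair} is hypothetical, and the
timeout is kept tiny only so the example finishes quickly.

```python
import select
import socket


def poll_idle_pair(timeout=0.05):
    """Ask select about two connected sockets that have nothing going on."""
    a, b = socket.socketpair()
    with a, b:
        potential_readers = [a, b]
        potential_writers = []
        potential_errs = [a, b]   # a socket can go into more than one list
        # Blocks for at most `timeout` seconds, then reports what it found.
        return select.select(
            potential_readers, potential_writers, potential_errs, timeout)
```

With nothing to read and no errors pending, the call just waits out the
timeout and hands back three empty lists.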
								
In return, you will get three lists. They have the sockets that are
actually readable, writable and in error. Each of these lists is a
subset (possibly empty) of the corresponding list you passed in. And
if you put a socket in more than one input list, it can show up in
more than one output list (a socket can be both readable and writable
at the same time).
								
If a socket is in the output readable list, you can be
as-close-to-certain-as-we-ever-get-in-this-business that a
\code{recv} on that socket will return \emph{something}. Same
idea for the writable list. You'll be able to send
\emph{something}. Maybe not all you want to, but \emph{something} is
better than nothing. (Actually, any reasonably healthy socket will
return as writable - it just means outbound network buffer space is
available.)
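
A sketch of the output side, assuming Python 3 and a local socket
pair: one end sends a few bytes, and \code{select} is asked about both
ends for reading and writing. \code{poll_after_send} is a made-up name.

```python
import select
import socket


def poll_after_send():
    """Send a few bytes one way and see how select classifies both ends."""
    a, b = socket.socketpair()
    with a, b:
        a.sendall(b"hello")
        readable, writable, in_error = select.select([a, b], [a, b], [], 1.0)
        # Only b has data waiting, so only b should be readable.
        data = b.recv(4096) if b in readable else b""
        return {
            "b_readable": b in readable,
            "a_readable": a in readable,
            "both_writable": a in writable and b in writable,
            "data": data,
        }
```

Only the receiving end comes back readable, while both ends come back
writable because their outbound buffers have room.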
								
If you have a "server" socket, put it in the \code{potential_readers}
list. If it comes out in the readable list, your \code{accept}
will (almost certainly) work. If you have created a new socket to
\code{connect} to someone else, put it in the \code{potential_writers}
list. If it shows up in the writable list, you have a decent chance
that it has connected.
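
Both halves of that can be sketched in one function, assuming Python 3
and that the loopback address \code{127.0.0.1} is usable. The name
\code{accept_and_connect} is hypothetical; checking \code{SO_ERROR}
after the socket turns writable is one common way to turn "decent
chance" into an answer.

```python
import select
import socket


def accept_and_connect(timeout=2.0):
    """Use select to drive a non-blocking accept and a non-blocking connect."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    with server, client:
        server.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
        server.listen(1)
        server.setblocking(False)
        client.setblocking(False)
        # Starts the connection; usually returns "in progress", not an error.
        client.connect_ex(server.getsockname())

        # The "server" socket goes in the potential readers list.
        readable, _, _ = select.select([server], [], [], timeout)
        if server not in readable:
            return False, False
        conn, _addr = server.accept()
        with conn:
            # The connecting socket goes in the potential writers list.
            _, writable, _ = select.select([], [client], [], timeout)
            err = client.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            return True, client in writable and err == 0
```

The pair it returns says whether the \code{accept} went through and
whether the connecting socket ended up writable with no pending error.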
								
One very nasty problem with \code{select}: if somewhere in those
input lists of sockets is one which has died a nasty death, the
\code{select} will fail. You then need to loop through every
single damn socket in all those lists and do a
\code{select([sock],[],[],0)} until you find the bad one. That
timeout of 0 means it won't take long, but it's ugly.
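
That hunt can be sketched as follows, assuming Python 3.
\code{find_dead_sockets} is a made-up helper, and which exception a
dead socket triggers depends on how it died: in CPython a socket
object that has already been closed makes \code{select} raise
\code{ValueError}, while a bad descriptor at the OS level shows up as
\code{OSError}, so the sketch catches both.

```python
import select
import socket


def find_dead_sockets(socks):
    """Probe each socket on its own and collect the ones select rejects."""
    dead = []
    for sock in socks:
        try:
            select.select([sock], [], [], 0)   # timeout 0: just poll
        except (OSError, ValueError):
            dead.append(sock)
    return dead


def demo_find_dead():
    a, b = socket.socketpair()
    b.close()                      # b is now useless to select
    try:
        return find_dead_sockets([a, b]) == [b]
    finally:
        a.close()
```

Once you know which sockets are bad, drop them from your lists before
the next \code{select}.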
								
Actually, \code{select} can be handy even with blocking sockets.
It's one way of determining whether you will block - the socket
returns as readable when there's something in the buffers.  However,
this still doesn't help with the problem of determining whether the
other end is done, or just busy with something else.
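
A small sketch of that trick, assuming Python 3 and made-up helper
names. The sockets stay in blocking mode; \code{select} is only used
to peek before calling \code{recv}.

```python
import select
import socket


def recv_if_ready(sock, bufsize=4096, timeout=0):
    """Call recv only when select says it will not block; else return None."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    return sock.recv(bufsize)


def demo_recv_if_ready():
    a, b = socket.socketpair()     # both ends stay in blocking mode
    with a, b:
        nothing_yet = recv_if_ready(b)
        a.sendall(b"x")
        got = recv_if_ready(b, timeout=1.0)
        a.shutdown(socket.SHUT_WR)
        at_eof = recv_if_ready(b, timeout=1.0)   # EOF also counts as readable
    return nothing_yet, got, at_eof
```

A \code{None} here only says nothing has arrived yet. It can't tell
you whether the other end is finished or merely slow, unless the peer
actually shuts down and you get the 0 byte read.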
								
\textbf{Portability alert}: On Unix, \code{select} works with both
sockets and files. Don't try this on Windows. On Windows,
\code{select} works with sockets only. Also note that in C, many
of the more advanced socket options are done differently on
Windows. In fact, on Windows I usually use threads (which work very,
very well) with my sockets. Face it, if you want any kind of
performance, your code will look very different on Windows than on
Unix. (I haven't the foggiest how you do this stuff on a Mac.)
								
\subsection{Performance}
								
There's no question that the fastest sockets code uses non-blocking
sockets and \code{select} to multiplex them. You can put together something
that will saturate a LAN connection without putting any strain on the
CPU. The trouble is that an app written this way can't do much of
anything else - it needs to be ready to shuffle bytes around at all
times.
								
Assuming that your app is actually supposed to do something more than
that, threading is the optimal solution (and using non-blocking
sockets will be faster than using blocking sockets). Unfortunately,
threading support in Unixes varies both in API and quality. So the
normal Unix solution is to fork a subprocess to deal with each
connection. The overhead for this is significant (and don't do this on
Windows - the overhead of process creation is enormous there). It also
means that unless each subprocess is completely independent, you'll
need to use another form of IPC, say a pipe, or shared memory and
semaphores, to communicate between the parent and child processes.
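
For comparison, here's a bare-bones sketch of the threaded approach,
assuming Python 3. Each connection gets its own worker thread doing
plain blocking reads. The names \code{shout_back} and \code{ask_all}
are invented, the connections are local socket pairs instead of
accepted clients, and the messages are assumed to be small.

```python
import socket
import threading


def shout_back(conn):
    """Worker: blocking reads until EOF, replying with upper-cased bytes."""
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data.upper())


def ask_all(messages):
    """Give every message its own connection and its own worker thread."""
    clients, workers = [], []
    for _ in messages:
        client, worker_end = socket.socketpair()
        worker = threading.Thread(target=shout_back, args=(worker_end,))
        worker.start()
        clients.append(client)
        workers.append(worker)

    replies = []
    for client, message in zip(clients, messages):
        with client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)   # done sending, still receiving
            chunks = []
            while True:
                chunk = client.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
            replies.append(b"".join(chunks))
    for worker in workers:
        worker.join()
    return replies
```

Because the workers share one process, they can hand results to each
other through ordinary Python objects, with no pipes or shared memory.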
								
Finally, remember that even though blocking sockets are somewhat
slower than non-blocking, in many cases they are the "right"
solution. After all, if your app is driven by the data it receives
over a socket, there's not much sense in complicating the logic just
so your app can wait on \code{select} instead of
\code{recv}.
								
\end{document}