.. _socket-howto:

****************************
  Socket Programming HOWTO
****************************

:Author: Gordon McMillan


.. topic:: Abstract

   Sockets are used nearly everywhere, but are one of the most severely
   misunderstood technologies around. This is a 10,000 foot overview of sockets.
   It's not really a tutorial - you'll still have work to do in getting things
   operational. It doesn't cover the fine points (and there are a lot of them), but
   I hope it will give you enough background to begin using them decently.
					
						
Sockets
=======

I'm only going to talk about INET (i.e. IPv4) sockets, but they account for a
large share of the sockets in use. And I'll only talk about STREAM (i.e. TCP)
sockets - unless you really know what you're doing (in which case this HOWTO
isn't for you!), you'll get better behavior and performance from a STREAM socket
than anything else. I will try to clear up the mystery of what a socket is, as
well as give some hints on how to work with blocking and non-blocking sockets.
But I'll start by talking about blocking sockets. You'll need to know how they
work before dealing with non-blocking sockets.

Part of the trouble with understanding these things is that "socket" can mean a
number of subtly different things, depending on context. So first, let's make a
distinction between a "client" socket, which is an endpoint of a conversation,
and a "server" socket, which is more like a switchboard operator. The client
application (your browser, for example) uses "client" sockets exclusively; the
web server it's talking to uses both "server" sockets and "client" sockets.
					
						
History
-------

Of the various forms of :abbr:`IPC (Inter-Process Communication)`,
sockets are by far the most popular.  On any given platform, there are
likely to be other forms of IPC that are faster, but for
cross-platform communication, sockets are about the only game in town.

They were invented in Berkeley as part of the BSD flavor of Unix. They spread
like wildfire with the Internet, and with good reason: the combination of
sockets with INET makes talking to arbitrary machines around the world
unbelievably easy (at least compared to other schemes).
					
						
Creating a Socket
=================

Roughly speaking, when you clicked on the link that brought you to this page,
your browser did something like the following::

   import socket

   # create an INET, STREAMing socket
   s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   # now connect to the web server on port 80 - the normal http port
   s.connect(("www.python.org", 80))
					
						
When the ``connect`` completes, the socket ``s`` can be used to send
a request for the text of the page. The same socket will read the
reply, and then be destroyed. That's right, destroyed. Client sockets
are normally only used for one exchange (or a small set of sequential
exchanges).

What happens in the web server is a bit more complex. First, the web server
creates a "server socket"::

   import socket

   # create an INET, STREAMing socket
   serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   # bind the socket to a public host, and a well-known port
   serversocket.bind((socket.gethostname(), 80))
   # become a server socket
   serversocket.listen(5)
					
						
A couple of things to notice: we used ``socket.gethostname()`` so that the
socket would be visible to the outside world. If we had used
``serversocket.bind(('localhost', 80))`` or
``serversocket.bind(('127.0.0.1', 80))`` we would still have a "server" socket,
but one that was only visible within the same machine.
``serversocket.bind(('', 80))`` specifies that the socket is reachable by any
address the machine happens to have.
					
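The address choices above are easy to try without special privileges. The
following sketch is an illustration added here rather than part of the original
example: it binds to the loopback address only and passes port 0, which asks
the operating system to pick any free port, then uses ``getsockname()`` to see
which address was actually bound::

   import socket

   serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   # Loopback only: reachable from this machine, not from the outside world.
   # Port 0 lets the OS choose a free port, so no privileges are required.
   serversocket.bind(("127.0.0.1", 0))
   serversocket.listen(5)

   host, port = serversocket.getsockname()   # the address actually bound
   print(host, port)

   serversocket.close()
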
						
A second thing to note: low number ports are usually reserved for "well known"
services (HTTP, SNMP, etc.), and on Unix-like systems binding to a port below
1024 normally requires administrator privileges. If you're playing around, use
a nice high number (4 digits).

Finally, the argument to ``listen`` tells the socket library that we want it to
queue up as many as 5 connect requests before refusing outside connections. If
the rest of the code is written properly, that should be plenty. (Five was
historically a common maximum; modern systems usually accept a much larger
backlog.)
					
						
Now that we have a "server" socket, listening on port 80, we can enter the
main loop of the web server::

   while True:
       # accept connections from outside
       (clientsocket, address) = serversocket.accept()
       # now do something with the clientsocket
       # in this case, we'll pretend this is a threaded server
       # (client_thread is a hypothetical threading.Thread subclass)
       ct = client_thread(clientsocket)
       ct.start()   # start(), not run(): run() would not create a new thread
					
						
There are actually three general ways in which this loop could work: dispatching
a thread to handle ``clientsocket``, creating a new process to handle
``clientsocket``, or restructuring this app to use non-blocking sockets and
multiplexing between our "server" socket and any active ``clientsocket``\ s
using ``select``. More about that later. The important thing to understand now
is this: this is *all* a "server" socket does. It doesn't send any data. It
doesn't receive any data. It just produces "client" sockets. Each
``clientsocket`` is created in response to some *other* "client" socket doing a
``connect()`` to the host and port we're bound to. As soon as we've created that
``clientsocket``, we go back to listening for more connections. The two
"clients" are free to chat it up. On the server side, the accepted
``clientsocket`` keeps the port we're bound to; the connecting side uses a
dynamically allocated (ephemeral) port, which will be recycled when the
conversation ends.
					
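A runnable sketch of the first option, one thread per connection, follows. It
is an illustration under stated assumptions rather than the HOWTO's own code:
the standard :mod:`threading` module replaces the hypothetical
``client_thread``, the server binds to the loopback address on an OS-chosen
port, and the "protocol" (also an assumption) simply echoes back whatever each
client sends until that client stops sending::

   import socket
   import threading


   def handle_client(clientsocket):
       # Echo everything back until the peer stops sending (recv returns b'').
       with clientsocket:
           while True:
               data = clientsocket.recv(2048)
               if not data:
                   break
               clientsocket.sendall(data)


   def accept_loop(serversocket):
       while True:
           try:
               clientsocket, address = serversocket.accept()
           except OSError:
               break  # the server socket is no longer usable; stop accepting
           # One thread per connection; daemon=True so it won't keep the process alive.
           threading.Thread(target=handle_client, args=(clientsocket,),
                            daemon=True).start()


   serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   serversocket.bind(("127.0.0.1", 0))   # loopback only, OS-chosen port
   serversocket.listen(5)
   threading.Thread(target=accept_loop, args=(serversocket,), daemon=True).start()

   # Usage: a client sends a request, signals "done sending", then reads the echo.
   with socket.create_connection(serversocket.getsockname()) as client:
       client.sendall(b"hello")
       client.shutdown(socket.SHUT_WR)
       chunks = []
       while True:
           chunk = client.recv(2048)
           if not chunk:
               break
           chunks.append(chunk)
   reply = b"".join(chunks)
   print(reply)                          # prints b'hello'

The main loop only accepts and hands off; all reading and writing happens in
the per-connection threads, which is exactly the division of labor described
above.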
						
IPC
---

If you need fast IPC between two processes on one machine, you should look into
pipes or shared memory.  If you do decide to use AF_INET sockets, bind the
"server" socket to ``'localhost'``. On most platforms, this will take a
shortcut around a couple of layers of network code and be quite a bit faster.

.. seealso::

   The :mod:`multiprocessing` module integrates cross-platform IPC into a
   higher-level API.
					
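As a small taste of that higher-level API (an illustration added here, not
something the HOWTO itself covers), :func:`multiprocessing.Pipe` returns two
connected endpoints whose ``send`` and ``recv`` transfer whole pickled Python
objects, so no manual message framing is needed. Both ends are used in one
process only to keep the sketch short; normally each end would be handed to a
different process::

   from multiprocessing import Pipe

   # Two connected endpoints; by default the pipe is duplex.
   parent_end, child_end = Pipe()

   # send() pickles one whole object; recv() returns one complete object.
   parent_end.send({"op": "add", "args": [2, 3]})
   request = child_end.recv()
   child_end.send(sum(request["args"]))
   result = parent_end.recv()
   print(result)                  # prints 5

   parent_end.close()
   child_end.close()
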
						
Using a Socket
==============

The first thing to note is that the web browser's "client" socket and the web
server's "client" socket are identical beasts. That is, this is a "peer to peer"
conversation. Or to put it another way, *as the designer, you will have to
decide what the rules of etiquette are for a conversation*. Normally, the
``connect``\ ing socket starts the conversation, by sending a request, or
perhaps a sign-on. But that's a design decision - it's not a rule of sockets.
					
						
Now there are two sets of verbs to use for communication. You can use ``send``
and ``recv``, or you can transform your client socket into a file-like beast
(in Python, with ``socket.makefile()``) and use ``read`` and ``write``. The
latter is the way Java presents its sockets. I'm not going to talk about it
here, except to warn you that you need to use ``flush`` on these file-like
sockets. They are buffered "files", and a common mistake is to ``write``
something, and then ``read`` for a reply. Without a ``flush`` in there, you may
wait forever for the reply, because the request may still be in your output
buffer.
					
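The following sketch (an illustration, not from the original text) shows the
buffering problem directly. It uses ``socket.socketpair()`` to get two
already-connected sockets in one process, so no server is needed. Until
``flush`` is called, the written bytes sit in the file object's buffer and the
peer receives nothing::

   import socket

   a, b = socket.socketpair()
   f = a.makefile("rwb")            # buffered binary file wrapping socket a

   f.write(b"GET / HTTP/1.0\r\n\r\n")
   b.setblocking(False)
   try:
       b.recv(1024)                 # nothing has been sent yet: still buffered
       arrived_before_flush = True
   except BlockingIOError:
       arrived_before_flush = False

   f.flush()                        # now the bytes are actually sent
   b.setblocking(True)
   request = b.recv(1024)
   print(arrived_before_flush, request)

   f.close()
   a.close()
   b.close()
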
						
Now we come to the major stumbling block of sockets: ``send`` and ``recv``
operate on the network buffers. They do not necessarily handle all the bytes you
hand them (or expect from them), because their major focus is handling the
network buffers. In general, they return when the associated network buffers
have been filled (``send``) or emptied (``recv``). They then tell you how much
they handled: ``send`` returns the number of bytes it sent, and ``recv`` returns
the bytes it received, which may be fewer than you asked for. It is *your*
responsibility to call them again until your message has been completely dealt
with.
					
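A minimal sketch of "calling again" on the sending side follows. The helper
name ``send_all`` is hypothetical; Python's own ``socket.sendall()`` runs an
equivalent loop internally, and the explicit version is shown only to make the
bookkeeping visible. A socketpair and a reader thread stand in for a real
connection and a real peer::

   import socket
   import threading


   def send_all(sock, data):
       """Hypothetical helper: call send() until every byte has been handed off."""
       view = memoryview(data)
       total = 0
       while total < len(view):
           sent = sock.send(view[total:])   # may accept only part of what's left
           if sent == 0:
               raise RuntimeError("socket connection broken")
           total += sent
       return total


   a, b = socket.socketpair()
   payload = b"x" * 1_000_000   # big enough that one send() is unlikely to take it all
   received = bytearray()


   def drain():
       # Keep reading on the other end so the sender's buffer keeps emptying.
       while len(received) < len(payload):
           chunk = b.recv(65536)
           if not chunk:
               break
           received.extend(chunk)


   reader = threading.Thread(target=drain)
   reader.start()
   sent_total = send_all(a, payload)
   reader.join()
   a.close()
   b.close()

In everyday code, ``a.sendall(payload)`` does the same job in one call.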
						
When a ``recv`` returns 0 bytes (in Python, an empty bytes object, ``b''``), it
means the other side has closed (or is in the process of closing) the
connection.  You will not receive any more data on this connection. Ever.  You
may be able to send data successfully; I'll talk more about this later.
					
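A short illustration (not from the original text), again using
``socket.socketpair()`` for a connected pair within one process: data sent
before the close is still delivered, and after that ``recv`` on the other end
returns ``b''`` immediately instead of blocking::

   import socket

   a, b = socket.socketpair()
   a.sendall(b"last words")
   a.close()                      # the peer closes its end

   first = b.recv(1024)           # data sent before the close still arrives
   second = b.recv(1024)          # then end-of-stream: an empty bytes object
   print(first, second)
   b.close()
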
						
A protocol like HTTP, in its original one-request-per-connection form, uses a
socket for only one transfer. The client sends a request, then reads a reply.
That's it. The socket is discarded. This means that a client can detect the end
of the reply by receiving 0 bytes.
					
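Reading "until 0 bytes" might look like this in Python. The helper name
``recv_until_closed`` is hypothetical, and ``socket.socketpair()`` stands in for
a real client/server connection::

   import socket


   def recv_until_closed(sock, bufsize=4096):
       """Hypothetical helper: collect everything until the peer closes."""
       chunks = []
       while True:
           chunk = sock.recv(bufsize)
           if not chunk:              # b'' means the other side has closed
               break
           chunks.append(chunk)
       return b"".join(chunks)


   server_end, client_end = socket.socketpair()
   server_end.sendall(b"HTTP/1.0 200 OK\r\n\r\n<html>hello</html>")
   server_end.close()                 # closing marks the end of the reply
   reply = recv_until_closed(client_end)
   client_end.close()
   print(reply)
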
						
But if you plan to reuse your socket for further transfers, you need to realize
that *there is no* :abbr:`EOT (End of Transfer)` *on a socket.* I repeat: if a
socket ``send`` or ``recv`` returns after handling 0 bytes, the connection has
been broken.  If the connection has *not* been broken, you may wait on a
``recv`` forever, because the socket will *not* tell you that there's nothing
more to read (for now).  Now if you think about that a bit, you'll come to
realize a fundamental truth of sockets: *messages must either be fixed length*
(yuck), *or be delimited* (shrug), *or indicate how long they are* (much
better), *or end by shutting down the connection*. The choice is entirely yours
(but some ways are righter than others).
					
						
Assuming you don't want to end the connection, the simplest solution is a
fixed-length message::

   import socket

   MSGLEN = 2048  # example fixed message length; both peers must agree on it


   class MySocket:
       """demonstration class only
         - coded for clarity, not efficiency
       """

       def __init__(self, sock=None):
           if sock is None:
               self.sock = socket.socket(
                               socket.AF_INET, socket.SOCK_STREAM)
           else:
               self.sock = sock

       def connect(self, host, port):
           self.sock.connect((host, port))

       def mysend(self, msg):
           # for fixed-length messaging, callers pass exactly MSGLEN bytes
           totalsent = 0
           while totalsent < len(msg):
               sent = self.sock.send(msg[totalsent:])
               if sent == 0:
                   raise RuntimeError("socket connection broken")
               totalsent = totalsent + sent

       def myreceive(self):
           chunks = []
           bytes_recd = 0
           while bytes_recd < MSGLEN:
               chunk = self.sock.recv(min(MSGLEN - bytes_recd, 2048))
               if chunk == b'':
                   raise RuntimeError("socket connection broken")
               chunks.append(chunk)
               bytes_recd = bytes_recd + len(chunk)
           return b''.join(chunks)
					
						
The sending code here is usable for almost any messaging scheme - in Python you
send bytes, and you can use ``len()`` to determine their length (even if they
contain embedded ``\0`` characters). It's mostly the receiving code that gets
more complex. (And in C, it's not much worse, except you can't use ``strlen`` if
the message has embedded ``\0``\ s.)
					
						
The easiest enhancement is to make the first character of the message an
indicator of message type, and have the type determine the length. Now you have
two ``recv``\ s - the first to get (at least) that first character so you can
look up the length, and the second in a loop to get the rest. If you decide to
go the delimited route, you'll be receiving in some arbitrary chunk size (4096
or 8192 is frequently a good match for network buffer sizes), and scanning what
you've received for a delimiter.

One complication to be aware of: if your conversational protocol allows multiple
messages to be sent back to back (without some kind of reply), and you pass
``recv`` an arbitrary chunk size, you may end up reading the start of a
following message. You'll need to put that aside and hold onto it until it's
needed.
					
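A minimal sketch of the delimited approach, including the "put that aside"
buffer, follows. The class name ``LineReader`` and the choice of ``b"\n"`` as
the delimiter are assumptions made for illustration, and the sketch also assumes
the delimiter never appears inside a message::

   import socket


   class LineReader:
       """Hypothetical helper: returns one delimiter-terminated message at a time."""

       def __init__(self, sock, delimiter=b"\n", chunk_size=4096):
           self.sock = sock
           self.delimiter = delimiter
           self.chunk_size = chunk_size
           self.pending = b""     # received bytes that belong to later messages

       def read_message(self):
           while self.delimiter not in self.pending:
               chunk = self.sock.recv(self.chunk_size)
               if not chunk:
                   raise RuntimeError("socket connection broken")
               self.pending += chunk
           # Split off the first message; keep the rest for the next call.
           message, _, self.pending = self.pending.partition(self.delimiter)
           return message


   a, b = socket.socketpair()
   a.sendall(b"first\nsecond\nthi")   # two whole messages plus the start of a third
   a.sendall(b"rd\n")
   reader = LineReader(b)
   messages = [reader.read_message() for _ in range(3)]
   print(messages)                    # prints [b'first', b'second', b'third']
   a.close()
   b.close()
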
						
Prefixing the message with its length (say, as 5 numeric characters) gets more
complex, because (believe it or not) you may not get all 5 characters in one
``recv``. In playing around, you'll get away with it; but under high network
loads, your code will very quickly break unless you use two ``recv`` loops - the
first to determine the length, the second to get the data part of the message.
Nasty. This is also when you'll discover that ``send`` does not always manage to
get rid of everything in one pass. And despite having read this, you will
eventually get bitten by it!
					
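Here is one way the two ``recv`` loops might look, assuming the 5-digit ASCII
length prefix described above (which limits a message to 99999 bytes). The
helper names ``recv_exactly``, ``send_msg`` and ``recv_msg`` are hypothetical;
``recv_exactly`` is the loop that both steps share::

   import socket

   HEADER_LEN = 5  # length prefix: 5 ASCII digits, e.g. b"00005"


   def recv_exactly(sock, n):
       """Hypothetical helper: loop until exactly n bytes have arrived."""
       chunks = []
       remaining = n
       while remaining > 0:
           chunk = sock.recv(min(remaining, 4096))
           if not chunk:
               raise RuntimeError("socket connection broken")
           chunks.append(chunk)
           remaining -= len(chunk)
       return b"".join(chunks)


   def send_msg(sock, payload):
       if len(payload) > 99999:
           raise ValueError("payload too long for a 5-digit length prefix")
       sock.sendall(b"%05d" % len(payload) + payload)


   def recv_msg(sock):
       length = int(recv_exactly(sock, HEADER_LEN))   # first loop: the length
       return recv_exactly(sock, length)              # second loop: the data


   a, b = socket.socketpair()
   send_msg(a, b"hello")
   send_msg(a, b"")                  # zero-length messages work too
   first = recv_msg(b)
   second = recv_msg(b)
   print(first, second)              # prints b'hello' b''
   a.close()
   b.close()
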
						
In the interests of space, building your character (and preserving my
competitive position), these enhancements are left as an exercise for the
reader. Let's move on to cleaning up.
					
						
Binary Data
-----------

It is perfectly possible to send binary data over a socket. The major problem is
that not all machines use the same formats for binary data. For example, a
Motorola (big-endian) chip will represent a 16-bit integer with the value 1 as
the two hex bytes 00 01. Intel and DEC, however, are byte-reversed
(little-endian): that same 1 is 01 00. Socket libraries have calls for
converting 16- and 32-bit integers: ``ntohl``, ``htonl``, ``ntohs`` and
``htons``, where "n" means *network* and "h" means *host*, "s" means *short*
and "l" means *long*. Network byte order is big-endian, so where network order
is host order, these do nothing, but where the machine is byte-reversed, these
swap the bytes around appropriately.
					
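In Python these conversion functions live in the :mod:`socket` module, but the
:mod:`struct` module is often the more convenient tool: the ``!`` prefix in a
format string selects network (big-endian) byte order with no padding. A small
illustration; the two-field header layout is an assumption made up for the
example::

   import socket
   import struct

   # Network byte order is big-endian: the 16-bit value 1 becomes bytes 00 01.
   packed = struct.pack("!H", 1)
   print(packed.hex())                    # prints 0001

   # htons/ntohs are inverses, whatever the host byte order is.
   roundtrip = socket.ntohs(socket.htons(1))

   # A hypothetical fixed-size header: 16-bit message type, 32-bit length.
   header = struct.pack("!HL", 7, 1024)
   msg_type, length = struct.unpack("!HL", header)
   print(len(header), msg_type, length)   # prints 6 7 1024
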
						
With 32-bit (or wider) integers, the ASCII representation of binary data is
frequently smaller than the binary representation. That's because a surprising
amount of the time, all those longs have the value 0, or maybe 1. The string "0"
plus a one-byte delimiter would be two bytes, while a binary 32-bit long is
four. Of course, this doesn't fit well with fixed-length messages. Decisions,
decisions.
					
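The arithmetic is easy to check. This sketch assumes a newline as the delimiter
for the text form and a 4-byte unsigned long in network byte order for the
binary form::

   import struct

   values = [0, 1, 0, 0, 1]

   # Text form: each value as ASCII digits followed by a newline delimiter.
   as_text = b"".join(b"%d\n" % v for v in values)
   # Binary form: each value as a 4-byte unsigned long, network byte order.
   as_binary = struct.pack("!%dL" % len(values), *values)

   print(len(as_text), len(as_binary))    # prints 10 20
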
						
Disconnecting
=============

Strictly speaking, you're supposed to use ``shutdown`` on a socket before you
``close`` it.  The ``shutdown`` is an advisory to the socket at the other end.
Depending on the argument you pass it, it can mean "I'm not going to send
anymore, but I'll still listen", or "I'm not listening, good riddance!".  Most
socket libraries, however, are so used to programmers neglecting to use this
piece of etiquette that normally a ``close`` is the same as
``shutdown(); close()``.  So in most situations, an explicit ``shutdown`` is not
needed.
					
						
One way to use ``shutdown`` effectively is in an HTTP-like exchange. The client
sends a request and then does a ``shutdown(1)`` (in Python, spelled
``shutdown(socket.SHUT_WR)``). This tells the server "This client is done
sending, but can still receive."  The server can detect "EOF" by a receive of 0
bytes. It can assume it has the complete request.  The server sends a reply. If
the ``send`` completes successfully then, indeed, the client was still
receiving.
					
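A minimal sketch of that exchange, assuming ``socket.socketpair()`` stands in
for a real client/server connection and using a deliberately trivial
"protocol" in which the reply is just the request upper-cased::

   import socket

   client, server = socket.socketpair()

   # Client: send the whole request, then announce "done sending" (shutdown(1)).
   client.sendall(b"get status")
   client.shutdown(socket.SHUT_WR)

   # Server: read until recv returns b''; that is the client's EOF.
   request = b""
   while True:
       chunk = server.recv(1024)
       if not chunk:
           break
       request += chunk
   server.sendall(request.upper())   # the client can still receive
   server.close()

   # Client: read the reply until the server closes its end.
   reply = b""
   while True:
       chunk = client.recv(1024)
       if not chunk:
           break
       reply += chunk
   client.close()
   print(request, reply)
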
						
Python takes the automatic shutdown a step further, and says that when a socket
is garbage collected, it will automatically do a ``close`` if it's needed. But
relying on this is a very bad habit. If your socket just disappears without
doing a ``close``, the socket at the other end may hang indefinitely, thinking
you're just being slow. *Please* ``close`` your sockets when you're done.
					
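One way to make sure the ``close`` happens, even if an exception interrupts your
code, is to use the socket as a context manager (supported by
:class:`socket.socket` since Python 3.2): leaving the ``with`` block closes it.
A minimal illustration::

   import socket

   with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
       s.bind(("127.0.0.1", 0))
       # ... use the socket here ...
       inside_fd = s.fileno()

   # Leaving the with block closed the socket; fileno() is -1 once closed.
   print(inside_fd >= 0, s.fileno())     # prints True -1
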
						
When Sockets Die
----------------

Probably the worst thing about using blocking sockets is what happens when the
other side comes down hard (without doing a ``close``). Your socket is likely to
hang. TCP is a reliable protocol, and it will wait a long, long time
before giving up on a connection. If you're using threads, the entire thread is
essentially dead. There's not much you can do about it. As long as you aren't
doing something dumb, like holding a lock while doing a blocking read, the
thread isn't really consuming much in the way of resources. Do *not* try to kill
the thread - part of the reason that threads are more efficient than processes
is that they avoid the overhead associated with the automatic recycling of
resources. In other words, if you do manage to kill the thread, your whole
process is likely to be screwed up.
					
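If you would rather not wait on a silent peer indefinitely, one mitigation that
the text above does not cover is a timeout. This is a sketch under that
assumption, not the HOWTO's recommendation: after ``settimeout()``, a blocking
call that waits too long raises ``socket.timeout`` instead of hanging, and the
code can then decide whether to retry or give up on the connection::

   import socket

   a, b = socket.socketpair()
   b.settimeout(0.2)                # seconds; recv() now gives up instead of hanging

   try:
       b.recv(1024)                 # the peer never sends anything
       timed_out = False
   except socket.timeout:           # an alias of TimeoutError since Python 3.10
       timed_out = True             # decide here whether to retry or close

   print(timed_out)                 # prints True
   a.close()
   b.close()
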
						
Non-blocking Sockets
====================

If you've understood the preceding, you already know most of what you need to
know about the mechanics of using sockets. You'll still use the same calls, in
much the same ways. It's just that, if you do it right, your app will be almost
inside-out.
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | In Python, you use ``socket.setblocking(0)`` to make it non-blocking. In C, it's
 | 
					
						
							|  |  |  | more complex, (for one thing, you'll need to choose between the BSD flavor
 | 
					
						
							|  |  |  | ``O_NONBLOCK`` and the almost indistinguishable Posix flavor ``O_NDELAY``, which
 | 
					
						
							|  |  |  | is completely different from ``TCP_NODELAY``), but it's the exact same idea. You
 | 
					
						
							|  |  |  | do this after creating the socket, but before using it. (Actually, if you're
 | 
					
						
							|  |  |  | nuts, you can switch back and forth.)
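A minimal sketch of the mode switch, assuming Python 3, where a non-blocking ``recv`` that has nothing to return raises ``BlockingIOError``. The helper ``try_recv`` is a made-up name, and ``socket.socketpair()`` stands in for a real connection.

```python
import socket


def try_recv(sock: socket.socket, bufsize: int = 1024):
    """Return data if any is ready, or None if a non-blocking recv would block."""
    try:
        return sock.recv(bufsize)
    except BlockingIOError:
        # EAGAIN / EWOULDBLOCK: nothing is queued on this non-blocking socket.
        return None


a, b = socket.socketpair()
b.setblocking(False)       # set the mode right after creating the socket
first = try_recv(b)        # nothing has been sent yet, so this is None
a.sendall(b"ping")
b.setblocking(True)        # switching back is allowed, if rarely wise
second = b.recv(1024)      # blocks until "ping" arrives
a.close()
b.close()
```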

The major mechanical difference is that ``send``, ``recv``, ``connect`` and
``accept`` can return without having done anything (in Python, they raise
``BlockingIOError`` instead). You have (of course) a number of choices. You can
check return codes and error codes and generally drive yourself crazy. If you
don't believe me, try it sometime. Your app will grow large, buggy and
CPU-hungry. So let's skip the brain-dead solutions and do it right.
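Here's a sketch of two of those calls "returning without having done anything" in Python: ``accept`` on a listening socket with nobody connecting, and ``send`` once nobody is draining the other end. It assumes a loopback interface is available. The 64 KiB chunk size is arbitrary.

```python
import socket

# A listening socket with no pending connections.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen()
server.setblocking(False)

try:
    server.accept()
    accepted = True
except BlockingIOError:
    accepted = False            # nobody is connecting, so accept() gives up at once

# send() on a non-blocking socket may take part of the data, or none at all.
a, b = socket.socketpair()
a.setblocking(False)
sent_total = 0
try:
    while True:
        # Nobody reads from b, so the buffers eventually fill up.
        sent_total += a.send(b"x" * 65536)
except BlockingIOError:
    pass                        # buffer full: a blocking send() would wait here

for s in (server, a, b):
    s.close()
```

This is exactly the return-code juggling the paragraph above warns against. It's shown only to make the behaviour concrete.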

Use ``select``.

In C, coding ``select`` is fairly complex. In Python, it's a piece of cake, but
it's close enough to the C version that if you understand ``select`` in Python,
you'll have little trouble with it in C::

   ready_to_read, ready_to_write, in_error = \
                  select.select(
                     potential_readers,
                     potential_writers,
                     potential_errs,
                     timeout)

You pass ``select`` three lists: the first contains all sockets that you might
want to try reading from; the second, all the sockets you might want to try
writing to; and the last (normally left empty), those that you want to check for
errors. Note that a socket can go into more than one list. The ``select`` call
is blocking, but you can give it a timeout (in seconds). This is generally a
sensible thing to do - give it a nice long timeout (say a minute) unless you
have good reason to do otherwise.
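The call above can be made concrete with a runnable sketch. ``socket.socketpair()`` stands in for real connections, and the one-minute timeout follows the advice above. Note that ``b`` sits in two lists at once.

```python
import select
import socket

a, b = socket.socketpair()      # stand-in for a real connection
a.sendall(b"hello")             # give b something to read

potential_readers = [b]
potential_writers = [a, b]      # b is in two lists at once; that's allowed
potential_errs = []             # normally left empty

# Blocks until at least one socket is ready, or 60 seconds pass.
ready_to_read, ready_to_write, in_error = select.select(
    potential_readers, potential_writers, potential_errs, 60.0)

data = b""
if b in ready_to_read:
    data = b.recv(1024)         # readable: recv() will return *something*

a.close()
b.close()
```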

In return, you will get three lists. They contain the sockets that are actually
readable, writable and in error. Each of these lists is a subset (possibly
empty) of the corresponding list you passed in.

If a socket is in the output readable list, you can be
as-close-to-certain-as-we-ever-get-in-this-business that a ``recv`` on that
socket will return *something*. Same idea for the writable list. You'll be able
to send *something*. Maybe not all you want to, but *something* is better than
nothing. (Actually, any reasonably healthy socket will return as writable - it
just means outbound network buffer space is available.)
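To see "something, maybe not all" in action, here's a sketch that moves a payload between two non-blocking sockets with a single ``select`` loop. It sends whatever the writable side will take and receives whatever the readable side has. ``pump`` is a hypothetical helper, and ``socket.socketpair()`` stands in for a real connection.

```python
import select
import socket


def pump(payload: bytes, timeout: float = 60.0) -> bytes:
    """Move `payload` from one end of a socket pair to the other with a
    single select() loop. Both ends are non-blocking."""
    src, dst = socket.socketpair()
    src.setblocking(False)
    dst.setblocking(False)
    pending = memoryview(payload)   # bytes not yet accepted by send()
    received = bytearray()
    with src, dst:
        while len(received) < len(payload):
            writers = [src] if pending else []
            readable, writable, _ = select.select([dst], writers, [], timeout)
            if not readable and not writable:
                raise TimeoutError("select() timed out")
            if writable:
                try:
                    # Writable only means *some* buffer space is free, so
                    # send() may accept just part of what we offer.
                    pending = pending[src.send(pending):]
                except BlockingIOError:
                    pass  # spurious wake-up; go round and select() again
            if readable:
                try:
                    # Readable means recv() returns *something*, possibly
                    # less than everything sent so far.
                    received += dst.recv(4096)
                except BlockingIOError:
                    pass
    return bytes(received)


echoed = pump(b"hello, select")
```

Keeping track of the unsent tail (``pending``) is the whole trick. A writable socket is no promise that the entire buffer goes out in one call.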

If you have a "server" socket, put it in the ``potential_readers`` list. If it
comes out in the readable list, your ``accept`` will (almost certainly) work. If
you have created a new socket to ``connect`` to someone else, put it in the
``potential_writers`` list. If it shows up in the writable list, you have a
decent chance that it has connected.
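A sketch of both halves over loopback. The listening socket goes in the readers list, and a non-blocking ``connect_ex`` goes in the writers list. Because writable only gives you "a decent chance", the sketch then checks the socket's pending error with ``getsockopt(SOL_SOCKET, SO_ERROR)``. The function name and the ``IN_PROGRESS`` set are illustrative assumptions, not part of any API.

```python
import errno
import select
import socket

# Error numbers meaning "non-blocking connect still in progress". The
# WSAEWOULDBLOCK lookup covers Windows; on Unix it falls back to EWOULDBLOCK.
IN_PROGRESS = {
    errno.EINPROGRESS,
    errno.EWOULDBLOCK,
    getattr(errno, "WSAEWOULDBLOCK", errno.EWOULDBLOCK),
}


def connect_and_accept(timeout: float = 60.0):
    """Run one non-blocking connect()/accept() pair, using select() to wait."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    with server, client:
        server.bind(("127.0.0.1", 0))   # port 0: let the OS choose a free port
        server.listen()
        server.setblocking(False)

        client.setblocking(False)
        # connect_ex() returns an error number instead of raising.
        rc = client.connect_ex(server.getsockname())
        if rc != 0 and rc not in IN_PROGRESS:
            raise OSError(rc, "connect failed immediately")

        # The client shows up as writable once the attempt has finished...
        _, writable, _ = select.select([], [client], [], timeout)
        if client not in writable:
            raise TimeoutError("connect did not finish")
        # ...but only SO_ERROR says whether it actually succeeded.
        err = client.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err != 0:
            raise OSError(err, "connect failed")

        # A readable listening socket has a connection queued for accept().
        readable, _, _ = select.select([server], [], [], timeout)
        if server not in readable:
            raise TimeoutError("nothing to accept")
        conn, peer = server.accept()
        with conn:
            return rc, peer == client.getsockname()


rc, same_peer = connect_and_accept()
```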

Actually, ``select`` can be handy even with blocking sockets. It's one way of
determining whether you will block - the socket returns as readable when there's
something in the buffers. However, this still doesn't help with the problem of
determining whether the other end is done, or just busy with something else.
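Here's a sketch of that trick on an ordinary blocking socket. A ``select`` with a timeout of zero polls instead of waiting, which tells you whether a ``recv`` would block right now. ``would_block`` is a made-up helper name.

```python
import select
import socket


def would_block(sock: socket.socket, timeout: float = 0.0) -> bool:
    """True if recv() on `sock` would have to wait. With the default
    timeout of 0, select() just polls instead of waiting."""
    readable, _, _ = select.select([sock], [], [], timeout)
    return not readable


a, b = socket.socketpair()      # both ends are in blocking mode
empty = would_block(b)          # nothing sent yet, so recv() would block
a.sendall(b"hi")
# Wait up to 5 s for the bytes to arrive (typically immediate for a local pair).
ready = not would_block(b, timeout=5.0)
data = b.recv(1024) if ready else b""
a.close()
b.close()
```

A peer that has hung and a peer that is merely quiet look the same to ``would_block``, which is precisely the limitation described above.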

**Portability alert**: On Unix, ``select`` works with both sockets and
files. Don't try this on Windows. On Windows, ``select`` works with sockets
only. Also note that in C, many of the more advanced socket options are done
differently on Windows. In fact, on Windows I usually use threads (which work
very, very well) with my sockets.

