
HTTP

HTTP (Hypertext Transfer Protocol) is a simple text-based protocol used for World Wide Web applications. Both Web servers and Web browsers implement this protocol.

HTTP works as follows: a client opens a connection and sends a request header to a Web server. This request is plain text that contains the request method (GET, POST, PUT, …), the name of the file that should be opened, and so forth.

The server interprets the request and returns a response to the client. This response contains the HTTP protocol version number, as well as information about the returned document, such as cookies, document type, and size.
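To make the exchange concrete, the following sketch builds the text of a minimal GET request and splits a response status line into its parts. The helper names build_request and parse_status_line, the host name, and the sample status line are assumptions made for illustration; they are not part of any library.

```python
def build_request(method, path, host):
    """Return the text of a minimal HTTP/1.0 request header."""
    lines = [
        "%s %s HTTP/1.0" % (method, path),  # request line: method, file, version
        "Host: %s" % host,                  # one header line
        "",                                 # a blank line ends the header
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(line):
    """Split a response status line into (version, code, message)."""
    version, code, message = line.strip().split(" ", 2)
    return version, int(code), message


print(build_request("GET", "/index.html", "www.example.com"))
print(parse_status_line("HTTP/1.0 200 OK"))
```

A real client also has to read the header lines and the document that follow the status line; the httplib module described later in this section takes care of that.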

For details about the HTTP specification, see:

					
http://www.w3.org/Protocols
				

The following Python projects make use of HTTP.

M2Crypto, by Ng Pheng Siong

M2Crypto makes the following features available to the Python programmer: RSA, DH, DSA, HMACs, message digests, symmetric ciphers, SSL functionality to implement clients and servers, and S/MIME v2.

						
http://mars.post1.com/home/ngps/m2/
					

Note

As of Python 2.0, the socket module can be compiled with support for the OpenSSL library, which lets it handle SSL directly.



CTC (Cut The Crap), by Constantinos Kotsokalis

CTC is an HTTP proxy written in Python that removes advertisement banners from the pages displayed by your Web browser.

						
http://softlab.ntua.gr/~ckotso/CTC/
					

Alfajor, by Andrew Cooke

Alfajor is an HTTP cookie filter, written in Python with an optional GUI. It acts as an HTTP proxy (you must configure your browser to use it) and can either contact sites directly or work with a second proxy (for example, a cache). Note that Alfajor does not fully conform to any HTTP version. However, in practice, it works with the vast majority of sites.

						
http://www.andrewcooke.free-online.co.uk/jara/alfajor/
					

Building Web Servers

In order to build Internet servers using Python, you can use the following modules:

SocketServer—   It is a generic socket-based IP server.

BaseHTTPServer—   It provides the infrastructure required by the next two modules.

SimpleHTTPServer—   It allows you to have a simple Web server.

CGIHTTPServer—   It enables the implementation of a CGI-compliant HTTP server.

The SocketServer Module

The SocketServer module exposes a framework that simplifies the task of writing network servers. Rather than having to implement servers using the low-level socket module, you can use the classes that this module provides for the most common cases: the server classes TCPServer and UDPServer, and the request-handler classes StreamRequestHandler and DatagramRequestHandler. The server classes process requests synchronously: each request must be completed before the next one can be started.

This behavior is not appropriate if each request takes a long time to complete, either because it requires a lot of computation or because the client is slow to process the data. To handle each request in a separate thread or process, you can use the following classes: ThreadingTCPServer, ThreadingUDPServer, ForkingTCPServer, and ForkingUDPServer.
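As a rough sketch of the threaded variant, the code below runs a ThreadingTCPServer whose handler replies with the name of the thread that served the request. The handler name, the helper ask(), the thread name "accept-loop", and the use of 127.0.0.1 with port 0 (so that the operating system picks a free port) are assumptions made for this illustration. The import falls back to socketserver, which is the name this module carries in Python 3.

```python
import socket
import threading

try:
    import SocketServer                  # Python 2 name, as used in this chapter
except ImportError:
    import socketserver as SocketServer  # the same module under its Python 3 name


class ThreadNameHandler(SocketServer.StreamRequestHandler):
    def handle(self):
        # Report which thread is serving this request.
        name = threading.current_thread().name
        self.wfile.write(name.encode("ascii") + b"\n")


def ask(port):
    """Connect to the local server and return the line it sends back."""
    client = socket.create_connection(("127.0.0.1", port))
    stream = client.makefile("rb")
    reply = stream.readline().strip().decode("ascii")
    stream.close()
    client.close()
    return reply


server = SocketServer.ThreadingTCPServer(("127.0.0.1", 0), ThreadNameHandler)
server.daemon_threads = True             # do not wait for handler threads on exit
loop = threading.Thread(target=server.serve_forever, name="accept-loop")
loop.start()

handler_threads = [ask(server.server_address[1]) for _ in range(2)]

server.shutdown()                        # stop serve_forever()
loop.join()
server.server_close()
print(handler_threads)
```

Neither reply carries the name of the accepting thread, which suggests that each request was handed to a thread of its own.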

Both the StreamRequestHandler and DatagramRequestHandler classes provide two file attributes that can be used to read and write data from and to the client program. These attributes are self.rfile and self.wfile.

The following code demonstrates the usage of the StreamRequestHandler class, which is exposed by the SocketServer module.

							
import SocketServer

port = 8000

class myRequestHandler(SocketServer.StreamRequestHandler):
    # handle() is called once for each incoming connection
    def handle(self):
        print "connection from ", self.client_address
        self.wfile.write("data")    # send a reply to the client

# An empty hostname makes the server listen on all interfaces
srvsocket = SocketServer.TCPServer(("", port), myRequestHandler)
print "The socket is listening to port", port
srvsocket.serve_forever()
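The listing above only writes to the client. As a complementary sketch, the code below also reads from self.rfile and can be run without a separate client program: it serves exactly one request with handle_request() in a background thread and talks to itself over the loopback interface. The names EchoHandler and echo_once, the address 127.0.0.1, and port 0 (which asks the operating system for a free port) are assumptions made for illustration. The import falls back to socketserver, the Python 3 name of the module.

```python
import socket
import threading

try:
    import SocketServer                  # Python 2 name, as used in this chapter
except ImportError:
    import socketserver as SocketServer  # the same module under its Python 3 name


class EchoHandler(SocketServer.StreamRequestHandler):
    def handle(self):
        line = self.rfile.readline()         # read one line sent by the client
        self.wfile.write(b"echo: " + line)   # write the reply back


def echo_once(message):
    """Start a server on a free port, send one line, and return the reply."""
    server = SocketServer.TCPServer(("127.0.0.1", 0), EchoHandler)
    port = server.server_address[1]
    worker = threading.Thread(target=server.handle_request)
    worker.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.sendall(message)
    stream = client.makefile("rb")
    reply = stream.readline()
    stream.close()
    client.close()

    worker.join()
    server.server_close()
    return reply


print(echo_once(b"hello\n"))
```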

						

Tip

Always remember to choose a port number that your user account is allowed to use. On most UNIX systems, ports below 1024 are reserved for privileged users.



The module provides the following server classes:

TCPServer((hostname, port), request_handler)—   Implements a server that supports the TCP protocol.

UDPServer((hostname, port), request_handler)—   Implements a server that supports the UDP protocol.

UnixStreamServer(path, request_handler)—   Implements a server that supports a stream-oriented protocol using UNIX domain sockets. The address is a filesystem path rather than a (hostname, port) pair.

UnixDatagramServer(path, request_handler)—   Implements a server that supports a datagram-oriented protocol using UNIX domain sockets. The address is a filesystem path rather than a (hostname, port) pair.

In all four classes, request_handler must be a subclass of the BaseRequestHandler class. For the TCP and UDP servers, hostname is usually left blank.

Each of these classes defines the following class variables:

request_queue_size stores the size of the request queue that is passed to the socket's listen() method.

socket_type returns the socket type used by the server. The possible values are socket.SOCK_STREAM and socket.SOCK_DGRAM.

The class instances implement the following methods and attributes:

fileno()—   Returns the server socket's integer file descriptor.

handle_request()—   Processes a single request, by creating an instance of the handler class and invoking its handle() method.

serve_forever()—   Handles requests in an infinite loop.

address_family—   Returns either socket.AF_INET or socket.AF_UNIX.

RequestHandlerClass—   Holds the request handler class, which was provided by the user.

server_address—   Returns the IP address and the port number being used by the server for listening.

socket—   Returns the socket object on which the server listens for incoming requests.
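The class variables and instance attributes listed above can be inspected on a live server object. The sketch below is only an illustration: the class names QuietHandler and BusyTCPServer, the helper describe(), and the queue size of 20 are assumptions, and the server is bound to 127.0.0.1 with port 0 so that the operating system picks a free port. No request is ever served. The import falls back to socketserver, the Python 3 name of the module.

```python
import socket

try:
    import SocketServer                  # Python 2 name, as used in this chapter
except ImportError:
    import socketserver as SocketServer  # the same module under its Python 3 name


class QuietHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        pass                             # this sketch never serves a request


class BusyTCPServer(SocketServer.TCPServer):
    request_queue_size = 20              # illustrative value passed to listen()


def describe(server):
    """Collect the attributes discussed above in a dictionary."""
    return {
        "address": server.server_address,
        "is_tcp": server.socket_type == socket.SOCK_STREAM,
        "is_inet": server.address_family == socket.AF_INET,
        "queue": server.request_queue_size,
        "handler": server.RequestHandlerClass.__name__,
        "fd": server.fileno(),
    }


server = BusyTCPServer(("127.0.0.1", 0), QuietHandler)
info = describe(server)
server.server_close()
print(info)
```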

The BaseHTTPServer Module

The BaseHTTPServer module defines two base classes for implementing basic HTTP servers (also known as Web servers). This module is built on top of the SocketServer module. Note that this module is rarely used directly. Instead, you should consider using the modules CGIHTTPServer and SimpleHTTPServer.

The following code demonstrates the usage of the BaseHTTPRequestHandler class, which is exposed by the BaseHTTPServer module, to implement a simple HTTP Server.

							
import BaseHTTPServer

# Page returned for the root path
htmlpage = """
<html><head><title>Web Page</title></head>
<body>Hello Python World</body>
</html>"""
notfound = "File not found"

class WelcomeHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/":
            # Status line, headers, blank line, and then the document
            self.send_response(200)
            self.send_header("Content-type","text/html")
            self.end_headers()
            self.wfile.write(htmlpage)
        else:
            self.send_error(404, notfound)

# Port 80 usually requires administrator privileges
httpserver = BaseHTTPServer.HTTPServer(("",80), WelcomeHandler)
httpserver.serve_forever()
					
						

The HTTPServer((hostname, port), request_handler_class) base class is derived from SocketServer.TCPServer; hence, it implements the same methods. This class creates an HTTPServer object that listens on the given hostname and port, and uses request_handler_class to handle requests.

The second base class is called BaseHTTPRequestHandler(request, client_address, server). You need to create a subclass of this class in order to handle HTTP requests. If you need to handle GET requests, you must redefine the do_GET() method. On the other hand, if you need to handle POST requests, you must redefine the do_POST() method.
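A do_POST() method could look like the sketch below, which reads the request body from self.rfile and sends it back in uppercase. The names UpperCaseHandler and post_once are assumptions made for this illustration, as is the choice to test the handler from the same script: the server handles a single request in a background thread on 127.0.0.1, and the request is sent with the HTTPConnection class of the httplib module. The imports fall back to http.server and http.client, the names these modules carry in Python 3.

```python
import threading

try:
    import BaseHTTPServer                 # Python 2 names, as used in this chapter
    import httplib
except ImportError:
    import http.server as BaseHTTPServer  # the same modules under their Python 3 names
    import http.client as httplib


class UpperCaseHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_POST(self):
        # The Content-Length header says how many body bytes to read.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-type", "text/plain")
        self.end_headers()
        self.wfile.write(body.upper())

    def log_message(self, format, *args):
        pass                              # keep the sketch quiet


def post_once(data):
    """Serve one POST request locally and return (status, body)."""
    server = BaseHTTPServer.HTTPServer(("127.0.0.1", 0), UpperCaseHandler)
    worker = threading.Thread(target=server.handle_request)
    worker.start()

    conn = httplib.HTTPConnection("127.0.0.1", server.server_address[1])
    conn.request("POST", "/", data)
    response = conn.getresponse()
    result = (response.status, response.read())
    conn.close()

    worker.join()
    server.server_close()
    return result


print(post_once(b"hello"))
```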

This class also implements some class variables:

  • BaseHTTPRequestHandler.server_version

  • BaseHTTPRequestHandler.sys_version

  • BaseHTTPRequestHandler.protocol_version

  • BaseHTTPRequestHandler.error_message_format

The error_message_format string should contain the code for a complete Web page to be sent to the client when an error message must be displayed. Within the string, you can reference some error attributes, because the string is formatted against the contents of an error dictionary.

							
"""<head><title></title></head><body>
Error code = %(code)d<br>
Error message = %(message)s<br>
Error explanation = %(explain)s<br></body>"""
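The substitution itself is ordinary string formatting with a dictionary. In the sketch below, the dictionary values are hypothetical; the server supplies the real code, message, and explanation when it reports an error.

```python
error_message_format = """<head><title>Error</title></head><body>
Error code = %(code)d<br>
Error message = %(message)s<br>
Error explanation = %(explain)s<br></body>"""

# Hypothetical values; the server fills in the real ones.
error = {
    "code": 404,
    "message": "File not found",
    "explain": "Nothing matches the given URI",
}

page = error_message_format % error
print(page)
```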

						

Each instance of the BaseHTTPRequestHandler class implements some methods and attributes:

handle()—   Implements a request dispatcher. It calls the methods that start with "do_", such as do_GET() and do_POST().

send_error(error_code [, error_message])—   Sends an error response to the client.

send_response(response_code [, response_message])—   Sends a response header according to Table 10.2.

Table 10.2. List of Response Codes and Messages Returned by the Web Server

Code  Description
200   OK
201   Created
202   Accepted
204   No content available
300   Multiple choices
301   Moved permanently
302   Moved temporarily
304   Not modified
400   Bad request
401   Unauthorized
403   Forbidden
500   Internal server error
501   Not implemented
502   Bad gateway
503   Service unavailable

send_header(keyword, value)—   Writes a MIME header, which contains the header keyword and its value, to the output stream.

end_headers()—   Identifies the end of the MIME headers.

The following object attributes are also exposed:

client_address—   Returns a 2-tuple (hostname, port) that holds the client address.

command—   Identifies the request type, which can be POST, GET, and so on.

path—   Returns the request path.

request_version—   Returns the HTTP version string from the request.

headers—   Returns the HTTP headers.

rfile—   Exposes the input stream.

wfile—   Exposes the output stream.
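To see these attributes in action, the following sketch defines a handler that reports them back to the client as plain text. The names DescribeHandler and describe_request, the path /demo, and the local test setup (one request served in a background thread on 127.0.0.1, sent through httplib's HTTPConnection class) are assumptions made for illustration. The imports fall back to the Python 3 names of the modules.

```python
import threading

try:
    import BaseHTTPServer                 # Python 2 names, as used in this chapter
    import httplib
except ImportError:
    import http.server as BaseHTTPServer  # the same modules under their Python 3 names
    import http.client as httplib


class DescribeHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        lines = [
            "command=%s" % self.command,
            "path=%s" % self.path,
            "version=%s" % self.request_version,
            "client=%s" % self.client_address[0],
            "accept=%s" % self.headers.get("Accept"),
        ]
        self.send_response(200)
        self.send_header("Content-type", "text/plain")
        self.end_headers()
        self.wfile.write("\n".join(lines).encode("ascii"))

    def log_message(self, format, *args):
        pass                              # keep the sketch quiet


def describe_request(path):
    """Serve one GET request locally and return the report as a list."""
    server = BaseHTTPServer.HTTPServer(("127.0.0.1", 0), DescribeHandler)
    worker = threading.Thread(target=server.handle_request)
    worker.start()

    conn = httplib.HTTPConnection("127.0.0.1", server.server_address[1])
    conn.request("GET", path, headers={"Accept": "text/plain"})
    body = conn.getresponse().read()
    conn.close()

    worker.join()
    server.server_close()
    return body.decode("ascii").split("\n")


report = describe_request("/demo")
print(report)
```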

The SimpleHTTPServer Module

The SimpleHTTPServer module provides a simple HTTP server request-handler class. It has an interface compatible with the BaseHTTPServer module that enables it to serve files from a base directory. This module implements both standard GET and HEAD request handlers, as shown in this example:

							
import BaseHTTPServer
import SimpleHTTPServer

# Serve the files found in the current directory
ServerHandler = SimpleHTTPServer.SimpleHTTPRequestHandler
httpserver = BaseHTTPServer.HTTPServer(("", 80), ServerHandler)
httpserver.serve_forever()

						

The directory from which the server is started is used as the base for all files requested by the client. This module implements the SimpleHTTPRequestHandler(request, (hostname, port), server) class, which exposes the following two attributes:

  • SimpleHTTPRequestHandler.server_version

  • SimpleHTTPRequestHandler.extensions_map—A dictionary that maps file suffixes to MIME types
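One way to register an additional suffix is to give a subclass its own copy of the dictionary, as in the sketch below. The class name NoteFriendlyHandler and the suffix ".note" are hypothetical. The import falls back to http.server, which is where this handler class lives in Python 3.

```python
try:
    import SimpleHTTPServer                 # Python 2 name, as used in this chapter
except ImportError:
    import http.server as SimpleHTTPServer  # home of the same class in Python 3


class NoteFriendlyHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):
    # Copy the inherited table so that the base class is left unchanged.
    extensions_map = dict(SimpleHTTPServer.SimpleHTTPRequestHandler.extensions_map)
    extensions_map[".note"] = "text/plain"  # hypothetical extra suffix


print(NoteFriendlyHandler.extensions_map[".note"])
```

Passing NoteFriendlyHandler to HTTPServer in place of SimpleHTTPRequestHandler would make the server use the extended table.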

The CGIHTTPServer Module

The CGIHTTPServer module defines another simple HTTP server request-handler class. This module has an interface compatible with BaseHTTPServer, which enables it to serve files from a base directory (the current directory and its subdirectories) and also allows clients to run CGI (Common Gateway Interface) scripts.

Requests are handled using the do_GET and do_POST methods. You can override them in order to meet your needs. Note that the CGI scripts are executed as the user nobody. The next example demonstrates the implementation of a simple HTTP Server that accepts CGI requests.

							
import CGIHTTPServer
import BaseHTTPServer

class ServerHandler(CGIHTTPServer.CGIHTTPRequestHandler):
    # Directories whose files are run as CGI scripts
    cgi_directories = ['/cgi-bin']

httpserver = BaseHTTPServer.HTTPServer(("", 80), ServerHandler)
httpserver.serve_forever()
					
						

The CGIHTTPRequestHandler(request, (hostname, port), server) class is provided by this module. This handler class supports both GET and POST requests. It also implements the CGIHTTPRequestHandler.cgi_directories attribute, which contains a list of directories that can store CGI scripts.

Setting Up the Client Side of the HTTP Protocol

The httplib module implements the client side of HTTP (Hypertext Transfer Protocol), as the following example illustrates:

						
import httplib

url = "www.lessaworld.com"
urlpath = "/default.html"

# Open the connection and send the request header
host = httplib.HTTP(url)
host.putrequest("GET", urlpath)
host.putheader("Accept", "text/html")
host.endheaders()

# Read the status line and the headers of the reply
errcode, errmsg, headers = host.getreply()
if errcode != 200:
    raise RuntimeError

# Read the document through a file object
htmlfile = host.getfile()
htmlpage = htmlfile.read()
htmlfile.close()
print htmlpage

					

The previous example doesn't allow you to handle multiple requests in parallel because the getreply() method blocks the application while waiting for the server to respond. You should consider using the asyncore module for a more efficient and asynchronous solution.

This module exposes the HTTP class. The HTTP([hostname [,port]]) class creates and returns a connection object. If no port is given, port 80 is used; if no arguments are given at all, you must call the connect() method to make the connection yourself. This class exposes the following methods:

connect(hostname [,port])—   Establishes a connection.

send(data)—   Sends data to the server after the endheaders() method is called.

putrequest(request, selector)—   Writes the first line in the client request header. The request argument is usually one of the most common request methods: GET, POST, PUT, or HEAD. selector is the name of the document to be opened.

putheader(header, argument1 [, …])—   Writes a header line in the client request header. Each line consists of the header, a colon and a space, and the list of arguments.

endheaders()—   Indicates the end of the headers in the client request header by writing a blank line to the server.

getreply()—   Returns a tuple (requestcode, requestmsg, headers) that is read after closing the client side of the connection. This tuple comes from the server's reply to the client message. The pair requestcode and requestmsg is something like (500, "Internal server error"). headers is an instance of the mimetools.Message class, which contains the HTTP headers that were received from the server.

getfile()—   Wraps the data returned by the server as a file object in order to make reading it easy.

Note

Note that the httplib module shipped with Python 2.0 has been rewritten by Greg Stein in order to provide new interfaces and support for HTTP/1.1 features, such as pipelining. Backward compatibility with the 1.5 version of httplib is provided, but you should consider taking a look at the documentation strings of the module for details.

Also note that Python 2.0's version of the httplib module supports "https://" URLs over SSL.
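The newer interface is built around the HTTPConnection class. The sketch below repeats the earlier GET example with it. So that the code can run without network access, it fetches the page from a small local server; the names HelloHandler and fetch, the address 127.0.0.1, and port 0 (which asks the operating system for a free port) are assumptions made for this illustration. The imports fall back to http.server and http.client, the Python 3 names of the modules.

```python
import threading

try:
    import BaseHTTPServer                 # Python 2 names, as used in this chapter
    import httplib
except ImportError:
    import http.server as BaseHTTPServer  # the same modules under their Python 3 names
    import http.client as httplib


class HelloHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(b"<html><body>Hello Python World</body></html>")

    def log_message(self, format, *args):
        pass                              # keep the sketch quiet


def fetch(host, port, path):
    """Return the document at path, or raise RuntimeError on failure."""
    conn = httplib.HTTPConnection(host, port)
    conn.request("GET", path, headers={"Accept": "text/html"})
    response = conn.getresponse()
    if response.status != 200:
        conn.close()
        raise RuntimeError("%d %s" % (response.status, response.reason))
    page = response.read()
    conn.close()
    return page


server = BaseHTTPServer.HTTPServer(("127.0.0.1", 0), HelloHandler)
worker = threading.Thread(target=server.handle_request)
worker.start()
page = fetch("127.0.0.1", server.server_address[1], "/")
worker.join()
server.server_close()
print(page)
```

Here request() sends the request line and the headers in a single call, and getresponse() returns an object that carries the status code, the reason phrase, and the document body.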




Last updated on 1/30/2002
Python Developer's Handbook, © 2002 Sams Publishing


© 2002, O'Reilly & Associates, Inc.