MIME Parsing and Manipulation

MIME (Multipurpose Internet Mail Extensions) is a standard for sending multipart multimedia data through Internet mail. This standard defines mechanisms for specifying and describing the format of Internet message bodies. A MIME-encoded message looks similar to the following:

    Content-Type: multipart/mixed; boundary="====_238659232=="
    Date: Mon, 03 Apr 2000 18:30:23 -0400
    From: Andre Lessa <alessa@lessaworld.com>
    To: Renata Lessa <rlessa@lessaworld.com>
    Subject: Python Book

    --====_238659232==
    Content-Type: text/plain; charset="us-ascii"

    Sorry Honey, I am late for dinner. I am still writing Chapter 13.
    Meanwhile, take a look at the following Cooking material that you've
    asked me to find in the Internet.
    --====_238659232==
    Content-Type: application/msword; name="cookmasters.doc"
    Content-Transfer-Encoding: base64
    Content-Disposition: attachment; filename="cookmasters.doc"

    GgjEPgkwIr4G29m1Lawr7GgjEPgkwIr4G29m14tifkAb3qPgGgjEPgkwIr4G29m1La29m14tifkAb
    3qPgGgjEPgkwIr4G29m1Law29m14tifkAb3qPgGgjEPgkwIr4G29m1Lawr629m14tifkAb3qPgIr4
    G29m1Lawr2GgjEPgkwIr4G29m1Lawr29m14tifkAb3qPg29m14tifkAb3qPgGgjEPgkwIr4G29m1L
    awr8Ab3qPgGgjEPgkwIr4G29m1GgjEPgkwIr4G29m1Lawr7GgjEPgkwIr4G29m1Hawr0==
    --====_238659232==--

Note that the message is broken into parts, and each part is delimited by a boundary. The boundary works as a separator, and its value is defined in the first line of the message, right next to the first content-type. Every part starts with a boundary mark, followed by a set of RFC822 headers that give the content type and the encoding format of the data for that part; then, separated by a blank line, comes the data itself.

Check out the last line of the message. Do you see the trailing -- after the boundary? That's how the message identifies the final boundary.

The next couple of modules are tools for mail and news message processing that use MIME messages.
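The part-by-part layout described above can be explored with a short sketch. It assumes Python 3, where the `email` package takes over the job of the modules covered in this chapter; the message text is a shortened, hypothetical variant of the sample, and on the wire each delimiter line is the boundary value prefixed with two hyphens.

```python
import email

# Shortened, hypothetical variant of the sample message (assumption:
# Python 3 and its email package, not the modules described in this chapter).
RAW_MESSAGE = "\n".join([
    'Content-Type: multipart/mixed; boundary="====_238659232=="',
    "From: Andre Lessa <alessa@lessaworld.com>",
    "Subject: Python Book",
    "",
    "--====_238659232==",
    'Content-Type: text/plain; charset="us-ascii"',
    "",
    "Sorry Honey, I am late for dinner.",
    "--====_238659232==",
    'Content-Type: application/octet-stream; name="note.bin"',
    "Content-Transfer-Encoding: base64",
    "",
    "aGVsbG8=",
    "--====_238659232==--",
    "",
])

message = email.message_from_string(RAW_MESSAGE)
boundary = message.get_boundary()

parts = []
for part in message.walk():
    if part.is_multipart():
        continue  # the multipart container itself carries no data
    # decode=True undoes the Content-Transfer-Encoding and returns bytes
    parts.append((part.get_content_type(), part.get_payload(decode=True)))

print("boundary:", boundary)
for content_type, payload in parts:
    print(content_type, payload)
```

Each tuple in `parts` pairs a content type with the decoded bytes of that part, which mirrors the "headers, blank line, data" structure of every section.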
For more information about MIME, check out RFC 1521:

    http://info.internet.isi.edu/in-notes/rfc/files/rfc1521.txt

rfc822

The rfc822 module parses mail headers that are defined by the Internet standard RFC 822. This standard specifies the syntax for text messages that are sent among computer users, within the framework of electronic mail. These headers are used in a number of contexts, including mail handling and the HTTP protocol. For more information, check out RFC 822.
This module defines a class, Message, which represents a collection of email headers. It is used in various contexts, usually to read such headers from a file. This module also defines a helper class, AddressList, for parsing RFC822 addresses. A Message object behaves like a dictionary whose keys are the message headers.

mimetools

The mimetools module provides utility tools for parsing and manipulating MIME multipart and encoded messages.

This module contains a special dictionary-like object called Message that collects some information about MIME-encoded messages. mime-version, content-type, charset, to, date, from, and subject are some examples of dictionary keys that the object possesses.

This module also implements some utility functions. The choose_boundary() function creates a unique boundary string. The encode(input, output, encoding) and decode(input, output, encoding) functions encode and decode file objects based on the encoding format, which can be "quoted-printable", "base64", or "uuencode". The functions copyliteral(input, output) and copybinary(input, output) read the input file until EOF and write its contents to the output file object. Both objects must already be open.

The call message = mimetools.Message(fileobject) returns a Message object derived from the rfc822.Message class. Therefore, it supports all the methods supported by rfc822.Message, plus the following ones:
MimeWriter

The MimeWriter module implements a generic file-writing class, also called MimeWriter, that is used to create MIME-encoded multipart files (messages).

    message = MimeWriter.MimeWriter(fileobject_forwriting)

The following method adds a header line ("key: value") to the MIME message.

    message.addheader(key, value [,prefix = 0])

If prefix = 0, the header line is appended to the end; if it is 1, the line is inserted at the start.

Next, you have some methods that are exposed by the message object.
The next code introduces the basic usage of the MimeWriter module, along with other supporting modules.

    import sys
    import StringIO
    import MimeWriter
    import quopri, base64

    msgtext = "This message has 3 images as attachments."
    files = ["sun.jpg", "rain.jpg", "beach.jpg"]

    # The whole message is written to standard output.
    mimemsg = MimeWriter.MimeWriter(sys.stdout)
    mimemsg.addheader("Mime-Version", "1.0")
    mimemsg.startmultipartbody("mixed")

    # First part: the text, quoted-printable encoded.
    # startbody() returns the file object that receives the part's data.
    msgpart = mimemsg.nextpart()
    msgpart.addheader("Content-Transfer-Encoding", "quoted-printable")
    body = msgpart.startbody("text/plain")
    quopri.encode(StringIO.StringIO(msgtext), body, 0)

    # One base64-encoded part per image.
    for filename in files:
        msgpart = mimemsg.nextpart()
        msgpart.addheader("Content-Transfer-Encoding", "base64")
        body = msgpart.startbody("image/jpeg")
        base64.encode(open(filename, "rb"), body)

    mimemsg.lastpart()

multifile

The multifile module enables you to treat distinct parts of a text file as file-like input objects. Usually, it is used on the text files found in MIME-encoded messages. This module works by splitting a file into logical blocks that are delimited by a unique boundary string. Next, you will be exposed to the class implemented by this module: MultiFile.

    MultiFile(fp[, seekable])

Create a multifile. You must instantiate this class with an input object argument for the MultiFile instance to get lines from, such as a file object returned by open(). MultiFile only looks at the input object's readline(), seek(), and tell() methods, and the latter two are only needed if you want random access to the individual MIME parts. To use MultiFile on a non-seekable stream object, set the optional seekable argument to false; this prevents the use of the input object's seek() and tell() methods.

It is useful to know that in MultiFile's view of the world, text is composed of three kinds of lines: data, section-dividers, and end-markers. MultiFile is designed to support parsing of messages that might have multiple nested message parts, each with its own pattern for section-divider and end-marker lines.
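To make the three kinds of lines concrete, here is a minimal sketch in plain Python. It assumes the conventional MIME patterns, in which a section-divider is "--" plus the boundary and an end-marker is the same line with a further trailing "--"; `classify_line` is a hypothetical helper written for this illustration and is not part of the multifile module.

```python
def classify_line(line, boundary):
    """Label a line as 'data', 'section-divider', or 'end-marker'.

    Hypothetical helper; assumes the usual MIME delimiter patterns.
    """
    stripped = line.rstrip()  # ignore the line terminator and trailing blanks
    if stripped == "--" + boundary + "--":
        return "end-marker"
    if stripped == "--" + boundary:
        return "section-divider"
    return "data"


boundary = "====_238659232=="
lines = [
    "--====_238659232==\n",
    "Content-Type: text/plain\n",
    "\n",
    "Sorry Honey, I am late for dinner.\n",
    "--====_238659232==--\n",
]
kinds = [classify_line(line, boundary) for line in lines]
for kind, line in zip(kinds, lines):
    print("%-16s %r" % (kind, line))
```

Nested multipart messages would call the same check with whichever boundary is currently innermost, which is the reason MultiFile keeps its boundaries on a stack.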
A MultiFile instance has the following methods:
Note
Note that this test is intended as a fast guard for the real boundary tests; if it always returns false, it will merely slow processing, not cause it to fail.
Finally, MultiFile instances have two public instance variables:
The following code exemplifies the multifile module.

     1: import multifile
     2: import rfc822, cgi
     3:
     4: multipart = "multipart/"
     5: filename = open("mymail.msg")
     6: msg = rfc822.Message(filename)
     7:
     8: msgtype, args = cgi.parse_header(msg["content-type"])
     9:
    10: if msgtype[:10] == multipart:
    11:     multifilehandle = multifile.MultiFile(filename)
    12:     multifilehandle.push(args["boundary"])
    13:     while multifilehandle.next():
    14:         msg = rfc822.Message(multifilehandle)
    15:         print multifilehandle.read()
    16:     multifilehandle.pop()
    17: else:
    18:     print "This is not a multi-part message!"
    19:     print "---------------------------------"
    20:     print filename.read()

Line 6: msg is a dictionary-like object. You can apply dictionary methods to this object, such as msg.keys(), msg.values(), and msg.items().

Line 8: Parses the content-type header.

Lines 11-16: Handles the multipart message.

Line 15: Prints the body of the current part, after line 14 has consumed its headers.

Line 20: Prints the plain message, when necessary.

mailcap

The mailcap module is used to read mailcap files and to configure how MIME-aware applications react to files with different MIME types.

Note
Mailcap files are used to inform applications, including mail readers and Web browsers, how to process files with different MIME types.

A small section of a mailcap file looks like this:

    image/jpeg; imageviewer %s
    application/zip; gzip %s

The next code demonstrates the usage of the mailcap module.

    >>> import mailcap
    >>> capsdict = mailcap.getcaps()
    >>> command, entry = mailcap.findmatch(capsdict, "image/jpeg", \
    ...                                    filename="/usr/local/uid213")
    >>> print command
    imageviewer /usr/local/uid213
    >>> print entry
    {'view': 'imageviewer %s'}

The getcaps() function reads the mailcap file and returns a dictionary mapping MIME types to mailcap entries; the findmatch() function searches the dictionary for a specific MIME entry, returning a command line ready to be executed along with the matching mailcap entry, which is itself a dictionary.

mimetypes

The mimetypes module supports conversions between a filename or URL and the MIME type associated with the filename extension. Essentially, it is used to guess the MIME type associated with a file, based on its extension. For example,
A complete list of extensions and their associated MIME types can be found by typing

    import mimetypes
    for EXTENSION in mimetypes.types_map.keys():
        print EXTENSION, " = ", mimetypes.types_map[EXTENSION]

Next, you have a list of functions exposed by the mimetypes module.
The following dictionaries are also exposed by the mimetypes module.
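As an illustration of the guessing behavior, the sketch below calls `guess_type()`, `guess_extension()`, and the `types_map` dictionary. It assumes Python 3 syntax and hypothetical filenames; the extension returned by `guess_extension()` can vary between platforms and Python versions, so it is printed rather than assumed.

```python
import mimetypes

# guess_type() returns a (type, encoding) tuple based only on the extension;
# the filenames here are hypothetical and need not exist.
html_type, html_encoding = mimetypes.guess_type("report.html")
tar_type, tar_encoding = mimetypes.guess_type("backup.tar.gz")

# types_map maps an extension (with the leading dot) to a MIME type.
text_type = mimetypes.types_map[".txt"]

# The reverse lookup; the exact extension chosen may differ between systems.
extension = mimetypes.guess_extension("text/plain")

print(html_type, html_encoding)
print(tar_type, tar_encoding)
print(text_type, extension)
```

Note that the compressed archive yields both a type and an encoding: the encoding describes the compression wrapper, while the type describes the file inside it.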
base64

The base64 module performs base64 encoding and decoding of arbitrary binary strings into text strings that can be safely emailed or posted. This module is commonly used to encode binary data in mail attachments.

The arguments of the next functions must be file objects. In both cases, the first argument is open for reading and the second is open for writing:

    base64.encode(messagefilehandle, outputfilehandle)
    base64.decode(encodedfilehandle, outputfilehandle)

This module also implements the functions encodestring(stringtoencode) and decodestring(encodedstring), which are built on top of the encode and decode functions. Both internally use the StringIO module so that the base64 module can encode and decode strings. Note that the decodestring() function returns a string that contains the decoded binary data.

quopri

The quopri module performs quoted-printable transport encoding and decoding of MIME quoted-printable data, as defined in RFC 1521: "MIME (Multipurpose Internet Mail Extensions) Part One". The quoted-printable encoding is designed for data in which there are relatively few nonprintable characters; the base64 encoding scheme available via the base64 module is more compact if there are many such characters, as when sending a graphics file. This format is primarily used to encode text files.

decode(input, output) decodes the contents of the input file and writes the resulting decoded binary data to the output file. input and output must either be file objects or objects that mimic the file object interface. input will be read until input.read() returns an empty string.

encode(input, output, quotetabs) encodes the contents of the input file and writes the resulting quoted-printable data to the output file. input and output must either be file objects or objects that mimic the file object interface. input will be read until input.read() returns an empty string.

This module only supports file-to-file conversions.
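The size trade-off between the two encodings can be checked with a small sketch. It assumes Python 3, where `io.BytesIO` supplies the file-like objects that `quopri.encode()` and `quopri.decode()` expect; `qp_encode` and `qp_decode` are hypothetical helper names introduced only for this example.

```python
import base64
import io
import quopri


def qp_encode(data):
    """Quoted-printable encode bytes through in-memory file objects."""
    source, target = io.BytesIO(data), io.BytesIO()
    quopri.encode(source, target, False)  # False: leave tabs and spaces alone
    return target.getvalue()


def qp_decode(data):
    """Reverse of qp_encode, again using file-like objects."""
    source, target = io.BytesIO(data), io.BytesIO()
    quopri.decode(source, target)
    return target.getvalue()


# Mostly printable text with one non-ASCII byte and one "=" sign.
text = "Caf\u00e9 menu = 3 items\n".encode("latin-1")
# Data made only of non-ASCII bytes, standing in for a binary file.
binary = bytes(range(128, 256))

qp_text = qp_encode(text)
print(qp_text)
print("text:   qp=%d base64=%d" % (len(qp_text), len(base64.b64encode(text))))
print("binary: qp=%d base64=%d" % (len(qp_encode(binary)),
                                   len(base64.b64encode(binary))))
```

For the text sample, quoted-printable stays shorter and remains readable; for the all-binary sample, every byte expands to three characters, so base64 is the more compact choice.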
If you need to handle string objects, you need to wrap them in file-like objects using the StringIO module.

    import quopri
    # infile and outfile are file objects (or StringIO.StringIO instances)
    quopri.encode(infile, outfile, 0)   # the third argument is quotetabs
    quopri.decode(infile, outfile)

This module is purely based on plain U.S. ASCII text. Non-ASCII characters are mapped to an = followed by two hexadecimal digits. The = character itself is represented by =3D, and whitespace at the end of a line is represented by =20.

mailbox

The mailbox module implements classes that allow easy and uniform read access to the various mailbox formats found on a UNIX system.

    import mailbox

    mailboxname = "/tmp/mymailbox"
    mbox = mailbox.UnixMailbox(open(mailboxname))
    msgcounter = 0
    while 1:
        mailmsg = mbox.next()
        if not mailmsg:
            break
        msgcounter = msgcounter + 1
        messagebody = mailmsg.fp.read()
        print messagebody
        print
    print "The message counter is %d" % (msgcounter)

mimify

The mimify module has functions to convert and process simple and multi-part mail messages to/from MIME format (messages are converted to plain text). This module can be used either as a command line tool, or as a regular Python module.

To encode, you need to type:

    $ mimify.py -e raw_message mime_message

or

    import mimify, StringIO, sys

    msgfilename = "msgfilename.msg"
    filename = StringIO.StringIO()
    mimify.unmimify(msgfilename, filename, 1)
    filename.seek(0)
    mimify.mimify(filename, sys.stdout)

To decode, type

    $ mimify.py -d mime_message raw_message

or

    import mimify, sys

    messagefilename = "msgfilename.msg"
    mimify.unmimify(messagefilename, sys.stdout, 1)

An rfc822.Message instance is instantiated with an input object as parameter. Message relies only on the input object having a readline() method; in particular, ordinary file objects qualify. Instantiation reads headers from the input object up to a delimiter line (normally a blank line) and stores them in the instance.

This class can work with any input object that supports a readline() method. If the input object has seek and tell capability, the rewindbody() method will work; also, illegal lines will be pushed back onto the input stream.
If the input object lacks seek and tell capability but has an unread() method that can push back a line of input, Message will use that to push back illegal lines. Thus, this class can be used to parse messages coming from a buffered stream.

The optional seekable argument is provided as a workaround for certain stdio libraries in which tell() discards buffered data before discovering that the lseek() system call doesn't work. For maximum portability, you should set the seekable argument to zero to prevent that initial tell() when passing in an unseekable object, such as a file object created from a socket object.

Input lines as read from the file might be terminated either by CR-LF or by a single linefeed; a terminating CR-LF is replaced by a single linefeed before the line is stored.

All header matching is done independent of upper- or lowercase; for example, m['From'], m['from'], and m['FROM'] all yield the same result.
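The case-insensitive lookup and the handling of CR-LF line endings can be observed with a brief sketch. It assumes Python 3 and uses the `email` package as a stand-in, since that package provides the same style of header access there; the header values are taken from the sample message in this chapter.

```python
import email

# Headers terminated by CR-LF, as they would arrive over the network.
RAW_HEADERS = (
    "From: Andre Lessa <alessa@lessaworld.com>\r\n"
    "Subject: Python Book\r\n"
    "\r\n"
    "Body text.\r\n"
)

message = email.message_from_string(RAW_HEADERS)

# The same header, requested with three different capitalizations.
lookups = [message["From"], message["from"], message["FROM"]]
header_names = message.keys()

print(lookups)
print(header_names)
```

All three lookups return the same string, and the stored value carries no trailing carriage return.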
Message Objects

A Message object behaves much like a dictionary. A Message instance also has the following methods:
Message instances also support a read-only mapping interface. In particular: m[name] is similar to m.getheader(name), but raises KeyError if there is no matching header; and len(m), m.has_key(name), m.keys(), m.values(), and m.items() act as expected (and consistently).

Finally, Message instances have two public instance variables:
AddressList Objects

An AddressList instance has the following methods:
Finally, AddressList instances have one public instance variable: addresslist, which is a list of string pairs (tuples), one per address. In each member, the first is the canonicalized name part of the address, and the second is the route-address (@-separated host-domain pair).

The following example demonstrates the use of the rfc822 module:

    import rfc822

    mailbox_filename = "mymailbox.msg"
    file_handle = open(mailbox_filename)

    # Reads the headers up to the first blank line.
    messagedic = rfc822.Message(file_handle)
    content_type = messagedic["content-type"]
    from_field = messagedic["From"]
    to_field = messagedic.getaddr("To")   # (name, address) tuple
    subject_field = messagedic["Subject"]
    file_handle.close()

    print content_type, from_field, to_field, subject_field
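The (name, address) pairs described for addresslist can be sketched as follows. This assumes Python 3, where `email.utils` offers comparable address parsing; it is shown as a stand-in for AddressList rather than as the rfc822 API itself, and the header value is a hypothetical example.

```python
from email.utils import getaddresses, parseaddr

# A hypothetical "To" header holding two recipients.
to_header = "Renata Lessa <rlessa@lessaworld.com>, alessa@lessaworld.com"

# getaddresses() takes a list of header values and returns one
# (name, address) tuple per address found.
pairs = getaddresses([to_header])

# parseaddr() handles a single address.
single = parseaddr("Andre Lessa <alessa@lessaworld.com>")

for name, address in pairs:
    print(repr(name), address)
print(single)
```

An address written without a display name produces an empty string in the first slot of its tuple.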
© 2002, O'Reilly & Associates, Inc.