Object Serialization and Persistent Storage

The modules in this section provide persistent storage of arbitrary Python objects. Whenever you need to save an object whose value is not a simple string (such as None, an integer, a long integer, a float, a complex number, a tuple, a list, a dictionary, a code object, and so on), you must serialize the object before writing it to a file.

Both pickle and shelve modules save serializable objects to a file.

By using these persistent storage modules, you can also store Python objects in relational database systems. The modules abstract and hide the underlying database interfaces, such as the Sybase module and the Python Database API.

Included in the standard Python distribution, the pickle module can convert Python objects to and from a string representation.

The cPickle module is a faster implementation of the pickle module.

The copy_reg module extends the capabilities of the pickle and cPickle modules by registering support functions.

The marshal module is an alternative way to serialize Python objects. It reads and writes values in a platform-independent binary format and converts data to and from character strings, but it supports only the simple built-in types. Python itself uses this module to serialize the compiled bytecode of Python modules.

This module should be used for simple objects only. Use the pickle module to implement persistent objects in general.

Persistent Storage of Python Objects in Relational Databases is a paper by Joel Shprentz presented at the Sixth Python Conference. For more information, check out http://www.python.org/workshops/1997-10/proceedings/shprentz.html.

pickle Module

The pickle module serializes the contents of an object into a stream of bytes. Optionally, it can save the serialized object into a file object. It is slower than the marshal module.

						
>>> import pickle
>>> filename = "listobj.pkl"     # example filename
>>> listobj = [1,2,3,4]
>>> filehandle = open(filename, 'wb')
>>> pickle.dump(listobj, filehandle)
>>> filehandle.close()
>>> filehandle = open(filename, 'rb')
>>> listobj = pickle.load(filehandle)
>>> filehandle.close()

					

The pickle module implements the following functions.

						
pickle.dump(object, file [,bin])

					

This function serializes an object and writes it to an open file object. A true value for the bin argument specifies that the information must be saved as binary data. This function is the same as the following:

						
p = pickle.Pickler(file)
p.dump(object)

					

If you try to serialize an object of an unsupported type, a PicklingError exception is raised.
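
As a minimal sketch of how a program might guard against objects that cannot be pickled, the helper below (try_pickle is a hypothetical name, not part of the pickle module) returns None instead of letting the exception propagate. It assumes that, depending on the object and the interpreter version, the failure may surface as PicklingError, TypeError, or AttributeError, so it catches all three.

```python
import pickle

def try_pickle(obj):
    """Return the pickled form of obj, or None if obj cannot be pickled.

    try_pickle is a hypothetical helper written for this illustration.
    """
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        # Which of these is raised depends on the object and on the
        # interpreter version, so the sketch treats them all alike.
        return None

picklable = try_pickle([1, 2, 3, 4])     # a list of integers is supported
unpicklable = try_pickle(lambda x: x)    # an anonymous function is not
```

A caller can then test the result against None before writing anything to disk.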

						
pickle.dumps(object [,bin])

					

This function behaves like dump, except that it returns the serialized object as a string instead of writing it to a file.

						
pickle.load(file)

					

Restores a serialized object from a file. This function is the same as the following:

						
object = pickle.Unpickler(file).load()
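
The Pickler and Unpickler classes are convenient when several objects share one stream. The following sketch assumes an interpreter that provides io.BytesIO, which stands in here for an open binary file; the variable names are illustrative only.

```python
import io
import pickle

buffer = io.BytesIO()    # in-memory stand-in for an open binary file

# Write two objects, one after the other, through a single Pickler.
pickler = pickle.Pickler(buffer)
pickler.dump(("parrot", (1, 2, 3)))
pickler.dump({"country": "Spain"})

# Rewind, then read them back in the same order through one Unpickler.
buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()
```

Each call to load() consumes exactly one pickled object, so the objects come back in the order in which they were written.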

					

The next example serializes the information and converts it back again.

						
>>> import pickle
>>> value = ("parrot", (1,2,3))
>>> data = pickle.dumps(value)
>>> print pickle.loads(data)
('parrot', (1, 2, 3))
				
					

cPickle Module

This module implements the same functions as the pickle module. The difference is that cPickle is much faster because it is written in C; the trade-off is that its Pickler and Unpickler objects cannot be subclassed. The next example uses the fastest pickle module available on the system.

						
try:
     import cPickle
     pickle = cPickle
except ImportError:
     import pickle
				
					

copy_reg Module

This module registers new types to be used with the pickle module. It extends the capabilities of the pickle and cPickle modules by supporting the serialization of new object types defined in C extension modules.

The next example works around the fact that the standard pickle implementation cannot handle Python code objects. It registers a code object handler by using two functions:

  • dumpdata: Takes the code object and returns a tuple holding the reconstruction function and its arguments, which can contain only simple data types.

  • loaddata: Receives the marshaled string stored in that tuple and rebuilds the code object.

						
import copy_reg, pickle, marshal, types

def loaddata(data):
    return marshal.loads(data)

def dumpdata(code):
    return loaddata, (marshal.dumps(code),)

copy_reg.pickle(types.CodeType, dumpdata, loaddata)

script = """
x = 1
while x < 10:
   print x
   x = x + 1
"""

code = compile(script, "<string>", "exec")
codeobj = pickle.dumps(code)

exec pickle.loads(codeobj)
				
					

Note

Starting with Python 2.0, the copy_reg module can no longer be used to register pickle support for classes. It can only be used to register pickle support for extension types. The pickle() function raises a TypeError exception whenever you pass it a class.



marshal Module

This module serializes only simple data objects: class instances and recursive references in lists, tuples, and dictionaries are not supported. Its interface is similar to that of pickle.

This module implements the following functions:

						
marshal.dump(value, file)

					

Writes the value to the open file object.

						
marshal.load(file)

					

Returns the next readable value from file.

						
marshal.dumps(value)

					

Returns the string that dump would write to a file, without writing anything.

						
marshal.loads(string)

					

Returns the next readable value from string.

An attempt to marshal a value of an unsupported type raises a ValueError exception.
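
The file-based functions and the ValueError behavior can be sketched as follows. This is only an illustration under a few assumptions: the Point class and the value.marshal filename are hypothetical, and a temporary directory is used so that no existing file is touched.

```python
import marshal
import os
import tempfile

class Point(object):
    """A hypothetical user-defined class, used only for this illustration."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

value = ("spam", [1, 2, 3, 4])
path = os.path.join(tempfile.mkdtemp(), "value.marshal")

# marshal reads and writes binary data, so open the file in binary mode.
with open(path, "wb") as handle:
    marshal.dump(value, handle)
with open(path, "rb") as handle:
    restored = marshal.load(handle)

# Class instances are outside the set of types that marshal supports.
try:
    marshal.dumps(Point(1, 2))
    marshal_failed = False
except ValueError:
    marshal_failed = True
```

The simple tuple survives the round trip, whereas the class instance is rejected; objects of that kind are a job for pickle.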

						
>>> import marshal
>>> value = ("spam", [1,2,3,4])
>>> data = marshal.dumps(value)
>>> print repr(data)
'(\002\000\000\000s\004\000\000\000spam[\004\000\000\000i\001\000\000\000i\002\000\000\000i\003\000\000\000i\004\000\000\000'
>>> print marshal.loads(data)
('spam', [1, 2, 3, 4])

					

The next example handles code objects by storing precompiled Python code.

						
import marshal
script = """
x = 1
while x < 10:
   print x
   x = x + 1
"""

code = compile(script, "<script>", "exec")
codeobj = marshal.dumps(code)

exec marshal.loads(codeobj)
				
					

shelve Module

The shelve module is also part of the standard Python distribution. Built on top of the pickle and anydbm modules, it behaves like a persistent dictionary whose values can be arbitrary Python objects.

The shelve module offers persistent object storage capability to Python by using dictionary-like objects. Keys must be strings, whereas values can use any data type, as long as the pickle module can handle it.

						
import shelve
key = raw_input("key: ")
data = raw_input("value: ")
dbhandle = shelve.open("DATABASE","c")   # "c" creates the file if it is missing
while not(dbhandle.has_key(key)):
    dbhandle[key]=data
    key = raw_input("key: ")
    data = raw_input("value: ")
dbhandle.close()

					

The shelve module implements a shelf object, which holds persistent objects that must be serializable by the pickle module. In other words, a shelf is a dbm-like database file (for example, dbm or gdbm) that stores pickled Python objects on disk under dictionary-style keys. The file it produces is consequently a binary file whose format is specific to the database manager used to create it.
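
A minimal sketch of that idea follows: string keys map to pickled values of different types, and the values come back intact after the shelf is reopened. The file name inventory and the stored values are illustrative assumptions, and a temporary directory keeps the example away from real data.

```python
import os
import shelve
import tempfile

# A scratch directory keeps the example from touching real data.
path = os.path.join(tempfile.mkdtemp(), "inventory")

db = shelve.open(path, "c")        # "c": create the file if it is missing
db["animal"] = "parrot"            # keys must be strings
db["sizes"] = [1, 2, 3]            # values may be any picklable object
db["origin"] = {"country": "Spain", "visits": 5}
db.close()

db = shelve.open(path, "r")        # reopen the same shelf read-only
stored_keys = sorted(db.keys())
sizes = db["sizes"]
origin = db["origin"]
db.close()
```

The keys are sorted before use because a shelf, like a dictionary of that era, makes no promise about key order.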

To open a shelve file, the following function is available:

						
shelve.open(filename)

					

The file is created if it does not already exist. The following methods and operations are also supported:

						
dbhandle[key] = value    # Set the value of a given key entry
value = dbhandle[key]    # Get the value of a given key entry
dbhandle.has_key(key)    # Test whether a key exists
dbhandle.keys()          # Returns a list of the current keys available
del dbhandle[key]        # Delete a key
dbhandle.close()         # Close the file

					

The following simple example stores three values in a shelf and then reads them back:

						
>>> import shelve
>>> dbhandle = shelve.open("datafile", "c")
>>> dbhandle["animal"] = "parrot"
>>> dbhandle["country"] = "Spain"
>>> dbhandle["weekdays"] = 5
>>> dbhandle.close()
>>>
>>> dbhandle = shelve.open("datafile", "r")
>>> for key in dbhandle.keys():
...     print dbhandle[key]
...
parrot
Spain
5
>>> dbhandle.close()
				
					
Locking

Even though modules such as gdbm and bsddb perform locking, shelves don't implement locking facilities of their own. Many users can read a shelf at the same time, but only one user should update it at a given moment. An easy way to handle the situation is to lock the file while writing to it. You must implement a routine like this yourself because it is not part of the standard distribution.
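
One possible shape for such a routine is sketched below. It is an assumption-laden illustration, not a standard facility: the locked() helper and the .lock file name are hypothetical, the lock is purely advisory (it only protects writers that all use the same helper), and a lock file left behind by a crashed process is not cleaned up.

```python
import contextlib
import os
import shelve
import tempfile
import time

@contextlib.contextmanager
def locked(lock_path, timeout=5.0, poll=0.05):
    """Hold an exclusive lock file while the with-block runs (hypothetical helper)."""
    deadline = time.time() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except OSError:
            # Any OSError is treated as "lock busy" in this sketch.
            if time.time() >= deadline:
                raise RuntimeError("could not acquire %s" % lock_path)
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)       # release the lock for the next writer

db_path = os.path.join(tempfile.mkdtemp(), "shared")
lock_path = db_path + ".lock"

# Only the update is wrapped in the lock; readers are left alone.
with locked(lock_path):
    db = shelve.open(db_path, "c")
    db["counter"] = 1
    db.close()
```

A second writer that calls locked() on the same path waits until the first one leaves its with-block, or gives up with RuntimeError once the timeout expires.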

More Sources of Information

PyVersant

PyVersant is a simple Python wrapper for the Versant commercial OODBMS. By using PyVersant in the Python command prompt, you can interactively find objects, look at their values, change those values, and write the object back to the database, among other things. More information is provided at the following site:

http://starship.python.net/crew/jmenzel/

Details about Versant OODBMS are shown at the following site:

http://www.versant.com/

ZODB

The Zope Object Database is a persistent-object system that provides transparent transactional object persistence to Python applications. For more information, check out the following site:

http://www.zope.org/Members/michel/HowTos/ZODB-How-To

ZODB is a powerful object database system that can be used with or without Zope. As a database, it offers many features. Note that ZODB uses other database libraries for the actual storage.

More information about Zope can be found in Chapter 11, "Web Development."


Last updated on 1/30/2002
Python Developer's Handbook, © 2002 Sams Publishing

© 2002, O'Reilly & Associates, Inc.