
DBM (Database Managers) Databases

Now let's look at another mechanism for storing data. The following modules store data in dbm-style format, a simple disk-based storage facility that handles data much like a dictionary does. Objects are manipulated through unique key strings. Each of these modules is an interface to a specific library.

dbm, gdbm, and dbhash are database modules that are part of the standard Python distribution.

Also included with the standard Python distribution is the anydbm module, which is a generic interface to all the dbm-like modules. It uses the modules that are installed.

The dbhash module provides a function that offers a dbm-style interface to access the BSD database library.

All these modules have some behavior in common. For example, to open the files, the following syntax is used by all of them.

					
dbhandle = open(filename [, flag [,mode]])

				

Here, filename is the database filename; flag can have one of the following values: r (read-only access), w (read/write access), c (create the database), or n (force the creation of a new database); and mode specifies the file access mode (specific to UNIX systems).

The following operations are supported:

					
dbhandle[key] = value    # Set the value of a given key entry
value = dbhandle[key]    # Get the value of a given key entry
dbhandle.has_key(key)    # Test whether a key exists
dbhandle.keys()          # Returns a list of the current keys available
del dbhandle[key]        # Delete a key
dbhandle.close()         # Close the file

				

For all these dbm-like modules, the keys and the values to be stored must be of type string. Later, you will see a module called shelve with a behavior similar to these dbm-like modules. However, it stores persistent objects.
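As a rough illustration of the operations listed above, the following sketch assumes a current Python 3 interpreter rather than the Python 2 release this chapter describes. In Python 3 the dbm-style modules live in the dbm package, and dbm.dumb is the pure-Python implementation that is always available. The helper name dictionary_style_demo and the file name demo_db are hypothetical. Note that Python 3 returns keys and values as bytes, and that the in operator replaces has_key().

```python
import dbm.dumb
import os
import tempfile


def dictionary_style_demo(path):
    """Exercise the common dbm operations on a database at `path`."""
    db = dbm.dumb.open(path, "c")       # "c": create the database if needed
    try:
        db["animal"] = "parrot"         # set the value of a key
        db["country"] = "Spain"
        animal = db["animal"]           # get the value (returned as bytes)
        has_country = "country" in db   # membership test replaces has_key()
        del db["country"]               # delete a key
        remaining = sorted(db.keys())   # keys are returned as bytes
    finally:
        db.close()                      # always close the file
    return animal, has_country, remaining


# Hypothetical file name inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    result = dictionary_style_demo(os.path.join(tmp, "demo_db"))
    print(result)
```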

Each module provides its own exception, which is called modulename.error.

					
>>> import anydbm
>>> try:
...     dbhandle = anydbm.open("datafile", "r")
... except anydbm.error:
...     print "Error while opening file"
...
Error while opening file
>>>
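The same error-handling pattern can be sketched for a current Python 3 interpreter, which is an assumption on our part: there, anydbm was renamed to dbm, and dbm.error is a tuple of exception classes that can be used directly in an except clause. The helper name open_readonly is hypothetical.

```python
import dbm
import os
import tempfile


def open_readonly(path):
    """Return an open database handle, or None if it cannot be opened."""
    try:
        return dbm.open(path, "r")
    except dbm.error as exc:    # dbm.error is a tuple of exception classes
        print("Error while opening file:", exc)
        return None


# The file below does not exist, so opening it read-only fails.
with tempfile.TemporaryDirectory() as tmp:
    handle = open_readonly(os.path.join(tmp, "datafile"))
```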

				

This is a simplified database system based on key/value pairs. Depending on the module and the system, it uses one or two files to store the data (for example, both gdbm and bsddb use a single file).

The disadvantage of this kind of implementation is that it is not portable: the storage format is specific to a particular hardware platform and operating system. It is also not designed for large volumes of data; the smaller the file, the better the performance. This stems from the original specification, which required that information be accessible in a single system call. After many updates, the data file becomes heavily fragmented and full of holes, which degrades performance badly. These databases remain very efficient, however, when you do lots of reads and almost no writes.

If you have a data file but you don't know which database you used to create it, take a look at the whichdb module.

The whichdb module provides a function that guesses which dbm module (dbm, gdbm, or dbhash) should be used to open a specific database. However, using the anydbm module should take care of guessing the format for you.

Another important fact concerns the size limit of each key/value pair, which is also known as the bucket size. The dbm module accepts between 1K and 2K of data per pair. Neither gdbm nor bsddb has any such limit.

dbm Module

The dbm module is a database interface that implements a simple UNIX dbm library access method. dbm objects behave similarly to dictionaries in which keys and values must be string objects. This module allows strings, which can encode any Python object, to be archived in indexed files. dbm is the original implementation of the DBM toolkit. The main function of this module opens a dbm database and returns a dbm object that behaves similarly to a dictionary.

						
>>> import dbm
>>> dbhandle = dbm.open("datafile", "c")
>>> dbhandle["animal"] = "parrot"
>>> dbhandle["country"] = "Spain"
>>> dbhandle.close()
>>>
>>> dbhandle = dbm.open("datafile", "r")
>>> for key in dbhandle.keys():
...     print dbhandle[key]
...
parrot
Spain
>>> dbhandle.close()
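The remark that strings can encode any Python object can be sketched with the pickle module. This is only an illustration, written for a current Python 3 interpreter (an assumption) and using dbm.dumb so that it runs anywhere. The helper names store_object and load_object and the file name objects are hypothetical.

```python
import dbm.dumb
import os
import pickle
import tempfile


def store_object(db, key, obj):
    """Serialize `obj` with pickle and store the result under `key`."""
    db[key] = pickle.dumps(obj)


def load_object(db, key):
    """Read the stored bytes for `key` and rebuild the original object."""
    return pickle.loads(db[key])


with tempfile.TemporaryDirectory() as tmp:
    db = dbm.dumb.open(os.path.join(tmp, "objects"), "c")
    try:
        store_object(db, "settings", {"animal": "parrot", "legs": 2})
        restored = load_object(db, "settings")
    finally:
        db.close()
    print(restored)
```

The shelve module mentioned earlier packages this same idea behind a dictionary-like interface.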

					

gdbm Module

The gdbm module is similar to the dbm module. However, their files are incompatible. This module provides a GNU/FSF reinterpretation of the UNIX dbm library. It supports multi-user applications, it is faster than the dbm module (the difference becomes more noticeable as the number of records increases), and it has been ported to a larger number of platforms.

Check out the GNU Web site for more details:

http://www.gnu.org/software/gdbm/gdbm.html

						
>>> import gdbm
>>> key = raw_input("key: ")
>>> value = raw_input("value: ")
>>> dbhandle = gdbm.open("DATABASE", "w")    # "w" expects an existing database
>>> while not dbhandle.has_key(key):         # stop at the first key already stored
...     dbhandle[key] = value
...     key = raw_input("key: ")
...     value = raw_input("value: ")
...
>>> dbhandle.close()

					

The gdbm module implements the following additional methods:

						
dbhandle.firstkey()

					

Returns the first key in the database.

						
dbhandle.nextkey(key)

					

Returns the next key located after the provided key.

						
dbhandle.reorganize()

					

Reorganizes the database by eliminating unused disk space that is created when deletions occur.

						
dbhandle.sync()

					

Synchronizes the database file by writing unsaved data to the disk.

If you append "f" to the flag argument in the open call, Python opens the database in fast mode. This means that data is not automatically written to disk; you must call the sync method to save all the unwritten information. This is done to improve performance.
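The firstkey and nextkey methods described above are typically combined into a loop that visits every key. The sketch below shows that pattern on a current Python 3 interpreter (an assumption). Because gdbm may not be installed everywhere, it runs against FakeGdbm, a hypothetical in-memory stand-in that only mimics these two methods; with a real gdbm handle, iterate_keys (also a hypothetical name) would be called the same way.

```python
def iterate_keys(db):
    """Yield every key of a gdbm-style database via firstkey()/nextkey()."""
    key = db.firstkey()
    while key is not None:          # None signals that no key is left
        yield key
        key = db.nextkey(key)


class FakeGdbm:
    """Hypothetical in-memory stand-in that mimics firstkey() and nextkey()."""

    def __init__(self, data):
        self._keys = list(data)

    def firstkey(self):
        return self._keys[0] if self._keys else None

    def nextkey(self, key):
        position = self._keys.index(key) + 1
        return self._keys[position] if position < len(self._keys) else None


keys = list(iterate_keys(FakeGdbm({"animal": "parrot", "country": "Spain"})))
print(keys)
```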

bsddb Module

The bsddb module is part of the standard Python distribution. In addition to the dictionary-like behavior, this module also supports B-trees (which allow traversing the keys in sorted order), extended linear hashing, and fixed- and variable-length records. Although this module has the most complex implementation, it is the fastest dbm-like module.

The bsddb module provides an interface to access routines from the Berkeley DB library, a C library of database access methods copyrighted by Sleepycat Software. This library provides full transactional support, database recovery, online backups, and separate access to locking, logging, and shared-memory caching subsystems.

More information about the Berkeley DB package can be found at http://www.sleepycat.com.

The bsddb module implements the following open interfaces:

						
dbhandle = hashopen(filename [, flag [,mode]])

					

Handles hash format files.

						
dbhandle = btopen(filename [, flag [,mode]])

					

Handles btree format files.

						
dbhandle = rnopen(filename [, flag [,mode]])

					

Handles record-based files.

Along with the previous interfaces, this module also provides the following additional methods, which are used to move a cursor across the database.

						
cursor = dbhandle.set_location(key)

					

Moves the cursor to the location indicated by the key and assigns the location's value to the cursor variable.

						
cursor = dbhandle.first()

					

Moves the cursor to the first element and assigns its value to the cursor variable.

						
cursor = dbhandle.next()

					

Moves the cursor to the next element and assigns its value to the cursor variable.

						
cursor = dbhandle.previous()

					

Moves the cursor to the previous element and assigns its value to the cursor variable.

						
cursor = dbhandle.last()

					

Moves the cursor to the last element and assigns its value to the cursor variable.

						
dbhandle.sync()

					

Synchronizes the database file by writing unsaved data to the disk.

These methods are not supported by the hash format databases.

Although the standard Python distribution installs the bsddb module on Windows machines, there is another interesting Win32 port of the bsddb module, which was created by Sam Rushing. For more information, check out http://www.nightmare.com/software.html.

dbhash Module

The dbhash module provides a "clean" open interface to the Berkeley DB hash database. Note that the bsddb module must be installed before trying to call dbhash because the bsddb module is used to open the databases.

The syntax to open the hash database is the same as the one used by the other dbm-like modules.

						
dbhandle = open(filename [, flag [,mode]])

					

This module provides the following additional methods:

						
dbhandle.first()

					

Returns the first element.

						
dbhandle.last()

					

Returns the last element.

						
dbhandle.next(key)

					

Returns the next element after the key element.

						
dbhandle.previous(key)

					

Returns the previous element before the key element.

						
dbhandle.sync()

					

Synchronizes the database file by writing unsaved data to the disk.

Let's look at an example:

						
>>> import dbhash
>>> key = raw_input("key: ")
>>> value = raw_input("value: ")
>>> dbhandle = dbhash.open("DATABASE", "w")  # "w" expects an existing database
>>> while not dbhandle.has_key(key):         # stop at the first key already stored
...     dbhandle[key] = value
...     key = raw_input("key: ")
...     value = raw_input("value: ")
...
>>> dbhandle.close()
				
					

anydbm Module

The anydbm module opens (or creates) a database using the best implementation available. It searches the available database modules in the following order: Berkeley bsddb, gdbm, and dbm. It loads the dumbdbm module only when none of the others is available. In fact, the module doesn't know which database packages are installed and available; it simply tries to use them.

						
>>> import anydbm
>>> def opendatabase(filename, flag):
...     try:
...         dbhandle = anydbm.open(filename, flag)
...     except anydbm.error, e:
...         # Report the failure with a readable message.
...         raise IOError("Error opening file %s: %s" % (filename, e))
...     return dbhandle
...
>>> dbhandle = opendatabase("mydata", "c")
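For comparison, here is a minimal sketch of the same idea on a current Python 3 interpreter, which is an assumption on our part. There, anydbm was renamed to dbm, and dbm.open picks whichever backend is available. The helper name open_database is hypothetical; keys and values are written as bytes because that is the form every backend returns them in.

```python
import dbm
import os
import tempfile


def open_database(filename, flag):
    """Open a database with whichever dbm backend is available."""
    try:
        return dbm.open(filename, flag)
    except dbm.error as exc:
        raise OSError("Error opening file %s: %s" % (filename, exc))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mydata")

    db = open_database(path, "c")   # create the database and store one pair
    db[b"animal"] = b"parrot"
    db.close()

    db = open_database(path, "r")   # reopen read-only; the format is detected
    value = db[b"animal"]
    db.close()
    print(value)
```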

					

dumbdbm Module

The dumbdbm module is a simple, portable, and slow dbm-style database implemented entirely in Python. It shouldn't be used for development because it is slow, inefficient, and inconsistent. The only acceptable case for using this module is when no other module is available.

whichdb Module

The whichdb module tries to identify which database was used to create a given file. This module implements a function of the same name. The syntax is

						
dbtype = whichdb(filename)

					

This function returns the module name (for example, gdbm) when the format is identified.

The function returns an empty string if the format is not identified. Note that databases created using the dumbdbm module were not supported by this module prior to Python 2.0.

The function returns None if the file doesn't exist or if it can't be opened.

						
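import whichdb
dbtype = whichdb.whichdb("filename")

if dbtype is None:
    # The file doesn't exist or can't be opened.
    print "An error happened while reading this file"
elif dbtype == "":
    # The file exists, but its format wasn't identified.
    print "I cannot recognize this file"
else:
    # dbtype holds the name of the module that can open the file.
    handler = __import__(dbtype)
    dbhandle = handler.open("filename", "r")
    print dbhandle.keys()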
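The three possible return values can also be exercised on a current Python 3 interpreter, which is an assumption on our part: there, the function is available as dbm.whichdb and reports module names such as dbm.dumb. The helper name describe and the file names known, unknown, and absent are hypothetical.

```python
import dbm
import dbm.dumb
import os
import tempfile


def describe(filename):
    """Translate the three possible whichdb() results into a message."""
    dbtype = dbm.whichdb(filename)
    if dbtype is None:
        return "missing or unreadable"
    if dbtype == "":
        return "unrecognized"
    return dbtype


with tempfile.TemporaryDirectory() as tmp:
    # A real database, created with the pure-Python backend.
    known = os.path.join(tmp, "known")
    db = dbm.dumb.open(known, "c")
    db["animal"] = "parrot"
    db.close()

    # An ordinary text file that is not a database.
    unknown = os.path.join(tmp, "unknown")
    with open(unknown, "w") as f:
        f.write("this is not a database")

    results = [describe(known), describe(unknown),
               describe(os.path.join(tmp, "absent"))]
    print(results)
```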

					

Note

You shouldn't need to use this module. anydbm uses whichdb to work out what module to use to open a database.




Last updated on 1/30/2002
Python Developer's Handbook, © 2002 Sams Publishing
