
Development Strategy

Writing a program is easy, but writing a good, optimized program requires some experience. A good way to start is to learn the nuances of the language, which in our case means learning Python. You should know a little bit of everything, and this book helps you learn most of it, including classes, modules, functions, exception handling, dynamic typing, GUI, operator overloading, indentation, and so forth.

Of course, you must know many other important items too.

Nowadays, the most important development efforts are focusing on the Internet. Python offers the basic necessary tools that you might need for your Web projects. Python can be used either for Web-based interface projects or to generate entire back-end frameworks, using tools such as Zope.

Note that by extending Grail, the Web browser written in Python, you can embed your Python application directly in it and distribute to your clients a browser that carries specific, customized interfaces.

Even if you don't use Grail, you can use any browser to provide a GUI for your applications. Have you ever considered delivering information and products through the Web? If so, you can do it using Python.

Python is a perfect language for project prototyping. Python's design allows you to make changes very quickly. Later you can decide whether to re-implement the code in a compiled language, or stick to Python and continue the development effort using the prototype as a starting point. Remember that after spending some time creating a prototype, you probably have a large amount of code that you do not want to throw away.

Prototyping with Python is very easy. You can, for example, wrap your code in a function inside a module and use a development environment, such as Pythonwin or IDLE, to run the script. To test this application, you just need to save it and execute it; no intermediate stages are necessary.

Python testing mechanisms also allow you to forge command-line arguments. You can test your command-line scripts by first setting their expected arguments to predefined values using the built-in variable sys.argv.
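The idea of forging arguments can be sketched as follows. This is a minimal illustration only: the main() function and the file names are hypothetical, and the syntax assumes a current Python 3 interpreter rather than the Python 2 used in the book's listings.

```python
import sys


def main():
    """Return the input and output names given on the command line."""
    # sys.argv[0] is the script name; the real arguments start at index 1.
    if len(sys.argv) != 3:
        raise SystemExit("usage: convert.py INPUT OUTPUT")
    return sys.argv[1], sys.argv[2]


# Forge the arguments before calling the code under test, then restore
# the real ones so that the rest of the session is not affected.
saved_argv = sys.argv
sys.argv = ["convert.py", "in.txt", "out.txt"]
try:
    result = main()
finally:
    sys.argv = saved_argv

print(result)
```

Restoring the original list in a finally clause keeps the forged values from leaking into later tests, even when the routine under test raises an exception.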

During development, you will soon see that Python can easily be used to code entire applications, without discarding the prototyped code.

If speed is a requirement, you can use a compiled language in the back-end side of your application to support the high-demand operations. Python, in this case, can be used as the front end of the application, leaving the hard work to the other language. This kind of implementation allows you to create black boxes of code, which get called by Python, and Python doesn't necessarily need to know what is happening behind the scenes because only the external interface of the compiled language needs to be exposed.

But whenever possible, use just Python. Remember that supporting a scripting language is much easier than supporting a compiled language. Using a scripting language makes tasks such as debugging the application, fixing bugs, and adding enhancements much simpler. Because we are not using a compiled language, we don't need to spend time compiling and linking the files. Updating client sites with the latest version of the application is also very easy because we just need to send the file that carries the changed Python module.

As you can see, a lot of thinking is involved in preparing yourself for a Python development project. Next, we will see some ideas about how to optimize your code and how to write a program with style. Both are important to keep in mind, not only when using Python, but also when writing in any other language.

Optimizing the Code

To prevent your program from running very slowly, consider following some basic Python optimization rules. If you design your application from the start with these guidelines in mind, you are much more likely to be satisfied with its final overall performance.

My goal in this section is to provide ways to generate acceptable performance in your Python routines. Note that I don't cover everything, but a good set of basic concepts is covered.

Many things can be done to reduce the processing time of your application. Remember that an interpreter is invoked every time you execute a Python script, so you need to work on your code to compensate for that. The fact that Python is an interpreted language is a real concern, but by reducing the number of statements that the interpreter has to process, you also reduce the interpreter overhead.

By the way, the Python interpreter has a command-line option (-O, which stands for optimize) that executes your code with some of the bytecode operations left out. Basically, it strips from the bytecode the line-number information used to report where exceptions occur and skips assert statements; doc strings are discarded as well when the option is doubled (-OO). This flag does not give much of a speed increase, and it makes things harder to debug.

Some useful optimization hints are as follows:

  • Variables: Depending on how your variables are defined, the interpreter spends more or less time figuring out their values. Python resolves variable names at runtime by searching a fixed sequence of namespaces. When it finds a name in the code, it first checks whether the name is local to the current function. If so, it grabs the variable's value. Otherwise, it searches the global namespace dictionary and, if necessary, the built-in namespace dictionary. Local variable lookups are much faster than the other kinds because, inside a function, they correspond to indexing into an array, whereas global and built-in lookups correspond to hash table lookups. A good optimization hint, therefore, is that if you use a global variable a lot in a function, assigning its value to a local variable first can help a lot.

  • Modules: Within a single script, you need to import an external module only once, so multiple import statements for the same module are unnecessary. As a rule of thumb, put all the import statements at the very top of your program. That said, importing a module a second time is not a real performance problem, because the repeated import amounts to a dictionary lookup.

    In cases where you refer repeatedly to particular attributes of an external module, consider copying those attributes to local variables (when that's possible, of course) before using them, especially if the references are made inside a loop.

    Whenever you import a module, the interpreter looks for a byte-compiled version of the module. If it doesn't find one, it automatically byte-compiles the module and generates a .pyc file, so the next time you import the module, the byte-compiled file will be there. Modules with a .pyc file load faster than plain .py files because the compilation step has already been done. The suggestion here is to use byte-compiled modules as much as you can. Keep in mind, though, that the code itself executes at the same speed whether or not a .pyc file exists; only startup is a bit quicker.

  • Strings: Use format strings whenever you need to concatenate strings with other variables. Compare the following concatenation forms.

    								
    name = "Andre"
    print "Hello " + name
    print "Hello %s" % (name)
    
    							

    The second print statement is more efficient than the first one. The parentheses around name on the third line are not necessary. Another option would be

    								
    print "Hello", name
    
    							
  • Tkinter: Avoid creating unnecessary instances of widgets. If you are not planning to manipulate the attributes of a widget after it has been created, stick to direct calls to the class. In a GUI app, this won't affect the running speed that much, just the startup time.

    There is no reason to say

    								
    mybutton = Button(root, text="Close")
    mybutton.pack(side=RIGHT)
    
    							

    when you can simply use

    								
    Button(root, text="Close").pack(side=RIGHT)
    
    							

    Now, the interpreter has one less variable to handle.

    As an aside, if you are testing a Tkinter application using IDLE, you need to comment out your mainloop() call. That's because IDLE is already running inside a Tkinter mainloop, and calling another one might freeze your entire environment.

  • Loops: You can optimize a lot of things in your loops to make them run smoothly. In short:

    • You should use built-in functions in your inner loop instead of using functions written in Python. By using built-in functions that support list manipulation (such as map(), reduce(), and filter()) instead of straight loops, you can move some of the loop overhead to the C code. Passing built-in functions to map, reduce, or filter gives even better performance.

    • Whenever you have multiple levels of loops, it is usually worth optimizing only the innermost one. When optimizing multiple-level loops, the idea is to reduce the number of memory allocations. Making the innermost loop the one with the fewest iterations should help your performance design.

    • Working with local variables is a great thing that improves the processing time inside a loop. Whenever possible, copy all your global variables and attribute look-ups to local variables before entering a loop.

    • If you use construction methods such as range(n) inside a nested loop, it is much faster to assign the resulting range to a local variable outside the outermost loop, and use this variable in the loop definitions.

      										
      yRange = range(500)
      for xItem in range(100000):
          for yItem in yRange:
              print xItem, yItem
      
      									
    • Another optimization here would be to use xrange for the outer (x) loop, because range(100000) builds a 100,000-item list in memory, whereas xrange produces its values on demand.

      										
      yRange = range(500)
      for xItem in xrange(100000):
          for yItem in yRange:
              print xItem, yItem
      
      									
  • Functions: Python built-in functions execute faster than functions written in pure Python because the built-in functions are implemented in C. map(), filter(), and reduce() are examples of built-in functions that can beat the performance of equivalent code written in Python. It is also good to know that Python treats function names like global variables, so the namespace lookup rules that we saw previously apply to functions as well. If you have the option, use the implied loop of the map() function rather than a for loop; it is usually faster. The runtime of these looping functions is highly dependent on the function you pass in: passing a Python function will not be as fast as passing a built-in function (such as the ones in the operator module).
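Several of the hints in the list above can be sketched in one place: copying global and attribute lookups into local names before a loop, and handing the loop to map() together with a function implemented in C. The function names below are hypothetical, and the code assumes a current Python 3 interpreter, where map() returns an iterator (hence the list() calls). Whether the difference is measurable depends on the interpreter version and the workload.

```python
import math
import operator


def roots_global(values):
    """Look up math.sqrt (a global plus an attribute) on every pass."""
    result = []
    for value in values:
        result.append(math.sqrt(value))
    return result


def roots_local(values):
    """Copy the lookups into local names once, before the loop starts."""
    sqrt = math.sqrt          # global and attribute lookup done once
    result = []
    append = result.append    # bound method lookup done once
    for value in values:
        append(sqrt(value))
    return result


def roots_map(values):
    """Let map() drive the loop, passing it a function written in C."""
    return list(map(math.sqrt, values))


def sums_map(left, right):
    """Add two sequences pairwise without calling any Python function."""
    return list(map(operator.add, left, right))


data = [1.0, 4.0, 9.0]
print(roots_global(data), roots_local(data), roots_map(data))
print(sums_map([1, 2, 3], [10, 20, 30]))
```

All three roots functions return the same list; only the amount of name lookup work done inside the loop differs.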

If you want to test the performance of your routines, you can use a simple concept: measure the time that elapses between calling the routine and finishing its execution. The lines that follow add such a timer to your program so that you can benchmark it and try out new approaches. Note that the measurement itself carries a small overhead because of the calls to the timer function.

First, you need to import the time module:

						
import time

					

Second, you just need to read the timer before starting your routine and again after it finishes. This is done using the time.clock() function:

						
start_timer = time.clock()
call_your_routine()
end_timer = time.clock()
print end_timer-start_timer
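The same pattern can be wrapped in a small helper so that it is easy to reuse. This is a sketch under stated assumptions: time_call() and busy_work() are hypothetical names, and because time.clock() was removed in Python 3.8, the sketch uses time.perf_counter(), which is the closest replacement on current interpreters.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed seconds)."""
    start_timer = time.perf_counter()
    result = func(*args, **kwargs)
    end_timer = time.perf_counter()
    return result, end_timer - start_timer


def busy_work(n):
    """Hypothetical routine to measure: sum the first n squares."""
    return sum(i * i for i in range(n))


total, elapsed = time_call(busy_work, 100000)
print("result=%d elapsed=%.6f s" % (total, elapsed))
```

A single run is noisy, so it is usually worth repeating the measurement several times and comparing the smallest values.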

					

Code optimization is a very complex subject that is not restricted to Python programs. Sometimes when you boost performance in one place, you break something somewhere else. In other words, if the processing time of your application seems OK to you, don't touch it. I suggest that you optimize your code only when a real performance problem is creating an unacceptable bottleneck in your application.

Chapter 17, "Development Tools," introduces the Python Profiler module to you. This tool can help you to identify the bottlenecks in your code.
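As a rough preview of what profiling looks like, the sketch below profiles two hypothetical routines and prints a report sorted by cumulative time. It assumes a current Python 3 interpreter; the module and class names used (cProfile, pstats.Stats) are the ones in today's standard library and may differ from what Chapter 17 describes.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical routine: accumulate squares in an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum(n):
    """Hypothetical routine: the same result through the built-in sum()."""
    return sum(i * i for i in range(n))


def workload():
    slow_sum(50000)
    fast_sum(50000)


# Collect statistics only while the workload runs.
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Write the report into a string so it can be printed or inspected.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats()
report = stream.getvalue()
print(report)
```

Each row of the report names a function together with its call count and the time spent in it, which is what points you at the bottleneck. The profiler adds overhead of its own to every function call, so treat the numbers as relative rather than absolute.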

The following links offer additional thoughts about code optimization for Python applications:

Python Patterns - An Optimization Anecdote, essay by Guido van Rossum

http://www.python.org/doc/essays/list2str.html

Python Performance Tips, by Skip Montanaro

http://www.musi-cal.com/~skip/python/fastpython.html

Style Guide

The following guidelines are based directly on some of Guido van Rossum's ideas about how to write a Python program with style. The main quality that we need to acquire is the ability to decide when to apply these guidelines, and when it is better to be a little inconsistent and step outside the rules in order to have a more reliable implementation.

These are just suggestions. Feel free to write your code any way you want. Nothing and no one will force you to follow these rules, but you will see for yourself how practical it is to keep these guidelines in mind when coding a program.

Code Layout

Python's core definition says that we must delimit structures using indented blocks. A standard indentation consists of four spaces for each indentation level. Most of the time, you can alternatively use one tab instead of four spaces.

Try to keep each line of code under 80 characters. If you need to break a line, use parentheses, brackets, or braces to continue the code on the next line, and use a backslash only if that is not possible.
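A short sketch of these continuation rules follows. The function, the list, and the values are hypothetical, and the code assumes a current Python 3 interpreter.

```python
def total_price(unit_price, quantity, tax_rate, shipping):
    """Return the price of an order including tax and shipping."""
    # Preferred: the open parenthesis lets the expression span lines.
    return (unit_price * quantity
            + unit_price * quantity * tax_rate
            + shipping)


# Brackets and braces continue a statement in the same way.
weekdays = ["Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday"]

# A backslash also continues a line, but nothing may follow it on
# that line, not even a trailing space or a comment.
message = "Total: " + \
          str(total_price(10.0, 3, 0.5, 5.0))
print(message)
```

The bracketed forms survive reformatting and stray trailing whitespace, which is one reason to keep the backslash as a last resort.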

Blank lines are used to separate chunks of related code, such as top-level function and class definitions (two blank lines), a class definition and its first method definition (one blank line), and method definitions inside a class (one blank line). You can omit the blank lines when your definitions have just one line each.

Handling whitespace is another issue that you need to be aware of. The following are bad examples of whitespace usage:

						
lst = [ 3,4,5]          # After open parentheses, brackets or braces.
if var < 10 :           # Preceding a comma, semicolon, or colon.
xrange (7)              # Preceding the parenthesis of a function call.
car ["plate"]           # Preceding indexing or slicing brackets.
var      = 3            # Multiple whitespaces preceding an operator.

					

The next group of operators should always be preceded and followed by just one space on each side.

						
=, ==, <, >, !=, <>, <=, >=, in, not in, is, is not, and, or, not.

					

However, there is a special remark here for the = (equal) sign. Whenever it is used to indicate a keyword argument or a default parameter value, you should suppress the spaces that surround it.

						
def printvar(input=10):
    print input

printvar(input=20)    # prints 20
printvar()            # prints 10

					

Sometimes, arithmetic operators shouldn't be surrounded by spaces either. By avoiding whitespaces, you can make some expressions more readable, as you will see next.

						
var = (x+y * (w/z))

					

The previous expression resembles ((x+y) * (w/z)) when in fact it is (x+(y * (w/z))). A good way to write that would be

						
var = (x + y*(w/z))

					

Comments

If you decide to add comments to your code, remember to keep them up to date at all times. Otherwise, they become more of a hindrance than a help. Some of the basic rules for writing comments are listed next:

  • Write your comments in plain English. For large projects with members of different nationalities, English is often the common language. Of course, if no developers know English, this rule is not a good idea.

  • Capitalize the first word of sentences and phrases.

  • Omit the period at the end of short comments.

  • Never alter the case of identifiers. Remember that Python is case sensitive; thus, you should write your helper comments using the same notation used by the definition of the object that you are describing.

There are two kinds of comments: block comments and inline comments. The former applies to the code that follows it, and the latter shares a line with the code it describes. Both types require at least a single #, followed by a single space, at the beginning of each comment. When writing block comments, insert a blank line above them, and another one below each paragraph.

Be careful when using inline comments because they can clutter your code; comments are no substitute for readable code. Inline comments are best separated from the statement by at least two spaces.
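The two kinds of comments can be sketched as follows. The average() function is a hypothetical example, and the code assumes a current Python 3 interpreter.

```python
def average(values):
    """Return the arithmetic mean of values, or 0.0 if it is empty."""
    total = 0.0

    # Block comment: it applies to the code that follows it. Every
    # line starts with a single # followed by a single space.

    for value in values:
        total += value

    if not values:
        return 0.0  # Inline comment, two spaces after the statement
    return total / len(values)


print(average([2, 4, 9]))
```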

A documentation string is a special kind of comment that goes beyond the remarks that we write using the # character. All objects that accept documentation strings incorporate those strings into their structure, allowing you to later query, read, and use them (see Chapter 2, "Language Review," for details).

Documentation strings are, by convention, surrounded by triple quotes on each side. Do not word the documentation string as a description. Instead, word it as a command that states the object's action. Things that you should try to record in documentation strings include the environment variables, files, routine objective, and the syntax design of scripts, modules, functions, classes, and public methods exported by classes.

There are two types of documentation strings: one-liners and multi-line ones. The former must fit entirely on a single line, including the closing quotes, and needs no blank lines around it. A multi-line documentation string, on the other hand, consists of a single summary line followed by a block that contains a complete description of the object, with a blank line between the two. Additional lines in a documentation string do not need to be indented to match the first line (although it does look nicer if they are). Finally, it is advisable to start a new line before typing the closing quotes so that they stand on a line of their own.
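The two forms, and the way a documentation string can be queried afterward, might look like this. The functions are hypothetical examples, and the code assumes a current Python 3 interpreter.

```python
def square(x):
    """Return x multiplied by itself."""
    return x * x


def divide(numerator, denominator):
    """Return numerator divided by denominator.

    The first line is the summary; a blank line separates it from
    this longer description. The closing quotes stand on a line of
    their own.
    """
    return numerator / denominator


# Documentation strings are stored on the object and can be queried.
print(square.__doc__)
print(divide.__doc__.splitlines()[0])
```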

The following are some suggestions about what to include in the documentation strings of modules, functions, methods, and classes.

Modules should document the objects they export, such as the classes, exceptions, and functions, with a one-line summary for each one of them.

Functions and methods should document their behavior, arguments (including optional arguments and keywords), return value(s), side effects, exceptions raised, and so forth. When documenting arguments, put each one of them in a single line and separate each name from its description using two dashes. Single blank lines separate lists of methods and functions from each other.

Classes should document their public methods and instance variable properties. If the class subclasses another class, you have to mention the superclasses as well, along with the differences between both implementations. As a suggestion, use the verbs override and extend to respectively indicate that a specific method entirely replaces, or acts in addition to the superclass's own method definition. It is also recommended that when creating the documentation string for a class, you should surround it using single blank lines.
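The suggestions for functions, methods, and classes can be sketched as follows. The Account and BonusAccount classes are hypothetical, the layout is only one reading of the guidelines above, and the code assumes a current Python 3 interpreter.

```python
class Account:

    """Hold a balance that can be increased.

    Public methods:

    deposit -- add an amount to the balance

    Instance variables:

    balance -- current balance
    """

    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        """Add amount to the balance and return the new balance.

        amount -- number to add; must not be negative

        Raises ValueError if amount is negative.
        """
        if amount < 0:
            raise ValueError("amount must not be negative")
        self.balance += amount
        return self.balance


class BonusAccount(Account):

    """Account subclass that adds a fixed bonus to every deposit.

    Superclass: Account

    deposit -- extends Account.deposit by adding the bonus
    """

    bonus = 1

    def deposit(self, amount):
        """Extend Account.deposit: add amount plus the bonus."""
        return Account.deposit(self, amount + self.bonus)


account = BonusAccount(10)
print(account.deposit(5))
```

The word "extends" in the subclass documentation string signals that the superclass method still runs; "overrides" would signal that it is replaced entirely.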

Naming Styles and Conventions

When the time comes to name your objects and variables, you have a list of options to choose from. You can't just mix all the styles throughout your code because that makes a big mess. Be consistent: I suggest that you stick to a pattern and use it in every part of your code. Many styles are available, and you might already be a big fan of one of them without even knowing it. It is quite common to have different naming conventions for classes, functions, and variables (for instance, CapWords for classes, lower_case_with_underscores for functions). To give you an idea of the styles in use, here are the common case conventions:

x (single lowercase letter)

X (single uppercase letter)

lowercase

lower_case_with_underscores

UPPERCASE

UPPER_CASE_WITH_UNDERSCORES

CapitalizedWords (or CapWords)

mixedCase

Capitalized_Words_With_Underscores

The following leading/trailing underscore structures can be combined with any one of the previously listed naming styles. You can substitute the variable VAR for any other object name that you want (considering Python's rules for object naming seen in Chapter 2).

_VAR: A single leading underscore indicates that the object is meant to be used only inside the module that defines it. The from module import * statement doesn't import objects whose names start with a single leading underscore. Consequently, if you want a global variable to be visible only to the module that defines it, give its name a leading underscore.

VAR_: Append a trailing underscore to the name to avoid a naming conflict whenever you want to use a Python keyword (such as print_) as your own variable. This is just one possible way of getting around a conflict with a Python keyword.

__VAR: The double leading underscore identifies class-private names.

__VAR__: An object that has both leading and trailing double underscores is, in most cases, defined by the Python interpreter. This definition applies to both objects and attributes that work under the user namespace, which includes the __init__ method. Avoid using this type of structure when naming your own objects because it might cause name conflicts in your application as future releases of Python arrive.
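The effect of the single leading underscore on from module import * can be sketched with a throwaway module built in memory. The module name underscore_demo and the variables inside it are hypothetical, and the code assumes a current Python 3 interpreter.

```python
import sys
import types

# Build a small module in memory instead of writing a file to disk.
source = "counter = 0\n_cache = {}\nclass_ = 'economy'\n"
demo = types.ModuleType("underscore_demo")
exec(source, demo.__dict__)
sys.modules["underscore_demo"] = demo

# from ... import * copies only the names without a leading underscore
# (when the module defines no __all__ list).
namespace = {}
exec("from underscore_demo import *", namespace)
imported = sorted(name for name in namespace if name != "__builtins__")
print(imported)

# The underscore name is still reachable through an explicit reference.
print(demo._cache)
```

Note that class_, with its trailing underscore, is imported normally; only the leading underscore changes what the star import copies.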

Although there is currently no naming standard among the files that are part of Python's Standard Library, I can list some guidelines that make the task of naming new modules easier.

When creating modules, give them MixedCase or lowercase names. Use the first option when the module exports a single class or a group of related classes, and the second when the module exports a group of functions. Also, note that module names are mapped to filenames in Python. It is therefore a good idea to avoid long names (module names can be truncated on some systems), and to keep in mind that Python is case sensitive, which makes a module called MyModule.py different from a module called mymodule.py. If you have two modules, one a low-level interface written in C/C++ and the other a high-level object-oriented interface written in Python, the usual practice is to give the Python module a CapWords name (although this convention is not as widely followed). The C/C++ module, on the other hand, should be named entirely in lowercase letters with a leading underscore (this is pretty much standardized). A well-known example of this concept is the pair of modules Tkinter and _tkinter.

When writing class names, you can stick to the CapWords pattern. Although this is the convention used most of the time, you are encouraged to modify it for internal classes of modules that are not supposed to be exported. Precede the names of these classes with a leading underscore.

When working with exceptions, you have two options. Their names are usually written in lowercase letters when part of built-in modules, whereas the ones that are part of Python modules are usually written using CapitalizedWords. The main deciding factor for creating exception names is whether you expect people to normally use from ... import * or import ... in the module.

When naming functions, you are encouraged to choose one of two styles: CapWords for functions that provide major functionality (less used), and lowercase for functions that provide lesser, utility-type functionality.

When naming methods, you should stick to the CapWords style for methods that are published by an ILU interface. For all other cases, you should consider switching to lowercase. If you don't want a method to be visible to external methods or instances, put an underscore in front of its name. As you can see in Chapter 5, "Object-Oriented Programming," the same concept can be applied to certain attributes in order to make them available only to their classes. Note that this protection can easily be bypassed through the __dict__ attribute.
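How class-private names are stored, and how __dict__ exposes them, can be sketched as follows. The Engine class and its attributes are hypothetical, and the code assumes a current Python 3 interpreter.

```python
class Engine:
    def __init__(self):
        self.power = 100        # public attribute
        self._serial = "X-1"    # single underscore: internal by convention
        self.__secret = 42      # double underscore: class-private name

    def secret(self):
        # Inside the class body the short name works as written.
        return self.__secret


engine = Engine()
print(sorted(engine.__dict__))

# The class-private name is stored under a mangled key, so it can still
# be read, or changed, from outside by anyone who knows the scheme.
engine.__dict__["_Engine__secret"] = 7
print(engine.secret())
```

The underscores therefore express intent rather than enforce access control.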

More details about these concepts can be found at

Python Style Guide, by Guido van Rossum

http://www.python.org/doc/essays/styleguide.html


Last updated on 1/30/2002
Python Developer's Handbook, © 2002 Sams Publishing



© 2002, O'Reilly & Associates, Inc.