
Site Management Tools

The following Python tools help manage Web sites. They implement several functions that simplify the daily tasks of webmasters, such as dead link checking and object publishing.

WebDAV/PyDAV

WebDAV (World Wide Web Distributed Authoring and Versioning) is a set of extensions to the HTTP/1.1 protocol, which allows users to collaboratively edit, manage, and update files safely on remote Web servers. It was developed by the WebDAV working group of the Internet Engineering Task Force (IETF).

WebDAV provides a standard infrastructure for asynchronous collaborative authoring across the Internet in order to turn the Web into a collaborative environment.

WebDAV, together with its versioning and access control extensions, covers the following features: metadata management, namespace management, collections, overwrite prevention, locking (concurrency control), version management, and access control.

For more information about WebDAV, check out its Web site:

http://www.webdav.org
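WebDAV works by adding new HTTP methods, for example PROPFIND to read properties (metadata), MKCOL to create collections, and LOCK/UNLOCK for concurrency control. As a result, a client needs little more than an HTTP library that accepts arbitrary method names. The following minimal sketch, assuming Python 3 and only its standard library, builds a PROPFIND request and parses the 207 Multi-Status reply. The function names build_propfind(), parse_multistatus(), and propfind() are hypothetical, and the host you pass to propfind() must be a real WebDAV server.

```python
import http.client
import xml.etree.ElementTree as ET

# PROPFIND body asking the server for all properties of a resource.
ALLPROP_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<D:propfind xmlns:D="DAV:"><D:allprop/></D:propfind>'
).encode("utf-8")


def build_propfind(path, depth="1"):
    """Return (method, path, headers, body) for a PROPFIND request.

    Depth "0" asks about the resource itself; "1" also lists the
    members of a collection.
    """
    headers = {"Depth": depth, "Content-Type": 'text/xml; charset="utf-8"'}
    return "PROPFIND", path, headers, ALLPROP_BODY


def parse_multistatus(xml_bytes):
    """Map each <D:href> in a 207 Multi-Status body to its status lines."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for response in root.findall("{DAV:}response"):
        href = response.findtext("{DAV:}href")
        result[href] = [s.text for s in response.iter("{DAV:}status")]
    return result


def propfind(host, path):
    """Send a PROPFIND to a WebDAV server and return the parsed reply."""
    method, path, headers, body = build_propfind(path)
    conn = http.client.HTTPConnection(host)
    try:
        # http.client accepts any method name, including WebDAV's.
        conn.request(method, path, body=body, headers=headers)
        resp = conn.getresponse()
        if resp.status != 207:
            raise RuntimeError(f"unexpected status {resp.status}")
        return parse_multistatus(resp.read())
    finally:
        conn.close()
```

Separating request construction from sending keeps the protocol details testable without a server. Only propfind() touches the network.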

PyDAV is a WebDAV (also known as DAV) server implemented in Python. Check out its Web site at the following address:

http://sandbox.xerox.com/webdav/

Zebra

Zebra is an XML-based preprocessing language that offers a compact syntax for expressing common Web design patterns. Like Zope, Zebra provides a templating system that can preprocess Python code, so developers don't need to master every detail of the language before starting on a design. For more information, check out the following site:

http://zebra.sourceforge.net/

httpd_log

The HTTPD logfile reporting tool (httpd_log) is a graphical Web statistics tool that analyzes HTTP log files and generates a page of summary information, complete with statistical graphs. Richard Jones developed this tool.

Consider the newer release 4.0b1, which draws its graphs with the PIL module instead of the older GD graphics module. Although release 3.0 is very stable, the graphs produced by the new release are more accurate.

Keep in mind that you need to install the PIL module (PILGraph-0.1a7.tar.gz) to use release 4.0b1. For more information, check out:

http://starship.python.net/crew/richard/httpd_log/
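The core of a log-reporting tool like httpd_log is parsing each log line and tallying the results. The following Python 3 sketch is not httpd_log's code. It assumes the server writes the Common Log Format, and it draws a plain-text bar chart instead of the PIL graphs described above. The names summarize() and bar_chart() are hypothetical.

```python
import re
from collections import Counter

# One Common Log Format line, e.g.
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
CLF_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(lines):
    """Tally hits per path, responses per status code, and bytes sent."""
    hits, statuses, total_bytes = Counter(), Counter(), 0
    for line in lines:
        m = CLF_LINE.match(line)
        if m is None:
            continue  # skip malformed lines instead of aborting the report
        hits[m["path"]] += 1
        statuses[m["status"]] += 1
        if m["size"] != "-":  # "-" means no body was sent
            total_bytes += int(m["size"])
    return hits, statuses, total_bytes


def bar_chart(counter, width=20):
    """Render a Counter as a plain-text bar chart, largest count first."""
    rows = counter.most_common()
    if not rows:
        return ""
    peak = rows[0][1]
    return "\n".join(
        f"{key:<15} {'#' * max(1, round(width * count / peak))} {count}"
        for key, count in rows
    )
```

For example, bar_chart(summarize(open("access_log"))[0]) prints the most requested pages first. The log file name is only an example.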

Linbot

Linbot is a site management tool that analyzes a site and lets the user view a site map, check for broken internal and external links and missing images, and list other problems that were found. It downloads each page from the Web site and parses its contents to collect all the site's information. Linbot is extensible, so new tests can be added by writing some Python code.

With Linbot, webmasters can perform the following tasks periodically and without user intervention:

  • View the structure of a Web site

  • Track down broken links in Web pages

  • Find potentially outdated Web pages

  • List links pointing to external sites

  • View a portfolio of inline images

  • Get a rundown of problems sorted by author

  • Locate pages that might be slow to download

For more information about Linbot, check out the following:

http://starship.python.net/crew/marduk/linbot/
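Linbot's core step is downloading a page and parsing it to collect information. That step can be sketched with Python 3's standard html.parser module. The following minimal sketch is not Linbot's code, and PageScanner and scan_page() are hypothetical names. It sorts a single page's links into internal and external ones and lists the page's inline images, which are the raw material for two of the reports listed above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageScanner(HTMLParser):
    """Collect absolute link and image URLs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser lower-cases tag and attribute names
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def scan_page(base_url, html):
    """Split a page's links into internal/external and list its images."""
    scanner = PageScanner(base_url)
    scanner.feed(html)
    scanner.close()
    site = urlparse(base_url).netloc
    return {
        "internal": [u for u in scanner.links if urlparse(u).netloc == site],
        "external": [u for u in scanner.links if urlparse(u).netloc != site],
        "images": scanner.images,
    }
```

A full tool would then fetch each internal link in turn, scan it the same way, and record which URLs fail to load.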

Python-Friendly Internet Service Providers (ISPs)

The "Python-friendly ISPs" Web site lists hosting providers that support running CGI scripts written in Python. The list is divided into the following categories:

  • Python Installed System-Wide

  • User May Install Python in Own Directories

  • Providers with No Python Installed

  • Other Providers (Python Support Unknown)

The address is http://www.corrt.com/info/pyisp-list.html
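A quick way to confirm that a provider really runs Python CGI scripts is to upload a small test script. The following sketch assumes the host runs Python 3 and that the script is installed where the provider expects CGI programs (typically a cgi-bin directory) with execute permission. cgi_response() is a hypothetical helper, kept separate from the main block so it can be tried without a Web server.

```python
#!/usr/bin/env python3
"""Minimal CGI script for checking that a host can run Python."""
import os
import sys
from html import escape
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build a complete CGI response: header lines, a blank line, then the body."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]
    body = (
        "<html><body>"
        f"<h1>Hello, {escape(name)}!</h1>"  # escape user input before echoing it
        f"<p>Python {sys.version.split()[0]} is running this script.</p>"
        "</body></html>\n"
    )
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The Web server passes the request in environment variables
    # and sends whatever the script writes to stdout back to the browser.
    sys.stdout.write(cgi_response(os.environ))
```

Requesting the script with ?name=Webmaster should return a greeting and the Python version that the host is running.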

mxCGIPython

Instead of looking for an ISP that supports Python, you might be interested in the mxCGIPython tool, which helps you install Python on your ISP's server when the ISP either won't or can't. Marc-Andre Lemburg has put together a small Zip file that contains all the necessary setup and configuration files. For more information, check out the following:

http://starship.python.net/~lemburg/mxCGIPython.html

HTMLgen

If you need a module to help you generate HTML, check out HTMLgen, written by Robin Friedrich. It is a class library of objects corresponding to all the HTML 3.2 markup tags. Use it when you are writing in Python and want to synthesize HTML pages for a Web site, for CGI forms, and so on. The following lines show some examples of using HTMLgen:

						
>>> from HTMLgen import *
>>> print H(1, "Welcome to Python World")
<H1>Welcome to Python World</H1>
>>> print A("http://www.python.org/", "Python Web site")
<A HREF="http://www.python.org/">Python Web site</A>

HTMLgen is available for download at:

http://starship.python.net/crew/friedrich/HTMLgen/html/main.html
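HTMLgen's approach uses one Python object per HTML tag, and each object renders itself as markup. That idea can be illustrated in a few lines of modern Python 3. The classes below (Element, Heading, Link, Paragraph) are hypothetical and are not HTMLgen's real API. They only reproduce the kind of output shown in the HTMLgen examples.

```python
from html import escape


class Element:
    """Hypothetical base class: each subclass stands for one HTML tag."""
    tag = ""

    def __init__(self, *contents, **attrs):
        self.contents = contents
        self.attrs = attrs

    def __str__(self):
        # Attribute names are upper-cased to mimic the HTML 3.2 style.
        attrs = "".join(f' {k.upper()}="{escape(str(v))}"' for k, v in self.attrs.items())
        # Nested elements render themselves; plain text is escaped.
        inner = "".join(
            str(c) if isinstance(c, Element) else escape(str(c)) for c in self.contents
        )
        return f"<{self.tag}{attrs}>{inner}</{self.tag}>"


class Heading(Element):
    def __init__(self, level, *contents, **attrs):
        super().__init__(*contents, **attrs)
        self.tag = f"H{level}"


class Link(Element):
    tag = "A"

    def __init__(self, url, *contents):
        super().__init__(*contents, href=url)


class Paragraph(Element):
    tag = "P"
```

Because elements nest, str(Paragraph("See ", Link(url, "Python"))) builds a whole fragment in one call, which is what makes this style convenient for generating pages from Python data.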

Document Template

When generating HTML code, you might also consider DocumentTemplate, which offers a clear separation between Python code and HTML code. DocumentTemplate is part of the Zope object publishing system, but it can also be used independently. For more information, check out the following:

http://www.digicool.com/
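DocumentTemplate has its own DTML tag syntax, and the following sketch does not use it. Instead, it illustrates the same separation of concerns with Python 3's standard string.Template class as a stand-in. The HTML lives in template strings (in practice, separate files a designer can edit), and the Python code only supplies escaped data. The render_page() function and the template text are hypothetical.

```python
from html import escape
from string import Template

# The HTML lives in templates; the Python code below only supplies data.
PAGE = Template(
    "<html><head><title>$title</title></head>\n"
    "<body><h1>$title</h1>\n<ul>\n$items\n</ul>\n</body></html>\n"
)
ITEM = Template("<li>$name</li>")


def render_page(title, names):
    """Fill the page template; no HTML is built in the Python logic itself."""
    items = "\n".join(ITEM.substitute(name=escape(n)) for n in names)
    return PAGE.substitute(title=escape(title), items=items)
```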

Persistent CGI

The Persistent CGI architecture provides a reasonably high-performance, transparent method of publishing objects as long-running processes via the World Wide Web (WWW). The current alternatives to CGI that allow the publishing of long-running processes, such as FastCGI and ILU, have some level of Web server and platform dependency. Persistent CGI allows a long-running process to be published via the WWW on any server that supports CGI, and it requires no specific support in the published application.

Note

The latest version of Persistent CGI is bundled with the Zope software:

http://www.digicool.com/
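The idea behind Persistent CGI can be illustrated with a small sketch. A short-lived CGI stub forwards each request to a process that stays running, so the published application keeps its state between requests. The following Python 3 sketch is not Persistent CGI's actual code or wire protocol. The length-prefixed JSON framing and the names CounterApp, serve_one(), and cgi_stub() are all hypothetical.

```python
import json
import socket
import struct


def send_msg(sock, obj):
    """Send one JSON message, prefixed with its 4-byte big-endian length."""
    data = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(data)) + data)


def _recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return less than requested."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return buf


def recv_msg(sock):
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


class CounterApp:
    """Hypothetical published application; its state lives as long as the process."""

    def __init__(self):
        self.requests = 0

    def handle(self, environ):
        self.requests += 1
        return "Content-Type: text/plain\n\nrequest %d for %s\n" % (
            self.requests, environ.get("PATH_INFO", "/"))


def serve_one(app, conn):
    """Long-running side: answer one request forwarded by the CGI stub."""
    environ = recv_msg(conn)
    send_msg(conn, app.handle(environ))


def cgi_stub(conn, environ):
    """Short-lived CGI side: forward the request environment, return the reply."""
    send_msg(conn, environ)
    return recv_msg(conn)
```

In a real deployment, the stub would open its connection (for example with socket.create_connection()) to an address where the long-running process accepts connections in a loop, and it would pass os.environ plus the request body. Only the stub needs the Web server's CGI support, which is why this design works on any CGI-capable server.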

						



Webchecker

Webchecker is not a CGI application but a Web client application. The webchecker.py script is located in the Tools/webchecker/ directory of the Python source distribution. It checks the validity of a site: starting from a given Web page, it searches for bad links and keeps a record of the links that point to other sites.

It requests all pages from the Web site via HTTP. After it loads a page, it parses the HTML code and collects the links. Pages are never requested more than once. Links that point outside the original tree are treated as leaves: they are checked, but the links they contain are not followed. Finally, the script generates a report that lists all bad links and the page(s) on which each one is referenced.

The Linbot system, described earlier in this section, has similar functionality, but its checks are more extensive than Webchecker's.

Also check out the websucker module, which is part of the Tools/webchecker directory of the Python source distribution. It mirrors a remote URL locally.
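Webchecker's crawling strategy can be sketched as a breadth-first traversal: fetch each page once, follow links inside the starting tree, and only check links that leave it. This minimal Python 3 illustration is not the code of webchecker.py. The names LinkParser and check_site() are hypothetical, and fetch is a hypothetical caller-supplied hook that returns a page's HTML or None for a bad link, so the logic can be tried without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect the href values of <a> tags on one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def check_site(root, fetch):
    """Crawl from root; return {bad_url: set of pages that reference it}.

    fetch(url) must return the page's HTML text, or None when the
    URL cannot be retrieved.
    """
    seen, bad = set(), {}
    queue = deque([(root, None)])
    while queue:
        url, referrer = queue.popleft()
        if url in seen:
            # Never request a page twice; just note another referrer of a bad link.
            if url in bad and referrer:
                bad[url].add(referrer)
            continue
        seen.add(url)
        html = fetch(url)
        if html is None:
            bad[url] = {referrer} if referrer else set()
            continue
        if not url.startswith(root):
            continue  # outside the original tree: a leaf, checked but not followed
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            link, _fragment = urldefrag(urljoin(url, href))
            queue.append((link, url))
    return bad
```

A real fetch hook could wrap urllib.request.urlopen() and return None on HTTP or network errors.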

LinkChecker

Pylice, a link checker written in Python, has been renamed LinkChecker. With LinkChecker, you can check your HTML documents for broken links. The LinkChecker home page has moved to:

http://linkchecker.sourceforge.net

You can find more information at:

http://fsinfo.cs.uni-sb.de/~calvin/software/
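LinkChecker itself handles many link types and protocols. The following is only a minimal Python 3 sketch of the basic idea for a directory of static HTML files. It collects href and src values with html.parser and reports relative links whose target file does not exist. The function name broken_local_links() is hypothetical, and absolute URLs are skipped rather than fetched.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class TargetCollector(HTMLParser):
    """Collect <a href> and <img src> values from one document."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and ((tag == "a" and name == "href") or (tag == "img" and name == "src")):
                self.targets.append(value)


def broken_local_links(root_dir):
    """Return (html_file, link) pairs whose local target file is missing."""
    broken = []
    for dirpath, _dirs, files in os.walk(root_dir):
        for fname in files:
            if not fname.endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, fname)
            collector = TargetCollector()
            with open(path, encoding="utf-8") as f:
                collector.feed(f.read())
            for link in collector.targets:
                parts = urlparse(link)
                if parts.scheme or parts.netloc or not parts.path:
                    continue  # skip absolute URLs, mailto:, and pure #fragment links
                rel = unquote(parts.path)
                # "/x.html" is resolved against the site root, "x.html" against this file.
                base = root_dir if rel.startswith("/") else dirpath
                target = os.path.normpath(os.path.join(base, rel.lstrip("/")))
                if not os.path.exists(target):
                    broken.append((path, link))
    return broken
```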

FastCGI

FastCGI is a fast, open, and secure Web server interface that solves the performance problems inherent in CGI, without introducing the overhead and complexity of proprietary APIs (Application Programming Interfaces).

The FastCGI application library, which implements the FastCGI protocol and hides the protocol details from the developer, is based on code from Open Market. It is in the public domain and is fully supported by Fast Engines. This library makes writing FastCGI programs as easy as writing CGI applications.

The FastCGI interface combines the best aspects of CGI and vendor APIs. Like CGI, FastCGI applications run in separate, isolated processes. The main advantages of using FastCGI are:

  • Performance: FastCGI processes are persistent, so a new process is not created for each request.

  • Simplicity: Existing CGI applications are easily migrated to FastCGI.

  • Language independence: Like CGI, FastCGI applications can be written in any language.

  • Process isolation: A buggy FastCGI application cannot crash or corrupt the core server or other applications.

  • Non-proprietary: FastCGI was originally implemented in the Open Market Web server, but it is an open interface that other servers can implement.

  • Architecture independence: The FastCGI interface isn't tied to any particular server architecture.

  • Support for distributed computing: FastCGI provides the ability to run applications remotely.

For details about the library, check out FastCGI's official Web site at http://www.fastcgi.org/.

The following white paper explains FastCGI in more detail:

http://www.fastcgi.org/whitepapers/fcgi-whitepaper.shtml

The best place to go for Python FastCGI support is http://www.digicool.com/releases/fcgi/.

There is also an all-Python implementation of the FastCGI application interface (no extension module required) at http://starship.python.net/crew/robind/.
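The following Python 3 sketch shows part of what the FastCGI application library hides from the developer by encoding and decoding the protocol's basic framing. Every record starts with an 8-byte header (version, type, request ID, content length, padding length, reserved byte). The CGI environment travels in FCGI_PARAMS records as name-value pairs, where a length below 128 takes one byte and a longer one takes four bytes with the high bit set. This is an illustration only: the helper names are hypothetical, the UTF-8 text encoding is an assumption of the sketch, and request management (FCGI_BEGIN_REQUEST, FCGI_END_REQUEST) is omitted.

```python
import struct

FCGI_VERSION_1 = 1
FCGI_PARAMS = 4   # record type carrying the CGI environment
FCGI_STDOUT = 6   # record type carrying the application's output

# version, type, requestId, contentLength, paddingLength, reserved byte
HEADER = struct.Struct("!BBHHBx")


def encode_record(rec_type, request_id, content):
    """Frame content as one FastCGI record, padded to a multiple of 8 bytes."""
    if len(content) > 0xFFFF:
        raise ValueError("a single record holds at most 65535 bytes")
    padding = (-len(content)) % 8
    header = HEADER.pack(FCGI_VERSION_1, rec_type, request_id, len(content), padding)
    return header + content + b"\x00" * padding


def decode_record(data):
    """Split one record off the front of data: (type, request_id, content, rest)."""
    version, rec_type, request_id, length, padding = HEADER.unpack_from(data)
    if version != FCGI_VERSION_1:
        raise ValueError(f"unsupported FastCGI version {version}")
    start = HEADER.size
    content = data[start:start + length]
    return rec_type, request_id, content, data[start + length + padding:]


def _encode_length(n):
    # Lengths below 128 take one byte; longer ones take four with the high bit set.
    return bytes([n]) if n < 128 else struct.pack("!I", n | 0x80000000)


def _decode_length(data, pos):
    if data[pos] < 128:
        return data[pos], pos + 1
    (n,) = struct.unpack_from("!I", data, pos)
    return n & 0x7FFFFFFF, pos + 4


def encode_params(params):
    """Encode a dict of CGI variables as FastCGI name-value pairs (UTF-8 assumed)."""
    out = bytearray()
    for name, value in params.items():
        n, v = name.encode("utf-8"), value.encode("utf-8")
        out += _encode_length(len(n)) + _encode_length(len(v)) + n + v
    return bytes(out)


def decode_params(data):
    """Decode FastCGI name-value pairs back into a dict."""
    params, pos = {}, 0
    while pos < len(data):
        nlen, pos = _decode_length(data, pos)
        vlen, pos = _decode_length(data, pos)
        name = data[pos:pos + nlen].decode("utf-8")
        pos += nlen
        params[name] = data[pos:pos + vlen].decode("utf-8")
        pos += vlen
    return params
```

The request ID in each header is what lets one persistent FastCGI process interleave several requests over a single connection.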


Last updated on 1/30/2002
Python Developer's Handbook, © 2002 Sams Publishing

© 2002, O'Reilly & Associates, Inc.