This preview shows pages 1–3. Sign up to view the full content.
This preview has intentionally blurred sections. Sign up to view the full version.View Full Document
Unformatted text preview: Chapter 9. XML Processing 9.1. Diving in These next two chapters are about XML processing in Python. It would be helpful if you already knew what an XML document looks like, that it's made up of structured tags to form a hierarchy of elements, and so on. If this doesn't make sense to you, there are many XML tutorials (http://directory.google.com/Top/Computers/Data_Formats/Markup_Languages/XML/Resources/FAQs,_Help,_and_Tutorials/ that can explain the basics. If you're not particularly interested in XML, you should still read these chapters, which cover important topics like Python packages, Unicode, command line arguments, and how to use getattr for method dispatching. Being a philosophy major is not required, although if you have ever had the misfortune of being subjected to the writings of Immanuel Kant, you will appreciate the example program a lot more than if you majored in something useful, like computer science. There are two basic ways to work with XML. One is called SAX ("Simple API for XML"), and it works by reading the XML a little bit at a time and calling a method for each element it finds. (If you read Chapter 8, HTML Processing , this should sound familiar, because that's how the sgmllib module works.) The other is called DOM ("Document Object Model"), and it works by reading in the entire XML document at once and creating an internal representation of it using native Python classes linked in a tree structure. Python has standard modules for both kinds of parsing, but this chapter will only deal with using the DOM. The following is a complete Python program which generates pseudo-random output based on a context-free grammar defined in an XML format. Don't worry yet if you don't understand what that means; you'll examine both the program's input and its output in more depth throughout these next two chapters. Example 9.1. kgp.py If you have not already done so, you can download this and other examples (http://diveintopython.org/download/diveintopython-examples-5.4.zip) used in this book. """Kant Generator for Python Generates mock philosophy based on a context-free grammar Usage: python kgp.py [options] [source] Options:-g . .., --grammar=. .. use specified grammar file or URL-h, --help show this help-d show debugging information while parsing Examples: kgp.py generates several paragraphs of Kantian philosophy kgp.py -g husserl.xml generates several paragraphs of Husserl kpg.py "<xref id='paragraph'/>" generates a paragraph of Kant kgp.py template.xml reads from template.xml to decide what to generate """ from xml.dom import minidom import random import toolbox import sys Dive Into Python 115 import getopt _debug = 0 class NoSourceError(Exception): pass class KantGenerator: """generates mock philosophy based on a context-free grammar""" def __init__(self, grammar, source=None): self.loadGrammar(grammar) self.loadSource(source and source or self.getDefaultSource()) self.refresh() def _load(self, source): """load XML input source, return parsed XML document...
View Full Document
This document was uploaded on 08/10/2011.
- Spring '11