    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: SAX - A Simple API for XML</title>
        <link>https://members.accu.org/index.php/journals/515</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Overload Journal #34 - Oct 1999 + Programming Topics</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;SAX - A Simple API for XML</h1>
<p>
<strong>Date:</strong> Tue, 26 October 1999 17:50:55 +01:00</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e18" id="d0e18"></a></h2>
</div>
<p>XML is big business. Although it's far too early to declare
whether it really is the killer content format, its popularity
coupled with increasing support from industry heavyweights ensures
it will be around for a good while yet. If you're interested in
what XML can do for you, then read on. This is the first of three
articles on XML in applications; it introduces SAX, the Simple API
for XML. The remaining articles will introduce DOM, the Document
Object Model, and XSL, the Extensible Stylesheet Language.</p>
<p>There are two main ways to process XML for use in an
application. The first is an event-based approach, with handler
methods being fired in response to certain parsing events (for
example, a start element, some data, an error). The second approach
is to build an internal tree representation of the XML document in
order to query or traverse it.</p>
<p>The standard for the first approach is called SAX, the Simple
API for XML. The standard for the second approach is called DOM
(Document Object Model) level 1, and is a W3C recommendation. This
article will describe SAX, how it came about, and how to use it to
parse your XML documents.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e26" id="d0e26"></a>A brief history
of SAX</h2>
</div>
<p>The first XML parsers began appearing in early 1997. These early
applications mainly displayed XML documents as tree views. In late
1997 on the XML-Dev mailing list, Peter Murray-Rust (author of the
JUMBO application for viewing CML (Chemical Markup Language)
documents) insisted that parser writers should all support a common
Java event-based API. In discussions with Tim Bray (author of the
Lark parser) and David Megginson (author of Microstar's
&AElig;lfred parser), the idea for SAX was born. The design
discussion took place publicly on the XML-Dev mailing list, and
many people contributed ideas, comments, and criticisms. The first
draft of the SAX interfaces was released in January 1998, and
SAX 1.0 followed shortly afterwards, in May 1998.</p>
<p>A SAX compliant XML parser reports parsing events to the
application using callbacks on an interface implemented by the
handler class. This isolation of reporting from processing logic
enables the same SAX parser to be used with different handlers for
different purposes (e.g. validation, display, data import). XML
parser implementations using SAX have been written in Java, Python,
Perl and C++. Sun, IBM, Oracle and DataChannel/Microsoft have all
produced Java XML parsers with SAX 1.0 drivers.</p>
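<p>The separation between parser and handler described above can be sketched as follows. This is only an illustration under stated assumptions: it obtains a SAX 1.0 <tt class="classname">Parser</tt> from the JDK's JAXP factory rather than from any of the vendors named here, the SAX 1.0 types it uses are marked deprecated in current JDKs, and the class names TwoHandlers, ElementCounter and NameCollector are hypothetical.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;

@SuppressWarnings("deprecation") // SAX 1.0 types are deprecated in current JDKs
final class TwoHandlers {
    /** Hypothetical handler: counts start-element events. */
    static final class ElementCounter extends HandlerBase {
        int count = 0;
        @Override public void startElement(String name, AttributeList atts) { count++; }
    }

    /** Hypothetical handler: records element names in document order. */
    static final class NameCollector extends HandlerBase {
        final List<String> names = new ArrayList<>();
        @Override public void startElement(String name, AttributeList atts) { names.add(name); }
    }

    /** Parses the same kind of input with whichever handler the caller supplies. */
    static void parse(String xml, HandlerBase handler) {
        try {
            // Assumption: the JDK's bundled parser stands in for a vendor parser.
            Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
            parser.setDocumentHandler(handler);
            parser.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<recipe><author>Steve Cornish</author><item>2 carrots</item></recipe>";
        ElementCounter counter = new ElementCounter();
        NameCollector collector = new NameCollector();
        parse(xml, counter);   // same parsing code ...
        parse(xml, collector); // ... different processing logic
        System.out.println(counter.count + " elements: " + collector.names);
    }
}
```

<p>Neither handler knows which parser produced the events, and the parsing code does not know what the handler does with them.</p>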
<p>SAX 1.0 consists of two Java packages, <tt class=
"literal">org.xml.sax</tt> and <tt class=
"literal">org.xml.sax.helpers</tt>.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e41" id="d0e41"></a><tt class=
"literal">org.xml.sax</tt> interfaces</h2>
</div>
<p>Seven interfaces are defined in <tt class=
"literal">org.xml.sax</tt>. The following four are the most
helpful.</p>
<div class="variablelist">
<dl>
<dt><span class="term">DocumentHandler</span></dt>
<dd>
<p>This is the main interface that most SAX applications implement:
if the application needs to be informed of basic parsing events, it
implements this interface and registers an instance with the SAX
parser using the setDocumentHandler method. The parser uses the
instance to report basic document-related events like the start and
end of elements and character data.</p>
</dd>
<dt><span class="term">ErrorHandler</span></dt>
<dd>
<p>If a SAX application needs to implement customised error
handling, it must implement this interface and then register an
instance with the SAX parser using the parser's setErrorHandler
method. The parser will then report all errors and warnings through
this interface.</p>
</dd>
<dt><span class="term">DTDHandler</span></dt>
<dd>
<p>If a SAX application needs information about notations and
unparsed entities, then the application implements this interface
and registers an instance with the SAX parser using the parser's
setDTDHandler method. The parser uses the instance to report
notation and unparsed entity declarations to the application.</p>
</dd>
<dt><span class="term">Parser</span></dt>
<dd>
<p>All SAX parsers must implement this basic interface: it allows
applications to register handlers for different types of events and
to initiate a parse from a URI, or a character stream.</p>
</dd>
</dl>
</div>
<p>All SAX parsers must also implement a zero-argument constructor
(though other constructors are also allowed).</p>
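<p>As a rough sketch of handler registration, the following registers both a document handler and a custom error handler, then collects whatever the parser reports. It assumes the JDK's JAXP factory as the source of the SAX 1.0 <tt class="classname">Parser</tt>; ErrorReporting and CollectingErrorHandler are hypothetical names, and the message format is made up for the example.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

@SuppressWarnings("deprecation") // SAX 1.0 types are deprecated in current JDKs
final class ErrorReporting {
    /** Hypothetical error handler that records problems instead of printing them. */
    static final class CollectingErrorHandler implements ErrorHandler {
        final List<String> messages = new ArrayList<>();

        @Override public void warning(SAXParseException e) {
            messages.add("warning at line " + e.getLineNumber());
        }
        @Override public void error(SAXParseException e) {
            messages.add("error at line " + e.getLineNumber());
        }
        @Override public void fatalError(SAXParseException e) throws SAXException {
            messages.add("fatal error at line " + e.getLineNumber());
            throw e; // a fatal error ends the parse
        }
    }

    /** Parses the text and returns every problem reported through the error handler. */
    static List<String> problemsIn(String xml) {
        CollectingErrorHandler errors = new CollectingErrorHandler();
        try {
            Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
            parser.setDocumentHandler(new HandlerBase()); // no-op document handler
            parser.setErrorHandler(errors);               // customised error handling
            parser.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // already recorded by fatalError()
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return errors.messages;
    }

    public static void main(String[] args) {
        System.out.println(problemsIn("<recipe><author>Steve</recipe>"));
    }
}
```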
<p>SAX parsers are reusable but not re-entrant: the application may
reuse a parser object (possibly with a different input source) once
the first parse has completed successfully, but it may not invoke
the parse() methods recursively within a parse.</p>
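<p>Reuse of a parser object might look like the sketch below: one parser and one handler process several documents in turn, each parse completing before the next begins. The JAXP factory is again an assumption standing in for a vendor parser, and ParserReuse and RootNameHandler are hypothetical names.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;

@SuppressWarnings("deprecation") // SAX 1.0 types are deprecated in current JDKs
final class ParserReuse {
    /** Hypothetical handler: remembers the root element of the current document. */
    static final class RootNameHandler extends HandlerBase {
        String root;
        @Override public void startDocument() { root = null; } // reset per document
        @Override public void startElement(String name, AttributeList atts) {
            if (root == null) {
                root = name;
            }
        }
    }

    /** Parses each document with the same parser object, one after another. */
    static List<String> rootNames(String... documents) {
        List<String> roots = new ArrayList<>();
        try {
            Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
            RootNameHandler handler = new RootNameHandler();
            parser.setDocumentHandler(handler);
            for (String xml : documents) {
                // parse() has returned before the next call starts: reuse, not re-entry
                parser.parse(new InputSource(new StringReader(xml)));
                roots.add(handler.root);
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return roots;
    }

    public static void main(String[] args) {
        System.out.println(rootNames("<recipe/>", "<menu><recipe/></menu>"));
    }
}
```

<p>What the handler must not do is call <tt class="methodname">parse()</tt> on the same parser from inside one of its own callbacks.</p>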
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e80" id="d0e80"></a>org.xml.sax
classes</h2>
</div>
<div class="variablelist">
<dl>
<dt><span class="term">HandlerBase</span></dt>
<dd>
<p>This class implements the default behaviour for four SAX
interfaces: EntityResolver, DTDHandler, DocumentHandler, and
ErrorHandler.</p>
<p>Application writers can extend this class when they need to
implement only part of an interface; parser writers can instantiate
this class to provide default handlers when the application has not
supplied its own.</p>
</dd>
<dt><span class="term">InputSource</span></dt>
<dd>
<p>This class allows a SAX application to encapsulate information
about an input source in a single object, which may include a
public identifier, a system identifier, a byte stream (possibly
with a specified encoding), and/or a character stream.</p>
<p>There are two places that the application will deliver this
input source to the parser: as the argument to the Parser.parse
method, or as the return value of the EntityResolver.resolveEntity
method.</p>
</dd>
</dl>
</div>
<p>The org.xml.sax package also defines two exceptions for use with
SAX applications: SAXException and SAXParseException.</p>
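<p>To make InputSource and SAXParseException a little more concrete, here is a small sketch that wraps in-memory text in an InputSource and reports where a parse failed. The identifiers given to the InputSource are invented for the example, the parser comes from the JDK's JAXP factory (an assumption, not something the SAX packages themselves provide), and InputSourceDemo is a hypothetical class name.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

@SuppressWarnings("deprecation") // SAX 1.0 types are deprecated in current JDKs
final class InputSourceDemo {
    /** Wraps XML text in an InputSource; both identifiers are made-up examples. */
    static InputSource wrap(String xml) {
        InputSource source = new InputSource(new StringReader(xml)); // character stream
        source.setSystemId("http://example.org/recipes/stew.xml");   // hypothetical URI
        source.setPublicId("-//Example//Recipe//EN");                // hypothetical identifier
        return source;
    }

    /** Returns "ok", or the line at which the parser gave up. */
    static String check(InputSource source) {
        try {
            Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
            HandlerBase handler = new HandlerBase(); // its default fatalError() rethrows
            parser.setDocumentHandler(handler);
            parser.setErrorHandler(handler);
            parser.parse(source);
            return "ok";
        } catch (SAXParseException e) {
            // SAXParseException carries location information
            return "not well-formed at line " + e.getLineNumber();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(check(wrap("<recipe><item>2 leeks</item></recipe>")));
        System.out.println(check(wrap("<recipe><item>2 leeks</recipe>")));
    }
}
```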
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e102" id="d0e102"></a>Using SAX to
get what you want</h2>
</div>
<p>There is little point in writing your own XML parser when
there are so many out there already. It's far more productive to
reap the benefits of someone else's hard labour! Here's how you use
an existing parser:</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>Create an instance of the parser object.</p>
</li>
<li>
<p>Register your handler with the parser.</p>
</li>
<li>
<p>Wrap your input with an <tt class="classname">InputSource</tt>
object.</p>
</li>
<li>
<p>Pass the <tt class="classname">InputSource</tt> to the
<tt class="methodname">parse()</tt> method of the parser.</p>
</li>
</ul>
</div>
<p>As an example, here is how to achieve the above using the IBM
XML parser, XML4J.</p>
<pre class="programlisting">
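import org.xml.sax.*;
import com.ibm.xml.parsers.*;   // SAXParser
import java.io.*;
import org.accu.cornish.xml.EchoHandler;

// parse() and the FileInputStream constructor throw
// checked exceptions, which are passed on to the caller
public void ImportXML(File inputFile)
  throws SAXException, IOException
{
  SAXParser parser = new SAXParser();
  EchoHandler eHandler = new EchoHandler();
  parser.setDocumentHandler( eHandler );
  InputSource iStream = new InputSource(
              new FileInputStream(inputFile));
  parser.parse( iStream );
}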
</pre>
<p><span class="emphasis"><em>Listing 1 - Using the IBM XML parser
with a SAX compliant handler.</em></span></p>
<p>Simple, isn't it?</p>
<p>In the above example, EchoHandler is a handler I wrote to echo
the input file to the standard output. Here is the implementation
of it:</p>
<pre class="programlisting">
// EchoHandler.java - a SAX handler for echoing back input XML

package org.accu.cornish.xml;
import org.xml.sax.*;

public class EchoHandler extends HandlerBase
{
  protected final String spaces = &quot;   &quot;;
  protected int numspaces = 0;

  public EchoHandler()    { }

  private void spaces()
  {
    for (int i = 0; i &lt; numspaces; ++i)
    {
      System.out.print(spaces);
    }
  }

  public void startElement
    (String parm1, AttributeList parm2) 
    throws org.xml.sax.SAXException
  {
    spaces();
    System.out.println(&quot;&lt;&quot; + parm1 + &quot;&gt;&quot;);
    ++numspaces;
  }

  public void endElement(String parm1) 
    throws org.xml.sax.SAXException
  {
    --numspaces;
    spaces();
    System.out.println(&quot;&lt;/&quot; + parm1 + &quot;&gt;&quot;);
  }

  public void characters
    (char[] parm1, int parm2, int parm3) 
    throws org.xml.sax.SAXException
  {
    spaces();
    for (int i = 0; i &lt; parm3; ++i)
    {
      System.out.print(parm1[parm2 + i]);
    }
    System.out.println();
  }
}
</pre>
<p><span class="emphasis"><em>Listing 2 -
EchoHandler.java</em></span></p>
<p>The SAX API provides a class called <tt class=
"classname">HandlerBase</tt> that implements all the handler
interfaces, but provides no-op versions of all the methods. Since
the EchoHandler only needs to override a small part of the four
interfaces, I have derived EchoHandler from HandlerBase. This
handler only implements a subset of the org.xml.sax.DocumentHandler
interface, but it's enough to demonstrate how to use SAX compliant
parsers.</p>
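<p>For readers without XML4J to hand, the same four steps can be tried with the parser bundled in the JDK. The sketch below is an assumption-laden variant rather than a copy of the listings above: it gets its SAX 1.0 <tt class="classname">Parser</tt> from JAXP, and IndentingEcho is a hypothetical, cut-down relative of EchoHandler that writes to a StringBuilder and skips whitespace-only text.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;

@SuppressWarnings("deprecation") // SAX 1.0 types are deprecated in current JDKs
final class JdkEcho {
    /** Hypothetical handler: echoes elements and text, indented by nesting depth. */
    static final class IndentingEcho extends HandlerBase {
        final StringBuilder out = new StringBuilder();
        private int depth = 0;

        private void indent() {
            for (int i = 0; i < depth; i++) {
                out.append("   ");
            }
        }
        @Override public void startElement(String name, AttributeList atts) {
            indent();
            out.append('<').append(name).append(">\n");
            depth++;
        }
        @Override public void endElement(String name) {
            depth--;
            indent();
            out.append("</").append(name).append(">\n");
        }
        @Override public void characters(char[] ch, int start, int length) {
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) { // skip the whitespace between elements
                indent();
                out.append(text).append('\n');
            }
        }
    }

    static String echo(String xml) {
        try {
            // 1. create the parser (assumption: JAXP supplies it)
            Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
            // 2. register the handler
            IndentingEcho handler = new IndentingEcho();
            parser.setDocumentHandler(handler);
            // 3. wrap the input in an InputSource
            InputSource input = new InputSource(new StringReader(xml));
            // 4. parse
            parser.parse(input);
            return handler.out.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(echo("<recipe><author>Steve Cornish</author></recipe>"));
    }
}
```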
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e150" id="d0e150"></a>A better
example</h2>
</div>
<p>In the following example, we will be parsing a simple XML
document which just contains tags (elements) and data; the elements
have no attributes. We use a Document Type Definition (DTD) to
define a set of rules about the XML structure. Here is the DTD our
example XML has to conform to:</p>
<pre class="programlisting">
&lt;!ELEMENT recipe (recipe_name, author, meal, preptime, cooktime, ingredients, directions)&gt;
&lt;!ELEMENT ingredients (item)+&gt;
&lt;!ELEMENT meal (#PCDATA | course)*&gt;
&lt;!ELEMENT recipe_name (#PCDATA)&gt;
&lt;!ELEMENT author (#PCDATA)&gt;
&lt;!ELEMENT course (#PCDATA)&gt;
&lt;!ELEMENT item (#PCDATA)&gt;
&lt;!ELEMENT directions (#PCDATA)&gt;
&lt;!ELEMENT preptime (#PCDATA)&gt;
&lt;!ELEMENT cooktime (#PCDATA)&gt;
</pre>
<p><span class="emphasis"><em>Listing 3 -
Recipe.dtd</em></span></p>
<p>What does the DTD tell us?</p>
<p>Line 1 says our root element is called &quot;recipe&quot;, and has seven
sub-elements, all compulsory, which must appear in the order given.</p>
<p>Line 2 says the &quot;ingredients&quot; element has one or more &quot;item&quot;
elements.</p>
<p>Line 3 says the &quot;meal&quot; element has mixed content: it may contain
text and &quot;course&quot; sub-elements, in any order. A DTD cannot
constrain mixed content any more tightly than this.</p>
<p>The other lines show that the remaining elements contain
text.</p>
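<p>To see rules like these being enforced, a validating parser can be pointed at a document and its validity errors collected. The sketch below makes several assumptions: it uses the JDK's JAXP factory with validation switched on, it embeds a cut-down version of the recipe DTD in the document's internal subset so that no external file is needed, and DtdCheck is a hypothetical class name.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

@SuppressWarnings("deprecation") // SAX 1.0 types are deprecated in current JDKs
final class DtdCheck {
    // A reduced recipe DTD, placed in the internal subset (illustration only).
    private static final String DOCTYPE =
          "<!DOCTYPE recipe [\n"
        + "  <!ELEMENT recipe (recipe_name, ingredients)>\n"
        + "  <!ELEMENT ingredients (item)+>\n"
        + "  <!ELEMENT recipe_name (#PCDATA)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n";

    /** Returns the validity errors reported for the given recipe element. */
    static List<String> validate(String recipeXml) {
        final List<String> errors = new ArrayList<>();
        HandlerBase handler = new HandlerBase() {
            // validity errors are recoverable, so record them and carry on
            @Override public void error(SAXParseException e) {
                errors.add(e.getMessage());
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // ask for a validating parser
            Parser parser = factory.newSAXParser().getParser();
            parser.setDocumentHandler(handler);
            parser.setErrorHandler(handler);
            parser.parse(new InputSource(new StringReader(DOCTYPE + recipeXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        // "ingredients" needs at least one "item", so this should be reported
        System.out.println(validate(
            "<recipe><recipe_name>Stew</recipe_name><ingredients></ingredients></recipe>"));
    }
}
```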
<p>An example of XML conforming to this DTD is shown below:</p>
<pre class="programlisting">
&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;!DOCTYPE recipe SYSTEM &quot;recipe.dtd&quot;&gt;

&lt;recipe&gt;
  &lt;recipe_name&gt;Thick Veg Stew&lt;/recipe_name&gt;
  &lt;author&gt;Steve Cornish&lt;/author&gt;
  &lt;meal&gt;Dinner
    &lt;course&gt;Main&lt;/course&gt;
  &lt;/meal&gt;
  &lt;preptime&gt;15 minutes&lt;/preptime&gt;
  &lt;cooktime&gt;30 minutes&lt;/cooktime&gt;
  &lt;ingredients&gt;
    &lt;item&gt;2 carrots&lt;/item&gt;
    &lt;item&gt;2 parsnips&lt;/item&gt;
    &lt;item&gt;2 leeks&hellip;&lt;/item&gt;
  &lt;/ingredients&gt;
  &lt;directions&gt;Chop the vegetables into large 
                discs, etc. ... &lt;/directions&gt;
&lt;/recipe&gt;
</pre>
<p><span class="emphasis"><em>Listing 4 -
VegetableStew.xml</em></span></p>
<p>The first line of VegetableStew.xml is the XML declaration. It is
optional but recommended, and declares that the document conforms
to the XML 1.0 Standard (see <a href=
"http://www.w3.org/xml" target="_top">www.w3.org/xml</a>). The
second line declares that the rules for our &quot;recipe&quot; tag can be
found in the file &quot;recipe.dtd&quot;.</p>
<p>Our handler has to be able to extract the data from the recipe,
and populate a Java Recipe object. The public and package interface
for the Recipe class is shown here:</p>
<pre class="programlisting">
package org.accu.cornish.xml;

import java.util.Vector;
import java.io.*;

public class Recipe
{
  public Recipe()  { /* &hellip; */ }

  public void setName(String name)
  { /* &hellip; */ }
  public void setAuthor(String author)
  { /* &hellip; */ }
  public void setPreparationTime(String time)
  { /* &hellip; */ }
  public void setCookingTime(String time)
  { /* &hellip; */ }
  public void setDirections(String directions)
  { /* &hellip; */ }
  public void setMeal(String meal)
  { /* &hellip; */ }
  public void setCourse(String course)
  { /* &hellip; */ }
  public void addIngredients(String name)
  { /* &hellip; */ }
  public String toString()
  { /* &hellip; */ }
  void printSelfAsXML()
  {    /* print self as XML  */    }
}
</pre>
<p><span class="emphasis"><em>Listing 5 -
Recipe.java</em></span></p>
<p>Note that <tt class="methodname">printSelfAsXML()</tt> has no
visibility modifier - this means it is visible only within the package
<tt class="literal">org.accu.cornish.xml</tt>. This is fine by me
since I want my other classes to be able to use this method for
diagnostic purposes.</p>
<p>Now to write the handler. I think a good strategy is to run
through the source file, and store all the tag data in a HashMap.
Then after parsing, we can request the constructed Recipe object
from the handler, and the handler can create it on demand.</p>
<pre class="programlisting">
package org.accu.cornish.xml;

import org.xml.sax.*;
import java.util.HashMap;
import java.util.Stack;

public class RecipePopulator extends HandlerBase
{
  protected  HashMap  properties;
  protected  Stack  tagStack;
  private  String  currentTag;
  private  int  item_suffix = 0;

  public RecipePopulator()
  {
    properties = new HashMap();
    tagStack = new Stack();
  }

  public void startElement
    (String parm1, AttributeList parm2)
    throws SAXException
  {
    currentTag = 
      (String) tagStack.push(parm1);
    if (parm1.equals(&quot;item&quot;))
    {
       ++item_suffix;
    }
  }

  public void endElement(String parm1)
    throws SAXException
  {
    if (parm1.equals(&quot;ingredients&quot;))
    {
      item_suffix = 0;
    }
    // Stack.pop() throws on an empty stack rather than
    // returning null, so test for emptiness first
    if (tagStack.empty())
    {
      throw new SAXException(&quot; End tag without start tag: &quot; + parm1);
    }
    tagStack.pop();
    // the enclosing element, if any, is current again
    currentTag = tagStack.empty() ?
      null : (String) tagStack.peek();
  }

  public void characters
    (char[] parm1, int parm2, int parm3) 
    throws SAXException
  {
    // first, do we have a current tag?
    if (currentTag == null ||
        currentTag.equals(&quot;&quot;))
    {
      throw new SAXException(&quot;Data with no element&quot;);
    }

    // extract string and strip whitespace
    String data = new String( parm1,
                           parm2, parm3).trim();
    // ignore the whitespace between elements, which
    // would otherwise overwrite real data
    if (data.length() == 0)
    {
      return;
    }
    String keyname = currentTag;
    // if the currentTag is &quot;item&quot; 
    // add a unique suffix
    if (currentTag.equals(&quot;item&quot;))
    {
      keyname += item_suffix;
    }
    properties.put( keyname, data );
  }

  public Recipe getRecipe() 
    throws InstantiationException
  { /* create and populate Recipe object */ }
}
</pre>
<p><span class="emphasis"><em>Listing 6 -
RecipePopulator.java</em></span></p>
<p>The RecipePopulator class maintains two collections: a HashMap
of tag and data pairs, and a Stack of the tag names. Both HashMap
and Stack are defined in java.util. Because the &quot;ingredients&quot; tag
can have many &quot;item&quot; tags, an index has to be suffixed to the key
to prevent overwriting the previous items.</p>
<p>The methods <tt class="methodname">startElement()</tt> and
<tt class="methodname">endElement()</tt> maintain the value of the
current tag and any suffix values for the &quot;item&quot; tags.</p>
<p>The method <tt class="methodname">characters()</tt> does the
work of putting the key / value pairs into the HashMap.</p>
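<p>One caveat about <tt class="methodname">characters()</tt>: a SAX parser is free to deliver the text of a single element in several calls, so a handler that stores each chunk under the same key can end up keeping only the last one. A common way around this, sketched below, is to buffer the text and store it when the element ends. This is an alternative illustration rather than the RecipePopulator design; TextBuffering and TextCollector are hypothetical names, and the parser again comes from the JDK's JAXP factory.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;

@SuppressWarnings("deprecation") // SAX 1.0 types are deprecated in current JDKs
final class TextBuffering {
    /** Hypothetical handler: one text buffer per open element. */
    static final class TextCollector extends HandlerBase {
        final Map<String, String> values = new HashMap<>(); // element name -> text
        private final Deque<StringBuilder> buffers = new ArrayDeque<>();

        @Override public void startElement(String name, AttributeList atts) {
            buffers.push(new StringBuilder());
        }
        @Override public void characters(char[] ch, int start, int length) {
            if (!buffers.isEmpty()) {
                buffers.peek().append(ch, start, length); // may be called repeatedly
            }
        }
        @Override public void endElement(String name) {
            String text = buffers.pop().toString().trim();
            if (!text.isEmpty()) {
                values.put(name, text); // store once, when the element is complete
            }
        }
    }

    static Map<String, String> collect(String xml) {
        TextCollector handler = new TextCollector();
        try {
            Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
            parser.setDocumentHandler(handler);
            parser.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.values;
    }

    public static void main(String[] args) {
        System.out.println(collect(
            "<recipe><meal>Dinner <course>Main</course></meal>"
            + "<directions>Chop &amp; stir</directions></recipe>"));
    }
}
```

<p>Repeated elements such as &quot;item&quot; would still need a unique suffix or a list, exactly as in the listing above.</p>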
<p>Our <tt class="methodname">getRecipe()</tt> method has to check
that the compulsory fields of the target Recipe object exist. If
they don't, we throw an InstantiationException (java.lang). If they
do, we can get on with the work of creating the Recipe object.</p>
<pre class="programlisting">
public Recipe getRecipe() 
  throws InstantiationException
{
  // check the compulsory fields
  String author = 
    (String) properties.get( &quot;author&quot; );
  String name = 
    (String) properties.get( &quot;recipe_name&quot; );
  String prepTime = 
    (String) properties.get( &quot;preptime&quot; );
  String cookTime = 
    (String) properties.get( &quot;cooktime&quot; );
  String directions =
    (String) properties.get( &quot;directions&quot; );
  if ( author == null ||
     name == null ||
     prepTime == null ||
     cookTime == null ||
     directions == null )
  {
    throw new InstantiationException(
         &quot;Cannot create Recipe object&quot;
         + &quot; - missing elements&quot;);
  }

  // otherwise, we can carry on

  Recipe r = new Recipe();
  r.setAuthor( author );
  r.setName( name );
  r.setMeal((String) properties.get(&quot;meal&quot;));
  r.setCourse(
    (String) properties.get( &quot;course&quot; ) );
  r.setPreparationTime( prepTime );
  r.setCookingTime( cookTime );
  r.setDirections( directions );

  // now, add the ingredients
  int item_index = 1;
  String ingredient = null;

  while ((ingredient = (String) 
  properties.get(&quot;item&quot; + item_index))
                              != null)
  {
    ++item_index;
    r.addIngredients(ingredient);
  }
  return r;
}
</pre>
<p><span class="emphasis"><em>Listing 7 -
RecipePopulator.getRecipe()</em></span></p>
<p>Although this is a highly trivial example (the elements have no
attributes), it is not hard to see that a handler could be written
to populate data objects belonging to an existing application. For
example, what if the Recipe class above belonged to a recipe
catalogue application we wrote? Imagine that the only way to enter
new recipes was to fill out a GUI form by hand. Using the steps
above, we can easily provide for import of new recipes using
XML.</p>
<p>SAX is a very simple API (hence the name), but its simplicity is
also its strength. SAX parsers are best suited to processing XML
documents that only need to be read, and only need to be read once.
In the next article, I will offer a different approach to parsing
XML: using DOM, the Document Object Model.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e233" id="d0e233"></a>Free SAX
parsers</h2>
</div>
<p>There are a number of free XML Parsers that support the SAX 1.0
interface. Here are the main players:</p>
<div class="variablelist">
<dl>
<dt><span class="term">XML4J (IBM)</span></dt>
<dd>
<p><a href="http://www.alphaworks.ibm.com/formula/xml" target=
"_top">http://www.alphaworks.ibm.com/formula/xml</a></p>
</dd>
<dt><span class="term">&AElig;lfred (Microstar)</span></dt>
<dd>
<p><a href="http://www.microstar.com/aelfred.html" target=
"_top">http://www.microstar.com/aelfred.html</a></p>
</dd>
<dt><span class="term">Java Project X (Sun)</span></dt>
<dd>
<p><a href=
"http://developer.java.sun.com/developer/earlyAccess/xml/index.html"
target=
"_top">http://developer.java.sun.com/developer/earlyAccess/xml/index.html</a></p>
</dd>
<dt><span class="term">XML Parser for Java 2 (Oracle)</span></dt>
<dd>
<p><a href="http://technet.oracle.com" target=
"_top">http://technet.oracle.com</a></p>
</dd>
<dt><span class="term">XP (James Clark)</span></dt>
<dd>
<p><a href="http://www.jclark.com/xml/xp/index.html" target=
"_top">http://www.jclark.com/xml/xp/index.html</a></p>
</dd>
</dl>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e274" id="d0e274"></a>Further
References</h2>
</div>
<p>David Megginson's SAX site - <a href=
"http://www.megginson.com/SAX" target=
"_top">http://www.megginson.com/SAX</a></p>
<p>SAX online API - <a href=
"http://www.megginson.com/SAX/javadoc/packages.html" target=
"_top">http://www.megginson.com/SAX/javadoc/packages.html</a></p>
<p>The World Wide Web Consortium - <a href="http://www.w3.org/"
target="_top">http://www.w3.org/</a></p>
<p>XML-Dev mailing list - <tt class="email">&lt;<a href=
"mailto:xml-dev@ic.ac.uk">xml-dev@ic.ac.uk</a>&gt;</tt></p>
<p>A good XML site from Seybold and O'Reilly - <a href=
"http://www.xml.com" target="_top">www.xml.com</a></p>
</div>
</p>
</div>
</channel>
</rss>
