Global Java FAQ repository of java, ejb, j2ee related faq

Java XML

3) On calling DOMHelloWorld

The main method of DOMHelloWorld.java checks that the filename of the XML file has been provided as an argument.

public static void main(String[] args)
{
    if (args.length != 1)
    {
        System.out.println("usage: java DOMHelloWorld hello.xml");
        System.exit(0);
    }
    String xmlfilename = args[0];
}

4) Parsing the XML document

First we must create an instance of the vendor-specific parser. This is the same parser we imported earlier in step 2.

DOMParser xmlparser = new DOMParser();

5) Parse the xml file.

This is really easy because the parser does it for you.
All you have to do is call the parse method with the name of the xml file.
xmlparser.parse(xmlfilename);
If you look at the API documentation that comes with the Xerces parser and search for the parse method you will notice something special.
DOMParser and SAXParser are subclasses of XMLParser.
The parse method throws two exceptions, SAXException and java.io.IOException.
If you try to compile the source code without catching the exception, an error will occur (java.io.IOException must be caught).
Under the previous import statements, add the imports for the IOException and SAXException classes.
import java.io.IOException;
import org.xml.sax.SAXException;
So now we put a try/catch block around the parse method of DOMParser.
try {
    xmlparser.parse(xmlfilename);
} catch (IOException e)
{
    System.out.println("Error reading xml file: " + e.getMessage());
}
catch (SAXException e)
{
    System.out.println("Error in parsing: " + e.getMessage());
}
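
If the Xerces DOMParser class is not on your classpath, the same parse-and-catch pattern can be tried with the JAXP DocumentBuilder that ships with the JDK. Note that this swaps in a different API from the DOMParser used in this tutorial. The class name JaxpParseSketch and the helper tryParse are made up for illustration, and the XML is parsed from a string rather than a file so that the sketch runs on its own.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class JaxpParseSketch {

    // Hypothetical helper: returns the parsed Document, or null if parsing failed.
    static Document tryParse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Parse from an in-memory string so no file on disk is needed.
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException e) {
            System.out.println("Could not create parser: " + e.getMessage());
        } catch (IOException e) {
            System.out.println("Error reading xml: " + e.getMessage());
        } catch (SAXException e) {
            // Thrown when the document is not well-formed.
            System.out.println("Error in parsing: " + e.getMessage());
        }
        return null;
    }

    public static void main(String[] args) {
        Document ok = tryParse("<greeting>Hello World</greeting>");
        System.out.println(ok.getDocumentElement().getNodeName());

        // The closing tag is missing, so this call takes the SAXException branch.
        Document bad = tryParse("<greeting>Hello World");
        System.out.println(bad == null);
    }
}
```

As with DOMParser, both the I/O failure and the parsing failure have to be handled before the code will compile.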

6) Accessing the DOM tree

As DOM creates a tree-based structure based on the XML file, we will need to access the information stored in the tree.
To access the tree, call getDocument() on the parser; it returns a Document.

Document doc = xmlparser.getDocument();

This Document represents the entire XML document. The data we need to access is stored in nodes. These nodes can have child nodes.
Therefore, what you need to do next is walk through the tree structure (the Document) and display the data stored in the nodes.
Now the fun starts!

7) Walking the nodes

Next we will write a method to display the data in a node. The method will be called displayNode and will take one parameter, the start node.

This method will start from the first node and walk through all the nodes in the XML using recursion.

The nodes in the XML document are of different node types.
The node types can be divided into two broad categories: structural nodes and content nodes.
Structural nodes are not actually part of the content in the document but are used to provide syntax structure.

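The following is a list of the different types of nodes, as defined by the constants of the org.w3c.dom.Node interface:

ELEMENT_NODE
ATTRIBUTE_NODE
TEXT_NODE
CDATA_SECTION_NODE
ENTITY_REFERENCE_NODE
ENTITY_NODE
PROCESSING_INSTRUCTION_NODE
COMMENT_NODE
DOCUMENT_NODE
DOCUMENT_TYPE_NODE
DOCUMENT_FRAGMENT_NODE
NOTATION_NODE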

The main nodes we are interested in are DOCUMENT_NODE, ELEMENT_NODE and ATTRIBUTE_NODE.
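
The recursive walk described above might look roughly like the following. This is only a sketch, not the tutorial's own displayNode. It uses the JDK's JAXP parser instead of Xerces so that it runs on its own, it also handles TEXT_NODE so that element content shows up, and it adds a StringBuilder parameter so that the output can be inspected. The class name DisplayNodeSketch and the helper render are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DisplayNodeSketch {

    // One-parameter form as described in the text: prints the tree under 'start'.
    static void displayNode(Node start) {
        StringBuilder out = new StringBuilder();
        displayNode(start, out);
        System.out.println(out);
    }

    // Recursively appends a description of 'node' and its descendants to 'out'.
    static void displayNode(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                // The Document itself: descend into its root element.
                displayNode(((Document) node).getDocumentElement(), out);
                break;
            case Node.ELEMENT_NODE:
                out.append("<").append(node.getNodeName());
                // Attributes are not child nodes; they are reached via getAttributes().
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    displayNode(attrs.item(i), out);
                }
                out.append(">");
                NodeList children = node.getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    displayNode(children.item(i), out);
                }
                out.append("</").append(node.getNodeName()).append(">");
                break;
            case Node.ATTRIBUTE_NODE:
                out.append(" ").append(node.getNodeName())
                   .append("=\"").append(node.getNodeValue()).append("\"");
                break;
            case Node.TEXT_NODE:
                out.append(node.getNodeValue());
                break;
            default:
                // Comments, processing instructions and so on are skipped here.
                break;
        }
    }

    // Hypothetical convenience wrapper: parse a string and return the rendered tree.
    static String render(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            displayNode(doc, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("<greeting lang=\"en\">Hello <b>World</b></greeting>"));
    }
}
```

The sketch does not escape special characters and does not promise any particular attribute order, so treat it as a way to see the tree, not as a serializer.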

Should I use DOM or SAX?
If your document is really large, or you only need to extract a few elements from the whole document, then using DOM is not practical.
SAX looks for startElement events in which the element name is verse, then looks at each character event for the name. SAX is more efficient because we only care about a few elements in the document.

SAX is good when:

1) You only need to go through a document once
2) You’re only looking for a few items
3) You’re not too concerned with document structure or the context of the elements you’re looking for
4) You don’t have much memory
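
To make the SAX approach concrete, here is a minimal sketch that reacts only to verse elements and collects their text in a single pass. It uses the SAX parser bundled with the JDK. The class names SaxVerseSketch and VerseHandler, the helper findVerses, and the sample poem document are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxVerseSketch {

    // Collects the text of every <verse> element; all other events are ignored.
    static class VerseHandler extends DefaultHandler {
        final List<String> verses = new ArrayList<>();
        private StringBuilder current; // non-null only while inside a <verse>

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if ("verse".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so append instead of assigning.
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("verse".equals(qName) && current != null) {
                verses.add(current.toString());
                current = null;
            }
        }
    }

    // Hypothetical helper: runs one SAX pass over the XML string.
    static List<String> findVerses(String xml) {
        try {
            VerseHandler handler = new VerseHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.verses;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<poem><title>Sample</title>"
                + "<verse>first line</verse><verse>second line</verse></poem>";
        System.out.println(findVerses(xml));
    }
}
```

No tree is built here: once an event has been handled, the parser keeps nothing about it, which is why memory use stays low.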

DOM builds Java objects for everything in the document, then walks through the tree looking for the name. Using DOM, we create a lot of Java objects we never use, and we go through the document twice (once to parse it and once to search it).

DOM is good when:

1) You need to go through a document more than once
2) You need to manipulate lots of things in the document
3) The structure of the document and the context of the elements is a concern
4) Memory is not an issue
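
For comparison, a DOM version of the same kind of search could be sketched as follows: the whole document is parsed into a tree first, and the tree is then searched for verse elements. It uses the JDK's JAXP parser. The class name DomVerseSketch, the helper findVerses, and the sample poem document are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomVerseSketch {

    // Hypothetical helper: builds the full tree, then searches it for <verse> elements.
    static List<String> findVerses(String xml) {
        try {
            // First pass: the parser turns the whole document into objects.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Second pass: search the finished tree.
            NodeList nodes = doc.getElementsByTagName("verse");
            List<String> verses = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                verses.add(nodes.item(i).getTextContent());
            }
            return verses;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<poem><title>Sample</title>"
                + "<verse>first line</verse><verse>second line</verse></poem>";
        System.out.println(findVerses(xml));
    }
}
```

Because the Document stays in memory, it can be searched again or modified afterwards without re-reading the XML, at the cost of holding every node, including the ones that are never used.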
