Saturday, May 2, 2009

On XML Document Type Declarations

A Document Type Declaration (DOCTYPE) can be placed at the top of your custom XML file to embed a Document Type Definition (DTD), which describes the layout the XML file is expected to follow. Think of it as a smaller, simpler counterpart to an XSD schema that can live inside the XML file itself. The XmlDocument.GetElementById method doesn't work unless the XML file has a DTD, because the DTD is what specifies which attribute of an XML element is that element's "ID" attribute. For starters, let's take a look at the following XML file:


<?xml version="1.0" encoding="utf-8" ?>
<widgets>
<widget id="one">One</widget>
<widget id="two">Two</widget>
<widget id="three">Three</widget>
</widgets>


Now, this XML file obviously represents a series of widgets. I might think that I can easily access a particular widget within the XML file with the GetElementById method, simply by running the following code:


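using System;
using System.IO;
using System.Reflection;
using System.Xml;

// Locate XMLFile1.XML in the same folder as the running executable.
string location = Assembly.GetExecutingAssembly().Location;
string directory = Path.GetDirectoryName(location);
string filename = Path.Combine(directory, "XMLFile1.XML");
XmlDocument doc = new XmlDocument();
doc.Load(filename);
// Attempt to look up the widget whose ID is "two".
XmlElement element = doc.GetElementById("two");
Console.WriteLine(element.InnerText);
Console.ReadLine();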


If all goes well, I should see "Two" printed out on the screen, correct? No.

Just because each widget has an attribute called "id" doesn't mean that this attribute is treated as the widget's actual ID. If I run this code against the XML file as listed above, GetElementById returns null, and the call to element.InnerText then throws a NullReferenceException. GetElementById only recognizes attributes that the document's DTD declares to be of type ID, and this file has no DTD at all. Without going into an immense amount of detail, I'll need to add a DTD to my XML file so that the method can find the XML element I want to return. Adding the declaration produces the following XML file:


<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE widgets [
<!ELEMENT widgets (widget)*>
<!ELEMENT widget (#PCDATA)>
<!ATTLIST widget id ID #IMPLIED >
]>
<widgets>
<widget id="one">One</widget>
<widget id="two">Two</widget>
<widget id="three">Three</widget>
</widgets>


This declaration, located at the top of the file just below the XML declaration (the <?xml ... ?> line), specifies a few things:

1. The document type is named "widgets", so the root element of the document must be called "widgets".
2. The "widgets" element may contain zero or more child elements called "widget".
3. Each "widget" element contains parsed character data (#PCDATA), that is, text but no child elements.
4. The "id" attribute of each "widget" element is of type ID, and because it is declared #IMPLIED, the attribute is optional.
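To see those rules enforced rather than just described, here is a minimal C# sketch that reads a document through XmlReader with DTD validation turned on. It assumes a recent .NET SDK (C# top-level statements); the ValidateAgainstDtd helper and the two sample documents are hypothetical illustrations, and the exact wording of the validation messages may vary between .NET versions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// The internal DTD subset from the widgets file above.
const string widgetsDtd =
    "<!DOCTYPE widgets [" +
    "<!ELEMENT widgets (widget)*>" +
    "<!ELEMENT widget (#PCDATA)>" +
    "<!ATTLIST widget id ID #IMPLIED>" +
    "]>";

// Hypothetical helper: reads the whole document with DTD validation enabled
// and collects every validation message instead of throwing on the first one.
static List<string> ValidateAgainstDtd(string xml)
{
    var messages = new List<string>();
    var settings = new XmlReaderSettings
    {
        DtdProcessing = DtdProcessing.Parse,  // XmlReaderSettings prohibits DTDs by default
        ValidationType = ValidationType.DTD   // validate content against the DTD
    };
    settings.ValidationEventHandler += (sender, e) => messages.Add(e.Message);

    using XmlReader reader = XmlReader.Create(new StringReader(xml), settings);
    while (reader.Read()) { }  // validation happens as the reader advances
    return messages;
}

// Follows the DTD: two widgets, and the second omits "id" (allowed, it is #IMPLIED).
string valid = widgetsDtd +
    "<widgets><widget id=\"one\">One</widget><widget>Two</widget></widgets>";

// Breaks the DTD: two widgets share the ID value "one", and <gadget> is undeclared.
string invalid = widgetsDtd +
    "<widgets><widget id=\"one\">One</widget><widget id=\"one\">Uno</widget><gadget/></widgets>";

Console.WriteLine($"valid: {ValidateAgainstDtd(valid).Count} message(s)");
foreach (string message in ValidateAgainstDtd(invalid))
{
    Console.WriteLine(message);
}
```

Attaching a ValidationEventHandler makes the reader report problems through the callback rather than throwing, which is why the invalid document can list more than one message.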

After changing the XML file to this new format and rerunning the original GetElementById code, the program produces the expected output, and "Two" is displayed on the screen.
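For a quick check that doesn't depend on an XMLFile1.XML sitting next to the executable, the sketch below loads both versions of the widgets document from strings with XmlDocument.LoadXml and looks up "two" in each. It assumes a recent .NET SDK (C# top-level statements); FindWidgetText is a hypothetical helper, and it uses the null-conditional operator so that a missing ID yields null instead of a NullReferenceException.

```csharp
using System;
using System.Xml;

// The widgets from this post, without any DTD.
const string widgetsXml =
    "<widgets>" +
    "<widget id=\"one\">One</widget>" +
    "<widget id=\"two\">Two</widget>" +
    "<widget id=\"three\">Three</widget>" +
    "</widgets>";

// The internal DTD subset that marks "id" as an ID-typed attribute.
const string widgetsDtd =
    "<!DOCTYPE widgets [" +
    "<!ELEMENT widgets (widget)*>" +
    "<!ELEMENT widget (#PCDATA)>" +
    "<!ATTLIST widget id ID #IMPLIED>" +
    "]>";

// Hypothetical helper: loads a document from a string and looks up an ID.
// Returns the element's text, or null when GetElementById finds nothing.
static string? FindWidgetText(string xml, string id)
{
    var doc = new XmlDocument();
    doc.LoadXml(xml);  // the reader LoadXml uses parses an inline DTD by default
    XmlElement? element = doc.GetElementById(id);
    return element?.InnerText;  // null-safe: no NullReferenceException
}

Console.WriteLine(FindWidgetText(widgetsXml, "two") ?? "(null: no DTD)");
Console.WriteLine(FindWidgetText(widgetsDtd + widgetsXml, "two") ?? "(null)");
```

The first lookup should come back null, since nothing declares "id" as an ID attribute; the second, with the DTD prepended, should return "Two".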