Wednesday, June 24, 2009

How to Load an XmlDocument and Completely Ignore DTD

A question came up on the Forums today from someone who wanted to ignore the DOCTYPE declaration while loading an XML file into an XmlDocument instance, without first reading the whole file and using something like Regex to strip the declaration out. In other words, he was looking for a fast-performing solution.

The XmlDocument class loads XML via the Load and LoadXml methods, nearly all of which ultimately wrap their input in an XmlTextReader before reading the XML. The one exception is the Load overload that accepts an XmlReader.

More importantly, it's the XmlReader, not the XmlDocument, that resolves the DTD referenced by the DOCTYPE declaration. It does this with the XmlResolver assigned to the XmlReaderSettings.XmlResolver property.

To solve this issue, create an instance of XmlReaderSettings and allow DTD processing by setting ProhibitDtd to false, then remove the XmlReader's ability to resolve the address specified in the DOCTYPE declaration by setting the XmlResolver property to null. After doing this, you can safely create an XmlReader and pass it to the Load method of the XmlDocument. The XmlDocument will then load the specified XML file without fetching the DTD or validating the document against it.

The following code assumes you have your XML file loaded into a Stream named "xmlStream".

// Create an XmlReaderSettings object.  
XmlReaderSettings settings = new XmlReaderSettings();

// Set XmlResolver to null, and ProhibitDtd to false.
settings.XmlResolver = null;
settings.ProhibitDtd = false;

// Now, create an XmlReader. This is a forward-only, text-based
// reader of XML. Passing in the settings ensures that the DTD is
// never resolved, so no validation is performed.
XmlReader reader = XmlReader.Create(xmlStream, settings);

// Create your document, and load it from the reader.
XmlDocument doc = new XmlDocument();
doc.Load(reader);
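
For readers on .NET Framework 4.0 or later, where ProhibitDtd is marked obsolete in favor of the XmlReaderSettings.DtdProcessing property, the same approach can be sketched as below. This is a minimal sketch rather than a definitive implementation: the DtdFreeLoader class and its Load method are hypothetical names introduced only for illustration, and it assumes that DtdProcessing.Parse is the appropriate replacement for ProhibitDtd = false in your framework version.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the original post or of the framework.
public static class DtdFreeLoader
{
    // Loads XML from a stream. The DOCTYPE declaration is parsed,
    // but the external DTD it points to is never resolved.
    public static XmlDocument Load(Stream xmlStream)
    {
        XmlReaderSettings settings = new XmlReaderSettings();

        // Assumption: on .NET 4.0+ this replaces ProhibitDtd = false.
        settings.DtdProcessing = DtdProcessing.Parse;

        // With no resolver, the reader cannot fetch the address
        // given in the DOCTYPE declaration.
        settings.XmlResolver = null;

        // Dispose the reader once the document has been loaded.
        using (XmlReader reader = XmlReader.Create(xmlStream, settings))
        {
            XmlDocument doc = new XmlDocument();
            doc.Load(reader);
            return doc;
        }
    }
}
```

Calling DtdFreeLoader.Load(xmlStream) then returns the loaded document. If you don't need the DOCTYPE preserved at all, setting DtdProcessing to DtdProcessing.Ignore may be simpler, though the declaration is then lost when the document is written back out.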