Sunday, September 12, 2010

XML validation using a DTD and Versions

In a previous post, I investigated how we could validate an XML file in revTalk by using a DTD. As we all know, requirements evolve and our software needs to adapt accordingly, especially at the integration end-points. So if we need to accept more data, we should (a) know that it's coming, and (b) make sure it's there before we process the incoming data.

How do we do that? Well, as we've seen before, we can check the structure of the XML file. To make sure we check it against the right definition, we should introduce a specification version for our XML structure and attach it to the root node of the document as an attribute. Then all we need to do is read the root node's SpecificationVersion attribute and apply the matching DTD.
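
To make this concrete, a version 1.1 document could look something like the sketch below. Only the RootNode element and its SpecificationVersion attribute are taken from the script in this post; the Name and Email children are made-up placeholders, with Email standing in for the extra data that a newer specification version might require.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sample: Name and Email are illustrative element names -->
<RootNode SpecificationVersion="1.1">
  <Name>Jane Doe</Name>
  <Email>jane@example.com</Email>
</RootNode>
```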

Doesn't sound too complicated, does it? Let's expand our current stack design a bit so that it looks like this:


As you can see, the field "DTD" was renamed to "DTD 1.0" and another field "DTD 1.1" was added to hold the DTD for specification version 1.1; finally, I moved down the "Validate" button and modified its script:

constant kMaxVersion = 1.1

on mouseUp
   -- Load the XML into a local variable
   local tXmlText
   put field "XML" into tXmlText
   -- Parse the XML text into a Tree
   -- (arguments after the text: dontParseBadData, createTree, sendMessages)
   local tXmlTree
   put revCreateXmlTree(tXmlText, \
         false, \
         true, \
         false) \
         into tXmlTree
   if tXmlTree is not an integer then
      answer error \
            "There is an error in the XML structure" & return & \
            tXmlTree -- contains the full error message
   else
      -- Validate the root node
      local tRootNode
      put revXmlRootNode(tXmlTree) into tRootNode
      if tRootNode is not "RootNode" then
         answer error \
               "The XML root node should be 'RootNode'"
      else
         -- Validate the SpecVersion
         local tSpecVersion
         put revXmlAttribute(tXmlTree, tRootNode, \
               "SpecificationVersion") into tSpecVersion
         if tSpecVersion begins with "xmlerr" then
            answer error \
                  "The SpecificationVersion is missing"
         else if tSpecVersion > kMaxVersion then
            -- && already inserts a space, so the literals carry none
            answer error \
                  "The SpecificationVersion" && tSpecVersion && \
                  "is newer than" && kMaxVersion
         else
            -- Load the corresponding DTD
            local tDtdText
            if tSpecVersion is 1.0 then
               put field "DTD 1.0" into tDtdText
            else
               put field "DTD 1.1" into tDtdText
            end if
            -- Validate the XML against the DTD
            local tValidationResult
            put revXmlValidateDTD(tXmlTree, tDtdText) \
                  into tValidationResult
            if tValidationResult is not empty then
               answer error \
                     "The XML structure does not conform" & \
                     return & tValidationResult
            else
               answer information "The XML conforms to the DTD"
            end if
         end if
      end if
      -- Cleanup
      revDeleteXmlTree tXmlTree
   end if
end mouseUp
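
The script reads its two DTDs from the fields "DTD 1.0" and "DTD 1.1". As a rough sketch of what those fields might hold, here is one possible pair, assuming each field contains bare declarations as they would appear in an external .dtd file. The Name and Email elements are hypothetical; the point is only that version 1.1 requires one element more than version 1.0.

```xml
<!-- Hypothetical content of field "DTD 1.0" -->
<!ELEMENT RootNode (Name)>
<!ATTLIST RootNode SpecificationVersion CDATA #REQUIRED>
<!ELEMENT Name (#PCDATA)>

<!-- Hypothetical content of field "DTD 1.1": Email is now mandatory -->
<!ELEMENT RootNode (Name, Email)>
<!ATTLIST RootNode SpecificationVersion CDATA #REQUIRED>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Email (#PCDATA)>
```

With DTDs shaped like these, a document that carries an Email element would conform to 1.1 but not to 1.0, since the 1.0 content model allows only Name.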


So how does this new version work?
- first, it parses the XML document
- next, it verifies the root node
- next, it checks the specification version
- next, it loads the appropriate DTD
- finally, it validates the XML against the DTD
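
One thing to watch in the DTD selection step: the script uses DTD 1.1 for every version that is not 1.0 and not greater than kMaxVersion, so a value such as 0.9 would quietly be validated against DTD 1.1. A stricter variant could map each known version to its field explicitly. The following is only a sketch; dtdFieldForVersion is a hypothetical helper name, not part of the original stack.

```livecode
-- Hypothetical helper: returns the name of the field that holds the DTD
-- for the given specification version, or empty for an unknown version.
function dtdFieldForVersion pSpecVersion
   switch pSpecVersion
      case "1.0"
         return "DTD 1.0"
      case "1.1"
         return "DTD 1.1"
      default
         return empty
   end switch
end function

-- Possible use inside mouseUp, in place of the if/else that loads the DTD
-- (tSpecVersion and tDtdText are the locals from the script above):
--    local tDtdField
--    put dtdFieldForVersion(tSpecVersion) into tDtdField
--    if tDtdField is empty then
--       answer error "Unsupported SpecificationVersion:" && tSpecVersion
--    else
--       put field tDtdField into tDtdText
--       -- continue with revXmlValidateDTD as before
--    end if
```

Adding a future version then means adding one case and one field, and anything unrecognised is rejected rather than validated against the wrong DTD.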

If we test it, it correctly informs us that the XML document conforms to our specification version 1.1. What happens if we change the SpecificationVersion to 1.2?


Then the version check rejects the document, because 1.2 is newer than kMaxVersion, and we get this error message:


And finally, what happens if we change the SpecificationVersion to the original 1.0?


Then the version check passes, but validation against the 1.0 DTD fails, and we get this error message:


This is a much safer way to check incoming data in XML format. Unfortunately, a Document Type Definition is quite a limited means of XML validation. So next time, we'll improve our defenses again by incorporating XML Schemas.
