Tuesday, September 7, 2010

XML validation using a DTD

One of the strengths of the Extensible Markup Language (XML) is that it structures data in a human-readable format, in a cross-platform and technology-independent way. No wonder it is widely used to exchange data between applications, and forms the foundation for XML-RPC, SOAP and other Web Service methods.

But it would be naive to think that every XML document we receive is not only well-formed, but also in the format we expect, with the right elements and attributes. In this post, we'll examine a strategy to validate incoming XML data in our revTalk application, using a Document Type Definition (DTD).

Part of the XML specification since the very start, a DTD describes the structure of the XML elements and attributes. For more information, I advise you to study the excellent introductory tutorials on W3Schools.com. We're here to use it from revTalk, so let's start by creating a new stack for the user interface.


Simply drop two scrolling fields onto it, name them "XML" and "DTD" respectively, and then group each of them separately so we can put a nice group label on top (I have the memory of a goldfish, so I might forget which is which ;-) ). Finally, drop a button at the bottom of the stack and set its name to "Validate", and now we're ready to start scripting the button.

First things first, we need to parse the XML text into an XML tree to use all the rev XML commands and functions.

on mouseUp
   -- Load DTD and XML into local variables
   local tDtdText, tXmlText
   put field "DTD" into tDtdText
   put field "XML" into tXmlText
   -- Parse the XML text into a Tree
   local tXmlTree
   put revCreateXmlTree(tXmlText, \
         false, \ -- must be well-formed
         true, \ -- create a tree in memory
         false) \ -- no SAX parser messages
         into tXmlTree
   if tXmlTree is not an integer then
      answer error \
            "There is an error in the XML structure" & return & \
            tXmlTree -- contains the full error message
   else
      -- Clean up resources
      revDeleteXmlTree tXmlTree
   end if
end mouseUp

We use the revCreateXmlTree function to parse the XML text into a tree structure. If the XML text is not well-formed, we report the error; otherwise we know we have a well-formed XML tree at our disposal, which we need to clean up once we're done, using the revDeleteXmlTree command. Now that we have the basics covered, we can add the DTD validation to our script.

on mouseUp
   -- Load DTD and XML into local variables
   local tDtdText, tXmlText
   put field "DTD" into tDtdText
   put field "XML" into tXmlText
   -- Parse the XML text into a Tree
   local tXmlTree
   put revCreateXmlTree(tXmlText, \
         false, \ -- must be well-formed
         true, \ -- create a tree in memory
         false) \ -- no SAX parser messages
         into tXmlTree
   if tXmlTree is not an integer then
      answer error \
            "There is an error in the XML structure" & return & \
            tXmlTree -- contains the full error message
   else
      -- Validate the XML against the DTD
      local tValidationResult
      put revXmlValidateDTD(tXmlTree, tDtdText) \
            into tValidationResult
      if tValidationResult is not empty then
         answer error \
               "XML structure does not conform to the DTD" & return & \
               tValidationResult -- contains the full error message
      else
         answer information "The XML conforms to the DTD"
      end if
      -- Clean up resources
      revDeleteXmlTree tXmlTree
   end if
end mouseUp

If the XML conforms to the DTD, the revXmlValidateDTD function will return empty; otherwise its output contains the validation error. Pretty straightforward, so let's test this with a simple XML and DTD:
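The pair below is one hypothetical example of such a test. The note, to, from and body element names are invented for illustration, and the DTD is written as bare declarations on the assumption that the "DTD" field holds the kind of text an external .dtd file would contain.

Text for the "DTD" field:

```
<!-- Hypothetical DTD: a note holds exactly one to, from and body, in that order -->
<!ELEMENT note (to, from, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT body (#PCDATA)>
```

Text for the "XML" field:

```
<?xml version="1.0"?>
<!-- Hypothetical document that follows the DTD above -->
<note>
  <to>Alice</to>
  <from>Bob</from>
  <body>Lunch on Friday?</body>
</note>
```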


When we click the 'Validate' button, we get the message that the XML conforms to the DTD. Exactly what we were hoping for. Now let's change the XML somewhat to see if it fails when our XML text clearly does not conform to the DTD.
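One hypothetical way to break conformance is to leave out a required child element. The sketch below assumes a DTD that declares a note element containing exactly to, from and body, in that order (names invented for illustration). The document is still well-formed, so it should parse into a tree, but it omits the body element that such a DTD requires.

```
<?xml version="1.0"?>
<!-- Assumes a DTD declaring <!ELEMENT note (to, from, body)> -->
<!-- Well-formed, but the required body element is missing -->
<note>
  <to>Alice</to>
  <from>Bob</from>
</note>
```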


And here's the error message that we get on our screen:


With very little scripting, we have added a first layer of defense against incoming XML data that is not up to our specifications. Next time, we'll elaborate on this example and bolster our defenses.
