Showing posts with label xml. Show all posts

Friday, December 31, 2010

XML validation using a Schema

In earlier posts, I examined how we can use the built-in revXML library in LiveCode to validate XML data against a DTD, later refining it with a version check to match evolving requirements. Unfortunately, a Document Type Definition is quite a limited way of XML validation. So this time, we'll improve our defenses again, by incorporating XML Schemas.

Whereas a DTD is limited to defining the basic structure of the XML in terms of elements and attributes, XML Schemas allow you to define validation on the actual content of the elements and attributes. So you can be sure that an element defined as "xs:date" is actually a valid date, or that an attribute defined as "xs:positiveInteger" is actually a positive integer, etc. A full explanation of XML Schemas is beyond the scope of this post, but you'll find plenty of information around the web; a good first stop is this W3Schools XML Schema tutorial.
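To make that concrete, here's a minimal sketch in Java (the language we'll end up using further down). The schema, the element and attribute names, and the TypedContentDemo class are all made up for this illustration; the only point is that the validator rejects content that doesn't match the declared type.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

import org.xml.sax.SAXException;

class TypedContentDemo {
    // Hypothetical schema: an Order with a typed element and a typed attribute.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='Order'>"
        + "<xs:complexType>"
        + "<xs:sequence>"
        + "<xs:element name='ShipDate' type='xs:date'/>"
        + "</xs:sequence>"
        + "<xs:attribute name='Quantity' type='xs:positiveInteger' use='required'/>"
        + "</xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    static Schema loadSchema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The demo schema itself is broken", e);
        }
    }

    // Returns true when the XML text validates against the demo schema.
    static boolean isValid(String xml) {
        try {
            loadSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed, or content does not match the declared types
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A proper ISO date and a positive quantity: prints true
        System.out.println(isValid("<Order Quantity='3'><ShipDate>2010-12-31</ShipDate></Order>"));
        // Not an xs:date: prints false
        System.out.println(isValid("<Order Quantity='3'><ShipDate>31/12/2010</ShipDate></Order>"));
        // Zero is not an xs:positiveInteger: prints false
        System.out.println(isValid("<Order Quantity='0'><ShipDate>2010-12-31</ShipDate></Order>"));
    }
}
```

A DTD could only tell us that ShipDate and Quantity are present; it has no way of rejecting the second and third documents.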

This all sounds very good, but here's the rub: the revXML library offers no built-in support for XML Schemas. So yet again we turn to Java, with its built-in XML Validation API. We can easily execute Java code using LiveCode's shell function - so let's start by writing the XmlValidateSchema class:
import java.io.File;
import java.io.IOException;

import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class XmlValidateSchema {
    public static void main(String[] args) throws SAXException, ParserConfigurationException, IOException {
        final File xmlFile = new File(args[0]);
        final File xsdFile = new File(args[1]);
        // Load the XML file
        final DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        // The schema below uses a target namespace, so the DOM must be namespace-aware;
        // current JDKs otherwise fail to find the declaration of the root element
        docBuilderFactory.setNamespaceAware(true);
        final DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        final Document document = docBuilder.parse(xmlFile);
        // Load the XSD file
        final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        final Schema schema = schemaFactory.newSchema(xsdFile);
        final Validator validator = schema.newValidator();
        // Validate the XML document against the XSD schema
        final Source source = new DOMSource(document);
        validator.validate(source);
    }
}

In keeping with earlier Java examples, the code is a bit lazy when it comes to exception handling: if any exception is thrown, it will end up in the output of our shell function call. The only important thing to remember is that the first parameter is the XML file, and the second is the XML Schema Definition (XSD) file.
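If a raw stack trace is more than we want to show the user, one option is to register an ErrorHandler and collect one short line per problem. What follows is only a sketch of that idea: the XmlValidateSchemaQuiet class, its validate(Source, Source) helper and the "line x, column y" format are my own inventions, not part of the class above. It keeps the same argument order, and still prints nothing when the document is valid, so the "is not empty" check in the LiveCode script further down would keep working.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class XmlValidateSchemaQuiet {
    // Formats a parser exception as a single readable line.
    static String describe(SAXParseException e) {
        return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    }

    // Validates xml against xsd and returns one line per problem (empty list = valid).
    static List<String> validate(Source xml, Source xsd) {
        final List<String> problems = new ArrayList<>();
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Validator validator = factory.newSchema(xsd).newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
                // Validity errors are recorded, and validation carries on
                @Override public void error(SAXParseException e) { problems.add(describe(e)); }
                // Fatal errors (XML that is not well-formed) stop the validation
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            validator.validate(xml);
        } catch (SAXParseException e) {
            problems.add(describe(e));
        } catch (SAXException | IOException e) {
            problems.add(String.valueOf(e.getMessage()));
        }
        return problems;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java XmlValidateSchemaQuiet <xml file> <xsd file>");
            return;
        }
        // Same argument order as XmlValidateSchema: XML file first, XSD file second
        Source xml = new StreamSource(new File(args[0]));
        Source xsd = new StreamSource(new File(args[1]));
        for (String problem : validate(xml, xsd)) {
            System.out.println(problem);
        }
    }
}
```

Because the error callback doesn't throw, a single run can report several validity problems instead of stopping at the first one.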

Let's go to LiveCode and create a new stack for the user interface.



As you can see, there's a field for the Schema text, a field for the XML text, and a button to Validate the XML against the Schema. Since it's perhaps a tad small, here's the content of the Schema field:
<?xml version="1.0"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.quartam.com"
    xmlns="http://www.quartam.com"
    elementFormDefault="qualified">

  <xs:element name="RootNode" type="RootNode"/>

  <xs:complexType name="RootNode">
    <xs:sequence>
      <xs:element name="BranchNode" type="BranchNode" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="SpecVersion" type="xs:string"/>
  </xs:complexType>

  <xs:complexType name="BranchNode">
    <xs:sequence>
      <xs:element name="LeafNode" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>

And here's the content of the XML field:
<?xml version="1.0"?>
<RootNode
    xmlns="http://www.quartam.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.quartam.com schema.xsd"
    SpecVersion="1.0">
  <BranchNode>
    <LeafNode>The first leaf node</LeafNode>
    <LeafNode>The second leaf node</LeafNode>
    <LeafNode>The third leaf node</LeafNode>
  </BranchNode>
  <BranchNode>
    <LeafNode>The fourth leaf node</LeafNode>
    <LeafNode>The fifth leaf node</LeafNode>
  </BranchNode>
</RootNode>

After saving the stack, we copy the compiled XmlValidateSchema.class file into the same directory as the stack. Now we can write the script for the 'Validate' button:
on mouseUp
   --> write Schema and XML to temporary files
   local tSchemaFile, tXmlFile
   put the tempName into tSchemaFile
   put field "Schema" into URL ("file:" & tSchemaFile)
   put the tempName into tXmlFile
   put field "XML" into URL ("file:" & tXmlFile)
   --> assemble the shell command
   local tShellCommand
   put "java XmlValidateSchema" && \
         ShellPath(tXmlFile) && \
         ShellPath(tSchemaFile) \
         into tShellCommand
   --> execute the shell command
   local tHideConsoleWindows, tDefaultFolder, tShellResult
   put the hideConsoleWindows into tHideConsoleWindows
   set the hideConsoleWindows to true
   put the defaultFolder into tDefaultFolder
   set the defaultFolder to AbsolutePathFromStack()
   put shell(tShellCommand) into tShellResult
   set the defaultFolder to tDefaultFolder
   set the hideConsoleWindows to tHideConsoleWindows
   --> cleanup the temporary files
   delete file tSchemaFile
   delete file tXmlFile
   if tShellResult is not empty then
      answer error tShellResult
   end if
end mouseUp

function AbsolutePathFromStack pFileName
   local tAbsolutePath
   put the effective filename of this stack into tAbsolutePath
   set the itemDelimiter to slash
   if pFileName is not empty then
      put pFileName into item -1 of tAbsolutePath
   else
      delete item -1 of tAbsolutePath
   end if
   return tAbsolutePath
end AbsolutePathFromStack

function ShellPath pPath
   if the platform is "Win32" then
      put quote & pPath & quote into pPath
   else
      replace space with backslash & space in pPath
   end if
   return pPath
end ShellPath

This time around, we didn't have to fiddle with the Java classpath, as the XML Validation API is built-in. However, we had to write the XML and Schema to temporary files, to avoid length limitations in the shell command. If we now make a deliberate mistake, say change one of the 'LeafNode' elements into a 'BeafNode' element, we see this error:



Again, the image is a bit small, so here's the content of the error:
ERROR:  'cvc-complex-type.2.4.a: Invalid content was found starting with element 'BeafNode'. One of '{"http://www.quartam.com":LeafNode}' is expected.'
Exception in thread "main" org.xml.sax.SAXParseException: cvc-complex-type.2.4.a: Invalid content was found starting with element 'BeafNode'. One of '{"http://www.quartam.com":LeafNode}' is expected.
at com.sun.org.apache.xerces.internal.jaxp.validation.Util.toSAXParseException(Util.java:109)
at com.sun.org.apache.xerces.internal.jaxp.validation.ErrorHandlerAdaptor.error(ErrorHandlerAdaptor.java:104)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:382)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator$XSIErrorReporter.reportError(XMLSchemaValidator.java:429)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.reportSchemaError(XMLSchemaValidator.java:3185)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.handleStartElement(XMLSchemaValidator.java:1831)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.startElement(XMLSchemaValidator.java:705)
at com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorHandlerImpl.startElement(ValidatorHandlerImpl.java:335)
at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.closeStartTag(ToXMLSAXHandler.java:205)
at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.characters(ToXMLSAXHandler.java:524)
at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.characters(ToXMLSAXHandler.java:467)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:229)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:215)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:215)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:215)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:121)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:85)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:615)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:661)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:300)
at com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorImpl.process(ValidatorImpl.java:220)
at com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorImpl.validate(ValidatorImpl.java:141)
at javax.xml.validation.Validator.validate(Validator.java:82)
at XmlValidateSchema.main(XmlValidateSchema.java:31)

Thanks to the combination of LiveCode and Java, we can develop cross-platform solutions quickly, without having to give up the power of existing libraries. Unfortunately, loading Java every time for a shell call is not the optimal solution, so in another post, we'll investigate how we can run Java code using LiveCode's 'process' communication. Stay tuned...
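As a taste of what the Java side of such a long-running process might look like, here's a rough sketch. Everything in it is an assumption on my part: the XmlValidateLoop class, and the tiny line protocol where each request is "xml path, TAB, xsd path" and each reply is a single line starting with OK or ERROR. The LiveCode side is deliberately left for that later post.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;

import org.xml.sax.SAXException;

class XmlValidateLoop {
    // Handles one request line of the (hypothetical) form "<xml path>\t<xsd path>"
    // and returns a single-line reply: "OK" or "ERROR <message>".
    static String handle(String request) {
        String[] paths = request.split("\t");
        if (paths.length != 2) {
            return "ERROR expected: xml-path TAB xsd-path";
        }
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            factory.newSchema(new File(paths[1]))
                   .newValidator()
                   .validate(new StreamSource(new File(paths[0])));
            return "OK";
        } catch (SAXException | IOException e) {
            // Keep the reply on one line so the caller can read it line by line
            return "ERROR " + String.valueOf(e.getMessage()).replace('\n', ' ');
        }
    }

    public static void main(String[] args) throws IOException {
        // The JVM starts once, then answers requests until standard input is closed
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(handle(line));
            System.out.flush();
        }
    }
}
```

The JVM startup cost is then paid once per session rather than once per validation.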

Sunday, September 12, 2010

XML validation using a DTD and Versions

In a previous post, I investigated how we could validate an XML file in revTalk by using a DTD. As we all know, requirements evolve and our software needs to adapt likewise, especially at the integration end-points. So if we need to accept more data, we should (a) know that it's coming, and (b) make sure it's there before we process the incoming data.

How do we do that? Well, as we've seen before, we can check the structure of the XML file. And to make sure we check it correctly, we should introduce a specification version for our XML structure, and attach it to the root node of the document as an attribute. Then all we need to do is check the root node, extract its SpecificationVersion attribute, and we can apply the correct DTD validation.

Doesn't sound too complicated, does it? Let's expand our current stack design a bit so that it looks like this:


As you can see, the field "DTD" was renamed to "DTD 1.0" and another field "DTD 1.1" was added to hold the DTD for specification version 1.1; finally, I moved down the "Validate" button and modified its script:

constant kMaxVersion = 1.1

on mouseUp
   -- Load the XML into a local variable
   local tXmlText
   put field "XML" into tXmlText
   -- Parse the XML text into a Tree
   local tXmlTree
   put revCreateXmlTree(tXmlText, \
         false, \
         true, \
         false) \
         into tXmlTree
   if tXmlTree is not an integer then
      answer error \
            "There is an error in the XML structure" & return & \
            tXmlTree -- contains the full error message
   else
      -- Validate the root node
      local tRootNode
      put revXmlRootNode(tXmlTree) into tRootNode
      if tRootNode is not "RootNode" then
         answer error \
               "The XML root node should be 'RootNode'"
      else
         -- Validate the SpecVersion
         local tSpecVersion
         put revXmlAttribute(tXmlTree, tRootNode, \
               "SpecificationVersion") into tSpecVersion
         if tSpecVersion begins with "xmlerr" then
            answer error \
                  "The SpecificationVersion is missing"
         else if tSpecVersion > kMaxVersion then
            answer error \
                  "The SpecificationVersion " && tSpecVersion && \
                  "is newer than" && kMaxVersion
         else
            -- Load the corresponding DTD
            local tDtdText
            if tSpecVersion is 1.0 then
               put field "DTD 1.0" into tDtdText
            else
               put field "DTD 1.1" into tDtdText
            end if
            -- Validate the XML against the DTD
            local tValidationResult
            put revXmlValidateDTD(tXmlTree, tDtdText) \
                  into tValidationResult
            if tValidationResult is not empty then
               answer error \
                     "The XML structure does not conform" & \
                     return & tValidationResult
            else
               answer information "The XML conforms to the DTD"
            end if
         end if
      end if
      -- Cleanup
      revDeleteXmlTree tXmlTree
   end if
end mouseUp


So how does this new version work?
- first, it parses the XML document
- next, it verifies the root node
- next, it checks the specification version
- next, it loads the appropriate DTD
- finally, it validates the XML against the DTD
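
For readers who think more easily in Java, the first four steps can be sketched as follows. This is only an illustration of the decision logic, not a replacement for the script: the SpecVersionCheck class and its chooseDtd helper are hypothetical, and instead of loading a DTD it simply returns the name of the field the script would read from.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SpecVersionCheck {
    static final double MAX_VERSION = 1.1; // mirrors kMaxVersion in the script

    // Returns "DTD 1.0" or "DTD 1.1", or a message starting with "error:".
    static String chooseDtd(String xmlText) {
        // Step 1: parse the XML document
        Document document;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            document = builder.parse(new InputSource(new StringReader(xmlText)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return "error: there is an error in the XML structure: " + e.getMessage();
        }
        // Step 2: verify the root node
        Element root = document.getDocumentElement();
        if (!"RootNode".equals(root.getNodeName())) {
            return "error: the XML root node should be 'RootNode'";
        }
        // Step 3: check the specification version (getAttribute returns "" when absent)
        String versionText = root.getAttribute("SpecificationVersion");
        if (versionText.isEmpty()) {
            return "error: the SpecificationVersion is missing";
        }
        double version;
        try {
            version = Double.parseDouble(versionText);
        } catch (NumberFormatException e) {
            return "error: the SpecificationVersion is not a number";
        }
        if (version > MAX_VERSION) {
            return "error: the SpecificationVersion " + versionText + " is newer than " + MAX_VERSION;
        }
        // Step 4: pick the matching DTD (same two-way choice as the script)
        return version == 1.0 ? "DTD 1.0" : "DTD 1.1";
    }

    public static void main(String[] args) {
        System.out.println(chooseDtd("<RootNode SpecificationVersion='1.1'/>"));
        System.out.println(chooseDtd("<RootNode SpecificationVersion='1.2'/>"));
    }
}
```

Like the script, this treats the version as a plain decimal number, which is fine as long as the minor version stays below 10.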

If we test it, it correctly informs us that the XML document conforms to our specification version 1.1. What happens if we change the SpecificationVersion to 1.2?


Then we get this error message:


And finally, what happens if we change the SpecificationVersion to the original 1.0?


Then we get this error message:


This is a much safer way to check the incoming data in XML format. Unfortunately, a Document Type Definition is quite a limited way of XML validation. So next time, we'll improve our defenses again, by incorporating XML Schemas.

Tuesday, September 7, 2010

XML validation using a DTD

One of the positive aspects of the Extensible Markup Language XML is that it is a flexible way to structure data in a human-readable format, in a cross-platform and technology-independent way. No wonder it is widely used as a way to exchange data between applications, and forms the foundation for XML-RPC, SOAP and other Web Service methods.

But it would be naive to think that every XML document that we get is not only well-formed, but also in the format that we expect it to be, with the right elements and attributes. In this post, we'll examine a strategy to validate incoming XML data in our revTalk application, using a Document Type Definition - a.k.a. DTD.

Part of the XML specification since the very start, a DTD describes the structure of the XML elements and attributes. For more information, I advise you to study the excellent introductory tutorials on W3Schools.com. We're here to use it from revTalk, so let's start by creating a new stack for the user interface.


Simply drop two scrolling fields onto it, name them "XML" and "DTD" respectively, and then group each of them separately so we can put a nice group label on top (I have the memory of a goldfish so I might forget which-is-which ;-) ) Finally drop a button at the bottom of the stack and set its name to "Validate" - and now we're ready to start scripting the button.

First things first, we need to parse the XML text into an XML tree to use all the revXML commands and functions.

on mouseUp
   -- Load DTD and XML into local variables
   local tDtdText, tXmlText
   put field "DTD" into tDtdText
   put field "XML" into tXmlText
   -- Parse the XML text into a Tree
   local tXmlTree
   put revCreateXmlTree(tXmlText, \
         false, \ -- must be well-formed
         true, \ -- create a tree in memory
         false) \ -- no SAX parser messages
         into tXmlTree
   if tXmlTree is not an integer then
      answer error \
            "There is an error in the XML structure" & return & \
            tXmlTree -- contains the full error message
   else
      -- Clean up resources
      revDeleteXmlTree tXmlTree
   end if
end mouseUp

We use the revCreateXmlTree function to parse the XML text into a tree structure. If the XML text is not well-formed then we report the error, otherwise we know we have a valid XML tree structure at our disposal - which we need to clean up after we're done, using the revDeleteXmlTree command. Now that we have the basics covered, we can add the DTD validation to our script.

on mouseUp
   -- Load DTD and XML into local variables
   local tDtdText, tXmlText
   put field "DTD" into tDtdText
   put field "XML" into tXmlText
   -- Parse the XML text into a Tree
   local tXmlTree
   put revCreateXmlTree(tXmlText, \
         false, \ -- must be well-formed
         true, \ -- create a tree in memory
         false) \ -- no SAX parser messages
         into tXmlTree
   if tXmlTree is not an integer then
      answer error \
            "There is an error in the XML structure" & return & \
            tXmlTree -- contains the full error message
   else
      -- Validate the XML against the DTD
      local tValidationResult
      put revXmlValidateDTD(tXmlTree, tDtdText) \
            into tValidationResult
      if tValidationResult is not empty then
         answer error \
               "XML structure does not conform to the DTD" & return & \
               tValidationResult -- contains the full error message
      else
         answer information "The XML conforms to the DTD"
      end if
      -- Clean up resources
      revDeleteXmlTree tXmlTree
   end if
end mouseUp

If the XML conforms to the DTD, the revXmlValidateDTD function will return empty, otherwise its output contains the validation error. Pretty straightforward, so let's test this with a simple XML and DTD:


When we click the 'Validate' button, we get the message that the XML conforms to the DTD. Exactly what we were hoping for. Now let's change the XML somewhat to see if it fails when our XML text clearly does not conform to the DTD.


And here's the error message that we get on our screen:


With very little scripting, we have added a first layer of defense against incoming XML data that is not up to our specifications. Next time, we'll elaborate on this example and bolster our defenses.
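
For comparison, here's how the same two-step check (well-formed first, then valid against the DTD) might look in Java. Treat it as a sketch under a few assumptions: the DtdCheck class and its sample documents are made up, and unlike revXmlValidateDTD, which takes the DTD text as a separate parameter, the standard Java parser expects the DTD to be declared in the document's own DOCTYPE.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdCheck {
    // Hypothetical DTD, embedded as an internal subset of the document.
    static final String DOCTYPE =
        "<!DOCTYPE RootNode ["
        + "<!ELEMENT RootNode (LeafNode+)>"
        + "<!ELEMENT LeafNode (#PCDATA)>"
        + "]>";
    static final String GOOD = DOCTYPE + "<RootNode><LeafNode>one</LeafNode></RootNode>";
    static final String BAD = DOCTYPE + "<RootNode><BeafNode>one</BeafNode></RootNode>";

    // Returns one line per problem; an empty list means well-formed and valid.
    static List<String> validate(String xmlText) {
        final List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
                // Validity errors: the XML does not conform to the DTD
                @Override public void error(SAXParseException e) { problems.add("invalid: " + e.getMessage()); }
                // Fatal errors: the XML is not well-formed
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xmlText)));
        } catch (SAXException e) {
            problems.add("not well-formed: " + e.getMessage());
        } catch (ParserConfigurationException | IOException e) {
            problems.add("could not parse: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate(GOOD)); // no problems expected
        System.out.println(validate(BAD));  // BeafNode is not declared in the DTD
    }
}
```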