Friday, December 31, 2010

XML validation using a Schema

In earlier posts, I examined how we can use the built-in revXML library in LiveCode to validate XML data against a DTD, later refining it with a version check to match evolving requirements. Unfortunately, a Document Type Definition (DTD) is a fairly limited means of validating XML. So this time, we'll improve our defenses again by incorporating XML Schemas.

Whereas a DTD is limited to defining the basic structure of the XML in terms of elements and attributes, XML Schemas allow you to define validation on the actual content of the elements and attributes. So you can be sure that an element defined as "xs:date" actually contains a valid date, or that an attribute defined as "xs:positiveInteger" actually holds a positive integer, and so on. A full explanation of XML Schemas is beyond the scope of this post, but you'll find plenty of information around the web; a good first stop is this W3Schools XML Schema tutorial.
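
To make the difference concrete, here is a minimal sketch that uses the same Java validation API the rest of this post relies on. The Order schema, its ShipDate element and Quantity attribute, and the SchemaTypeDemo class are all made up for illustration; they are not part of the example stack built below.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.xml.sax.SAXException;

// Illustrative only: class name, schema and documents are assumptions for this sketch
class SchemaTypeDemo {

    // Hypothetical schema: an Order with an xs:date child and an xs:positiveInteger attribute
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='Order'>"
        + "<xs:complexType>"
        + "<xs:sequence><xs:element name='ShipDate' type='xs:date'/></xs:sequence>"
        + "<xs:attribute name='Quantity' type='xs:positiveInteger' use='required'/>"
        + "</xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Returns true when the XML text validates against the schema above
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // Without a custom error handler, the first validation error is thrown as a SAXException
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Well-formed in all three cases; only the content differs
        System.out.println(isValid("<Order Quantity='3'><ShipDate>2010-12-31</ShipDate></Order>"));
        System.out.println(isValid("<Order Quantity='3'><ShipDate>31/12/2010</ShipDate></Order>"));
        System.out.println(isValid("<Order Quantity='0'><ShipDate>2010-12-31</ShipDate></Order>"));
    }
}
```

All three documents have the same structure, so a DTD could not tell them apart; only the first one should pass the schema, because the second uses a date format that "xs:date" does not accept and the third has a quantity of zero.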

This all sounds very good, but here's the rub: the revXML library offers no built-in support for XML Schemas. So yet again we turn to Java, with its built-in XML Validation API. We can easily execute Java code using LiveCode's shell function - so let's start by writing the XmlValidateSchema class:
import java.io.File;
import java.io.IOException;

import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class XmlValidateSchema {
    public static void main(String[] args) throws SAXException, ParserConfigurationException, IOException {
        final File xmlFile = new File(args[0]);
        final File xsdFile = new File(args[1]);
        // Load the XML file
        final DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        final DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        final Document document = docBuilder.parse(xmlFile);
        // Load the XSD file
        final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        final Schema schema = schemaFactory.newSchema(xsdFile);
        final Validator validator = schema.newValidator();
        // Validate the XML document against the XSD schema
        final Source source = new DOMSource(document);
        validator.validate(source);
    }
}

In keeping with earlier Java examples, the code is a bit lazy when it comes to exception handling: if any exception is thrown, it will end up in the output of our shell function call. The only important thing to remember is that the first parameter is the XML file, and the second is the XML Schema Definition (XSD) file.
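
If the raw stack trace is more than you want to show to a user, one possible alternative is to register an error handler on the validator and print a single line per problem. The sketch below is an assumption-laden variant, not the class used in the rest of this post: the name XmlValidateSchemaQuiet and its validate and describe methods are hypothetical. It also feeds the validator a StreamSource instead of a DOMSource, which should normally let the parser report line numbers.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical variant of XmlValidateSchema; class and method names are illustrative only
class XmlValidateSchemaQuiet {

    // Returns one line per problem found; an empty list means the document is valid
    static List<String> validate(Source xml, Source xsd) {
        final List<String> problems = new ArrayList<String>();
        try {
            SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = schemaFactory.newSchema(xsd);
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                // Collect warnings and errors so validation can continue past the first one
                @Override public void warning(SAXParseException e) { problems.add(describe("warning", e)); }
                @Override public void error(SAXParseException e) { problems.add(describe("error", e)); }
                // Fatal errors (for example XML that is not well-formed) stop the validation
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            validator.validate(xml);
        } catch (SAXParseException e) {
            problems.add(describe("fatal", e));
        } catch (SAXException e) {
            problems.add("fatal: " + e.getMessage());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return problems;
    }

    static String describe(String level, SAXParseException e) {
        return level + " (line " + e.getLineNumber() + "): " + e.getMessage();
    }

    public static void main(String[] args) {
        // Same parameter order as XmlValidateSchema: XML file first, XSD file second
        Source xml = new StreamSource(new File(args[0]));
        Source xsd = new StreamSource(new File(args[1]));
        for (String problem : validate(xml, xsd)) {
            System.out.println(problem);
        }
    }
}
```

Like the original class, this prints nothing when the document is valid, so a caller that treats empty output as success would not need to change.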

Let's go to LiveCode and create a new stack for the user interface.



As you can see, there's a field for the Schema text, a field for the XML text, and a button to Validate the XML against the Schema. Since it's perhaps a tad small, here's the content of the Schema field:
<?xml version="1.0"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.quartam.com"
    xmlns="http://www.quartam.com"
    elementFormDefault="qualified">

  <xs:element name="RootNode" type="RootNode"/>

  <xs:complexType name="RootNode">
    <xs:sequence>
      <xs:element name="BranchNode" type="BranchNode" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="SpecVersion" type="xs:string"/>
  </xs:complexType>

  <xs:complexType name="BranchNode">
    <xs:sequence>
      <xs:element name="LeafNode" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>

And here's the content of the XML field:
<?xml version="1.0"?>
<RootNode
    xmlns="http://www.quartam.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.quartam.com schema.xsd"
    SpecVersion="1.0">
  <BranchNode>
    <LeafNode>The first leaf node</LeafNode>
    <LeafNode>The second leaf node</LeafNode>
    <LeafNode>The third leaf node</LeafNode>
  </BranchNode>
  <BranchNode>
    <LeafNode>The fourth leaf node</LeafNode>
    <LeafNode>The fifth leaf node</LeafNode>
  </BranchNode>
</RootNode>

After saving the stack, we copy the compiled XmlValidateSchema.class file into the same directory as the stack. Now we can write the script for the 'Validate' button:
on mouseUp
   --> write Schema and XML to temporary files
   local tSchemaFile, tXmlFile
   put the tempName into tSchemaFile
   put field "Schema" into URL ("file:" & tSchemaFile)
   put the tempName into tXmlFile
   put field "XML" into URL ("file:" & tXmlFile)
   --> assemble the shell command
   local tShellCommand
   put "java XmlValidateSchema" && \
         ShellPath(tXmlFile) && \
         ShellPath(tSchemaFile) \
         into tShellCommand
   --> execute the shell command
   local tHideConsoleWindows, tDefaultFolder, tShellResult
   put the hideConsoleWindows into tHideConsoleWindows
   set the hideConsoleWindows to true
   put the defaultFolder into tDefaultFolder
   set the defaultFolder to AbsolutePathFromStack()
   put shell(tShellCommand) into tShellResult
   set the defaultFolder to tDefaultFolder
   set the hideConsoleWindows to tHideConsoleWindows
   --> clean up the temporary files
   delete file tSchemaFile
   delete file tXmlFile
   if tShellResult is not empty then
      answer error tShellResult
   end if
end mouseUp

function AbsolutePathFromStack pFileName
   local tAbsolutePath
   put the effective filename of this stack into tAbsolutePath
   set the itemDelimiter to slash
   if pFileName is not empty then
      put pFileName into item -1 of tAbsolutePath
   else
      delete item -1 of tAbsolutePath
   end if
   return tAbsolutePath
end AbsolutePathFromStack

function ShellPath pPath
   if the platform is "Win32" then
      put quote & pPath & quote into pPath
   else
      replace space with backslash & space in pPath
   end if
   return pPath
end ShellPath

This time around, we didn't have to fiddle with the Java classpath, as the XML Validation API is built in. We did, however, have to write the XML and Schema to temporary files to avoid length limitations in the shell command. If we now make a deliberate mistake, say by changing one of the 'LeafNode' elements into a 'BeafNode' element, we see this error:



Again, the image is a bit small, so here's the content of the error:
ERROR:  'cvc-complex-type.2.4.a: Invalid content was found starting with element 'BeafNode'. One of '{"http://www.quartam.com":LeafNode}' is expected.'
Exception in thread "main" org.xml.sax.SAXParseException: cvc-complex-type.2.4.a: Invalid content was found starting with element 'BeafNode'. One of '{"http://www.quartam.com":LeafNode}' is expected.
at com.sun.org.apache.xerces.internal.jaxp.validation.Util.toSAXParseException(Util.java:109)
at com.sun.org.apache.xerces.internal.jaxp.validation.ErrorHandlerAdaptor.error(ErrorHandlerAdaptor.java:104)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:382)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator$XSIErrorReporter.reportError(XMLSchemaValidator.java:429)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.reportSchemaError(XMLSchemaValidator.java:3185)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.handleStartElement(XMLSchemaValidator.java:1831)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.startElement(XMLSchemaValidator.java:705)
at com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorHandlerImpl.startElement(ValidatorHandlerImpl.java:335)
at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.closeStartTag(ToXMLSAXHandler.java:205)
at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.characters(ToXMLSAXHandler.java:524)
at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.characters(ToXMLSAXHandler.java:467)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:229)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:215)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:215)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:215)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:121)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:85)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:615)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:661)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:300)
at com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorImpl.process(ValidatorImpl.java:220)
at com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorImpl.validate(ValidatorImpl.java:141)
at javax.xml.validation.Validator.validate(Validator.java:82)
at XmlValidateSchema.main(XmlValidateSchema.java:31)

Thanks to the combination of LiveCode and Java, we can develop cross-platform solutions quickly, without having to give up the power of existing libraries. Unfortunately, starting a new Java process for every shell call is not the optimal solution, so in another post, we'll investigate how we can run Java code using LiveCode's 'process' communication. Stay tuned...
