Friday, December 31, 2010

XML validation using a Schema

In earlier posts, I examined how we can use the built-in revXML library in LiveCode to validate XML data against a DTD, later refining it with a version check to match evolving requirements. Unfortunately, a Document Type Definition offers only a limited form of XML validation. So this time, we'll improve our defenses again by incorporating XML Schemas.

Whereas a DTD is limited to defining the basic structure of the XML in terms of elements and attributes, XML Schemas allow you to define validation on the actual content of the elements and attributes. So you can be sure that an element defined as "xs:date" is actually a valid date, or that an attribute defined as "xs:positiveInteger" is actually a positive integer, etc. A full explanation of XML Schemas is beyond the scope of this post, but you'll find plenty of information around the web - a good first stop is this W3Schools XML Schema tutorial.
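To make the difference concrete, here is a minimal Java sketch of content validation. The Order element, its ShipDate child and its Quantity attribute are made-up names for illustration only and have nothing to do with the example used later in this post. The sketch relies solely on the javax.xml.validation package that ships with Java, and it assumes the default behaviour where a validation error surfaces as a SAXException.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.xml.sax.SAXException;

class TypedContentCheck {
    // Hypothetical schema: one xs:date element and one xs:positiveInteger attribute
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='Order'>"
        + "<xs:complexType>"
        + "<xs:sequence><xs:element name='ShipDate' type='xs:date'/></xs:sequence>"
        + "<xs:attribute name='Quantity' type='xs:positiveInteger' use='required'/>"
        + "</xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    /** Returns true when the XML text satisfies the schema above, false otherwise. */
    static boolean isValid(String xml) {
        try {
            final SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            final Validator validator = factory.newSchema(new StreamSource(new StringReader(XSD))).newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Malformed XML or a schema violation both end up here
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<Order Quantity='3'><ShipDate>2010-12-31</ShipDate></Order>"));
        System.out.println(isValid("<Order Quantity='0'><ShipDate>2010-12-31</ShipDate></Order>"));
    }
}
```

A DTD could only tell us that Order has a Quantity attribute and a ShipDate child; the schema additionally rejects a Quantity of 0 or a ShipDate that isn't a real calendar date.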

This all sounds very good, but here's the rub: the revXML library offers no built-in support for XML Schemas. So yet again we turn to Java, with its built-in XML Validation API. We can easily execute Java code using LiveCode's shell function - so let's start by writing the XmlValidateSchema class:
import java.io.File;
import java.io.IOException;

import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class XmlValidateSchema {
    public static void main(String[] args) throws SAXException, ParserConfigurationException, IOException {
        final File xmlFile = new File(args[0]);
        final File xsdFile = new File(args[1]);
        // Load the XML file
        final DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        final DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        final Document document = docBuilder.parse(xmlFile);
        // Load the XSD file
        final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        final Schema schema = schemaFactory.newSchema(xsdFile);
        final Validator validator = schema.newValidator();
        // Validate the XML document against the XML Schema
        final Source source = new DOMSource(document);
        validator.validate(source);
    }
}

In keeping with earlier Java examples, the code is a bit lazy when it comes to exception handling: if any exception is thrown, it will end up in the output of our shell function call. The only important thing to remember is that the first parameter is the XML file, and the second is the XML Schema Definition (XSD) file.
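If you'd rather get a short list of problems than a full stack trace, one option is to register an ErrorHandler on the Validator. What follows is only a sketch under my own assumptions, not the class used in the rest of this post: the SchemaProblemCollector name and its validate method are invented for the occasion, and it takes the XML and Schema as strings rather than file paths to stay self-contained. It also feeds the validator a StreamSource instead of a DOM, which avoids having to think about whether the DOM parser was made namespace-aware.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class SchemaProblemCollector {

    /** Validates xml against xsd and returns one line per problem (empty when valid). */
    static List<String> validate(String xml, String xsd) throws SAXException, IOException {
        final List<String> problems = new ArrayList<>();
        final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        final Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(xsd)));
        final Validator validator = schema.newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                problems.add(describe("warning", e));
            }

            @Override
            public void error(SAXParseException e) {
                // Schema violations are recorded and validation carries on
                problems.add(describe("error", e));
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                // Malformed XML cannot be validated any further, so give up
                throw e;
            }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return problems;
    }

    private static String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber() + ": " + e.getMessage();
    }
}
```

A main method could then print each entry of the returned list on its own line, so the shell output stays empty for a valid document and readable for an invalid one.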

Let's go to LiveCode and create a new stack for the user interface.



As you can see, there's a field for the Schema text, a field for the XML text, and a button to Validate the XML against the Schema. Since it's perhaps a tad small, here's the content of the Schema field:
<?xml version="1.0"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.quartam.com"
    xmlns="http://www.quartam.com"
    elementFormDefault="qualified">

  <xs:element name="RootNode" type="RootNode"/>

  <xs:complexType name="RootNode">
    <xs:sequence>
      <xs:element name="BranchNode" type="BranchNode" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="SpecVersion" type="xs:string"/>
  </xs:complexType>

  <xs:complexType name="BranchNode">
    <xs:sequence>
      <xs:element name="LeafNode" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>

And here's the content of the XML field:
<?xml version="1.0"?>
<RootNode
    xmlns="http://www.quartam.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.quartam.com schema.xsd"
    SpecVersion="1.0">
  <BranchNode>
    <LeafNode>The first leaf node</LeafNode>
    <LeafNode>The second leaf node</LeafNode>
    <LeafNode>The third leaf node</LeafNode>
  </BranchNode>
  <BranchNode>
    <LeafNode>The fourth leaf node</LeafNode>
    <LeafNode>The fifth leaf node</LeafNode>
  </BranchNode>
</RootNode>

After saving the stack, we copy the compiled XmlValidateSchema.class file into the same directory as the stack. Now we can write the script for the 'Validate' button:
on mouseUp
   --> write Schema and XML to temporary files
   local tSchemaFile, tXmlFile
   put the tempName into tSchemaFile
   put field "Schema" into URL ("file:" & tSchemaFile)
   put the tempName into tXmlFile
   put field "XML" into URL ("file:" & tXmlFile)
   --> assemble the shell command
   local tShellCommand
   put "java XmlValidateSchema" && \
         ShellPath(tXmlFile) && \
         ShellPath(tSchemaFile) \
         into tShellCommand
   --> execute the shell command
   local tHideConsoleWindows, tDefaultFolder, tShellResult
   put the hideConsoleWindows into tHideConsoleWindows
   set the hideConsoleWindows to true
   put the defaultFolder into tDefaultFolder
   set the defaultFolder to AbsolutePathFromStack()
   put shell(tShellCommand) into tShellResult
   set the defaultFolder to tDefaultFolder
   set the hideConsoleWindows to tHideConsoleWindows
   --> cleanup the temporary files
   delete file tSchemaFile
   delete file tXmlFile
   if tShellResult is not empty then
      answer error tShellResult
   end if
end mouseUp

function AbsolutePathFromStack pFileName
   local tAbsolutePath
   put the effective filename of this stack into tAbsolutePath
   set the itemDelimiter to slash
   if pFileName is not empty then
      put pFileName into item -1 of tAbsolutePath
   else
      delete item -1 of tAbsolutePath
   end if
   return tAbsolutePath
end AbsolutePathFromStack

function ShellPath pPath
   if the platform is "Win32" then
      put quote & pPath & quote into pPath
   else
      replace space with backslash & space in pPath
   end if
   return pPath
end ShellPath

This time around, we didn't have to fiddle with the Java classpath, as the XML Validation API is built-in. However, we had to write the XML and Schema to temporary files, to avoid length limitations in the shell command. If we now make a deliberate mistake, say change one of the 'LeafNode' elements into a 'BeafNode' element, we see this error:



Again, the image is a bit small, so here's the content of the error:
ERROR:  'cvc-complex-type.2.4.a: Invalid content was found starting with element 'BeafNode'. One of '{"http://www.quartam.com":LeafNode}' is expected.'
Exception in thread "main" org.xml.sax.SAXParseException: cvc-complex-type.2.4.a: Invalid content was found starting with element 'BeafNode'. One of '{"http://www.quartam.com":LeafNode}' is expected.
at com.sun.org.apache.xerces.internal.jaxp.validation.Util.toSAXParseException(Util.java:109)
at com.sun.org.apache.xerces.internal.jaxp.validation.ErrorHandlerAdaptor.error(ErrorHandlerAdaptor.java:104)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:382)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator$XSIErrorReporter.reportError(XMLSchemaValidator.java:429)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.reportSchemaError(XMLSchemaValidator.java:3185)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.handleStartElement(XMLSchemaValidator.java:1831)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.startElement(XMLSchemaValidator.java:705)
at com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorHandlerImpl.startElement(ValidatorHandlerImpl.java:335)
at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.closeStartTag(ToXMLSAXHandler.java:205)
at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.characters(ToXMLSAXHandler.java:524)
at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.characters(ToXMLSAXHandler.java:467)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:229)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:215)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:215)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:215)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:121)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:85)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:615)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:661)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:300)
at com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorImpl.process(ValidatorImpl.java:220)
at com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorImpl.validate(ValidatorImpl.java:141)
at javax.xml.validation.Validator.validate(Validator.java:82)
at XmlValidateSchema.main(XmlValidateSchema.java:31)

Thanks to the combination of LiveCode and Java, we can develop cross-platform solutions quickly, without having to give up the power of existing libraries. Unfortunately, starting up Java for every shell call is far from optimal, so in another post, we'll investigate how we can run Java code using LiveCode's 'process' communication. Stay tuned...

Thursday, December 30, 2010

Stamping PDF files

In a previous post, I examined how we can use LiveCode and the Java-based iText library to concatenate a series of existing PDF files into a single PDF file. Now we will examine how we can 'stamp' a PDF file with an image using the same technique.

The first thing to code is the Java class that we will call using the shell function. Here's what I came up with:
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import com.lowagie.text.DocumentException;
import com.lowagie.text.Image;
import com.lowagie.text.Rectangle;
import com.lowagie.text.pdf.PdfContentByte;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfStamper;

public class StampPdfFile {
    public static void main(String[] args) throws IOException, DocumentException {
        final String inputFile = args[0];
        final String outputFile = args[1];
        final String imageFile = args[2];
        final String[] coords = args[3].split(",");
        final PdfReader inputReader = new PdfReader(inputFile);
        final OutputStream outputStream = new FileOutputStream(outputFile);
        final PdfStamper outputStamper = new PdfStamper(inputReader, outputStream);
        final int pageCount = inputReader.getNumberOfPages();
        final Image image = Image.getInstance(imageFile);
        final int left = Integer.parseInt(coords[0]);
        final int top = Integer.parseInt(coords[1]);
        final int right = Integer.parseInt(coords[2]);
        final int bottom = Integer.parseInt(coords[3]);
        final int height = bottom - top;
        final int width = right - left;
        image.scaleToFit(width, height);
        for (int pageIndex = 1; pageIndex <= pageCount; pageIndex++) {
            final PdfContentByte overContent = outputStamper.getOverContent(pageIndex);
            final Rectangle pageSize = inputReader.getPageSize(pageIndex);
            image.setAbsolutePosition(left, pageSize.getHeight() - bottom);
            overContent.addImage(image);
        }
        outputStamper.close();
    }
}

In a nutshell, the first parameter is the input file, the second the output file, the third the image file, and the fourth parameter is a comma-separated list of coordinates (left, top, right, bottom) making up the target rectangle. As usual, the code is a tad lazy when it comes to faulty input parameters and exception handling - if there's a mistake you'll simply get the stack trace as the output of the shell command.

The most important bit is in the loop over the pages, where we use the outputStamper.getOverContent() method to draw our image on top of the existing content. If you'd rather have the image in the back, as a watermark, you would use the outputStamper.getUnderContent() method instead. Also note that the coordinate system for the image position has its origin at the bottom left of the page, whereas our rectangle is measured from the top left, so we have to take the original page height and subtract the bottom coordinate from it.
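The coordinate arithmetic is easy to get wrong, so here it is in isolation as a small sketch without any iText dependency. The StampPlacement class and its two methods are hypothetical helpers, not part of StampPdfFile, and the 842-point page height in the usage note is simply the height of an A4 page.

```java
class StampPlacement {

    /** Parses "left,top,right,bottom" (measured from the top left) into four integers. */
    static int[] parseRect(String csv) {
        final String[] parts = csv.split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected left,top,right,bottom but got: " + csv);
        }
        final int[] rect = new int[4];
        for (int i = 0; i < 4; i++) {
            rect[i] = Integer.parseInt(parts[i].trim());
        }
        return rect;
    }

    /** Converts the rectangle's bottom edge to a y coordinate measured from the bottom of the page. */
    static float lowerEdgeY(float pageHeight, int bottom) {
        return pageHeight - bottom;
    }

    public static void main(String[] args) {
        final int[] rect = parseRect("10,10,103,87");
        System.out.println("width  = " + (rect[2] - rect[0]));
        System.out.println("height = " + (rect[3] - rect[1]));
        System.out.println("y      = " + lowerEdgeY(842f, rect[3]));
    }
}
```

For the rectangle "10,10,103,87" on an A4 page, that works out to a target area of 93 by 77 points, positioned at x = 10 and y = 842 - 87 = 755.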

Now we can proceed with writing a LiveCode button script:
on mouseUp
   --> determine the input, output and image files
   local tInputFile, tOutputFile, tImageFile
   put ShellPath(AbsolutePathFromStack("demo1.pdf")) \
         into tInputFile
   put ShellPath(AbsolutePathFromStack("stamp.pdf")) \
         into tOutputFile
   put ShellPath(AbsolutePathFromStack("Template.png")) \
         into tImageFile
   --> determine the image target rectangle
   local tImageRect
   put quote & "10,10,103,87" & quote \
         into tImageRect
   --> determine the class path
   local tClassPath
   if the platform is "Win32" then
      put ".;iText-2.1.7.jar" into tClassPath
   else
      put ".:iText-2.1.7.jar" into tClassPath
   end if
   --> assemble the shell command
   local tShellCommand
   put "java -classpath" && tClassPath && \
         "StampPdfFile" && \
         tInputFile && \
         tOutputFile && \
         tImageFile && \
         tImageRect \
         into tShellCommand
   --> execute the shell command
   local tHideConsoleWindows, tDefaultFolder, tShellResult
   put the hideConsoleWindows into tHideConsoleWindows
   set the hideConsoleWindows to true
   put the defaultFolder into tDefaultFolder
   set the defaultFolder to AbsolutePathFromStack()
   put shell(tShellCommand) into tShellResult
   set the defaultFolder to tDefaultFolder
   set the hideConsoleWindows to tHideConsoleWindows
   if tShellResult is not empty then
      answer error tShellResult
   end if
end mouseUp

function AbsolutePathFromStack pFileName
   local tAbsolutePath
   put the effective filename of this stack into tAbsolutePath
   set the itemDelimiter to slash
   if pFileName is not empty then
      put pFileName into item -1 of tAbsolutePath
   else
      delete item -1 of tAbsolutePath
   end if
   return tAbsolutePath
end AbsolutePathFromStack

function ShellPath pPath
   if the platform is "Win32" then
      put quote & pPath & quote into pPath
   else
      replace space with backslash & space in pPath
   end if
   return pPath
end ShellPath

Click the button, and it happily takes the existing PDF file (demo1.pdf), paints the image (Template.png) on top of all pages, and writes a new PDF file (stamp.pdf) in the same folder as our stack. There we have it, another example of using iText from within LiveCode.

Wednesday, December 29, 2010

Concatenating PDF files

Since the advent of LiveCode 4.5, developers have the ability to 'print' stack content directly to PDF files. And if you need pin-point control over what goes where, you can use Quartam PDF Library to generate PDF files from scripts. That's great if you are in full control of the content, but what if you need to work with existing PDF files? In the next few posts, we will examine how you can tap into the power of the Java-based iText library from LiveCode.
So let's start by downloading a copy of iText version 2.1.7 - do not use version 5.x as the API changed and the following example code won't work.

The first question is: how can we execute Java code from LiveCode? The simplest solution is the shell function: it allows you to execute DOS or Unix commands, as if you typed them in from the command line. Note that on Windows, using this function will show a DOS window, but you can control that by setting the hideConsoleWindows property before calling the shell function.
You can test it out by simply executing the following line from the message box:
  answer shell("java -version")

The second question is: what sort of Java code do we need to write? Well, I fired up a copy of Eclipse, started a new project, and created a new class 'ConcatPdfFiles' in the default package. Then I grabbed my paper copy of iText in Action (first edition) and flipped to page 64, as this contains the examples for concatenating PDF files. A little bit of thinking, and I derived the following code:
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfReader;

public class ConcatPdfFiles {
    public static void main(String[] args) throws DocumentException, IOException {
        final String outputFilePath = args[0];
        final OutputStream outputStream = new FileOutputStream(outputFilePath);
        final Document outputDocument = new Document();
        final PdfCopy outputCopy = new PdfCopy(outputDocument, outputStream);
        outputDocument.open();
        for (int i = 1; i < args.length; i++) {
            final PdfReader inputPdfReader = new PdfReader(args[i]);
            final int pageCount = inputPdfReader.getNumberOfPages();
            for (int pageIndex = 0; pageIndex < pageCount; pageIndex++) {
                outputCopy.addPage(outputCopy.getImportedPage(inputPdfReader, pageIndex + 1));
            }
        }
        outputDocument.close();
    }
}

As you can see, the code is a bit lazy when it comes to exception handling: I just let the exceptions get thrown, and this will be the output of our shell call if something goes wrong. Note also that the first argument is the output file, followed by the input files that you want to concatenate into the output file.

More importantly, at this point in time, the code doesn't compile. The problem is, we haven't yet told Eclipse where that iText-2.1.7.jar library file is, so compilation fails. This is sometimes referred to as 'classpath hell' - you have to give Java a list of paths where it can find the necessary additional libraries, not just at compile time but also at runtime as we'll see later.
Because I like to keep everything together in my Java projects, I added a new 'lib' folder to my project, and copied the iText-2.1.7.jar file into it. At that point, you can use the contextual menu on the iText-2.1.7.jar file, and add it to the Build Path. Now the code I showed earlier compiles just fine, and we can proceed to the next stage.

The third question is: how do we put everything together in LiveCode? We'll begin by putting all the necessary parts into a single folder: the iText-2.1.7.jar library file, the ConcatPdfFiles.class compiled file and two example PDF files (demo1.pdf and demo2.pdf). Then we fire up LiveCode, create a new stack 'ConcatPdfFiles' and save it in the same folder as the other files, naming it "ConcatPdfFiles.liveCode". Now we can drop a button onto the stack and start scripting.

Now we need to determine the correct command to be executed by the shell function. It should look something like:
java -classpath <class-path> ConcatPdfFiles <output-file> <input-file-1> <input-file-2> ...

The java executable needs the correct classpath, and we need to pass in compatible file paths.

Let's start with the classpath. This is a list of places that java needs to look for its .class files - as separate files in folders, or stored together in a .jar file. And for extra fun, the separator character is a colon on Unix-based platforms, and a semicolon on Windows. You can have relative paths in this classpath, and '.' (period) is short for the current directory. So rather than building a long class path, we can circumvent the issue by setting the defaultFolder property to change the working directory before calling the shell function. Then our classpath can be as short as:
.:iText-2.1.7.jar
on MacOS X/Linux and
.;iText-2.1.7.jar
on Windows.
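As an aside, Java itself exposes the platform's separator as File.pathSeparator, which is handy if you ever assemble a classpath from the Java side. A tiny sketch, with ClassPathBuilder being a made-up name for illustration:

```java
import java.io.File;

class ClassPathBuilder {

    /** Joins the entries with the platform's classpath separator (colon or semicolon). */
    static String classPath(String... entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // Prints the short classpath for whichever platform this runs on
        System.out.println(classPath(".", "iText-2.1.7.jar"));
    }
}
```

In our case the classpath is assembled in LiveCode before Java even starts, so the button script has to make the same choice by checking the platform.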

The next bit is compatible file paths. The good news: LiveCode uses a '/' (slash) as separator, regardless of the underlying platform, and Java is more than happy to accept '/' in a path, even when it's running on Windows. However, if there are spaces in the path, we need to protect them by putting quotes around the path on Windows, and escaping the spaces with a backslash on Unix-based platforms.
And to determine the paths relative to the stack's location on your hard disk, we'll need a helper function that uses the effective filename property of our stack.
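For readers who think more easily in Java, the quoting rule just described boils down to the following sketch. The ShellPathSketch class is purely illustrative, and the windows flag is passed in explicitly so both branches can be tried on any machine. Like the rule itself, it only deals with spaces; other characters that are special to a shell are not handled.

```java
class ShellPathSketch {

    /** Quotes the path on Windows; escapes each space with a backslash elsewhere. */
    static String shellPath(String path, boolean windows) {
        if (windows) {
            return "\"" + path + "\"";
        }
        return path.replace(" ", "\\ ");
    }

    public static void main(String[] args) {
        System.out.println(shellPath("C:/My Documents/demo1.pdf", true));
        System.out.println(shellPath("/Users/me/My Documents/demo1.pdf", false));
    }
}
```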

So finally, we have a button script as follows:
on mouseUp
   --> determine the input and output files
   local tInputFiles, tOutputFile
   put ShellPath(AbsolutePathFromStack("demo1.pdf")) && \
         ShellPath(AbsolutePathFromStack("demo2.pdf")) \
         into tInputFiles
   put ShellPath(AbsolutePathFromStack("output.pdf")) \
         into tOutputFile
   --> determine the class path
   local tClassPath
   if the platform is "Win32" then
      put ".;iText-2.1.7.jar" into tClassPath
   else
      put ".:iText-2.1.7.jar" into tClassPath
   end if
   --> assemble the shell command
   local tShellCommand
   put "java -classpath" && tClassPath && \
         "ConcatPdfFiles" && \
         tOutputFile && tInputFiles \
         into tShellCommand
   --> execute the shell command
   local tHideConsoleWindows, tDefaultFolder, tShellResult
   put the hideConsoleWindows into tHideConsoleWindows
   set the hideConsoleWindows to true
   put the defaultFolder into tDefaultFolder
   set the defaultFolder to AbsolutePathFromStack()
   put shell(tShellCommand) into tShellResult
   set the defaultFolder to tDefaultFolder
   set the hideConsoleWindows to tHideConsoleWindows
   if tShellResult is not empty then
      answer error tShellResult
   end if
end mouseUp

function AbsolutePathFromStack pFileName
   local tAbsolutePath
   put the effective filename of this stack into tAbsolutePath
   set the itemDelimiter to slash
   if pFileName is not empty then
      put pFileName into item -1 of tAbsolutePath
   else
      delete item -1 of tAbsolutePath
   end if
   return tAbsolutePath
end AbsolutePathFromStack

function ShellPath pPath
   if the platform is "Win32" then
      put quote & pPath & quote into pPath
   else
      replace space with backslash & space in pPath
   end if
   return pPath
end ShellPath

Click the button, and it happily concatenates the two PDF files (demo1.pdf and demo2.pdf) into a single PDF file (output.pdf) in the same folder as our stack. There we have it, our first use of iText from within LiveCode.