Wednesday, November 26, 2008

Future-proofing your Revolution scripts

While I'm still not giving out details on the new features or release date for Quartam PDF Library 1.1, I would like to share some insights that I gained as part of my efforts on producing this new version. Back in September, the team of Runtime Revolution shipped version 3.0 of its cross-platform development tool Revolution.

Overall, Revolution 3.0 is an impressive release, with a revamped script editor, gradients, nested arrays and much-improved out-of-the-box experience thanks to the Start Center and Resource Center. One of the items that was also introduced, but not mentioned in the press release, was the new 'byte' chunk type.

If you're wondering what it's good for or why you would want to use it instead of the good old 'character' chunk type, then allow me to explain the underlying reason for its introduction. Back in the HyperCard days, not much attention was given to languages that didn't fall neatly in the ASCII-domain, based on the English alphabet, where one character takes up a single byte and thus there are 256 possible characters (a number of which are control characters that can't even be displayed on screen).

Naturally, this isn't quite enough room for a couple of thousand Chinese or Japanese characters, and that's where Unicode comes into play. This is a set of standards that govern how this much larger number of characters can be stored and should be interpreted. The UTF-8 standard still relies on single bytes as its main storage system, but uses special bit-settings to determine if the next character is a single byte or more than one byte.

My other favorite cross-platform technology Java embraced Unicode from the very first day, which made perfect sense given its goal of "write once, run anywhere" global software. And to make this easier, if less memory-efficient for those circumstances where you only have to deal with the ASCII characters, it uses UTF-16 everywhere.

When Revolution 2.0 saw the daylight, it introduced support for Unicode text entry and manipulation. Rather than jumping on the UTF-16 bandwagon, the engineers decided to implement a more flexible system that would use plain-old ASCII for everyday operations, while allowing the developer to use UTF-16 only when necessary.

While the implementation could still use some love and care, Revolution's technology director Mark Waddingham has big plans for it and wants to evolve to a system that is far less painful than anything else out there, but as resource-efficient as the current implementation. Which means that in the future, Revolution developers will be able to use the 'character', 'word', 'item' and 'line' chunks without having to care about the underlying encoding - the engine will know what constitutes a word or a sentence in the language of that piece of text and will take care of collation and all the gruesome details of Unicode.

Traditionally, the one-byte-is-one-character paradigm of xCard meant that developers could use the 'character' keyword, along with the 'charToNum' and 'numToChar' functions, to manipulate data at the byte-level. But with this equivalence fading away at some point in the future, the new 'byte' chunk type is our saviour!

Quartam PDF Library reads PNG and JPEG image files as byte streams, extracting information about the image height and width, bit depth and other binary-encoded information. To ensure that this and other binary-reading code will keep working correctly in the future, I've decided to make the necessary changes in version 1.1 to use the 'byte' chunk type rather than the 'character' chunk type.

While this means that the next version will require developers have Revolution 3.0 or higher, I feel this is a fair trade-off. In my experience, the new Revolution version performs well with less problems and quirks than ever. And if I compare my glacial speed of development with their feature-packed release momentum, I'm sure they will be shipping their next version before I do ;-)

Just in case you'd like to know how I updated and thus future-proofed my code, here are some pointers:
- replace 'open file theFilePath for read' with 'open file theFilePath for binary read'
- replace 'open file theFilePath for write' with 'open file theFilePath for binary write'
- replace 'length(theBinaryVariable)' / 'the number of chars in theBinaryVariable' with 'the number of bytes in theBinaryVariable'
- replace 'char x to y of theBinaryVariable' with 'byte x to y of theBinaryVariable'
- replace 'charToNum(char x of theBinaryVariable)' with 'byteToNum(byte x of theBinaryVariable)'
- replace 'numToChar(theNumber)' with 'numToByte(theNumber)'

That's it - no other changes required! The modification was straightforward and consistent, and everything works just fine. I know that my code will continue to work in future versions, and I can enjoy the same linear speed that I have now with the 'character' chunk type once that part is overhauled for Unicode embracing.

If only all requested or necessary changes were so easy to implement...

Tuesday, November 11, 2008

An Apparition

If you have been wondering why it's been so quiet here lately, the answer is simple: a lot of work piled up, there was release pressure and the band that I sing in had its first gig - which ate up a lot of hours and energy. But things are slowly getting back to normal, and I'll have two weeks off from the day-job very soon - time which I'm going to spend relaxing and catching up on items that languished while I was busy. This blog is one of those bits that deserve attention. So let me start by finishing a post that I started to write back in August: 'The Programmer Syndrome'