Variables in documents
July 27, 2007
Announcing a new Daisy feature…
Background (skip this if you know Daisy)
Various tools are available for documentation authoring and publishing. One group consists of tools oriented towards technical writers, such as DocBook, DITA, and lots of others of which I’m mostly ignorant.
Daisy offers an interesting alternative by providing an online, easy-as-a-wiki tool, yet with a focus on structured markup. It’s backed by a repository allowing for searching, browsing, access control, etc. On the publishing side, features like recursive document includes, embedded queries, and a ‘book’ publication engine are available. Some of the features of the book publication engine are TOC generation, lists of figures and tables, (print) indexes, footnotes, shifting of headers with respect to their position in the book, various header numbering possibilities, complete control over styling through XSLT, and chunked HTML output where the chunks are unrelated to the original document boundaries.
Variables
Daisy has now given birth to a new feature especially useful for documentation applications: variables in documents. A common use-case for variables is to avoid hardcoding product names in documentation, for example for when the product is marketed under different names.
The actual variable resolution is configured at the site level (a “site” in Daisy is a website providing a particular view on the repository; there can be many of them). So depending on the site through which you view the document, you will see the same text with the variables resolved to different values. As you can expect from XML fairies like us, variable values can contain mixed content (mixed text & tags). And of course, the variables work in the book publishing too.
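To make the idea of per-site resolution concrete, here is a minimal sketch in Java. It is not Daisy's implementation: the `${name}` placeholder syntax, the `SiteVariables` class and the example values are all assumptions made for illustration, and the sketch only handles plain-text values, not the mixed content mentioned above.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Daisy: resolves ${name} placeholders
// against the variable values configured for one site.
final class SiteVariables {
    // Assumed placeholder syntax for this sketch: ${name}
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z0-9_.-]+)\\}");

    static String resolve(String storedText, Map<String, String> siteValues) {
        Matcher m = PLACEHOLDER.matcher(storedText);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = siteValues.get(m.group(1));
            // Variables unknown to this site are left in the text unchanged.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String stored = "Install ${productName} before you continue.";
        // The same stored document, viewed through two differently configured sites.
        System.out.println(resolve(stored, Map.of("productName", "Foo Server")));
        System.out.println(resolve(stored, Map.of("productName", "Bar Enterprise")));
    }
}
```

The stored text is the same in both calls; only the map of values, which stands in for the site configuration, differs.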

Variable-challenges
Besides the fluffy feature announcement, it is interesting to look at a particular difficulty of document variables.
Putting variables in documents basically turns the documents into simple templates. The documents are not displayed as-is, but they can generate different content depending on the context. To some extent this difference between “stored document” and “displayed webpage” already exists, due to features such as includes, information aggregation, etc. The variables are a different class though, since they are really part of the document text.
When a reader sees a particular word on a page, and enters it in the search box, then the expected behaviour is to see that page turn up in the search results.
(sidenote: authors on the other hand might want to search where a particular variable is used. This special use-case could be served by maintaining a dedicated variable-usage index, much like the document-links index.)
As a solution, one approach might be to recognize that there’s just too much difference between documents stored in the repository and pages published on a website, and that therefore we need a separate index of the published pages. While this simplifies things conceptually, it will make the system setup more involved, and still has the disadvantage that the index will need rebuilding as soon as e.g. a variable changes its value (one could track the dependencies of each page to know when it needs to be reindexed).
Another thought is to first perform the search on the variable values to find which variables match, and then add the corresponding variable names to the search condition. A nice idea at first sight, but when you think about it, the actual implementation of such a thing, working in a completely transparent way, is not simple either.
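The second idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the `VariableAwareSearch` class, the `${name}` notation for a variable reference in the index, and the plain substring match are all hypothetical, and nothing here reflects how Daisy's full-text search actually works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: expand a search term with the names of all variables
// whose (site-specific) value contains that term.
final class VariableAwareSearch {
    static List<String> expandTerm(String term, Map<String, String> variableValues) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> terms = new ArrayList<>();
        terms.add(term); // always keep the literal term itself
        // Iterate in sorted order so the result is deterministic.
        for (Map.Entry<String, String> e : new TreeMap<>(variableValues).entrySet()) {
            if (e.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                // Assumed notation for "documents that reference this variable".
                terms.add("${" + e.getKey() + "}");
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of("productName", "Foo Server", "vendor", "Acme");
        // The returned terms would be OR-ed together in the real query.
        System.out.println(expandTerm("foo", values));
    }
}
```

Even this toy version shows where it gets hard: a phrase search that spans literal text and a variable value ("install Foo Server") cannot be answered by OR-ing terms, and the values differ per site, so the expansion depends on the site through which the search is performed.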
For now I have not treated this problem at all, since [other work is waiting, and] the current use-case for variables is mainly in the documentation area, where things like the book publication (or the maven plugin) will be used for generating publications.
Open Source CMS Award nominations
July 20, 2007
The Packt Open Source Content Management System Award is collecting nominations.
If you like Daisy, please take a moment to submit your nomination.
For the Daisy website URL you can enter http://www.daisycms.org/
Thanks!
Daisy Runtime
July 16, 2007
During the past three weeks, I’ve been – unexpectedly – able to spend some time on changing the runtime environment of the repository server.
Previously we used something called Avalon Merlin, a now-dead project (basically it’s an IoC component container). Merlin introduced some good practices in the early Daisy days, but it has been in the way for quite some time now, and it was a relief to finally do something about it.
History
For bigger software systems, it is no secret that to keep things manageable, one should split the system into smaller subsystems, and let those subsystems communicate by well-defined interfaces. Divide and conquer! Separate the concerns! Modularize! Encapsulate!
This separation happens on various levels of granularity. For example, in Daisy we have a strict separation between the content repository server and the frontend application: they are separate applications which communicate via an HTTP+XML interface.
On the next level downwards, the repository server itself is divided into several big parts:
- supporting services such as a datasource, JMS system, configuration, etc.
- the core repository itself (storage, querying, access control, …)
- various optional plugins: ldap & ntlm authentication, image thumbnailing & exif extraction, email notifications, …
One can work on each of these parts without having to know the internals of the others. So there is less stuff to know about, and to keep in one’s head at the same time.
However, under time pressure it’s often tempting to bypass the official APIs: hmm, something’s not possible here, and rather than spending effort thinking about how to fix it well, let’s just for once (very consciously of course) hack around it by using some non-API methods. Unless this is made impossible. And making this impossible is basically what I’ve learned from using a Merlin-Maven combination.
How did Maven and Merlin do this? Simple:
- Maven makes it easy to define multiple projects and build them in one go. For all subsystems, make separate API and implementation projects, thus leading to separate API and implementation jars. When one subsystem makes use of another one, only add its API jar to the dependencies. This will make it impossible at compile time to circumvent the API.
- Merlin does the equivalent at runtime: it allows defining multiple component containers (we use one per subsystem), each with its own classloader. Only API jars (and shared library code) are added to their common parent classloader. Also, each Merlin sub-container only makes explicitly exported services of its components visible to the other containers.
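The runtime half of this separation comes down to plain classloader delegation, which can be sketched with the JDK alone. The class and method names below are hypothetical and this is not Merlin's code; the jar lists are left empty so the sketch runs anywhere, whereas a real setup would pass the URLs of the API and implementation jars.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of the classloader layout: one shared parent holding
// the API jars, and one child classloader per subsystem for its implementation.
final class ContainerClassLoaders {
    // Parent loader: API jars and shared library code only.
    static URLClassLoader apiLoader(URL[] apiJars) {
        return new URLClassLoader(apiJars, ClassLoader.getSystemClassLoader());
    }

    // Child loader: implementation jars of a single subsystem. It delegates to
    // the API loader, never to a sibling, so sibling implementations stay hidden.
    static URLClassLoader subsystemLoader(URL[] implJars, ClassLoader apiLoader) {
        return new URLClassLoader(implJars, apiLoader);
    }

    // True if the given loader can resolve the class (without initializing it).
    static boolean canSee(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        URLClassLoader api = apiLoader(new URL[0]);            // placeholder: no jars
        URLClassLoader repository = subsystemLoader(new URL[0], api);
        URLClassLoader plugin = subsystemLoader(new URL[0], api);
        System.out.println(repository.getParent() == plugin.getParent());
        System.out.println(canSee(plugin, "java.lang.String"));
    }
}
```

Because a classloader only ever asks its parent, a class that lives in the repository implementation jar is simply not resolvable from the plugin's classloader, which is what makes bypassing the API impossible at runtime.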
Unrelated to the separation stuff, another nice idea from Merlin is that classloaders are defined by means of artifacts loaded from a Maven-style repository. This has the nice side-effect that after performing the build using Maven, you are immediately able to launch the repository server, no additional copying or assembling needed: Merlin loads the code from your local Maven repository.
Daisy Runtime
And so we finally arrive at the original topic: the new Daisy Runtime. As a basis we took the popular Spring bean container. The Daisy Runtime basically manages a set of independent Spring containers: each Spring container has its own (Maven-style-repository-based) classloader, and selected services can be shared between the containers. The Daisy Runtime isn’t a complex system, it’s only a tiny amount of code. A more detailed description is in the documentation, so I won’t repeat that here.
Spring-OSGI
You might wonder if we haven’t recreated Spring-OSGI. In a way, yes, you could think of the Daisy Runtime as a Spring-OSGI-very-light. However, for us, making the move to Spring-OSGI right now would have been too big a step to take in one go (it would have taken more time, and would have introduced more changes). But if we want more advanced features, like dynamic bundle reloading, we’ll certainly look into that.
Conclusion
So what’s in it for Daisy users? A lot:
- it has become way easier to write plugins for the repository server
- a barrier for people looking at the Daisy sources has been removed
- and while at it, I’ve also looked at significantly simplifying the Daisy source build and development setup environment
Best of all, non-developer users won’t notice this change. There’s no special action to take when upgrading.
The first release to incorporate this change will be 2.1, which we plan to release in early August.
GSoC Guy
July 4, 2007
Last week our Google Summer of Code student, Guy, started working on his project. Since he’s a student from our little country, he decided to join us in our office.

The project he’s working on is the development of a visual HTML diffing (comparing) engine, thus showing the rendered HTML document with deleted and inserted text highlighted. Such an HTML diff is an interesting feature for a CMS that aims to be usable by non-technical people.
We are not aware of any existing decent open-source project doing this, let alone in Java and with an appropriate license. There is also no standard or ideal solution to this problem (AFAIK), so it’s a bit of a research project really. If anyone has interesting information on the subject (like papers or similar OSS projects), you’re welcome to drop a comment.
As a first step, Guy did some promising work by creating an alternative, non-line based text diff.
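For readers wondering what a non-line-based text diff might look like, here is a minimal word-level sketch based on a longest-common-subsequence table. It is not Guy's code and makes no claim about his approach; the `WordDiff` class and the `[-…-]` / `{+…+}` markers are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: diff two texts word by word instead of line by line.
final class WordDiff {
    static String diff(String oldText, String newText) {
        String[] a = tokenize(oldText);
        String[] b = tokenize(newText);

        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table from the start, emitting kept, deleted and inserted words.
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i].equals(b[j])) {
                out.add(a[i]);
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("[-" + a[i++] + "-]"); // word only in the old text
            } else {
                out.add("{+" + b[j++] + "+}"); // word only in the new text
            }
        }
        while (i < a.length) out.add("[-" + a[i++] + "-]");
        while (j < b.length) out.add("{+" + b[j++] + "+}");
        return String.join(" ", out);
    }

    // Split on whitespace; an empty or blank text yields no tokens.
    static String[] tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(diff("the quick brown fox", "the slow brown fox"));
    }
}
```

A line-based diff would flag the whole sentence as changed; the word-level version marks only "quick" as deleted and "slow" as inserted. Diffing rendered HTML is harder still, since tags and structure have to be taken into account as well.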
No Daisy?
July 3, 2007
Someone remarked that my blog would be better implemented using Daisy. I agree, and I should set an example to others, however:
- I wanted to have some usage experience with typical blogging software, using it for real is the best way to get this.
- Time. There’s no ready-made blogging solution built on Daisy. Designing some document types and layouts isn’t a big deal, but I’m rather busy and didn’t want to postpone the blog for this reason.
I hesitated for a while, though. After all, concerning the first point, there probably won’t be big surprises anyway, and the second point is only a short-term advantage. But some action had to be taken.
PS: unlike other CMS’s out there, Daisy does use itself for its own website, documentation, wiki and knowledge base.
1
July 2, 2007
Welcome to this little blog which will be about the Daisy CMS. At worst, I’ll announce new releases here, but I’ll make an effort to keep you posted about what’s happening in Daisy-land.
