We’ve been working on the 0.4 release of JBoss DNA for quite some time – longer than I had hoped. Unfortunately, we’re not quite ready yet, since we still have work to do on the primary feature for the release (write to the repository through the JCR API). I guess the holidays and vacations really did a number on our schedule.
But we thought you might like a preview of some new features that will be appearing in the 0.4 release.
File System Connector
We’ve been working on several DNA connectors, including one that allows a repository to access and expose files and directories on a file system. Why would you want to do this? Well, usually it is because you have files on the file system that are needed for other applications, but you want your application to access them through JCR. After all, if your application already uses JCR for some of its data, wouldn’t it be nice to get to all of the data it needs through JCR?
One solution would be to copy the files into a JCR repository, but, well, you then have a copy. And duplicates can wreak havoc when the data changes.
Another solution is to move the files into a JCR repository, and have the other apps get to them through WebDAV. Possible, but it’s more complex (you may need a WebDAV server), and not every application will support WebDAV. Plus, then the files aren’t where other applications might be looking for them.
This is where the new connector comes in. Simply leave the files where they are, but set up the file system connector to expose those files and directories as if they were in a JCR repository. Or, combine this connector with the federating connector, and project the files and directories into a JCR repository that also has content from other places. Either way, it’s simple and works really well.
The connector itself is remarkably simple: it takes one or more starting directories and exposes them as “nt:folder” nodes. The content of those directories is then exposed as “nt:folder” and “nt:file” child nodes, and so on. About the only complicated thing about the connector is how the standard JCR node types break a file into two pieces, separating the hierarchy information from the content. The “nt:file” node is named according to the name of the file on the file system, and it holds the file’s metadata (e.g., when the file was created, the name, etc.). This “nt:file” node then contains a child node named “jcr:content” (of type “nt:resource”), which stores the actual contents of the file plus some content-related metadata (e.g., the MIME type, last modification date, etc.). For more information, see the JCR specification.
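To make that node structure a bit more concrete, here is a rough sketch in plain Java. It is not the connector’s code and it does not use the JCR or DNA APIs; the class and method names (FileSystemProjection, project, demo) are made up for illustration. It simply walks a directory and lists the nodes that a projection like the one described above would contain, with each file becoming an “nt:file” node that has a “jcr:content” child.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only: lists the nodes a file system projection would expose.
class FileSystemProjection {

    /** Returns one line per projected node: indentation, node name, and primary type. */
    static List<String> project(Path root) throws IOException {
        List<String> lines = new ArrayList<>();
        describe(root, "", lines);
        return lines;
    }

    private static void describe(Path path, String indent, List<String> lines) throws IOException {
        String name = path.getFileName().toString();
        if (Files.isDirectory(path)) {
            // Directories become nt:folder nodes whose children are folders and files.
            lines.add(indent + name + " (nt:folder)");
            List<Path> children;
            try (Stream<Path> stream = Files.list(path)) {
                children = stream.sorted().collect(Collectors.toList());
            }
            for (Path child : children) {
                describe(child, indent + "  ", lines);
            }
        } else {
            // A file becomes an nt:file node (hierarchy and file metadata) ...
            lines.add(indent + name + " (nt:file)");
            // ... with a jcr:content child of type nt:resource that holds the actual bytes.
            lines.add(indent + "  jcr:content (nt:resource), " + Files.size(path) + " bytes");
        }
    }

    /** Builds a tiny temporary directory tree and projects it. */
    static List<String> demo() {
        try {
            Path tmp = Files.createTempDirectory("fs-projection");
            Path root = Files.createDirectory(tmp.resolve("docs"));
            Files.write(root.resolve("notes.txt"), "hello".getBytes(StandardCharsets.UTF_8));
            Files.createDirectory(root.resolve("reports"));
            return project(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

For the small tree built in demo(), this lists a “docs” folder containing a “notes.txt” file node (with its “jcr:content” child) and an empty “reports” folder.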
Relational Persistence Connector
So far, most of the connectors we’ve written have exposed the content of external systems as a graph, so the structure of the graph reflects the information model of that system. Such connectors generally can’t store any kind of graph structure, since they’re restricted to the kind of information that the external system can store.
Our new relational persistence connector, however, is able to store graphs with any structure. It doesn’t matter whether it uses a relational database that’s embedded or on a (remote) database server and shared by multiple connectors. The connector uses JPA and the Hibernate library, which helps isolate the connector from many of the peculiarities in different database and driver implementations, and which brings compatibility with many different database management systems plus a variety of flat file formats.
We’ve also tried to make this connector work well with a wide variety of graph topologies, and this greatly influenced the relational schema. The relationships between parent and child nodes are stored as individual relational records, meaning that navigating a tree is very fast, while adding or removing a node is largely independent [1] of the number of existing siblings. Deleting subgraphs is also quite efficient, with the number of operations related to the number of levels in the subgraph rather than the number of nodes in it. Properties are stored in a compressed form (since the database is not used for searching and querying), reducing the space and time needed to persist or materialize them. Finally, the connector stores large property values [2] in a central area keyed by the SHA-1 hash [3] of their contents, meaning that if the same file is uploaded multiple times, it is stored only the first time and reused every other time. This saves both space and work.
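The idea of keying large values by their SHA-1 hash can be sketched in a few lines of Java. This is only an in-memory illustration under my own assumptions, not the connector’s schema or code: the LargeValueStore class, its method names, and the size threshold parameter are all hypothetical, and a HashMap stands in for the database table.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: a content-addressed store for large property values.
class LargeValueStore {
    private final int minimumSize; // values smaller than this are not stored centrally
    private final Map<String, byte[]> valuesByHash = new HashMap<>();

    LargeValueStore(int minimumSize) {
        this.minimumSize = minimumSize;
    }

    /** Returns the SHA-1 key of a centrally stored value, or null if the value is small enough to stay inline. */
    String store(byte[] value) {
        if (value.length < minimumSize) {
            return null;
        }
        String key = sha1Hex(value);
        if (!valuesByHash.containsKey(key)) {
            // Only the first upload of a given content is actually stored.
            valuesByHash.put(key, value.clone());
        }
        return key;
    }

    int storedValueCount() {
        return valuesByHash.size();
    }

    /** Computes the SHA-1 hash of the data as a lowercase hexadecimal string. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 should be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        LargeValueStore store = new LargeValueStore(8);
        byte[] file = "the same large file".getBytes(StandardCharsets.UTF_8);
        String first = store.store(file);
        String second = store.store(file.clone()); // same content uploaded again
        System.out.println(first.equals(second));       // same key both times
        System.out.println(store.storedValueCount());   // content stored only once
    }
}
```

Because the key depends only on the content, a second upload of the same bytes finds the existing entry and nothing new is written.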
I’ve barely scratched the surface on the capabilities of this new connector, but it is fast and very capable. We’re excited about this, and hope you are, too.
Readable and Writable JCR
The ability to read and write content through DNA’s JCR implementation is perhaps one of the most important features of this release – important enough to warrant its own post. So stay tuned for more information.
The upcoming 0.4 release also has a number of miscellaneous improvements, a few of which are worth a brief mention:
- DNA has an internal Graph API that serves as the foundation for the connector API, and on top of which sits our JCR implementation. The Graph API is an internal DSL (aka fluent API) that defines a very simple vocabulary of atomic operations for working with graphs. The API doesn’t require or use sessions, supports batching large numbers of operations, and uses mostly immutable data structures that simplify concurrent applications and object reuse. This release will include some really good improvements that make the API more usable, make connectors easier to implement, and make everything faster at runtime.
- We’re making good progress on the Subversion connector, which works similarly to the file system connector, except that it exposes the files and folders in an SVN repository.
- We’re also making some good progress on a JDBC metadata connector. This connector will access a database using JDBC and project the database metadata into the repository.
- We’re adding reusable unit tests for connectors, making it easier to make sure your connector behaves correctly and according to the API. We’ll probably be adding more tests as we go, but right now they’re pretty easy to reuse: simply extend one of the unit test classes, and override a method to set up your connector. That’s it. Your tests inherit a slew of unit test methods that operate against your connector.
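The reusable-test idea in the last bullet follows a familiar pattern: an abstract test class holds the checks, and each subclass only supplies the thing under test. Here is a minimal sketch of that pattern. Everything in it is hypothetical (the SimpleConnector interface, the class names, and the two checks), and it uses plain Java rather than a test framework so that it stays self-contained; DNA’s actual test classes and connector API will look different.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: runs the inherited checks against one connector.
class ConnectorContractDemo {
    public static void main(String[] args) {
        int passed = new InMemoryConnectorContract().runAll();
        System.out.println(passed + " inherited checks passed");
    }
}

/** A deliberately tiny, made-up connector interface. */
interface SimpleConnector {
    void write(String path, String value);
    String read(String path);
}

/** Reusable checks: subclasses only say how to set up their connector. */
abstract class AbstractConnectorContract {
    protected abstract SimpleConnector createConnector();

    void shouldReadBackWhatWasWritten() {
        SimpleConnector connector = createConnector();
        connector.write("/a/b", "value");
        if (!"value".equals(connector.read("/a/b"))) {
            throw new AssertionError("expected to read back the written value");
        }
    }

    void shouldReturnNullForMissingPath() {
        SimpleConnector connector = createConnector();
        if (connector.read("/missing") != null) {
            throw new AssertionError("expected null for a path that was never written");
        }
    }

    /** Runs every check and returns how many passed (a failure throws AssertionError). */
    int runAll() {
        shouldReadBackWhatWasWritten();
        shouldReturnNullForMissingPath();
        return 2;
    }
}

/** A trivial connector backed by a map, standing in for a real implementation. */
class InMemoryConnector implements SimpleConnector {
    private final Map<String, String> values = new HashMap<>();

    @Override
    public void write(String path, String value) {
        values.put(path, value);
    }

    @Override
    public String read(String path) {
        return values.get(path);
    }
}

/** The only code a connector author writes: one overridden setup method. */
class InMemoryConnectorContract extends AbstractConnectorContract {
    @Override
    protected SimpleConnector createConnector() {
        return new InMemoryConnector();
    }
}
```

The subclass at the bottom is the whole cost of reuse: one overridden method, and every check defined in the abstract class runs against the new connector.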
I hope you found this preview of upcoming features useful and informative. And, I hope you’re as interested as ever in DNA. Stay tuned for more information about the 0.4 release.
[1] Well, largely independent. The connector does need to figure out the correct same-name-sibling index. But a single query is used to do this, so it is quite fast.
[2] The connector can be configured to define how large a property must be before it is stored by its SHA-1 hash.
[3] The SHA-1 secure hash is used not for any security or encryption purpose, but rather because it produces hashes that are repeatable, well-distributed, and for all practical purposes unlikely to have collisions.