An open-source, federated content repository

Speedy unit testing with Jackrabbit

We’re using Apache Jackrabbit for one of the JCR implementations in our unit tests. Configuring Jackrabbit isn’t intuitive at first (like many libraries, it’s highly configurable and thus non-trivial to configure), so the trick for us was figuring out how we wanted to use it in our unit tests.

One of the more important qualities of a unit test is that it’s fast. We do a lot of unit testing, and so we run unit tests very frequently. Change, compile, run tests. Repeat. Repeat again, and again. So the longer the tests take to run, the more they interrupt this process and your train of thought. (More on our testing philosophy and techniques in a future post.)

So we’ve found that the easiest way to speed up Jackrabbit is to use the in-memory persistence manager and the in-memory file system implementations. Here’s a snippet of the XML configuration showing the in-memory file system for the “/repository” branch:

<FileSystem class="org.apache.jackrabbit.core.fs.mem.MemoryFileSystem">
  <param name="path" value="${rep.home}/repository"/>
</FileSystem>

And here’s a snippet showing the XML configuration for the in-memory persistence manager:

<PersistenceManager class="org.apache.jackrabbit.core.persistence.mem.InMemPersistenceManager">
  <param name="persistent" value="false"/>
</PersistenceManager>

Remember, the default configuration contains two persistence managers (for the workspaces and for version storage) and three file systems (for the repository, the workspaces, and version storage), so make sure to change all of them.
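For reference, here is a hedged sketch of how the remaining four elements might look. It assumes the Jackrabbit 1.x repository.xml layout, in which the workspace template (Workspace) and version storage (Versioning) each contain one FileSystem and one PersistenceManager; other elements such as Security, Workspaces, and SearchIndex are left out, and we assume MemoryFileSystem needs no parameters, so none are given.

```xml
<!-- Sketch only: assumes the Jackrabbit 1.x repository.xml layout.
     Security, Workspaces, SearchIndex and the repository-level FileSystem are omitted. -->
<Workspace name="${wsp.name}">
    <!-- in-memory file system for each workspace (assumed to need no params) -->
    <FileSystem class="org.apache.jackrabbit.core.fs.mem.MemoryFileSystem"/>
    <!-- in-memory persistence manager; persistent=false keeps it off the disk -->
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.mem.InMemPersistenceManager">
        <param name="persistent" value="false"/>
    </PersistenceManager>
</Workspace>
<Versioning rootPath="${rep.home}/version">
    <!-- version storage gets its own file system and persistence manager -->
    <FileSystem class="org.apache.jackrabbit.core.fs.mem.MemoryFileSystem"/>
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.mem.InMemPersistenceManager">
        <param name="persistent" value="false"/>
    </PersistenceManager>
</Versioning>
```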

Then in your test code, create an instance of the TransientRepository class by passing in the location of your configuration file and the location of the directory used for the repository data. We’re using Maven 2, so our configuration file goes in “./src/test/resources/” while we use “./target/testdata/jackrabbittest/repository” for the test data directory.

We’re also using JUnit (version 4.4), so one decision we had to make was whether to set up the repository in a @Before method and tear it down in an @After method. That makes all the tests easy to write, but it also means the repository is set up and torn down for every test method, which makes the tests slower than necessary. And since I like to have a single test class for each class under test, my test classes often have a mixture of test methods that need a repository and test methods that don’t.

The pattern we’ve settled on is to create an abstract base class that sets up the repository in a “startRepository()” method and, in its @After method, automatically tears it down if needed. Our unit test classes that use Jackrabbit simply extend the base class and call “startRepository()” in the test methods that need the repository. Test methods that don’t need a repository don’t spend time setting one up. Plus, I personally like that this explicit call makes it more obvious which tests need the repository.

There’s one final twist. The TransientRepository shuts itself down when the last session is closed (not when the instance is garbage collected), and with in-memory persistence that shutdown discards everything. So in tests that save changes in one session, close it, and then open a new one, all that data can simply go away. To fix this, our “startRepository()” method creates a “keep alive” session, and our @After tear-down method closes that session if it’s there.
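To see why the keep-alive session matters, here is a toy model in plain Java. ToyTransientRepository and ToySession are hypothetical classes written only for this sketch, not Jackrabbit’s API; they mimic the behavior described above by counting open sessions and, since persistence is assumed to be in memory, discarding all content when the last session logs out.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model of TransientRepository's lifecycle (NOT the Jackrabbit API).
 * The "repository" counts open sessions; when the last one logs out it shuts down,
 * and because persistence is assumed to be in memory, its content is discarded.
 */
class ToyTransientRepository {
    private final Map<String, String> content = new HashMap<>(); // stands in for in-memory persistence
    private int openSessions = 0;

    ToySession login() {
        openSessions++; // the first login would start the real repository
        return new ToySession(this);
    }

    void sessionClosed() {
        openSessions--;
        if (openSessions == 0) {
            content.clear(); // last session closed: shut down and lose the in-memory data
        }
    }

    Map<String, String> content() {
        return content;
    }
}

/** Hypothetical toy session that reads and writes the shared in-memory content. */
class ToySession {
    private final ToyTransientRepository repository;
    private boolean live = true;

    ToySession(ToyTransientRepository repository) {
        this.repository = repository;
    }

    void save(String path, String value) {
        repository.content().put(path, value);
    }

    String read(String path) {
        return repository.content().get(path);
    }

    void logout() {
        if (live) { // only the first logout counts
            live = false;
            repository.sessionClosed();
        }
    }
}
```

Without an extra session, a value saved in one session is gone by the time the next session logs in; holding one session open for the whole test keeps the count above zero, which is the job of the keep-alive session in the base class.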

Here are the basics of our abstract base class:

import javax.jcr.Repository;
import javax.jcr.Session;
import org.apache.jackrabbit.api.JackrabbitRepository;
import org.apache.jackrabbit.core.TransientRepository;
import org.junit.After;
import org.junit.AfterClass;
import org.junit.BeforeClass;

private static Repository repository;
private Session keepAliveSession;

@BeforeClass
public static void beforeAll() throws Exception {
    // Clean up the test data ...
    // Set up the transient repository (this shouldn't do anything yet)...
    repository = new TransientRepository(REPOSITORY_CONFIG_PATH, REPOSITORY_DIRECTORY_PATH);
}

@AfterClass
public static void afterAll() throws Exception {
    try {
        JackrabbitRepository jackrabbit = (JackrabbitRepository) repository;
        jackrabbit.shutdown();
    } finally {
        // Clean up the test data ...
    }
}

public void startRepository() throws Exception {
    if (keepAliveSession == null) {
        keepAliveSession = repository.login();
    }
}

@After
public void shutdownRepository() throws Exception {
    if (keepAliveSession != null) {
        try {
            keepAliveSession.logout();
        } finally {
            keepAliveSession = null;
        }
    }
}

So setting up unit tests is a piece of cake, and they run very quickly. Now we’re getting somewhere.


Filed under: techniques, testing, tools

Using JIRA and Eclipse

The JBoss DNA project is using JIRA for its issue and task management system. I hadn’t used JIRA before working on JBoss DNA, but I definitely like what I’ve seen so far. And, like many other Java developers, I’m using Eclipse as my development environment. In fact, I’ve used it for years and even developed quite a few plug-ins during my earlier days at MetaMatrix and Revelytix. Needless to say, Eclipse works for me.

So it was only recently that I had the opportunity to really try out the Mylyn task-oriented tools included in the 3.3 (Europa) release of Eclipse. After just a few days, I have to admit that I’m impressed.

If you’re already working with a task management system, then right off the bat Mylyn does the obvious: it lets you view, edit, and create JIRA, Trac, or Bugzilla tasks directly from within Eclipse. The Mylyn forms are customized to your particular task repository, which makes them very usable. You can also create lists of tasks (e.g., “My open tasks” or “All open issues”) that are kept in sync with the repository. Sure, there are some things the web-based JIRA interface does much better. But some things are just as easily done in Eclipse, like reviewing an issue, adding a comment, or even resolving an issue.

But the real gem of Mylyn is that you can focus your environment on the task at hand. Just activate the task you want to work on, and Mylyn tracks what source files you read, change, and create. Mylyn uses this information (what it calls the task’s “context”) to do useful things that make your life much easier.

First of all, Mylyn adds a mode to the “Package Explorer” view that shows only the resources that are part of the current task’s context. As you read, edit, or create files, they are automatically added to this context. When you switch to a different task, Mylyn switches the view to show that task’s context. I find myself switching tasks frequently, so this feature has been invaluable.

Second, Mylyn integrates with your team plug-in to automatically track the changes you’ve made as a change set. The change set is named after the task, so it’s easy to keep track of uncommitted changes. Also, when you commit the change set, the commit comment is prepopulated with the task information so that your changes in the SCM system can be linked back to the JIRA, Trac, or Bugzilla task.

In short, Mylyn makes me feel a little more productive and gives me a better handle on the things I have to get done. Gotta like that.

Filed under: tools

JBoss DNA presentation

I’ve uploaded a presentation that gives an introduction to the JBoss DNA project. There’s also a link on the JBoss DNA documents page.

Filed under: uncategorized

Introducing JBoss DNA

The JBoss DNA project is building an enterprise repository to capture, version, manage and understand the numerous kinds of metadata used in software systems.

Why an enterprise repository? Plain and simple: there’s so much information and metadata going into software systems that it’s difficult to get a handle on exactly what’s there and to understand what it means for the system. What components, services, schemas, data sources, policies, and subsystems do I have? What are the relationships between them? How does my production system differ from my development and test environment? What’s the impact of a proposed change? How has the system changed over time? What do I need to know to manage and govern the system?

An enterprise repository’s job is to help answer these questions by managing the system’s metadata and making it useful. This metadata takes a wide variety of forms, including data models, service definitions, policies, schemas, messages, source code, data sources, configuration information, deployment information, just to name a few. These kinds of things are what repository folks call “artifacts” – the things that are to be managed.

But managing these artifacts is only part of the problem of understanding the information. JBoss DNA’s approach is not only to manage them, but in effect to sequence and catalog the information’s DNA. So as artifacts are added to the repository, JBoss DNA automatically looks inside them, discovers and extracts the fundamental building blocks of the information, and places that information into the repository. Over time, the repository contains not only the artifacts, but also the artifact metadata and the web of integrated and interrelated information contained in the artifacts.
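To make the idea of sequencing a little more concrete, here is a deliberately tiny sketch in plain Java. It is not JBoss DNA’s API: ToySequencer, PropertiesSequencer, and ToyRepository are hypothetical names invented for this illustration, and a sorted map stands in for the repository. When an artifact is added, each sequencer that accepts its path looks inside the content, and whatever it extracts is stored as children of the artifact’s path.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical sequencer contract for this sketch only; not JBoss DNA's actual API. */
interface ToySequencer {
    /** Whether this sequencer knows how to look inside the artifact at the given path. */
    boolean accepts(String path);

    /** Extracts named pieces of information (child name -> value) from the artifact's content. */
    Map<String, String> sequence(String content);
}

/** Looks inside Java .properties artifacts and extracts one entry per key. */
class PropertiesSequencer implements ToySequencer {
    @Override
    public boolean accepts(String path) {
        return path.endsWith(".properties");
    }

    @Override
    public Map<String, String> sequence(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> extracted = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            extracted.put(key, props.getProperty(key));
        }
        return extracted;
    }
}

/** A sorted map stands in for the repository: it stores each artifact plus what was extracted from it. */
class ToyRepository {
    private final Map<String, String> nodes = new TreeMap<>();
    private final ToySequencer[] sequencers;

    ToyRepository(ToySequencer... sequencers) {
        this.sequencers = sequencers;
    }

    void addArtifact(String path, String content) {
        nodes.put(path, content); // store the artifact itself
        for (ToySequencer sequencer : sequencers) {
            if (sequencer.accepts(path)) {
                // store the extracted information as children of the artifact's path
                for (Map.Entry<String, String> entry : sequencer.sequence(content).entrySet()) {
                    nodes.put(path + "/" + entry.getKey(), entry.getValue());
                }
            }
        }
    }

    Map<String, String> nodes() {
        return nodes;
    }
}
```

Adding a properties file with two keys leaves three entries for it (the artifact plus one child per key), while an artifact no sequencer accepts is simply stored as-is.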

Another goal is to make JBoss DNA ready for enterprise use. We’re building JBoss DNA to be transactional, clusterable, and scalable (and luckily JBoss already has some great enterprise-class ingredients to build with). Large organizations also need to federate multiple repositories. Plus, there’s a ton of very useful information in existing data sources, applications, services, and other kinds of repositories, so JBoss DNA will also provide a way to integrate all this live information into the JBoss DNA repository without copying it. Finally, we’re going to be building the tools and services that help make this information useful.

We’re building JBoss DNA on top of JCR, which provides an excellent graph-based approach for working with metadata in highly extensible, dynamic, and flexible ways. We’re designing our architecture so that our components either work alongside a JCR implementation or sit entirely on top of one, which for us is currently Apache Jackrabbit.

Of course, we still have a lot of work to do. Stay tuned for news and updates on the project. I’m planning to cover a lot of the features and architecture in upcoming posts. Until then, check out our project page or the project wiki for more details. If you’re interested in participating, post in our discussion forum or drop me an email.

Filed under: jcr, news, repository

ModeShape is a lightweight, fast, pluggable, open-source JCR repository that federates and unifies content from multiple systems, including file systems, databases, data grids, other repositories, etc.

Use the JCR API to access the information you already have, or use it like a conventional JCR system (just with more ways to persist your content).

ModeShape used to be 'JBoss DNA'. It's the same project, same community, same license, and same software.