ModeShape

An open-source, federated content repository

New repository backup and restore in ModeShape 3

We recently added a new feature to ModeShape 3.0.0.Beta3 that enables repository administrators to create backups of an entire repository (even when the repository is in use), and to then restore a repository to the state reflected by a particular backup. This works regardless of where the repository content is persisted.

There are several reasons why you might want to restore a repository to a previous state, and many are quite obvious. For example, the application or the process it’s running in might stop unexpectedly. Or perhaps the hardware on which the process is running might fail. Or perhaps the persistent store might have a catastrophic failure (although surely you’re also using the persistent store’s backup system, too).

But there are also non-failure related reasons. Backups of a running repository can be used to transfer the content to a new repository that is perhaps hosted in a different location. It might be possible to manually transfer the persisted content (e.g., in a database or on the file system), but the process of doing so varies with different kinds of persistence options.  Also, ModeShape can be configured to use a distributed in-memory data grid that already maintains its own copies for ensuring high availability, and therefore the data grid might not persist anything to disk. In such cases, the content is stored on the data grid’s virtual heap, and getting access to it without ModeShape may be quite difficult. Or, you may initially configure your repository to use a particular persistence approach that suitable given the current needs, but over time the repository grows and you want to move to a different, more scalable (but perhaps more complex) persistence approach. Finally, the backup and restore feature can be used to migrate to a new major version of ModeShape.

In short, you may very well have the need to set the contents of a repository back to an earlier state. ModeShape’s backup and restore feature makes this easy to do.

Getting started

Let’s walk through the basic process of creating a backup of an existing repository and then restoring the repository. Both of these steps require an authenticated Session that has administrative privileges. It actually doesn’t matter which workspace the session uses:

javax.jcr.Repository repository = ...
javax.jcr.Credentials credentials = ...
String workspaceName = ...
javax.jcr.Session session = repository.login(credentials,workspaceName);

So far, this is basic and standard stuff for any JCR client.

Introducing the RepositoryManager

Each JCR Session instance has it’s own Workspace object that provides workspace-level functionality and access to a set of “manager” interfaces: the VersionManagerNodeTypeManagerObservationManagerLockManager, etc. The JSR-333 (aka, “JCR 2.1”) effort is still incomplete, but has plans to introduce a RepositoryManager that offers some repository-level functionality. The ModeShape public API has created such an interface, and accessing it from a standard JCR Session instance is pretty simple:

org.modeshape.jcr.api.Session msSession = (org.modeshape.jcr.api.Session)session;
org.modeshape.jcr.api.RepositoryManager repoMgr = ((org.modeshape.jcr.api.Session)session).getWorkspace().getRepositoryManager();

The interface is pretty self-explanatory, and defines several methods including two that are related to the backup and restore feature:

public interface RepositoryManager {

    ...

    /**
     * Begin a backup operation of the entire repository, writing the files
     * associated with the backup to the specified directory on the local
     * file system.
     *
     * The repository must be active when this operation is invoked, and
     * it can continue to be used during backup (e.g., this can be a
     * "live" backup operation), but this is not recommended if the backup
     * will be used as part of a migration to a different version of
     * ModeShape or to different installation.
     *

     *
     * Multiple backup operations can operate at the same time, so it is
     * the responsibility of the caller to not overload the repository
     * with backup operations.
     *

     *
     * @param backupDirectory the directory on the local file system into
     *        which all backup files will be written; this directory
     *        need not exist, but the process must have write privilege
     *        for this directory
     * @return the problems that occurred during the backup operation
     * @throws AccessDeniedException if the current session does not
     *         have sufficient privileges to perform the backup
     * @throws RepositoryException if the backup cannot be run
     */
    Problems backupRepository( File backupDirectory ) throws RepositoryException;

    /**
     * Begin a restore operation of the entire repository, reading the
     * backup files in the specified directory on the local file system.
     * Upon completion of the restore operation, the repository will be
     * restarted automatically.
     *
     * The repository must be active when this operation is invoked.
     * However, the repository <em>may not</em> be used by any other
     * activities during the restore operation; doing so will likely
     * result in a corrupt repository.
     *

     *
     * It is the responsibility of the caller to ensure that this method
     * is only invoked once; calling multiple times wil lead to
     * a corrupt repository.
     *

     *
     * @param backupDirectory the directory on the local file system
     *        in which all backup files exist and were written by a
     *        previous {@link #backupRepository(File) backup operation};
     *        this directory must exist, and the process must have read
     *        privilege for all contents in this directory
     * @return the problems that occurred during the restore operation
     * @throws AccessDeniedException if the current session does not
     *         have sufficient privileges to perform the restore
     * @throws RepositoryException if the restoration cannot be run
     */
    Problems restoreRepository( File backupDirectory ) throws RepositoryException;
}

Next, we’ll take a look at each of these two methods.

Creating a backup

The backupRepository(...) method on ModeShape’s RepositoryManager interface is used to create a backup of the entire repository, including all workspaces that existed when the backup was initiated. This method blocks until the backup is completed, so it is the caller’s responsibility to invoke the method asynchronously if that is desired. When this method is called on a repository that is being actively used, all of the changes made while the backup process is underway will be included; at some point near the end of the backup process, however, additional changes will be excluded from the backup. This means that each backup contains a fully-consistent snapshot of the entire repository as it existed near the time at which the backup completed.

Here’s an code example showing how easy it is to call this method:

org.modeshape.jcr.api.RepositoryManager repoMgr = ...
java.io.File backupDirectory = ...
Problems problems = repoMgr.backupRepository(backupDirectory);
if ( problems.hasProblems() ) {
    System.out.println("Problems restoring the repository:");
    // Report the problems (we'll just print them out) ...
    for ( Problem problem : problems ) {
       System.out.println(problem);
    }
} else {
    System.out.println("The backup was successful");
}

Each ModeShape backup is stored on the file system in a directory that contains a series of GZIP-ed files (each containing representations of a approximately 100K nodes) and a subdirectory in which all the large BINARY values are stored.

It is also the application’s responsibility to initiate each backup operation. In other words, there currently is no way to configure ModeShape to perform backups on a schedule. Doing so would add significant complexity to ModeShape and the configuration, whereas leaving it to the application lets the application fully control how and when such backups occur.

Restoring a repository

Once you have a complete backup on disk, you can then restore a repository back to the state captured within the backup. To do that, simply start a repository (or perhaps a new instance of a repository with a different configuration) and, before it’s used by any applications, load into the new repository all of the content in the backup. Here’s a simple code example that shows how this is done:

Here’s an code example showing how easy it is to call this method:

org.modeshape.jcr.api.RepositoryManager repoMgr = ...
java.io.File backupDirectory = ...
Problems problems = repoMgr.restoreRepository(backupDirectory);
if ( problems.hasProblems() ) {
    System.out.println("Problems backing up the repository:");
    // Report the problems (we'll just print them out) ...
    for ( Problem problem : problems ) {
         System.out.println(problem);
    }
} else {
    System.out.println("The restoration was successful");
}

Once a restore succeeds, the newly-restored repository will be restarted and will be ready to be used.

Migrating from ModeShape 2.8 to 3.0

Earlier I mentioned that backup and restore can be used to migrate from one version of ModeShape to the next major version of ModeShape. This is how we plan to support migrating from a ModeShape 2.8 repository instance to a new ModeShape 3.0 instance. We plan to cut one more release of ModeShape 2, which we’ll christen 2.8.4.Final, and that will include a utility that will create a 3.0-compatible backup of the ModeShape 2.8 instance. Then, simply use the “restoreRepository” method on the new (and empty) ModeShape 3.0 repository to load all the backed-up content.

Questions or feedback

This feature is still relatively new and was introduced in ModeShape 3.0.0.Beta3, and we’d love to get your feedback on our forums before we freeze the public API and cut the 3.0.0.Final release.

Advertisements

Filed under: features, jcr, repository, techniques, tools

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

ModeShape is

a lightweight, fast, pluggable, open-source JCR repository that federates and unifies content from multiple systems, including files systems, databases, data grids, other repositories, etc.

Use the JCR API to access the information you already have, or use it like a conventional JCR system (just with more ways to persist your content).

ModeShape used to be 'JBoss DNA'. It's the same project, same community, same license, and same software.

ModeShape

Topics

%d bloggers like this: