ModeShape

An open-source, federated content repository

Structured, unstructured, and everything between

Shane gives a good breakdown of the various ways to classify data as structured or unstructured. He points out that very often data is a mixture of both structured and unstructured data, and he gives several examples.

What I find so interesting about this, however, is how well ModeShape can handle these varieties of data.

ModeShape handles structured data really well. Most data structures are very easily mapped to the nodes and properties that ModeShape uses. And when those nodes also say which node types apply to them, ModeShape can enforce the node structure by validating it against those built-in and/or custom node types and prevent invalid data from being stored.

The other end of the spectrum is unstructured data, and ModeShape handles that beautifully, too. You can store unstructured data in a property using a string value or a binary value. Typically you would use a string value when the data is some form of text, and a binary value in all other cases (or when you don’t want to treat it as text).

But the best part is that ModeShape naturally handles combinations of structured and unstructured data. Recall that ModeShape is a hierarchical database, which means that each database consists of a single tree of nodes, and each node has one or more properties. That hierarchy is by definition structured, though it’s up to you whether ModeShape validates and enforces that structure using node types. But the leaves of that tree (that is, the properties and their values) are typically unstructured, though property values like dates and even some string values could be considered structured.

ModeShape’s query languages can also deal with both structured and unstructured data. Relationships between nodes, specific properties defined by node types, and the definitions of those properties all are addressable within the query language. But ModeShape queries can include full-text search constraints on both string and binary property values!

ModeShape can search those binary values when it can extract text using the Tika library, which supports many formats, including PDF, Microsoft Office™, RTF, HTML, and many others.

There’s one more way that ModeShape can deal with unstructured data: it can sequence unstructured data (string and binary property values) using built-in or custom sequencers to extract structure and save it as more nodes and properties in the repository. This is ideal for getting at that unstructured data that has the implicit structure defined by the format. For example, if an image is loaded into the repository, ModeShape’s image sequencer can extract the EXIF data in the image (e.g., ISO setting, focal length, aperture, shutter, geo-location, etc.) and save it as properties in the repository. ModeShape has a number of built-in sequencers that can extract this implicit structure from a variety of file formats:

  • DDL files
  • images (JPEG, GIF, BMP, PCX, PNG, IFF, RAS, PBM, PGM, PPM and PSD)
  • audio (MP3)
  • comma-separated and delimited text files
  • Java source and class files
  • Microsoft Office™
  • ZIP archives
  • XML
  • XML Schema
  • WSDL

In summary, ModeShape deals very naturally and easily with data that is part unstructured and part structured. What else could you want?

Filed under: features, repository, techniques

Creating and using tags in your content

UPDATE 2: Changed option 3 to use string identifiers, as WEAKREFERENCE and REFERENCE properties both maintain back-references.

UPDATE 1: Added a 5th option, as suggested by Bertrand Delacretaz.

(This post was inspired by a response I recently wrote to a Stack Overflow question. That answer was a bit long, but I thought it would also be suitable as a blog post.)

Many applications offer a way to tag “things” with either user-defined or system-defined tags. Assuming those “things” are nodes, what’s the best way to add tags to a ModeShape repository? I know of five possible approaches, each with its own benefits and disadvantages.

Option 1: Use Mixins

This approach uses a separate mixin node type definition for each tag. The mixin is a marker mixin (i.e., it has no property definitions or child node definitions). One example of a “known-issue” tag is the following (in CND format):

<tag='http://www.example.com/tags'>
[tag:known-issue] mixin

Create this tag by registering the node type definition using the NodeTypeManager, either by programmatically creating the node type template or by uploading a CND file.

To “tag” a particular node, simply add the tag’s mixin to the node:

node.addMixin("tag:known-issue");

Note that any node can have multiple tags, since any node can have multiple mixins.

To find all nodes that have a particular tag, simply issue a query:

SELECT * FROM [tag:known-issue]

To find all nodes that have either of two tags, simply perform a UNION:

SELECT * FROM [tag:known-issue]
UNION
SELECT * FROM [tag:critical-issue]

This approach is pretty straightforward and really uses ModeShape’s mixin feature. However, it is fairly cumbersome to create new tags, since that requires registering new node types. Plus, you cannot easily rename tags, but instead would have to:

  1. create the mixin for the tag with the new name;
  2. find all nodes that have the mixin representing the old tag, and for each remove the old mixin and add the new one;
  3. finally remove the node type definition for the old tag (after it is no longer used anywhere).

Removing old tags is done in a similar manner. Finally, it’s not really possible to associate additional metadata (like a display name) with a tag, since extra properties aren’t allowed on node type definitions.

This approach should perform quite well, however.

Option 2: Use a taxonomy and references

This approach involves using one or more “taxonomies”, each of which consists of a parent node for the taxonomy and child nodes for each tag in that taxonomy. The exact node types used are entirely up to you, but the taxonomy structure can be as rich as you’d like it to be. For example, you can create inheritance between tags in much the same way that classes can inherit from other classes in an ontology. Obviously adding, renaming, and removing tags is straightforward.

To “tag” a node, this approach uses a REFERENCE property. One way to do this is to define a single node type for the tag nodes and a single mixin that we’ll use to add this REFERENCE property to “taggable” nodes:

<tags='http://www.example.com/tags'>
[tags:tag] > mix:title, mix:referenceable

[tags:taggable] mixin
- tags:tags (REFERENCE) multiple < 'tags:tag'

To “apply” the tag to a node, simply add the “tags:taggable” mixin to the node (if not already there) and add the REFERENCE to the desired tag node. Here’s some code that does this (although it is too simple and assumes the node hasn’t already been tagged):

Node tag = ... // find in taxonomy
Node n = ... // the node that we're going to tag
if ( !n.isNodeType("tags:taggable") ) {
    n.addMixin("tags:taggable");
}
Value[] values = new Value[1];
values[0] = session.getValueFactory().createValue(tag);
n.setProperty("tags:tags",values);

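To find all nodes of a particular tag, simply get the tag node and call “getReferences(...)” on it. This returns the REFERENCE properties that point to the tag node, and the parent of each such property is a tagged node:

Node tag = ...
PropertyIterator iter = tag.getReferences("tags:tags");
while ( iter.hasNext() ) {
    // Each property is a "tags:tags" property; its parent is the tagged node
    Node tagged = iter.nextProperty().getParent();
}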

Alternatively, you could use a query to find all of the nodes for a particular tag. Here’s one that finds all the nodes that are tagged with the ‘known-issue’ or ‘critical-issue’ tag (note how easy it is to search for nodes tagged with any of 1, 2, or n tags just by changing the set criteria):

SELECT * FROM [tags:taggable] AS taggable
JOIN [tags:tag] AS tag ON taggable.[tags:tags] = tag.[jcr:uuid]
WHERE LOCALNAME(tag) IN ('known-issue','critical-issue')
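If your application builds that set criterion from a list of tag names, a small helper keeps the quoting in one place. This is only a sketch: the class and method names are made up for illustration, and it assumes the SQL-style convention of doubling a single quote inside a string literal. It produces just the criterion, not the whole query.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of JCR or ModeShape) that builds the set criterion. */
final class TagCriteria {

    private TagCriteria() {}

    /**
     * Returns a criterion such as LOCALNAME(tag) IN ('a','b') for the given
     * selector alias. Single quotes inside a tag name are doubled so a name
     * cannot terminate the string literal early (assumed escaping convention).
     */
    static String localNameIn(String selector, List<String> tagNames) {
        if (tagNames.isEmpty()) {
            throw new IllegalArgumentException("at least one tag name is required");
        }
        return tagNames.stream()
                .map(name -> "'" + name.replace("'", "''") + "'")
                .collect(Collectors.joining(",", "LOCALNAME(" + selector + ") IN (", ")"));
    }
}
```

Passing one, two, or n names changes only the contents of the parentheses; the rest of the query stays the same.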

This approach has the benefit that all tags have to be controlled/managed within one or more taxonomies (including perhaps user-specific taxonomies).

However, there is one potentially substantial disadvantage: this option may not scale very well to large numbers of tagged nodes. ModeShape might start to degrade adding and removing REFERENCE values when there are hundreds of nodes pointing to the same tag node. Another disadvantage is that a tag cannot be removed from a taxonomy unless it is no longer used.

You can also use WEAKREFERENCE rather than REFERENCE. The only distinction is that with WEAKREFERENCE you can remove a tag from the taxonomy without having to remove it from the tagged nodes.

Option 3: Use taxonomy and identifier references

This option is similar to Option 2 above in that it involves formally managing one or more taxonomies, in exactly the same way as described above. The difference, however, is that rather than use a REFERENCE (or WEAKREFERENCE), the node that is to be tagged points to the tag node using a STRING property with the identifier of the tag node:

<tags='http://www.example.com/tags'>
[tags:tag] > mix:title, mix:referenceable

[tags:taggable] mixin
- tags:tags (STRING) multiple

Note that the tag has a “jcr:title” property, which you can use to hold the display name for the tag.

Tagging a node is done similarly to Option 2, except the value of the “tags:tags” property is a string:

Node tag = ... // find in taxonomy
String tagId = tag.getIdentifier();
Node n = ... // the node that we're going to tag
if ( !n.isNodeType("tags:taggable") ) {
    n.addMixin("tags:taggable");
}
Value[] values = new Value[1];
values[0] = session.getValueFactory().createValue(tagId);
n.setProperty("tags:tags",values);

To find all nodes of a particular tag, simply use a query to find all of the nodes that have the identifier of a particular tag. Here’s one that finds all the nodes that are tagged with the ‘known-issue’ or ‘critical-issue’ tag (note how easy it is to search for nodes tagged with any of 1, 2, or n tags just by changing the set criteria):

SELECT * FROM [tags:taggable] AS taggable
JOIN [tags:tag] AS tag ON taggable.[tags:tags] = tag.[jcr:uuid]
WHERE LOCALNAME(tag) IN ('known-issue','critical-issue')

You’ll note that this is very similar to the query in Option 2. That’s because REFERENCE and WEAKREFERENCE properties are physically stored in a property value as an identifier.

Like Option 2, this approach enforces the use of one or more taxonomies, which makes it a bit easier to control the tags, since they must exist in a taxonomy before they can be used. Renaming tags is also pretty easy, and if you use the “jcr:title” property for the display name you don’t even need to rename the tag node, since renaming involves simply changing the title property value. Performance-wise, this is far better than the REFERENCE and WEAKREFERENCE approach, since non-reference properties scale and perform much better with large numbers of tagged nodes, regardless of whether they all point to one tag or many. Looking up the tag(s) from the “tags:tags” property is also very fast (and faster than navigating a path).

This approach is similar to Option 2 with WEAKREFERENCE properties in that you can remove a tag even if it is still used, although nodes’ “tags:tags” property values that point to that removed tag will not be usable anymore. This can be remedied with some conventions in your application, or by simply keeping tags around and using metadata on the taxonomy to say that a particular tag is “deprecated” and shouldn’t be used. (IMO, the latter is actually a benefit of this approach.)

This option will generally perform and scale much better than Option 2.

Option 4: Use string properties

The fourth approach is to simply use a STRING property to tag each node with the name of the tag(s) that are to be applied. This works great for ad hoc tags, which is when there is no formal taxonomy and any tag can be used at any time.

Here’s a mixin that defines a multi-valued STRING property:

<tags='http://www.example.com/tags'>
[tags:taggable] mixin
- tags:tags (STRING) multiple

To tag a node, simply add the mixin (if not already present) and add the name of the tag as a value on the “tags:tags” STRING property (again, if it’s not already present as a value). Here’s some simplified code that does none of the checking, but which gives the basic idea:

Node n = ... // the node that we're going to tag
if ( !n.isNodeType("tags:taggable") ) {
    n.addMixin("tags:taggable");
}
String[] tags = new String[]{"known-issue"};
n.setProperty("tags:tags",tags);
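The “if it’s not already present” check that the code above skips is easy to do on plain string arrays before calling setProperty. Here is a minimal sketch; the class and method names are hypothetical, and it assumes your application has already read the current “tags:tags” values into a String[] (an empty array if the node has no tags yet).

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper for working with the values of a multi-valued tag property. */
final class TagValues {

    private TagValues() {}

    /**
     * Returns the existing tag values plus the new tag. Order is preserved and
     * a tag that is already present is not added a second time.
     */
    static String[] withTag(String[] existing, String tag) {
        Set<String> tags = new LinkedHashSet<>(Arrays.asList(existing));
        tags.add(tag);
        return tags.toArray(new String[0]);
    }
}
```

The returned array can then be passed to n.setProperty("tags:tags", ...) in place of the single-element array used above.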

The primary advantage of this approach is that it is very simple: you’re simply using string values on the node that is to be tagged. To find all nodes that are tagged with a particular tag (e.g., “known-issue”), simply issue a query:

SELECT * FROM [tags:taggable] AS taggable
WHERE taggable.[tags:tags] = 'known-issue'

Also, there is no taxonomy to manage. But if a tag is to be renamed, then you could simply process the “tags:tags” values. If a tag is to be deleted (and removed from the nodes that are tagged with it), then that can be done by removing the tag name from the “tags:tags” properties (perhaps in a background job).
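Processing the “tags:tags” values for a rename or a delete comes down to simple array manipulation. The following is a sketch under the assumption that a background job has read each node’s values into a String[] and will write the result back; the class and method names are invented for this example.

```java
import java.util.Arrays;

/** Hypothetical helper for renaming or removing a tag within a node's tag values. */
final class TagMaintenance {

    private TagMaintenance() {}

    /**
     * Replaces oldName with newName. If the node already carried newName,
     * the duplicate is dropped so the tag appears only once.
     */
    static String[] rename(String[] tags, String oldName, String newName) {
        return Arrays.stream(tags)
                .map(tag -> tag.equals(oldName) ? newName : tag)
                .distinct()
                .toArray(String[]::new);
    }

    /** Returns the tag values without the given tag name. */
    static String[] remove(String[] tags, String name) {
        return Arrays.stream(tags)
                .filter(tag -> !tag.equals(name))
                .toArray(String[]::new);
    }
}
```

Both methods return a new array and leave the input untouched, so the job can compare the two and skip the save when nothing changed.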

Note that this allows any tag name to be used, and thus works best for cases where the tag names are not controlled at all. If you want to control the list of strings used as tag values, you could create a taxonomy in the repository (as described in Options 2 and 3 above) and have your application limit the values to those in the taxonomy. You can even have multiple taxonomies, some of which are perhaps user-specific. But this approach doesn’t have quite the same control as Options 2 or 3.

This option will perform just a bit better than Option 3 (since the queries are a tad simpler), but will scale just as well.

Option 5: Use taxonomy and paths

A fifth option is very similar to Option 3, except that you use a PATH property (rather than a STRING property) that points to the tag, where the PATH values are paths to the tag. Here are some node types:

<tags='http://www.example.com/tags'>
[tags:tag] > mix:title

[tags:taggable] mixin
- tags:tags (PATH) multiple

(You could also use a STRING property instead of PATH; really the only advantage of using PATH is that it enforces that each value is a legal path value. But using PATH does not enforce that it is an existing path.)

To tag a node, simply add the mixin (if not already present) and add the path of the tag as a value on the “tags:tags” PATH property (again, if it’s not already present as a value). Here’s some simplified code that does none of the checking, but which gives the basic idea:

Node tag = ... // the tag node
Node n = ... // the node that we're going to tag
if ( !n.isNodeType("tags:taggable") ) {
    n.addMixin("tags:taggable");
}
String[] tags = new String[]{tag.getPath()};
n.setProperty("tags:tags",tags);

Unlike Options 2 or 3, this approach does not strictly enforce the use of taxonomies. In fact, you’ll notice that the “tags:tags” property definition has no constraint that requires each value to be the path of an existing tag node; this reduces the constraints and requires your application to use convention, which can be an advantage. Using a title on the tag for the displayable name obviates having to rename tags. Performance-wise, this is far better than the REFERENCE or WEAKREFERENCE approach, and (for ModeShape) just a bit worse than using the STRING property with an identifier (ModeShape can resolve an identifier faster than it can find a node by path). But it will scale far better than Option 2 and similarly to Option 3.

One advantage of this approach (and of Option 3) over Option 2 is that you can remove a tag even if it is still used, although nodes’ PATH properties that point to that removed tag will be readable but not resolvable. (If you’re using the tag’s title for the display name, this might not be useful since the path might not contain meaningful and usable information.) This can be remedied with some conventions in your application, or by simply keeping tags around and using metadata on the taxonomy to say that a particular tag is “deprecated” and shouldn’t be used. (IMO, the latter is actually a benefit of this approach.)

Summary

We looked at five different ways of incorporating tags into your application. Of course, which one works best for you will depend on the needs of your particular application. Use these as a starting point: feel free to customize them, combine them, or even come up with other alternatives.

If you just need a way to associate informal tags with content, perhaps Option 4 is a good fit. For very small and limited tagging needs, Option 1 might work. And Option 2 is worth a serious look for smallish repositories that need a formal taxonomy.

But for most applications, your repository will be large enough that you will probably want to look at Options 3, 4 or 5, with the deciding factor being whether you need formal or informal taxonomies. Personally, of these three I think I’d tend to lean toward Option 3.

Happy tagging!

Filed under: jcr, techniques

Concurrent writes

It’s almost a certainty that you will have multiple applications and multiple threads within those applications simultaneously update data in your database. The speed of your application will depend significantly on how fast your database can perform these simultaneous updates.

If you’re using ModeShape, the first thing to know is that reading content does not require any locks. In other words, applications or threads that are reading content can always do so with no contention. (ModeShape doesn’t need read locks because it uses MVCC, via Infinispan, to isolate readers from writers. See the details for more.)

The second thing to know is that, because ModeShape is a hierarchical database, all data is stored in a tree-like structure of nodes and properties, and any transaction updating content must obtain locks for all nodes being updated. Much of the time, applications and threads that change content do tend to update different parts (subtrees) of the database, which means completely different write locks are acquired by the different transactions. In other words, updates to different parts of the database never block each other.

There are times, however, when multiple applications and/or threads do attempt to update the same node at the same time. In this case, the transactions do compete for the node’s lock, and these transactions complete in essentially a serialized fashion. (Again, they still do not block any reading operations or any transactions updating other areas of the repository.) Occasionally two transactions may deadlock, because they each obtain a lock on separate nodes and then try to obtain a lock on the node currently locked by the other. If you run into this situation, you can enable deadlock detection to automatically detect such cases and roll back one of the deadlocked transactions, which your application can simply re-try by performing the save again.
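Re-trying a rolled-back save can be wrapped in a small generic helper. This is only a sketch: the Retry class is hypothetical, and it re-tries on any exception, whereas a real application would narrow the catch to whatever exception its save reports for a rolled-back transaction.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper that re-runs an operation a limited number of times. */
final class Retry {

    private Retry() {}

    /**
     * Calls the operation until it succeeds or maxAttempts is reached,
     * sleeping a little longer after each failure. The last failure is
     * rethrown if no attempt succeeds.
     */
    static <T> T withRetries(Callable<T> operation, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                // Assumption: every failure is treated as retryable in this sketch
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last;
    }
}
```

The operation passed in would make the changes and call save again, since the changes from the rolled-back attempt need to be re-applied before the second save can succeed.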

It’s nice to know that most of the time, applications will not have any contention. And when there is contention for concurrent writes to the same areas, ModeShape does the logical thing by serializing the transactions. (Isn’t ACID behavior nice?!)

But even after all this, you may find that your applications are still highly contentious while trying to concurrently update the same nodes. In these cases, you have several options:

  1. Can you initialize the highly-contentious area when the database is created? If so, then the different transactions will update different areas of the database.
  2. Can you alter the hierarchical design of your database to eliminate the contention? Consider if your hierarchy would improve by adding one or more time-based levels. Or consider inserting a level for different contexts (e.g., users, groups, customers, etc.).
  3. Can you centralize where/how your application is updating these areas? For example, a hierarchy that includes a level for users might have contention when adding users. Try centralizing the process of adding users. (Queues often work great for these kinds of patterns.)
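To make the second suggestion a bit more concrete, here’s a sketch of how an application might compute a time-based parent path, so that writes on different days land under different parent nodes. The class name, the “/events” root used in the note below, and the year/month/day layout are all assumptions chosen for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that spreads new nodes across time-based levels. */
final class TimeBasedPaths {

    // One level each for year, month, and day, e.g. 2012/08/15
    private static final DateTimeFormatter LEVELS = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    private TimeBasedPaths() {}

    /** Returns the path of the parent node under which content for the given date belongs. */
    static String parentPathFor(String root, LocalDate date) {
        return root + "/" + date.format(LEVELS);
    }
}
```

With this layout, parentPathFor("/events", LocalDate.of(2012, 8, 15)) yields "/events/2012/08/15". Creating the year, month, and day nodes ahead of time (as in the first suggestion) keeps even those intermediate levels from becoming a point of contention.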

By the way, how does ModeShape compare to other hierarchical data stores? Really well, actually. One of the more popular JCR implementations uses a single, cluster-wide, global write lock that guarantees that only one write will proceed at a time. Yikes.

Filed under: features, jcr, techniques

New repository backup and restore in ModeShape 3

We recently added a new feature to ModeShape 3.0.0.Beta3 that enables repository administrators to create backups of an entire repository (even when the repository is in use), and to then restore a repository to the state reflected by a particular backup. This works regardless of where the repository content is persisted.

There are several reasons why you might want to restore a repository to a previous state, and many are quite obvious. For example, the application or the process it’s running in might stop unexpectedly. Or perhaps the hardware on which the process is running might fail. Or perhaps the persistent store might have a catastrophic failure (although surely you’re also using the persistent store’s backup system, too).

But there are also non-failure related reasons. Backups of a running repository can be used to transfer the content to a new repository that is perhaps hosted in a different location. It might be possible to manually transfer the persisted content (e.g., in a database or on the file system), but the process of doing so varies with different kinds of persistence options. Also, ModeShape can be configured to use a distributed in-memory data grid that already maintains its own copies for ensuring high availability, and therefore the data grid might not persist anything to disk. In such cases, the content is stored on the data grid’s virtual heap, and getting access to it without ModeShape may be quite difficult. Or, you may initially configure your repository to use a particular persistence approach that is suitable given the current needs, but over time the repository grows and you want to move to a different, more scalable (but perhaps more complex) persistence approach. Finally, the backup and restore feature can be used to migrate to a new major version of ModeShape.

In short, you may very well have the need to set the contents of a repository back to an earlier state. ModeShape’s backup and restore feature makes this easy to do.

Getting started

Let’s walk through the basic process of creating a backup of an existing repository and then restoring the repository. Both of these steps require an authenticated Session that has administrative privileges. It actually doesn’t matter which workspace the session uses:

javax.jcr.Repository repository = ...
javax.jcr.Credentials credentials = ...
String workspaceName = ...
javax.jcr.Session session = repository.login(credentials,workspaceName);

So far, this is basic and standard stuff for any JCR client.

Introducing the RepositoryManager

Each JCR Session instance has its own Workspace object that provides workspace-level functionality and access to a set of “manager” interfaces: the VersionManager, NodeTypeManager, ObservationManager, LockManager, etc. The JSR-333 (aka, “JCR 2.1”) effort is still incomplete, but has plans to introduce a RepositoryManager that offers some repository-level functionality. The ModeShape public API has created such an interface, and accessing it from a standard JCR Session instance is pretty simple:

org.modeshape.jcr.api.Session msSession = (org.modeshape.jcr.api.Session)session;
org.modeshape.jcr.api.RepositoryManager repoMgr = msSession.getWorkspace().getRepositoryManager();

The interface is pretty self-explanatory, and defines several methods including two that are related to the backup and restore feature:

public interface RepositoryManager {

    ...

    /**
     * Begin a backup operation of the entire repository, writing the files
     * associated with the backup to the specified directory on the local
     * file system.
     *
     * The repository must be active when this operation is invoked, and
     * it can continue to be used during backup (e.g., this can be a
     * "live" backup operation), but this is not recommended if the backup
     * will be used as part of a migration to a different version of
     * ModeShape or to a different installation.
     *
     * Multiple backup operations can operate at the same time, so it is
     * the responsibility of the caller to not overload the repository
     * with backup operations.
     *
     * @param backupDirectory the directory on the local file system into
     *        which all backup files will be written; this directory
     *        need not exist, but the process must have write privilege
     *        for this directory
     * @return the problems that occurred during the backup operation
     * @throws AccessDeniedException if the current session does not
     *         have sufficient privileges to perform the backup
     * @throws RepositoryException if the backup cannot be run
     */
    Problems backupRepository( File backupDirectory ) throws RepositoryException;

    /**
     * Begin a restore operation of the entire repository, reading the
     * backup files in the specified directory on the local file system.
     * Upon completion of the restore operation, the repository will be
     * restarted automatically.
     *
     * The repository must be active when this operation is invoked.
     * However, the repository <em>may not</em> be used by any other
     * activities during the restore operation; doing so will likely
     * result in a corrupt repository.
     *
     * It is the responsibility of the caller to ensure that this method
     * is only invoked once; calling multiple times will lead to
     * a corrupt repository.
     *
     * @param backupDirectory the directory on the local file system
     *        in which all backup files exist and were written by a
     *        previous {@link #backupRepository(File) backup operation};
     *        this directory must exist, and the process must have read
     *        privilege for all contents in this directory
     * @return the problems that occurred during the restore operation
     * @throws AccessDeniedException if the current session does not
     *         have sufficient privileges to perform the restore
     * @throws RepositoryException if the restoration cannot be run
     */
    Problems restoreRepository( File backupDirectory ) throws RepositoryException;
}

Next, we’ll take a look at each of these two methods.

Creating a backup

The backupRepository(...) method on ModeShape’s RepositoryManager interface is used to create a backup of the entire repository, including all workspaces that existed when the backup was initiated. This method blocks until the backup is completed, so it is the caller’s responsibility to invoke the method asynchronously if that is desired. When this method is called on a repository that is being actively used, all of the changes made while the backup process is underway will be included; at some point near the end of the backup process, however, additional changes will be excluded from the backup. This means that each backup contains a fully-consistent snapshot of the entire repository as it existed near the time at which the backup completed.

Here’s a code example showing how easy it is to call this method:

org.modeshape.jcr.api.RepositoryManager repoMgr = ...
java.io.File backupDirectory = ...
Problems problems = repoMgr.backupRepository(backupDirectory);
if ( problems.hasProblems() ) {
    System.out.println("Problems backing up the repository:");
    // Report the problems (we'll just print them out) ...
    for ( Problem problem : problems ) {
       System.out.println(problem);
    }
} else {
    System.out.println("The backup was successful");
}

Each ModeShape backup is stored on the file system in a directory that contains a series of GZIP-ed files (each containing representations of approximately 100K nodes) and a subdirectory in which all the large BINARY values are stored.

It is also the application’s responsibility to initiate each backup operation. In other words, there currently is no way to configure ModeShape to perform backups on a schedule. Doing so would add significant complexity to ModeShape and the configuration, whereas leaving it to the application lets the application fully control how and when such backups occur.
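Scheduling backups from the application needs nothing more than the JDK’s executor framework. The sketch below is one way to do it, not something ModeShape provides: BackupScheduler and its BackupTask interface are hypothetical, and BackupTask simply stands in for your own code that calls backupRepository(...) and reports any problems. Each run writes to a new timestamped directory, and the single-threaded executor with a fixed delay means two backups never overlap.

```java
import java.io.File;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical application-side scheduler for repository backups. */
final class BackupScheduler {

    /** Stand-in for application code that calls backupRepository(directory). */
    interface BackupTask {
        void backupTo(File directory) throws Exception;
    }

    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    private BackupScheduler() {}

    /** Returns a distinct directory for the backup started at the given time. */
    static File directoryFor(File parent, LocalDateTime when) {
        return new File(parent, "backup-" + when.format(STAMP));
    }

    /**
     * Runs the task immediately and then again after each delay. The caller
     * should shut down the returned executor when the application stops.
     */
    static ScheduledExecutorService schedule(BackupTask task, File parent, long delay, TimeUnit unit) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(() -> {
            try {
                task.backupTo(directoryFor(parent, LocalDateTime.now()));
            } catch (Exception e) {
                // Catch here: an exception escaping the task would cancel all later runs
                System.err.println("Backup failed: " + e);
            }
        }, 0, delay, unit);
        return executor;
    }
}
```

Because the backup method blocks until the backup is complete, running it on the executor’s thread also takes care of invoking it asynchronously.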

Restoring a repository

Once you have a complete backup on disk, you can then restore a repository back to the state captured within the backup. To do that, simply start a repository (or perhaps a new instance of a repository with a different configuration) and, before it’s used by any applications, load into the new repository all of the content in the backup. Here’s a simple code example that shows how this is done:

org.modeshape.jcr.api.RepositoryManager repoMgr = ...
java.io.File backupDirectory = ...
Problems problems = repoMgr.restoreRepository(backupDirectory);
if ( problems.hasProblems() ) {
    System.out.println("Problems restoring the repository:");
    // Report the problems (we'll just print them out) ...
    for ( Problem problem : problems ) {
         System.out.println(problem);
    }
} else {
    System.out.println("The restoration was successful");
}

Once a restore succeeds, the newly-restored repository will be restarted and will be ready to be used.

Migrating from ModeShape 2.8 to 3.0

Earlier I mentioned that backup and restore can be used to migrate from one version of ModeShape to the next major version of ModeShape. This is how we plan to support migrating from a ModeShape 2.8 repository instance to a new ModeShape 3.0 instance. We plan to cut one more release of ModeShape 2, which we’ll christen 2.8.4.Final, and that will include a utility that will create a 3.0-compatible backup of the ModeShape 2.8 instance. Then, simply use the “restoreRepository” method on the new (and empty) ModeShape 3.0 repository to load all the backed-up content.

Questions or feedback

This feature is still relatively new and was introduced in ModeShape 3.0.0.Beta3, and we’d love to get your feedback on our forums before we freeze the public API and cut the 3.0.0.Final release.

Filed under: features, jcr, repository, techniques, tools

New disk storage option for ModeShape

We’re introducing a new feature that allows ModeShape to store content directly on disk using the native file system. It’s called the Disk Connector, and is capable of storing any content that applications can put into a repository. It’s already in the ‘master’ branch and will be in the upcoming 2.6.0.Beta1 release of ModeShape. (If you want to give it a try before the release, grab the latest from our repository, run a local build to install it into your local Maven repository, and use the ’2.6-SNAPSHOT’ version in your application’s POM file.)

So now ModeShape offers five connectors that can store all valid JCR content (including ‘mix:referenceable’ and ‘mix:versionable’ nodes, REFERENCE properties, version histories, etc.) and can also find nodes by identifier. We’ve designed all these connectors to own their data, meaning other applications should not directly access the underlying storage system. But any one of these is a great fit for most applications:

  • JPA Connector – stores all content in one of the 17 relational DBMS systems supported by Hibernate, including DB2, Oracle, MySQL, PostgreSQL, and SQL Server (to name a few)
  • Infinispan Connector – stores all content in a fast, scalable, distributed, and fault-tolerant Infinispan data grid
  • JBoss Cache Connector – stores all content in a JBoss Cache instance, and useful for small-to-medium sized repositories when Infinispan is not available
  • In-memory Connector – stores all content in-memory, and is fast and useful for small transient repositories or when importing XML and using JCR to read and search the content
  • Disk Connector – stores all content on disk in a binary format defined by ModeShape

ModeShape also offers other connectors that enable accessing the information in external systems, even when other applications use those same systems:

  • File System Connector – reads and writes ‘nt:file’, ‘nt:folder’ and ‘nt:resource’ nodes on the native file system using regular files and directories, mapping the properties defined by these node types to the actual file and directory attributes, and storing extra properties added to nodes via mixins in UTF-8 files (BINARY properties stored encoded in hexadecimal) that your applications can even read
  • JCR Connector – reads and writes content into an external JCR repository, and is useful when migrating from other JCR implementations or when federating existing JCR repositories into a single repository
  • Subversion Connector – reads and writes ‘nt:file’, ‘nt:folder’ and ‘nt:resource’ nodes as files and directories in a SVN repository; unlike the File System Connector, this only supports the standard properties defined on the ‘nt:file’, ‘nt:folder’, and ‘nt:resource’ node types
  • JDBC Metadata Connector – a read-only connector that maps the JDBC metadata into nodes representing the databases, catalogs, schemas, tables, columns, procedures, and other metadata information, and is very useful if you want to have a JCR repository that contains an accurate schema representation of one or more databases

Filed under: features, jcr, news, techniques

Finding a JCR repository

Updated 6/21/2011: Added section describing the Seam JCR module
Updated 6/23/2011: Added more detail about the JNDI location when ModeShape is deployed to JBoss AS

Okay, you’re using JCR in your application, and you’re writing all of your code to the JCR API. That’s great, because your application doesn’t have any implementation-specific calls, and you can rely only upon the “javax.jcr” packages.

“But,” you ask, “how do I get a reference to the javax.jcr.Repository instance without using implementation-specific code in my app?”

If you’re using JCR 1.0, you’re basically out of luck. The spec didn’t specify how to do that, and so the implementations all do it differently.

But thankfully JCR 2.0 introduced the javax.jcr.RepositoryFactory interface and described how to use the Java SE Service Locator pattern to get that initial reference to your repository instance without any implementation-specific code. Here’s how that works.

Using the JCR 2.0 RepositoryFactory

Your application will have one (or more) JCR implementations on the classpath, and per JCR 2.0 they will each provide their own RepositoryFactory implementations and manifest entries so that the JVM can find them. Your application can find them by using the Service Locator pattern:

Map parameters = ...
Repository repository = null;
for (RepositoryFactory factory : ServiceLoader.load(RepositoryFactory.class)) {
  repository = factory.getRepository(parameters);
  if (repository != null) break;
}

This iterates over all of the RepositoryFactory implementations, and asks each factory to return the JCR Repository instance for the given map of parameters. Per JCR 2.0, if the RepositoryFactory understands the parameters, it will return a Repository instance; otherwise, it will return null. Now, each JCR implementation is allowed to define its own parameters, so these definitely are still implementation-specific. But since they’re just properties, your application can remain independent of the JCR implementation by simply loading them from a file:

Properties parameters = new Properties();
// Read from a file or from other input streams or readers ...
parameters.load(new FileInputStream(file));
// Find the Repository instance ...
Repository repository = null;
for (RepositoryFactory factory : ServiceLoader.load(RepositoryFactory.class)) {
  repository = factory.getRepository(parameters);
  if (repository != null) break;
}

Look, Ma! No implementation-specific code!

ModeShape parameters for RepositoryFactory

So what parameters does ModeShape expect? Just one:

org.modeshape.jcr.URL

If the value of this parameter is a URL that resolves to a ModeShape configuration file, the factory will actually start up a new ModeShape engine using that configuration file, and will look for the repository in the URL. For example:

file:config/configRepository.xml?repositoryName=MyRepository

will look for a ModeShape configuration file named “configRepository.xml” that is in the “config” directory relative to where the JVM was started, and will return the repository defined in the configuration file with the name “MyRepository”. (Remember that a single ModeShape engine can host multiple JCR repositories.) Other URLs are possible, as long as they can be resolved to the configuration file.

If the value of the “org.modeshape.jcr.URL” parameter is a URL that begins with “jndi:”, then the ModeShape factory will attempt to look for a ModeShape engine instance registered in JNDI, and will ask that engine for the named repository. For example:

jndi:name/in/jndi?repositoryName=MyRepository

will look in JNDI for a ModeShape engine at “name/in/jndi”, and will ask it for the repository named “MyRepository”.

The JNDI form is what you’ll use if you’ve deployed ModeShape to JBoss AS and your applications need to access the repositories. ModeShape runs as a service within JBoss AS, so when the app server is started ModeShape will automatically register the engine in JNDI at “jcr/local”. If you’ve not changed the configuration, there will be a repository called “repository” (with a default workspace called “default”, though you can create other workspaces using the JCR API), and you can use the following URL for the “org.modeshape.jcr.URL” parameter:

jndi:jcr/local?repositoryName=repository

Of course, you probably want to change the configuration to add other repositories or to control where and how the repositories store the content (by default it is stored in-memory). If you add repositories or change the name of the repository, you’ll need to change the URL accordingly.
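To make the shape of these URLs concrete, here is a minimal sketch that loads the factory parameters from properties-formatted text and splits the “org.modeshape.jcr.URL” value into its location and repository name. The class and method names are hypothetical, and this is not how ModeShape itself parses the URL; it only illustrates the two parts described above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustration only: inspects the URL parameter; not ModeShape's own parser. */
final class RepositoryUrlSketch {

    static final String URL_PARAMETER = "org.modeshape.jcr.URL";

    /** Loads factory parameters from properties-formatted text. */
    static Properties loadParameters(String text) {
        Properties parameters = new Properties();
        try {
            parameters.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parameters;
    }

    /** Returns the part of the URL before the '?', e.g. "jndi:jcr/local". */
    static String location(String url) {
        int q = url.indexOf('?');
        return q < 0 ? url : url.substring(0, q);
    }

    /** Returns the value of the "repositoryName" query parameter, or null if absent. */
    static String repositoryName(String url) {
        int q = url.indexOf('?');
        if (q < 0) return null;
        for (String pair : url.substring(q + 1).split("&")) {
            if (pair.startsWith("repositoryName=")) {
                return pair.substring("repositoryName=".length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Properties parameters = loadParameters(
            "org.modeshape.jcr.URL=jndi:jcr/local?repositoryName=repository\n");
        String url = parameters.getProperty(URL_PARAMETER);
        System.out.println(location(url));
        System.out.println(repositoryName(url));
    }
}
```

Because the parameters live in a properties file, switching between the “file:” and “jndi:” forms is a change to that file rather than to the application code.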

Injecting JCR Repositories

If you’re building an application that uses CDI, there’s another option for getting hold of your Repository instance. The Seam JCR project is a portable extension to CDI that provides annotations for automatically injecting a javax.jcr.Repository object into your application, and Seam JCR works with ModeShape and Jackrabbit. Simply ensure that Seam JCR and your JCR implementation are on your classpath, and then use annotations to provide the same parameters normally supplied to the RepositoryFactory. Here’s an example of injecting ModeShape with the same “file:” URL used above:

  @Inject @JcrConfiguration(name="org.modeshape.jcr.URL",
                            value="file:config/configRepository.xml?repositoryName=MyRepository")
  Repository repository;

Seam JCR also makes it easy to inject a JCR Session into your application:

  @Inject @JcrConfiguration(name="org.modeshape.jcr.URL",
                            value="file:config/configRepository.xml?repositoryName=MyRepository")
  Session session;

This code will obtain a Session using the default workspace and no credentials, but the Seam JCR team is working on supporting Credentials and workspace names.

Of course, Seam JCR also works with Jackrabbit, but uses Jackrabbit-specific parameters. For more details, see the Seam JCR site.

Filed under: features, jcr, repository, techniques

ModeShape moves to Git

The ModeShape project’s official source code repository is now at GitHub: http://github.com/ModeShape/modeshape

We’re adopting the Fork+Pull method of development. The basic idea is that you first fork the “official” ModeShape repository on GitHub. Then, you do all your development locally, push your proposed changes into your fork, and generate a pull-request describing your proposed changes. The ModeShape committers will review and discuss your changes and pull them into the “official” repository (using this process).

For details on this process, see our ModeShape Development Workflow article. We’ve started a discussion thread for any questions or lessons learned. We’ll hopefully improve the documentation in the coming days and weeks. And to learn more about Git, we recommend the following resources:

Filed under: news, open source, techniques, tools

Custom properties on nt:file and nt:folder nodes

One really nice feature of JCR repositories is that you can use them to store files and folders. Like with all other content, the structure of these nodes is dictated by their node types, and most people use the “nt:file” and “nt:folder” node types defined in the JCR specification. Learning to use these node types can take a little work, because they’re not quite as straightforward as you might expect.

Consider a “MyDocuments” folder that contains a “Personal” folder and a “Status Report.pdf” file. Here’s what those nodes might look like:

[Figure: Nodes for folders and files]

The folders look like what you might expect: they have a name, a primary type of “nt:folder”, and the “jcr:createdBy” and “jcr:created” properties defined by the “nt:folder” node type. (These properties are defined as ‘autocreated’, meaning the repository should set these automatically.)

The file representation, on the other hand, is different. The “Status Report.pdf” node has a primary type of “nt:file” and the “jcr:createdBy” and “jcr:created” properties defined by the “nt:file” node type, but everything about the content (including the binary file content in the “jcr:data” property) is actually stored in the child node named “jcr:content”. This may seem odd at first, but actually this design very nicely separates the file-related information from the content-related information.

Think about how an application might navigate the files and folders in a repository. Using the JCR API, the application asks for the “MyDocuments” node, so the repository materializes it (and probably its list of children) from storage. The application then asks for the children, so the repository loads the “Personal” folder node and the “Status Report.pdf” node, and there’s enough information on those nodes for the application to display relevant information. Note that the “Status Report.pdf” file’s content has not yet been materialized. Only when the application asks for the content of the file (that is, it asks for the “jcr:content” node) will the content-related information be materialized by the repository. (And, some repository implementations might delay loading the “jcr:data” binary property until the application asks for it.) Nice, huh?

Another interesting aspect of the “nt:file” and “nt:folder” node types (and even the “nt:resource” node type) is that they don’t allow adding just any property on the node. The beauty is that they don’t have to, because you can still add extra properties to these nodes using mixins!

Let’s imagine that we want to add tags to our file and folder nodes, and that we want to start capturing the SHA-1 checksum (as a hexadecimal string) of our files. To start, we need to create two mixins (we’ll use the CND format):

[acme:taggable] mixin
- acme:tags (STRING) multiple

[acme:checksum] mixin
- acme:sha1 (STRING) mandatory

(We could have defined a mixin that allows any property, similar to how the standard “nt:unstructured” node type does it. Then, we can add any properties we want. However, I tend to like using more targeted mixins like these, if for no other reason than it makes it very easy to use JCR-SQL2 to query the nodes that use these mixins.)
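As a rough illustration of that last point, a targeted mixin can be used directly as the selector in a JCR-SQL2 statement. The sketch below only builds the statement text; the class and method names are hypothetical. In a real application the resulting string would be handed to the JCR 2.0 QueryManager.createQuery(statement, Query.JCR_SQL2) method.

```java
/** Illustration only: builds a JCR-SQL2 statement for nodes that use the acme:taggable mixin. */
final class TagQuerySketch {

    /** Quotes a value as a JCR-SQL2 string literal, doubling any embedded single quotes. */
    static String literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Selects the acme:taggable nodes whose acme:tags property contains the given tag. */
    static String nodesTaggedWith(String tag) {
        return "SELECT * FROM [acme:taggable] WHERE [acme:tags] = " + literal(tag);
    }

    public static void main(String[] args) {
        System.out.println(nodesTaggedWith("non-work"));
    }
}
```

Because “acme:tags” is multi-valued, the equality constraint is satisfied when any one of the values matches.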

We then need to register these node types in our repository (perhaps by loading the CND file or programmatically using the NodeTypeManager). Then, we can add the “acme:taggable” mixin to whatever file and folder nodes we want. This is as simple as:

// Find the node ...
Node myDocuments = session.getNode(pathToMyDocuments);
Node personalFolder = myDocuments.getNode("Personal");

// Add the mixin ...
personalFolder.addMixin("acme:taggable");

// Set the tags ...
String[] tags = {"non-work"};
personalFolder.setProperty("acme:tags",tags);

// Save the changes ...
session.save();

We can do something similar for the “Status Report.pdf” node, as well as use the “acme:checksum” mixin on the “jcr:content” node. The result is something like this:

[Figure: File and folder nodes with custom properties]
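The value for the “acme:sha1” property has to be computed by the application before it is set. Here is one way that could look using only the JDK; the class and method names are hypothetical, and the resulting string would then be stored with something like setProperty("acme:sha1", hex) on the “jcr:content” node.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustration only: computes the SHA-1 checksum of some content as a hexadecimal string. */
final class Sha1HexSketch {

    static String sha1Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Two lowercase hex digits per byte, zero-padded
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha1Hex("abc".getBytes(StandardCharsets.UTF_8)));
    }
}
```

For large files it would be better to feed the digest from a stream instead of holding the whole content in a byte array.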

As you can see, JCR’s built-in “nt:file” and “nt:folder” nodes really are pretty easy to understand and use, even when you need to place custom properties on these nodes.

Filed under: jcr, techniques

Running tests against different DBMSes

Testing software against multiple database management systems can be tricky. It’s usually a headache to configure and maintain these tests, and then they usually take a long time to run.

Of course there are multiple ways of doing this, and each approach probably has some advantage for your system, the build environment, and whether your developers have access to the test databases.

JBoss DNA is an open source project, and open source projects run into some walls that commercial software developers often don’t have. With open source, each developer likely has a very different environment and likely does not have access to the same databases or same set of DBMSes. It doesn’t work to hard-code the connection information or even to put it in some property files, because each developer would have to go and change all those settings. Even if we agreed upon a convention, not everyone has access to the same DBMSes. What we want is to be able to run all of our builds normally without relying upon any external resources, and then at our choosing easily run our tests against the database each developer has access to.

Luckily, we use Maven, and so we can use Maven profiles to define a different environment for each database we want to use. In each database profile, we can add dependencies for the appropriate JDBC driver and specify connection properties. And it becomes very easy to turn profiles on and off, which means developers can choose which databases they want to test against. And, we can use HSQLDB or H2 by default, since these are fast, all developers have them (merely because of Maven dependencies), and they’re transient.

The only challenge is that we already are using profiles for different kinds of builds we do. One of our profiles (the one used by default) is fast because it simply compiles the source and runs only the unit tests. This is the default because we run this build all of the time – it’s actually the easiest way to run all of the unit tests, so I run it constantly throughout the day as I make changes locally.

We have another profile that also compiles and runs the integration tests – these take several minutes to run, so it’s not very nice to have to wait for all these tests while you’re just verifying some changes didn’t cause some unintended behavior. Although I run these tests locally, I suspect most of the developers let our automated continuous integration server do the work.

Other profiles also generate JavaDoc, build our documentation (compiling DocBook into multiple HTML formats and one PDF form), and create the ZIP archives that we publish in our downloads area.

The database profiles are actually orthogonal to these other profiles. Despite this, there is (at least) one way to make them stay independent and play nicely together, and that’s activating them with properties. Our Maven command line becomes:

mvn clean install -Ddatabase=postgresql

and if we want to use any of our other profiles, we can just use the “-P” switch as we’ve always done:

mvn -P integration clean install -Ddatabase=postgresql

That would work great. All we have to do is define each database profile to activate based upon the “database” property.

Oh, you may be wondering why we don’t just explicitly name each profile on the command line. Well, actually we can, and it works as long as the user always remembers to include one database profile along with the other profiles they want to use. With Maven, if the user names a profile, then no other profile is activated by default, which means that our builds may run without a database profile, and this can break the tests. Activating the database profile with a property solves this problem.

Defining the database profiles

As I mentioned earlier, we really want a profile for each database that we want to test against. If we define the profiles in the parent POM, then all subprojects inherit them. So, the first step is to put in our parent POM a separate profile for each database configuration. Here is the HSQLDB configuration:

<profile>
  <id>hsqldb</id>
  <activation>
    <property>
      <name>database</name>
      <value>hsqldb</value>
    </property>
  </activation>
  <dependencies>
    <dependency>
      <groupId>hsqldb</groupId>
      <artifactId>hsqldb</artifactId>
      <version>1.8.0.2</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <properties>
    <database>hsqldb</database>
    <jpaSource.dialect>org.hibernate.dialect.HSQLDialect</jpaSource.dialect>
    <jpaSource.driverClassName>org.hsqldb.jdbcDriver</jpaSource.driverClassName>
    <jpaSource.url>jdbc:hsqldb:target/test/db/hsqldb/dna</jpaSource.url>
    <jpaSource.username>sa</jpaSource.username>
    <jpaSource.password />
  </properties>
</profile>

Note how the profile is activated when the “database” property matches a value (“hsqldb” in this case). We also define the dependency on the HSQLDB JDBC driver JAR, and we define the properties that we’ll use in our tests. And we have one of these profiles for each database we want to test against.

But before we look at how those properties get injected into our test cases, let’s define the database profile that should be used if the “database” property is not set. We do this with a different activation strategy:

<profile>
 <id>default_dbms</id>
 <activation>
   <property>
     <name>!database</name>
   </property>
 </activation>
 <dependencies>
   <dependency>
     <groupId>hsqldb</groupId>
     <artifactId>hsqldb</artifactId>
     <version>1.8.0.2</version>
     <scope>test</scope>
   </dependency>
 </dependencies>
 <properties>
   <database>hsqldb</database>
   <jpaSource.dialect>org.hibernate.dialect.HSQLDialect</jpaSource.dialect>
   <jpaSource.driverClassName>org.hsqldb.jdbcDriver</jpaSource.driverClassName>
   <jpaSource.url>jdbc:hsqldb:target/test/db/hsqldb/dna</jpaSource.url>
   <jpaSource.username>sa</jpaSource.username>
   <jpaSource.password />
 </properties>
</profile>

I’ve chosen that the default database profile is identical to one of the other profiles, and this means I have some duplication in the POM file. Personally, the cleanliness for the user seems worth it.
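The effect of the two activation styles can be modeled in a few lines of plain Java. This is only a sketch of the selection logic, not anything Maven runs; the class name is hypothetical, and the list of profile ids is an assumption based on the profiles and commands shown in this post.

```java
import java.util.Arrays;
import java.util.List;

/** Illustration only: models which database profile a given -Ddatabase value activates. */
final class ProfileActivationSketch {

    /** Assumed ids of the per-database profiles defined in the parent POM. */
    static final List<String> DATABASE_PROFILES =
        Arrays.asList("hsqldb", "mysql5", "postgresql8", "oracle10g");

    /**
     * Returns the id of the activated database profile, or null if none is activated.
     * A null argument models a build where -Ddatabase was not supplied.
     */
    static String activeDatabaseProfile(String databaseProperty) {
        if (databaseProperty == null) {
            return "default_dbms"; // <name>!database</name>: property is not set
        }
        // <name>database</name><value>...</value>: property equals the profile's value
        return DATABASE_PROFILES.contains(databaseProperty) ? databaseProperty : null;
    }

    public static void main(String[] args) {
        System.out.println(activeDatabaseProfile(null));
        System.out.println(activeDatabaseProfile("hsqldb"));
        System.out.println(activeDatabaseProfile("hsqdlb"));
    }
}
```

One consequence worth noting: a mistyped value sets the property but matches no profile, so neither a named database profile nor the default one is activated.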

Before we leave our parent POM, we also should probably define default values for all of the properties we use (or can use) in the database profiles. This will make it easier to inject those properties into our tests. So in the “<properties>” section of the parent POM, define a few default values:

<properties>
 <jpaSource.dialect/>
 <jpaSource.driverClassName/>
 <jpaSource.url/>
 <jpaSource.username/>
 <jpaSource.password/>
 <jpaSource.maximumConnectionsInPool>1</jpaSource.maximumConnectionsInPool>
 <jpaSource.minimumConnectionsInPool>0</jpaSource.minimumConnectionsInPool>
 <jpaSource.numberOfConnectionsToAcquireAsNeeded>1</jpaSource.numberOfConnectionsToAcquireAsNeeded>
 <jpaSource.maximumSizeOfStatementCache>100</jpaSource.maximumSizeOfStatementCache>
 <jpaSource.maximumConnectionIdleTimeInSeconds>0</jpaSource.maximumConnectionIdleTimeInSeconds>
 <jpaSource.referentialIntegrityEnforced>true</jpaSource.referentialIntegrityEnforced>
 <jpaSource.largeValueSizeInBytes>150</jpaSource.largeValueSizeInBytes>
 <jpaSource.autoGenerateSchema>create</jpaSource.autoGenerateSchema>
 <jpaSource.compressData/>
 <jpaSource.cacheTimeToLiveInMilliseconds/>
 <jpaSource.creatingWorkspacesAllowed/>
 <jpaSource.defaultWorkspaceName/>
 <jpaSource.predefinedWorkspaceNames/>
 <jpaSource.model/>
 <jpaSource.retryLimit>3</jpaSource.retryLimit>
 <jpaSource.rootNodeUuid/>
 <jpaSource.showSql>false</jpaSource.showSql>
</properties>

Injecting the database properties

Each of the database profiles defines a bunch of properties, and we want to use those property values in each of our tests. The easiest way to do this is to use Maven filters to substitute the property values in some of our resource files when it copies them into the ‘target’ directory. We could do that in the parent POM, but it’s probably best to do that in each subproject, where we can be specific about the files we want filtered. JBoss DNA has a “dna-integration-test” subproject, and so in its POM file we just need to turn on filtering:

<build>
 ...
 <testResources>
   <testResource>
     <filtering>false</filtering>
     <directory>src/test/resources</directory>
     <includes>
       <include>*</include>
       <include>**/*</include>
     </includes>
   </testResource>
   <!-- Apply the properties set in the POM to the resource files -->
   <testResource>
     <filtering>true</filtering>
     <directory>src/test/resources</directory>
     <includes>
       <include>tck/jpa/configRepository.xml</include>
     </includes>
   </testResource>
 </testResources>
 ...
</build>

Here the first “<testResource>” fragment copies all of the files in the “src/test/resources” directory without doing any filtering, while the latter fragment copies the “src/test/resources/tck/jpa/configRepository.xml” file and does the substitution of the property values. For example, any “${jpaSource.url}” strings in the file are replaced with the appropriate value for this property as defined in the database profile (or the default).
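If the substitution step is unfamiliar, the following sketch shows roughly what filtering does to a resource file. It is a simplified stand-in written for illustration, not Maven’s implementation, and the class name and sample XML attribute are hypothetical.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only: replaces ${name} placeholders with property values. */
final class FilterSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with the property's value; unknown names are left untouched. */
    static String filter(String text, Properties properties) {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String value = properties.getProperty(matcher.group(1));
            String replacement = value != null ? value : matcher.group();
            // quoteReplacement keeps '$' and '\' in the value from being treated specially
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty("jpaSource.url", "jdbc:hsqldb:target/test/db/hsqldb/dna");
        System.out.println(filter("url=\"${jpaSource.url}\"", properties));
    }
}
```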

The “configRepository.xml” file is a configuration file for the DNA JCR engine. What if we want to inject the database properties into our test cases? Well, one easy way is to have Maven filter a property file and then just have our test cases load that property file and use the data in it. We actually do this in one of our other projects, and it works great.

Running the tests

Now we can see the fruits of our labor. So we can run

mvn clean install

or

mvn -P integration clean install

to run all unit and integration tests, just as we could before. Since we don’t specify “-Ddatabase=dbprofile” on the command line, these builds will use the default database profile, which for us is HSQLDB. That means any database-related tests we run are fast and require no external resources. Brilliant!

Of course, once all of our tests pass with the default configuration, we can then run all the tests against different databases with a few simple commands:

mvn -P integration install -Ddatabase=mysql5
mvn -P integration install -Ddatabase=postgresql8
mvn -P integration install -Ddatabase=oracle10g
etc.

Pretty sweet! Well, as long as we have some patience.

Hat tip to the Hibernate and jBPM teams, since our approach was largely influenced by a combination of their setups.

Filed under: techniques, testing

Git and SVN: the beginning

My previous post about Git and SVN hopefully intrigued you enough to want more. Yes, you can use Git locally even though you’re stuck with using Subversion for the central repository. In this post I’ll show you how to get started, while in the next post we’ll look at the Git Eclipse plug-in.

1. Install Git

You can install it several different ways, depending on your platform and your preference. I have a Mac, so I could download and compile it using MacPorts, or I could grab the latest installer. I like simple, so since I don’t yet have MacPorts I went for the installer. Piece of cake. I also chose to manually set up my path, so in my .bashrc file I added these statements:

export MANPATH=${MANPATH}:/usr/local/git/man
export PATH=${PATH}:/usr/local/git/bin:/usr/local/git/libexec/git-core/

(Okay, I actually used some variables in my real .bashrc to keep things DRY. I thought the above might cut through all that noise.) Note that we added “git/libexec/git-core/” to the PATH. This directory contains all the commands that start with “git-” (e.g., “git-svn …” is really the same as running “git svn …”).

2. Clone a remote SVN repository

The next step for me was creating a local git repository that mirrored the remote SVN repository.

$ git svn clone -s http://example.com/my_subversion_repo local_dir

The “-s” option tells git that the Subversion repository uses the “standard” naming convention of “trunk”, “branches”, and “tags” directories. When this command runs, it creates a subdirectory called “local_dir”, initializes the git repository in that “local_dir”, connects to the SVN repository at the URL (which does not include “trunk” or any other branch/tag information), proceeds to download all of the history (including branches and tags) in the SVN repository, does some cleanup, and checks out the equivalent of the SVN “trunk” to our working area. And, because of Git’s bridge with SVN, we’ll be able to regularly pull changes from SVN into our git repository, and we can even upload changes we make locally back into SVN.

We now have our new Git repository, so cd into it:

$ cd local_dir

If you look closely, you’ll see a couple of things. First, Git doesn’t litter your working area with “.svn” folders. Instead, there’s just one “.git” directory at the top. Git also uses a different technique than SVN for remembering which files and directories should be ignored. But we can bootstrap Git’s ignore information automatically from SVN’s. We’ll use the “.gitignore” file in our working area, since we want this information to be versioned and we want everyone using the repository to be able to use it. If your SVN repository already has a “.gitignore” file, then someone’s done the work for you. Otherwise, you’ll have to run the command to generate the file:

$ git svn show-ignore > .gitignore

This will take a minute or two, depending upon the size of your working area. I would then commit this new file, placing it in the master branch.

3. Committing (locally)

To commit our new file, we first have to tell Git that we want to start tracking this file. Or, in Git parlance, we want to stage the file by adding it to the index:

$ git add .gitignore

We can use the status command to see a summary of what’s already staged, what’s being tracked but hasn’t been staged, and what’s not being tracked at all:

$ git status .

We can also get a more detailed report showing all the individual changes that have been staged:

$ git diff --cached

We can then commit our staged changes:

$ git commit -m "Description of commit"

or, if we want to automatically stage any modified or deleted files, we can use the “--all” (or “-a”) option:

$ git commit -a -m "Description of commit"

With the “commit” command, Git records the entire set of staged changes as a single commit to our current branch, and it moves the branch’s HEAD pointer to this last commit. Remember, Git works locally, so this commit is recorded locally. This may sound strange at first, but really it allows you to commit (locally) much more often, which can improve your development process since backing out things that don’t work is really easy.

To see the history of commits, use the “log” command:

$ git log

This displays the author, date, message, and SHA-1 hash (the unique identifier) of each commit, most recent first. You can use these hashes (or just enough of the leading characters to be unambiguous) in the diff command.

$ git diff <hash>

displays the difference between a previous commit and the current working area, while:

$ git diff <hash1> <hash2>

displays the difference between two previous commits.

There are a lot of things Git can do – way too many to cover here. Look at “git bisect”, “git grep”, and “git revert”, just to name a few. But let’s get back to our workflow.

4. Updating from Subversion

If you’re familiar with Subversion, you’re hopefully used to doing “svn update” before committing your changes. This pulls any revisions that others have made since you last updated, and it’s good form to do this and to make sure everything compiles and runs locally before committing.

In Git, you do this with the “git svn rebase” command. This command fetches revisions from the SVN parent of the current HEAD (of the current branch), and “rebases” any local commits on the branch so that they apply on top of these latest SVN changes. This is analogous to creating a patch file for each local commit (relative to that commit’s parent) since the last rebase, updating the branch with the new SVN revisions, then sequentially applying the patches. In other words, it takes all your local changes (in the form of commits) and reapplies them to the latest SVN revisions.

Here’s the actual command to rebase the current branch:

$ git svn rebase

If you want to fetch all of the revisions on all branches in the SVN repository and rebase any local commits on those branches, you can add the “--fetch-all” option.

5. Committing back to Subversion

Once you have rebased your local commits on a branch, you will still have changes to that branch that aren’t yet in Subversion. We want to do the equivalent of an SVN commit, which in Git is to commit each diff on the branch back to SVN:

$ git svn dcommit

This creates a new revision in SVN for each local commit on the branch. Of course, if you want them to look like a single revision, you’d need to squash the commits before running the dcommit command.

Conclusion

This post focused mostly on how to get started with Git using a remote Subversion repository, so I didn’t talk much about how to go about branching and merging. I think you’ll agree, though, that Git’s Subversion bridge seems very intuitive, and in fact many of the concepts used by the Subversion bridge are the same ones used by the rest of Git. And once again I’m going to suggest reading the Git Community Book or watching Bart Trojanowski’s presentation.

Note: Portions of this post were taken from a previous post on my personal blog.

Filed under: techniques, tools

ModeShape is

a lightweight, fast, pluggable, open-source JCR repository that federates and unifies content from multiple systems, including files systems, databases, data grids, other repositories, etc.

Use the JCR API to access the information you already have, or use it like a conventional JCR system (just with more ways to persist your content).

ModeShape used to be 'JBoss DNA'. It's the same project, same community, same license, and same software.
