ModeShape

An open-source, federated content repository

Creating and using tags in your content

UPDATE 2: Changed option 3 to use string identifiers, as WEAKREFERENCE and REFERENCE properties both maintain back-references.

UPDATE 1: Added a 5th option, as suggested by Bertrand Delacretaz.

(This post was inspired by a response I recently wrote to a Stack Overflow question. That answer was a bit long, but I thought it would also be suitable as a blog post.)

Many applications offer a way to tag “things” with either user-defined or system-defined tags. Assuming those “things” are nodes, what’s the best way to add tags to a ModeShape repository? I know of five possible approaches, each with its own benefits and disadvantages.

Option 1: Use Mixins

This approach uses a separate mixin node type definition for each tag. The mixin is a marker mixin (i.e., it has no property definitions or child node definitions). One example of a “known-issue” tag is the following (in CND format):

<tag = "http://www.example.com/tags">
[tag:known-issue] mixin

Create this tag by registering the node type definition using the NodeTypeManager, either by programmatically creating the node type template or by uploading a CND file.

To “tag” a particular node, simply add the tag’s mixin to the node:

node.addMixin("tag:known-issue");

Note that any node can have multiple tags, since any node can have multiple mixins.

To find all nodes that have a particular tag, simply issue a query:

SELECT * FROM [tag:known-issue]

To find all nodes that have either of two tags, simply perform a UNION:

SELECT * FROM [tag:known-issue]
UNION
SELECT * FROM [tag:critical-issue]

This approach is pretty straightforward and really uses ModeShape’s mixin feature. However, it is fairly cumbersome to create new tags, since that requires registering new node types. Plus, you cannot easily rename tags, but instead would have to:

  1. create the mixin for the tag with the new name;
  2. find all nodes that have the mixin representing the old tag, and for each remove the old mixin and add the new one;
  3. finally remove the node type definition for the old tag (after it is no longer used anywhere).

Removing old tags is done in a similar manner. Finally, it’s not really possible to associate additional metadata (like a display name) with a tag, since extra properties aren’t allowed on node type definitions.

This approach should perform quite well, however.

Option 2: Use a taxonomy and references

This approach involves using one or more “taxonomies”, each of which consists of a parent node for the taxonomy and child nodes for each tag in that taxonomy. The exact node types used are entirely up to you, but the taxonomy structure can be as rich as you’d like it to be. For example, you can create inheritance between tags in much the same way that classes can inherit from other classes in an ontology. Obviously adding, renaming, and removing tags is straightforward.

To “tag” a node, this approach uses a REFERENCE property. One way to do this is to define a single node type for the tag nodes and a single mixin that we’ll use to add this REFERENCE property to “taggable” nodes:

<tags = "http://www.example.com/tags">
[tags:tag] > mix:title, mix:referenceable

[tags:taggable] mixin
- tags:tags (REFERENCE) multiple < 'tags:tag'

To “apply” the tag to a node, simply add the “tags:taggable” mixin to the node (if not already there) and add the REFERENCE to the desired tag node. Here’s some code that does this (although it is too simple and assumes the node hasn’t already been tagged):

Node tag = ... // find in taxonomy
Node n = ... // the node that we're going to tag
if ( !n.isNodeType("tags:taggable") ) {
    n.addMixin("tags:taggable");
}
Value[] values = new Value[1];
values[0] = session.getValueFactory().createValue(tag);
n.setProperty("tags:tags",values);

To find all nodes with a particular tag, simply get the tag node and call “getReferences()” on it to find all of the REFERENCE properties that point to it; the parent of each such property is a tagged node:

Node tag = ...
PropertyIterator iter = tag.getReferences("tags:tags");
while ( iter.hasNext() ) {
    Node tagged = iter.nextProperty().getParent();
}

Alternatively, you could use a query to find all of the nodes for a particular tag. Here’s one that finds all the nodes that are tagged with the ‘known-issue’ or ‘critical-issue’ tag (note how easy it is to search for nodes tagged with any of 1, 2, or n tags just by changing the set criteria):

SELECT * FROM [tags:taggable] AS taggable
JOIN [tags:tag] AS tag ON taggable.[tags:tags] = tag.[jcr:uuid]
WHERE LOCALNAME(tag) IN ('known-issue','critical-issue')
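If the set of tag names comes from user input, the query text has to be assembled at runtime. The following is a minimal sketch of one way to do that; `TagQueryBuilder` and its methods are hypothetical names (not part of JCR or ModeShape), and the sketch assumes the tag-name criteria belong in a WHERE clause and that single quotes inside a JCR-SQL2 string literal are escaped by doubling them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that builds the text of the "tagged with any of these" query. */
final class TagQueryBuilder {

    /** Quotes a value as a string literal, doubling any embedded single quotes. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds a query that selects nodes tagged with any of the given tag names. */
    static String taggedWithAny(List<String> tagNames) {
        if (tagNames.isEmpty()) {
            throw new IllegalArgumentException("at least one tag name is required");
        }
        String names = tagNames.stream()
                               .map(TagQueryBuilder::quote)
                               .collect(Collectors.joining(", "));
        return "SELECT * FROM [tags:taggable] AS taggable"
             + " JOIN [tags:tag] AS tag ON taggable.[tags:tags] = tag.[jcr:uuid]"
             + " WHERE LOCALNAME(tag) IN (" + names + ")";
    }

    public static void main(String[] args) {
        // Print the query text for two tag names.
        System.out.println(taggedWithAny(Arrays.asList("known-issue", "critical-issue")));
    }
}
```

The resulting string can then be handed to the JCR QueryManager's createQuery method with the JCR-SQL2 language constant.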

This approach has the benefit that all tags have to be controlled/managed within one or more taxonomies (including perhaps user-specific taxonomies).

However, there is one potentially substantial disadvantage: this option may not scale very well to large numbers of tagged nodes. Adding and removing REFERENCE values might start to slow down when there are hundreds of nodes pointing to the same tag node. Another disadvantage is that a tag cannot be removed from a taxonomy unless it is no longer used.

You can also use WEAKREFERENCE rather than REFERENCE. The only distinction is that with WEAKREFERENCE you can remove a tag from the taxonomy without having to remove it from the tagged nodes.

Option 3: Use taxonomy and identifier references

This option is similar to Option 2 above in that it involves formally managing one or more taxonomies, in exactly the same way as described above. The difference, however, is that rather than use a REFERENCE (or WEAKREFERENCE), the node that is to be tagged points to the tag node using a STRING property with the identifier of the tag node:

<tags = "http://www.example.com/tags">
[tags:tag] > mix:title, mix:referenceable

[tags:taggable] mixin
- tags:tags (STRING) multiple

Note that the tag has a “jcr:title” property, which you can use to hold the display name for the tag.

Tagging a node is done similarly to Option 2, except the value of the “tags:tags” property is a string:

Node tag = ... // find in taxonomy
String tagId = tag.getIdentifier();
Node n = ... // the node that we're going to tag
if ( !n.isNodeType("tags:taggable") ) {
    n.addMixin("tags:taggable");
}
Value[] values = new Value[1];
values[0] = session.getValueFactory().createValue(tagId);
n.setProperty("tags:tags",values);

To find all nodes with a particular tag, simply use a query to find all of the nodes that have the identifier of that tag. Here’s one that finds all the nodes that are tagged with the ‘known-issue’ or ‘critical-issue’ tag (note how easy it is to search for nodes tagged with any of 1, 2, or n tags just by changing the set criteria):

SELECT * FROM [tags:taggable] AS taggable
JOIN [tags:tag] AS tag ON taggable.[tags:tags] = tag.[jcr:uuid]
WHERE LOCALNAME(tag) IN ('known-issue','critical-issue')

You’ll note that this is very similar to the query in Option 2. That’s because REFERENCE and WEAKREFERENCE properties are physically stored in a property value as an identifier.

Like Option 2, this approach enforces the use of one or more taxonomies, which makes it a bit easier to control the tags, since they must exist in a taxonomy before they can be used. Renaming tag nodes is also pretty easy, although this is not necessary if you use the “jcr:title” property for the display name, since renaming then involves simply changing the title property value. Performance-wise, this is far better than the REFERENCE and WEAKREFERENCE approach, since non-reference properties scale and perform much better with large numbers of references, regardless of whether they all point to one node or many. Looking up the tag(s) from the “tags:tags” property is also very fast (and faster than navigating a path).

This approach is similar to Option 2 with WEAKREFERENCE properties in that you can remove a tag even if it is still used, although nodes’ “tags:tags” property values that point to that removed tag will not be usable anymore. This can be remedied with some conventions in your application, or by simply keeping tags around and using metadata on the taxonomy to say that a particular tag is “deprecated” and shouldn’t be used. (IMO, the latter is actually a benefit of this approach.)

This option will generally perform and scale much better than Option 2.

Option 4: Use string properties

The final approach is to simply use a STRING property to tag each node with the name of the tag(s) that are to be applied. This works great for ad hoc tags, which is when there is no formal taxonomy and any tag can be used at any time.

Here’s a mixin that defines a multi-valued STRING property:

<tags = "http://www.example.com/tags">
[tags:taggable] mixin
- tags:tags (STRING) multiple

To tag a node, simply add the mixin (if not already present) and add the name of the tag as a value on the “tags:tags” STRING property (again, if it’s not already present as a value). Here’s some simplified code that does none of the checking, but which gives the basic idea:

Node n = ... // the node that we're going to tag
if ( !n.isNodeType("tags:taggable") ) {
    n.addMixin("tags:taggable");
}
String[] tags = new String[]{"known-issue"};
n.setProperty("tags:tags",tags);
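The "if it's not already present" check mentioned above can be handled before the property is set. Here is a minimal sketch of that step only; `TagValues.withTag` is a hypothetical helper (not part of JCR or ModeShape), and it assumes the current values of “tags:tags” have already been read into a String array (or are null if the node is not yet tagged).

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper for computing the new values of a multi-valued tag property. */
final class TagValues {

    /** Returns the existing values plus the tag, keeping order and skipping duplicates. */
    static String[] withTag(String[] existing, String tag) {
        Set<String> merged = new LinkedHashSet<>();
        if (existing != null) {
            merged.addAll(Arrays.asList(existing)); // keep the tags already on the node
        }
        merged.add(tag); // no effect if the tag is already present
        return merged.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] current = {"known-issue"};
        System.out.println(Arrays.toString(withTag(current, "critical-issue")));
    }
}
```

The returned array is what would be passed to setProperty for “tags:tags”, so existing tags are preserved rather than overwritten.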

The primary advantage of this approach is that it is very simple: you’re simply using string values on the node that is to be tagged. To find all nodes that are tagged with a particular tag (e.g., “known-issue”), simply issue a query:

SELECT * FROM [tags:taggable] AS taggable
WHERE taggable.[tags:tags] = 'known-issue'

Also, there is no taxonomy to manage. But if a tag is to be renamed, then you could simply process the “tags:tags” values. If a tag is to be deleted (and removed from the nodes that are tagged with it), then that can be done by removing the tag name from the “tags:tags” properties (perhaps in a background job).
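Renaming or deleting a tag under this option comes down to rewriting the string values on each tagged node. The following is a rough sketch of that per-node step; `TagMaintenance` is a hypothetical helper, and finding the affected nodes (for example with the query above) and writing the values back are left to the application.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper for rewriting the values of a multi-valued tag property. */
final class TagMaintenance {

    /** Replaces oldName with newName; the result has no duplicates if newName was already present. */
    static String[] rename(String[] values, String oldName, String newName) {
        Set<String> result = new LinkedHashSet<>();
        for (String value : values) {
            result.add(value.equals(oldName) ? newName : value);
        }
        return result.toArray(new String[0]);
    }

    /** Returns the values without the given tag name. */
    static String[] remove(String[] values, String name) {
        return Arrays.stream(values)
                     .filter(value -> !value.equals(name))
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] values = {"known-issue", "critical-issue"};
        System.out.println(Arrays.toString(rename(values, "known-issue", "known-bug")));
        System.out.println(Arrays.toString(remove(values, "critical-issue")));
    }
}
```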

Note that this allows any tag name to be used, and thus works best for cases where the tag names are not controlled at all. If you want to control the list of strings used as tag values, you could create a taxonomy in the repository (as described in Options 2 and 3 above) and have your application limit the values to those in the taxonomy. You can even have multiple taxonomies, some of which are perhaps user-specific. But this approach doesn’t have quite the same control as Options 2 or 3.

This option will perform just a bit better than Option 3 (since the queries are a tad simpler), but will scale just as well.

Option 5: Use taxonomy and paths

A fifth option is very similar to Option 3, except that you use a PATH property (rather than a STRING property) that points to the tag, where the PATH values are paths to the tag. Here are some node types:

<tags = "http://www.example.com/tags">
[tags:tag] > mix:title

[tags:taggable] mixin
- tags:tags (PATH) multiple

(You could also use a STRING property instead of PATH; really the only advantage of using PATH is that it enforces that each value is a legal path value. But using PATH does not enforce that it is an existing path.)

To tag a node, simply add the mixin (if not already present) and add the path of the tag as a value on the “tags:tags” PATH property (again, if it’s not already present as a value). Here’s some simplified code that does none of the checking, but which gives the basic idea:

Node tag = ... // the tag node
Node n = ... // the node that we're going to tag
if ( !n.isNodeType("tags:taggable") ) {
    n.addMixin("tags:taggable");
}
String[] tags = new String[]{tag.getPath()};
n.setProperty("tags:tags",tags);

Unlike Options 2 or 3, this approach does not enforce the use of taxonomies. In fact, you’ll notice that the “tags:tags” property definition has no constraint that requires each path to point to an existing tag node; this reduces the constraints and requires your application to use convention, which can be an advantage. Using a title on the tag for the displayable name obviates having to rename tags. Performance-wise, this is far better than the REFERENCE or WEAKREFERENCE approach, and (for ModeShape) just a bit worse than using the STRING property with an identifier (ModeShape can resolve an identifier faster than it can find a node by path). But it will scale far better than Option 2 and similarly to Option 3.

One advantage of this approach (and of Option 3) over Option 2 is that you can remove a tag even if it is still used, although nodes’ PATH properties that point to that removed tag will be readable but not resolvable. (If you’re using the tag’s title for the display name, this might not be useful since the path might not contain meaningful and usable information.) This can be remedied with some conventions in your application, or by simply keeping tags around and using metadata on the taxonomy to say that a particular tag is “deprecated” and shouldn’t be used. (IMO, the latter is actually a benefit of this approach.)

Summary

We looked at five different ways of incorporating tags into your application. Of course, which one works best for you will depend on the needs of your particular application. Use these as a starting point: feel free to customize them, combine them, or come up with other alternatives.

If you just need a way to associate informal tags with content, perhaps Option 4 is a good fit. For very small and limited tagging needs, Option 1 might work. And you should seriously look at Option 2 for smallish repositories that need a formal taxonomy.

But for most applications, your repository will be large enough that you will probably want to look at Options 3, 4 or 5, with the deciding factor being whether you need formal or informal taxonomies. Personally, of these three I think I’d tend to lean toward Option 3.

Happy tagging!

Filed under: jcr, techniques

Concurrent writes

It’s almost a certainty that you will have multiple applications and multiple threads within those applications simultaneously update data in your database. The speed of your application will depend significantly on how fast your database can perform these simultaneous updates.

If you’re using ModeShape, the first thing to know is that reading content does not require any locks. In other words, applications or threads that are reading content can always do so with no contention. (ModeShape doesn’t need read locks because it uses MVCC, via Infinispan, to isolate readers from writers. See the details for more.)

The second thing to know is that, because ModeShape is a hierarchical database, all data is stored in a tree-like structure of nodes and properties, and any transaction updating content must obtain locks for all nodes being updated. Much of the time, applications and threads that change content do tend to update different parts (subtrees) of the database, which means completely different write locks are acquired by the different transactions. In other words, updates to different parts of the database never block each other.

There are times, however, when multiple applications and/or threads do attempt to update the same node at the same time. In this case, the transactions do compete for the node’s lock, and these transactions complete in essentially a serialized fashion. (Again, they still do not block any reading operations or any transactions updating other areas of the repository.) Occasionally two transactions may deadlock, because they each obtain a lock on separate nodes and then try to obtain a lock on the node currently locked by the other. If you run into this situation, you can enable deadlock detection to automatically detect such cases and roll back one of the deadlocked transactions, which your application can simply re-try by performing the save again.
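The "re-try by performing the save again" advice can be wrapped in a small helper. This is a minimal sketch, not ModeShape API: `SaveRetry.withRetries` is a hypothetical name, and because the exception type that signals a rolled-back transaction is not stated here, the sketch retries on any exception. Real code would narrow that to the specific failure it expects.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper that re-runs an operation (e.g. "apply changes, then save") on failure. */
final class SaveRetry {

    /** Calls the operation up to maxAttempts times and rethrows the last failure. */
    static <T> T withRetries(Callable<T> operation, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                // Assumption: any exception is treated as a retryable rollback.
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulate an operation that fails once before succeeding.
        String result = withRetries(() -> {
            if (++calls[0] < 2) {
                throw new IllegalStateException("simulated rollback");
            }
            return "saved";
        }, 3);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The operation passed in should redo the whole unit of work, so that each attempt starts from the latest persisted state.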

It’s nice to know that most of the time, applications will not have any contention. And when there is contention for concurrent writes to the same areas, ModeShape does the logical thing by serializing the transactions. (Isn’t ACID behavior nice?!)

But even after all this, you may find that your applications are still highly contentious while trying to concurrently update the same nodes. In these cases, you have several options:

  1. Can you initialize the highly-contentious area when the database is created? If so, then the different transactions will update different areas of the database.
  2. Can you alter the hierarchical design of your database to eliminate the contention? Consider if your hierarchy would improve by adding one or more time-based levels. Or consider inserting a level for different contexts (e.g., users, groups, customers, etc.).
  3. Can you centralize where/how your application is updating these areas? For example, a hierarchy that includes a level for users might have contention when adding users. Try centralizing the process of adding users. (Queues often work great for these kinds of patterns.)
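As an illustration of the time-based levels suggested in the second item, here is a small sketch that computes where a new node would go. `TimeBucketedPaths` and the year/month/day layout are assumptions chosen for the example, not a ModeShape convention; the point is only that writers on different days add children under different parents.

```java
import java.time.LocalDate;
import java.util.Locale;

/** Hypothetical helper that spreads new nodes across year/month/day levels. */
final class TimeBucketedPaths {

    /** Builds a path such as /orders/2012/10/17/order-42 for the given area, date and node name. */
    static String pathFor(String area, LocalDate date, String nodeName) {
        return String.format(Locale.ROOT, "/%s/%04d/%02d/%02d/%s",
                             area,
                             date.getYear(),
                             date.getMonthValue(),
                             date.getDayOfMonth(),
                             nodeName);
    }

    public static void main(String[] args) {
        System.out.println(pathFor("orders", LocalDate.of(2012, 10, 17), "order-42"));
    }
}
```

If a single day is still too contentious, a further level (hour, or a context such as the customer) could be added in the same way.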

By the way, how does ModeShape compare to other hierarchical data stores? Really well, actually. One of the more popular JCR implementations uses a single, cluster-wide, global write lock that guarantees that only one write will proceed at a time. Yikes.

Filed under: features, jcr, techniques

ModeShape 3.0.0.CR3 is available

We released our second candidate release (CR) of ModeShape 3.0 last week, and the community discovered a few more issues with our AS7 integration, backup/restore, and query processing. Although the changes were relatively isolated, we still wanted to give the community a chance to test these fixes before we cut the final release.

So, ModeShape 3.0.0.CR3 is available immediately in the JBoss Maven repository (you can follow our instructions for using ModeShape in your Maven application) and on our downloads page (which includes a kit that installs ModeShape as a service within JBoss AS7.1.1.Final). Check out our documentation, release notes, JavaDoc, and our code on GitHub; use our forums or IRC channel to ask questions, and log any issues in our JIRA.

We’re pleased with this candidate release, and we believe that we’ll be able to release the final release next week. So please give this latest release a spin and let us know what you think. As with earlier beta releases, this release passes 100% of the JSR-283 TCK tests with all JCR features that were available in ModeShape 2.x. (Note that we won’t officially certify until 3.0.0.Final, but you can run the TCK tests yourself by downloading our source and running “mvn -s settings.xml clean install -Pjcr-tck”.)

As always, we couldn’t do this without all the help from our community members. Well done, and keep up the great work!

Filed under: jcr, news, releases

When is ModeShape a good fit?

Update: changed the Scalability section to make more clear the scope of the term.

When it comes down to it, ModeShape is a database. But there are lots of kinds of databases, and it’s always very important to choose a database that fits your application’s needs. Here are some of the characteristics that distinguish ModeShape from other kinds of databases, which should help you decide whether ModeShape is a good fit for your use cases.

Strongly consistent

ModeShape is strongly-consistent and adheres to the ACID principles, meaning that all operations are atomic, consistent, isolated, and durable. Your applications create Sessions to interact with the information stored in a repository workspace. Each session sees the latest persisted information, even as other applications (or parts of your application) are persisting changes through their own sessions. Your session can make changes, which are overlaid upon the latest persisted information, but only your session sees these changes until you save your session and the changes are persisted. Internally, ModeShape uses a transaction to make sure that all the session’s changes are made (or none of them are), that the changes are consistent, are seen by other sessions only when the changes are completed, and that the changes are durable.

What this means for you is that it’s very easy to develop and write applications, and in many ways is very similar to how you’ve worked with other ACID systems (like relational databases) in the past. You can even use JTA transactions so that the changes are persisted only upon transaction commit. And your application is written the same way whether ModeShape is clustered or non-clustered.

In the last few years, eventually-consistent databases have become very popular, due in part to the increasingly popular goal of creating very large (distributed) databases. When a change is made to an eventually consistent database, that change is not immediately propagated to other processes, but eventually (after a period of time when no changes are made) the database will become consistent. This means that right after one client makes a change, there is no guarantee when other clients will see those changes, and yet those other clients can change the data that they see. The result is that there can be multiple “versions” of the data, and although the database may attempt to resolve these conflicts, it often can only do this for relatively simple conflicts. Ultimately, your application will likely have to deal with the conflict. Additionally, many eventually consistent databases suggest specific usage patterns to make such conflicts less likely, but those usage patterns are often more complicated than you’re used to using. There absolutely are use cases where eventually consistent databases are perfect fits, but there are also lots of use cases and applications that are perfectly unsuitable for eventually consistent databases.

(Note that the next generation of Apache Jackrabbit, codenamed “Oak”, will be eventually consistent. To do that, they are not going to support all the JCR features. When your application saves its session, any conflicts that arise and that can’t be automatically handled will result in exceptions. Their expectation is that your application should then try again to recreate the changes, and that in the worst case your application may have to explicitly resolve the conflicts.)

Hierarchical data

ModeShape stores data in a tree of nodes and properties, where you have full control over the design of that tree structure. At the top of the tree is a single root node, and every node can contain multiple child nodes. Every node has a name and a unique identifier, and can also be identified by a path containing the names of all its ancestors, from the root down to the node itself. Names are composed of a namespace and a local part, and there is a namespace registry to centralize short prefixes for each namespace.
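To make the relationship between prefixes and namespaces concrete, here is a small sketch that expands a prefixed name into its namespace-qualified form. `QualifiedNames.expand` and the plain Map standing in for the namespace registry are assumptions for illustration; the `{uri}localName` rendering is JCR's expanded form for names.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical helper that resolves a prefix against a prefix-to-URI mapping. */
final class QualifiedNames {

    /** Expands "prefix:local" to "{uri}local"; names without a prefix are returned unchanged. */
    static String expand(String name, Map<String, String> prefixToUri) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return name; // no prefix to resolve
        }
        String prefix = name.substring(0, colon);
        String uri = prefixToUri.get(prefix);
        if (uri == null) {
            throw new IllegalArgumentException("unregistered prefix: " + prefix);
        }
        return "{" + uri + "}" + name.substring(colon + 1);
    }

    public static void main(String[] args) {
        Map<String, String> registry =
            Collections.singletonMap("tags", "http://www.example.com/tags");
        System.out.println(expand("tags:tag", registry));
    }
}
```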

You can see that this looks very similar to how a file system is laid out. You already know how to organize a file system, and organizing a ModeShape repository is very similar. In fact, lots of data already has implicit hierarchical structure. Consider that URLs are essentially addresses into a website’s hierarchy. And hierarchical data is easy to use: simply navigate the nodes. It’s also often more efficient to navigate, since related data is very close by.

Scalable and highly available

ModeShape repositories can be small and embedded into Java applications, or they can be very large and distributed across a cluster of machines. You can even decide how (and if) ModeShape should persist your data, ranging from keeping data only in-memory, to storing data on the local file system, to storing data in a relational database, to leveraging the performance, scalability, and durability of an in-memory and elastic data grid. It may seem counter-intuitive, but storing your data in RAM is extremely fast as long as multiple copies of all your data are stored across multiple machines while ensuring that machines can be added and removed and the data is automatically and elastically distributed. This is exactly what a data grid can achieve, and this is how ModeShape can scale to very (very) large databases.

All of ModeShape’s functionality and features are built on top of Infinispan, which is a flexible, fast and highly scalable data grid. ModeShape stores each node in one or more entries in Infinispan (a small node will be stored as one entry, but larger ones are broken into multiple entries), and Infinispan is configured to replicate, distribute and persist the data.

Note that when we talk about scalability and large databases, we’re not talking about the kinds of scales that “big data” often refers to. ModeShape is not a “big data” database and doesn’t scale that big. We’re transactional, after all.

Schema validation

ModeShape supports a very powerful and flexible schema system, but interestingly you get to decide where and how much schema enforcement to use. At one extreme, you allow every node to contain any property and any children – this is essentially using ModeShape as a schema-less database, and it’s a perfectly valid way to use ModeShape. Your application becomes fully in control of the database structure, making it easy to evolve the structure to suit new or changing requirements.

At the other extreme, you fully define every node to fit a particular node type that constrains the properties and child nodes to fit pre-defined patterns. ModeShape ensures that all the data always adheres to the schema, and your application doesn’t have to do any validation or enforcement.

But between these two extremes is where ModeShape really becomes interesting and advantageous. You can choose which subset of nodes in your tree you want to adhere to a schema, allowing parts of the database to be more schema-less and the rest to be more constrained. But more importantly, you can dynamically expand the schema for any individual node by mixing in additional node types with more property and child patterns. For example, you can define a node type that requires a “title” property, and you can add this node type to any node that is to have a title.

ModeShape’s schema system is very powerful and flexible, and makes it far easier to constrain your data while simultaneously enabling future changes to and evolution of your database’s schema.

Query and search

Navigation isn’t the only way to access your data. Your applications can also query a ModeShape repository to find the subset of content that meets application-specified criteria regardless of where in the hierarchy that data exists. ModeShape offers several query languages, including a subset of XPath, a full-text search language (much like internet search engines), and an extremely powerful SQL-like language called “JCR-SQL2”. Here’s a fairly simple example of a JCR-SQL2 query:

SELECT * FROM [veh:vehicle] AS vehicle
WHERE vehicle.[veh:make] IN ('Chevrolet', 'Toyota', 'Ford')

The queries can be much more complex and can include joins, rich criteria, subqueries, and limits/offsets. The result sets are tabular, but still allow you to access the corresponding node(s) in each row.

Of course, ModeShape evaluates each query across all of the data, even when the repository is distributed in a cluster. That means your application is written the same way, regardless of how ModeShape is configured.

Events

ModeShape provides an event API so that your application can be notified when content changes. Your application can register listeners using a variety of criteria (e.g., “only notify me of the addition or removal of nodes in this subgraph”, or “only notify me when nodes of this type are changed”, or even “notify me of all node and property changes”, etc.), and can then respond to the events with application-specific behavior.

Again, this behavior works the same way regardless of whether ModeShape is clustered – applications see the changes made by sessions in all processes in the cluster.

Other features

ModeShape includes a number of other features, too. ModeShape can automatically manage the history of a subtree of content – all that’s required is adding the “mix:versionable” mixin to the node, and then calling “checkin()”, “checkout()” and “restore()”.

Individual nodes can be locked to prevent other applications from modifying that area of the repository. Locks are intended to be short-term (e.g., scoped to a single session), though it’s possible to lock nodes for a longer duration.

Take the next step

We’ve covered a lot of topics in this post, but hopefully now you have a clear understanding of what kind of database ModeShape is and whether it is a fit for your use cases. Give it a try. ModeShape 3.0.0.Final is due out next week, but get the latest candidate release.

Filed under: features, jcr, repository

ModeShape 3.0.0.CR2 is available

The ModeShape community is very proud and happy to announce the immediate availability of 3.0.0.CR2!

This is our second candidate release (CR) for 3.0, which means that if all goes well and no issues are found, we’ll simply retag this as 3.0.0.Final. However, if severe issues are discovered with CR2, then we’ll fix those and cut a CR3 release for additional testing. This will continue until we release 3.0.0.Final, which will be no earlier than October 17 so that people have time to test and use the candidate release.

All the artifacts are available in the JBoss Maven repository, and you can follow our instructions for using ModeShape in your Maven application. Or, download one of our distributions, including kits that install ModeShape as a service within JBoss AS7.1.1.Final or AS7.2 (which hasn’t yet been released, but you can build it from the source). Check out our documentation, release notes, JavaDoc, and our code on GitHub; use our forums or IRC channel to ask questions, and log any issues in our JIRA.

If you’re using an alpha or beta release, we strongly urge you to upgrade to this release since we think this might be what we’ll deliver in 3.0.0.Final, and because we’ve made almost 3 dozen fixes and improvements compared with the previous beta versions.

As with earlier beta releases, this release passes 100% of the JSR-283 TCK tests with all JCR features that were available in ModeShape 2.x. (Note that we won’t officially certify until 3.0.0.Final, but you can run the TCK tests yourself by downloading our source and running “mvn -s settings.xml clean install -Pjcr-tck”.)

As always, we couldn’t do this without all the help from our community members. Well done, and keep up the great work!

Filed under: jcr, news, releases

ModeShape 3.0.0.CR1 is available

The ModeShape community is very proud and happy to announce the immediate availability of 3.0.0.CR1!

This is our first candidate release (CR) for 3.0, which means that if all goes well and no issues are found, we’ll simply retag this as 3.0.0.Final. However, if issues are discovered with CR1, then we’ll fix those and cut a CR2 release for additional testing. This will continue until we release 3.0.0.Final, which will be no earlier than October 17 so that people have time to test and use the candidate release.

All the artifacts are available in the JBoss Maven repository, and you can follow our instructions for using ModeShape in your Maven application. Or, download one of our distributions, including kits that install ModeShape as a service within JBoss AS7.1.1.Final or AS7.2 (which hasn’t yet been released, but you can build it from the source). Check out our documentation, release notes, JavaDoc, and our code on GitHub; use our forums or IRC channel to ask questions, and log any issues in our JIRA.

If you’re using an alpha or beta release, we strongly urge you to upgrade to this release since we think this might be what we’ll deliver in 3.0.0.Final, and because we’ve made almost 3 dozen fixes and improvements compared with the previous beta versions.

As with earlier beta releases, this release passes 100% of the JSR-283 TCK tests with all JCR features that were available in ModeShape 2.x. (Note that we won’t officially certify until 3.0.0.Final, but you can run the TCK tests yourself by downloading our source and running “mvn -s settings.xml clean install -Pjcr-tck”.)

As always, we couldn’t do this without all the help from our community members. Well done, and keep up the great work!

Filed under: jcr, news, releases

ModeShape 3.0.0.Beta4 is available

The ModeShape community is proud and happy to announce the immediate availability of 3.0.0.Beta4. All the artifacts are available in the JBoss Maven repository, and you can follow our instructions for using ModeShape in your Maven application. Or, download one of our distributions, including kits that install ModeShape as a service within JBoss AS7.1.1.Final or AS7.2 (which hasn’t yet been released, but you can build it from the source). Check out our documentation, release notes, JavaDoc, and our code on GitHub; use our forums or IRC channel to ask questions, and log any issues in our JIRA.

If you’re using an earlier alpha or beta release, we strongly urge you to upgrade to this release since we’ve made significant fixes and improvements compared with earlier versions:

  • internal caches no longer grow unbounded and instead use a separate in-memory Infinispan cache for each workspace cache;
  • improvements and fixes for the MongoDB binary stores (other options include storing on the file system, in Infinispan, or in a JDBC database);
  • improvements in JCR-SQL2 and JCR-QOM queries, especially queries that use INTERSECT, UNION and EXCEPT;
  • corrections to the Node.save() behavior, even though JCR 2.0 deprecated this method in favor of always using Session.save();
  • over two dozen bug fixes and improvements since Beta3.

Perhaps most importantly, this release passes 100% of the JSR-283 TCK tests with all JCR features that were available in ModeShape 2.x. This is the first release that has passed all of these tests! (Note that we won’t officially certify until 3.0.0.Final, but you can run the TCK tests yourself by downloading our source and running “mvn -s settings.xml clean install -Pjcr-tck”.)

We expect at least one more beta release (a.k.a., 3.0.0.Beta5), but hope to follow that with a candidate for our Final release. We’ve been making good headway on the remaining outstanding issues.

As always, we couldn’t do this without all the help from our community members. Well done, and keep up the great work!

Filed under: jcr, news, releases

New repository backup and restore in ModeShape 3

We recently added a new feature to ModeShape 3.0.0.Beta3 that enables repository administrators to create backups of an entire repository (even when the repository is in use), and to then restore a repository to the state reflected by a particular backup. This works regardless of where the repository content is persisted.

There are several reasons why you might want to restore a repository to a previous state, and many are quite obvious. For example, the application or the process it’s running in might stop unexpectedly. Or perhaps the hardware on which the process is running might fail. Or perhaps the persistent store might have a catastrophic failure (although surely you’re also using the persistent store’s backup system, too).

But there are also non-failure related reasons. Backups of a running repository can be used to transfer the content to a new repository that is perhaps hosted in a different location. It might be possible to manually transfer the persisted content (e.g., in a database or on the file system), but the process of doing so varies with different kinds of persistence options. Also, ModeShape can be configured to use a distributed in-memory data grid that already maintains its own copies for ensuring high availability, and therefore the data grid might not persist anything to disk. In such cases, the content is stored on the data grid’s virtual heap, and getting access to it without ModeShape may be quite difficult. Or, you may initially configure your repository to use a particular persistence approach that is suitable given the current needs, but over time the repository grows and you want to move to a different, more scalable (but perhaps more complex) persistence approach. Finally, the backup and restore feature can be used to migrate to a new major version of ModeShape.

In short, you may very well have the need to set the contents of a repository back to an earlier state. ModeShape’s backup and restore feature makes this easy to do.

Getting started

Let’s walk through the basic process of creating a backup of an existing repository and then restoring the repository. Both of these steps require an authenticated Session that has administrative privileges. It actually doesn’t matter which workspace the session uses:

javax.jcr.Repository repository = ...
javax.jcr.Credentials credentials = ...
String workspaceName = ...
javax.jcr.Session session = repository.login(credentials,workspaceName);

So far, this is basic and standard stuff for any JCR client.

Introducing the RepositoryManager

Each JCR Session instance has its own Workspace object that provides workspace-level functionality and access to a set of “manager” interfaces: the VersionManager, NodeTypeManager, ObservationManager, LockManager, etc. The JSR-333 (aka “JCR 2.1”) effort is still incomplete, but has plans to introduce a RepositoryManager that offers some repository-level functionality. The ModeShape public API has created such an interface, and accessing it from a standard JCR Session instance is pretty simple:

org.modeshape.jcr.api.Session msSession = (org.modeshape.jcr.api.Session)session;
org.modeshape.jcr.api.RepositoryManager repoMgr = msSession.getWorkspace().getRepositoryManager();

The interface is pretty self-explanatory, and defines several methods including two that are related to the backup and restore feature:

public interface RepositoryManager {

    ...

    /**
     * Begin a backup operation of the entire repository, writing the files
     * associated with the backup to the specified directory on the local
     * file system.
     *
     * The repository must be active when this operation is invoked, and
     * it can continue to be used during backup (e.g., this can be a
     * "live" backup operation), but this is not recommended if the backup
     * will be used as part of a migration to a different version of
     * ModeShape or to a different installation.
     *
     * Multiple backup operations can operate at the same time, so it is
     * the responsibility of the caller to not overload the repository
     * with backup operations.
     *
     * @param backupDirectory the directory on the local file system into
     *        which all backup files will be written; this directory
     *        need not exist, but the process must have write privilege
     *        for this directory
     * @return the problems that occurred during the backup operation
     * @throws AccessDeniedException if the current session does not
     *         have sufficient privileges to perform the backup
     * @throws RepositoryException if the backup cannot be run
     */
    Problems backupRepository( File backupDirectory ) throws RepositoryException;

    /**
     * Begin a restore operation of the entire repository, reading the
     * backup files in the specified directory on the local file system.
     * Upon completion of the restore operation, the repository will be
     * restarted automatically.
     *
     * The repository must be active when this operation is invoked.
     * However, the repository <em>may not</em> be used by any other
     * activities during the restore operation; doing so will likely
     * result in a corrupt repository.
     *
     * It is the responsibility of the caller to ensure that this method
     * is only invoked once; calling multiple times will lead to
     * a corrupt repository.
     *
     * @param backupDirectory the directory on the local file system
     *        in which all backup files exist and were written by a
     *        previous {@link #backupRepository(File) backup operation};
     *        this directory must exist, and the process must have read
     *        privilege for all contents in this directory
     * @return the problems that occurred during the restore operation
     * @throws AccessDeniedException if the current session does not
     *         have sufficient privileges to perform the restore
     * @throws RepositoryException if the restoration cannot be run
     */
    Problems restoreRepository( File backupDirectory ) throws RepositoryException;
}

Next, we’ll take a look at each of these two methods.

Creating a backup

The backupRepository(...) method on ModeShape’s RepositoryManager interface is used to create a backup of the entire repository, including all workspaces that existed when the backup was initiated. This method blocks until the backup is completed, so it is the caller’s responsibility to invoke the method asynchronously if that is desired. When this method is called on a repository that is being actively used, all of the changes made while the backup process is underway will be included; at some point near the end of the backup process, however, additional changes will be excluded from the backup. This means that each backup contains a fully-consistent snapshot of the entire repository as it existed near the time at which the backup completed.
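Because the call blocks, an application that wants to keep working during a backup has to hand the call to another thread itself. The following is a minimal sketch of one way to do that with a standard single-thread executor. The `BlockingBackup` interface and the `AsyncBackup` class are hypothetical names introduced only for this illustration; in a real application the lambda passed to `start(...)` would wrap `repoMgr.backupRepository(backupDirectory)` and inspect the returned Problems.

```java
import java.io.File;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for the blocking call repoMgr.backupRepository(dir). */
interface BlockingBackup {
    /** Returns true if the backup completed without reported problems. */
    boolean run(File backupDirectory) throws Exception;
}

/** Hypothetical helper that runs blocking backups on a single worker thread. */
final class AsyncBackup {
    // One worker thread, so submitted backups run one at a time.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Submits the blocking backup and returns immediately with a Future. */
    Future<Boolean> start(BlockingBackup backup, File backupDirectory) {
        Callable<Boolean> task = () -> backup.run(backupDirectory);
        return executor.submit(task);
    }

    /** Stops accepting new backups; a backup already running is allowed to finish. */
    void shutdown() {
        executor.shutdown();
    }
}
```

The caller can poll the returned Future with `isDone()` or block on `get()` when it finally needs the outcome. Using a single worker thread is a design choice in this sketch: it queues backups one after another rather than letting several run at once.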

Here’s a code example showing how easy it is to call this method:

org.modeshape.jcr.api.RepositoryManager repoMgr = ...
java.io.File backupDirectory = ...
Problems problems = repoMgr.backupRepository(backupDirectory);
if ( problems.hasProblems() ) {
    System.out.println("Problems backing up the repository:");
    // Report the problems (we'll just print them out) ...
    for ( Problem problem : problems ) {
        System.out.println(problem);
    }
} else {
    System.out.println("The backup was successful");
}

Each ModeShape backup is stored on the file system in a directory that contains a series of GZIP-ed files (each containing representations of approximately 100K nodes) and a subdirectory in which all the large BINARY values are stored.

It is also the application’s responsibility to initiate each backup operation. In other words, there currently is no way to configure ModeShape to perform backups on a schedule. Doing so would add significant complexity to ModeShape and the configuration, whereas leaving it to the application lets the application fully control how and when such backups occur.
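Since scheduling is left to the application, here is a rough sketch of what that might look like using only the JDK’s scheduled executor. Everything named here (`BackupScheduler`, `directoryFor`, the `backup-` directory prefix, and the timestamp pattern) is an assumption made for the example and not part of ModeShape; the `backupAction` would typically call `repoMgr.backupRepository(dir)` and report any problems.

```java
import java.io.File;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical application-side scheduler for periodic backups. */
final class BackupScheduler {
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    /** Builds a distinct directory per backup, e.g. <root>/backup-20121017-020000. */
    static File directoryFor(File root, LocalDateTime when) {
        return new File(root, "backup-" + STAMP.format(when));
    }

    /**
     * Runs the backup action immediately and then again periodHours after each
     * run finishes, always on the same background thread.
     */
    static ScheduledExecutorService schedule(File root, long periodHours,
                                             Consumer<File> backupAction) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                backupAction.accept(directoryFor(root, LocalDateTime.now()));
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and continue.
                System.err.println("Backup failed: " + e);
            }
        }, 0, periodHours, TimeUnit.HOURS);
        return scheduler;
    }
}
```

Writing each backup into its own timestamped directory keeps earlier backups intact, and a fixed delay (rather than a fixed rate) means a slow backup never overlaps with the next one.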

Restoring a repository

Once you have a complete backup on disk, you can then restore a repository back to the state captured within the backup. To do that, simply start a repository (or perhaps a new instance of a repository with a different configuration) and, before it’s used by any applications, load into the new repository all of the content in the backup. Here’s a simple code example that shows how this is done:

org.modeshape.jcr.api.RepositoryManager repoMgr = ...
java.io.File backupDirectory = ...
Problems problems = repoMgr.restoreRepository(backupDirectory);
if ( problems.hasProblems() ) {
    System.out.println("Problems restoring the repository:");
    // Report the problems (we'll just print them out) ...
    for ( Problem problem : problems ) {
        System.out.println(problem);
    }
} else {
    System.out.println("The restoration was successful");
}

Once a restore succeeds, the newly-restored repository will be restarted and will be ready to be used.
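The method’s documentation makes the caller responsible for invoking the restore only once. One simple way an application might enforce that within a single process is an atomic flag, sketched below. `RestoreOnce` is a hypothetical helper written for this example; the `restoreAction` would wrap the `repoMgr.restoreRepository(backupDirectory)` call shown above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical guard that lets a restore action run at most once per process. */
final class RestoreOnce {
    private final AtomicBoolean started = new AtomicBoolean(false);

    /**
     * Runs the action on the first call only.
     * Returns false, without running anything, on every later call.
     */
    boolean runOnce(Runnable restoreAction) {
        if (!started.compareAndSet(false, true)) {
            return false; // a restore was already attempted
        }
        restoreAction.run();
        return true;
    }
}
```

Note that this only protects against repeated calls inside one JVM. If several processes could start the same repository, some external coordination would still be needed.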

Migrating from ModeShape 2.8 to 3.0

Earlier I mentioned that backup and restore can be used to migrate from one version of ModeShape to the next major version of ModeShape. This is how we plan to support migrating from a ModeShape 2.8 repository instance to a new ModeShape 3.0 instance. We plan to cut one more release of ModeShape 2, which we’ll christen 2.8.4.Final, and that will include a utility that will create a 3.0-compatible backup of the ModeShape 2.8 instance. Then, simply use the “restoreRepository” method on the new (and empty) ModeShape 3.0 repository to load all the backed-up content.

Questions or feedback

This feature is still relatively new and was introduced in ModeShape 3.0.0.Beta3, and we’d love to get your feedback on our forums before we freeze the public API and cut the 3.0.0.Final release.

Filed under: features, jcr, repository, techniques, tools

ModeShape 2.8.3.Final is available

Today we’re announcing the immediate availability of ModeShape 2.8.3.Final, which contains just under a dozen bug fixes that were reported against earlier 2.8.x releases. The release artifacts are available in the JBoss Maven repository (see our Maven instructions) and on our downloads page. The Getting Started and Reference Guides are available, too.

If you’re using any 2.x version, we recommend upgrading to this release.

There is one more feature that we plan to add in an upcoming 2.8.4 release: a utility for transferring content from a 2.8 repository into a 3.0 repository. The utility will create a 3.0-compatible backup of the 2.8 repository, and then you can use 3.0’s restore capability to load that content into a new ModeShape 3.0 repository.

However, we do expect that the upcoming 2.8.4.Final release will be the last release for the 2.x line.

As always, thanks to the entire ModeShape community for the continued use of 2.8.x and for help in finding and fixing the issues. Great job, everyone!

Filed under: jcr, news, releases

ModeShape 3.0.0.Beta3 is available

The ModeShape community is proud and happy to announce the availability of 3.0.0.Beta3. All the artifacts are available in the JBoss Maven repository, and you can follow our instructions for using ModeShape in your Maven application. Or, download one of our distributions, including kits to install ModeShape as a service within JBoss AS7.1.1.Final or AS7.2 (which hasn’t yet been released, but you can build it from the source). Check out our documentation, release notes, JavaDoc, and our code on GitHub; use our forums or IRC channel to ask questions, and log any issues in our JIRA.

This release is improved over Beta2 with:

  • an improved RESTful API with support for handling large files;
  • a new feature to back up repositories (even when they’re being used) and restore repositories to a previously backed-up state;
  • the ability to store large binary values in Infinispan, JDBC database, and MongoDB (in addition to the file system available in previous releases); and
  • a number of other bug fixes and improvements.

We’re also passing 99.8% of the JSR-283 TCK tests, with all of the same JCR features that were available in ModeShape 2.x. The 4 outstanding failures should be fixed by the next release.

One more word of note. We’re making great progress on the remaining issues that we want to fix before releasing 3.0.0.Final, and although we had been releasing betas every 2 weeks, we’re planning to release 3.0.0.Beta4 in 3 weeks. Hopefully by then we’ll have most of the remaining issues fixed, and will be closer to issuing Final.

Thanks to all our community members who have helped with ModeShape 3, whether it was giving our alpha and beta releases a try, asking questions in the forums, reporting issues, offering suggestions, contributing code, or all of the above. Well done, and keep up the great work!

Filed under: jcr, news, releases

ModeShape is

a lightweight, fast, pluggable, open-source JCR repository that federates and unifies content from multiple systems, including file systems, databases, data grids, other repositories, etc.

Use the JCR API to access the information you already have, or use it like a conventional JCR system (just with more ways to persist your content).

ModeShape used to be 'JBoss DNA'. It's the same project, same community, same license, and same software.
