ModeShape 3.0 Alpha1 is here, and it rocks!

January 28, 2012 • 10:56 am

The ModeShape team is happy to announce that we’ve issued the first alpha release of ModeShape 3. This is the first alpha release we’ve ever made, and it’s still rough around the edges. But we’re so excited about ModeShape 3 that we had to share. (And, yes, this post is really long, but it’s a good read.)

Our goal for ModeShape 3 is for it to be the seriously fast, very scalable, and highly available JCR implementation. To do that, we’ve made some pretty significant architectural changes. Some of these are:

We’re using Infinispan for all caching and storage. This gives us the foundation we need to meet our goals, while giving us flexibility in how the content is stored (via cache stores). ModeShape can still be embedded into applications, but Infinispan will help us scale out to create truly distributed, multi-site content grids. This completely replaces our old connector framework.
So far our tests show ModeShape 3 is ridiculously fast. It’s faster than 2.7 all around; in fact, most operations are at least one (if not several!) orders of magnitude faster. We’ll publish proper performance and benchmarking results closer to the final release.
Scalability not only includes clustering (and “scaling out”), but also means handling a wider range of node structures. We’ve tested our new approach with hundreds of thousands of child nodes under a single parent, even when the parent’s children are ordered and include same-name siblings. Yet it’s still nearly as fast as working with a node that has only a few children!
Configuring repositories is hopefully much easier. There is no more global configuration of the engine; instead, each repository is configured with a separate JSON file that conforms to a JSON Schema and that your application can validate with one method call. Check out this entirely valid sample configuration file. You can deploy new repositories at runtime, and can even change a repository’s configuration while it is running (some restrictions apply). For example, you can add/change/remove sequencers, authorization providers, and many other configuration options while the repository is being actively used.
ModeShape continues to have great options for storing your content. ModeShape 2 had its own connector framework, but with ModeShape 3 we’re simply using Infinispan’s cache stores, with a number of great options out-of-the-box:
- In-memory (no cache store)
- Berkeley DB, which is quite fast but has license restrictions
- JDBM, a free alternative to Berkeley DB
- Relational databases (via JDBC), including in-memory, disk-based, or remote
- File system
- Cassandra
- Cloud storage (e.g., Amazon’s S3, Rackspace’s Cloudfiles, or any other provider supported by JClouds)
- Remote Infinispan grid
Every session now immediately sees all changes persisted/committed by other sessions, although the session’s own transient changes still take precedence. This differs from 2.x and, combined with the new way node content is stored, will hopefully reduce the potential for conflicts during session save operations. It also means that all the Sessions using a given workspace can share the cache of persisted content, resulting in faster performance and a smaller memory footprint, so ModeShape can handle more sessions at the same time in a single process.
Our Session, Workspace, NodeTypeManager and other components are thread safe. The JCR specification only requires that the Repository and RepositoryFactory interfaces are thread-safe. But making our implementations thread-safe means that it’s possible for multiple threads to share one Session for reading. Of course, Session is inherently stateful, so sharing a Session for writes is still a bad thing to do.
We have a new public API for monitoring the history, activity and health of ModeShape.
We’ve changed our sequencing API to use the JCR API. This should make it much easier to create your own sequencers, plus sequencers can also dynamically register namespaces and node types. We’ve already migrated most of our 2.x sequencers to this new API, and will be migrating the rest over the next few weeks.
Handling of binary values is greatly improved with a new facility that can store binary values of all sizes, including those that are (much) larger than available memory. In fact, only small binary values are stored in memory (the size threshold is configurable), while all other binary values are streamed. We’ve started out with a file system store that will work even in clustered environments, but we also plan to add stores that use Infinispan and DBMSes.
We’re still using Lucene for our indexes, but we’re now using Hibernate Search to give us durable and fast ways to update the indexes, even in a cluster. Note that Hibernate Search is part of the Hibernate family, but it’s a small library that does not use, depend on, or require JPA or the Hibernate ORM.
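The session behavior described above (every session sees persisted changes immediately, but its own transient changes win) can be pictured as a two-level lookup. The following is a minimal, hypothetical sketch in plain Java, not ModeShape’s actual classes: `SharedWorkspaceCache` and `SessionView` are illustrative names, and node content is reduced to a map of string keys to string values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the workspace-wide cache of persisted content,
// shared by every session using the same workspace (possibly from many threads).
class SharedWorkspaceCache {
    private final Map<String, String> persisted = new ConcurrentHashMap<>();

    String get(String key) {
        return persisted.get(key);
    }

    // Called when a session saves: the changes become visible to all sessions at once.
    void persistAll(Map<String, String> changes) {
        persisted.putAll(changes);
    }
}

// Hypothetical per-session view: the session's transient (unsaved) changes take
// precedence; otherwise the lookup falls through to the shared persisted cache.
class SessionView {
    private final SharedWorkspaceCache shared;
    private final Map<String, String> transientChanges = new HashMap<>();

    SessionView(SharedWorkspaceCache shared) {
        this.shared = shared;
    }

    String get(String key) {
        String local = transientChanges.get(key);
        return local != null ? local : shared.get(key);
    }

    void set(String key, String value) {
        transientChanges.put(key, value);
    }

    void save() {
        shared.persistAll(transientChanges);
        transientChanges.clear();
    }
}
```

If session A saves a change, session B sees it on its very next read, unless B holds its own unsaved change to the same key. Keeping one persisted cache per workspace, rather than one per session, is what keeps memory use low as the number of sessions grows.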
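The binary-value handling described above comes down to a size threshold: values at or below it stay in memory, and anything larger is streamed to a store without ever being fully buffered. The sketch below is hypothetical plain Java, not ModeShape’s binary-store API; the `BinaryStoreSketch` name, the threshold parameter, and the use of temporary files in a directory as the “file system store” are all assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of size-based binary handling (not ModeShape's API):
// small values are kept in memory, larger ones are streamed to a file.
final class BinaryStoreSketch {

    // Either the bytes themselves (small values) or the file they were streamed to.
    static final class StoredBinary {
        final byte[] inline;  // non-null only for values at or below the threshold
        final Path file;      // non-null only for values above the threshold
        final long length;

        StoredBinary(byte[] inline, Path file, long length) {
            this.inline = inline;
            this.file = file;
            this.length = length;
        }

        boolean isInline() {
            return inline != null;
        }
    }

    private final int inlineThreshold;  // assumed to be well below Integer.MAX_VALUE
    private final Path directory;

    BinaryStoreSketch(int inlineThreshold, Path directory) {
        this.inlineThreshold = inlineThreshold;
        this.directory = directory;
    }

    StoredBinary store(InputStream in) throws IOException {
        // Buffer at most threshold + 1 bytes: enough to tell which side of the threshold we're on.
        byte[] head = in.readNBytes(inlineThreshold + 1);
        if (head.length <= inlineThreshold) {
            return new StoredBinary(head, null, head.length);
        }
        // Larger value: write the buffered prefix, then stream the rest without holding it in memory.
        Path file = Files.createTempFile(directory, "binary-", ".bin");
        long length = head.length;
        try (OutputStream out = Files.newOutputStream(file)) {
            out.write(head);
            length += in.transferTo(out);
        }
        return new StoredBinary(null, file, length);
    }

    InputStream open(StoredBinary value) throws IOException {
        return value.isInline() ? new ByteArrayInputStream(value.inline) : Files.newInputStream(value.file);
    }
}
```

Because only `threshold + 1` bytes are ever buffered, memory use stays bounded no matter how large the incoming value is; the sketch requires Java 11 or later for `readNBytes(int)`.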

As if that’s not enough, we still have a lot to do:

Kits for deploying ModeShape 3 as a service in JBoss AS7, allowing you to use the AS7 tooling to configure, deploy, manage, monitor, and undeploy your JCR repositories. Infinispan and JGroups are also built-in services in AS7 and can be managed the same way. Plus, ModeShape clustering will work out of the box using AS7’s built-in clustering (domain management) mechanism. ModeShape and JBoss AS7 will be the easiest way to deploy, manage and operate enterprise-grade repositories.
JTA support will allow JCR Sessions to participate in XA and container-managed transactions. We’re already using JTA transactions internally with Infinispan, so we’re well on the way toward this feature.
Map-Reduce is a great way to process large amounts of information in parallel. ModeShape will let you validate the entire repository content against the current set of node types, or even against a proposed set of node types, making it far easier to safely and confidently change the node types in a large repository. And we’ll provide a way for you to write your own mappers, reducers, and collectors to implement any kind of (read-only) analysis you want.
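The planned Map-Reduce validation can be illustrated with ordinary Java parallel streams: a map step checks each node against its node type and emits violations, and a reduce step groups them. This is a hypothetical sketch of the idea only; the announcement does not describe ModeShape’s mapper/reducer/collector interfaces, so `NodeRecord`, `NodeTypeValidation`, and modeling a node type set as “type name to mandatory property names” are all illustrative assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified node: a path, a primary type name, and the names of its properties.
final class NodeRecord {
    final String path;
    final String primaryType;
    final Set<String> properties;

    NodeRecord(String path, String primaryType, Set<String> properties) {
        this.path = path;
        this.primaryType = primaryType;
        this.properties = properties;
    }
}

// Hypothetical validation job. A "node type set" is modeled only as
// node type name -> names of its mandatory properties.
final class NodeTypeValidation {

    // Map step: emit one (nodeType, message) pair per mandatory property the node lacks.
    static List<Map.Entry<String, String>> map(NodeRecord node, Map<String, Set<String>> mandatoryByType) {
        return mandatoryByType.getOrDefault(node.primaryType, Set.of()).stream()
                .filter(p -> !node.properties.contains(p))
                .map(p -> Map.entry(node.primaryType, node.path + " is missing " + p))
                .collect(Collectors.toList());
    }

    // Reduce step: run the map step over all nodes in parallel and group the messages by node type.
    static Map<String, List<String>> validate(List<NodeRecord> nodes, Map<String, Set<String>> mandatoryByType) {
        return nodes.parallelStream()
                .flatMap(node -> map(node, mandatoryByType).stream())
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }
}
```

Passing a different map as `mandatoryByType` corresponds to checking existing content against a proposed set of node types before registering it; the repository content itself is only read, never modified.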

Hopefully you’re just as excited as we are. We love how far we’ve been able to come with ModeShape 3, and we’re only part way there.

The good news is that you can start kicking the tires and seeing for yourself just how fast ModeShape 3 is. Most of the JCR features are working and are ready for trial and testing. In fact, please file bug reports if you find anything that doesn’t work. But unfortunately a few things still aren’t complete or working well enough:

Queries will parse but can’t be executed. Most of the query engine works, but a few key pieces don’t. Consequently, the JDBC drivers don’t work either.
Clustering and shareable nodes don’t work.
AS7 kits are incomplete and not yet usable.
The RESTful and WebDAV services aren’t working as we’d like, so we excluded them from the alpha.
Federation is not yet working; see this discussion for how we want to expand federation capabilities.

We’re also overhauling our documentation to make it even more useful. It’s a little sparse at the moment, since we’re focusing on the code, but our What’s New and Getting Started pages are pretty useful and should help you get your testing going. We also have some stand-alone example Maven projects on GitHub that you can clone and hack to start putting ModeShape 3 through its paces.

What’s next? Well, we’re continuing to implement the missing and incomplete features, and we plan to release a second alpha in the next few weeks. We’ll follow that up over the following month with a couple of feature-complete beta releases and the final 3.0 release. Stay tuned!

Now, wasn’t that worth a few minutes of your time? We’re really excited about ModeShape 3, and think you’ll really like it, too.

Filed under: features, jcr, news, releases, repository, testing

Hendy Irawan says:

January 28, 2012 at 11:23 pm

Congratulations, Randall!

Wow… major rearchitecting…
Since ModeShape 3 goes through an additional data model layer (Infinispan key/value), does this mean ModeShape 3 content will be “incompatible” with how a non-Infinispan JCR implementation (like Jackrabbit or ModeShape 2.7) stores its data?

Randall says:

January 29, 2012 at 1:09 pm

Thanks, Hendy.

I don’t think it’s accurate to say that there’s an additional data layer. ModeShape 2 and Jackrabbit both have persistence layers and cache layers. ModeShape 3 also has persistence and caching, but Infinispan is doing this for us.

Also, each JCR implementation is free to persist data however it wants, so “compatibility” is a misnomer. The JCR API specifies how you can import and export content (e.g., to transfer content between repository instances).

Best regards

Hendy Irawan says:

January 29, 2012 at 1:47 pm

Thank you Randall for the explanation.

An issue I worry about is a scenario like this: if I want to expose a table through ModeShape, then I’m exposing the rows as JCR nodes, each of which is a single JSON object in Infinispan. Let’s say the person table contains name, age, and a lot of other columns.

Now if I query ModeShape to get only the ages of all people, the table cache loader will need to return, for each row, a JSON containing all column values (not just age). The query result might get cached by Infinispan (is it?) but the initial query will be very expensive. Doing the same in SQL (select age from person) is very cheap.

Perhaps my assumptions are wrong… or maybe this is not a use case for ModeShape?

- Randall says:
  
  January 29, 2012 at 3:57 pm
  
  I think that’s a perfectly valid use case. But ModeShape’s query infrastructure uses Lucene indexes to find which nodes meet the criteria, without having to load any of the node data from persistent storage. Only when the application reads through the rows/nodes in the query results is the node data lazily loaded from Infinispan. And even when reading through many, many results, there is still a cache that discards node data from memory when it hasn’t been used.
  
  Secondly, most JCR implementations (including Jackrabbit and ModeShape 2 and 3) store the information in persistent storage not in a tabular form but as blobs. In other words, even when databases are used for stores, they’re really used as key-value stores. And a key-value store (with cache) is *exactly* what Infinispan excels at!
  
  Thirdly, while a node with a relatively small number of child nodes might be stored in a single JSON document, as the size of the node data increases, ModeShape can (optionally) separate the child references into blocks, where each block is stored in a separate JSON document.
  
  Finally, we’re doing a lot of things under the covers to maximize performance in terms of memory efficiency, speed, latency, etc.
- Hendy Irawan says:
  
  January 30, 2012 at 1:13 am
  
  Thanks Randall.
  
  “as the size of the node data increases, ModeShape can (optionally) separate the child references into blocks” this seems to be the answer, and represents a balanced tradeoff between convenience and performance 🙂
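
Splitting child references into blocks, as discussed in this exchange, can be sketched as a segmented list: each block is a small, separately stored unit, so appending a child touches only the last block instead of rewriting one ever-growing document. The code below is a hypothetical in-memory illustration, not ModeShape’s storage format; `ChildReferenceBlocks`, the block size, and plain lists standing in for JSON documents are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a parent's ordered child references split into fixed-size blocks.
// In a real store, each block would be persisted as its own document.
final class ChildReferenceBlocks {
    private final int blockSize;
    private final List<List<String>> blocks = new ArrayList<>();

    ChildReferenceBlocks(int blockSize) {
        if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be positive");
        this.blockSize = blockSize;
    }

    // Appending only ever modifies (or creates) the last block.
    void append(String childName) {
        if (blocks.isEmpty() || blocks.get(blocks.size() - 1).size() == blockSize) {
            blocks.add(new ArrayList<>(blockSize));
        }
        blocks.get(blocks.size() - 1).add(childName);
    }

    // Positional access needs only one block, because every block except the last is full.
    String get(int index) {
        return blocks.get(index / blockSize).get(index % blockSize);
    }

    int blockCount() {
        return blocks.size();
    }

    int size() {
        return blocks.isEmpty() ? 0 : (blocks.size() - 1) * blockSize + blocks.get(blocks.size() - 1).size();
    }
}
```

With a block size of 1,000, a parent with 250,000 children spans 250 blocks, and reading child number 123,456 touches only the block at index 123. Only appending is shown; inserting into the middle of an ordered list would require splitting blocks, which this sketch omits.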

Hendy Irawan says:

January 30, 2012 at 1:19 am

By the way, when you put it that way… it makes ModeShape a direct competitor to document databases like MongoDB.

However, MongoDB has server-side filters and operations, i.e. it doesn’t just store a document as a “dumb” JSON data, but it recognizes the structure inside JSON, and can index, filter, and update individual references inside each document with optimization.

ModeShape’s strong point is the uniform access interface (JCR, WebDAV, etc.) and federation. However, with 3.0 the only supported “connector framework” is via Infinispan cache loaders. I think storing the JSON “JCR nodes” directly as MongoDB documents (without Infinispan in between) will be a good use case (essentially implementing JCR API on top of MongoDB storage). What do you think?

For performance, which one is faster, disk-persistence-Infinispan-backed ModeShape or MongoDB backed ModeShape?

Randall says:

February 7, 2012 at 12:45 pm

@Hendy wrote:

By the way, when you put it that way… it makes ModeShape a direct competitor to document databases like MongoDB.

However, MongoDB has server-side filters and operations, i.e. it doesn’t just store a document as a “dumb” JSON data, but it recognizes the structure inside JSON, and can index, filter, and update individual references inside each document with optimization.

ModeShape, Infinispan, and MongoDB will certainly have similar features/functionality, but (as you mention) each will have some distinct capabilities and features. Like with anything, users need to choose what tool best fits their needs and requirements. Once 3.0 (and maybe 3.1) is out, we do hope to do more comparisons and publish our findings.

ModeShape’s strong point is the uniform access interface (JCR, WebDAV, etc.) and federation. However, with 3.0 the only supported “connector framework” is via Infinispan cache loaders. I think storing the JSON “JCR nodes” directly as MongoDB documents (without Infinispan in between) will be a good use case (essentially implementing JCR API on top of MongoDB storage). What do you think?

Using MongoDB directly underneath ModeShape would leave out the caching layer, so each request to load a particular node would be pushed to MongoDB. The result, I suspect, would be slower performance. I think the ideal integration would be to implement a custom Infinispan cache loader that uses MongoDB and is aware of the Document objects that ModeShape uses. These Document objects are compatible with (and in fact serialize to) BSON, so there’s great potential for really tight MongoDB integration. And with this approach, Infinispan acts more like an in-memory cache, improving overall performance.
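The layering described here, with Infinispan acting as an in-memory cache in front of a persistent store and loading entries on demand, is the classic read-through cache pattern. Here is a minimal, hypothetical sketch in plain Java; it is not Infinispan’s cache-loader SPI, and the `loader` function stands in for whatever backing store (MongoDB or otherwise) a real cache loader would call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical read-through cache: hits are served from memory, misses go to the
// backing store via the loader, and the least-recently-used entry is evicted when full.
final class ReadThroughCache<K, V> {
    private final Function<K, V> loader;
    private final Map<K, V> entries;

    ReadThroughCache(int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently accessed.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);  // cache miss: go to the backing store
            if (value != null) {
                entries.put(key, value);  // may evict the least-recently-used entry
            }
        }
        return value;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

Repeated reads of the same node never reach the backing store until that node is evicted, which is the performance argument for keeping a cache between ModeShape and MongoDB. A production data grid adds concurrency control, eviction policies, and clustering that this sketch omits.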

For performance, which one is faster, disk-persistence-Infinispan-backed ModeShape or MongoDB backed ModeShape?

We haven’t done the in-depth performance analysis of various ModeShape 3 configurations to know which is (or will be) faster. But I think the ideal ModeShape/Infinispan/MongoDB integration mentioned above has the potential to be insanely fast. But we’ll see.

Frieder Heugel says:

January 30, 2012 at 11:05 am

Nice work. The new version looks very promising. I’m wondering if you guys have any plans to make ModeShape available as a set of OSGi bundles?

Frieder Heugel says:

January 30, 2012 at 11:17 am

Hmm, ignore my question. I somehow had the impression that not all ModeShape libs are OSGi bundles; guess I was wrong 😉

Sten Roger Sandvik says:

February 2, 2012 at 10:04 am

Great news. Is the search engine pluggable like in ModeShape 2? I am thinking of using ElasticSearch instead of Lucene for the search part. Is this still viable in ModeShape 3?

Randall says:

February 7, 2012 at 12:33 pm

We are planning for the search engine to be pluggable, with the interface stabilizing over the next few weeks. Replacing the Lucene-based engine may be a fairly significant undertaking, though. If you’re interested, perhaps you might consider contributing an implementation?

ModeShape 2.8.0.Final is available « ModeShape says:

February 27, 2012 at 5:42 pm

[…] is the last planned release of the 2.x line. We’ve already released 3.0.0.Alpha1 and hope to release Alpha2 very, very soon. Give it a try – we think you’ll like […]

ModeShape 3.0.0.Alpha2 is ready « ModeShape says:

March 12, 2012 at 4:18 pm

[…] ModeShape 3.0.0.Alpha2 is now available in the JBoss Maven repository, and it’s ready for you to give it a spin. Most of the JCR features are implemented, and this second alpha release fixes quite a few issues and adds support for queries (except for full-text search) and clustering. See our release notes for details. For an overview of what’s new in 3.0, check out our Alpha1 announcement. […]
