ModeShape

An open-source, federated content repository

ModeShape 3.8.1.Final is available

ModeShape 3.8.1.Final is now available, with almost four dozen bug fixes. This release depends upon Infinispan 5.2.10, and the subsystem can be installed into EAP 6.3 Beta. See the release notes for details.

As usual, the artifacts are in the JBoss Maven repository and will soon be pushed into Maven Central. Or you can download a zip file with the libraries.

Give it a whirl and let us know on IRC or in our forums if you have any problems.

Last planned community 3.x release

Please note that this is the last planned community release of ModeShape 3.x. The community has switched its focus to the 4.x stream: we've already released 4.0.0.Final and are hard at work on 4.1. We encourage all 3.x users to try 4.x, which contains all of the fixes in 3.8.1.Final plus lots of new features and improvements.

JBoss Data Virtualization 6.1

However, if you can't move to 4.x but are looking for professional support, please take a look at version 6.1 of Red Hat's JBoss Data Virtualization platform. It is nearing release and will include a fully supported version of ModeShape based upon 3.8.1.Final. Contact Red Hat sales for more information.

Filed under: jcr, news, releases

ModeShape 4.0.0.Final is available

The ModeShape community is proud to announce the immediate availability of our latest stable release, ModeShape 4.0.0.Final. The JARs and other artifacts are available in the JBoss Maven repository and in our downloads area, and will soon be in Maven Central. See our Getting Started guide for details.

Thanks to our whole community for the work that’s gone into this release!

What’s new?

This major release contains new features and lots of fixes. Here’s a rundown of the most important features and changes in 4.0:

  • JDK 7 – ModeShape 4.0 requires JDK 7. We’ve not yet begun testing with Java 8, but we’d be happy to hear about it if you do.
  • Queries – The new query engine is more capable than in 3.x, and it buffers results off-heap to prevent large queries from exhausting your application’s memory. The engine still supports a variety of query languages, though JCR-SQL2 is still the most powerful and with 4.0 has a few more extensions. Explicitly define indexes to make your queries faster. All indexes are stored locally on the file system, and in clustered repositories each process in the cluster maintains its own copy of the indexes. In 4.1 we’ll start offering the ability to mix these with indexes stored in Solr, ElasticSearch, and/or Lucene.
  • Clustering – Configuring a cluster of ModeShape repositories is even easier. All configuration is done within Infinispan's clustering setup: if Infinispan is clustered, then ModeShape is part of the same cluster. We've also upgraded to a newer version of JGroups.
  • Journaling – ModeShape has a new event journal mechanism that helps new (or returning) processes in a cluster catch up on the history of events. You get all this with no work on your part, and your applications can also use the new feature via the JCR 2.0 event journal API. This is a great alternative to JCR event listeners, which in some situations can be very expensive or time-consuming.
  • Event bus – We’ve completely rewritten the way ModeShape repositories internally handle events. We now use a ring buffer that is substantially faster than what we had in 3.x. There’s no change in the event APIs so your listener implementations will continue to work unchanged – they’ll just be faster. The speed improvement is important, because we’re internally using listeners in more areas.
  • Infinispan – We've moved to Infinispan 6.0.x.Final, which is faster and has new cache stores. Some older and poorly-performing cache stores are no longer valid, so check out the new file-based cache stores. Also, the LevelDB cache store is supposedly very fast.
  • Wildfly 8 – You can embed ModeShape within your applications, or you can install ModeShape as a subsystem within Wildfly so that your web apps and services can store and access content via the JCR API.
  • Repository Explorer – We’ve redesigned this web application to be much more usable.
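The event-bus bullet mentions a ring buffer. ModeShape's actual implementation isn't shown in this post, so the following is only a minimal, single-threaded Java sketch of the general technique, assuming a power-of-two capacity so that an ever-increasing sequence number can be mapped to an array slot with a bit mask. The class name `EventRing` is hypothetical, and a real concurrent event bus would also need thread-safe coordination of the producer and consumer sequences.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical single-threaded ring buffer: one preallocated array reused in a
 * circle, with monotonically increasing publish/consume sequence numbers.
 */
final class EventRing<E> {
    private final Object[] slots;
    private final int mask;
    private long published = 0; // next sequence number to write
    private long consumed = 0;  // next sequence number to read

    EventRing(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo <= 0 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        slots = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    /** Returns false (rather than overwriting) when the consumer is a full lap behind. */
    boolean publish(E event) {
        if (published - consumed == slots.length) {
            return false;
        }
        slots[(int) (published & mask)] = event; // sequence -> slot via masking
        published++;
        return true;
    }

    /** Returns every event published since the last drain, in publication order. */
    List<E> drain() {
        List<E> out = new ArrayList<>();
        while (consumed < published) {
            int index = (int) (consumed & mask);
            @SuppressWarnings("unchecked")
            E event = (E) slots[index];
            slots[index] = null; // release the reference so the event can be collected
            out.add(event);
            consumed++;
        }
        return out;
    }
}
```

Because the array is allocated once and reused, publishing an event doesn't allocate a new queue entry, which is one reason ring buffers are often faster than linked queues.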

These are just some of the new features in this release. Across this and all of the 4.0 pre-releases, we've addressed a total of 118 issues.

Give it a try and let us know what you think!

Filed under: features, jcr, news, releases

The JBoss Asylum talks ModeShape

The latest JBoss Asylum podcast is out. On this episode, Emmanuel Bernard and Max Rydahl Andersen talk with Horia and me about the project, some key features, and some really good (and some not so great) ways to use ModeShape.

You can listen to this and all of their episodes online and in iTunes.

Filed under: appearances, news

ModeShape 4.0.0.Beta2 is available

The ModeShape community is proud to announce the immediate availability of our latest 4.0 pre-release, ModeShape 4.0.0.Beta2. The JARs and other artifacts are available in the JBoss Maven repository and in our downloads area, and will soon be in Maven Central. See our Getting Started guide for details.

Thanks to our whole community for the work that’s gone into this release!

What’s new?

This beta release contains 19 fixes and several new features. Indexes can now be updated synchronously (before save returns) or asynchronously (which we recommend, since it's faster all-around). There's also a new CHILDCOUNT dynamic operand in ModeShape's extended JCR-SQL2 grammar. In the previous releases we previewed a new query engine that no longer indexes everything in the repository as 3.x did, but instead always executes a query, even if no usable indexes are available. You can and should explicitly define indexes that index only the information necessary to make your queries faster. This means lower overhead, a smaller footprint, and more efficient query processing.

Here’s a rundown of the most important features and changes in 4.0 so far:

  • JDK 7 – ModeShape 4.0 requires JDK 7. We’ve not yet begun testing with Java 8, but we’d be happy to hear about it if you do.
  • Queries – The new query engine is more capable than in 3.x, and it buffers results off-heap to prevent large queries from exhausting your application’s memory. The engine still supports a variety of query languages, though JCR-SQL2 is still the most powerful and with 4.0 has a few more extensions. Explicitly define indexes to make your queries faster. All indexes are stored locally on the file system, and in clustered repositories each process in the cluster maintains its own copy of the indexes. In 4.1 we’ll start offering the ability to mix these with indexes stored in Solr, ElasticSearch, and/or Lucene.
  • Clustering – Configuring a cluster of ModeShape repositories is even easier. All configuration is done within Infinispan's clustering setup: if Infinispan is clustered, then ModeShape is part of the same cluster. We've also upgraded to a newer version of JGroups.
  • Journaling – ModeShape has a new event journal mechanism that helps new (or returning) processes in a cluster catch up on the history of events. You get all this with no work on your part, and your applications can also use the new feature via the JCR 2.0 event journal API. This is a great alternative to JCR event listeners, which in some situations can be very expensive or time-consuming.
  • Event bus – We’ve completely rewritten the way ModeShape repositories internally handle events. We now use a ring buffer that is substantially faster than what we had in 3.x. There’s no change in the event APIs so your listener implementations will continue to work unchanged – they’ll just be faster. The speed improvement is important, because we’re internally using listeners in more areas.
  • Infinispan – We've moved to Infinispan 6.0.x.Final, which is faster and has new cache stores. Some older and poorly-performing cache stores are no longer valid, so check out the new file-based cache stores. Also, the LevelDB cache store is supposedly very fast.
  • Wildfly 8 – You can embed ModeShape within your applications, or you can install ModeShape as a subsystem within Wildfly so that your web apps and services can store and access content via the JCR API.
  • Repository Explorer – We’ve redesigned this web application to be much more usable.

The set of 4.0 alpha and beta releases also includes 118 bug fixes and other improvements.

What’s next?

Over the next few weeks we'll keep fixing bugs and trying to stabilize the release. As of today, there are just a handful of outstanding issues. The codebase is already pretty stable, so we may be able to get to the final release fairly quickly.

What can you do?

Simple: test this release. Download it, use it, try the new features, and put it through its paces. See how Infinispan 6 works and how much faster it is, and try one of the new high-performance cache stores. Try out ModeShape in Wildfly 8. Give queries a whirl, and let us know if any queries that worked in 3.x no longer work in 4.x; remember that they'll probably be slower than in 3.x until you explicitly add indexes that cover your query constraints.

Filed under: features, jcr, news, releases

ModeShape 4.0.0.Beta1 is available

The ModeShape community is proud to announce the immediate availability of our latest 4.0 pre-release, ModeShape 4.0.0.Beta1. The JARs and other artifacts are available in the JBoss Maven repository and in our downloads area, and will soon be in Maven Central. See our Getting Started guide for details.

Thanks to our whole community for the work that’s gone into this release!

What’s new?

This beta release contains 46 fixes and a few new features. In the previous alpha releases we previewed a new query engine that no longer indexes everything in the repository as 3.x did, but instead always executes a query, even if no usable indexes are available. With this beta release, you can now explicitly define indexes that index only the information necessary to make your queries faster. This means lower overhead, a smaller footprint, and more efficient query processing.

Here’s a rundown of the most important features and changes in 4.0 so far:

  • JDK 7 – ModeShape 4.0 requires JDK 7. We’ve not yet begun testing with Java 8, but we’d be happy to hear about it if you do.
  • Queries – The new query engine is more capable than in 3.x, and it buffers results off-heap to prevent large queries from exhausting your application’s memory. The engine still supports a variety of query languages, though JCR-SQL2 is still the most powerful and with 4.0 has a few more extensions. Explicitly define indexes to make your queries faster. All indexes are stored locally on the file system, and in clustered repositories each process in the cluster maintains its own copy of the indexes. In 4.1 we’ll start offering the ability to mix these with indexes stored in Solr, ElasticSearch, and/or Lucene.
  • Clustering – Configuring a cluster of ModeShape repositories is even easier. All configuration is done within Infinispan's clustering setup: if Infinispan is clustered, then ModeShape is part of the same cluster. We've also upgraded to a newer version of JGroups.
  • Journaling – ModeShape has a new event journal mechanism that helps new (or returning) processes in a cluster catch up on the history of events. You get all this with no work on your part, and your applications can also use the new feature via the JCR 2.0 event journal API. This is a great alternative to JCR event listeners, which in some situations can be very expensive or time-consuming.
  • Event bus – We’ve completely rewritten the way ModeShape repositories internally handle events. We now use a ring buffer that is substantially faster than what we had in 3.x. There’s no change in the event APIs so your listener implementations will continue to work unchanged – they’ll just be faster. The speed improvement is important, because we’re internally using listeners in more areas.
  • Infinispan – We've moved to Infinispan 6.0.x.Final, which is faster and has new cache stores. Some older and poorly-performing cache stores are no longer valid, so check out the new file-based cache stores. Also, the LevelDB cache store is supposedly very fast.
  • Wildfly 8 – You can embed ModeShape within your applications, or you can install ModeShape as a subsystem within Wildfly so that your web apps and services can store and access content via the JCR API.
  • Repository Explorer – We’ve redesigned this web application to be much more usable.

The set of 4.0 alpha and beta releases also includes 118 bug fixes and other improvements.

What’s next?

Over the next few weeks we'll keep fixing bugs and trying to stabilize the release. As of today, there are just a few outstanding issues. The codebase is already pretty stable, so we may be able to get to the final release fairly quickly.

What can you do?

Simple: test this release. Download it, use it, try the new features, and put it through its paces. See how Infinispan 6 works and how much faster it is, and try one of the new high-performance cache stores. Try out ModeShape in Wildfly 8. Give queries a whirl, and let us know if any queries that worked in 3.x no longer work in 4.x; remember that they'll probably be slower than in 3.x until you explicitly add indexes that cover your query constraints.

Filed under: features, jcr, news, releases

Improving performance with large numbers of child nodes

A JCR repository is by definition a hierarchical database, and it's important that the hierarchical node structure be properly designed to maintain good functionality and performance. If the hierarchy is too deep, your applications will spend a lot of time navigating many nodes just to find the one they're interested in. If the hierarchy is too wide, there will be lots of children under a single parent, and this large parent might become a performance bottleneck.

Unfortunately, it's difficult to come up with hard and fast rules about what it means for a repository structure to be "too deep" or "too wide". In this post we talk in detail about the performance of accessing a single node with lots of children, but applications rarely access just one node at a time. Instead, most application-specific operations access multiple nodes, and these access patterns greatly affect the total performance of the application and repository. So no matter what, measure the performance of your application using a variety of repository designs.

How does the number of child nodes affect performance?

ModeShape stores each node separately in the persistent store, and each node representation stores a reference to its parent and to all of its children. The parent reference makes it easy to walk up the tree, while the list of children makes it fast to walk the children and to maintain their order, even as nodes are reordered and same-name-siblings are used.

A parent node that has tens of thousands of children will thus have a pretty large representation in the persistent store, and this adds to the cost of reading and writing that representation. This is why we recommend not having large numbers of children under a single parent.

ModeShape does have the ability to break up the list of children into segments, and to store these segments separately from the parent node. This behavior is not enabled by default, but it can be enabled as a background optimization process.
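To make the size argument concrete, here is a hypothetical, much-simplified Java model of a stored node record (it is not ModeShape's actual storage format). The record holds a parent key and the full ordered list of child keys, so its size grows linearly with the number of children. The `segmentChildren` helper sketches the general idea of splitting that list into fixed-size blocks that could be stored separately; the class name, method names, and block size are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified stored representation of one node. */
final class NodeRecord {
    final String key;
    final String parentKey;       // makes it easy to walk up the tree
    final List<String> childKeys; // ordered, so children can be walked and reordered

    NodeRecord(String key, String parentKey, List<String> childKeys) {
        this.key = key;
        this.parentKey = parentKey;
        this.childKeys = Collections.unmodifiableList(new ArrayList<>(childKeys));
    }

    /** Rough proxy for how much must be read or written whenever this record is loaded or saved. */
    int approximateSize() {
        int size = key.length() + (parentKey == null ? 0 : parentKey.length());
        for (String child : childKeys) {
            size += child.length();
        }
        return size;
    }

    /** Splits an ordered child list into blocks of at most blockSize keys, preserving order. */
    static List<List<String>> segmentChildren(List<String> childKeys, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<List<String>> blocks = new ArrayList<>();
        for (int start = 0; start < childKeys.size(); start += blockSize) {
            int end = Math.min(start + blockSize, childKeys.size());
            blocks.add(new ArrayList<>(childKeys.subList(start, end)));
        }
        return blocks;
    }
}
```

With segmentation, touching the parent no longer means reading or writing every child reference at once; each block stays bounded in size regardless of how many children the parent has.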

Avoiding large numbers of child nodes

Sometimes it's quite easy to design a node structure that doesn't have parent nodes with large numbers of children. A blog application might organize the posts by date (e.g., “/posts/{year}/{month}/{date}/{title}”), and this works quite well simply because the number of children at every level is limited. For example, there will never be more than 12 nodes under a given year, and never more than 31 nodes under a given month. Also, it is unlikely that many posts will be created on a single day, so the number of titles under a given day will be quite small.
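As a small illustration of that date-based layout, here is a hypothetical Java helper that builds the path for a post. The class and method names are assumptions, as is the choice to zero-pad months and days (so sibling names sort naturally); the title is assumed to already be a valid node name.

```java
import java.time.LocalDate;
import java.util.Locale;

/** Hypothetical helper that builds a /posts/{year}/{month}/{day}/{title} path. */
final class PostPaths {
    static String pathFor(LocalDate date, String title) {
        // Each level has a naturally small fan-out: at most 12 months per year
        // and at most 31 days per month. Locale.ROOT keeps the digits ASCII.
        return String.format(Locale.ROOT, "/posts/%d/%02d/%02d/%s",
                date.getYear(), date.getMonthValue(), date.getDayOfMonth(), title);
    }
}
```

For example, a post titled "large-child-nodes" written on August 14, 2014 would be stored at /posts/2014/08/14/large-child-nodes.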

While there are many data structures that can naturally organize your hierarchy of nodes, there are situations where there is no obvious natural hierarchy. Consider an application that maintains customers, where each customer is identified by a unique identifier. Your application may be able to organize the customers by region, by date, or by some other characteristic. But that's not always possible or ideal. In that case, it may be useful to base the hierarchy on an artificial characteristic.

Consider the common case where the identifiers are UUIDs, which are unique and very easily generated. UUIDs are also very nicely and uniformly distributed, meaning that the characters of the hexadecimal form (e.g., “eb751690-23cb-11e4-8c21-0800200c9a66”) of two consecutively generated UUIDs will differ in most of the characters. We can exploit the hexadecimal representation and the uniform distribution of UUIDs to create a hierarchical structure that can store a lot of nodes with just a few levels in the hierarchy.

For example, if we use the first two hexadecimal characters as the name of our first level of nodes, and the next two characters for the second level of our node structure, then we can easily store 1 million nodes in a structure that never has more than 256 children under a single parent. The customer with ID “eb751690-23cb-11e4-8c21-0800200c9a66” can be found by turning the ID into a path:

/customers/eb/75/1690-23cb-11e4-8c21-0800200c9a66
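Turning an ID into such a path takes only a few lines. The following Java sketch is a hypothetical helper (its name and the `widths` parameter are assumptions): it peels off the requested number of hex characters per level and uses the rest of the UUID string as the leaf name. The same method also reproduces the other layouts discussed here by changing `widths`.

```java
import java.util.UUID;

/** Hypothetical helper that turns a UUID into a sharded node path. */
final class ShardedPaths {
    /**
     * Uses the first widths[0] hex characters as the first level, the next
     * widths[1] characters as the second level, and so on; whatever remains
     * of the UUID string becomes the leaf node name.
     */
    static String pathFor(String root, UUID id, int... widths) {
        String hex = id.toString(); // lowercase, e.g. "eb751690-23cb-11e4-8c21-0800200c9a66"
        int total = 0;
        for (int width : widths) {
            if (width <= 0) {
                throw new IllegalArgumentException("each width must be positive");
            }
            total += width;
        }
        if (total > 8) { // stay within the first group so no level name contains a dash
            throw new IllegalArgumentException("prefix must fit in the first 8 hex characters");
        }
        StringBuilder path = new StringBuilder(root);
        int pos = 0;
        for (int width : widths) {
            path.append('/').append(hex, pos, pos + width);
            pos += width;
        }
        return path.append('/').append(hex.substring(pos)).toString();
    }
}
```

Calling pathFor("/customers", id, 2, 2) yields the two-level path shown above, pathFor("/customers", id, 3) gives a single three-character level, and five widths of 1 give single-character levels.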

We could vary the design to use 3 characters for the first level and no second intermediate level. That means we can store our 1M nodes with fewer intermediate nodes, while still ensuring that the first level contains no more than 4096 children and that each of those intermediate nodes contains roughly 244 children (1,000,000 / 4096). That same customer would be found at:

/customers/eb7/51690-23cb-11e4-8c21-0800200c9a66

Or, we might try five levels, each using a single character, resulting in many more intermediate nodes but each with a very small number of children. Then, that same customer would be found at:

/customers/e/b/7/5/1/690-23cb-11e4-8c21-0800200c9a66
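The trade-off between these layouts is simple arithmetic. A level that uses w hex characters has at most 16^w children per parent, and N uniformly distributed IDs spread over the lowest intermediate level leave about N / 16^(sum of widths) leaf nodes under each of those parents. Here is a back-of-the-envelope Java sketch, assuming N = 1,000,000; the class and method names are hypothetical.

```java
/** Hypothetical back-of-the-envelope fan-out calculator for hex-sharded layouts. */
final class FanOut {
    /** Maximum children under one parent at a level named with this many hex characters. */
    static long maxChildren(int hexChars) {
        return 1L << (4 * hexChars); // 16^hexChars
    }

    /** Upper bound on intermediate nodes across all levels (every prefix in use). */
    static long intermediateNodes(int... widths) {
        long total = 0;
        long atLevel = 1;
        for (int width : widths) {
            atLevel *= maxChildren(width);
            total += atLevel;
        }
        return total;
    }

    /** Average number of leaf nodes under each lowest-level intermediate node. */
    static double averageLeafChildren(long nodes, int... widths) {
        int totalWidth = 0;
        for (int width : widths) {
            totalWidth += width;
        }
        return (double) nodes / maxChildren(totalWidth);
    }
}
```

For 1M nodes, the 2+2 layout needs at most 65,792 intermediate nodes with about 15 customers under each leaf parent; the single 3-character level needs only 4,096 intermediate nodes but puts about 244 customers under each; five single-character levels keep every parent tiny (under one customer on average at the bottom) at the cost of up to 1,118,480 intermediate nodes.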

The point is that you can often create a hierarchy that does not require parent nodes with large numbers of children. Of course, if your whole hierarchy is designed around these artificial traits and no natural traits, then you may be misusing a hierarchical database and might consider other technologies.

Designing with large numbers of child nodes

Sometimes almost all of your hierarchy design will use the natural traits of the data to create a nice hierarchy, but there is one area or level where you'd like to have parents with relatively large numbers of child nodes. If you're careful and follow these guidelines, you may be able to design it so that ModeShape still performs well for your application without having to use artificial traits.

One of the more expensive operations is adding a child to a parent that already has lots of children. This is because the JCR specification requires that ModeShape validate a number of things before allowing the new child. But with proper design, you can minimize or even eliminate much of that expensive validation.

  • The parent's primary type and mixins should have a total of exactly one child node definition, and it should allow same-name-siblings. When this is the case, ModeShape can use an optimized algorithm that is much faster than the one needed for two or more child node definitions. Also, because the child node definition allows SNS, ModeShape does not have to determine whether there is already an existing child with the same name (which can be very expensive) before it can pick the child node definition. It also means that when saving changes to the parent, ModeShape doesn't have to re-validate that there are no children with duplicate names. This saves a tremendous amount of time.
  • Large parents should not be versionable. When a parent contains lots of children, make sure that the parent's node types and mixins do not include mix:versionable, and that all child node definitions have an on-parent-versioning value of IGNORE. This allows ModeShape to speed up quite a few potentially expensive operations, including addNode(…).
  • Do not use same-name-siblings. Even though the node types would allow them, we recommend not using same-name-siblings; instead, have your node structure design or your application ensure that you don't add duplicates. For example, if your node structure uses UUIDs or SHA-1s as subpaths, the nature of those values ensures that there will be no clashes.
  • Add children in batches. ModeShape can very quickly add lots of nodes using a single save operation. For example, it takes only a few seconds to add 10k child nodes under one parent using a single session and a single save. Use batches that are as large as possible. Even when repeating that many times (e.g., adding 200k child nodes under one parent in batches), performance is pretty good. On the other hand, adding 200k nodes one at a time is far more expensive and time-consuming.
  • When possible, add multiple children under the same parent before creating other nodes. When ModeShape adds a child node with a given name and primary type under a parent, it has to look at the parent's primary type and mixins to determine whether a child node definition allows adding that child. In ModeShape 3.8.1 and later, each thread caches the primary type and mixins it used most recently, which saves a lot of time when adding many children under the same parent using one session (even across multiple saves).
  • Do not use versioning. JCR's versioning actually makes a lot of operations quite expensive. For example, before a child can be added or even before a property can be modified, ModeShape has to make sure that neither the node nor any of its ancestors is checked in. If any of the ancestors have large numbers of children, materializing those ancestors could be very expensive. In ModeShape 3.8.1 and later, we've added an optimization that completely skips these checks when there are no version histories.
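The batching guideline boils down to calling save once per batch rather than once per node, and to adding all of a parent's children together. Here is a minimal Java sketch of that loop. To keep it self-contained, the two repository calls are hidden behind a hypothetical `ChildSink` interface; with a real JCR session you would implement `addChild` with `parent.addNode(name, primaryType)` and `flush` with `session.save()`. The interface, the class names, and the batch size you pass in are assumptions to tune by measurement.

```java
import java.util.List;

/** Hypothetical abstraction over the two JCR calls the loop needs. */
interface ChildSink {
    void addChild(String name) throws Exception; // e.g. parent.addNode(name, "nt:unstructured")
    void flush() throws Exception;               // e.g. session.save()
}

final class BatchLoader {
    /** Adds all names under one parent, saving once per batchSize additions; returns the number of saves. */
    static int addInBatches(List<String> names, int batchSize, ChildSink sink) throws Exception {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int saves = 0;
        int pending = 0;
        for (String name : names) {
            sink.addChild(name);
            if (++pending == batchSize) { // a full batch is ready: persist it with one save
                sink.flush();
                saves++;
                pending = 0;
            }
        }
        if (pending > 0) { // persist the final, partial batch
            sink.flush();
            saves++;
        }
        return saves;
    }
}
```

With 200k children and a batch size of 10k, this issues 20 saves instead of 200k.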

Other operations, like getting the path of a node, can also be expensive if any of the ancestors is large or expensive to read. ModeShape normally caches nodes, and if they’re frequently used they’ll stay in the cache. But these cached representations are discarded as soon as the node is modified. This is why adding or modifying nodes can impact read performance.

Use the latest version of ModeShape

As mentioned above, ModeShape 3.8.1 and later will include a number of changes that improve ModeShape's overall performance, especially when working with parents that have lots of children. Look for ModeShape 3.8.1 in the next month or so, and more 4.0 pre- and final releases over the next few months.

Filed under: features, jcr, performance, techniques

ModeShape 4.0.0.Alpha4 is available

The ModeShape community is proud to announce the immediate availability of our fourth 4.0 pre-release, ModeShape 4.0.0.Alpha4. The JARs and other artifacts are available in the JBoss Maven repository and in our downloads area, and will soon be in Maven Central. See our Getting Started guide for details.

Thanks to our whole community for the work that’s gone into this release!

What’s new?

This alpha release contains a number of fixes on top of the fixes and features in the previous alpha releases. We’ve made some improvements to the new query engine, including initial support for CONTAINS. We’ve also improved the index provider SPI, and have started implementing the local provider. Perhaps the biggest new feature is the redesigned Repository Explorer web application.

Alpha4 also includes changes that were in Alpha1, Alpha2 and Alpha3, including simpler clustering. ModeShape now automatically piggybacks on the Infinispan clustering configuration, and nothing clustering-specific is needed in the ModeShape configuration. We’re also improving how ModeShape logs events so that it’s far easier and less time-consuming to have processes (re)join the cluster.

As of 4.0.0.Alpha1, ModeShape uses Infinispan 6.0, which comes with improved performance and several very attractive cache stores, especially the one for LevelDB. And ModeShape 4.0.0.Alpha4 can deploy on Wildfly 8.0, making it very easy for your applications to simply look up and use repositories that are managed using Wildfly tooling and configuration.

ModeShape 4.0 is also licensed under the Apache Software License (ASL) 2.0, and we no longer require Contributor License Agreements (CLAs).

What’s next?

We hope that the next release is our first beta, meaning we hope to have completed all of the features. We’ll keep releasing betas until the codebase is stable, at which point we’ll start issuing candidate releases and ultimately a final release.

So our next step is to complete the index providers for the file system and Lucene, and to then start putting the whole new query system through its paces. If anyone is interested in helping us with index providers for Solr and ElasticSearch, please let us know; without some contributions they will likely be available in 4.1. However, we’d love to get feedback on the index provider SPI before 4.0.0.Final.

What can you do?

Although this is an alpha release not suitable for production, we'd really appreciate the community picking up this release and at least putting it through the basics. See how Infinispan 6 works and how much faster it is, and try one of the new high-performance cache stores. Try out ModeShape in Wildfly 8. Give queries a whirl, and let us know if any queries that worked in 3.x no longer work in 4.x; remember that they'll probably be slower than in 3.x because we don't have any indexes yet.

Filed under: features, jcr, news, releases

ModeShape 3.8.0.Final is available

ModeShape 3.8.0.Final is now available, with almost a dozen bug fixes and a few minor features. This release depends upon Infinispan 5.2.10, and the subsystem can be installed into EAP 6.2.1.GA or EAP 6.3 Beta. See the release notes for details.

As usual, the artifacts are in the JBoss Maven repository and will soon be pushed into Maven Central. Or you can download a zip file with the libraries.

Give it a whirl and let us know on IRC or in our forums if you have any problems.

Filed under: jcr, news, releases

ModeShape 4.0.0.Alpha3 is available

The ModeShape community is proud to announce the immediate availability of our third 4.0 pre-release, ModeShape 4.0.0.Alpha3. The JARs and other artifacts are available in the JBoss Maven repository and in our downloads area, and will soon be in Maven Central. See our Getting Started guide for details.

Thanks to our whole community for the work that’s gone into this release!

What’s new?

This alpha release contains a number of fixes on top of the fixes and features in the previous alpha releases. We’re still working on the query index functionality, so very little has changed there. The biggest new feature in Alpha3 is that the event system uses our new ring buffer that is substantially faster.

Alpha3 also includes changes that were in Alpha1 and Alpha2, including simpler clustering. ModeShape now automatically piggybacks on the Infinispan clustering configuration, and nothing clustering-specific is needed in the ModeShape configuration. We’re also improving how ModeShape logs events so that it’s far easier and less time-consuming to have processes (re)join the cluster.

Like Alpha1 and Alpha2, Alpha3 uses Infinispan 6.0, which comes with improved performance and several very attractive cache stores, especially the one for LevelDB. And ModeShape 4.0.0.Alpha3 can deploy on Wildfly 8.0, making it very easy for your applications to simply look up and use repositories that are managed using Wildfly tooling and configuration.

What’s next?

We plan to continue issuing more alpha releases about every 3 weeks until we’ve completed all features, at which point we’ll start issuing beta releases that fix any issues that will come up. When the codebase is stable and ready for a release, we’ll start issuing candidate releases and ultimately a final release.

So our next step is to add index providers for the file system and Lucene, and to then start putting the whole new query system through its paces. If anyone is interested in helping us with index providers for Solr and ElasticSearch, please let us know; without some contributions they will likely be available in 4.1.

What can you do?

Although this is an alpha release not suitable for production, we'd really appreciate the community picking up this release and at least putting it through the basics. See how Infinispan 6 works and how much faster it is, and try one of the new high-performance cache stores. Try out ModeShape in Wildfly 8. Give queries a whirl, and let us know if any queries that worked in 3.x no longer work in 4.x; remember that they'll probably be slower than in 3.x because we don't have any indexes yet.

Filed under: features, jcr, news, releases

ModeShape 3.7.4.Final is available

ModeShape 3.7.4.Final is now available and contains several bug fixes for Access Control Lists (ACLs), sequencer initialization, and moving nodes. See the release notes for details.

As usual, the artifacts are in the JBoss Maven repository and will soon be pushed into Maven Central. Or you can download a zip file with the libraries.

Give it a whirl and let us know on IRC or in our forums if you have any problems.

Filed under: jcr, news, releases

ModeShape is

a lightweight, fast, pluggable, open-source JCR repository that federates and unifies content from multiple systems, including file systems, databases, data grids, other repositories, etc.

Use the JCR API to access the information you already have, or use it like a conventional JCR system (just with more ways to persist your content).

ModeShape used to be 'JBoss DNA'. It's the same project, same community, same license, and same software.
