ModeShape

An open-source, federated content repository

Federate external data into ModeShape

I’m really excited to announce that ModeShape 3.1 (due out in a few days) will re-introduce the ability to federate data from external systems into ModeShape repositories. That might sound kind of esoteric, but let’s look at some simple scenarios that show how powerful this is.

[Figure: federation]

(Long-time readers might recall that ModeShape 2.x had federation, but due to time constraints we didn’t bring the feature along when we moved to the new architecture in 3.0. We’re now fixing that, except that federation in ModeShape 3 is massively improved compared to what it was in 2.x! In fact, federation in 3.1 is so different, you should probably forget everything you know about federation in ModeShape 2.x.)

Scenario 1: Federating files

Imagine that you have a ModeShape repository (aka database), and it contains all of the data that your application needs. Your application can upload files, but they get stored in ModeShape along with the rest of your content. But you also have a separate file system that is directly accessed and exposed by your web servers. You now want your application to allow users to browse those files and, perhaps, simply pick one so that your application can automatically create a link. Your application already can work with “nt:file” and “nt:folder” nodes, but since that other file system isn’t managed by ModeShape, you have to write new logic to access regular files and folders.

Federation changes this dramatically. You can have ModeShape connect to that separate file system and project the files and folders as “nt:file” and “nt:folder” nodes inside the existing ModeShape repository. In other words, ModeShape will act as though there is an “nt:folder” node inside the repository, but it actually is dynamically created because there is a folder on the separate file system. As your application accesses the name and children of that “nt:folder” node, ModeShape transparently and dynamically maps those requests onto the corresponding file system operations. So your application can continue to work with “nt:file” and “nt:folder” nodes, but ModeShape does all the work of really accessing the files and folders on the separate file system.

Scenario 2: Federating Git repositories

Consider another similar scenario in which the external file system is actually a Git repository, and you want to be able to navigate and access the files and folders in any commit, branch or tag. Again, you could change your application to directly access Git, but that’s quite a bit of work. After all, your application is already accessing most of its content directly from ModeShape.

With federation, ModeShape can access the Git repository and expose not only the files and folders as “nt:file” and “nt:folder” nodes, but also the Git-specific information about those files and folders, such as the last commit that changed them. You can also navigate (as nodes) the commits (e.g., history), branches, and tags in the Git repository.

How does federation work?

The first thing to understand is that ModeShape does not copy the data from the external system into the repository. Instead, ModeShape (with the help of connectors) dynamically creates nodes upon demand to represent the external data. If the external data doesn’t change too often or is okay to be slightly out of date, then you can optionally have ModeShape cache the nodes in-memory. But either way, the external system remains the owner of its data.
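The on-demand behavior with optional caching can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not ModeShape's implementation: the class name TtlNodeCache, the use of plain strings as "nodes", and the loader function standing in for a connector are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch, not ModeShape code: nodes are built on demand from an
// external system and kept in memory only until their time-to-live expires.
final class TtlNodeCache {

    // One cached node plus the time at which it stops being valid.
    private static final class Entry {
        final String node;
        final long expiresAtMillis;

        Entry(String node, long expiresAtMillis) {
            this.node = node;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final Function<String, String> externalLoader; // stands in for a connector
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so the expiry logic can be tested

    TtlNodeCache(Function<String, String> externalLoader, long ttlSeconds, LongSupplier clock) {
        this.externalLoader = externalLoader;
        this.ttlMillis = ttlSeconds * 1000L;
        this.clock = clock;
    }

    // Returns the cached node while it is fresh; otherwise asks the external system again.
    String getNode(String id) {
        long now = clock.getAsLong();
        Entry cached = entries.get(id);
        if (cached != null && now < cached.expiresAtMillis) {
            return cached.node;
        }
        String fresh = externalLoader.apply(id);
        entries.put(id, new Entry(fresh, now + ttlMillis));
        return fresh;
    }
}
```

Nothing is copied up front: the external system is consulted only when a node is requested and no fresh copy is cached, so it stays the owner of its data. The sketch is not thread-safe; a real cache would need to handle concurrent sessions.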

Secondly, federation is transparent to clients. Once federation is configured, the repository’s regular content and federated content all look like regular content to client applications.

Thirdly, a repository does not use federation by default; you have to configure it for each repository. To do that, a repository configuration must specify:

  1. how ModeShape is to communicate with the external system (e.g., which connector implementation is to be used)
  2. the properties that the connector needs to talk with a particular external system (e.g., an external source)
  3. where and how the data in the external system is to be projected into the repository

All of this is defined inside the “externalSources” area of a repository’s configuration file. Here’s a simple JSON repository configuration that defines one external source called “downloads” and another called “sourceCode”:

{
  "name" : "MyRepository",
  "workspaces" : {
    "predefined" : [ "ws1", "ws2" ],
    "initialContent" : {
      "default" : "resources/initialContent.xml"
    }
  },
  "externalSources" : {
    "downloads" : {
      "classname" : "org.modeshape.connector.filesystem.FileSystemConnector",
      "directoryPath" : "/opt/downloads",
      "readonly" : true,
      "cacheTtlSeconds" : 5,
      "projections" : [ "default:/files/downloads => /" ]
    },
    "sourceCode" : {
      "classname" : "org.modeshape.connector.git.GitConnector",
      "directoryPath" : "data/repo",
      "remoteName" : "origin,upstream",
      "queryableBranches" : "master",
      "cacheTtlSeconds" : 5,
      "projections" : [ "default:/sources/ => /" ]
    }
  }
}

Each external source is identified by a unique name that you assign, and specifies the name of the connector implementation class and other connector-specific properties. A connector is simply a subclass of the “org.modeshape.jcr.federation.spi.Connector” class that contains the logic for creating nodes that represent external data and, optionally, for updating the external data based upon changes to the nodes. We’ve designed the SPI so that you can easily create your own subclasses of Connector (or ReadOnlyConnector if the connector should never update data in the external system).

Let’s look at this configuration file a bit more. The “downloads” external source (line 10) defines several other properties:

  • the “directoryPath” is the location on the local file system of the top-level directory that is to be accessed by the connector; we use an absolute path here, though relative paths also work.
  • the “readonly” property specifies that ModeShape should never update any of the files or folders on the file system (yes, the FileSystemConnector is capable of creating, updating, and deleting files and folders on the file system in response to applications creating, updating, or deleting the corresponding nodes in the repository).
  • the “cacheTtlSeconds” is the time in seconds (5 in our case) that the nodes created by the connector to represent external files/folders should be cached.
  • the “projections” field is an array of string values that define the paths of the “federated” nodes that will represent the files and folders in the external source. Our value of “default:/files/downloads => /” means that the top-level directory of the external source (that is, the “/opt/downloads” folder) should be projected as a node at “/files/downloads” in the “default” workspace. (Note that we’re also specifying that the “default” workspace should be populated with nodes described by the “resources/initialContent.xml” file. It’s here that we’d define the node type and any properties for the “/files” node.)
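To make the shape of a projection rule concrete, here is a minimal Java sketch that splits a rule into its workspace name, repository path, and external path. The ProjectionRule class is a hypothetical helper written for this post, not part of ModeShape, and it assumes the only syntax is the “workspace:/path => /path” form shown above.

```java
// Hypothetical helper, not part of ModeShape: splits a projection rule of the
// form "workspace:/repository/path => /external/path" into its three parts.
final class ProjectionRule {
    final String workspace;
    final String repositoryPath;
    final String externalPath;

    private ProjectionRule(String workspace, String repositoryPath, String externalPath) {
        this.workspace = workspace;
        this.repositoryPath = repositoryPath;
        this.externalPath = externalPath;
    }

    static ProjectionRule parse(String rule) {
        int arrow = rule.indexOf("=>");
        if (arrow < 0) {
            throw new IllegalArgumentException("Missing '=>' in projection rule: " + rule);
        }
        String left = rule.substring(0, arrow).trim();
        String externalPath = rule.substring(arrow + 2).trim();

        // The first colon separates the workspace name from the repository path.
        int colon = left.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing workspace name in projection rule: " + rule);
        }
        String workspace = left.substring(0, colon).trim();
        String repositoryPath = left.substring(colon + 1).trim();
        if (workspace.isEmpty() || !repositoryPath.startsWith("/") || !externalPath.startsWith("/")) {
            throw new IllegalArgumentException("Malformed projection rule: " + rule);
        }
        return new ProjectionRule(workspace, stripTrailingSlash(repositoryPath), stripTrailingSlash(externalPath));
    }

    // Treats "/sources/" and "/sources" as the same path; the root "/" is left alone.
    private static String stripTrailingSlash(String path) {
        return path.length() > 1 && path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
    }
}
```

For example, parsing “default:/files/downloads => /” yields the workspace “default”, the repository path “/files/downloads”, and the external path “/”.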

The file system connector will create a structure that mirrors the files and folders on the file system. So if the “/opt/downloads” directory contained the following:

aircraft
aircraft/Boeing
aircraft/Boeing/747.jpg
aircraft/Boeing/787.jpg
aircraft/Airbus
aircraft/Airbus/A380.jpg
aircraft/Airbus/A320.jpg

then we would have the following nodes inside the “default” workspace:

/files   (primary type “nt:unstructured”)
/files/downloads   (primary type “nt:folder”)
/files/downloads/aircraft   (primary type “nt:folder”, represents “/opt/downloads/aircraft”)
/files/downloads/aircraft/Boeing   (primary type “nt:folder”, represents “/opt/downloads/aircraft/Boeing”)
/files/downloads/aircraft/Boeing/747.jpg   (primary type “nt:file”, represents “/opt/downloads/aircraft/Boeing/747.jpg”)
/files/downloads/aircraft/Boeing/787.jpg   (primary type “nt:file”, represents “/opt/downloads/aircraft/Boeing/787.jpg”)
/files/downloads/aircraft/Airbus   (primary type “nt:folder”, represents “/opt/downloads/aircraft/Airbus”)
/files/downloads/aircraft/Airbus/A380.jpg   (primary type “nt:file”, represents “/opt/downloads/aircraft/Airbus/A380.jpg”)
/files/downloads/aircraft/Airbus/A320.jpg   (primary type “nt:file”, represents “/opt/downloads/aircraft/Airbus/A320.jpg”)

Note how the “/files” and “/files/downloads” nodes exist in the workspace, but the “/files/downloads/aircraft” node is dynamically projected to mirror the “/opt/downloads/aircraft” folder. If the “/opt/downloads/aircraft” folder were to be removed, then the “/files/downloads/aircraft” node would automatically be removed as well.
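The mirroring itself is easy to picture with plain Java. The sketch below is not the FileSystemConnector; it is a hypothetical DirectoryMirror class that walks a directory with the JDK’s java.nio.file API and reports the node path and primary type that each file or folder would correspond to under a given projection point.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration, not the FileSystemConnector: lists the node path
// and primary type that each file or folder under "root" would be projected as.
final class DirectoryMirror {

    static List<String> describe(Path root, String projectedAt) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.sorted()
                        .map(p -> describeOne(root, p, projectedAt))
                        .collect(Collectors.toList());
        }
    }

    private static String describeOne(Path root, Path p, String projectedAt) {
        // The path relative to the top-level directory becomes the path below the projection point.
        String relative = root.relativize(p).toString().replace(File.separatorChar, '/');
        String nodePath = relative.isEmpty() ? projectedAt : projectedAt + "/" + relative;
        String primaryType = Files.isDirectory(p) ? "nt:folder" : "nt:file";
        return nodePath + " (" + primaryType + ")";
    }
}
```

Calling DirectoryMirror.describe(Paths.get("/opt/downloads"), "/files/downloads") would produce one entry per file or folder, in the same spirit as the listing above. Because the listing is computed from the directory each time, a folder that disappears from disk simply stops showing up.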

The “sourceCode” external source is pretty similar, but it accesses a local Git repository:

  • the “classname” specifies that the “org.modeshape.connector.git.GitConnector” connector implementation be used. This class will be included in ModeShape 3.1.0.Final. This connector is read-only.
  • the “directoryPath” is the location on the local file system of the top-level directory that contains a valid Git repository (e.g., it contains a “.git” directory); we use a relative path here.
  • the “remoteName” is the name (or comma-separated list of names) of the remote(s).
  • the “cacheTtlSeconds” is the time in seconds (5 in our case) that the nodes created by the connector to represent Git’s files, folders, commits, tags, and branches should be cached.
  • the “projections” field specifies where in the repository the Git nodes should appear. Our value of “default:/sources/ => /” means that the top-level directory of the external source (that is, the Git repository folder) should be projected as a node at “/sources” in the “default” workspace.

The Git connector doesn’t really work with a local working directory of the Git repository. Instead, it basically exposes all of the commits, branches, and tags (plus the files and folders in each). It does this by mapping Git functionality onto some special nodes:

  • The “branches” node is a container under which all branches can be found.
  • The “tags” node is a container under which all tags can be found.
  • The “commits” node is a container under which all commits appear, with the most recent commits appearing first in the children.
  • The “branches/{branchName}” nodes represent the information about each branch, including the commit ID and references to the node representing the commit.
  • The “tags/{tagName}” nodes represent the information about each tag, including the commit ID and references to the node representing the commit.
  • The “commits/{branchOrTagOrCommit}” nodes show the history of commits for the given branch/tag/commit.
  • The “commits/{branchOrTagOrCommit}/{commitId}” shows the details of a specific commit in the history of a particular branch/tag/commit.
  • The “commit/{branchOrTagOrCommit}” node shows the details of the specified commit.
  • The “tree/{branchOrTagOrCommit}” is a container for the files and folders within the specified branch, tag or commit.

Thus in our repository, all the Git information is projected under “/sources”, so a number of nodes would appear in our repository:

  • A node representing each branch appears as a child of “/sources/branches” (e.g., “/sources/branches/master”)
  • A node representing each tag appears as a child of “/sources/tags” (e.g., “/sources/tags/release-1.0”)
  • A node representing each commit appears as a child of “/sources/commits” (e.g., “/sources/commits/bbfa3f3d76b0…”)
  • A node structure (with “nt:file” and “nt:folder” descendant nodes) representing the workspace (with its files/folders) for each commit, branch and tag appears as a child of “/sources/tree” (e.g., “/sources/tree/master/pom.xml”)

By the way, you can use the projections to control which subset of the nodes exposed by the connector should be projected into the repository. For example, if you wanted to expose only a specific branch (e.g., “master”) in the Git repository under the “/sources” node, you could change the projection rule to be “default:/sources => /tree/master”.
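One way to think about a projection is as a simple path translation between the repository and the external source. The Java sketch below shows that idea only; ProjectionPaths and toExternal are hypothetical names, the logic is our own illustration rather than ModeShape’s internals, and it assumes the projection point is not the repository root.

```java
import java.util.Optional;

// Hypothetical illustration, not ModeShape internals: translates a repository
// node path into the path that the connector would see in the external source.
final class ProjectionPaths {

    // Returns the external path for nodePath, or empty if nodePath lies outside the projection.
    // Assumes projectedAt is an absolute path other than "/".
    static Optional<String> toExternal(String projectedAt, String externalRoot, String nodePath) {
        boolean inside = nodePath.equals(projectedAt) || nodePath.startsWith(projectedAt + "/");
        if (!inside) {
            return Optional.empty();
        }
        String remainder = nodePath.substring(projectedAt.length()); // "" or "/child/..."
        if (remainder.isEmpty()) {
            return Optional.of(externalRoot);
        }
        // Avoid a double slash when the external root is "/".
        String prefix = externalRoot.equals("/") ? "" : externalRoot;
        return Optional.of(prefix + remainder);
    }
}
```

With the rule “default:/sources => /tree/master”, the node “/sources/pom.xml” translates to “/tree/master/pom.xml” in the connector’s own node structure, which is why only the files and folders of the “master” branch become visible under “/sources”.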

Programmatically creating projections

ModeShape’s public API now contains a new FederationManager class that can be used to programmatically create and remove projections. However, external sources still must be configured via the JSON configuration file.

Custom connectors

As we mentioned earlier, we provide two connectors out-of-the-box in 3.1. We do plan to add a few more, but we’ve always expected that some developers would want to create their own custom connectors. Hopefully our SPI is simple enough that doing so is very straightforward. So if you’re interested in this, please take a look and give us feedback on our forums. The SPI should be pretty stable, but if we find some glaring problems, we may need to change the SPI slightly before the 3.2 release; after that, however, we’ll lock down the SPI.

Summary

We described a few ways that federating content from external sources into your ModeShape repository can be quite useful, and we also took a very detailed look into how federation is configured. In reality, however, we just skimmed the surface of what is actually possible with ModeShape federation.

Stay tuned for more on the 3.1 release later this week.

Filed under: features, federation, jcr

Presentation: An Overview of ModeShape

Here’s a brand new presentation that provides a high-level overview of ModeShape and attempts to answer two pressing questions:

  1. Why use JCR?
  2. Why use ModeShape?

Filed under: federation, jcr, presentation, repository, rest

Announcing ModeShape 2.1

Once again I’m pleased to announce a new release of ModeShape. This time, it’s ModeShape 2.1, and it’s available in the JBoss Maven repository (under the “org.modeshape” group ID) and on our downloads page. As usual, we’ve updated our Getting Started Guide, Reference Guide, and JavaDoc.

ModeShape 2.1 introduces several major new features:

  • Clustering – It’s now possible to create a cluster of ModeShape instances running on different processes. This makes it easy to scale ModeShape to handle more load, and it means that sessions work the same (including getting the same events) no matter in which process they’re running. ModeShape clustering uses the powerful, flexible and mature JGroups library to handle all network communication within the cluster. JGroups provides a wealth of capabilities, including automatically detecting new engines in the cluster (called discovery), reliable multicast communication, and automatic determination of the master node in the cluster. JGroups has a flexible protocol stack, works across firewalls, WANs and LANs, and supports multiple transport protocols, failure detection, reliable unicast and multicast message transmission, and encryption. But clustering is not enabled by default – if you want to use it, you need to enable it.
  • Shareable nodes – This optional JCR 2.0 API feature allows a node that exists under one parent to be shared under multiple other nodes. These are similar to symbolic links in a *nix file system, and can be pretty powerful when you need them.
  • ModeShape Kit for JBoss AS – Deploying ModeShape to a JBoss Application Server has just gotten very easy: download and unzip into a profile. ModeShape will run as a service in JBoss AS, so you can simply deploy applications that use the standard JCR 2.0 API to find and access their javax.jcr.Repository instance. You also get the WebDAV and RESTful services, and even a technology preview of a monitoring, alerting, and administration plugin for JOPR.

There are other smaller features, improvements, and quite a few bug fixes. See the release notes for the complete list.

ModeShape is a lightweight, fast, pluggable, open-source JCR repository that federates and unifies content from multiple systems, including file systems, databases, data grids, other repositories, etc. It implements all of the required JCR 2.0 features (repository acquisition, authentication, reading/navigating, query, export, node type discovery, and permissions and capability checking) and most of the optional JCR 2.0 features (writing, import, observation, workspace management, versioning, locking, node type management, same-name siblings, orderable child nodes, and shareable nodes). That means you can use the JCR API to access the information you already have, or use it like a conventional JCR system (just with more ways to persist your content).

Many thanks to the ModeShape community of users and contributors, who have (once again) shown what a fantastic and active community can accomplish in a very short time. After all, it’s been just over 4 weeks since we released ModeShape 2.0!

Give ModeShape 2.1 a try, and let us know what you think!

Filed under: features, federation, jcr, news, repository

ModeShape 2.0 is here

The ModeShape project proudly announces that version 2.0 is now available and ready for use. As usual, the artifacts are in the JBoss Maven repository (under the “org.modeshape” group ID) and on our downloads page, and we’ve updated our Getting Started Guide, Reference Guide, and JavaDoc. ModeShape 2.0 also includes bug fixes and improvements; see the release notes for the complete list of bug fixes, new features, tasks, and other changes.

This is a significant milestone for us, because ModeShape 2.0 now implements the JCR 2.0 specification (JSR-283). Specifically, ModeShape supports all the JCR 2.0 required features:

  • repository acquisition
  • authentication
  • reading/navigating
  • query
  • export
  • node type discovery
  • permissions and capability checking

and most of the JCR 2.0 optional features:

  • writing
  • import
  • observation
  • workspace management
  • versioning
  • locking
  • node type management
  • same-name siblings
  • orderable child nodes
  • mix:etag, mix:created and mix:lastModified mixins with autocreated properties

ModeShape 2.0 supports five query languages: the JCR-SQL2 and JCR-QOM query languages defined in JSR-283, the XPath and JCR-SQL languages defined in JSR-170 but deprecated in JSR-283, and a search-engine-like language that is actually just the full-text search expression grammar used in the CONTAINS(…) function of the JCR-SQL2 language. See our documentation for details.

As with earlier releases, ModeShape repositories can be traditional self-contained stores, or they can federate and unify content from multiple stores, including file systems, databases, data grids, other JCR repositories, or other systems (using custom connectors). Plus, ModeShape is also able to automatically extract and store useful content from files you upload into the repository using its library of sequencers, making that information much more accessible and searchable than if it remains locked up inside the stored files. Finally, ModeShape provides WebDAV and RESTful services so non-Java and remote clients can access the repository content.

The ModeShape 2.0 release has not yet passed the JSR-283 TCK. ModeShape passes nearly all (99%) of the TCK tests, but we’ve identified several issues in the TCK tests (see JCR-2648, JCR-2661, JCR-2662, and JCR-2663). Once an updated TCK becomes available, we’ll get our certification.

If you’re already using JCR 1.0, consider switching to JCR 2.0 and ModeShape. The new features and enhancements are much improved over JCR 1.0. And we’ve created a high-level migration guide to help you understand what you may and may not have to change in your application.

Many, many thanks to the ModeShape community members. Our users and contributors are simply stellar! Congratulations!

Filed under: features, federation, jcr, news, repository

ModeShape 1.2 adds JCR-SQL and lots of fixes and improvements

I’m very pleased to announce that ModeShape 1.2 is now available for download and in the new JBoss Maven repository. If you haven’t already switched your Maven settings to this new repository, be sure to do so.

This release brings support for JCR-SQL query language and a number of fixes and improvements. This means that ModeShape implements all of the JCR 1.0 Level 1 and Level 2 features plus the optional locking, observation, query, and versioning features. And, there’s no better time to switch from those older and crustier JCR implementations that can’t do federation or use a variety of back-end content stores (like Infinispan).

For details on the changes in this release, see our release notes. And, as usual, we’ve updated our Getting Started and Reference Guides as well as our JavaDoc. If you have any questions, please ask. And we’re always interested in new contributors, so if you’re interested please get involved.

Many kudos to the ModeShape community of users and contributors. You guys rock!

Filed under: features, federation, jcr, repository

ModeShape 1.0.0.Final

I’m very happy to announce that ModeShape 1.0.0.Final is now available. With this release, ModeShape now implements all JCR Level 1 and Level 2 features and the optional locking and observation features! [1] And as with our 1.0.0.Beta1 release, ModeShape supports querying repository content using the JCR-SQL2 query language defined by the JSR-283 specification.

This release also includes quite a few fixes and minor improvements. For example, our JARs are also now OSGi compliant, making it easier to deploy ModeShape into an OSGi-compliant container.

As usual, it’s in the JBoss Maven repository and in our project’s downloads area. Of course, our Getting Started guide and Reference Guide are great places to start. And we always have JavaDocs and release notes.

Last but not least, a huge thank you to our fantastic community of users and contributors, for all the help with testing, fixing issues, and implementing these features. You guys all rock!


[1] Note that ModeShape has not yet been certified as being compliant with JCR 1.0, but we plan on attaining this certification in the very near future.

Filed under: features, federation, jcr, news, repository

The shape of your information

This is the first in a series of posts about specific features in the JCR API. We start with node types because they’re such an important and empowering feature of JCR, and critical to so many other features.

We are inundated with data stores: relational databases, file systems, repositories, document stores, or proprietary systems. Newer stores, like data grids and distributed (e.g., “NoSQL”) databases, are likely on the horizon (if not already in use). How do we write applications that use all this valuable information without copying it (horrors!) and without resorting to lots of different APIs? And how much work will it take to update our applications as new information becomes available or the existing information evolves?

Fortunately there are different approaches to federation, which means we get to pick the one that best suits our needs.  One such approach is warehouses and ETL, but technically that’s copying and not federation. Another approach is using relational-based technologies like the Teiid project, which uses a relational engine to do the heavy lifting, and provides a virtual database complete with SQL, JDBC and ODBC support to give applications a way of interacting with the data using tables that mirror what the application wants/needs. It is a database – it just gets the data from other sources. This is perfect for some use cases, but for others the relational nature of the interaction is less than ideal.

ModeShape uses a graph-oriented approach that works well in cases where the information is hierarchical and/or has a structure that evolves over time. ModeShape is a JCR implementation that looks and behaves like a regular JCR repository, except that it federates in real-time some (or all) of the content from other systems. In fact, ModeShape doesn’t even have to federate any information, and in such cases it works just like any other JCR implementation (albeit with a wider selection of persistent storage options via its connectors).

[Figure: ModeShape federated repository]

ModeShape can do this because of the power and flexibility of the JCR API, which uses a graph of nodes and properties. These nodes form a simple tree, but use properties to create relationships or references to other nodes [1]. Here’s a simple representation:

[Figure: Graph nodes and properties]

This approach makes the JCR API very good at exposing information with varying and evolving structure, whether that information exists within the repository itself or is defined by and housed in other external systems.

Of course, very generic data structures can have their own challenges.  Flexible and abstract stores place few constraints on how you organize your data, but that means you need another way of describing or constraining the structure and shape of your information.

Node types

The JCR API solves this issue very nicely by using a simple but very powerful system of node types. Every node in a JCR repository has a primary type that defines the names, types, and characteristics of the properties and children that the node may (or must) have. Additionally, each node may have one or more mixin types that further define properties and children beyond those defined by the primary type. You can add mixin types at any time, and can even remove them at any time as long as doing so doesn’t violate the primary type or remaining mixin types.

JCR node types dictate the kinds of properties that can (or must) exist on a node, and can constrain the property’s name, the number and type of values allowed, default values, constraints on the values, whether the property must exist, whether and how the property can be changed, whether the values are queryable and searchable, and the kinds of query operators that apply to the values. All of these are optional, so it’s actually possible to define a node type that can allow any number of properties with any name and any values. JCR calls these property definitions without a name pattern/constraint “residual” properties, because they apply only if there isn’t a more applicable and specific “non-residual” property definition. Node types are also capable of dictating the names, types and order of child nodes. Node types can also define residual child node definitions for cases where a node can contain children with any name and type. Like residual property definitions, these are only used if there isn’t a child definition that is more specific. And node types support inheritance, so it’s possible to reuse and extend other node types. It is even possible to override and further constrain property and child definitions inherited from supertypes.
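The rule that a residual definition applies only when no named definition matches can be modeled in a few lines of Java. This is a deliberately simplified, hypothetical model (the NodeTypeSketch class is ours and does not use the javax.jcr API); it tracks only the required type of each property and ignores inheritance, cardinality, and value constraints.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of a node type (not the javax.jcr API):
// named property definitions win, and the residual definition is the fallback.
final class NodeTypeSketch {
    private final Map<String, String> namedProperties = new LinkedHashMap<>(); // property name -> required type
    private String residualType; // null when the type has no residual property definition

    NodeTypeSketch property(String name, String requiredType) {
        namedProperties.put(name, requiredType);
        return this;
    }

    NodeTypeSketch residual(String requiredType) {
        this.residualType = requiredType;
        return this;
    }

    // Returns the required type for the property, or empty if the node type does not allow it.
    Optional<String> requiredTypeFor(String propertyName) {
        String named = namedProperties.get(propertyName);
        if (named != null) {
            return Optional.of(named); // a specific definition always takes precedence
        }
        return Optional.ofNullable(residualType); // otherwise only a residual definition can allow it
    }
}
```

A strict type built with only property("jcr:created", "DATE") rejects a property named “color”, while a flexible type that also calls residual("UNDEFINED") accepts it, yet still requires a DATE for “jcr:created”.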

Know your types – and how to use them

Carefully selecting, defining, extending, and using node types in a JCR repository provides an incredible amount of control over your information and can let your information take on its natural shape. In cases where you do want to constrain the structure, using a primary type with no residual properties or children ensures that the nodes always fit the desired shape. In cases where flexibility is more important, use a primary type that allows any properties and any children (e.g., the “nt:unstructured” built-in node type).

Of course, most situations are probably somewhere in the middle, and this is where mixins shine. Create these nodes using a liberal primary type (like “nt:unstructured”), and then on a node-by-node basis add mixins defining various “facets” or “characteristics” needed to capture the desired information. For example, the “mix:created” mixin node type defines a “jcr:created” date property and a “jcr:createdBy” string property, and can be mixed into any node where there’s a need to store the “creation” information. This mixin can even be removed from a node without having to remove the properties!

Node types also play a critical role in JCR queries, because they allow forming sets of nodes that have similar properties. These sets are naturally similar to the fundamental concepts in various query languages: relational tables, XML element types, and Java classes. For example, you could query all the “mix:created” nodes to find all nodes created within a certain period and order the results by the name of the creator.

Node types are also critically important in ModeShape because they describe the structure and semantics of the graph that ModeShape creates from the information in the underlying sources. And as the underlying information changes shape or meaning, the graph can adapt by altering the structure and node types.

Summary

We really just touched the surface of JCR node types, but hopefully we’ve given you a glimpse of how extremely powerful they make the JCR API. Node types make it possible to work with a very flexible graph system while controlling, describing, and understanding the shape of the information content in a JCR repository – even when this information lives in external systems.


[1] JSR-283 (aka “JCR 2.0”) takes this a step further by introducing “shared nodes” that share properties and children with other nodes. For example, if a node at the path “/a/b/c/d” is shared with the node at “/x/y/z”, then a property on the “d” node is also a property of the “z” node and a child of “d” also appears as a child of “z”. Thus, shared nodes make it possible for another node to appear in multiple places in the repository and have multiple paths.

Filed under: features, federation, jcr, repository

ModeShape isn’t your father’s JCR

It’s true: ModeShape is a JCR implementation that is pretty new.  Why on earth would we create another JCR implementation when other implementations have been around for so long?

For many years, the assumption in the persistent storage world has been that each store should own all the information. Database vendors tried to sell their databases and claim how easy it would be to migrate all of your data into their system. ETL vendors talk about how to load up a data warehouse with all the useful information you need, so it’s all in one place. Document storage systems and other content management systems worked wonders, as long as everything was in their repository. And the JCR implementations followed suit by implementing the JCR API on top of silo repositories (that often used a relational database under the covers).

We see the world differently. We understand that you already have too many information stores, be they databases, file systems, repositories, document stores, or proprietary systems. We believe you shouldn’t need separate APIs to access all of it, and that you shouldn’t have to move all this information into one big silo (and rewrite the applications you already have).  Instead, your databases and repositories should federate all this existing information and provide the view of your information that your applications need [1].  And you should be able to write applications that can take advantage of the information you already have where it is today. And those applications should have to change as little as possible when you have new or different information tomorrow.

It all boils down to using the JCR API to access a variety of information in all kinds of places. The JCR API is an excellent abstraction with powerful features that make it very easy to work with the information in the shape it wants to be today while easily adapting to the shape it will take tomorrow.  This is what the ModeShape project is doing, and here’s how we’re doing it:

Killer Feature #1: Connectors

Implementing the full JCR API on top of multiple kinds of systems would be expensive, time consuming, and painful. We’ve created a connector framework that is simple enough that it’s easy to write new connectors, yet efficient enough to do many, many operations with one (potentially remote) call. Your applications can use ModeShape to make these existing systems look and behave like a real JCR repository.

ModeShape JCR and connectors

We’ve begun building a library of connectors that allow us to store content on a data grid (Infinispan), on a distributed cache (JBoss Cache), in relational databases (via Hibernate), and in-memory within the Java process (for small transient use cases).  Our library also includes connectors that access existing file systems, SVN repositories, and even the schemas from existing JDBC databases.

ModeShape connector library

Of course, we’re already working on more connectors, including a connector to other JCR repositories.  And we envision lots of connectors, including connectors to other CMIS repositories, version control systems (like Git and CVS), document databases (like CouchDB and Cassandra), distributed file systems, customer management systems, Maven repositories, LDAP directories, and existing databases. Just to name a few. And we designed the connector framework so that you can write your own.
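To make the connector idea a bit more concrete, here is a minimal sketch of what "many operations with one call" could look like. Everything in it is an assumption made for illustration: the `Connector` interface, the `readNodes` method, and the in-memory implementation are hypothetical and are not ModeShape's actual connector SPI.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: these types are hypothetical and are NOT ModeShape's connector SPI.
class ConnectorSketch {

    /** A connector answers a whole batch of node reads in one call to the external system. */
    interface Connector {
        Map<String, Map<String, String>> readNodes(List<String> paths);
    }

    /** Toy connector backed by an in-memory map that stands in for an external system. */
    static class InMemoryConnector implements Connector {
        private final Map<String, Map<String, String>> store = new LinkedHashMap<>();
        int calls = 0; // number of round trips made so far

        /** Registers a node with a single jcr:primaryType property. */
        void put(String path, String primaryType) {
            Map<String, String> props = new LinkedHashMap<>();
            props.put("jcr:primaryType", primaryType);
            store.put(path, props);
        }

        @Override
        public Map<String, Map<String, String>> readNodes(List<String> paths) {
            calls++; // one call, no matter how many paths were requested
            Map<String, Map<String, String>> result = new LinkedHashMap<>();
            for (String path : paths) {
                Map<String, String> props = store.get(path);
                if (props != null) {
                    result.put(path, props); // unknown paths are simply left out
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        InMemoryConnector connector = new InMemoryConnector();
        connector.put("/docs", "nt:folder");
        connector.put("/docs/readme.txt", "nt:file");

        Map<String, Map<String, String>> nodes =
                connector.readNodes(Arrays.asList("/docs", "/docs/readme.txt", "/missing"));
        System.out.println(nodes.keySet() + " fetched in " + connector.calls + " call(s)");
    }
}
```

The point of the batch-style method is that a remote system is asked once for everything the repository needs, rather than once per node.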

Killer Feature #2: Federation

Remember all those different silos of information? Using JCR to access each of these is pretty interesting, but what’s really killer is federating the information from multiple sources into a single virtual repository. To your applications, ModeShape looks and behaves like a regular JCR repository. They use the standard JCR API to navigate, search, create, change, and listen for changes in the content. But under the covers, ModeShape is able to federate content from multiple back-end systems using our connectors, ensuring that the repository content stays up-to-date and in-sync with those systems.  And those external systems can continue “owning” the information, and existing applications can continue using them, but new applications using ModeShape can easily access the unified and integrated information.

ModeShape federation
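The federation idea can be sketched in a few lines, assuming a deliberately simplified model: each external system is mounted at a path in one virtual tree, and a read is routed to whichever source owns that part of the tree. The class and method names (`FederatedTreeSketch`, `mount`, `read`) are hypothetical, and the real implementation also has to handle caching, writes, and change events, none of which appear here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Illustrative only: a toy model of federation, not ModeShape's implementation.
class FederatedTreeSketch {
    // Mount point in the virtual repository -> lookup function for the external source.
    private final Map<String, Function<String, Optional<String>>> mounts = new LinkedHashMap<>();

    /** Projects an external source into the virtual tree at the given mount point. */
    void mount(String mountPoint, Function<String, Optional<String>> source) {
        mounts.put(mountPoint, source);
    }

    /** Routes a repository path to the source with the longest matching mount point. */
    Optional<String> read(String path) {
        String best = null;
        for (String mountPoint : mounts.keySet()) {
            boolean matches = path.equals(mountPoint) || path.startsWith(mountPoint + "/");
            if (matches && (best == null || mountPoint.length() > best.length())) {
                best = mountPoint;
            }
        }
        if (best == null) {
            return Optional.empty(); // no source is projected at this path
        }
        String local = path.substring(best.length());
        return mounts.get(best).apply(local.isEmpty() ? "/" : local);
    }

    public static void main(String[] args) {
        Map<String, String> fileSystem = new LinkedHashMap<>();
        fileSystem.put("/report.pdf", "content owned by the file system");
        Map<String, String> database = new LinkedHashMap<>();
        database.put("/customers/42", "content owned by the database");

        FederatedTreeSketch repository = new FederatedTreeSketch();
        repository.mount("/files", p -> Optional.ofNullable(fileSystem.get(p)));
        repository.mount("/db", p -> Optional.ofNullable(database.get(p)));

        // The caller sees one tree; each source keeps owning its own data.
        System.out.println(repository.read("/files/report.pdf").orElse("<none>"));
        System.out.println(repository.read("/db/customers/42").orElse("<none>"));
    }
}
```

Note that nothing is copied into the repository in this sketch: each read goes back to the source, which is the sense in which the external systems keep "owning" their information.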

Killer Feature #3: Sequencing

A lot of repositories exist to store files and other important artifacts, and contained in all those files is a ton of very valuable information. Sure, the repository might process them for searching, but that just extracts the words and phrases. Or, your applications can read the files and process them one at a time. ModeShape sequencers are able to unlock this valuable structured information and put it back into the repository, where it’s accessible via navigation, queries, and searches.

Sequencing is fully automated and done in the background. Simply configure the sequencer and start uploading content.  ModeShape has a library of sequencers, including support for CND, DDL, XML, ZIP, MP3, images, Java source, Java class, text files (character-separated and fixed-width), and Microsoft Office® documents. Of course, we designed it so that you can write your own sequencers, too.
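As a rough illustration of what a sequencer does, the sketch below takes the content of a character-separated text file and derives structured rows from it, which is the kind of information a sequencer would write back into the repository as nodes and properties. The class name, the `sequence` method, and the `column1..N` property names are assumptions made for this example and do not reflect ModeShape's sequencer API or its output structure.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: a toy "sequencer" for character-separated text, not ModeShape's API.
class DelimitedTextSequencerSketch {

    /** Derives one row per non-blank line, with properties named column1, column2, ... */
    static List<Map<String, String>> sequence(String content, String separator) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : content.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            // Quote the separator so characters like '|' are not treated as regex syntax.
            String[] cells = line.split(Pattern.quote(separator), -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < cells.length; i++) {
                row.put("column" + (i + 1), cells[i].trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String uploaded = "id|name\n1|Alice\n2|Bob\n";
        for (Map<String, String> row : sequence(uploaded, "|")) {
            System.out.println(row);
        }
    }
}
```

Once the derived rows live in the repository next to the original file, they can be reached by navigation and queries just like any other content.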

Killer Feature #4: JCR-SQL2

The JCR API provides a single mechanism for querying the repository content, using a variety of query languages. JSR-170 (aka “JCR 1.0”) requires that repositories support the JCR XPath language (a subset of XPath 2.0), and defines the optional language called “JCR-SQL” that is a simple subset of SQL SELECT statements. JSR-283 (aka “JCR 2.0”) deprecates both XPath and JCR-SQL, and instead mandates support for an improved “JCR-SQL2” language that is a better and more powerful adaptation of SQL.

ModeShape currently supports JCR 1.0, and thus it does support the XPath query language defined by the spec. However, ModeShape also supports the newer JCR-SQL2 query language, along with several major enhancements [2]. In fact, our enhanced JCR-SQL2 is so powerful that ModeShape implements the XPath support by translating XPath expressions into JCR-SQL2 queries.
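To give a feel for the enhanced language, here is a small sketch that builds a JCR-SQL2 statement using a few of the extensions listed in footnote [2] (SELECT DISTINCT, the PATH and DEPTH operands, BETWEEN, and LIMIT/OFFSET). The helper class and method are hypothetical, the query itself is an assumption about how these pieces combine, and the code only builds the string. With a real repository, the statement would be handed to the JCR `QueryManager` together with the JCR-SQL2 language identifier.

```java
// Illustrative only: builds a query string; it does not talk to a repository.
class ExtendedQuerySketch {

    /**
     * Builds a statement that selects the creation date of nt:file nodes below a path,
     * restricted to a depth range and returned one page at a time.
     */
    static String pagedFilesUnder(String pathPrefix, int minDepth, int maxDepth,
                                  int limit, int offset) {
        // Double any single quotes so the prefix is a valid string literal.
        // LIKE wildcards ('%' and '_') inside the prefix are NOT escaped in this sketch.
        String escaped = pathPrefix.replace("'", "''");
        return "SELECT DISTINCT file.[jcr:created] "
             + "FROM [nt:file] AS file "
             + "WHERE PATH(file) LIKE '" + escaped + "/%' "
             + "AND DEPTH(file) BETWEEN " + minDepth + " AND " + maxDepth + " "
             + "LIMIT " + limit + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        System.out.println(pagedFilesUnder("/files", 2, 4, 10, 20));
    }
}
```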

Not your father’s JCR

Traditionally, applications that use JCR are working with content repositories and content management systems. But chances are you have a lot of valuable information that your JCR repository can’t get to. And you’ve probably come to really like the JCR API, and can imagine how nice it would be to use it to access all that existing information.

So chances are, you need ModeShape. Or at least you need to give it a try. After all, ModeShape is not your father’s JCR. It’s better. Much better.


[1] Federation is in our DNA. The ModeShape project actually came out of the team that built the MetaMatrix commercial data integration and federation engine. MetaMatrix was the first true EII product that allowed applications to access unified and integrated data housed in multiple disparate back-end systems through a single, scalable, virtual database using SQL via JDBC and ODBC. MetaMatrix was acquired by Red Hat in 2007, and seeded the Teiid and Teiid Designer open source projects.

[2] Though not included in JSR-283’s JCR-SQL2, ModeShape adds support for: all the JOIN operators; UNION, INTERSECT and EXCEPT [ALL] set operations; removal of duplicates via SELECT DISTINCT; LIMIT and OFFSET clauses; new DEPTH and PATH dynamic operands for use in constraint clauses; constraints using IN and NOT IN and BETWEEN clauses; and arithmetic operations on dynamic operands. For details, see our Reference Guide.

Filed under: features, federation, jcr, repository

More on federated repositories

Interesting post by Mark Little, which talks about the benefits of a federated repository and compares it with the way in which the human brain stores stuff. Cool stuff. DNS works in a similar way. Now throw in how Git works, and things get really interesting.

I really hope that this is where JBoss DNA federation is heading. Go to your DNA federated repository, and let it get the information from the system that has it. Obviously that’s not the complete federation story, so we have more to do. For example, we need more connectors so that a DNA federated repository can access information from other DNA or JCR repositories (in addition to other systems).

But I think that our core architecture and caching strategies will let us grow into these much larger and much more interesting scenarios.

Filed under: features, federation, repository, techniques

JBoss DNA, a unified repository

DZone has posted an interview with Sacha Labourey, CTO of Red Hat Middleware, in which he discusses the upcoming JBoss AS 5 and some of the new and updated JBoss enterprise services. He mentions JBoss DNA and the importance and benefits of having a unified repository, starting with changes in JBoss AS 5:

“We have completely decoupled the metadata management from the application server. It’s something we call the profile service, and this profile service, as you can imagine, is fully pluggable. It could be simply a set of files, it can be a database, it can be a JCR repository. And the idea here is to end up with a unified way to store metadata for one node or a cluster or a farm of JBoss instances, and very quickly provision new instances.”

Having a unified repository not only provides a better way to manage configuration information, but it also integrates access to other systems and other information that are needed by the middleware and the business components:

“I think, in the future, it’s going to be increasingly important to further split some of the notions, which today are still linked way too much as part of a monolithic entity. I think, metadata is something that will be very important. What I call metadata, you know, it’s a non-fancy name to describe everything that is being done in the various repositories out there.

When I look at the way that you configure an operating system, or the way you populate an LDAP tree, the way you get access to the topology of your network, everything you need to manage; when I look at the configuration of an application server or ESB, or when I look at the various steps which need to go in an application before being validated and being able to go to production, all of that is just metadata, whatever fancy name you want to put on that.

Being able to extract all of that information as part of a unified metadata repository, I think is going to be a great step forward to enable the kind of provisioning we will need in the next years. Today, I think, we go through way too many ad hoc and manual processes to get there. We’re not going to get the kind of economy of scale by doing so. We need to radically go to another level.”

This gets to the heart of why a unified repository is needed, and why we’re trying to crack that nut with JBoss DNA. There are too many places where metadata exists to use one-off and ad hoc ways to get at that information. And since that information changes all the time, you can’t suck it into a monolithic repository – you have to provide real-time access, which is exactly what we’re doing with JBoss DNA federation: use JCR to access the wealth of metadata and content from all kinds of systems. But more on that in a future post.

I’ve just touched on the aspects of Sacha’s interview that relate to JBoss DNA. If you have some time, watch the interview to get the whole JBoss middleware picture.

Filed under: features, federation, jcr, repository

ModeShape is

a lightweight, fast, pluggable, open-source JCR repository that federates and unifies content from multiple systems, including file systems, databases, data grids, other repositories, etc.

Use the JCR API to access the information you already have, or use it like a conventional JCR system (just with more ways to persist your content).

ModeShape used to be 'JBoss DNA'. It's the same project, same community, same license, and same software.

ModeShape

Topics