ModeShape

An open-source, federated content repository

Federate external data into ModeShape

I’m really excited to announce that ModeShape 3.1 (due out in a few days) will re-introduce the ability to federate data from external systems into ModeShape repositories. That might sound kind of esoteric, but let’s look at some simple scenarios that show how powerful this is.

(Long-time readers might recall that ModeShape 2.x had federation, but due to time constraints we didn’t bring the feature along when we moved to the new architecture in 3.0. We’re now fixing that, and federation in ModeShape 3 is massively improved compared to what it was in 2.x! In fact, federation in 3.1 is so different that you should probably forget everything you know about federation in ModeShape 2.x.)

Scenario 1: Federating files

Imagine that you have a ModeShape repository (aka database), and it contains all of the data that your application needs. Your application can upload files, but they get stored in ModeShape along with the rest of your content. However, you also have a separate file system that is directly accessed and exposed by your web servers. You now want your application to allow users to browse those files and, perhaps, simply pick one so that your application can automatically create a link. Your application can already work with “nt:file” and “nt:folder” nodes, but since that other file system isn’t managed by ModeShape, you have to write new logic to access regular files and folders.

Federation changes this dramatically. You can have ModeShape connect to that separate file system and project the files and folders as “nt:file” and “nt:folder” nodes inside the existing ModeShape repository. In other words, ModeShape will act as though there is an “nt:folder” node inside the repository, but it actually is dynamically created because there is a folder on the separate file system. As your application accesses the name and children of that “nt:folder” node, ModeShape transparently and dynamically maps those requests onto the corresponding file system operations. So your application can continue to work with “nt:file” and “nt:folder” nodes, but ModeShape does all the work of really accessing the files and folders on the separate file system.

Scenario 2: Federating Git repositories

Consider another similar scenario in which the external file system is actually a Git repository, and you want to be able to navigate and access the files and folders in any commit, branch or tag. Again, you could change your application to directly access Git, but that’s quite a bit of work. After all, your application is already accessing most of its content directly from ModeShape.

With federation, ModeShape can access the Git repository and expose not only the files and folders as “nt:file” and “nt:folder” nodes, but also the Git-specific information on those files and folders, such as the last commit that changed them. You can also navigate (as nodes) the commits (e.g., history), branches, and tags in the Git repository.

How does federation work?

The first thing to understand is that ModeShape does not copy the data from the external system into the repository. Instead, ModeShape (with the help of connectors) dynamically creates nodes upon demand to represent the external data. If the external data doesn’t change too often or is okay to be slightly out of date, then you can optionally have ModeShape cache the nodes in-memory. But either way, the external system remains the owner of its data.
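
To make the optional caching a bit more concrete, here is a minimal Java sketch of a time-to-live cache. It is not ModeShape’s caching code: the TtlNodeCache class, its loader function (which stands in for “ask the external system”), and the injected clock are all hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Illustrative only: a tiny time-to-live cache, NOT ModeShape's caching code.
// TtlNodeCache and all of its members are hypothetical names for this sketch.
final class TtlNodeCache<V> {

    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clockMillis;   // injected so that expiry is easy to test
    private final Function<String, V> loader; // stands in for "ask the external system"

    TtlNodeCache(long ttlSeconds, LongSupplier clockMillis, Function<String, V> loader) {
        this.ttlMillis = ttlSeconds * 1000L;
        this.clockMillis = clockMillis;
        this.loader = loader;
    }

    // Returns the cached value, reloading it through the loader once the entry has expired.
    V get(String nodeId) {
        long now = clockMillis.getAsLong();
        Entry<V> entry = entries.get(nodeId);
        if (entry == null || now >= entry.expiresAtMillis) {
            entry = new Entry<>(loader.apply(nodeId), now + ttlMillis);
            entries.put(nodeId, entry);
        }
        return entry.value;
    }

    public static void main(String[] args) {
        TtlNodeCache<String> cache = new TtlNodeCache<>(5, System::currentTimeMillis,
                id -> "node loaded for " + id);
        System.out.println(cache.get("/files/downloads/aircraft"));
        System.out.println(cache.get("/files/downloads/aircraft")); // normally served from the cache
    }
}
```

The sketch is deliberately not thread-safe. The point is only the trade-off: a longer time-to-live means fewer trips to the external system, but the nodes may lag further behind the external data.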

Secondly, federation is transparent to clients. Once federation is configured, the repository’s regular content and federated content all look like regular content to client applications.

Thirdly, a repository does not use federation by default; you have to configure it for each repository. To do that, a repository configuration must specify:

  1. how ModeShape is to communicate with the external system (e.g., which connector implementation is to be used)
  2. the properties that the connector needs to talk with a particular external system (e.g., an external source)
  3. where and how the data in the external system is to be projected into the repository

All of this is defined inside the “externalSources” area of a repository’s configuration file. Here’s a simple JSON repository configuration that defines one external source called “downloads” and another called “sourceCode”:

{
  "name" : "MyRepository",
  "workspaces" : {
    "predefined" : [ "ws1", "ws2" ],
    "initialContent" : {
      "default" : "resources/initialContent.xml"
    }
  },
  "externalSources" : {
    "downloads" : {
      "classname" : "org.modeshape.connector.filesystem.FileSystemConnector",
      "directoryPath" : "/opt/downloads",
      "readonly" : true,
      "cacheTtlSeconds" : 5,
      "projections" : [ "default:/files/downloads => /" ]
    },
    "sourceCode" : {
      "classname" : "org.modeshape.connector.git.GitConnector",
      "directoryPath" : "data/repo",
      "remoteName" : "origin,upstream",
      "queryableBranches" : "master",
      "cacheTtlSeconds" : 5,
      "projections" : [ "default:/sources => /" ]
    }
  }
}

Each external source is identified by a unique name that you assign, and specifies the name of the connector implementation class and other connector-specific properties. A connector is simply a subclass of the “org.modeshape.jcr.federation.spi.Connector” class that contains the logic of how to create nodes that represent external data and, optionally, how to update the external data based upon changes to the nodes. We’ve designed the SPI so that you can easily create your own subclasses of Connector (or ReadOnlyConnector if the connector should never update data in the external system).

Let’s look at this configuration file a bit more. The “downloads” external source (line 10) defines several other properties:

  • the “directoryPath” is the location on the local file system of the top-level directory that is to be accessed by the connector; we use an absolute path here, though relative paths also work.
  • the “readonly” property specifies that ModeShape should never update any of the files or folders on the file system (yes, the FileSystemConnector is capable of creating, updating, and deleting files and folders on the file system in response to applications creating, updating, or deleting the corresponding nodes in the repository).
  • the “cacheTtlSeconds” is the time in seconds (5 in our case) that the nodes created by the connector to represent external files/folders should be cached.
  • the “projections” field is an array of string values that define the paths of the “federated” nodes that will represent the files and folders in the external file system. Our value of “default:/files/downloads => /” means that the top-level directory of the external source (that is, the “/opt/downloads” folder) should be projected as a node at “/files/downloads” in the “default” workspace. (Note that we’re also specifying that the “default” workspace should be populated with nodes described by the “resources/initialContent.xml” file. It’s here that we’d define the node type and any properties for the “/files” node.)
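
To show how such a projection string breaks down into its parts, here is a small Java sketch. It is not ModeShape’s own parser, and it only handles the simple “workspace:/repository/path => /external/path” form used in this post; the ProjectionExpression class and its field names are made up for this example.

```java
// Illustrative only: a simplified parser for projection expressions such as
// "default:/files/downloads => /". This is NOT ModeShape's parser, and the
// ProjectionExpression class and its fields are hypothetical names.
final class ProjectionExpression {
    final String workspaceName;   // e.g. "default"
    final String repositoryPath;  // e.g. "/files/downloads"
    final String externalPath;    // e.g. "/"

    private ProjectionExpression(String workspaceName, String repositoryPath, String externalPath) {
        this.workspaceName = workspaceName;
        this.repositoryPath = repositoryPath;
        this.externalPath = externalPath;
    }

    static ProjectionExpression parse(String expression) {
        // Left of "=>" is where the node appears; right of it is the path in the external source.
        String[] sides = expression.split("=>", 2);
        if (sides.length != 2) {
            throw new IllegalArgumentException("Missing '=>' in: " + expression);
        }
        String left = sides[0].trim();
        String external = sides[1].trim();

        // The workspace name is everything before the first ':' on the left-hand side.
        int colon = left.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("Missing workspace name in: " + expression);
        }
        String workspace = left.substring(0, colon).trim();
        String repositoryPath = left.substring(colon + 1).trim();

        if (!repositoryPath.startsWith("/") || !external.startsWith("/")) {
            throw new IllegalArgumentException("Both paths must be absolute in: " + expression);
        }
        return new ProjectionExpression(workspace, repositoryPath, external);
    }

    public static void main(String[] args) {
        ProjectionExpression p = parse("default:/files/downloads => /");
        System.out.println(p.workspaceName + " | " + p.repositoryPath + " | " + p.externalPath);
    }
}
```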

The file system connector will create a structure that mirrors the files and folders on the file system. So if the “/opt/downloads” directory contained the following:

aircraft
aircraft/Boeing
aircraft/Boeing/747.jpg
aircraft/Boeing/787.jpg
aircraft/Airbus
aircraft/Airbus/A380.jpg
aircraft/Airbus/A320.jpg

then we would have the following nodes inside the “default” workspace:

/files   (primary type “nt:unstructured”)
/files/downloads   (primary type “nt:folder”)
/files/downloads/aircraft   (primary type “nt:folder”, represents “/opt/downloads/aircraft”)
/files/downloads/aircraft/Boeing   (primary type “nt:folder”, represents “/opt/downloads/aircraft/Boeing”)
/files/downloads/aircraft/Boeing/747.jpg   (primary type “nt:file”, represents “/opt/downloads/aircraft/Boeing/747.jpg”)
/files/downloads/aircraft/Boeing/787.jpg   (primary type “nt:file”, represents “/opt/downloads/aircraft/Boeing/787.jpg”)
/files/downloads/aircraft/Airbus   (primary type “nt:folder”, represents “/opt/downloads/aircraft/Airbus”)
/files/downloads/aircraft/Airbus/A380.jpg   (primary type “nt:file”, represents “/opt/downloads/aircraft/Airbus/A380.jpg”)
/files/downloads/aircraft/Airbus/A320.jpg   (primary type “nt:file”, represents “/opt/downloads/aircraft/Airbus/A320.jpg”)

Note how the “/files” and “/files/downloads” nodes exist in the workspace, but the “/files/downloads/aircraft” node is dynamically projected to mirror the “/opt/downloads/aircraft” folder. If the “/opt/downloads/aircraft” folder were to be removed, then the “/files/downloads/aircraft” node would automatically be removed as well.
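
The mirroring can be modeled in a few lines of plain Java. The sketch below does not use ModeShape or the real FileSystemConnector; it simply walks a directory and prints the node path and primary type that each entry would correspond to under a given projected path. The FileProjectionSketch class and its methods are hypothetical, and the sample tree is a throwaway temporary directory.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: models how a directory tree could map onto node paths.
// It does NOT use ModeShape; FileProjectionSketch is a hypothetical name.
final class FileProjectionSketch {

    // Lists every file and folder under externalRoot as "<node path> (<primary type>)".
    static List<String> describe(Path externalRoot, String projectedPath) {
        try (Stream<Path> entries = Files.walk(externalRoot)) {
            return entries.sorted()
                          .map(entry -> toNodeLine(externalRoot, projectedPath, entry))
                          .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String toNodeLine(Path externalRoot, String projectedPath, Path entry) {
        // The path relative to the external root becomes the path below the projected node.
        String relative = externalRoot.relativize(entry).toString().replace(File.separatorChar, '/');
        String nodePath = relative.isEmpty() ? projectedPath : projectedPath + "/" + relative;
        String primaryType = Files.isDirectory(entry) ? "nt:folder" : "nt:file";
        return nodePath + " (" + primaryType + ")";
    }

    // Builds a small throwaway directory tree to project.
    static Path createSampleTree() {
        try {
            Path root = Files.createTempDirectory("downloads");
            Path boeing = Files.createDirectories(root.resolve("aircraft").resolve("Boeing"));
            Files.createFile(boeing.resolve("747.jpg"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        describe(createSampleTree(), "/files/downloads").forEach(System.out::println);
    }
}
```

Nothing is copied in this model either: the list is computed from the directory each time it is asked for, which is the same reason a removed folder simply stops showing up as a node.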

The “sourceCode” external source is pretty similar, but it accesses a local Git repository:

  • the “classname” specifies that the “org.modeshape.connector.git.GitConnector” connector implementation be used. This class will be included in ModeShape 3.1.0.Final. This connector is read-only.
  • the “directoryPath” is the location on the local file system of the top-level directory that contains a valid Git repository (e.g., it contains a “.git” directory); we use a relative path here.
  • the “remoteName” is the name (or comma-separated list of names) of the remote(s).
  • the “cacheTtlSeconds” is the time in seconds (5 in our case) that the nodes created by the connector to represent Git’s files, folders, commits, tags, and branches should be cached.
  • the “projections” field specifies where in the repository the Git nodes should appear. Our value of “default:/sources => /” means that the top-level directory of the external source (that is, the Git repository folder) should be projected as a node at “/sources” in the “default” workspace.

The Git connector doesn’t really work with a local working directory of the Git repository. Instead, it basically exposes all of the commits, branches, and tags (plus the files and folders in each). It does this by mapping Git functionality onto some special nodes:

  • The “branches” node is a container under which all branches can be found.
  • The “tags” node is a container under which all tags can be found.
  • The “commits” node is a container under which all commits appear, with the most recent commits appearing first in the children.
  • The “branches/{branchName}” nodes represent the information about each branch, including the commit ID and references to the node representing the commit.
  • The “tags/{tagName}” nodes represent the information about each tag, including the commit ID and references to the node representing the commit.
  • The “commits/{branchOrTagOrCommit}” nodes show the history of commits for the given branch/tag/commit.
  • The “commits/{branchOrTagOrCommit}/{commitId}” shows the details of a specific commit in the history of a particular branch/tag/commit.
  • The “commit/{branchOrTagOrCommit}” shows the details of the specified commit.
  • The “tree/{branchOrTagOrCommit}” is a container for the files and folders within the specified branch, tag or commit.

Thus in our repository, all the Git information is projected under “/sources”, so a number of nodes would appear in our repository:

  • A node representing each branch appears as a child of “/sources/branches” (e.g., “/sources/branches/master”)
  • A node representing each tag appears as a child of “/sources/tags” (e.g., “/sources/tags/release-1.0”)
  • A node representing each commit appears as a child of “/sources/commits” (e.g., “/sources/commits/bbfa3f3d76b0…”)
  • A node structure (with “nt:file” and “nt:folder” descendant nodes) representing the workspace (with its files/folders) for each commit, branch and tag appears under “/sources/tree” (e.g., “/sources/tree/master/pom.xml”)

By the way, you can control with the projections which subset of the nodes exposed by the connector should be projected into the repository. For example, if you wanted to expose only a specific branch (e.g., “master”) in the Git repository under the “/sources” node, you could change the projection rule to be “default:/sources => /tree/master”.
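
One way to picture what a projection rule does is as a simple path translation. The following Java sketch is only a model of that idea, not ModeShape’s implementation; the ProjectionPathSketch class and its toExternalPath method are hypothetical names.

```java
import java.util.Optional;

// Illustrative only: models how a projection rule relates repository paths to
// paths exposed by a connector. NOT ModeShape's implementation; the class and
// method names are hypothetical.
final class ProjectionPathSketch {

    // Maps nodePath to the connector's path, or returns empty when nodePath lies outside the projection.
    static Optional<String> toExternalPath(String projectedPath, String externalPath, String nodePath) {
        boolean inside = nodePath.equals(projectedPath) || nodePath.startsWith(projectedPath + "/");
        if (!inside) {
            return Optional.empty();
        }
        String remainder = nodePath.substring(projectedPath.length()); // "" or "/child/..."
        if (externalPath.equals("/")) {
            return Optional.of(remainder.isEmpty() ? "/" : remainder);
        }
        return Optional.of(externalPath + remainder);
    }

    public static void main(String[] args) {
        // Rule "default:/sources => /": everything the connector exposes appears under /sources.
        System.out.println(toExternalPath("/sources", "/", "/sources/tree/master/pom.xml"));
        // Rule "default:/sources => /tree/master": only the master tree appears under /sources.
        System.out.println(toExternalPath("/sources", "/tree/master", "/sources/pom.xml"));
    }
}
```

With the narrower rule, “/sources/pom.xml” in the repository corresponds to “/tree/master/pom.xml” in the connector, so the “branches”, “tags”, and “commits” containers are no longer visible under “/sources”.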

Programmatically creating projections

ModeShape’s public API now contains a new FederationManager class that can be used to programmatically create and remove projections. However, external sources still must be configured via the JSON configuration file.

Custom connectors

As we mentioned earlier, we provide two connectors out-of-the-box in 3.1. We do plan to add a few more, but we’ve always expected that some developers would want to create their own custom connectors. Hopefully our SPI is simple enough that doing so is very straightforward. So if you’re interested in this, please take a look and give us feedback on our forums. The SPI should be pretty stable, but if we find some glaring problems, we may need to change the SPI slightly before the 3.2 release; after that, however, we’ll lock down the SPI.

Summary

We described a few ways that federating content from external sources into your ModeShape repository can be quite useful, and we also took a very detailed look into how federation is configured. In reality, however, we just skimmed the surface of what is actually possible with ModeShape federation.

Stay tuned for more on the 3.1 release later this week.

Filed under: features, federation, jcr

4 Responses

  1. Kev d'Salvo says:

    Thank you Randall!

    What a pleasant surprise ! :-)

  2. Serge Libotte says:

    Interesting news.
    Can Modeshape federate another JCR?

    • Randall says:

      It can’t out of the box just yet, though we have that on our roadmap. The capability is there, but we need a connector. Currently connectors can register node types and namespaces at any time, but we also plan to enhance the federation system so that we can push queries and versioning operations down to the connector.
