This is the first in a series of posts about specific features in the JCR API. We start with node types because they’re such an important and empowering feature of JCR, and critical to so many other features.
We are inundated with data stores: relational databases, file systems, repositories, document stores, or proprietary systems. Newer stores, like data grids and distributed (e.g., “NoSQL”) databases, are likely on the horizon (if not already in use). How do we write applications that use all this valuable information without copying it (horrors!) and without resorting to lots of different APIs? And how much work will it take to update our applications as new information becomes available or the existing information evolves?
Fortunately there are different approaches to federation, which means we get to pick the one that best suits our needs. One such approach is warehouses and ETL, but technically that’s copying and not federation. Another approach is using relational-based technologies like the Teiid project, which uses a relational engine to do the heavy lifting, and provides a virtual database complete with SQL, JDBC and ODBC support to give applications a way of interacting with the data using tables that mirror what the application wants/needs. It is a database – it just gets the data from other sources. This is perfect for some use cases, but for others the relational nature of the interaction is less than ideal.
ModeShape uses a graph-oriented approach that works well in cases where the information is hierarchical and/or has a structure that evolves over time. ModeShape is a JCR implementation that looks and behaves like a regular JCR repository, except that it federates in real-time some (or all) of the content from other systems. In fact, ModeShape doesn’t even have to federate any information, and in such cases it works just like any other JCR implementation (albeit with a wider selection of persistent storage options via its connectors).
ModeShape can do this because of the power and flexibility of the JCR API, which uses a graph of nodes and properties. These nodes form a simple tree, but use properties to create relationships or references to other nodes [1]. Here’s a simple representation:
This approach makes the JCR API very good at exposing information with varying and evolving structure, whether that information exists within the repository itself or defined by and housed in other external systems.
Of course, very generic data structures can have their own challenges. Flexible and abstract stores place few constraints on how you organize your data, but that means you need another way of describing or constraining the structure and shape of your information.
Node types
The JCR API solves this issue very nicely by using a simple but very powerful system of node types. Every node in a JCR repository has a primary type that defines the names, types, and characteristics of the properties and children that the node may (or must) have. Additionally, each node may have one or more mixin types that further define properties and children that beyond those defined by the primary type. You can add mixin types at any time, and can even remove them at any time as long as doing so doesn’t violate the primary type or remaining mixin types.
JCR node types dictate the kinds of properties that can (or must) exist on a node, and can constrain the property’s name, the number and type of values allowed, default values, constraints on the values, whether the property must exist, whether/how to change property, whether the values are queryable and searchable, and the kinds of query operators that apply to the values. All of these are optional, so it’s actually possible to define a node type that can allow any number of properties with any name and any values. JCR calls these property definitions without a name pattern/constraint “residual” properties, because they apply only if there isn’t a more applicable and specific “non-residual” property definition. Node types are also capable of dictating the names, types and order of child nodes. Node types can also define residual child node definitions for cases where a node can contain children with any name and type. Like residual property definitions, these are only used if there isn’t a child definition that is more specific. And node types support inheritance, so it’s possible to reuse and extend other node types. It is even possible to override and further constraint property and child definitions inherited from supertypes.
Know your types – and how to use them
Carefully selecting, defining, extending, and using node types in a JCR repository provides an incredible amount of control over your information and can let your information take on its natural shape. In cases where you do want to constraint the structure, use a primary type with no residual properties or children ensures that the nodes always fit the desired shape. In cases where flexibility is more important, use a primary type that allows any properties and any children (e.g., the “nt:unstructured” built-in node type).
Of course, most situations are probably somewhere in the middle, and this is where mixins shine. Create these nodes using a liberal primary type (like “nt:structured”), and then on a node-by-node basis add mixins defining various “facets” or “characteristics” needed to capture the desired information. For example, the “mix:created” mixin node type defines a “jcr:created” date property and a “jcr:createdBy” string property, and can be mixed into any node where there’s a need to store the “creation” information. This mixin can even be removed from a node without having to remove the properties!
Node types also play a critical role in JCR queries, because they allow forming sets of nodes that have a similar properties. These sets are naturally similar to the fundamental concepts in various query languages: relational tables, XML element types, and Java classes. For example, you could query all the “mix:created” nodes to find all nodes created within a certain period and order the results by the name of the creator.
Note types are also critically important in ModeShape because they describe the structure and semantics of the graph that ModeShape creates from the information in the underlying sources. And as the underlying information changes shape or meaning, the graph can adapt by altering the structure and node types.
Summary
We really just touched the surface of JCR node types, but hopefully we’ve given you a glimpse of how extremely powerful they make the JCR API. Node types make it possible to work with a very flexible graph system while controlling, describing, and understanding the shape of the information content in a JCR repository – even when this information lives in external systems.
[1] JSR-283 (aka “JCR 2.0”) takes this a step further by introducing “shared nodes” that share properties and children with other nodes. For example, if a node at the path “/a/b/c/d” is shared with the node at “/x/y/z”, then a property on the “d” node is also a property of the “z” node and a child of “d” also appears as a child of “z”. Thus, shared nodes make it possible for another node to appear in multiple places in the repository and have multiple paths.
Filed under: features, federation, jcr, repository