IDRegistry

IDRegistry is a generic object generator and identity map for Ruby.

This document provides an in-depth introduction to using IDRegistry. For a quick introduction, see the README.

Identity Maps and Registries

IDRegistry combines two patterns from Martin Fowler's book "Patterns of Enterprise Application Architecture": the Identity Map pattern and the Registry pattern.

An Identity Map is essentially an in-memory cache that references objects based on a unique identifier. Whenever you want to obtain an object, you first check the cache to see if the object already exists—technically, if the cache contains an object for the given unique identifier. If so, you pull that existing instance directly from the cache. Otherwise, you construct the object, making any necessary database calls, and then insert it into the cache. It has now effectively been memoized, and the next time you request that identifier, that same instance will be returned again from the cache.

In addition to the performance benefits of a cache, an Identity Map also ensures that at most one copy of any object will exist in your system at any time. This effectively eliminates a whole class of bugs that could arise if you construct multiple copies of an object whose states get out of sync with each other.

Identity Map is a highly useful pattern, but by itself it tends to be cumbersome to implement. This is because you have to find every point in the code that constructs a model object, and inject some code to manage the cache. To solve this, we combine it with the Registry pattern. A Registry is simply a central source for object procurement; any code that wants to obtain a model object must get it from the Registry. The Registry knows how to construct the object if needed. Thus, it is able to encapsulate the Identity Map logic, providing a one-stop solution for object procurement with caching and duplication elimination. This combination is what we call an IDRegistry.

Tuples and Patterns

A common practice is to use database primary keys as the unique identifiers for an Identity Map. However, if you are managing multiple kinds of objects, objects that span multiple tables or aren’t associated with any particular database row, or objects that otherwise need to be identified across more than one dimension, you need a more versatile unique identifier.

IDRegistry uses arrays, or tuples as we will call them, as unique identifiers. This allows us to support a wide variety of identification strategies. A simple example is to employ a two-element tuple: the first element being a type indicator, and the second being a database primary key. For example, suppose your application had two types of entities: users and blog posts. Your user objects could employ unique identifiers of the form [:user, user-id], and your post objects could employ identifiers of the form [:post, post-id]. So the tuple [:user, 1] identifies the user with ID 1, and the tuple [:post, 210] identifies the post with ID 210.

Such “forms” of tuples that correspond to “types” of objects, we denote as patterns. A pattern is an array like a tuple, but some of its elements are placeholders rather than values. For example, [:user, 1] is a tuple that identifies the particular user with ID 1, while [:user, Integer] is the pattern followed by user identifiers. The Integer element is a placeholder that matches certain kinds of values—in this case, integer IDs. Similarly, [:post, Integer] is the pattern followed by blog post identifiers.

It is also common to use String as a placeholder element used in patterns. For example, if you have contacts that should be identified by a unique phone number, you could use tuples with the pattern [:contact, String] where the unique phone number is represented as a string.

Indeed, technically, a placeholder element can be any object that responds appropriately to the === operator. In most cases, these will be class or module objects such as Integer or String above; however, patterns can utilize regular expressions, or any other object that classifies based on the === operator.

Additionally, tuples (and patterns) can be longer and more complex than the two-element examples above. Suppose you have objects that represent relationships between users in a social network. Each relationship might be identified by the two user IDs of the users involved. Correspondingly, your identifier tuples might follow the pattern [:relationship, Integer, Integer].

Basic IDRegistry usage

Creating an IDRegistry is as simple as calling the create method:

my_registry = IDRegistry.create

The job of an IDRegistry is to procure objects by either returning a cached object or creating a new one. Thus, it needs to know how to construct your model objects. You accomplish this by configuring the registry at application initialization time, telling it how to construct the various types of objects it needs to manage.

In the previous section, we saw how a “type” of object can be identified with a pattern. Through configuration, we “teach” an IDRegistry how to construct objects for a given pattern. For example, our user objects have identifiers matching the pattern [:user, Integer]. We can configure a registry as follows:

my_registry.config.add_pattern |pattern|
  pattern.pattern [:user, Integer]
  pattern.to_generate_object do |tuple|
    my_create_user_object_given_id(tuple[1])
  end
end

The add_pattern configuration command teaches the IDRegistry about a certain pattern. If the registry encounters a tuple identifier matching that pattern, it specifies how to construct the object given that tuple.

You can add any number of pattern configurations to a registry, covering any number of object types.

Once you have configured your IDRegistry, using it is simple. Call the lookup method to obtain an object given a tuple. The registry will check its cache and return the object, or it will invoke the matching pattern configuration to create it.

user1 = my_registry.lookup(:user, 1)   # constructs a user object
user1a = my_registry.lookup(:user, 1)  # returns the same user object

If you want to clear the cache and force the registry to construct new objects, use the clear method:

my_registry.clear

Multiple identifiers for an object

Sometimes there will be several different mechanisms for identifying an object to procure. Consider a tree stored in the database, in which each node has a name that is unique among its siblings. Now, each node might have a database primary key, so you could identify objects using the pattern [:node, Integer]. However, since the combination of parent and name is unique, you could also uniquely identify objects using that combination: [:node, Integer, String] for parent ID and child name. These two patterns represent two different ways of looking up a tree node:

object = find_tree_node_id(id)
object = find_tree_from_parent_id_and_child_name(parent_id, child_name)

These two ways to lookup the object correspond to two different tuples—two different unique identifiers. In a simple Identity Map, this would not work. It might create an object using the first identifier, and then create a second object using the second identifier, even if semantically they should be the same object.

IDRegistry, however, supports this case by giving you the ability to define multiple patterns and associate them with the same object type. Here's how.

my_registry.config do |config|
  config.add_pattern do |pattern|
    pattern.type :treenode
    pattern.pattern [:node, Integer]
    pattern.to_generate_object do |tuple|
      find_tree_node_id(tuple[1])
    end
    pattern.to_generate_tuple do |obj|
      [:node, obj.id]
    end
  end
  config.add_pattern do |pattern|
    pattern.type :treenode
    pattern.pattern [:node, Integer, String]
    pattern.to_generate_object do |tuple|
      find_tree_from_parent_id_and_child_name(tuple[1], tuple[2])
    end
    pattern.to_generate_tuple do |obj|
      [:node, obj.parent_id, obj.name]
    end
  end
end

Let’s unpack this. First, notice that we are now specifying a “type” for each pattern. The type is a name for this type of object. If you omit it (as we did earlier), IDRegistry treats each pattern as a separate anonymous type. In this case, however, we set it explicitly. This lets us specify that both patterns describe the same type of object, a tree node object. Each tree node will now have TWO identifiers, one of each pattern. Doing a lookup for either identifiers will return the same object.

Second, now in addition to the to_generate_object block, we now provide a to_generate_tuple block. We need to tell IDRegistry how to generate a tuple (identifier) from an object. Why?

Well, suppose were were to look up a tree node by ID:

# Look up a node by database ID
node = my_registry.lookup(:node, 10)

At this point, IDRegistry can cache the object and associate it with the tuple [:node, 10] so that you can look it up using that tuple again. However, that object also has a parent and a name (suppose the parent ID is 9 and the name is “foo”). This means we would like to be able to look up that same object using the tuple [:node, 9, "foo"].

# Should the same object as the original node
node1 = my_registry.lookup(:node, node.parent_id, node.name)

IDRegistry therefore needs you to teach it how to generate that other tuple for the object. Similarly, if you originally looked up the tree node by parent ID and name, IDRegistry needs you to teach it how to generate the corresponding simple ID tuple.

So to summarize… if you have only one way to look up an object, you can simply specify the pattern and a to_generate_object block. For objects that can be looked up in more than one way, you should also include a type, to connect the various patterns together, and a to_generate_tuple block, which tells IDRegistry how to generate missing tuples.

Categories

Identity maps generally support one-to-one correspondence between objects and unique identifiers. We have already seen how IDRegistry can support multiple identifiers for an object, for object types that require multiple modes of lookup. In this section, we cover categories, which are special identifiers that can reference zero, one, or multiple objects that are already present in the identity map. IDRegistry can look up the collection of objects that match a given category, or use categories to delete groups of objects out of the identity map quickly.

We’ll cover an example. Take the “tree node” object type that we covered earlier. One of the patterns for a tree node identifies the node by parent ID and child name: [:tree, Integer, String]. This pattern lets us look up a specific child of a given node. However, suppose we want all the children (that have been loaded into the identity map so far) of a given parent. Somehow, we want to provide a specific parent ID, but a wildcard for the name.

That is how categories are defined. We start with an identifier pattern such as our example [:tree, Integer, String]. Now we choose some number (zero or more) of the placeholder elements that we want to specify. In this example, we want to specify the parent ID, so we choose array index 1, which is the parent ID in our pattern. We call this sequence of element indexes the “category indexes”. In this case, the category indexes are [1]. The rest of the placeholders (in our example, array index 2, the child name) will be treated as wildcards.

To define such a category, provide a name, a pattern for identifiers, and an array of category indexes, in the registry’s configuration as follows:

my_registry.config do |config|
  add_category(:node_parent, [:tree, Integer, String], [1])
end

Note that what we’ve done here is create a class of categories—specifically, the class of categories that include the children of given parent nodes. In order to specify a particular category, we must say which parent node. To do that, we take each of the indexes in the category index array (currently just one) and we assign each a particular value. In our example, suppose we want to specify the particular category with parent ID 9. We assign the value 9 to the first and only category index, and we get the array [9]. This new array is what we call the “category spec”. It specifies a particular category out of our class, in this case, the category of tree nodes whose parent ID is 9.

To look up nodes by category, for example, you must provide the name of the class of categories, and the category spec indicating which particular category.

objects = my_registry.objects_in_category(:node_parent, [9])

There are many different ways to specify categories. Sometimes you want a single “one-off” category that isn’t part of a class of categories. For example, you could define a category that includes ALL tree node objects like this:

my_registry.config do |config|
  add_category(:all_nodes, [:tree, Integer, String], [])
end

Note that we set the same pattern but now we have no category indexes. This means, we are not going to specify any parameters in order to identify a particular category. Instead, both parent ID and name string are wildcards, and all tree nodes will fall into the same category. This is still a “class” of categories, but a degenerate one with no parameters. When we want to look up the objects, our category spec is similarly the empty array, since we have no category indexes for which to provide values.

objects = my_registry.objects_in_category(:all_nodes, [])

Some classes of categories may be parameterized by multiple elements. For example, if we set up a category like this:

my_registry.config do |config|
  add_category(:parent_and_name, [:tree, Integer, String], [1, 2])
end

Now our category is parameterized by both parent and name. This isn’t a very useful class of categories since each will contain at most one element. I show it merely to illustrate that you can have multiple category indexes. So here’s how to get all the objects in the “category of objects with parent ID 9 and name foo”:

objects = my_registry.objects_in_category(:parent_and_name, [9, "foo"])

Finally, here is one more example. Suppose you did this:

my_registry.config do |config|
  add_category(:named_foo, [:tree, Integer, 'foo'], [])
end

Notice that we’ve replaced the “child name” placeholder with a particular ID. That element is no longer a placeholder, but a specific value. This is now a one-off category (i.e. a degenerate class of categories with no parameters) of objects whose name is “foo”. The pattern that you use to define a category doesn’t have to be exactly one of the patterns that you use to define an object type. It just has to match actual identifier tuples in your registry.

Web requests

In a web application, you will typically want to “scope” an identity map to a request. That is, you usually won’t want objects from one request leaking into the next. IDRegistry provides two different mechanisms to limit the scope of a registry.

First, if you are certain that only one request will be run at a time in any given process, you can keep a single global registry, and just clean it out at the end of a request. IDRegistry provides a Rack middleware for this purpose. If you are running Rails, you can include a Railtie that installs the middleware for you, and gives you a configuration that you can use to specify which registry to clear.

If you are running multithreaded, then you will need multiple registries, one for each request. The easiest way to do this is to configure a registry as a “template” at startup, and then use the spawn_registry method to create a new empty registry from that template for each request. A Rack middleware that implements this strategy is also provided.

Thread safety

IDRegistry is fully thread safe. All object lookup and manipulation methods, as well as configuration methods, are re-entrant and can be called concurrently from multiple threads. The desired invariant, that a registry will contain at most one copy of any given object, will be maintained.

There is, however, one caveat that has to do with the callbacks for constructing objects or generating tuples. IDRegistry does not include those callbacks within its critical sections. (To do so would invite deadlocks.) Instead, we run callbacks eagerly and serialize later on insertion. This means it is possible, in a multi-threaded environment, for a callback to be called but its result to be discarded. Here is the scenario:

Two threads simultaneously call Registry#lookup for the same tuple that is not already present in the identity map.
Within IDRegistry, both threads check for the tuple and determine that it is not present.
Both threads then call the callback to create the object. Because IDRegistry does not put any mutual exclusion on callbacks, they execute concurrently and create two separate objects.
However, only one of the two objects will actually be used. IDRegistry will use the object created by whichever thread finishes first, and throw away the object created by the slower thread. Both threads will return the same object, the first thread’s copy, from the call.

Therefore, our desired behavior still holds; however, you should be aware that it is possible for more than one copy of an object to have been created in the interim.