IDRegistry is a generic object generator and identity map for Ruby.
This document provides an in-depth introduction to using IDRegistry. For a quick introduction, see the README.
IDRegistry combines two patterns from Martin Fowler's book "Patterns of Enterprise Application Architecture": the Identity Map pattern and the Registry pattern.
An Identity Map is essentially an in-memory cache that references objects based on a unique identifier. Whenever you want to obtain an object, you first check the cache to see if the object already exists—technically, if the cache contains an object for the given unique identifier. If so, you pull that existing instance directly from the cache. Otherwise, you construct the object, making any necessary database calls, and then insert it into the cache. It has now effectively been memoized, and the next time you request that identifier, that same instance will be returned again from the cache.
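The check-then-construct flow described above can be sketched in a few lines of plain Ruby. This is only an illustration of the pattern, not IDRegistry's implementation; the `SimpleIdentityMap` class and the `load_user` lambda are hypothetical names introduced for this example.

```ruby
# A minimal identity map: a Hash from unique identifier to object.
# (Hypothetical class for illustration; not part of IDRegistry.)
class SimpleIdentityMap
  def initialize
    @cache = {}
  end

  # Returns the cached object for `id`. On a miss, builds the object with
  # the given block, stores it, and returns it.
  def fetch(id)
    return @cache[id] if @cache.key?(id)
    @cache[id] = yield(id)
  end
end

loads = 0
load_user = lambda do |id|
  loads += 1                       # stands in for a database query
  { id: id, name: "user#{id}" }
end

map = SimpleIdentityMap.new
first  = map.fetch(1, &load_user)
second = map.fetch(1, &load_user)

puts first.equal?(second)  # true: the same instance is returned both times
puts loads                 # 1: the loader ran only once
```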
In addition to the performance benefits of a cache, an Identity Map also ensures that at most one copy of any object will exist in your system at any time. This effectively eliminates a whole class of bugs that could arise if you construct multiple copies of an object whose states get out of sync with each other.
Identity Map is a highly useful pattern, but by itself it tends to be cumbersome to implement. This is because you have to find every point in the code that constructs a model object, and inject some code to manage the cache. To solve this, we combine it with the Registry pattern. A Registry is simply a central source for object procurement; any code that wants to obtain a model object must get it from the Registry. The Registry knows how to construct the object if needed. Thus, it is able to encapsulate the Identity Map logic, providing a one-stop solution for object procurement with caching and duplication elimination. This combination is what we call an IDRegistry.
A common practice is to use database primary keys as the unique identifiers for an Identity Map. However, if you are managing multiple kinds of objects, objects that span multiple tables or aren’t associated with any particular database row, or objects that otherwise need to be identified across more than one dimension, you need a more versatile unique identifier.
IDRegistry uses arrays, or tuples as we will call them, as unique identifiers. This allows us to support a wide variety of identification strategies. A simple example is to employ a two-element tuple: the first element being a type indicator, and the second being a database primary key. For example, suppose your application had two types of entities: users and blog posts. Your user objects could employ unique identifiers of the form [:user, user-id], and your post objects could employ identifiers of the form [:post, post-id]. So the tuple [:user, 1] identifies the user with ID 1, and the tuple [:post, 210] identifies the post with ID 210.
We call such “forms” of tuples, which correspond to “types” of objects, patterns. A pattern is an array like a tuple, but some of its elements are placeholders rather than values. For example, [:user, 1] is a tuple that identifies the particular user with ID 1, while [:user, Integer] is the pattern followed by user identifiers. The Integer element is a placeholder that matches certain kinds of values (in this case, integer IDs). Similarly, [:post, Integer] is the pattern followed by blog post identifiers.
It is also common to use String as a placeholder element in patterns. For example, if you have contacts that should be identified by a unique phone number, you could use tuples with the pattern [:contact, String], where the unique phone number is represented as a string.
Indeed, technically, a placeholder element can be any object that responds appropriately to the === operator. In most cases, these will be class or module objects such as Integer or String above; however, patterns can utilize regular expressions, or any other object that classifies based on the === operator.
Additionally, tuples (and patterns) can be longer and more complex than the two-element examples above. Suppose you have objects that represent relationships between users in a social network. Each relationship might be identified by the two user IDs of the users involved. Correspondingly, your identifier tuples might follow the pattern [:relationship, Integer, Integer].
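To make the role of the === operator concrete, here is a small sketch of how a tuple could be tested against a pattern. The `tuple_matches?` helper is a hypothetical name, and the length check is an assumption of this sketch; IDRegistry's internal matching code may differ.

```ruby
# Returns true if `tuple` matches `pattern`: both have the same length and
# every pattern element accepts the corresponding tuple element via ===.
# (Hypothetical helper for illustration; not part of IDRegistry.)
def tuple_matches?(pattern, tuple)
  pattern.size == tuple.size &&
    pattern.zip(tuple).all? { |matcher, value| matcher === value }
end

# Literal elements such as :user match by equality; classes match instances.
puts tuple_matches?([:user, Integer], [:user, 1])      # true
puts tuple_matches?([:user, Integer], [:post, 210])    # false
puts tuple_matches?([:user, Integer], [:user, "1"])    # false

# Longer patterns work the same way.
puts tuple_matches?([:relationship, Integer, Integer],
                    [:relationship, 3, 7])             # true

# A regular expression can serve as a placeholder too.
puts tuple_matches?([:contact, /\A\d{3}-\d{4}\z/],
                    [:contact, "555-0100"])            # true
```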
Creating an IDRegistry is as simple as calling the create method:
my_registry = IDRegistry.create
The job of an IDRegistry is to procure objects by either returning a cached object or creating a new one. Thus, it needs to know how to construct your model objects. You accomplish this by configuring the registry at application initialization time, telling it how to construct the various types of objects it needs to manage.
In the previous section, we saw how a “type” of object can be identified with a pattern. Through configuration, we “teach” an IDRegistry how to construct objects for a given pattern. For example, our user objects have identifiers matching the pattern [:user, Integer]. We can configure a registry as follows:
my_registry.config.add_pattern do |pattern|
  pattern.pattern [:user, Integer]
  pattern.to_generate_object do |tuple|
    my_create_user_object_given_id(tuple[1])
  end
end
The add_pattern configuration command teaches the IDRegistry about a certain pattern: if the registry encounters a tuple identifier matching that pattern, the configuration specifies how to construct the object given that tuple.
You can add any number of pattern configurations to a registry, covering any number of object types.
Once you have configured your IDRegistry, using it is simple. Call the lookup method to obtain an object given a tuple. The registry will check its cache and return the object, or it will invoke the matching pattern configuration to create it.
user1 = my_registry.lookup(:user, 1)   # constructs a user object
user1a = my_registry.lookup(:user, 1)  # returns the same user object
If you want to clear the cache and force the registry to construct new objects, use the clear method:
my_registry.clear
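The configure, lookup, and clear cycle can be imitated with a toy registry in plain Ruby. This sketch does not use the gem; `ToyRegistry` and its method signatures are hypothetical and only mirror the behavior described above (return a cached object, otherwise run the generator for the first matching pattern).

```ruby
# Toy registry for illustration only; not the real IDRegistry class.
class ToyRegistry
  def initialize
    @patterns = []  # [pattern, generator] pairs, in registration order
    @cache = {}     # tuple => object
  end

  # Registers a pattern together with a block that builds objects for it.
  def add_pattern(pattern, &generator)
    @patterns << [pattern, generator]
  end

  # Returns the cached object for the tuple, or builds and caches it.
  # Returns nil when no registered pattern matches the tuple.
  def lookup(*tuple)
    return @cache[tuple] if @cache.key?(tuple)
    _, generator = @patterns.find { |pattern, _| matches?(pattern, tuple) }
    return nil unless generator
    @cache[tuple] = generator.call(tuple)
  end

  # Empties the cache so later lookups construct fresh objects.
  def clear
    @cache.clear
  end

  private

  def matches?(pattern, tuple)
    pattern.size == tuple.size &&
      pattern.zip(tuple).all? { |matcher, value| matcher === value }
  end
end

registry = ToyRegistry.new
registry.add_pattern([:user, Integer]) { |tuple| { id: tuple[1] } }

user1  = registry.lookup(:user, 1)   # constructs the object
user1a = registry.lookup(:user, 1)   # served from the cache
puts user1.equal?(user1a)            # true

registry.clear
puts registry.lookup(:user, 1).equal?(user1)  # false: rebuilt after clear
```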
Sometimes there will be several different mechanisms for identifying an object to procure. Consider a tree stored in the database, in which each node has a name that is unique among its siblings. Now, each node might have a database primary key, so you could identify objects using the pattern [:node, Integer]. However, since the combination of parent and name is unique, you could also uniquely identify objects using that combination: [:node, Integer, String] for parent ID and child name. These two patterns represent two different ways of looking up a tree node:
object = find_tree_node_id(id)
object = find_tree_from_parent_id_and_child_name(parent_id, child_name)
These two ways to look up the object correspond to two different tuples, that is, two different unique identifiers. In a simple Identity Map, this would not work. It might create an object using the first identifier, and then create a second object using the second identifier, even if semantically they should be the same object.
IDRegistry, however, supports this case by giving you the ability to define multiple patterns and associate them with the same object type. Here's how.
my_registry.config do |config|
  config.add_pattern do |pattern|
    pattern.type :treenode
    pattern.pattern [:node, Integer]
    pattern.to_generate_object do |tuple|
      find_tree_node_id(tuple[1])
    end
    pattern.to_generate_tuple do |obj|
      [:node, obj.id]
    end
  end
  config.add_pattern do |pattern|
    pattern.type :treenode
    pattern.pattern [:node, Integer, String]
    pattern.to_generate_object do |tuple|
      find_tree_from_parent_id_and_child_name(tuple[1], tuple[2])
    end
    pattern.to_generate_tuple do |obj|
      [:node, obj.parent_id, obj.name]
    end
  end
end
Let’s unpack this. First, notice that we are now specifying a “type” for each pattern. The type is a name for this type of object. If you omit it (as we did earlier), IDRegistry treats each pattern as a separate anonymous type. In this case, however, we set it explicitly. This lets us specify that both patterns describe the same type of object, a tree node object. Each tree node will now have TWO identifiers, one for each pattern. Doing a lookup for either identifier will return the same object.
Second, in addition to the to_generate_object block, we now provide a to_generate_tuple block. We need to tell IDRegistry how to generate a tuple (identifier) from an object. Why?
Well, suppose we were to look up a tree node by ID:
# Look up a node by database ID
node = my_registry.lookup(:node, 10)
At this point, IDRegistry can cache the object and associate it with the tuple [:node, 10] so that you can look it up using that tuple again. However, that object also has a parent and a name (suppose the parent ID is 9 and the name is “foo”). This means we would like to be able to look up that same object using the tuple [:node, 9, "foo"].
# Should be the same object as the original node
node1 = my_registry.lookup(:node, node.parent_id, node.name)
IDRegistry therefore needs you to teach it how to generate that other tuple for the object. Similarly, if you originally looked up the tree node by parent ID and name, IDRegistry needs you to teach it how to generate the corresponding simple ID tuple.
So to summarize… if you have only one way to look up an object, you can simply specify the pattern and a to_generate_object block. For objects that can be looked up in more than one way, you should also include a type, to connect the various patterns together, and a to_generate_tuple block, which tells IDRegistry how to generate missing tuples.
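The following sketch shows one way a registry could use the type and the tuple generators to keep a single instance reachable under several identifiers. It is an assumption-laden illustration in plain Ruby, not IDRegistry's code: `MultiKeyRegistry`, `Node`, and the in-memory `ROWS` array (standing in for the database) are all hypothetical.

```ruby
Node = Struct.new(:id, :parent_id, :name)

# Hypothetical stand-in for database rows.
ROWS = [Node.new(10, 9, "foo"), Node.new(11, 9, "bar")].freeze

# Toy registry for illustration only; not the real IDRegistry class.
class MultiKeyRegistry
  Entry = Struct.new(:type, :pattern, :to_object, :to_tuple)

  def initialize
    @entries = []
    @cache = {}  # tuple => object
  end

  def add_pattern(type, pattern, to_object:, to_tuple:)
    @entries << Entry.new(type, pattern, to_object, to_tuple)
  end

  def lookup(*tuple)
    return @cache[tuple] if @cache.key?(tuple)
    entry = @entries.find { |e| matches?(e.pattern, tuple) }
    object = entry && entry.to_object.call(tuple)
    return nil if object.nil?
    # Index the new object under every tuple its type can generate, so a
    # later lookup by any of those identifiers finds the same instance.
    @entries.select { |e| e.type == entry.type }.each do |e|
      @cache[e.to_tuple.call(object)] = object
    end
    object
  end

  private

  def matches?(pattern, tuple)
    pattern.size == tuple.size &&
      pattern.zip(tuple).all? { |matcher, value| matcher === value }
  end
end

registry = MultiKeyRegistry.new
# Each generator returns a fresh copy, as a real database load would.
registry.add_pattern(:treenode, [:node, Integer],
  to_object: ->(t) { ROWS.find { |n| n.id == t[1] }&.dup },
  to_tuple:  ->(n) { [:node, n.id] })
registry.add_pattern(:treenode, [:node, Integer, String],
  to_object: ->(t) { ROWS.find { |n| n.parent_id == t[1] && n.name == t[2] }&.dup },
  to_tuple:  ->(n) { [:node, n.parent_id, n.name] })

node  = registry.lookup(:node, 10)
node1 = registry.lookup(:node, node.parent_id, node.name)
puts node.equal?(node1)  # true: both identifiers reach one instance
```

Without the to_tuple generators, the second lookup would miss the cache and build a second copy of the node.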
Identity maps generally support one-to-one correspondence between objects and unique identifiers. We have already seen how IDRegistry can support multiple identifiers for an object, for object types that require multiple modes of lookup. In this section, we cover categories, which are special identifiers that can reference zero, one, or multiple objects that are already present in the identity map. IDRegistry can look up the collection of objects that match a given category, or use categories to delete groups of objects out of the identity map quickly.
We’ll cover an example. Take the “tree node” object type that we covered earlier. One of the patterns for a tree node identifies the node by parent ID and child name: [:node, Integer, String]. This pattern lets us look up a specific child of a given node. However, suppose we want all the children (that have been loaded into the identity map so far) of a given parent. Somehow, we want to provide a specific parent ID, but a wildcard for the name.
That is how categories are defined. We start with an identifier pattern such as our example [:node, Integer, String]. Now we choose some number (zero or more) of the placeholder elements that we want to specify. In this example, we want to specify the parent ID, so we choose array index 1, which is the parent ID in our pattern. We call this sequence of element indexes the “category indexes”. In this case, the category indexes are [1]. The rest of the placeholders (in our example, array index 2, the child name) will be treated as wildcards.
To define such a category, provide a name, a pattern for identifiers, and an array of category indexes, in the registry’s configuration as follows:
my_registry.config do |config|
  config.add_category(:node_parent, [:node, Integer, String], [1])
end
Note that what we’ve done here is create a class of categories: specifically, the class of categories that include the children of given parent nodes. In order to specify a particular category, we must say which parent node. To do that, we take each of the indexes in the category index array (currently just one) and we assign each a particular value. In our example, suppose we want to specify the particular category with parent ID 9. We assign the value 9 to the first and only category index, and we get the array [9]. This new array is what we call the “category spec”. It specifies a particular category out of our class, in this case, the category of tree nodes whose parent ID is 9.
To look up nodes by category, for example, you must provide the name of the class of categories, and the category spec indicating which particular category.
objects = my_registry.objects_in_category(:node_parent, [9])
There are many different ways to specify categories. Sometimes you want a single “one-off” category that isn’t part of a class of categories. For example, you could define a category that includes ALL tree node objects like this:
my_registry.config do |config|
  config.add_category(:all_nodes, [:node, Integer, String], [])
end
Note that we set the same pattern but now we have no category indexes. This means, we are not going to specify any parameters in order to identify a particular category. Instead, both parent ID and name string are wildcards, and all tree nodes will fall into the same category. This is still a “class” of categories, but a degenerate one with no parameters. When we want to look up the objects, our category spec is similarly the empty array, since we have no category indexes for which to provide values.
objects = my_registry.objects_in_category(:all_nodes, [])
Some classes of categories may be parameterized by multiple elements. For example, if we set up a category like this:
my_registry.config do |config|
  config.add_category(:parent_and_name, [:node, Integer, String], [1, 2])
end
Now our category is parameterized by both parent and name. This isn’t a very useful class of categories since each will contain at most one element. I show it merely to illustrate that you can have multiple category indexes. So here’s how to get all the objects in the “category of objects with parent ID 9 and name foo”:
objects = my_registry.objects_in_category(:parent_and_name, [9, "foo"])
Finally, here is one more example. Suppose you did this:
my_registry.config do |config|
  config.add_category(:named_foo, [:node, Integer, 'foo'], [])
end
Notice that we’ve replaced the “child name” placeholder with a particular value. That element is no longer a placeholder, but a specific name. This is now a one-off category (i.e. a degenerate class of categories with no parameters) of objects whose name is “foo”. The pattern that you use to define a category doesn’t have to be exactly one of the patterns that you use to define an object type. It just has to match actual identifier tuples in your registry.
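The relationship between a category pattern, its category indexes, and a category spec can be worked through in plain Ruby. This sketch assumes tree node tuples of the form [:node, parent_id, name] from the earlier configuration. The helpers `category_spec_for` and `objects_matching_category`, and the symbol values standing in for cached objects, are hypothetical; IDRegistry's own bookkeeping may be organized differently.

```ruby
# Returns the category spec for `tuple` (the values found at the category
# indexes), or nil if the tuple does not match the category's pattern.
def category_spec_for(pattern, indexes, tuple)
  return nil unless pattern.size == tuple.size
  return nil unless pattern.zip(tuple).all? { |matcher, value| matcher === value }
  tuple.values_at(*indexes)
end

# Selects the cached objects whose tuple falls in the category `spec`.
def objects_matching_category(cache, pattern, indexes, spec)
  cache.select { |tuple, _| category_spec_for(pattern, indexes, tuple) == spec }
       .values.uniq
end

# A pretend identity map: tuple => object (symbols stand in for objects).
cache = {
  [:node, 9, "foo"] => :node10,
  [:node, 9, "bar"] => :node11,
  [:node, 4, "foo"] => :node12,
  [:node, 10]       => :node10
}
pattern = [:node, Integer, String]

# Children of parent 9: index 1 is fixed, the name is a wildcard.
p objects_matching_category(cache, pattern, [1], [9])
# → [:node10, :node11]

# No category indexes: every tuple matching the pattern is included.
p objects_matching_category(cache, pattern, [], [])
# → [:node10, :node11, :node12]

# Two category indexes: parent ID and name both fixed.
p objects_matching_category(cache, pattern, [1, 2], [9, "foo"])
# → [:node10]

# A literal in the pattern narrows the one-off category to nodes named "foo".
p objects_matching_category(cache, [:node, Integer, "foo"], [], [])
# → [:node10, :node12]
```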
In a web application, you will typically want to “scope” an identity map to a request. That is, you usually won’t want objects from one request leaking into the next. IDRegistry provides two different mechanisms to limit the scope of a registry.
First, if you are certain that only one request will be run at a time in any given process, you can keep a single global registry, and just clean it out at the end of a request. IDRegistry provides a Rack middleware for this purpose. If you are running Rails, you can include a Railtie that installs the middleware for you, and gives you a configuration that you can use to specify which registry to clear.
If your application is multithreaded, you will need multiple registries, one for each request. The easiest way to do this is to configure a registry as a “template” at startup, and then use the spawn_registry method to create a new empty registry from that template for each request. A Rack middleware that implements this strategy is also provided.
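To illustrate the first strategy (one global registry, cleaned out after each request), here is a rough sketch of what a Rack-style middleware for that purpose might look like. It is not the middleware shipped with IDRegistry, whose class name and options are not shown in this document; `ClearRegistryAfterRequest` and `FakeRegistry` are hypothetical, and the only assumption about the registry is that it responds to `clear`.

```ruby
# Rack-style middleware sketch: clears a shared registry after every
# request, whether the request succeeded or raised.
# (Hypothetical class; not the middleware provided by IDRegistry.)
class ClearRegistryAfterRequest
  def initialize(app, registry)
    @app = app
    @registry = registry
  end

  def call(env)
    @app.call(env)
  ensure
    @registry.clear
  end
end

# Stand-in for a registry: anything that responds to `clear`.
class FakeRegistry
  attr_reader :objects

  def initialize
    @objects = {}
  end

  def clear
    @objects.clear
  end
end

registry = FakeRegistry.new
app = lambda do |_env|
  registry.objects[[:user, 1]] = Object.new  # the request populates the map
  [200, { "content-type" => "text/plain" }, ["ok"]]
end

stack = ClearRegistryAfterRequest.new(app, registry)
status, _headers, _body = stack.call({})
puts status                   # 200
puts registry.objects.empty?  # true: nothing leaks into the next request
```

Note that this simple version clears the registry as soon as the inner call returns, so it would not suit responses whose bodies are generated lazily and still need the cached objects.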
IDRegistry is fully thread safe. All object lookup and manipulation methods, as well as configuration methods, are re-entrant and can be called concurrently from multiple threads. The desired invariant, that a registry will contain at most one copy of any given object, will be maintained.
There is, however, one caveat that has to do with the callbacks for constructing objects or generating tuples. IDRegistry does not include those callbacks within its critical sections. (To do so would invite deadlocks.) Instead, we run callbacks eagerly and serialize later on insertion. This means it is possible, in a multi-threaded environment, for a callback to be called but its result to be discarded. Here is the scenario:
Two threads simultaneously call Registry#lookup for the same tuple that is not already present in the identity map.
Within IDRegistry, both threads check for the tuple and determine that it is not present.
Both threads then call the callback to create the object. Because IDRegistry does not put any mutual exclusion on callbacks, they execute concurrently and create two separate objects.
However, only one of the two objects will actually be used. IDRegistry will use the object created by whichever thread finishes first, and throw away the object created by the slower thread. Both threads will return the same object, the first thread’s copy, from the call.
Therefore, our desired behavior still holds; however, you should be aware that it is possible for more than one copy of an object to have been created in the interim.
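The scenario above can be reproduced with a small sketch that runs the construction callback outside the lock and inserts the result under the lock. This is an illustration under stated assumptions, not IDRegistry's actual locking code: `FirstInsertWinsMap` is a hypothetical class, and the callback deliberately waits for the other thread so that both threads are guaranteed to miss the cache.

```ruby
# Sketch of "callback outside the critical section, insertion inside it".
# (Hypothetical class for illustration; not part of IDRegistry.)
class FirstInsertWinsMap
  def initialize
    @cache = {}
    @mutex = Mutex.new
  end

  def lookup(tuple)
    cached = @mutex.synchronize { @cache[tuple] }
    return cached if cached
    candidate = yield(tuple)  # the callback runs with no lock held
    @mutex.synchronize do
      # The first insertion wins; a later candidate is discarded.
      @cache[tuple] ||= candidate
    end
  end
end

map = FirstInsertWinsMap.new
built = Queue.new  # thread-safe record of every object constructed

threads = 2.times.map do
  Thread.new do
    map.lookup([:user, 1]) do |_tuple|
      object = Object.new
      built << object
      # Force the race: wait until the other thread is in its callback too.
      sleep 0.001 until built.size == 2
      object
    end
  end
end
results = threads.map(&:value)

puts built.size                     # 2: both callbacks ran
puts results[0].equal?(results[1])  # true: both threads got one object
```

The map still holds a single object for the tuple, but one of the two constructed objects was thrown away, which matters if your callback has side effects.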