Saturday, 11 December 2010

The Iterative Reference Model: A new approach to OO

Object Oriented Models, even when very loosely coupled and very highly cohesive, can become quite complex over time. Also, when developing applications from scratch, it can be difficult to capture all the requirements of the customer in one coherent model. In this article, I will share an interesting proposal for the architecture of OO-models that makes developing large systems highly iterative: you can completely develop one piece of the application without having to take future extensions into account. This is ideal when you're using an Agile development method, such as Scrum, where you have to show the customer a working application at the end of each iteration.

This approach is not so much something I invented after a long train of thought, though. Actually, it is something that I discovered while developing applications and that proved itself very useful even before I thought of it as a generally applicable pattern. Now, I try to use it as much as I can in every new model I design and it really makes developing applications so much faster and cleaner. I want to warn all the OO-purists out there: you may think that this approach is extremely wrong and that I am out of my mind, but bear with me! I will discuss the consequences of the approach later on.

If you want to improve, be content to be thought foolish and stupid

- Epictetus -

Before we look at the Iterative Reference Model, let's first start with the traditional approach of OO-models.

The Traditional Approach
Traditionally, OO-designs are based around "has-a" and "is-a" relationships. They try to model the different entities (classes) as you would see them in real life. For example, take the class "Car" and the class "Wheel". Normally, a Car would have four Wheel-objects inside it (and maybe the Wheel-objects have a reference to the Car too). This is a typical example of "has-a": a Car has-a Wheel (or 4). Let's apply this to a more complex model, that of an (extremely) simple CRM application:


  • Company: A company that you do business with. A Company has Contacts and has ContactMoments.
  • Contact: A representative of a Company, i.e. a person, an employee of that Company, who is your contact inside that Company. There could be several Contacts per Company. A Contact has ContactMoments too.
  • ContactMoment: A moment of contact, i.e. a phone call or an email, that you have with a Company. Could be specific to one or more Contacts of that Company.
The arrows represent the references that a class has to another class. We ignore the full UML-spec, as it is not necessary for this article, and we keep the arrows uni-directional for simplicity. The multiplicities, however, are specified. The picture shows the traditional "has-a" relationships: A Company "has" Contacts and "has" ContactMoments. A Contact also "has" ContactMoments.
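
To make the direction of the references concrete, here is a minimal sketch of the traditional model in Python. The `name` and `description` fields are hypothetical placeholders for the real properties, and the choice of Python is only for illustration; nothing here is tied to a particular ORM.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContactMoment:
    description: str  # hypothetical property


@dataclass
class Contact:
    name: str  # hypothetical property
    # A Contact "has" ContactMoments.
    contact_moments: List[ContactMoment] = field(default_factory=list)


@dataclass
class Company:
    name: str  # hypothetical property
    # A Company "has" Contacts and "has" ContactMoments, so Company
    # cannot be written before the other two classes exist.
    contacts: List[Contact] = field(default_factory=list)
    contact_moments: List[ContactMoment] = field(default_factory=list)


acme = Company("Acme")
alice = Contact("Alice")
call = ContactMoment("Phone call")

# The same ContactMoment has to be registered in two places.
acme.contacts.append(alice)
acme.contact_moments.append(call)
alice.contact_moments.append(call)
```

Note that Company sits at the end of the dependency chain: it is the class you would like to deliver first, yet it is the last one that can be completed.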

Let's also look at the underlying relational model. We don't want to "pollute" our domain tables with list indices, so we create separate tables for the one-to-many and many-to-many relations (a list_index denotes the position of, for example, a Contact within the list of Contacts of a Company). I use MySQL syntax, but that's irrelevant.
CREATE TABLE Company (
 id                  BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
 ... properties of Company ...
)

CREATE TABLE Contact (
 id                  BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
 ... properties of Contact ...
)

CREATE TABLE ContactMoment (
 id                  BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
 ... properties of ContactMoment ...
)

CREATE TABLE Company_Contact (
 companyId           BIGINT NOT NULL,
 contactId           BIGINT NOT NULL,
 list_index          INTEGER NOT NULL,
 
 CONSTRAINT FOREIGN KEY (companyId) REFERENCES Company (id) ON DELETE CASCADE,
 CONSTRAINT FOREIGN KEY (contactId) REFERENCES Contact (id) ON DELETE CASCADE
)

CREATE TABLE Company_ContactMoment (
 companyId           BIGINT NOT NULL,
 contactMomentId     BIGINT NOT NULL,
 list_index          INTEGER NOT NULL,
 
 CONSTRAINT FOREIGN KEY (companyId) REFERENCES Company (id) ON DELETE CASCADE,
 CONSTRAINT FOREIGN KEY (contactMomentId) REFERENCES ContactMoment (id) ON DELETE CASCADE
)

CREATE TABLE Contact_ContactMoment (
 contactId           BIGINT NOT NULL,
 contactMomentId     BIGINT NOT NULL,
 list_index          INTEGER NOT NULL,
 
 CONSTRAINT FOREIGN KEY (contactId) REFERENCES Contact (id) ON DELETE CASCADE,
 CONSTRAINT FOREIGN KEY (contactMomentId) REFERENCES ContactMoment (id) ON DELETE CASCADE
)
This approach has the following drawbacks:

  • It is not iterative: you have to develop the entire domain model before you can start with other tiers, such as the database or the user-interface. Ideally, you want to focus on one class at a time. For example, you want to create the entire Company class first, top-to-bottom, so that you can show that to the customer at the end of the week, without having to worry about Contacts and ContactMoments just yet. That's for next week! In this model, a Company has references to Contacts and ContactMoments, so you have to create them (or at least mock them), before you can continue. Of course, this is a very simple example, but the same applies to more complicated models.

  • You need no less than 6 database tables to represent this OO-model in a clean way (that is, putting the one-to-many and many-to-many relationships in separate tables).

  • Suppose you want to delete a Contact. Now, you have to modify the list indices of all other Contacts of that Company that have a higher index, which means (if you use Hibernate or another ORM-library) deleting the Contact from the list of Contacts of the corresponding Company and then saving that Company again. This is a big hassle.

  • Suppose a Contact moves to another Company or you have mistakenly added a Contact to the wrong Company. Same hassle: you have to modify the list of Contacts of the Company that previously held the Contact and, separately, add the Contact to the correct Company.

  • Suppose that you want to delete a Company and its Contacts. In this model, when you delete a Company, the entries in the one-to-many tables will be deleted (because of the ON DELETE CASCADE), but the Contacts in the Contact table will remain. You have to delete them manually or do some tricky stuff with your ORM-library to take care of it.

  • Suppose that, for whatever reason, you cannot use the lazy-loading techniques of your ORM-library (such as Hibernate). Now, all Contacts and ContactMoments are loaded into memory if you want to adjust just one little field of a Company.
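
Two of these drawbacks, the list-index bookkeeping and the orphaned Contacts, can be reproduced in a few lines. The sketch below is only an approximation: it uses Python's built-in SQLite instead of MySQL (so the column types differ slightly), keeps just the Company, Contact and Company_Contact tables, and adds a hypothetical `name` column so that there is something to look at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys on request
conn.executescript("""
CREATE TABLE Company (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Contact (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Company_Contact (
    companyId  INTEGER NOT NULL REFERENCES Company (id) ON DELETE CASCADE,
    contactId  INTEGER NOT NULL REFERENCES Contact (id) ON DELETE CASCADE,
    list_index INTEGER NOT NULL
);
INSERT INTO Company VALUES (1, 'Acme');
INSERT INTO Contact VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO Company_Contact VALUES (1, 1, 0), (1, 2, 1), (1, 3, 2);
""")

index_query = "SELECT contactId, list_index FROM Company_Contact ORDER BY list_index"

# Deleting Bob removes his join row, but leaves a gap in the list indices.
conn.execute("DELETE FROM Contact WHERE id = 2")
with_gap = conn.execute(index_query).fetchall()   # [(1, 0), (3, 2)]

# The gap has to be closed by hand (or by the ORM re-saving the Company).
conn.execute(
    "UPDATE Company_Contact SET list_index = list_index - 1 "
    "WHERE companyId = 1 AND list_index > 1"
)
repaired = conn.execute(index_query).fetchall()   # [(1, 0), (3, 1)]

# Deleting the Company cascades to the join table only: the Contacts stay behind.
conn.execute("DELETE FROM Company WHERE id = 1")
join_rows = conn.execute("SELECT COUNT(*) FROM Company_Contact").fetchone()[0]
orphans = conn.execute("SELECT name FROM Contact ORDER BY id").fetchall()
```

After the last statement the join table is empty, but Alice and Carol are still in the Contact table without any Company pointing at them.
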
The Iterative Approach
So let's now introduce the Iterative approach. This approach is centered around the principle that you can add one domain object at a time and that the existing domain objects know nothing about the objects that are added later. As we will see, besides the iterative nature, this approach provides some additional benefits, too.


As we can see in the picture, basically all arrows are flipped the other way. The multiplicities, however, remain where they were and are therefore on the other side of the arrow now! Let's go through the model:

  • Company: A company that you do business with. Holds its own properties, but knows nothing about the other classes.
  • Contact: A representative of a Company. Each Contact has a reference to the Company it belongs to. Multiple Contacts could refer to the same Company, but one Contact refers to one Company. A Contact knows nothing of ContactMoments.
  • ContactMoment: A moment of contact. Has a reference to the Company it belongs to and to the Contacts it applies to.
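
Sketched in Python, with the same hypothetical `name` and `description` placeholder fields as before, the flipped references could look like this. The helper `contacts_of` is an illustrative stand-in for the query that a repository or ORM would run; it is not part of any library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Company:
    name: str  # hypothetical property; no reference to Contact or ContactMoment


@dataclass
class Contact:
    name: str         # hypothetical property
    company: Company  # many Contacts may refer to the same Company


@dataclass
class ContactMoment:
    description: str  # hypothetical property
    company: Company
    contacts: List[Contact] = field(default_factory=list)


def contacts_of(company: Company, all_contacts: List[Contact]) -> List[Contact]:
    """Illustrative lookup: find the Contacts that point at the given Company."""
    return [c for c in all_contacts if c.company is company]


acme = Company("Acme")
globex = Company("Globex")
alice = Contact("Alice", acme)
bob = Contact("Bob", globex)
call = ContactMoment("Phone call", acme, [alice])

acme_contacts = contacts_of(acme, [alice, bob])
```

Each class only depends on classes that were written before it, so Company can be finished and shipped before Contact even exists.
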
Let's look at the relational model:
CREATE TABLE Company (
 id                  BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
 ... properties of Company ...
)

CREATE TABLE Contact (
 id                  BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
 companyId           BIGINT NOT NULL,
 ... properties of Contact ...
 
 CONSTRAINT FOREIGN KEY (companyId) REFERENCES Company (id) ON DELETE CASCADE
)

CREATE TABLE ContactMoment (
 id                  BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
 companyId           BIGINT NOT NULL,
 ... properties of ContactMoment ...
 
 CONSTRAINT FOREIGN KEY (companyId) REFERENCES Company (id) ON DELETE CASCADE
)

CREATE TABLE ContactMoment_Contact (
 contactMomentId     BIGINT NOT NULL,
 contactId           BIGINT NOT NULL,
 list_index          INTEGER NOT NULL,
 
 CONSTRAINT FOREIGN KEY (contactMomentId) REFERENCES ContactMoment (id) ON DELETE CASCADE,
 CONSTRAINT FOREIGN KEY (contactId) REFERENCES Contact (id) ON DELETE CASCADE
)
And here are the benefits of this approach:

  • It is highly iterative: you can completely develop the Company class top-to-bottom, show it to the customer, then develop the Contact class, show it to the customer and finally the ContactMoments. You never have to take future expansions into account.

  • You are down to only 4 database tables. We still need one many-to-many table, but we have nevertheless achieved a 33% decrease in the number of tables.

  • Suppose you want to delete a Contact. All you have to do is delete the Contact! The Company will not be affected and if a ContactMoment references the Contact, it will remain, which is probably what you want, since it still represents a moment of contact with the Company and it might reference other Contacts. The entry in the many-to-many table will be deleted, though, which is, again, what you want.

  • Suppose a Contact moves to another Company or you have mistakenly added a Contact to the wrong Company. Just update the Contact with the correct Company-reference and you are done!

  • In fact, there are no more list-indices associated with Contacts and ContactMoments in a Company. They are constructed at runtime, based on the order-by clause of the SQL (or HQL) query. Therefore, you can easily change the ordering of the Contacts and ContactMoments should you need to.

  • If you delete a Company now, all Contacts and ContactMoments will automatically be cleaned up (because of the ON DELETE CASCADE). No more manual deletions.
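
The same kind of experiment as for the traditional schema shows these benefits at work. Again this is a sketch under assumptions: SQLite via Python instead of MySQL, only the Company and Contact tables, and a hypothetical `name` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys on request
conn.executescript("""
CREATE TABLE Company (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Contact (
    id        INTEGER PRIMARY KEY,
    companyId INTEGER NOT NULL REFERENCES Company (id) ON DELETE CASCADE,
    name      TEXT
);
INSERT INTO Company VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO Contact VALUES (1, 1, 'Carol'), (2, 1, 'Alice'), (3, 2, 'Bob');
""")


def contact_names(company_id):
    """The order comes from the ORDER BY clause, not from a stored list index."""
    rows = conn.execute(
        "SELECT name FROM Contact WHERE companyId = ? ORDER BY name",
        (company_id,),
    )
    return [name for (name,) in rows]


before_move = contact_names(1)   # ['Alice', 'Carol']

# Moving Bob from Globex to Acme is a single UPDATE on the Contact itself.
conn.execute("UPDATE Contact SET companyId = 1 WHERE id = 3")
after_move = contact_names(1)    # ['Alice', 'Bob', 'Carol']

# Deleting Acme now removes its Contacts as well, thanks to ON DELETE CASCADE.
conn.execute("DELETE FROM Company WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM Contact").fetchone()[0]
```

No index has to be renumbered at any point, and no Contact is left behind after the Company is gone.
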
Some considerations

  • You might argue that you do want your Contacts referenced in your Company, because you will want to display them on the company page anyway. In the Iterative model, you have to perform a separate query to retrieve the Contacts of a Company, whereas they were readily available to you in the Traditional model. This is certainly true from an OO-perspective, but when it comes to performance and the number of queries to the database, it makes no difference: your ORM-framework has to run the same query to retrieve all the Contacts of a Company in the Traditional model too. The only thing you lose is the convenience of not having to write that query yourself.

  • You might also argue that this goes against the object oriented principle that you should have proper "has-a" constructs. This might be true, but on the other hand, it is just a matter of how you look at it. Let's go back to our earlier example of the Car that "has" Wheels. This might seem logical, but does a Car really "care" whether it has wheels? The axles will turn when the engine is on, with or without wheels. You could argue that a Wheel "has-a" Car that provides it with torque. This may seem far-fetched, but I want to show you that traditional "has-a" relationships are tightly linked to the human interpretation of how things are and that the traditional OO principle may not be the only way to model things.

    Most people have principles in order to avoid the effort of thinking

    - Fliegende Blätter -

  • You might say: "Gert-Jan! You should never ever let your OO model influence your relational model and vice versa! They should be independent and the mismatch should be solved at the ORM layer!" I then would say: "You are absolutely right!" After all, my main motivation for this approach is the iterative nature of it: the ability to complete one part of an application without having to think of the next part. Trust me, it makes development so much easier! The positive consequences on the DB-side are just an additional benefit. Who says there's no such thing as a free lunch?

  • And finally: I'm definitely not suggesting that this is a Golden Hammer or a Holy Grail! There's no such thing... All I'm saying is that this approach could lead to faster and easier development in use-cases that lend themselves to it. And you can easily mix this approach with other Design Patterns, so that you can take full advantage of all the good things that OO has to offer!
Update: I got a lot of questions about why I would use join-tables to model the one-to-many relations in the Traditional approach. Basically, the reason is twofold:

  • I don't want to "pollute" domain tables with list indices, as I said before.

  • Suppose you have a one-to-many relationship Car -> Wheel. Many people suggested that a Wheel should have a foreign key to Car. But now we are going to add the one-to-many relationship Bike -> Wheel. Now what? You're stuck. When you use join-tables, you simply add a table for the new relationship. In the Iterative model, the one-to-many becomes many-to-one, so in that case, you can do it with a foreign key. But as I said, you should only apply this approach when applicable. In the Car, Wheel, Bike story, it is clearly not the best way to go, but there are a lot of cases where it is.
And please remember, the topic is about the Iterative approach in OO, the DB-things are just a sidestep. End of update.