Domain Modeling and Object Relational Mapping in Agile Settings
This is the first in a series of articles about Object Relational Mapping.
Lately, I have seen many agile projects choosing code generation tools to implement a persistence mapping layer, using Hibernate as their ORM of choice.
In most cases, the relational schema has already been defined by the customer and the team is developing the backend system around it. This common scenario leads to the bottom-up approach to building the persistence layer, in which development begins from an existing database schema and data model.
One can see why it is hard to resist the temptation of using code generation tools to generate the Java persistence entities as well as the DAO layer. Many developers have chosen this approach believing they were following the XP mantra: “Do the Simplest Thing that could possibly work”.
“The simplest thing” is sometimes misinterpreted as “the quickest thing”, since code generation is a lot quicker than hand-crafting code. In this article series, I will talk about the dangers of the reverse-engineering, code-generation approach, and look into how to apply the Simplest Thing without compromising the quality of the domain model.
Reverse mapping a database schema into an object domain model is a nice idea, but it is fundamentally flawed given the impedance mismatch between the two models. It’s like trying to translate from one language to another using Yahoo Babel Fish. It can work for simple text, but as soon as the language structure of the original text becomes more complex, things get lost in translation or misinterpreted, and in some cases the final result becomes completely bogus. The same goes for these tools: they can work if the database schema is very simple.
The Paradigm Mismatch versus Code Generation Tools
Granularity
Perhaps the most pervasive problem is granularity, which refers to the relative size of the types we deal with. To illustrate the problem, consider the following USERS table:
create table USERS (
USERNAME varchar(30) not null primary key,
FIRSTNAME varchar(50) not null,
LASTNAME varchar(50) not null,
STREET varchar(100),
CITY varchar(100),
STATE varchar(15),
ZIPCODE varchar(10)
)
If we ran a code generation tool (pick your favorite; it does not matter) against this schema, it would generate the following Java class (complete code omitted for simplicity):
@Entity
@Table(name = "USERS")
public class Users implements Serializable {
    private String city;
    private String userName;
    private String firstName;
    private String lastName;
    private String street;
    private String state;
    private String zipcode;
    ...
}
What is the problem with the above class?
The tool is doing a “literal” translation from the database schema into Java. Just because the database designers chose to mix user information and address information in the same SQL table, most likely to avoid table joins, does not mean the Java equivalent has to follow suit. A much better mapping or translation would be:
@Entity
@Table(name = "USERS")
public class Users implements Serializable {
    private String userName;
    private String firstName;
    private String lastName;
    private Address address;
....
}
In the above class, the Address type is being used to replace the scattered address attributes. By doing so we not only construct a semantically richer model, but we can now re-use the address concept elsewhere in the application. Here, the Address class is an example of a Value Object (See Domain Driven Design by Eric Evans). The point here is that only the domain modeler has this kind of knowledge to construct a rich object model.
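As a rough illustration of what such a Value Object might look like, here is a minimal hand-written sketch. It is not taken from any tool's output: the class shape, the getters, and the choice to make it immutable are assumptions made for the example, and the field names simply mirror the STREET, CITY, STATE and ZIPCODE columns of the USERS table. The JPA annotations are left out so that the code compiles on its own.

```java
import java.util.Objects;

// Illustrative sketch only: a hand-written Address value object.
// With JPA annotations, a class like this would typically be marked
// @Embeddable (and the field in Users marked @Embedded) so that its
// columns stay in the USERS table. Those annotations are omitted here
// so the example compiles without a persistence library.
final class Address {
    // All four columns are nullable in the USERS table, so any field may be null.
    private final String street;
    private final String city;
    private final String state;
    private final String zipcode;

    Address(String street, String city, String state, String zipcode) {
        this.street = street;
        this.city = city;
        this.state = state;
        this.zipcode = zipcode;
    }

    String getStreet() { return street; }
    String getCity() { return city; }
    String getState() { return state; }
    String getZipcode() { return zipcode; }

    // A value object is compared by its attributes, not by identity.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Address)) return false;
        Address that = (Address) other;
        return Objects.equals(street, that.street)
                && Objects.equals(city, that.city)
                && Objects.equals(state, that.state)
                && Objects.equals(zipcode, that.zipcode);
    }

    @Override
    public int hashCode() {
        return Objects.hash(street, city, state, zipcode);
    }

    @Override
    public String toString() {
        return street + ", " + city + ", " + state + " " + zipcode;
    }
}
```

Two Address instances built from the same street, city, state and zipcode compare as equal, which is the behavior one would expect from a value with no identity of its own. That is exactly the kind of semantic decision a generator reading column definitions cannot make.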
In the next segment, we will look into other aspects of the Paradigm Mismatch so that we can be better prepared to fix any translation issues when using the bottom-up approach.

March 1st, 2009 at 12:09 pm
The example of name and address is such a great example because it seems so self-evidently simple, but upon closer inspection it’s not. For example, what about past addresses, billing addresses, email addresses, or addresses without occupants? If I don’t need these things today, might I need them in the future?
Database models (or any static model, for that matter) can’t possibly capture all the semantics you might need. It’s kind of like saying that the dictionary holds all the meanings of the infinite number of sentences you could potentially construct.
March 6th, 2009 at 1:34 am
That is a great start – I’m quite interested in the series on ORM please continue!
I’ve read Eric Evans – Domain Driven Design and love the Ubiquitous Language concept he implores us to use.
http://domaindrivendesign.org/discussion/messageboardarchive/UbiquitousLanguage.html
I believe we can learn a lot from his modeling concept of Entities (aka Reference Objects) versus Value Objects. I learned a lot from his book and wish that UML had his concept of surrounding the Entity with a line (circle – enclosure) to denote the whole vs. the sub-part. This semantic is what is missing in the DB model of the table; it is what you have added in your object model of the User class, a containment of Address. However, it is never explicitly called out in any of our modeling tools! Strange! Do you think it is so unimportant?
March 6th, 2009 at 11:49 am
Thanks for posting comments!
Object-oriented modeling provides a different level of abstraction when compared to relational modeling. In the object world, the key aspects are Information Hiding, Data Abstraction, Encapsulation, Polymorphism and Inheritance.
In the example above, we utilized Data Abstraction when we created the Address type to better represent the address concept (the Address type is an example of a Value Object in Eric Evans’s terminology).
The Address type is not originally present in the USERS table, mainly because the relational model does not allow for the creation of user-defined types (some databases do, but it’s not a standard), and if the relational modeler were to model Address as a table, it would require a table join to link the two at run time. That could be an unjustified performance penalty, depending on the volume of these two tables.
So, when defining Java entities, why not take advantage of the powerful abstraction mechanisms at hand? Tools that auto-magically generate the domain model from a database schema do not possess the knowledge to do this correctly.
In the next article I will explore other aspects of ORM. Stay tuned!