O/R, databases and “leaky abstractions”

O/R has been on my mind in varying degrees since I first heard Scott Ambler (http://www.ambysoft.com/mappingObjects.html) speak at SD Expo a few years back.  Every time I have gotten close to taking the leap, by attempting to implement Scott’s UML model, learning EJB (poorly), and now by evaluating various .net O/R products, I am struck more by what I’d be losing than what I’d be gaining. Sometimes it makes me feel like a Luddite, but lately I have been putting more thought into my discomfort.  Joel Spolsky’s clever “Law of Leaky Abstractions” has put more perspective on the problem for me. Seeing my own systems grow to many millions of records has had an impact too. The whole java vs .net petshop discussion contributed to my thinking. Part of the problem is that simpler cases used to prototype technologies all work, every one. Every one works, so long as the data size remains smallish, the assumptions remain close the original assumptions, the complications minimal. After that, things start to get interesting.

As much as I have admonished, browbeaten, berated my programmers over the years to reduce the importance of relational databases and especially stored procedures in their designs, I have come to the conclusion that ignoring the power of a relational database system is foolish. Failure to understand and embrace the very important features they offer is foolish. It seems to me that part of the spirit of O/R is to make the database an invisible persistence mechanism, unimportant to the overall architecture. In most cases, I think this is a mistake. This I think could easily become a huge “Leaky Abstraction”, on a scale that makes COM+ transactions seem like a good idea.  If you look at a database server as a system with tables and indexes for which you must use this clunky 1960’s vintage language to populate and query, then the view that making this transparent is probably very appealing. However, if you look at a database as a powerful system of storing, caching, indexing, optimizing the search of values, I think you will view the database as a first class asset, to be used to it’s full potential. This isn’t a question of whether or not to use stored procedures or not. It’s a question of how to include the ability to use the power that hand tuned sql, which could be expressed as stored procedures, parameterized dynamic sql or plain sql.

This view is not the same as believing that the database is the “first tier”, a phrase used by Frans Bouma to describe those who prefer to use stored procedures for all their database access calls in a recent advanced-dotnet post. I would never choose to base my domain model directly on my database schema. I think you have to recognize that stored procs and dynamic sql are both commands to a database. It’s a recognition that the database can do things that are very difficult or impossible to accomplish in memory in your domain model. And, that some of those commands will require hand tuned queries, and intimate knowledge of the structure and context of a particular database to adequately construct. So, in my ideal O/R mapping scheme, while I may be perfectly fine with generated statements, I insist on having the ability to do my object mapping based on hand written code when needed. I insist on knowing that if a database performance problem appears, as it inevitably will, I will have a way to carefully examine a query, tune it, and replace it with a different query or a stored proc call.  Whether or not it’s dynamic sql is not the issue, it’s whether or not I can substitute a bad query that’s the issue. And, it’s not trying to hide the database as an ugly step child.

In my mind, it should be possible to have transparent database access and still allow customized sql. In my database layer, all (read most) sql is stored outside the code. The statements are referred to by name with parameters, but the names do not have to correspond to anything database specific. The names can be thought of as domain model data commands. The statements could easily be generated from domain model metadata, just as I can generate statements from tables and stored procedures now.  A factory produces the actual command calls and executes them, an architecture that allows for interception, multiple databases and data providers, chains, decorators and all the other OO goodness you would like to apply to a problem like this. Calling code has no idea if it’s using a stored procedure or parameterized sql. Since I don’t have an O/R layer, the programmer must know which commands to call. If an O/R tool used a DAL approach like mine, I don’t think I’d have the same sort of hesitation about using the tool.

Finally, in my experience, databases and code evolve at different timeframes, especially if many applications share the same data. Even if you have the ability to architect every data call to use a shared domain model, in any large organization, you will not be able to migrate all of your applications at the same time. If project timeframes overlap, you always have a long cycle of retiring old, maintaining current and rolling out new code. In most environments, you also have software that simply will never be able to use your domain model directly. And, in my experience, the database will outlive every specific application, architecture and hot trend. The trick then is to acknowledge this basic fact, acknowledge the power of the database, but not let it dictate how any particular application or domain model should execute.

Advertisements

2 responses to “O/R, databases and “leaky abstractions”

  1. Hi Philip,

    VERY interesting!

    🙂

    Best Regards,

    Jimmy

    http://www.jnsk.se/weblog/

    ###

  2. Michael K. Campbell

    Philip,

    Great perspectives. I think many O/R stategies are based around avoiding what many people don’t like: work. As programers we want to spend our time doing what is cool/l33t/sexy. Coding sprocs just isn’t that (unless you’re a freak like me — but even I get moderately tired of the CRUD sprocs).

    Accepting, however, the fact that the database is a tool, and not a liability seems to be the greatest problem i see in most O/R schemes. If an app is going to live/grow then it will surely amass data if it is used. Once you get to a point where performance issues HAVE to be addressed… generated inline sql just scares me.

    I MUCH prefer to

    1) store data in the database in a solid/normalized fashion. (proper normalization is essential as it allows for extensibility down the road… even at the expense of performance).

    2) build a logical schema on top of my physical schema… i.e. mostly views etc, as well as sprocs and the like. The purpose of this layer/schema is to abstract out full-blown entities — as my app will see them. Performance is addressed by allowing flexibility under this layer (sometimes even to perf tables where complex values are routinely calculated/stored for views that sit over the perf table representing entitites).

    3) DAL. Here’s where I can see an O/R tool being useful… as long as I’ve got some control over it. For the most part I see it as a good option to pull in collections of my entitities — binding directly against my LOGICAL schema (which affords me the ability to move filegroups, change indexes, keys, etc down in the physical schema as needed, as well as using perf tables etc.) ….

    anyhow…

    great insights….