ObjectSpaces followup

A couple of weeks ago I boldly went where only a newbie can go and posted my notes from the ObjectSpaces talks at the PDC. Some of the comments got me wondering whether I had dreamed it all and gotten it wrong, so I followed up to find out.  Thanks to the laudable openness that Microsoft has been extending across its whole development community, I got some very good answers, and I think a much better understanding of how ObjectSpaces (OS) will work and, maybe more importantly, where it will fit in the panoply of ORM tools.  Having that understanding has made me anticipate the release much more than I had, because I can see exactly how I would use it in places where I might have used DataSet-based techniques in the past.

First of all, on the more controversial “shortcomings” I identified, I was essentially correct in my understanding of how OS works.  OPath is implemented in T-SQL, even more so than I guessed. And OS does have to keep a *copy* of all the data used to create a persistable collection, so that it can determine whether anything has changed and persist those changes without the objects themselves “telling” OS they’ve changed. But as always, the devil’s in the details, and thanks to some detailed messages from Microsoft’s Andrew Conrad, I now have more of these to help paint a fuller picture.

“OPath is executed in T-SQL” might be a better description. The way Andrew described it, the syntax is parsed and turned into a graph of tuples, which at the last step is turned into T-SQL and executed. This design should enable them to implement the last step in some other way in the future, including a design based on the in-memory object graph.  However, that won’t happen in the OS 1.0 release timeframe.  So, if you use OPath, you must know that you will hit the database, and you will get a new collection.  That led to the fairly obvious question of object identity. I gave an example of a problem of finding the intersection of two sets, one of which you already have in memory, and one of which you would find with OPath.  If I had implemented this, I would expect that even if two object instances, a1 and a2, were created from the same data, a1 == a2 would not be true, but a1.Equals(a2) would.  That’s a heavy burden, but OS has provisions to preserve object identity so that on subsequent OPath queries a1 == a2 would be true, allowing simple and fast comparisons.  This is accomplished, in ways I haven’t attempted to understand yet, with the ObjectManager. Score one for ObjectSpaces.
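To make the identity point concrete, here is a minimal sketch of the general identity map idea. It is written in Python only to keep it short, it is not ObjectSpaces code, and I have no idea whether the ObjectManager works this way internally. The names Customer, IdentityMap and materialize are made up for the illustration. In Python, == plays the role of Equals (value equality) and "is" plays the role of reference equality.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical entity; a dataclass compares by field values with ==.
    id: int
    name: str


class IdentityMap:
    """Hypothetical map that returns the same instance for the same key."""

    def __init__(self):
        self._by_key = {}

    def materialize(self, row):
        key = row["id"]
        # Only build a new object the first time this key is seen.
        if key not in self._by_key:
            self._by_key[key] = Customer(row["id"], row["name"])
        return self._by_key[key]


row = {"id": 1, "name": "Ada"}

# Without an identity map: equal values, but two distinct instances.
a1 = Customer(**row)
a2 = Customer(**row)
print(a1 == a2, a1 is a2)

# With an identity map: the second "query" hands back the first instance.
identity_map = IdentityMap()
b1 = identity_map.materialize(row)
b2 = identity_map.materialize(row)
print(b1 == b2, b1 is b2)
```

With identity preserved, the intersection of an in-memory set and a freshly queried set can be done with cheap reference comparisons instead of comparing every field of every object.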

The other point that really bothered me was the idea that in order to track changes to a collection of objects, OS had to keep a copy of the original data, essentially doubling the memory needed to allow seamless object persistence.  The answer, while still bothersome in some ways, is that the original data is not kept as a copy of the objects in the collection, but instead as a Value object that is created from the data source as the object graph is materialized.  This technique results in a lower memory footprint, and there is a lot of experience to show that it is feasible.  Why? Because it is the same technique the ADO.NET DataSet uses to track changes, allowing the DataAdapter to do its thing. We have a lot of experience with DataSet at this point, and most of us have a pretty good feel for where we can and cannot use DataSet on large collections of data. That makes the design decisions easier, but there is still a problem.  The technique can’t scale upward indefinitely.  If you base your design on automatic object persistence, at some point the operation will become too large and your system will run out of memory.  This is where it got interesting, but more hands-on experience will be required to fully digest it.
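Here is a rough sketch of snapshot-based change tracking as I understand the idea: keep the original values on the side, and compare against them when it is time to persist. Again this is Python for brevity, not ObjectSpaces code, and ChangeTracker, track, changes and Order are names I invented for the example. The point is that the object itself never has to announce that it changed.

```python
class Order:
    # Hypothetical plain object; it knows nothing about persistence.
    def __init__(self, order_id, quantity):
        self.order_id = order_id
        self.quantity = quantity


class ChangeTracker:
    """Hypothetical tracker that stores a snapshot of original field values."""

    def __init__(self):
        # id(obj) -> (obj, snapshot). Holding obj keeps its id stable.
        self._originals = {}

    def track(self, obj):
        # Copy only the field values, not the whole object.
        self._originals[id(obj)] = (obj, dict(vars(obj)))
        return obj

    def changes(self, obj):
        # Compare current values with the snapshot: field -> (old, new).
        _, snapshot = self._originals[id(obj)]
        current = vars(obj)
        return {
            name: (old, current.get(name))
            for name, old in snapshot.items()
            if current.get(name) != old
        }


tracker = ChangeTracker()
order = tracker.track(Order(42, 3))
print(tracker.changes(order))  # {}

order.quantity = 5
print(tracker.changes(order))  # {'quantity': (3, 5)}
```

The memory cost is visible in the sketch: every tracked object carries a second set of values, which is exactly why this cannot scale upward indefinitely.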

First, the feature used to track changes, and as a consequence store a Value object, can be turned off.  In the case where you are building largeish collections that won’t change, or for which you will handle changes outside the OS persistence, you can gain some of the memory back.  Score 2 for ObjectSpaces, and I wish I could do that with the DataSet, where I often cache a copy and use expressions to get filtered tables, but never plan to update. The next part of this, though, is that in reality, on large data sets, you will run into memory constraints whether or not you use ObjectSpaces, simply because you decided to have an in-memory graph of objects.  On to the next interesting piece.

Here I will just quote what Andrew had to say:

“At an high level, ObjectSpaces will have two modes - a streaming mode and a caching mode.  In the streaming mode, the user will work entirely with readers and will entirely own the lifetime of the object.  In other words, once the object is materialized, ObjectSpaces will no longer know at all about the object.  That however has some down side in that if the user wants to update the object or utilized an object identity map, they will have to write that code them self or utilize the support of ObjectSpaces.  That is why we have tried to provide explicit interfaces to all ObjectSpaces services so that the user can pick and choose which ones to use with a knowledge of the overhead required.  I believe this will allow most users of large amounts of data in a hybrid streaming/caching mode and develop solutions which are at least as scalable and performant as their overall architecture will allow.

Pure caching mode will cache all data and will be similar to what is provided by the dataset.  That is it is intended for smaller sets of data where the user has chosen ease of use and functionality over performance.  In cases the amount of data being worked on is small, the difference in performance will be minimal.”
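The difference between the two modes can be sketched in a few lines. This is my own illustration in Python, not the ObjectSpaces API: fetch_rows stands in for a database reader, and stream_orders and CachingStore are hypothetical names. The streaming version hands each object to the caller and forgets it; the caching version holds on to everything it has materialized.

```python
def fetch_rows():
    """Hypothetical stand-in for a forward-only database reader."""
    for i in range(1, 6):
        yield {"id": i, "total": i * 10}


def stream_orders():
    # Streaming: materialize one object at a time and keep no reference.
    # The caller owns the object's lifetime from here on.
    for row in fetch_rows():
        yield dict(row)


class CachingStore:
    """Hypothetical store that keeps every materialized object in memory."""

    def __init__(self):
        self.cache = {}

    def load_all(self):
        # Caching: everything stays reachable through the store.
        for row in fetch_rows():
            self.cache[row["id"]] = dict(row)
        return list(self.cache.values())


# Streaming: only one order needs to be alive at any moment.
streamed_total = sum(order["total"] for order in stream_orders())

# Caching: all five orders stay in memory after the call.
store = CachingStore()
cached_total = sum(order["total"] for order in store.load_all())

print(streamed_total, cached_total, len(store.cache))
```

Both approaches compute the same answer; what changes is how much stays in memory afterwards, and who is responsible for identity and updates.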

With this knowledge, and hands-on experience of course, an architect can probably make some good choices about where ObjectSpaces can fit into the wide variety of applications we are asked to write. At the smaller end of the spectrum, in terms of the amount of data in the scope of the application or service, ObjectSpaces has the potential to really speed up development and still preserve an elegant, first-class object-oriented style. You will not really appreciate how important it is that the design does not force developers to implement interfaces and proxies, à la EJB, unless you have lived through that, and this will be a major productivity gain.  And it seems to me that the story of what happens as you increase the amount of data will be influenced more by the choice of in-memory collections vs. streams than by whether or not you use ObjectSpaces, while ObjectSpaces will support you to some degree in either approach. Score 3.

So, as with ASP.NET Whidbey, I can't wait.

