The good ol’ DataReader vs DataSet debate

As a diversion from more weighty architecture issues, this question continues to pop up in mailing lists and forums. So, for posterity’s sake and so I don’t have to keep repeating myself…

A developer asks:

> Is it said that DataReader is the most efficient way of reading data, but
> passing them across tiers is not recommended.  I was wondering how would I
> take advantage of the efficiency in performance of datareader in a n-tier
> architecture?

Yes a datareader can be faster, perhaps as much as 4 times faster. Impact on the database however is probably impossible to measure compared to dataset because as you point out, the dataset fill methods internally use a datareader. The database is far more likely to be a bottleneck in your complete system than your .net code, so generally I find arguments about the efficiency of the datareader a little hard to justify.

Like most streams, datareaders have a critical life cycle that must be carefully maintained. This includes opening and closing the reader and the connection. It is this that should make you pause before considering passing it between tiers.

The main reason to use a datareader is to process data in a stream rather than in memory. You do this because you want to process things in such a way to minimize the total amount of memory used at one time. In this scenario, you can often have the database read take longer than you would if you had read it into memory.

Consider this:

open connection
execute query
fill dataset
close connection
loop through dataset
- handle each row of data to do some operation
dispose of dataset

Compared to:

open connection
execute query
loop through reader
- handle each row of data to do some operation
close reader
close connection

Which one will impact the database more? Which one will impact the server the .net code runs on? CPU on database? CPU on server? memory on server? Locking in DB? How does size of the data impact the balance? When you can comfortably answer all these questions, not hard to do really, you will have a much better grasp of the “right” solution in your specific situation. You should also consider than answer in the light of the realistic number of simultaneous users your system will have. Simple throughput speed based on a single user will not be a very good predictor of your application’s success or scalability!

Throw in the possibility that you can cache the dataset and eliminate the first 4 steps of the dataset psuedocode, and you get a somewhat different picture of the design trade offs involved.

So enough on the efficiency question. Between tiers, it is impossible to guarantee that a user of your datareader passed from your data tier


Comments are closed.