by nik >> Mon, 25 Mar 2002 12:43:23 GMT
This deferred (or background) update approach is useful for cases where users don't need immediate visibility of the deferred updates. Examples that come to mind are application-level auditing that requires updating global collections, and deletes of complex structures such as component assemblies. Deletes of complex aggregate structures are reasonably common, and users might not want to sit around waiting for the entire operation to complete. One way to handle this is to perform a 'logical delete' of the assembly in the online transaction by marking a top-level object in some way, and then employ a background sweeper process to perform the physical object deletes. Note that in some cases it might be advisable for the background sweeper process to break up long transactions into parts, ensuring that collections are never locked for very long.
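To make the idea concrete, here is a minimal sketch in Python rather than Smalltalk. Everything here is invented for illustration (the Assembly class, the lock, the chunk size); it is not a real GemStone API, just the shape of the technique:

# Illustrative only: an in-memory stand-in for the real database.
import threading
from dataclasses import dataclass, field

CHUNK_SIZE = 100  # cap on physical deletes per 'transaction'

@dataclass
class Assembly:
    name: str
    components: list = field(default_factory=list)
    deleted: bool = False  # the 'logical delete' flag

store_lock = threading.Lock()  # stands in for the collection lock
assemblies = []

def logical_delete(assembly):
    # Online transaction: cheap, just flag the top-level object.
    with store_lock:
        assembly.deleted = True

def sweeper_pass():
    # Background process: physically delete components in small
    # chunks so the shared collection is never locked for long.
    for assembly in [a for a in assemblies if a.deleted]:
        while assembly.components:
            with store_lock:  # one short 'transaction' per chunk
                del assembly.components[:CHUNK_SIZE]
        with store_lock:
            assemblies.remove(assembly)  # finally drop the root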
In this order entry example you would need to consider whether the order number is required at the time of order entry, so that it can be provided to a client placing a phone order, for example. In that case the background task has to assign the order numbers [unless you choose to assign order numbers in a transaction preceding the order entry, which has problems of its own but also does away with the need for a deferred update]. Does the order entry clerk sit on the call waiting for an async order number notification? On the other hand, this approach *might be* OK for a web ordering system, where the order is placed online and the order number is e-mailed back to the client some time later.
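For the web ordering variant, the deferred numbering might be shaped something like this (again a hypothetical Python sketch; the queue, the number source, and send_email are placeholders I've made up):

# Illustrative only: accept orders without a number; a background
# worker assigns numbers later and notifies the client by e-mail.
import itertools
import queue

pending = queue.Queue()        # orders awaiting a number
numbers = itertools.count(1)   # stand-in for a number generator

def place_order(order):
    # Online transaction: accept the order, defer the numbering.
    pending.put(order)

def numbering_worker():
    # Background task: assign numbers and e-mail them back.
    while True:
        order = pending.get()
        order['number'] = next(numbers)
        send_email(order['email'],
                   f"Your order number is {order['number']}")

def send_email(address, body):
    print(f"to {address}: {body}")  # placeholder for real mail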
Before embarking on this design strategy, I would advise some form of 'ballpark' performance analysis to determine whether this approach is required at all. The analysis should take into account expected transaction sizes, volumes, and response time requirements. You can actually come up with ballpark estimates before writing a line of code. The more empirical data from current application statistics you can feed into the analysis, the more you can narrow the range between worst-case and best-case performance.
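A back-of-the-envelope calculation like the following (all figures invented for illustration) already tells you whether the allOrders lock is anywhere near saturation:

# Back-of-envelope estimate, with invented figures.
orders_per_hour = 3600   # expected peak volume
commit_ms = 50           # estimated or measured commit time

arrival_rate = orders_per_hour / 3600.0        # orders per second
utilisation = arrival_rate * (commit_ms / 1000.0)
print(f"allOrders lock utilisation: {utilisation:.0%}")
# Anything approaching 100% means commits queue up behind the
# collection lock and response times degrade sharply.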
Craig mentioned in his post:
"My initial reaction to this problem would be to make setting the root reference the very last thing in the transaction then the collection would be locked for the minimum time, so as to avoid going down this path."
That is good advice and might be all that is required in many cases. This, of course, assumes you don't need to look up the collection earlier in the transaction. If you do, then you would normally opt to exclusive-lock the collection prior to the lookup, to avoid upgrade deadlocks. In the order entry example: locking allOrders, obtaining the next order number for display, and updating the allOrders collection could be made the last operations prior to commit.
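In rough Python terms, with a threading lock standing in for the database's exclusive collection lock (purely a sketch of the ordering, not the real API):

import threading

all_orders_lock = threading.Lock()
all_orders = []

def place_order(order):
    # ... validate the order, build line items, etc., unlocked ...
    with all_orders_lock:                  # exclusive from the start,
        order['number'] = len(all_orders) + 1  # so there is no read
        all_orders.append(order)               # lock to upgrade later
    # commit happens here; the lock was held only for the number
    # lookup and the insert, the last operations before commit.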
As Craig also points out, you need to take into account the time taken to commit a transaction, which you can measure using existing or specially constructed 'representative transactions' and an appropriate transaction driver. To do this properly you need to simulate a realistic transaction load, since queuing effects make the results non-linear (5 transactions per second don't tend to arrive at precise 200 ms intervals). If you don't have the "real thing" to measure, you can also use well-established analytical or simulation models based on queuing theory to compute results from whatever empirical data is at hand. The commit time will be extended considerably if network latency is involved, i.e. when the processing node (e.g. an app server) is separated from the database server by a network connection.
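If you want a feel for why the results are non-linear, the textbook M/M/1 formula is enough for a ballpark (the 50 ms commit time below is an invented figure):

def mm1_response_time(arrival_rate, service_time):
    # Mean time in system for an M/M/1 queue: S / (1 - utilisation).
    rho = arrival_rate * service_time
    if rho >= 1.0:
        raise ValueError("queue is unstable at this load")
    return service_time / (1.0 - rho)

commit_time = 0.050  # 50 ms per commit (invented figure)
for tps in (5, 10, 15, 19):
    ms = mm1_response_time(tps, commit_time) * 1000
    print(f"{tps} tps -> {ms:.0f} ms average")
# 100 ms at 10 tps, 200 ms at 15 tps, 1000 ms at 19 tps: the same
# 50 ms commit, but queuing dominates as the load climbs.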
One more thought on 'lock exceptions'
Geoff mentioned:
"All the collections are large but 'allOrders' could potentially become a bottleneck if orders come in thick and fast. The bottleneck would result in lock exceptions on the clients as they all attempt to update the 'allOrders' collection at the same time."
The emphasis on 'lock exceptions' disturbed me a bit. Don't response time requirements come first when considering application 'hot spots' or bottlenecks? A 'lock exception' means that a required lock wasn't acquired within an application (or system) specified timeout period, which I would expect to happen only in extreme (or out-of-band) cases. If there is a tendency for these to occur frequently, then acceptable response times are already "out the door", are they not?