Manual versus Automatic collections

allistar
Posts: 156
Joined: Fri Aug 14, 2009 11:02 am
Location: Mount Maunganui, Tauranga

Re: Manual versus Automatic collections

Postby allistar » Fri Nov 26, 2010 8:50 am

We are going through a process of trimming down the number of collections in the database with the intention of reducing contention. We have a long-standing rule that "every persistent reference or collection must have an inverse". We insist on this to maintain data integrity - dealing with collections that have holes in them is not nice. In many cases the collection on the "other" side of the inverse is used only for deletion checking, i.e. I can't delete this object if any of its parent collections are non-empty. We've realised that this policy isn't very good for performance - those collections can become a point of contention. The goal now is to remove the collections (and hence the inverse) and not allow the user to delete the object - instead they make it "inactive" and we hide it from them in most parts of the user interface.
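
To make that concrete, here is a minimal sketch of the "deactivate instead of delete" approach. The isActive attribute and the method are illustrative names, not from our actual schema:

Code:

deactivate() updating;
begin
    // no parent collections kept for delete checking: physical deletion
    // is simply disallowed, so flag the object and hide it from the UI
    self.isActive := false;
end;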

It's a shame in a way to trade off elegance in design for performance, but at the end of the day it's the end user that matters, and so performance wins.
> With the exception of "global" collections of "all instances" of the related class, in most cases the collections can be targeted in such a way as to reduce contention between users. In the 6.3 release, the option of using merged iterators means that "global" collections, which often were created mainly for reporting purposes, can now be removed in favour of a merged iterator when you need to report over "all instances" of a given class.
It's not so much an issue with global collections, it's collections on related objects where the collection exists purely for delete checking. Consider an example where you have a tax code in the system and every financial transaction has a reference to it via myTaxCode. This is inversed to allTransactions on TaxCode. For a lot of sites (especially in NZ) there is only really one tax code - NZGST. This means that in order to create a financial transaction, the TaxCode.allTransactions collection is locked for the duration of the database transaction (which could be in the order of 5 seconds because of the vast amount of other data that is created/updated). The net effect is that creating financial transactions is effectively single threaded in this system because of the high contention on this otherwise pointless collection. The easiest fix is to recognise that it would be very rare for the TaxCode to be deleted - so we don't allow it to be and we just remove the collection. Voila! Contention problem fixed.

We have this exact scenario in quite a number of places in our product. It's not an issue with global collections but with collections on highly referenced "peripheral" objects.
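
For readers unfamiliar with how the lock arises: with an automatic inverse, the single assignment to myTaxCode is what locks the collection. A rough sketch (gstCode is an assumed parameter, and the method assumes the caller has already begun the transaction):

Code:

createTransaction(gstCode: TaxCode) updating;
vars
    txn : FinancialTransaction;
begin
    create txn persistent;
    // this assignment implicitly adds txn to gstCode.allTransactions,
    // exclusively locking that collection until the transaction commits
    txn.myTaxCode := gstCode;
    // ... the rest of the (possibly 5-second) transaction then runs
    // while every other process creating a transaction queues on the lock
end;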

A more elegant fix is to change the code so that the Transaction.myTaxCode reference is set at the very end of the database transaction, but that requires fairly major rearchitecture.
> Another option, if you "must" have the global collections, is to consider using a single background process in "lazy update" mode to avoid the contention between multiple processes. Such lazy updaters will normally keep up with a significant number of interactive users with very little lag time before the newly created instances are in the collection. This doesn't help when you're updating the key value on an MKD inverse, but for the most part this is a far less frequent activity than the adding of new entries. Care does need to be taken to handle situations where two people simultaneously create an entry whereby the "lazy update" would result in duplicate key exceptions.
Yes, we have considered background updating but it would be a last resort as it opens up another can of worms. What if the updating process isn't running? Gets an exception? Takes too long? Instead we are making changes where we recognise which updates are allowed to be committed even if the main database transaction fails. In such cases we do the "safe" update in a separate small database transaction. This in turn greatly relieves contention on the areas updated, as those areas are locked for a very short amount of time.
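
As a sketch of that "safe update in its own small transaction" idea (AuditEntry and its attribute are hypothetical names for illustration - the point is simply that the update runs outside the main transaction and commits immediately, so its locks are held only briefly):

Code:

recordAuditEntry(description: String) updating;
vars
    entry : AuditEntry;
begin
    // runs outside the main database transaction: even if that
    // transaction later fails, keeping this entry is harmless
    beginTransaction;
    create entry persistent;
    entry.description := description;
    commitTransaction;    // locks released here, after milliseconds not seconds
end;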
> If the contention is between readers and an updater, rather than multiple updaters, then consideration could also be given to using the new Update Lock option to allow other non-updating processes to continue to read the affected collection(s) for as long as possible, until such time as the transaction has to be committed.

> i.e. I'd look to find other ways of reducing contention before I started risking referential integrity with non-inversed references, especially if that meant going to manually maintained collections over objects where the key value can and does change, and I would only ever consider this as an absolute last resort option.
We have identified what we call "low hanging fruit": changes that are easy to implement with minimal impact that will reduce contention. Removing some of these unnecessary collections falls into this category. It's by far the easiest way to get a "win" in this area.
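
The Update Lock option quoted above might look roughly like the following. This is a sketch only: it assumes a Process::lock signature of (target, lock type, duration, timeout) and the Update_lock and Transaction_duration constants, so check the documentation for your JADE version:

Code:

readTolerantUpdate(gstCode: TaxCode) updating;
begin
    // declare update intent: readers can keep share-locking the collection,
    // and the lock is escalated to exclusive only when we commit
    process.lock(gstCode.allTransactions, Update_lock, Transaction_duration, 5000);
    // ... perform the update ...
end;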

The primary concern regarding contention is between multiple updaters. We have designed a strategy where we now make much better use of reserve locks and update locks. The biggest single issue for this particular product is that we have no well-defined transaction agent, or controller layer (gasp!), so we're retrospectively putting one in. As you can imagine, with a code base of over 2 million lines of code this isn't a small task! Once we have a decent MVC layer in place, dealing with contention on a generic basis becomes significantly easier. In tests I have done on a prototype I can more than double the throughput for high user counts when using such a strategy (albeit on a very contrived example).
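
A transaction agent of that kind might acquire its locks along these lines (a sketch under the same assumed Process::lock signature as above; the method name and the sort order are illustrative):

Code:

acquireInOrder(targets: ObjectArray);
vars
    obj : Object;
begin
    // targets is assumed pre-sorted into one global order (e.g. by oid),
    // so every transaction locks in the same sequence and cannot deadlock;
    // a reserve lock declares update intent without blocking readers
    foreach obj in targets do
        process.lock(obj, Reserve_lock, Transaction_duration, 5000);
    endforeach;
end;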
> In a similar vein, I would be curious to know what people think of the man/auto feature. I personally advocate avoiding it where possible, as I think forcing the developer to think about which side is auto and which is manual results in a better design. I'm sure there are plenty of exceptions to this, but I struggle to come up with real world examples.
I'm with you and Allistar on this one. There should be very few occasions at all where a man/auto inverse is ever justifiable. The lack of thought in this area rates right up there with code such as the following:

Code:

vars
    didBeginTrans : Boolean;
begin
    if not process.isInTransactionState then
        beginTransaction;
        didBeginTrans := true;
    endif;
    ...
    if didBeginTrans then
        commitTransaction;
    endif;
end;
Yes, transaction state checking is nasty.
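
The alternative convention, sketched below with an illustrative method, is that only the controller layer ever begins or commits, and lower-level updaters simply assume transaction state rather than probing for it:

Code:

updateBalance(amount: Decimal) updating;
begin
    // precondition: the controller has already called beginTransaction;
    // this method never begins or commits a transaction itself
    self.balance := self.balance + amount;
end;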

BeeJay
Posts: 311
Joined: Tue Jun 30, 2009 2:42 pm
Location: Christchurch, NZ

Re: Manual versus Automatic collections

Postby BeeJay » Fri Nov 26, 2010 9:45 am

> It's not so much an issue with global collections, it's collections on related objects where the collection exists purely for delete checking. Consider an example where you have a tax code in the system and every financial transaction has a reference to it via myTaxCode. This is inversed to allTransactions on TaxCode. For a lot of sites (especially in NZ) there is only really one tax code - NZGST. This means that in order to create a financial transaction, the TaxCode.allTransactions collection is locked for the duration of the database transaction (which could be in the order of 5 seconds because of the vast amount of other data that is created/updated). The net effect is that creating financial transactions is effectively single threaded in this system because of the high contention on this otherwise pointless collection. The easiest fix is to recognise that it would be very rare for the TaxCode to be deleted - so we don't allow it to be and we just remove the collection. Voila! Contention problem fixed.
In that situation, I'd agree with you that it's justifiable to not inverse the myTaxCode and to disallow the physical deletion of the TaxCode object. I do use that option in some of my systems, and have the concept of logically deleting/deactivating the "Code" so it's no longer available for selection either programmatically or in ComboBoxes etc.
> A more elegant fix is to change the code so that the Transaction.myTaxCode reference is set at the very end of the database transaction, but that requires fairly major rearchitecture.
I'll give the junior developers on my team a proverbial "slap on the hand" if they don't follow this coding convention in our system. Especially so if they set the value used as the key on the MKD inverse after they've set the myXXXX value. Wouldn't it be nice to have 20-20 hindsight to be able to implement all code from the beginning the way you'd do it 10 years later - ignoring features that weren't available back then, of course! ;)
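
The ordering convention being described, as a sketch (attribute names are illustrative, and the method assumes the caller is already in transaction state):

Code:

populateTransaction(txn: FinancialTransaction; gstCode: TaxCode; txnDate: Date) updating;
begin
    // set the attribute used as the MKD key first, so the dictionary
    // entry is created once with the correct key rather than re-keyed
    txn.transactionDate := txnDate;
    // set the inverse reference last - this is the point at which the
    // inverse collection is locked, so the later the better
    txn.myTaxCode := gstCode;
end;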
> The biggest single issue for this particular product is that we have no well-defined transaction agent, or controller layer (gasp!), so we're retrospectively putting one in. As you can imagine, with a code base of over 2 million lines of code this isn't a small task! Once we have a decent MVC layer in place, dealing with contention on a generic basis becomes significantly easier. In tests I have done on a prototype I can more than double the throughput for high user counts when using such a strategy (albeit on a very contrived example).
Ah, no transaction agent layer would indeed make it harder to implement this.

Cheers,
BeeJay.

allistar
Posts: 156
Joined: Fri Aug 14, 2009 11:02 am
Location: Mount Maunganui, Tauranga

Re: Manual versus Automatic collections

Postby allistar » Fri Nov 26, 2010 10:13 am

> A more elegant fix is to change the code so that the Transaction.myTaxCode reference is set at the very end of the database transaction, but that requires fairly major rearchitecture.
> I'll give the junior developers on my team a proverbial "slap on the hand" if they don't follow this coding convention in our system. Especially so if they set the value used as the key on the MKD inverse after they've set the myXXXX value. Wouldn't it be nice to have 20-20 hindsight to be able to implement all code from the beginning the way you'd do it 10 years later - ignoring features that weren't available back then, of course! ;)
Coding it like this for a single method is easy. But if that method is, say, the 2nd of 50 methods that execute for this particular database update, then moving the reference setting to the end of that 2nd method won't achieve a lot, given there are 48 more methods to execute before the transaction is committed. What's needed is moving the reference set (and hence the collection lock) to the end of the database transaction, not the end of the method that would typically set it. The issue then is that one of the subsequent 48 method calls may expect that reference to be set, and a 1090 exception will result. This is the "fairly major rearchitecture" I was referring to.

It's possible to do, and we have implemented a generic system that does this. The results from a performance point of view are dramatic. In an environment that was on the verge of lock meltdown (queued locks were lasting for nearly the maximum of 15 seconds), the same usage scenario under the new architecture saw queued locks last a maximum of 9 seconds. The impact this has on scalability is quite significant. A nice side effect is that deadlocks are much easier to manage, as we effectively lock the objects explicitly ourselves and hence can control the order they are taken in on a transaction-wide basis.
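
A sketch of what such a generic deferral system can look like (the names, the parallel-array representation, and the agent's pre-commit hook are all illustrative assumptions, shown here as two methods on the agent):

Code:

deferReferenceSet(txn: FinancialTransaction; code: TaxCode);
begin
    // record the intent instead of assigning txn.myTaxCode mid-transaction
    self.pendingTxns.add(txn);
    self.pendingCodes.add(code);
end;

applyDeferredSets() updating;
vars
    i : Integer;
begin
    // called as the last step before commitTransaction, so the inverse
    // collections are locked only for the tail end of the transaction;
    // draining one agent-controlled list also fixes the lock order
    foreach i in 1 to self.pendingTxns.size() do
        self.pendingTxns[i].myTaxCode := self.pendingCodes[i];
    endforeach;
end;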

It's all been interesting stuff - it makes you realise how seemingly simple code can have a large impact on contention in a heavily used system. It's one of the reasons why I'm a strong advocate of peer review on all changes.

Cheers,
Allistar.

