counting objects with a key in a dictionary

Forums for specific tips, techniques and example code
User avatar
ghosttie
Posts: 181
Joined: Sat Aug 15, 2009 1:25 am
Location: Atlanta, GA, USA
Contact:

counting objects with a key in a dictionary

Postby ghosttie » Fri Mar 16, 2012 3:02 am

I want a generic method on MemberKeyDictionary to count how many objects in the dictionary that have a specified key. This is what I arrived at:

Code: Select all

countKey(key : KeyType) : Integer; vars iStart, iEnd : Integer; oStart, oEnd : Object; begin oStart := getAtKey(key); if oStart = null then return 0; endif; iStart := indexOf(oStart); oEnd := getAtKeyGtr(key); if oEnd = null then return (size - iStart) + 1; endif; iEnd := indexOf(oEnd) - 1; return iEnd - iStart; end;
Does anyone see any problems with that or have any suggestions for improving performance?

I know performance could be improved in exchange for giving up accuracy by using indexNear instead of indexOf, but in this context I want accuracy.
I have a catapult. Give me all the money or I will fling an enormous rock at your head.

torrie
Posts: 92
Joined: Fri Aug 14, 2009 11:24 am

Re: counting objects with a key in a dictionary

Postby torrie » Fri Mar 16, 2012 8:14 am

I tried the following. Unfortunately there is no current way in Jade to compare keys for a dictionary, hence the GetAtKeyLeq on line 15. I'm assuming that comparing keys would be quick in comparison to a getAtKey type call.

Not sure if a startAtObject would be faster than at startKeyGeq method. Assume that startAtObject retrieves the key values and then does a getAtKey internally (at least I hope it's not a linear search for Dictionaries!)

As to whether it's more efficient. I guess it depends on the relative efficiency of the getAtKey calls vs the IndexOf. I'm assuming that the IndexOf will be quite slow for large collections (as it's probably a linear search.) The efficency would also be affected by what percentage of the collection you needed to transverse to count the entries.

You may need to do some performance testing on your collections to see which is best.

Code: Select all

countKeys_TM(key : KeyType) : Integer; vars iCount : Integer; oMember, oEnd, oStart : MemberType; oIter : Iterator; oSelf : Dictionary; begin oStart := getAtKey( key ); if oStart = null then iCount := 0; else oIter := createIterator; oEnd := getAtKeyGtr(key); startKeyGeq( key, oIter ); while oIter.next( oMember ) and oMember <> oEnd do iCount := iCount + 1; endwhile; endif; return iCount; epilog delete oIter; end;

murray
Posts: 144
Joined: Fri Aug 14, 2009 6:58 pm
Location: New Plymouth, New Zealand

Re: counting objects with a key in a dictionary

Postby murray » Fri Mar 16, 2012 9:42 am

Member Key Dictionaries use a BTree index, so there shouldn't be anything that needs to do a linear search. indexOf() may be an exception, which is probably why jade includes indexNear() and indexNear64(). However, all things being equal, I would expect a getAtKey() type operation which has to traverse the index nodes from the root to be less efficient than a sequential read (Iterator.next). Calling indexOf() more than once would not be a good idea. Can you not use an Iterator?

[Edit: with a closer look at torrie's example I see that an Iterator is used, and 2 key lookups which avoids the indexOf() calls. That would appear to be a good way to go if you want to avoid trying to compare actual key values].

I would presume that, conceptually, the most efficient method to count duplicate values would be:
(1) seek to the key value using getAtKeyGeq().
(2) read sequential index entries (not reading referenced objects) until the key values change

I can vouch for the fact that startAtObject() reads the key values and then uses them to do a key-based lookup, as I helped troubleshoot a bug we had where the key values had been changed on an object in a collection without inverses, so iteration using an Iterator gave different results to the startAtObject() call, or something like that.
Murray (N.Z.)

User avatar
ghosttie
Posts: 181
Joined: Sat Aug 15, 2009 1:25 am
Location: Atlanta, GA, USA
Contact:

Re: counting objects with a key in a dictionary

Postby ghosttie » Fri Mar 16, 2012 11:35 am

I think we all agree that getAtKey and getAtKeyGtr are quick, but the controversial issue seems to be whether to use Iterator.next or indexOf to count the objects between the first and last one.

With a low number of objects with matching keys the two methods are hard to distinguish, but the Iterator.next method seems to get slower the more objects it has to count.

I benchmarked the two methods using a collection of 50000 objects with 9000 objects matching the key being counted, a cold cache and running the method 1000 times.

Using indexOf it took a total of 562ms, and using Iterator.next it took 34.6s.

EDIT: indexNear is even faster at 108ms and (at least in my test) had the same result as indexOf so didn't appear to lose any accuracy. I can't see anywhere in the documentation how approximate the value is supposed to be.
I have a catapult. Give me all the money or I will fling an enormous rock at your head.

User avatar
BeeJay
Posts: 312
Joined: Tue Jun 30, 2009 2:42 pm
Location: Christchurch, NZ

Re: counting objects with a key in a dictionary

Postby BeeJay » Fri Mar 16, 2012 12:17 pm

Is a match count of 1,000 in a collection of only 50,000 really a reasonable test?

Which comes out better is really going to depend on the size of the collection, the number of matches, and the position within the collection for the indexOf calls. For example, if you have a much larger collection with fewer matches, and the objects that you want to get the indexOf are near the end of the collection, then the linear search to evaluate the two indexOf calls is going to have much more of an impact on the overall time than a few iter.next calls.

If there was a method on the iterator called say "doKeysMatch" that you could pass a KeyType to, and assuming it doesn't retrieve the underlying object in the same way as the Iterator::getCurrentKeys method doesn't retrieve the underlying object, then that would probably make the iterator the best option even for smaller collections with a large number of matches.

Cheers,
BeeJay.

torrie
Posts: 92
Joined: Fri Aug 14, 2009 11:24 am

Re: counting objects with a key in a dictionary

Postby torrie » Fri Mar 16, 2012 1:13 pm

From what I understand, the getCurrentKeys does not retrieve the object, just the keys. (the same as getCurrentKey(n))

Initially I wanted to test the keys (rather than use the getAtKey call to find the last entry .) However, I don't know of a way to compare keys. (A CompareKeys( keyA, keyB : KeyType) on Dictionary that returns Less than, equal or Greater than would be useful.) I also looked for a method that would retrieve the current key from a KeyType variable but I didn't find one.

In some testing I did here (about 10000 items and 100 or so matches), I noticed that the time required for the indexOf calls varied from call to call. It may be a caching symptom in Jade 6.3. The time taken for the getAtKey approach was much more consistent in terms of execution time. It may be that the indexOf call requires all collection blocks to be loaded into the cache, where getAtKey just loads those blocks that are needed. If that is the case, it may be better to use the iterator approach if the collection you're counting is likely to fall out of the cache.

It sounds like the best approach would be to do some performance testing on a test environment and then put the chosen method into production and see if performance is acceptable.

allistar
Posts: 156
Joined: Fri Aug 14, 2009 11:02 am
Location: Mount Maunganui, Tauranga

Re: counting objects with a key in a dictionary

Postby allistar » Fri Mar 16, 2012 3:17 pm

From what I understand, an indexOf on a dictionary requires Jade to walk the collection from the start until it finds the object in question. For this reason it's expensive for large collections.

User avatar
BeeJay
Posts: 312
Joined: Tue Jun 30, 2009 2:42 pm
Location: Christchurch, NZ

Re: counting objects with a key in a dictionary

Postby BeeJay » Fri Mar 16, 2012 5:15 pm

From what I understand, the getCurrentKeys does not retrieve the object, just the keys. (the same as getCurrentKey(n))
Correct. That's what I was meaning, but I've edited my message to make it clear that's what I was meaning. :)

Cheers,
BeeJay.

User avatar
BeeJay
Posts: 312
Joined: Tue Jun 30, 2009 2:42 pm
Location: Christchurch, NZ

Re: counting objects with a key in a dictionary

Postby BeeJay » Fri Mar 16, 2012 5:17 pm

From what I understand, an indexOf on a dictionary requires Jade to walk the collection from the start until it finds the object in question.
Correct. That's what I call a linear search. Start at the beginning and go through everything until you get to desired object. :)

Cheers,
BeeJay.

User avatar
ghosttie
Posts: 181
Joined: Sat Aug 15, 2009 1:25 am
Location: Atlanta, GA, USA
Contact:

Re: counting objects with a key in a dictionary

Postby ghosttie » Sat Mar 17, 2012 12:52 am

So Iterator.next is slow if there are a large number of objects with a matching key and indexOf is slow if the matching objects are near the end of a large collection.

I think my benchmark was biased in that it ran the countKey method multiple times, so the collection blocks were kept in the cache which made indexOf seem faster than it would be in real life.

I actually found a way to compare the keys when iterating instead of comparing the current object, but it actually seems to be significantly slower (1.5m per 1000 runs).

It turns out that you can use app.getParamListTypeEntry etc. on KeyType as well as ParamListType (presumably because the documentation says that ParamListType is a more general kind of KeyType).

Code: Select all

countKey(key : KeyType) : Integer; vars o : MemberType; iCount, iKeys, iKey : Integer; iter : Iterator; bMatch : Boolean; begin iter := createIterator; startKeyGeq(key, iter); iKeys := app.getParamListTypeLength(key); while iter.next(o) do bMatch := true; foreach iKey in 1 to iKeys do if iter.getCurrentKey(iKey) <> app.getParamListTypeEntry(iKey, key) then bMatch := false; break; endif; endforeach; if bMatch then iCount := iCount + 1; else break; endif; endwhile; return iCount; epilog delete iter; end;
I have a catapult. Give me all the money or I will fling an enormous rock at your head.


Return to “Tips and Techniques”

Who is online

Users browsing this forum: No registered users and 20 guests