counting objects with a key in a dictionary

Forums for specific tips, techniques and example code
User avatar
BeeJay
Posts: 312
Joined: Tue Jun 30, 2009 2:42 pm
Location: Christchurch, NZ

Re: counting objects with a key in a dictionary

Postby BeeJay » Tue Mar 20, 2012 3:14 pm

I think we all agree that getAtKey and getAtKeyGtr are quick, but the controversial issue seems to be whether to use Iterator.next or indexOf to count the objects between the first and last one.

With a low number of objects with matching keys the two methods are hard to distinguish, but the Iterator.next method seems to get slower the more objects it has to count.

I benchmarked the two methods using a collection of 50000 objects with 9000 objects matching the key being counted, a cold cache and running the method 1000 times.

Using indexOf it took a total of 562ms, and using Iterator.next it took 34.6s.

EDIT: indexNear is even faster at 108ms and (at least in my test) had the same result as indexOf so didn't appear to lose any accuracy. I can't see anywhere in the documentation how approximate the value is supposed to be.
I'm actually seeing near constant times with both Torrie's method, and your modification of Torrie's method, which is kind of what I was expecting. Presumably checking if two objects are equal doesn't require the objects in question to be brought into the cache, which explains why your modified version of Torrie's method was no more efficient. I had incorrectly assumed they must be getting brought back into cache based on your initial timings.

My timings, with 1,000,000 objects in the MKD, with a single string key of length 30, and 5,000 objects per key, shows the following timings for getting the counts. Interestingly your original method is always out by 1 on the count as well.

Match on entries 5,001 to 10,000 in the collection

Code: Select all

John #1 Count: 4999 Elapsed: 7999ms Torrie method Count: 5000 Elapsed: 416ms John #2 (Torrie) Count: 5000 Elapsed: 450ms
Match on entries 490,001 to 495,000 in the collection

Code: Select all

John #1 Count: 4999 Elapsed: 14920ms Torrie method Count: 5000 Elapsed: 275ms John #2 (Torrie) Count: 5000 Elapsed: 247ms
Match on entries 990,001 to 995,000 in the collection

Code: Select all

John #1 Count: 4999 Elapsed: 7392ms Torrie method Count: 5000 Elapsed: 315ms John #2 (Torrie) Count: 5000 Elapsed: 369ms
I was restarting my Jade environment between runs to ensure an "empty" Jade cache for each run. I wonder what in your environment was resulting in the iter.next approach blowing out to 34,600ms when in my test for 5000 matches it's always < 500ms regardless of where in the collection we start/finish?

Note: Interestingly, the indexOf doesn't always track linearly upwards. Running for objects 250,001 to 255,000 took 11,211ms whereas for objects 750,001 to 755,000 took only 4,135ms? Running it for objects 500,001 to 505,000 took only 566ms?!? Regardless of these anomalies, in my testing the Iterator approach was consistent time-wise and also the fastest in each case that I tested so personally I'd go with that approach.

Cheers,
BeeJay.


Return to “Tips and Techniques”

Who is online

Users browsing this forum: No registered users and 21 guests