I'm actually seeing near constant times with both Torrie's method, and your modification of Torrie's method, which is kind of what I was expecting. Presumably checking if two objects are equal doesn't require the objects in question to be brought into the cache, which explains why your modified version of Torrie's method was no more efficient. I had incorrectly assumed they must be getting brought back into cache based on your initial timings.I think we all agree that getAtKey and getAtKeyGtr are quick, but the controversial issue seems to be whether to use Iterator.next or indexOf to count the objects between the first and last one.
With a low number of objects with matching keys the two methods are hard to distinguish, but the Iterator.next method seems to get slower the more objects it has to count.
I benchmarked the two methods using a collection of 50000 objects with 9000 objects matching the key being counted, a cold cache and running the method 1000 times.
Using indexOf it took a total of 562ms, and using Iterator.next it took 34.6s.
EDIT: indexNear is even faster at 108ms and (at least in my test) had the same result as indexOf so didn't appear to lose any accuracy. I can't see anywhere in the documentation how approximate the value is supposed to be.
My timings, with 1,000,000 objects in the MKD, with a single string key of length 30, and 5,000 objects per key, shows the following timings for getting the counts. Interestingly your original method is always out by 1 on the count as well.
Match on entries 5,001 to 10,000 in the collection
Code: Select all
John #1 Count: 4999 Elapsed: 7999ms
Torrie method Count: 5000 Elapsed: 416ms
John #2 (Torrie) Count: 5000 Elapsed: 450ms
Code: Select all
John #1 Count: 4999 Elapsed: 14920ms
Torrie method Count: 5000 Elapsed: 275ms
John #2 (Torrie) Count: 5000 Elapsed: 247ms
Code: Select all
John #1 Count: 4999 Elapsed: 7392ms
Torrie method Count: 5000 Elapsed: 315ms
John #2 (Torrie) Count: 5000 Elapsed: 369ms
Note: Interestingly, the indexOf doesn't always track linearly upwards. Running for objects 250,001 to 255,000 took 11,211ms whereas for objects 750,001 to 755,000 took only 4,135ms? Running it for objects 500,001 to 505,000 took only 566ms?!? Regardless of these anomalies, in my testing the Iterator approach was consistent time-wise and also the fastest in each case that I tested so personally I'd go with that approach.
Cheers,
BeeJay.