size of instances, information/discussion

ConvertFromOldNGs
Posts: 5321
Joined: Wed Aug 05, 2009 5:19 pm

size of instances, information/discussion

Postby ConvertFromOldNGs » Fri Aug 07, 2009 11:59 am

by Paul Mathews >> Sun, 18 Feb 2001 13:49:56 GMT

I have recently done some testing to see why TimeSheet instances took
up 3k each.


Initially I created one TimeSheet instance with its 5 analysis collections empty, then copied it 100 times.

Space taken up is 112 KB.

Then for each instance:

  Step                                  Space (KB)  Increase (KB)
  o - populate collection A, 2 entries     126           14
  o - populate collection B, 1 entry       180           54
  o - populate collection C, 2 entries     224           44
  o - populate collection D, 6 entries     274           50
  o - populate collection E, 8 entries     304           30

All collections have inverses.

A, B and C are sets.
D and E are member key dictionaries.

Per timesheet, the smallest increase for a collection is 140 bytes and the largest is just over 500 bytes.
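As a quick check of the arithmetic, the sketch below derives the per-TimeSheet increase for each step from the cumulative totals in the table. It assumes 1 KB = 1000 bytes, which is the convention that reproduces the 140-byte figure; the names are illustrative only.

```python
# Cumulative file size (KB) after each step, from the table above.
BASELINE_KB = 112          # 100 TimeSheets, all collections empty
STEPS = [                  # (collection, entries added, total KB after step)
    ("A", 2, 126),
    ("B", 1, 180),
    ("C", 2, 224),
    ("D", 6, 274),
    ("E", 8, 304),
]
INSTANCES = 100
BYTES_PER_KB = 1000        # assumption: reproduces the 140-byte figure


def per_instance_increase(baseline_kb, steps, instances, bytes_per_kb=BYTES_PER_KB):
    """Return {collection: bytes added per TimeSheet} for each step."""
    result = {}
    previous_kb = baseline_kb
    for name, _entries, total_kb in steps:
        result[name] = (total_kb - previous_kb) * bytes_per_kb / instances
        previous_kb = total_kb
    return result


for name, added in per_instance_increase(BASELINE_KB, STEPS, INSTANCES).items():
    print(f"collection {name}: {added:.0f} bytes per TimeSheet")
```

For collection B (a single-entry set) this gives 540 bytes per TimeSheet, the largest step in the table.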

Because OIDs are small, I incorrectly assumed that small collections of OIDs would also be small.

WRONG.

Any options now or in the future to support small collections would be appreciated.


Paul Mathews
pem@cmsystemsgroup.com.au
Phone: [612] (99717384) Fax[612] (99711679)
(Dee Why,Sydney,Australia)

Re: size of instances, information/discussion

Postby ConvertFromOldNGs » Fri Aug 07, 2009 11:59 am

by krull >> Mon, 19 Feb 2001 8:12:05 GMT

> [snip] .. [snip]
> A,B and C are sets. D and E are member key dictionaries.
>
> [snip]
> The smallest increase is 140 bytes for a collection and the largest is 500 bytes per timesheet.
>
> Because oids are small I incorrectly assumed small collections of oids also are small.

Perhaps you are assuming far more than you realise.

We could explain exactly how the space is allocated, but it is fairly tedious and might be more information than you need. For now, consider this overview:

1. The smallest block size for any collection is 4 entries, and all collections begin life with a block size that holds exactly 4 entries. Collection blocks then grow up to the MaxBlocksize attribute defined for the type (on CollClass) in the schema.
2. Each entry in a Set block takes up 10 bytes (6-byte OID + 4-byte address).
3. Each entry in a Dictionary block takes up 10 bytes plus the total size of all its keys concatenated together (fixed length).
4. Each entry in an Array block takes up exactly the size of the entry, which for an object array is 6 bytes.
Given the above, the space allocated to just the entries in a set block containing 1-4 entries is 4 * 10 = 40 bytes.
So where do the other 100 or so bytes go?

Well, like most things in JADE a Collection is an object, in fact each collection object is an aggregate of several objects. All collections that have been instantiated consist of a 'Collection Header' object that retains state, such as size, for the collection as a whole, and if entries have been added to the collection they will have one or more collection blocks, which guess what - are also objects. All persistent objects in JADE have a storage overhead over and above space allocated for attributes themselves. This 'overhead' includes such things as db header & trailers, an oid, edition, creation timestamp, replication info, version info etc. The standard object overhead is documented in one of the manuals. Exclusive collection headers actually use slightly less space than shared objects and collection blocks slightly less again (since they don't need the same version information).
Now, that's not all - most people forget to add in the overheads of various indexes and free space structures in the physical database file. All objects have an entry in a (BTree) index that is used to map OIDs to physical disk addresses.
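Putting the two halves of the explanation together for collection A (a 2-entry set) gives a rough split of the measured 140 bytes per TimeSheet. The sketch only does the subtraction; it makes no attempt to apportion the remainder between the header object, the block object, the OID index entries and free-space structures, because their sizes aren't given here. It assumes 1 KB = 1000 bytes, as in the original figures, and the names are illustrative.

```python
MEASURED_BYTES_A = (126 - 112) * 1000 / 100  # collection A step, 1 KB = 1000 bytes
ENTRY_BLOCK_BYTES_A = 4 * 10                 # one 4-entry set block, 10 bytes per slot


def unexplained_bytes(measured: float, entry_block: int) -> float:
    """Bytes per collection not accounted for by the entry slots.

    Per the explanation above, the remainder covers the collection header
    object, the collection block's own object overhead, the OID index entries
    for both objects and free-space structures, plus any fragmentation left
    because the file has not been compacted.
    """
    return measured - entry_block


print(unexplained_bytes(MEASURED_BYTES_A, ENTRY_BLOCK_BYTES_A))  # 100.0
```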

> WRONG.

Right!

> Any options now or in the future to support small collections would be appreciated.

Since these collections are very small, why don't you use arrays?
This would save you around 14 bytes for sets and 14 + (4*concat key size) for mkey dicts
Adding entries to an array is as fast or faster than adding entries to a set or dictionary and the performance difference increases with size.
How do you access the collections? If you access them mostly sequentially using an iterator, this will also be faster for an array.

Collection::includes will be slower for an array than set or member key dictionary, but not much slower for small collections.
Any key searches on a member key dict would need to be replaced with a linear search on the array, which wouldn't be so good if it was a common operation.
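As a rough analogy in Python (not JADE), the difference is between a linear scan over the array and a keyed lookup. The `Analysis` class and its `code` key are hypothetical stand-ins for the member type and its member key, and Python's `dict` merely stands in for keyed access; a JADE member key dictionary is organised differently internally.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Analysis:
    """Hypothetical member type; `code` plays the role of the member key."""
    code: str
    hours: float


def find_in_array(entries: List[Analysis], code: str) -> Optional[Analysis]:
    """Array-style key search: scan entries in order until the key matches."""
    for entry in entries:
        if entry.code == code:
            return entry
    return None


def build_keyed(entries: List[Analysis]) -> Dict[str, Analysis]:
    """Dictionary-style structure: one keyed lookup instead of a scan."""
    return {entry.code: entry for entry in entries}


items = [Analysis("ADMIN", 2.0), Analysis("DEV", 5.5), Analysis("TEST", 1.0)]
print(find_in_array(items, "DEV"))       # Analysis(code='DEV', hours=5.5)
print(build_keyed(items)["TEST"].hours)  # 1.0
```

For a handful of entries the scan costs only a few comparisons, which is why the difference is small for small collections; the cost of the scan grows linearly with the number of entries.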

Are there things we could change in JADE to make 'small collections' use up less disk space? Sure there are:
a) Perhaps we could allow a Coll MinBlockSize of 1, this would save a massive 30 bytes per set and even more for dictionaries (disk drive vendors will be shaking in their boots)
b) Perhaps we could implement a brand new form of 'compact collection' that uses up even less space
c) We could make some of the features that occupy space in objects optional so that if you don't need them you get the disk space back.
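The 'massive 30 bytes' in option a) is simply the three unused 10-byte slots in a 4-entry set block that holds one entry. A minimal sketch of that arithmetic, assuming the collection holds exactly one entry and that a 1-entry minimum block would be sized to fit it; the 8-byte key in the dictionary case is a hypothetical value:

```python
CURRENT_MIN_BLOCK = 4    # entries per smallest block today
PROPOSED_MIN_BLOCK = 1   # option a)


def one_entry_saving(entry_bytes: int,
                     current_min: int = CURRENT_MIN_BLOCK,
                     proposed_min: int = PROPOSED_MIN_BLOCK) -> int:
    """Entry bytes saved for a collection holding a single entry."""
    return (current_min - proposed_min) * entry_bytes


print(one_entry_saving(10))      # set: 30
print(one_entry_saving(10 + 8))  # dictionary with 8 bytes of keys (hypothetical): 54
```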

Re: size of instances, information/discussion

Postby ConvertFromOldNGs » Fri Aug 07, 2009 11:59 am

by krull >> Mon, 19 Feb 2001 8:35:48 GMT
> Since these collections are very small, why don't you use arrays? This would save you around 14 bytes for sets and 14 + (4*concat key size) for mkey dicts

Ooops, my arithmetic is as bad as my spelling and grammar - the above should read:

That would save you around 16 bytes per instance for small sets and 16 + (4 * concat key size) per instance for small mkey dicts, where small means 1-4 entries.
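The corrected figures follow from the entry sizes listed earlier: a 4-entry set block holds 4 * 10 = 40 bytes of entries and a 4-entry object array block 4 * 6 = 24 bytes, a difference of 16; a member key dictionary adds 4 * (key size) on top. A small sketch of that arithmetic (entry bytes only; names illustrative, and the 8-byte key is hypothetical):

```python
BLOCK_ENTRIES = 4              # smallest block: 4 entry slots
BASE_ENTRY_BYTES = 10          # set entry; dictionary entries add their key bytes
OBJECT_ARRAY_ENTRY_BYTES = 6


def array_saving(key_bytes: int = 0) -> int:
    """Entry bytes saved by replacing a small (1-4 entry) set or member key
    dictionary with an object array.  key_bytes=0 means a set."""
    collection_block = BLOCK_ENTRIES * (BASE_ENTRY_BYTES + key_bytes)
    array_block = BLOCK_ENTRIES * OBJECT_ARRAY_ENTRY_BYTES
    return collection_block - array_block


print(array_saving())    # set: 16
print(array_saving(8))   # dictionary with 8 bytes of keys (hypothetical): 48
```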

Re: size of instances, information/discussion

Postby ConvertFromOldNGs » Fri Aug 07, 2009 12:00 pm

by Paul Mathews >> Mon, 19 Feb 2001 16:16:58 GMT
> Since these collections are very small, why don't you use arrays?

I have just rerun the tests using arrays and the results were very, very similar. No worthwhile saving of space.

Re: size of instances, information/discussion

Postby ConvertFromOldNGs » Fri Aug 07, 2009 12:00 pm

by Krull >> Tue, 20 Feb 2001 6:37:58 GMT
> I have just rerun the tests using arrays and the results were very, very
> similar. No worthwhile saving of space.

The savings will be no more than 16 bytes per set, or a total of 4.8 KB for your 300 set instances. The amount you save on the dictionaries will depend on the key sizes. Another point to note is that the margin of error between theoretical and actual space utilisation will be greater if you don't compact the file after creating all your objects. This is especially true if collection blocks have been reallocated (which they will have been in your scenario).

Although arrays are worth considering for other reasons, I didn't really intend them to sound 'too attractive' space-wise. Perhaps you didn't pick up on my tongue-in-cheek remark in the first of the JADE options, allowing 1-entry collections:

"this would save a massive 30 bytes per set and even more for dictionaries (disk drive vendors will be shaking in their boots)"

You only need to do the multiplication to see how much this type of change could save overall.
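Doing that multiplication for this test, with 100 TimeSheets each holding three sets (A, B and C), gives the totals below. The 30-byte figure is krull's per-set estimate taken at face value for every set; the names are illustrative.

```python
TIMESHEETS = 100
SETS_PER_TIMESHEET = 3     # collections A, B and C


def total_saving(bytes_per_set: int) -> int:
    """Total bytes saved across every set instance in the test."""
    return bytes_per_set * TIMESHEETS * SETS_PER_TIMESHEET


print(total_saving(16))  # arrays instead of sets: 4800 bytes, the 4.8 KB above
print(total_saving(30))  # 1-entry minimum block, 30 bytes per set: 9000 bytes
```

Against the 304 KB measured after populating all five collections, 9000 bytes is roughly 3%.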

One of the reasons for limiting the smallest collection block size to 4 entries, and for controlling the growth sizes, is to limit the number of different-size objects in a db file and so reduce internal fragmentation. Whenever a collection block is resized after it has been sent to the server (committed, or overflowed from the client cache), the prior edition of the collection block is deleted and a new block is allocated at a different location in the file. When a collection block is relocated, it can leave behind a 'hole' if that space is not reused by an object of equal size. This is one reason why you need to compact your file before looking at space usage.
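The fragmentation effect described above can be illustrated with a deliberately simplified toy allocator: a hole left by a relocated block is only reused by an object of exactly the same size, and only a compact gives unused holes back. The `ToyFile` class, its reuse rule and the block sizes used below are assumptions for illustration, not a description of JADE's actual file management.

```python
from collections import Counter


class ToyFile:
    """Hypothetical database file: a freed hole is reused only by an object of
    exactly the same size; otherwise new objects are appended at the end."""

    def __init__(self) -> None:
        self.end = 0              # file length in bytes, holes included
        self.holes = Counter()    # hole size -> number of holes of that size

    def allocate(self, size: int) -> None:
        if self.holes[size] > 0:
            self.holes[size] -= 1  # reuse an equal-size hole
        else:
            self.end += size       # no suitable hole: grow the file

    def relocate(self, old_size: int, new_size: int) -> None:
        """Resize a committed block: the prior edition is deleted and the new
        block is allocated elsewhere, leaving a hole behind."""
        self.holes[old_size] += 1
        self.allocate(new_size)

    def wasted(self) -> int:
        return sum(size * count for size, count in self.holes.items())

    def compact(self) -> None:
        """Squeeze out every hole, as a file compaction would."""
        self.end -= self.wasted()
        self.holes.clear()


def grow_then_add(first: int, grown: int, newcomer: int) -> ToyFile:
    """One block grows after commit, then a new block is created."""
    f = ToyFile()
    f.allocate(first)
    f.relocate(first, grown)
    f.allocate(newcomer)
    return f


# A 4-entry set block (40 bytes) grows to a hypothetical 8-entry block,
# then a new 4-entry block is created.
few_sizes = grow_then_add(40, 80, 40)
# Hypothetical 1-entry minimum: a 1-entry block grows to 2 entries,
# then a new 2-entry block is created.
many_sizes = grow_then_add(10, 20, 20)
print(few_sizes.wasted(), many_sizes.wasted())  # 0 10
many_sizes.compact()
print(many_sizes.end, many_sizes.wasted())      # 40 0
```

In the first run the 40-byte hole is immediately reused by the next 4-entry block; in the second, the 10-byte hole never finds a taker until the file is compacted.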

Re: size of instances, information/discussion

Postby ConvertFromOldNGs » Fri Aug 07, 2009 12:00 pm

by Paul Mathews >> Tue, 20 Feb 2001 10:06:26 GMT

After doing a compact, a third of the space was saved.

Of your suggestions, b below definitely has my vote.

> b) Perhaps we could implement a brand new form of 'compact collection' that uses up even less space
> c) We could make some of the features that occupy space in objects optional so that if you don't need them you get the disk space back.




Re: size of instances, information/discussion

Postby ConvertFromOldNGs » Fri Aug 07, 2009 12:00 pm

by Craig Shearer >> Wed, 21 Feb 2001 2:13:03 GMT

I'd agree only if it doesn't impact performance, which it very well might. I'll take speed over space anytime!!!

Craig.

