by
krull >> Mon, 19 Feb 2001 8:12:05 GMT
[snip] .. [snip]
A, B and C are sets. D and E are member key dictionaries.
[snip]
The smallest increase is 140 bytes for a collection and the largest is 500 bytes per timesheet.
Because oids are small, I incorrectly assumed that small collections of oids are also small.
Perhaps you are assuming far more than you realise.
We could explain exactly how the space is allocated, but it is fairly tedious and might be more information than you need. For now, consider this overview:
1. The smallest block size for any collection is 4 entries, and all collections begin life with a block sized to contain exactly 4 entries. Collection blocks then grow up to the MaxBlocksize attribute defined for the type (on CollClass) in the schema.
2. Each entry in a Set block takes up 10 bytes (6 byte OID + 4 byte address).
3. Each entry in a Dictionary block takes up 10 bytes plus the total size of all its keys concatenated together (fixed length).
4. Each entry in an Array block takes up exactly the size of the entry, which for an object array is 6 bytes.
Given the above, the space allocated to just the entries in a set block containing 1-4 entries is 4 * 10 = 40 bytes.
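The arithmetic in the overview can be sketched in a few lines of Python (not JADE code, just a calculator). The per-entry sizes and the 4-entry starting block come from the points above; the function names and the 8-byte key in the usage lines are made up for illustration, and the figures cover entry space only, not any object overhead.

```python
# Per-entry sizes in bytes, taken from the overview above.
OID_BYTES = 6
ADDRESS_BYTES = 4
MIN_BLOCK_ENTRIES = 4  # every collection starts with a block sized for 4 entries


def set_entry_bytes() -> int:
    """Set entry: 6 byte OID + 4 byte address."""
    return OID_BYTES + ADDRESS_BYTES


def dictionary_entry_bytes(key_bytes: int) -> int:
    """Dictionary entry: 10 bytes plus the concatenated (fixed-length) key size."""
    return OID_BYTES + ADDRESS_BYTES + key_bytes


def object_array_entry_bytes() -> int:
    """Object array entry: just the 6 byte OID."""
    return OID_BYTES


def initial_block_entry_space(entry_bytes: int, entries: int = MIN_BLOCK_ENTRIES) -> int:
    """Space taken by the entries alone in a freshly created block."""
    return entry_bytes * entries


print(initial_block_entry_space(set_entry_bytes()))            # 40
print(initial_block_entry_space(dictionary_entry_bytes(8)))    # 72, hypothetical 8-byte key
print(initial_block_entry_space(object_array_entry_bytes()))   # 24
```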
So where do the other 100 or so bytes go?
Well, like most things in JADE, a collection is an object; in fact, each collection is an aggregate of several objects. Every instantiated collection has a 'Collection Header' object that retains state, such as size, for the collection as a whole. If entries have been added, it also has one or more collection blocks, which (guess what) are also objects. All persistent objects in JADE have a storage overhead over and above the space allocated for the attributes themselves. This overhead includes such things as db headers and trailers, an oid, edition, creation timestamp, replication info, version info, etc. The standard object overhead is documented in one of the manuals. Exclusive collection headers actually use slightly less space than shared objects, and collection blocks slightly less again (since they don't need the same version information).
Now, that's not all - most people forget to add in the overheads of various indexes and free space structures in the physical database file. All objects have an entry in a (BTree) index that is used to map OIDs to physical disk addresses.
WRONG.
Right!
Any options now or in the future to support small collections would be appreciated.
Since these collections are very small, why don't you use arrays?
This would save you around 14 bytes for sets and 14 + (4 * concatenated key size) for mkey dicts.
Adding entries to an array is as fast or faster than adding entries to a set or dictionary and the performance difference increases with size.
How do you access the collections? If you access them mostly sequentially using an iterator, this will also be faster for an array.
Collection::includes will be slower for an array than set or member key dictionary, but not much slower for small collections.
Any key searches on a member key dict would need to be replaced with a linear search on the array, which wouldn't be so good if it was a common operation.
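To make the trade-off concrete, here is a minimal Python sketch (not JADE code) of what replacing a keyed lookup with a linear search looks like. The `Timesheet` class, its fields and `find_by_key` are hypothetical names invented for the example; the point is only that every entry may need to be visited, which is cheap for a handful of entries and increasingly costly as the array grows.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Timesheet:
    """Hypothetical member object; 'key' stands in for the dictionary member key."""
    key: str
    hours: float


def find_by_key(items: Sequence[Timesheet], key: str) -> Optional[Timesheet]:
    """Linear search: visit entries in order until the key matches."""
    for item in items:
        if item.key == key:
            return item
    return None  # no entry with that key


sheets = [Timesheet("a", 8.0), Timesheet("b", 7.5), Timesheet("c", 6.0)]
match = find_by_key(sheets, "b")
print(match.hours if match else "not found")
```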
Are there things we could change in JADE to make 'small collections' use up less disk space? Sure there are:
a) Perhaps we could allow a Coll MinBlockSize of 1. This would save a massive 30 bytes per set and even more for dictionaries (disk drive vendors will be shaking in their boots).
b) Perhaps we could implement a brand new form of 'compact collection' that uses up even less space.
c) We could make some of the features that occupy space in objects optional, so that if you don't need them you get the disk space back.
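For option (a), the 30-byte figure follows from the entry sizes given earlier. The sketch below is a Python calculator rather than JADE code; the function name and the 8-byte key are hypothetical, and it assumes a collection holding a single entry whose first block would shrink from 4 entry slots to 1.

```python
SET_ENTRY_BYTES = 10  # 6 byte OID + 4 byte address, as stated in the overview


def min_block_saving(entry_bytes: int, current_min: int = 4, proposed_min: int = 1) -> int:
    """Entry space no longer allocated when the minimum block shrinks."""
    return (current_min - proposed_min) * entry_bytes


print(min_block_saving(SET_ENTRY_BYTES))        # 30 bytes per set
print(min_block_saving(SET_ENTRY_BYTES + 8))    # 54 bytes for a dictionary with a hypothetical 8-byte key
```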