
performance question

Posted: Fri Aug 07, 2009 1:13 pm
by ConvertFromOldNGs
by John Munro >> Thu, 25 Oct 2007 20:50:54 GMT

We recently added a feature that exports objects from a database to an XML file which can then be loaded into other databases.

When this process was run as a thin client, it ran for several hours before the user gave up and aborted it.

When the process was run on the same database in single user mode, it took five minutes to run.

What is so different between single user and thin client modes that it makes that big a difference?

To give you an idea of what the process is doing, it goes through two stages - first of all collecting objects to export and then exporting the objects.

In the first stage it starts with the root object of the database and goes through its properties adding any referenced objects to a dictionary. It then goes through the objects in the dictionary and does the same thing - goes through their properties looking for objects and adding them to the dictionary.

A couple implementation points about this stage - it can't actually collect objects because you can't store a reference to an exclusive object. It therefore collects oids. It must ensure that the same object isn't collected more than once to avoid getting stuck in stage 1 infinitely. String arrays have worse and worse performance for the includes method the larger the collection gets, so instead of storing the oids in one of those, they are stored as the key of a DynaDictionary with a dummy member.
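
The stage 1 traversal described above can be sketched as a worklist walk with a keyed lookup for the "already collected" check. This is a minimal illustration in Python rather than JADE, under stated assumptions: `collect_oids`, `references`, and the sample graph are made-up names, and the `references` mapping stands in for walking an object's properties. A plain dict with a dummy value plays the role the DynaDictionary plays in the description.

```python
from collections import deque

def collect_oids(root_oid, references):
    """Collect every oid reachable from root_oid exactly once.

    `references` maps an oid string to the oid strings it refers to;
    it is a stand-in for scanning an object's reference properties.
    """
    seen = {root_oid: None}      # keys are oids, the value is a dummy member
    pending = deque([root_oid])  # oids whose references are not yet scanned
    while pending:
        oid = pending.popleft()
        for ref in references.get(oid, ()):
            # dict lookup: average cost stays flat as the collection grows,
            # unlike a linear scan through a list of strings
            if ref not in seen:
                seen[ref] = None
                pending.append(ref)
    return list(seen)

# A small graph with a cycle (c refers back to a) to show the walk terminates.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a", "d"], "d": []}
print(collect_oids("a", graph))
```

Because each oid enters the worklist only when it is first seen, a cycle in the object graph cannot keep the walk going forever.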

In the second stage, the dictionary is iterated through, and for each object it is added to a JadeXMLDocument. Each object's attributes are added as sub-elements, with the attribute value appropriately converted to String (e.g. base64EncodeNoCrLf for Binaries). The object's references are also added as sub-elements, with the value converted to String as an oid. Each object is only present in the XML file once.

My question is whether the huge time difference between single and multi-user is caused by too much transient usage (large DynaDictionary plus large JadeXMLDocument) or locking on the persistent objects or what.

I tried using the profiler but nothing in the results leapt out at me as an obvious problem.

John

Re: performance question

Posted: Fri Aug 07, 2009 1:13 pm
by ConvertFromOldNGs
by Allistar >> Fri, 26 Oct 2007 0:03:42 GMT
We recently added a feature that exports objects from a database to an XML file which can then be loaded into other databases.

When this process was run as a thin client, it ran for several hours before the user gave up and aborted it.

When the process was run on the same database in single user mode, it took five minutes to run.

What is so different between single user and thin client modes that it makes that big a difference?

A lot of locking semantics are avoided in single user mode. The answer to your question depends on what else is happening on the Jade "network" at the time.
To give you an idea of what the process is doing, it goes through two stages - first of all collecting objects to export and then exporting the objects.

Going through the objects to export would require getting a shared lock on the collection(s) the objects are in. This lock will queue if other processes are updating those collections at the same time. Also keep in mind that each access to the collection is a trip to the database server (unless you are foreaching without "discreteLock" specified, in which case the lock is held for the entire iteration).
In the first stage it starts with the root object of the database and goes through its properties adding any referenced objects to a dictionary. It then goes through the objects in the dictionary and does the same thing - goes through their properties looking for objects and adding them to the dictionary.

A couple implementation points about this stage - it can't actually collect objects because you can't store a reference to an exclusive object. It therefore collects oids. It must ensure that the same object isn't collected more than once to avoid getting stuck in stage 1 infinitely.

I assume this is all done transiently, so the "getAtKey" or "includesKey" or "includes" you are doing to ensure you haven't already processed that object is only locking a local transient collection and not a persistent one (which would require a trip to the database server).
String arrays have worse and worse performance for the includes method the larger the collection gets, so instead of storing the oids in one of those, they are stored as the key of a DynaDictionary with a dummy member.

I don't understand why you have oids involved in a StringArray, as they only contain String primitives.
In the second stage, the dictionary is iterated through, and for each object it is added to a JadeXMLDocument. Each object's attributes are added as sub-elements, with the attribute value appropriately converted to String (e.g. base64EncodeNoCrLf for Binaries). The object's references are also added as sub-elements, with the value converted to String as an oid. Each object is only present in the XML file once.

In my experience using Jade's native XML parsing/creating functionality can have a performance impact for large numbers of nodes. Keep in mind that each node and element is represented by a transient, and that transient occupies space in the node's local transient cache. If the size of that cache is not large enough, they will spill out to disk which will kill performance. When running in thin client mode you are sharing the local cache with other processes, so this spilling will be more likely to happen.
My question is whether the huge time difference between single and multi-user is caused by too much transient usage (large DynaDictionary plus large JadeXMLDocument) or locking on the persistent objects or what.

The answer is: possibly and possibly.
I tried using the profiler but nothing in the results leapt out at me as an obvious problem.

Did you note the statistics at the end of the profile report? Cache overruns, objects locked, unlocked, get etc? They may reveal an issue.

Have you tried this process in fat client instead of thin client mode? For data intensive processes this can often improve performance and minimise the impact on other clients.

Also I would bump up the size of the transient cache for the node running the process and see what difference that makes.

When running in single user and thin client, is this on the same computer? If not then the difference could be an environmental issue. Available RAM, disk speed, CPU cache, processor speed (and number of processors) all play a role - is one of those a bottleneck?

Out of interest, what is the size of the resultant XML file and how many nodes does it have? I understand there are XML parsing performance improvements in Jade 6.2, I'm not sure what version you are running, but you could try upgrading and analysing the difference.

You may find refactoring could improve performance. If you are doing the same operation over and over, it may be more efficient to cache the results the first time and use those cached results each subsequent time rather than doing the hard work again. I'm thinking things like metadata traversal etc.

Feel free to post (or send me) the profiler report so I can have a quick look at it and see if anything obvious crops up. The profile report for both thin and single-user modes would be helpful so a comparison can be made. My public (i.e. spammed) email address is "allistar@paradise.net.nz".
Regards,
Allistar.

Re: performance question

Posted: Fri Aug 07, 2009 1:13 pm
by ConvertFromOldNGs
by John Munro >> Fri, 26 Oct 2007 15:50:44 GMT
A lot of locking semantics are avoided in single user mode. The answer to your question depends on what else is happening on the Jade "network" at the time.

The process takes a very long time to run thin client even if it's the only client running.
Going through the objects to export would require getting a shared lock on the collection(s) the objects are in. This lock will queue if other processes are updating those collections at the same time. Also keep in mind that each access to the collection is a trip to the database server (unless you are foreaching without "discreteLock" specified, in which case the lock is held for the entire iteration).

I don't think this is the problem, because it's still slow even when there aren't other processes that could be locking things.
I assume this is all done transiently, so the "getAtKey" or "includesKey" or "includes" you are doing to ensure you haven't already processed that object is only locking a local transient collection and not a persistent one (which would require a trip to the database server).

Yes, that's all transient
I don't understand why you have oids involved in a StringArray, as they only contain String primitives.

What I wanted to have was something like

if not objectset.includes(object) then
    objectset.add(object);
endif;

but you can't do that with exclusive objects, so I then tried

string := getOidStringForObject(object);
if not stringarray.includes(string) then
    stringarray.add(string);
endif;

but that has performance problems when the stringarray gets large, so I ended up with

string := getOidStringForObject(object);
if not dynadict.includesKey(string) then
    dynadict.putAtKey(string, dummy);
endif;

To be honest describing the process for the newsgroup post made me realize that I can probably eliminate the whole getOidStringForObject/dynadict thing because I can refactor it so that exclusive objects don't need to be added to the list.

(I know objectset.includes(object) isn't the fastest way to do that; I have implemented a method safeAdd that uses an exception handler instead, which is much faster)
In my experience using Jade's native XML parsing/creating functionality can have a performance impact for large numbers of nodes. Keep in mind that each node and element is represented by a transient, and that transient occupies space in the node's local transient cache. If the size of that cache is not large enough, they will spill out to disk which will kill performance. When running in thin client mode you are sharing the local cache with other processes, so this spilling will be more likely to happen.

Yes, I thought that might be a problem, but while the JadeXMLParser eliminates that problem when loading the XML, I couldn't find a better way when creating it.
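
For comparison, this is what streaming creation looks like in a language that offers it. The sketch below is Python, not JADE, and uses the standard library's `XMLGenerator`; it says nothing about whether JADE has an equivalent, which this thread does not establish. The function name `stream_export`, the element names, and the sample oid are all made up for illustration. Each element is written to the output as soon as it is produced, so no in-memory tree of nodes accumulates.

```python
import io
from xml.sax.saxutils import XMLGenerator

def stream_export(out, objects):
    """Write one <object> element per item without building a document tree."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("export", {})
    for oid, attrs in objects:
        gen.startElement("object", {"oid": oid})
        for name, value in attrs.items():
            gen.startElement(name, {})
            gen.characters(value)  # special characters are escaped on output
            gen.endElement(name)
        gen.endElement("object")
    gen.endElement("export")
    gen.endDocument()

buf = io.StringIO()
stream_export(buf, [("2048.1", {"name": "Ann & Co"})])
xml_text = buf.getvalue()
print(xml_text)
```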
When running in single user and thin client, is this on the same computer? If not then the difference could be an environmental issue. Available RAM, disk speed, CPU cache, processor speed (and number of processors) all play a role - is one of those a bottleneck?

Yes, it's the same computer
Out of interest, what is the size of the resultant XML file and how many nodes does it have? I understand there are XML parsing performance improvements in Jade 6.2, I'm not sure what version you are running, but you could try upgrading and analysing the difference.

Depending on the database being exported, a minimum of 20 MB and 250,000 elements.
You may find refactoring could improve performance. If you are doing the same operation over and over, it may be more efficient to cache the results the first time and use those cached results each subsequent time rather than doing the hard work again. I'm thinking things like metadata traversal etc.

In general each database would only be exported once and then imported several times (used as a template) so I don't think anything can be cached at the export end.

I think I'll have a go at changing the getOidStringForObject/dynadict code to object/objectset, and if it still suffers from the same problem I'll dive into the profiler reports.

Thanks,

John

Re: performance question

Posted: Fri Aug 07, 2009 1:13 pm
by ConvertFromOldNGs
by Allistar >> Wed, 31 Oct 2007 23:29:31 GMT
A lot of locking semantics are avoided in single user mode. The answer to your question depends on what else is happening on the Jade "network" at the time.

The process takes a very long time to run thin client even if it's the only client running.
Going through the objects to export would require getting a shared lock on the collection(s) the objects are in. This lock will queue if other processes are updating those collections at the same time. Also keep in mind that each access to the collection is a trip to the database server (unless you are foreaching without "discreteLock" specified, in which case the lock is held for the entire iteration).

I don't think this is the problem, because it's still slow even when there aren't other processes that could be locking things.

There is a cost in acquiring the lock (a trip to the server at least, looking up lock tables etc). I wouldn't expect the overhead of that to be huge unless you are doing a massive amount of locks and unlocks.

Out of interest, are you using "shared memory transport" between the app server and database server? If not you may find that lowers the total time taken.
I assume this is all done transiently, so the "getAtKey" or "includesKey" or "includes" you are doing to ensure you haven't already processed that object is only locking a local transient collection and not a persistent one (which would require a trip to the database server).

Yes, that's all transient
I don't understand why you have oids involved in a StringArray, as they only contain String primitives.

What I wanted to have was something like

if not objectset.includes(object) then
    objectset.add(object);
endif;

but you can't do that with exclusive objects, so I then tried

string := getOidStringForObject(object);
if not stringarray.includes(string) then
    stringarray.add(string);
endif;

but that has performance problems when the stringarray gets large, so I ended up with

string := getOidStringForObject(object);
if not dynadict.includesKey(string) then
    dynadict.putAtKey(string, dummy);
endif;

To be honest describing the process for the newsgroup post made me realize that I can probably eliminate the whole getOidStringForObject/dynadict thing because I can refactor it so that exclusive objects don't need to be added to the list.

(I know objectset.includes(object) isn't the fastest way to do that; I have implemented a method safeAdd that uses an exception handler instead, which is much faster)

Good idea, although doing a .includes in an ObjectSet should be pretty fast as it's essentially a MemberKeyDictionary keyed by oid.
In my experience using Jade's native XML parsing/creating functionality can have a performance impact for large numbers of nodes. Keep in mind that each node and element is represented by a transient, and that transient occupies space in the node's local transient cache. If the size of that cache is not large enough, they will spill out to disk which will kill performance. When running in thin client mode you are sharing the local cache with other processes, so this spilling will be more likely to happen.

Yes, I thought that might be a problem, but while the JadeXMLParser eliminates that problem when loading the XML, I couldn't find a better way when creating it.

Can you confirm whether the transient cache is overflowing? Whether increasing it makes a difference on performance?
When running in single user and thin client, is this on the same computer? If not then the difference could be an environmental issue. Available RAM, disk speed, CPU cache, processor speed (and number of processors) all play a role - is one of those a bottleneck?

Yes, it's the same computer
Out of interest, what is the size of the resultant XML file and how many nodes does it have? I understand there are XML parsing performance improvements in Jade 6.2, I'm not sure what version you are running, but you could try upgrading and analysing the difference.

Depending on the database being exported, a minimum of 20 MB and 250,000 elements.

Is this only for export/import between Jade databases using the same schema? If so, and the XML is not manipulated or looked at by any other software, I'd suggest not using XML. You could come up with a much lighter, recursive format. You can make a lot of assumptions if the two schemas are identical (such as there is no need to explicitly mention the property name the value is for because you can assume the values are in the same order). Coming up with another format minimises the overhead of parsing the XML and removes the need for transients to be created (and kept) for the entire read of the file. I wouldn't be surprised if such a change resulted in an order of magnitude performance improvement.
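
One way to picture such a format, as a rough sketch only: each record carries the class name, the oid, and then the property values in an order both sides already agree on, so property names never appear in the file. The example is in Python rather than JADE, and everything in it is an assumption made for illustration: the `SCHEMA` table, the class and property names, the oid strings, and the choice of CSV as the carrier.

```python
import csv
import io

# Hypothetical schema shared by exporter and importer: class -> property order.
SCHEMA = {"Customer": ["name", "city"], "Order": ["customer", "total"]}

def write_records(out, records):
    """Write class name, oid, then values in schema order (no property names)."""
    writer = csv.writer(out)
    for cls, oid, values in records:
        writer.writerow([cls, oid] + [values[p] for p in SCHEMA[cls]])

def read_records(inp):
    """Rebuild property names from the shared schema, one record at a time."""
    for cls, oid, *vals in csv.reader(inp):
        yield cls, oid, dict(zip(SCHEMA[cls], vals))

buf = io.StringIO()
write_records(buf, [
    ("Customer", "2048.1", {"name": "Ann", "city": "Leeds"}),
    ("Order", "2051.7", {"customer": "2048.1", "total": "19.50"}),
])
buf.seek(0)
restored = list(read_records(buf))
```

Both directions work one record at a time, so nothing needs to be held for the whole file. The trade-off is that the file is only readable by something that has the same schema.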
You may find refactoring could improve performance. If you are doing the same operation over and over, it may be more efficient to cache the results the first time and use those cached results each subsequent time rather than doing the hard work again. I'm thinking things like metadata traversal etc.

In general each database would only be exported once and then imported several times (used as a template) so I don't think anything can be cached at the export end.

Yet during one export you will be exporting multiple objects from the same class. If there are any lengthy operations to work out what to export for that class, you could cache the results of that so the next time you export an instance of that class you don't have to repeat the lengthy operation.
I think I'll have a go at changing the getOidStringForObject/dynadict code to object/objectset, and if it still suffers from the same problem I'll dive into the profiler reports.

--
A.

Re: performance question

Posted: Fri Aug 07, 2009 1:13 pm
by ConvertFromOldNGs
by John Munro >> Fri, 2 Nov 2007 18:05:16 GMT
Out of interest, are you using "shared memory transport" between the app server and database server? If not you may find that lowers the total time taken.

Yes, we're using shared memory transport
Can you confirm whether the transient cache is overflowing? Whether increasing it makes a difference on performance?

A tmp file in the system directory got to 177 MB, which I thought meant that it was overflowing, but as you can see in the attached profiler log, it says "No cache overruns" and "String pool overruns: 0" so now I'm not sure...
Is this only for export/import between Jade databases using the same schema? If so, and the XML is not manipulated or looked at by any other software, I'd suggest not using XML. You could come up with a much lighter, recursive format. You can make a lot of assumptions if the two schemas are identical (such as there is no need to explicitly mention the property name the value is for because you can assume the values are in the same order). Coming up with another format minimises the overhead of parsing the XML and removes the need for transients to be created (and kept) for the entire read of the file. I wouldn't be surprised if such a change resulted in an order of magnitude performance improvement.

That's a good idea, but unfortunately I don't have the time to invest in that at the moment
Yet during one export you will be exporting multiple objects from the same class. If there are any lengthy operations to work out what to export for that class, you could cache the results of that so the next time you export an instance of that class you don't have to repeat the lengthy operation.

That is a very good idea
I think I'll have a go at changing the getOidStringForObject/dynadict code to object/objectset, and if it still suffers from the same problem I'll dive into the profiler reports.

I've now done this, but it doesn't seem to have noticeably affected the time taken

You can see from the profiler logs, the time to run thin client was 220761 seconds, and single user was 699 seconds - an enormous difference

The profiler has too many JadeXML methods in it, you can't really see which of my methods are taking up all the time - I may move the profiler lines so it doesn't capture writing the XML file, since I can't optimize JadeXMLDocument::writeToFile.

I also forgot to stop some server apps that were running at the same time as the export so that may have polluted the profile - it says there were 69 committed transactions and the export doesn't update the database... I guess I'll run it again to get a clean profile, but 220761 seconds is two and a half days so it's a pain.

I am using serverExecution and beginLoad/endLoad to try and speed it up - do you think they could be slowing it down in this context?

Thanks,

John

Re: performance question

Posted: Fri Aug 07, 2009 1:13 pm
by ConvertFromOldNGs
by dcooper@jade.co.nz >> Fri, 2 Nov 2007 18:21:25 GMT

Again, without digesting any of this too much: the "no cache overruns" in the profiler is referring to the interpreter's method cache. That's unrelated to the transient object cache, which is what the tmp*.dat files are overflow files for. If your transient overflow file(s) are at 177MB, then you're definitely overflowing a bunch of transients somewhere.

In the thin client case, are you using a single user application server or an application server connected to a database server? If the latter and your application is doing a lot of work with transients (which the 177MB overflow files would suggest), then the serverExecution methods could actually be hurting performance if they're having to ship lots of transients from the app server to the database server and back. Also, in this case, the size of the remoteTransientCache on the database server becomes very important (it's the cache that's used on the database server for any transients required by serverExecution methods). I think both of these points have already been mentioned in this discussion.

If you are using an app server and a database server in your thin client example, I suggest switching it to a single user app server to see if that makes any difference.

Dean.

Re: performance question

Posted: Fri Aug 07, 2009 1:13 pm
by ConvertFromOldNGs
by John Munro >> Fri, 2 Nov 2007 18:51:19 GMT

Switching off serverExecution fixes it - thanks everyone

John

Re: performance question

Posted: Fri Aug 07, 2009 1:13 pm
by ConvertFromOldNGs
by John Munro >> Fri, 2 Nov 2007 19:44:23 GMT
Yet during one export you will be exporting multiple objects from the same class. If there are any lengthy operations to work out what to export for that class, you could cache the results of that so the next time you export an instance of that class you don't have to repeat the lengthy operation.

That is a very good idea

Adding some basic metadata caching dropped the export time from 10 minutes to 8 minutes (after switching off serverExecution dropped it from 2.5 days).

The time savings should be a function of the number of instances exported per class and the number of properties per class exported, so for larger databases this should shave off a significant amount of time.
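
The shape of that caching can be sketched as follows. This is an illustration in Python, not the JADE code used in the export, and the names `exportable_properties`, `export`, `Customer`, and the `lookups` counter are all hypothetical. The expensive "which properties do I export for this class" step runs once per class, and every later instance of that class reuses the result.

```python
from functools import lru_cache

lookups = 0  # counts how often the expensive metadata work actually runs

@lru_cache(maxsize=None)
def exportable_properties(cls):
    """Work out once per class which attribute names to export."""
    global lookups
    lookups += 1
    # Stand-in for a slow metadata traversal: annotated names without
    # a leading underscore.
    return tuple(n for n in cls.__annotations__ if not n.startswith("_"))

class Customer:
    name: str
    city: str
    _scratch: int

    def __init__(self, name, city):
        self.name, self.city, self._scratch = name, city, 0

def export(obj):
    """Export one instance using the cached property list for its class."""
    return {p: getattr(obj, p) for p in exportable_properties(type(obj))}

rows = [export(Customer("Ann", "Leeds")), export(Customer("Bob", "York"))]
print(rows, "metadata lookups:", lookups)
```

With two instances of one class the metadata step runs once, so the saving grows with the number of instances per class, as described above.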

Thanks,

John

Re: performance question

Posted: Fri Aug 07, 2009 1:13 pm
by ConvertFromOldNGs
by dcooper@jade.co.nz >> Sat, 27 Oct 2007 8:56:08 GMT

Without digesting all of your email, in the Thin Client example, was the XML file you're producing being written to the Thin Client machine or to the App Server machine? In the Single User (i.e. Fat/Standard Client) case, it will effectively be the latter. In the Thin Client case, if you're writing the file to the Thin Client machine, all of the output data will have to be pushed across the network to the Thin Client, which could account for the performance difference.

Dean.

Re: performance question

Posted: Fri Aug 07, 2009 1:13 pm
by ConvertFromOldNGs
by John Munro >> Sun, 28 Oct 2007 0:57:13 GMT

Yes, the file is being written to the thin client machine, but that doesn't happen until the end of the process, and the earlier stages are slow. Also the db, app server and tc are all on the same machine so it shouldn't be a network bottleneck.

Thanks,

John