Reading large text files sequentially
Posted: Fri Sep 07, 2018 11:15 am
by stevek2
Our app currently imports very large text files every day (circa 70MB), and sometimes double that during special periods.
These files range anywhere from 500,000 rows (daily) to 2,000,000 rows (periodically).
I am finding the processing time for these files is very long - initially I thought this was related to the nature of the processing our app applies to the data as it is read in.
However, when I disabled the processing portion and simply read each line sequentially (and nothing else) using file.readLine, I discovered that the read alone takes quite some time (more than I expected).
Is there a more efficient way to read these very large files sequentially than readLine?
(each row is a fixed length)
Regards
Steve
Re: Reading large text files sequentially
Posted: Fri Sep 07, 2018 1:05 pm
by stevek2
Further testing reveals that on our test system (Jade 7.1.09 Unicode), reading 510,000 rows sequentially - with no other processing - takes less than 30 seconds. The same test process on the customer site (with the same file) takes over 90 minutes.
Hence, it must be an environmental condition on the customer site.
Re: Reading large text files sequentially
Posted: Fri Sep 07, 2018 1:26 pm
by Kevin
Perhaps the file is on a remote computer - either being read via the presentation file system or from a network drive?
This would slow it down significantly as it'll be reading the file line by line across the network.
Re: Reading large text files sequentially
Posted: Fri Sep 07, 2018 2:36 pm
by BeeJay
If the file is being read on the presentation client, and then processed on the AppServer, then that will add some overhead for getting the data transferred from the thin client to the AppServer node. This would be worse if their network connectivity is particularly bad/slow.
For example, reading a 500,000 line file:
SingleUser - took ~5s to read every line in the file (file was on an SSD drive).
Thin client - high-speed LAN was going to take ~15 minutes; averaging around 1.8s per 1000 lines.
Thin client - fast(ish) wireless was going to take ~41 minutes; averaging just under 5s per 1000 lines.
Thin client - slow wireless was going to take ~9.4 hours; averaging around 68s per 1000 lines.
For interest's sake, I repeated the same test using logic to read the file in chunks with 'readString' instead of readLine, then using my own logic to parse these chunks into 'lines', including logic to handle lines which were split between chunks. This showed the following timings:
SingleUser - took less than 1s per run; averaging between 0.70s and 0.75s.
Thin client - high-speed LAN took ~14s.
Thin client - fast(ish) wireless took ~15s.
Thin client - slow wireless took ~28s.
So the chunked approach is far less susceptible to the impact of network speed than readLine. If your reading of the file is via a thin client file system, you may want to consider using a similar chunked read approach to reduce the impact of slower connections.
Cheers,
BeeJay.
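As an illustrative sketch of the chunked-read technique BeeJay describes (Python used for readability; the JADE version would use File::readString, and the function name and chunk size here are my own):

```python
def read_lines_chunked(path, chunk_size=64 * 1024):
    """Yield lines from 'path', reading it in large chunks.

    A partial line left at the end of one chunk is carried over and
    joined with the start of the next chunk, so lines split across
    chunk boundaries are reassembled correctly.
    """
    carry = ""  # incomplete line left over from the previous chunk
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # end of file
            chunk = carry + chunk
            lines = chunk.split("\n")
            carry = lines.pop()  # last element may be an incomplete line
            for line in lines:
                yield line
    if carry:
        yield carry  # final line with no trailing newline
```

The point of the approach is that each network round trip fetches a large block rather than a single line, which is why the thin-client timings above collapse from minutes or hours to seconds.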
Re: Reading large text files sequentially
Posted: Tue Sep 11, 2018 10:58 am
by JohnP
You may also be able to speed it up by setting mode to Mode_Input and shareMode to Share_Read. Extra overhead can be incurred if these are not set, as JADE has to keep checking whether the file has been updated by someone else.
https://www.jadeworld.com/docs/jade-201 ... remodefile
Re: Reading large text files sequentially
Posted: Mon Dec 10, 2018 5:47 pm
by stevek2
Thanks for all the responses - but I have finally found the problem and the solution, along the lines of BeeJay's approach.
The problem arose because we were importing UTF-8 files using readLine.
Jade explicitly states in the documentation that readLine on UTF-8 files becomes very (very) slow with large files (and suggests using readString instead).
The problem does not arise when importing ASCII files.
We could not use readString initially because the rows were not of a consistent length (due to an error by the vendor that supplied the file).
When I changed to using readString (not caring about how the data appeared), the import process went very fast (as expected).
The import process and import file were both on the server.
The vendor has now fixed the file; we now use readString and all is good again (fast import).
Thanks again
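For illustration, the fixed-length readString approach Steve ended up with might look like this sketch (Python for readability; the function name is my own, and it assumes each record's length, including its line terminator, is known up front - which is what the vendor's fix restored):

```python
def read_fixed_records(path, record_len):
    """Yield fixed-length records from 'path'.

    record_len is assumed to include the line terminator, so each
    read() returns exactly one row with no line-splitting needed.
    """
    with open(path, "rb") as f:
        while True:
            rec = f.read(record_len)
            if not rec:
                break  # end of file
            yield rec.decode("utf-8").rstrip("\r\n")
```

Because every record is the same length, no per-line scanning for terminators is needed at all, which avoids the UTF-8 readLine slowdown entirely.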