Reading large text files sequentially

stevek2
Posts: 14
Joined: Fri Nov 13, 2015 5:04 pm

Reading large text files sequentially

Postby stevek2 » Fri Sep 07, 2018 11:15 am

Our app currently imports very large text files every day (circa 70MB), and sometimes double that during special periods.

These files range anywhere from 500,000 rows (daily) to 2,000,000 rows (during those special periods).

I am finding that processing these files takes a very long time. Initially I thought this related to the nature of the processing our app applies to the data as it is read in.

However, when I disabled the processing portion and simply read each line sequentially (and nothing else) using file.readLine, I discovered that this alone appears to take quite some time (more than I expected).

Is there a more efficient way to read these very large files sequentially, other than via readLine?
(Each row is a fixed length.)
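
For reference, the stripped-down test loop is essentially just the following (a rough sketch - the file name is a placeholder and the real version does its processing inside the loop):

readLinesTest();

vars
    inputFile : File;
    line : String;
begin
    create inputFile transient;
    inputFile.fileName := "c:\imports\daily.txt";   // placeholder path
    inputFile.open();
    while not inputFile.endOfFile() do
        line := inputFile.readLine();               // no other processing
    endwhile;
epilog
    inputFile.close();
    delete inputFile;
end;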

Regards
Steve

stevek2
Posts: 14
Joined: Fri Nov 13, 2015 5:04 pm

Re: Reading large text files sequentially

Postby stevek2 » Fri Sep 07, 2018 1:05 pm

Further testing reveals that on our test system (Jade 7.1.09 Unicode), reading 510,000 rows sequentially - with no other processing - takes less than 30 seconds. The same test process on the customer site (with the same file) takes over 90 minutes.

Hence, it must be something environmental at the customer site.

Kevin
Posts: 29
Joined: Wed Oct 14, 2009 9:03 am
Location: Christchurch

Re: Reading large text files sequentially

Postby Kevin » Fri Sep 07, 2018 1:26 pm

Perhaps the file is on a remote computer? Either being read via the presentation file system or a network drive?

This would slow it down significantly as it'll be reading the file line by line across the network.
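
If that turns out to be the case, one way to rule it out is to force the read to happen on the server node via the File class's usePresentationFileSystem property (from memory it defaults to true for GUI apps) - very roughly:

readOnServerNode();

vars
    inputFile : File;
    line : String;
begin
    create inputFile transient;
    inputFile.usePresentationFileSystem := false;   // open the file on the server node, not the thin client
    inputFile.fileName := "d:\imports\daily.txt";   // placeholder - path as seen from the server
    inputFile.open();
    while not inputFile.endOfFile() do
        line := inputFile.readLine();
    endwhile;
epilog
    inputFile.close();
    delete inputFile;
end;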

BeeJay
Posts: 312
Joined: Tue Jun 30, 2009 2:42 pm
Location: Christchurch, NZ

Re: Reading large text files sequentially

Postby BeeJay » Fri Sep 07, 2018 2:36 pm

If the file is being read on the presentation client, and then processed on the AppServer, then that will add some overhead for getting the data transferred from the thin client to the AppServer node. This would be worse if their network connectivity is particularly bad/slow.

For example, reading a 500,000 line file:

SingleUser - took ~5s to read every line in the file (file was on an SSD drive).
Thin client - high-speed LAN was going to take ~15 minutes; it was averaging around 1.8s per 1,000 lines.
Thin client - fast(ish) wireless was going to take ~41 minutes; it was averaging just under 5s per 1,000 lines.
Thin client - slow wireless was going to take ~9.4 hours; it was averaging around 68s per 1,000 lines.

For interest's sake, I repeated the same test using logic to read the file in chunks with 'readString' instead of doing readLine calls, then using my own logic to parse these chunks into 'lines', including handling lines that were split between chunks. This showed the following timings:

SingleUser - took less than 1s per run, averaging between 0.70s and 0.75s.
Thin client - high-speed LAN took ~14s.
Thin client - fast(ish) wireless took ~15s.
Thin client - slow wireless took ~28s.

So the chunked approach is far less susceptible to network speed than readLine. If you are reading the file via the thin client file system, you may want to consider a similar chunked read approach to reduce the impact of slower connections.
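
Roughly what my chunked version looks like - simplified here by assuming fixed-length rows as per your file, which means the handling for lines split across chunk boundaries disappears (path, record length and chunk size are placeholders):

readChunked();

vars
    inputFile : File;
    chunk, line : String;
    recLen, recsPerChunk, i : Integer;
begin
    recLen := 102;                                  // fixed row length including CR/LF - placeholder
    recsPerChunk := 1000;                           // rows fetched per readString call - tune to taste
    create inputFile transient;
    inputFile.fileName := "c:\imports\daily.txt";   // placeholder path
    inputFile.mode := File.Mode_Input;
    inputFile.shareMode := File.Share_Read;
    inputFile.open();
    while not inputFile.endOfFile() do
        // one call (and one network round trip) per chunk rather than one per line
        chunk := inputFile.readString(recLen * recsPerChunk);
        i := 1;
        while i <= chunk.length() do
            line := chunk[i : recLen];              // assumes the file length is an exact multiple of recLen
            // process line here
            i := i + recLen;
        endwhile;
    endwhile;
epilog
    inputFile.close();
    delete inputFile;
end;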

Cheers,
BeeJay.

JohnP
Posts: 73
Joined: Mon Sep 28, 2009 8:41 am
Location: Christchurch

Re: Reading large text files sequentially

Postby JohnP » Tue Sep 11, 2018 10:58 am

You may also be able to speed it up by setting mode to Mode_Input and shareMode to Share_Read. Extra overheads can be incurred if these are not set, as JADE has to keep checking to see if the file has been updated by someone else.

https://www.jadeworld.com/docs/jade-201 ... remodefile
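
In other words, something like this before the open (property and constant names as per the File class documentation linked above):

openForInput();

vars
    inputFile : File;
begin
    create inputFile transient;
    inputFile.fileName := "c:\imports\daily.txt";   // placeholder path
    inputFile.mode := File.Mode_Input;              // read-only access
    inputFile.shareMode := File.Share_Read;         // others may read, but not update, while we have it open
    inputFile.open();
    // ... readLine/readString loop as before ...
epilog
    inputFile.close();
    delete inputFile;
end;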

stevek2
Posts: 14
Joined: Fri Nov 13, 2015 5:04 pm

Re: Reading large text files sequentially

Postby stevek2 » Mon Dec 10, 2018 5:47 pm

Thanks for all the responses - I have finally found the problem and the solution, along the lines of BeeJay's suggestion.

The problem arose because we were importing UTF-8 files using readLine.
Jade explicitly states in the documentation that readLine on UTF-8 files becomes very (very) slow with large files (and suggests using readString instead).
The problem does not arise when importing ASCII files.

Initially we could not use readString because the rows were not of a consistent length (due to an error by the vendor that supplied the file).

When I changed to using readString (without worrying about how the data was split up), the import process ran very fast (as expected).
The import process and the import file were both on the server.

The vendor has now fixed the file, we now use readString, and all is good again (fast import).
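
For anyone who hits the same issue, the import loop now looks roughly like this (record length and path are placeholders for our actual values):

importFixedLengthFile();

vars
    inputFile : File;
    rec : String;
    recLen : Integer;
begin
    recLen := 250;                                  // fixed record length including terminator - placeholder
    create inputFile transient;
    inputFile.fileName := "d:\imports\daily.txt";   // placeholder - file and import process are both on the server
    inputFile.mode := File.Mode_Input;
    inputFile.shareMode := File.Share_Read;
    inputFile.open();
    while not inputFile.endOfFile() do
        rec := inputFile.readString(recLen);        // fixed-length read avoids the slow UTF-8 readLine path
        // process rec here
    endwhile;
epilog
    inputFile.close();
    delete inputFile;
end;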

Thanks again

