ArchiveUrlConnection and Performance [message #1817301] |
Wed, 20 November 2019 09:13 |
We have a use case where we read EMF models from within a JAR on a project's classpath.
There are two scenarios, and their performance differs tremendously:
(1) Add the JAR as an "external jar" to the classpath.
The URL then looks like archive:file:/somepath/somejar.jar!/somepack/somefile.someextension
(2) Add the JAR as a "jar" to the classpath.
The URL then looks like
archive:platform:/resource/simeproject/somepath/small.jar!/somejar.jar!/somepack/somefile.someextension
In case (1), ArchiveURLConnection uses ZipFile.getEntry to look up the entry and obtain its InputStream.
In case (2), a ZipInputStream is created and a linear search for the matching entry is performed.
With a large number of entries in the JAR, each of them large, case (2) is tremendously slower than case (1).
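To make the difference concrete, here is a minimal sketch of the two access patterns using only java.util.zip. It is not the ArchiveURLConnection code itself; the class name ZipLookupSketch, its helper methods, and the sample entry names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class, not part of EMF.
class ZipLookupSketch {

    /** Random access, as in case (1): the central directory locates the entry directly. */
    static byte[] readViaZipFile(File archive, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(archive)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return drain(in);
            }
        }
    }

    /** Sequential access, as in case (2): all preceding entries are read past first. */
    static byte[] readViaZipInputStream(InputStream raw, String entryName) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(raw)) {
            for (ZipEntry e = zin.getNextEntry(); e != null; e = zin.getNextEntry()) {
                if (e.getName().equals(entryName)) {
                    return drain(zin); // reads only the current entry's data
                }
            }
            return null;
        }
    }

    /** Copies the remaining bytes of the stream into a byte array. */
    private static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        for (int n; (n = in.read(buffer)) != -1; ) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Writes a throwaway archive with entries entry0.txt .. entry(count-1).txt. */
    static File writeSampleArchive(int count) throws IOException {
        File file = File.createTempFile("sample", ".zip");
        file.deleteOnExit();
        try (ZipOutputStream zout = new ZipOutputStream(new FileOutputStream(file))) {
            for (int i = 0; i < count; i++) {
                zout.putNextEntry(new ZipEntry("entry" + i + ".txt"));
                zout.write(("content " + i).getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        File archive = writeSampleArchive(1000);
        String last = "entry999.txt";
        byte[] direct = readViaZipFile(archive, last);
        byte[] scanned;
        try (InputStream in = new FileInputStream(archive)) {
            scanned = readViaZipInputStream(in, last);
        }
        System.out.println(new String(direct, StandardCharsets.UTF_8));
        System.out.println(new String(scanned, StandardCharsets.UTF_8));
    }
}
```

Both methods return the same bytes, but the second one has to read past every entry that precedes the target. In the nested case there is no File for the inner JAR, so java.util.zip.ZipFile cannot be opened on it directly; that is presumably why the stream-based path is taken there.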
Here are my questions:
(1) Am I overlooking something?
(2) Could this mechanism be optimized to (a) try to obtain a ZipFile anyway, or (b) use some buffering to speed up the linear search?
Thanks
Christian
Twitter : @chrdietrich
Blog : https://www.dietrich-it.de
[Updated on: Wed, 20 November 2019 09:21]
Re: ArchiveUrlConnection and Performance [message #1817330 is a reply to message #1817328] |
Wed, 20 November 2019 14:10 |
Ed Merks Messages: 33218 Registered: July 2009 |
Senior Member |
Yes, reading all entries from an archive as separate resources will have n^2 cost overall. Of course the entries at the beginning can be found quickly and the scan terminates early, but for the entry at the end, the whole archive is scanned first.
The code I pointed at, which is used by Oomph, keeps the bytes of all the entries in memory, weakly referenced, so if the entries are accessed in quick succession, the resource as a whole is loaded only once, and the entries can and will be garbage collected. This was done more because of complaints about hundreds of reads of setups.zip than for purely performance reasons, but it accomplished both goals. One problem that was reported is that during loading, the entire set of entries is strongly referenced, so it's possible to run out of memory if the resource is really huge and you have barely enough heap space left.
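A rough sketch of that idea, for readers who want to see its shape. This is not the Oomph code; the class WeakArchiveCacheSketch, the StreamSource interface, and the choice to hold the whole entry map behind a single WeakReference are assumptions made purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.ref.WeakReference;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class, not the Oomph implementation.
class WeakArchiveCacheSketch {

    /** Hypothetical hook that reopens the archive whenever a reload is needed. */
    interface StreamSource {
        InputStream open() throws IOException;
    }

    private final StreamSource source;
    // Assumption: the whole map is weakly held, so the GC may drop it between reads.
    private WeakReference<Map<String, byte[]>> cache = new WeakReference<>(null);

    WeakArchiveCacheSketch(StreamSource source) {
        this.source = source;
    }

    /** Returns the bytes of one entry, scanning the archive only if the cache is gone. */
    synchronized byte[] getEntry(String name) throws IOException {
        Map<String, byte[]> entries = cache.get();
        if (entries == null) {
            entries = loadAll();
            cache = new WeakReference<>(entries);
        }
        return entries.get(name);
    }

    /** One sequential pass; while it runs, every entry is strongly referenced. */
    private Map<String, byte[]> loadAll() throws IOException {
        Map<String, byte[]> result = new HashMap<>();
        try (ZipInputStream zin = new ZipInputStream(source.open())) {
            byte[] buffer = new byte[8192];
            for (ZipEntry e = zin.getNextEntry(); e != null; e = zin.getNextEntry()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                for (int n; (n = zin.read(buffer)) != -1; ) {
                    out.write(buffer, 0, n);
                }
                result.put(e.getName(), out.toByteArray());
            }
        }
        return result;
    }

    /** Builds a small in-memory archive for the demo below. */
    static byte[] sampleArchive(int count) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                zout.putNextEntry(new ZipEntry("entry" + i + ".txt"));
                zout.write(("content " + i).getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = sampleArchive(100);
        WeakArchiveCacheSketch cached =
            new WeakArchiveCacheSketch(() -> new ByteArrayInputStream(archive));
        System.out.println(new String(cached.getEntry("entry0.txt"), StandardCharsets.UTF_8));
        System.out.println(new String(cached.getEntry("entry99.txt"), StandardCharsets.UTF_8));
    }
}
```

The loadAll method also shows where the reported memory problem would come from in a design like this: until the pass finishes, the local map holds every entry strongly, so peak heap use is roughly the uncompressed size of the whole archive.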
Ed Merks
Professional Support: https://www.macromodeling.com/