Eclipse Community Forums: EMF » ArchiveUrlConnection and Performance


ArchiveUrlConnection and Performance [message #1817301]

Wed, 20 November 2019 09:13

Christian Dietrich

Messages: 14716
Registered: July 2009

Senior Member

We have a use case where we read EMF models from a JAR on the classpath of a project.

There are two scenarios, and they show tremendously different performance:

(1) Add the JAR as an "external JAR" to the classpath.
The URL then looks like archive:file:/somepath/somejar.jar!/somepack/somefile.someextension
(2) Add the JAR as a "JAR" (inside the workspace) to the classpath.
The URL then looks like
archive:platform:/resource/someproject/somepath/small.jar!/somejar.jar!/somepack/somefile.someextension

In case (1), ArchiveURLConnection uses ZipFile.getEntry to retrieve the entry and its InputStream.
In case (2), a ZipInputStream is created and the entries are searched linearly for the right one.

With a huge number of entries in the JAR, each of them large, (2) is tremendously slower than (1).
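To make the difference between the two cases concrete, here is a minimal sketch using plain java.util.zip (not EMF's actual ArchiveURLConnection code; the class and method names are hypothetical). Case (1) can jump straight to an entry through the ZIP central directory, while case (2) has to walk the stream entry by entry.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical illustration of the two lookup strategies; not EMF code.
public class ArchiveLookup {

    /** Case (1): random access via the central directory; requires a java.io.File. */
    public static byte[] readViaZipFile(File archive, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(archive)) {
            ZipEntry entry = zip.getEntry(entryName); // direct lookup, no scan of entry data
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return in.readAllBytes();
            }
        }
    }

    /** Case (2): only a stream is available, so entries are visited one by one. */
    public static byte[] readViaZipInputStream(InputStream archive, String entryName) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    return zin.readAllBytes(); // reads only the current entry's data
                }
                // getNextEntry() must read past the data of every entry it skips.
            }
            return null;
        }
    }
}
```

Both methods return the same bytes; the cost of the second grows with the position of the requested entry in the archive.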

Here are my questions:

(1) Am I overlooking something?
(2) Could this mechanism be optimized to (a) try to get a ZipFile anyway, or (b) use some buffering to speed up the linear search?

Thanks
Christian

Twitter : @chrdietrich
Blog : https://www.dietrich-it.de

[Updated on: Wed, 20 November 2019 09:21]


Re: ArchiveUrlConnection and Performance [message #1817320 is a reply to message #1817301]

Wed, 20 November 2019 11:57

Ed Merks

Messages: 33218
Registered: July 2009

Senior Member

1) That sounds like how it works.
2) A ZipFile only works with a java.io.File. So when one uses a platform:/resource URI, one must read the stream through the workspace APIs. One could determine the underlying java.io.File location of the IFile, and most often (but not in general) there is one. But then you are reading the file system directly, not accessing the IFile's contents. I don't know how buffering would help a linear search; do you mean when you search repeatedly?
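One way to "get a ZipFile anyway" when only an InputStream is available is to spool the stream to a temporary file once and open that. This is a swapped-in technique, not something proposed in the thread, and the class name SpooledZip is hypothetical. It trades one full copy of the archive for random access on every later lookup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipFile;

// Hypothetical helper: copy a stream-only archive to disk so ZipFile.getEntry can be used.
public final class SpooledZip {
    private SpooledZip() {}

    /**
     * Copies the archive stream to a temporary file and opens it as a ZipFile.
     * The caller owns the returned ZipFile and must close it; the temporary
     * file is scheduled for deletion when the JVM exits.
     */
    public static ZipFile open(InputStream archive) throws IOException {
        Path tmp = Files.createTempFile("spooled-archive", ".zip");
        tmp.toFile().deleteOnExit();
        try (InputStream in = archive) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return new ZipFile(tmp.toFile());
    }
}
```

This only pays off if the same spooled ZipFile is reused for many entries; spooling once per entry would be worse than the linear scan.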

In Oomph we read the setups.zip a lot, so we have code like this:

https://git.eclipse.org/c/oomph/org.eclipse.oomph.git/tree/plugins/org.eclipse.oomph.setup.core/src/org/eclipse/oomph/setup/internal/core/util/SetupCoreUtil.java#n539

Ed Merks
Professional Support: https://www.macromodeling.com/


Re: ArchiveUrlConnection and Performance [message #1817327 is a reply to message #1817320]

Wed, 20 November 2019 13:44

Christian Dietrich

Messages: 14716
Registered: July 2009

Senior Member

We read all resources,
so the big fat JAR is scanned again for every resource inside it.

Twitter : @chrdietrich
Blog : https://www.dietrich-it.de


Re: ArchiveUrlConnection and Performance [message #1817328 is a reply to message #1817327]

Wed, 20 November 2019 13:57

Ed Willink

Messages: 7670
Registered: July 2009

Senior Member

Hi

Surely the problem is that for (2) the ZIP is opened (read and closed) for each entry? Neither EMF nor the platform can reasonably keep the ZIP open on the off chance that it will be reused, but your application can. Just make sure that your multi-model loader maintains a cache of currently open ZIP files; then an efficient ZIP support class may not need to re-read the archive.
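A minimal sketch of such a loader-side cache, assuming the archive is reachable as a java.io.File (the OpenZipCache class and its methods are hypothetical). Each archive is opened once for the duration of a bulk load and closed when the load finishes.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical cache of open archives, scoped to one multi-model load.
public final class OpenZipCache implements AutoCloseable {
    private final Map<File, ZipFile> open = new HashMap<>();

    /** Returns a stream for the entry, opening the archive only on first use. */
    public synchronized InputStream openEntry(File archive, String entryName) throws IOException {
        File key = archive.getAbsoluteFile();
        ZipFile zip = open.get(key);
        if (zip == null) {
            zip = new ZipFile(key);
            open.put(key, zip);
        }
        ZipEntry entry = zip.getEntry(entryName);
        if (entry == null) {
            throw new IOException("No entry " + entryName + " in " + key);
        }
        return zip.getInputStream(entry);
    }

    public synchronized int openArchiveCount() {
        return open.size();
    }

    /** Closes every cached archive, reporting the first failure. */
    @Override
    public synchronized void close() throws IOException {
        IOException first = null;
        for (ZipFile zip : open.values()) {
            try {
                zip.close();
            } catch (IOException e) {
                if (first == null) first = e; else first.addSuppressed(e);
            }
        }
        open.clear();
        if (first != null) throw first;
    }
}
```

Wrapping the whole load in try-with-resources keeps each archive open exactly as long as the application needs it, which is the lifetime EMF itself cannot know.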

Regards

Ed Willink


Re: ArchiveUrlConnection and Performance [message #1817330 is a reply to message #1817328]

Wed, 20 November 2019 14:10

Ed Merks

Messages: 33218
Registered: July 2009

Senior Member

Yes, reading all entries from an archive as separate resources has n^2 cost. Of course, the entries at the beginning are found quickly and the scan terminates, but the one at the end requires scanning the whole resource first.
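A back-of-the-envelope count, assuming each lookup restarts the scan from the first entry and stops at the k-th entry it is looking for, with all n entries requested once:

```latex
\text{entries passed} \;=\; \sum_{k=1}^{n} k \;=\; \frac{n(n+1)}{2} \;\in\; O(n^2)
```

For n = 1000 entries that is 500,500 entry traversals, versus 1000 direct lookups when ZipFile.getEntry can be used.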

The code I pointed at, which Oomph uses, keeps the bytes of all the entries in memory, weakly referenced. So if the entries are accessed in quick succession, the archive as a whole is loaded only once, and afterwards the entries can and will be garbage collected. This was done more in response to complaints about hundreds of reads of the setups.zip than for purely performance reasons, but it accomplished both goals. One problem that was reported is that during loading, the entire set of entries is strongly referenced, so it's possible to get an out-of-memory error if the resource is really huge and you already have barely enough heap space left.
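A minimal sketch of that idea, not Oomph's actual SetupCoreUtil code: the WeakArchiveContents class and its ArchiveSource interface are hypothetical. One linear pass loads every entry's bytes into a map that is only weakly referenced, so it is reused while lookups happen in quick succession and can be collected afterwards.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch in the spirit of the approach described above; not Oomph's code.
public final class WeakArchiveContents {
    /** Hypothetical source of a fresh archive stream, e.g. backed by workspace APIs. */
    public interface ArchiveSource {
        InputStream open() throws IOException;
    }

    private final ArchiveSource source;
    private WeakReference<Map<String, byte[]>> contents = new WeakReference<>(null);
    private int loadCount;

    public WeakArchiveContents(ArchiveSource source) {
        this.source = source;
    }

    /** Returns a copy of the entry's bytes, reloading the whole archive only if it was collected. */
    public synchronized byte[] getEntry(String name) throws IOException {
        Map<String, byte[]> map = contents.get(); // strong reference for the duration of this call
        if (map == null) {
            map = loadAll();
            contents = new WeakReference<>(map);
        }
        byte[] bytes = map.get(name);
        return bytes == null ? null : bytes.clone();
    }

    public synchronized int loadCount() {
        return loadCount;
    }

    private Map<String, byte[]> loadAll() throws IOException {
        Map<String, byte[]> map = new HashMap<>();
        // While this runs, every entry is strongly reachable: a huge archive can exhaust the heap.
        try (ZipInputStream zin = new ZipInputStream(source.open())) {
            for (ZipEntry e; (e = zin.getNextEntry()) != null; ) {
                if (!e.isDirectory()) {
                    map.put(e.getName(), zin.readAllBytes());
                }
            }
        }
        loadCount++;
        return Collections.unmodifiableMap(map);
    }
}
```

A weak reference is cleared at the next garbage collection, which matches "if accessed quickly"; a SoftReference would keep the data longer under low memory pressure, at the cost of holding more heap.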

Ed Merks
Professional Support: https://www.macromodeling.com/
