ArchiveUrlConnection and Performance [message #1817301] Wed, 20 November 2019 09:13
Christian Dietrich
We have the use case of reading EMF models from within a JAR on the classpath of a project.

There are two scenarios, and they show tremendously different performance:

(1) add the jar as "external jar" to the classpath
The URL then looks like archive:file:/somepath/somejar.jar!/somepack/somefile.someextension
(2) add the jar as "jar" to the classpath
The URL then looks like
archive:platform:/resource/simeproject/somepath/small.jar!/somejar.jar!/somepack/somefile.someextension

In case (1), ArchiveURLConnection uses ZipFile.getEntry to retrieve the entry and its InputStream.
In case (2), a ZipInputStream is created and a linear search for the correct entry is done.

With a huge number of entries in the jar, each of a huge size, (2) will be tremendously slower than (1).
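
The difference between the two lookups can be sketched with plain java.util.zip. This is only an illustration of the two access patterns, not the actual ArchiveURLConnection code; the class and method names (ZipLookupSketch, readWithZipFile, readWithZipInputStream) and the entry names are hypothetical, and Java 9+ is assumed for readAllBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipLookupSketch {

    /** Case (1): a java.io.File is available, so the entry is looked up directly. */
    static byte[] readWithZipFile(Path archive, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException("No such entry: " + entryName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return in.readAllBytes();
            }
        }
    }

    /** Case (2): only a stream is available, so entries are visited in archive order. */
    static byte[] readWithZipInputStream(InputStream archive, String entryName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            // Each getNextEntry() call first reads past the data of the previous entry.
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    return zip.readAllBytes(); // reads up to the end of the current entry
                }
            }
        }
        throw new IOException("No such entry: " + entryName);
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway archive with two entries.
        Path jar = Files.createTempFile("sketch", ".jar");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new ZipEntry("somepack/first.txt"));
            out.write("first".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("somepack/second.txt"));
            out.write("second".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        byte[] direct = readWithZipFile(jar, "somepack/second.txt");
        byte[] scanned;
        try (InputStream in = Files.newInputStream(jar)) {
            scanned = readWithZipInputStream(in, "somepack/second.txt");
        }
        System.out.println(new String(direct, StandardCharsets.UTF_8)
                + " / " + new String(scanned, StandardCharsets.UTF_8));
        Files.delete(jar);
    }
}
```

Both methods return the same bytes; the difference is that the stream variant has to pass over every entry stored before the requested one.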

Here are my questions:

(1) Am I overlooking something?
(2) Could this mechanism be optimized to (a) try to get a ZipFile anyway or (b) use some buffering to maybe speed up the linear search?

Thanks
Christian


Twitter : @chrdietrich
Blog : https://www.dietrich-it.de

[Updated on: Wed, 20 November 2019 09:21]


Re: ArchiveUrlConnection and Performance [message #1817320 is a reply to message #1817301] Wed, 20 November 2019 11:57
Ed Merks
1) That sounds like how it works.
2) A ZipFile only works for a java.io.File. So when one uses a platform:/resource URI, one must read the stream through the workspace APIs. One could determine the underlying java.io.File location of the IFile, and most often (but not in general) there is one. But then you are reading the file system directly, not accessing the IFile's contents. I don't know how buffering will help a linear search; do you mean if you search repeatedly?

In Oomph we read the setups.zip a lot, so we have code like this:

https://git.eclipse.org/c/oomph/org.eclipse.oomph.git/tree/plugins/org.eclipse.oomph.setup.core/src/org/eclipse/oomph/setup/internal/core/util/SetupCoreUtil.java#n539


Ed Merks
Professional Support: https://www.macromodeling.com/

Re: ArchiveUrlConnection and Performance [message #1817327 is a reply to message #1817320] Wed, 20 November 2019 13:44
Christian Dietrich
We read all resources, so the complete big fat jar will be read once for every resource inside it.



Re: ArchiveUrlConnection and Performance [message #1817328 is a reply to message #1817327] Wed, 20 November 2019 13:57
Ed Willink
Hi

Surely the problem is that for (2) the ZIP is opened (read and closed) for each entry? Neither EMF nor the platform can reasonably keep the zip open on the off chance that it will be re-used, but your application can. Just make sure that your multi-model loader maintains a cache of currently open ZIP files; then maybe an efficient ZIP support class will not re-read.

Regards

Ed Willink
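
An application-level cache of open ZIP files, as suggested above, might look roughly like the following. This is a minimal sketch under stated assumptions: OpenZipCache and its methods are hypothetical names, not part of EMF or the Eclipse platform, the archive is assumed to be reachable as a java.io.File (which is not guaranteed for platform:/resource URIs), and Java 9+ is assumed for readAllBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical application-level cache of open archives. */
class OpenZipCache implements AutoCloseable {
    private final Map<Path, ZipFile> open = new HashMap<>();

    /** Reads one entry, opening the archive only the first time it is requested. */
    synchronized byte[] read(Path archive, String entryName) throws IOException {
        Path key = archive.toAbsolutePath().normalize();
        ZipFile zip = open.get(key);
        if (zip == null) {
            zip = new ZipFile(key.toFile());
            open.put(key, zip);
        }
        ZipEntry entry = zip.getEntry(entryName);
        if (entry == null) {
            throw new IOException("No such entry: " + entryName);
        }
        try (InputStream in = zip.getInputStream(entry)) {
            return in.readAllBytes();
        }
    }

    /** Number of archives currently held open. */
    synchronized int openCount() {
        return open.size();
    }

    /** Closes every cached archive; call this when the multi-model load is finished. */
    @Override
    public synchronized void close() throws IOException {
        IOException first = null;
        for (ZipFile zip : open.values()) {
            try {
                zip.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                }
            }
        }
        open.clear();
        if (first != null) {
            throw first;
        }
    }
}
```

A loader would create one OpenZipCache in a try-with-resources block around the whole load, call read for each resource, and let the block close the archives afterwards, so the lifetime of the open files stays under the application's control.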

Re: ArchiveUrlConnection and Performance [message #1817330 is a reply to message #1817328] Wed, 20 November 2019 14:10
Ed Merks
Yes, reading all entries from an archive as separate resources has O(n^2) cost. Of course the ones at the beginning can be found quickly and the scan terminates early, but for the one at the end, the whole archive is scanned first.
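
The quadratic growth can be made concrete by counting how many entries are visited when each of n entries is located with a fresh ZipInputStream scan. The sketch below uses a small in-memory archive; ScanCostSketch and the entry names e0, e1, ... are made up for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ScanCostSketch {

    /** Builds an in-memory archive with entries named e0, e1, ..., e(n-1). */
    static byte[] buildArchive(int n) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (int i = 0; i < n; i++) {
                out.putNextEntry(new ZipEntry("e" + i));
                out.write(i); // one byte of content per entry
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** Counts entries visited when every entry is located by a fresh scan from the start. */
    static long entriesVisited(byte[] archive, int n) throws IOException {
        long visited = 0;
        for (int i = 0; i < n; i++) {
            String wanted = "e" + i;
            try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
                ZipEntry entry;
                while ((entry = zip.getNextEntry()) != null) {
                    visited++;
                    if (entry.getName().equals(wanted)) {
                        break; // found: early entries terminate the scan quickly
                    }
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) throws IOException {
        int n = 100;
        // 1 + 2 + ... + n = n * (n + 1) / 2, so this prints 5050 for n = 100.
        System.out.println(entriesVisited(buildArchive(n), n));
    }
}
```

The count grows as n * (n + 1) / 2, and with large entries each visited entry also means reading past its data, which is why the effect is so visible for a big jar.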

The code I pointed at that's used by Oomph keeps the bytes for all the entries in memory, weakly referenced, so if accessed in quick succession, the resource as a whole is only loaded once, and the entries can and will be garbage collected. This was done more because of complaints about 100s of reads of the setups.zip than for purely performance reasons, but it accomplished both goals. One problem that was reported is that during loading, the entire set of entries is strongly referenced, so it's possible to get an out-of-memory error if the resource is really huge and you have barely enough heap space left.
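
The general idea of a weakly referenced, load-once entry cache can be sketched as follows. This is not the Oomph implementation (see the linked SetupCoreUtil for that); WeakArchiveCache, StreamSource and loadCount are hypothetical names, and Java 9+ is assumed for readAllBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical sketch: all entry bytes are loaded in one pass and held weakly. */
class WeakArchiveCache {

    /** Supplies a fresh stream over the archive, e.g. from a workspace file. */
    interface StreamSource {
        InputStream open() throws IOException;
    }

    private final StreamSource source;
    private WeakReference<Map<String, byte[]>> entries = new WeakReference<>(null);
    private int loads;

    WeakArchiveCache(StreamSource source) {
        this.source = source;
    }

    /** Returns the bytes of one entry; callers must not modify the returned array. */
    synchronized byte[] read(String entryName) throws IOException {
        Map<String, byte[]> loaded = entries.get(); // strong reference while in use
        if (loaded == null) {
            loaded = loadAll(); // cleared by the garbage collector, or never loaded
            entries = new WeakReference<>(loaded);
        }
        byte[] bytes = loaded.get(entryName);
        if (bytes == null) {
            throw new IOException("No such entry: " + entryName);
        }
        return bytes;
    }

    /** How many times the whole archive has been scanned so far. */
    synchronized int loadCount() {
        return loads;
    }

    /** Single linear pass; the whole map is strongly referenced while it is built. */
    private Map<String, byte[]> loadAll() throws IOException {
        loads++;
        Map<String, byte[]> result = new HashMap<>();
        try (ZipInputStream zip = new ZipInputStream(source.open())) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    result.put(entry.getName(), zip.readAllBytes());
                }
            }
        }
        return result;
    }
}
```

Reads that follow each other closely share one scan of the archive, while an idle cache can be reclaimed by the garbage collector. The loadAll step shows the trade-off mentioned above: during loading, every entry is held in memory at once.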

