Eclipse Community Forums
Loading different models concurrently synchronisation. [message #1804706] Fri, 29 March 2019 22:23
Tobias Fox
Hi again,

My goal is to load several models into different resources, resourceSets over several threads to reduce execution time. Given that these objects are completely independent I would assume they can be loaded in parallel, but distributing the resources over two threads has very little benefit to the Resource.load(null) method, and even less over three or four.

This leads me to presume there is some kind of synchronisation between the loading processes. Do you know if this is the case, and what I can do to improve performance? I understand that EMF is not necessarily thread-safe, but I see no reason why this must be the case for this method.

Thanks,

Tobias
Re: Loading different models concurrently synchronisation. [message #1804715 is a reply to message #1804706] Sat, 30 March 2019 05:34
Ed Willink
Hi

I could understand a rare malfunction as e.g. the same SAXParser is allocated to two different threads concurrently. But I would expect it to work in an intermittent way since you are doing what is clearly unsafe without researching the underlying functionality.

Are you sure that your bottleneck is not your disk I/O? See whether reading multiple text files scales as you expect.
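A rough way to run that check is sketched below. It is only an illustration: the class name `ReadScalingCheck`, the file count and the 8 MB file size are arbitrary assumptions, and freshly written temporary files are likely to be served from the operating system cache, so the numbers say more about memory copies than about a cold disk.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ReadScalingCheck {

  /** Reads every file once on a pool of the given size and returns the total bytes read. */
  static long readAll(List<Path> files, int threadCount) throws Exception {
    ExecutorService exec = Executors.newFixedThreadPool(threadCount);
    try {
      List<Future<Long>> results = new ArrayList<>();
      for (Path file : files) {
        Callable<Long> task = () -> (long) Files.readAllBytes(file).length;
        results.add(exec.submit(task));
      }
      long total = 0;
      for (Future<Long> result : results) {
        total += result.get(); // rethrows any I/O failure from the worker thread
      }
      return total;
    } finally {
      exec.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<Path> files = new ArrayList<>();
    byte[] payload = new byte[8 * 1024 * 1024]; // arbitrary size: 8 MB of zeros per file
    for (int i = 0; i < 4; i++) {
      Path file = Files.createTempFile("scaling", ".bin");
      Files.write(file, payload);
      files.add(file);
    }
    try {
      for (int threads : new int[] {1, 2, 4}) {
        long start = System.nanoTime();
        long bytes = readAll(files, threads);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(threads + " threads read " + bytes + " bytes in " + millis + " ms");
      }
    } finally {
      for (Path file : files) {
        Files.deleteIfExists(file);
      }
    }
  }
}
```

If the elapsed time barely changes between 1 and 4 threads here, the storage layer rather than the parser is the more likely limit.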

Regards

Ed Willink
Re: Loading different models concurrently synchronisation. [message #1804720 is a reply to message #1804715] Sat, 30 March 2019 08:24
Ed Merks
I'm not sure how you are measuring. Of course multi-threading will not improve the performance of an individual call to Resource.load, i.e., even if you have two threads that separately load two resources in parallel, the performance of any one call will be essentially the same. But there is no general interaction between two calls to load in the core framework so you should be able to load two resources twice as fast using two threads. I.e., I have not built in any synchronized calls in the generated models nor in the resource framework. All synchronization and threading control must be managed by the downstream application.

Most certainly I have significant experience with several downstream uses of EMF (e.g., in customer applications) that make use of multi-threading to improve the performance of loading. Even in Oomph we make heavy use of this and it generally always has a huge impact to improve overall load time.

Oomph is open source so you can look at the details of how we do such parallel loading:

https://git.eclipse.org/c/oomph/org.eclipse.oomph.git/tree/plugins/org.eclipse.oomph.setup.core/src/org/eclipse/oomph/setup/internal/core/util/ResourceMirror.java

We of course have to be careful when loading multiple resources into the same resource set to make that process thread safe. I.e., we must guard when the resource set itself is modified:

https://git.eclipse.org/c/oomph/org.eclipse.oomph.git/tree/plugins/org.eclipse.oomph.setup.core/src/org/eclipse/oomph/setup/internal/core/util/ResourceMirror.java#n159

Here you see that synchronization and thread management are up to the application and are not inherent in the EMF framework itself, allowing clients to build arbitrary mechanisms without battling locking/synchronization assumptions of the core framework.
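The general shape of "guard only the modification of the shared container" can be sketched without EMF. Everything below is a stand-in and an assumption: `GuardedRegistration`, `Entry` and `load` are hypothetical names, the plain list plays the role of a shared resource set, and the string handling is a placeholder for real parsing. It is not the Oomph code, and a real EMF load may touch the resource set in more places than creation alone.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class GuardedRegistration {

  /** Stand-in for a resource: created empty, filled later by an expensive load. */
  static final class Entry {
    final String name;
    volatile int loadedLength = -1;

    Entry(String name) {
      this.name = name;
    }
  }

  // Stand-in for the shared resource set: a plain list that is NOT thread safe.
  private final List<Entry> entries = new ArrayList<>();

  /** Only the mutation of the shared list happens under the lock. */
  Entry register(String name) {
    synchronized (entries) {
      Entry entry = new Entry(name);
      entries.add(entry);
      return entry;
    }
  }

  int size() {
    synchronized (entries) {
      return entries.size();
    }
  }

  /** Registers under the lock, then does the slow work outside it. */
  void load(String name, byte[] content) {
    Entry entry = register(name);
    // Placeholder for real parsing; runs in parallel with other loads.
    entry.loadedLength = new String(content, StandardCharsets.UTF_8).trim().length();
  }

  public static void main(String[] args) throws InterruptedException {
    GuardedRegistration shared = new GuardedRegistration();
    ExecutorService exec = Executors.newFixedThreadPool(4);
    for (int i = 0; i < 100; i++) {
      String name = "model" + i;
      byte[] content = ("<content of " + name + ">").getBytes(StandardCharsets.UTF_8);
      exec.execute(() -> shared.load(name, content));
    }
    exec.shutdown();
    exec.awaitTermination(1, TimeUnit.MINUTES);
    System.out.println("registered entries: " + shared.size());
  }
}
```

Keeping the lock this narrow means the threads only serialize on the brief registration step, not on the load itself.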


Ed Merks
Professional Support: https://www.macromodeling.com/
Re: Loading different models concurrently synchronisation. [message #1804728 is a reply to message #1804720] Sat, 30 March 2019 15:23
Tobias Fox
Hi,

Thanks for your detailed replies.

Ed Willink - my resources are loaded directly from InputStreams so this shouldn't be an issue.
Ed Merks - I'm very happy that is the case, and looking at the examples it seems to confirm my assumptions. Based on this I have written a small test to show my results and measurements.

The model is loaded from independent InputStreams from the same ArrayList:

public static void test() {
    try {
        ExecutorService exec = Executors.newFixedThreadPool(4);

        File initialFile = new File("resources/set1.model");
        ArrayList<String> file = new ArrayList<>(Files.readAllLines(initialFile.toPath()));
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        for (String line : file) baos.write(line.getBytes());

        InputStream is1 = new ByteArrayInputStream(baos.toByteArray());
        InputStream is2 = new ByteArrayInputStream(baos.toByteArray());
        InputStream is3 = new ByteArrayInputStream(baos.toByteArray());
        InputStream is4 = new ByteArrayInputStream(baos.toByteArray());

        long time = System.currentTimeMillis();
        exec.execute(createRunnable(is1));
        exec.execute(createRunnable(is2));
        exec.execute(createRunnable(is3));
        exec.execute(createRunnable(is4));
        exec.shutdown();
        exec.awaitTermination(100, TimeUnit.HOURS);
        System.out.print(System.currentTimeMillis() - time);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

static Runnable createRunnable(InputStream is) {
    return new Runnable() {
        public void run() {
            try {
                ResourceSet modelResourceSet = new ResourceSetImpl();
                modelResourceSet.getResourceFactoryRegistry().getExtensionToFactoryMap().put("*", new XMIResourceFactoryImpl());
                Resource modelResource = (XMLResource) modelResourceSet.createResource(URI.createFileURI("exampleURI"));
                modelResource.load(is, null);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    };
}


When an executorService with 1 thread is created this outputs around 4000
When an executorService with 2 threads is created this outputs around 3700 - a slight improvement, but nowhere near what I would expect.
For any number of threads greater than 2 the output is greater than 4000.

ResourceSets are independent, the streams are independent, and (tested separately) executorService works as expected.
Can you see any obvious issues with this implementation, compared to the one in, say, Oomph?

Thanks so much once again.
Tobias
Re: Loading different models concurrently synchronisation. [message #1804730 is a reply to message #1804728] Sat, 30 March 2019 17:15
Ed Merks
Depending on what you're doing around this code, you may be mostly measuring the cost of class loading/initialization. Try other things like loading 100 resources. Also try doing a warmup run, or run this whole test itself in a loop, so all needed classes are loaded and initialized.
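A minimal sketch of such a warm-up harness follows. The names `WarmupTimer`, `measure` and `sampleXml` are made up for illustration, and the workload is plain SAX parsing of a synthetic document rather than an EMF load, so it only shows the measuring pattern, not EMF's own numbers.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class WarmupTimer {

  /** Runs the work untimed warmupRuns times, then returns the duration of each timed run in ms. */
  static long[] measure(Runnable work, int warmupRuns, int timedRuns) {
    for (int i = 0; i < warmupRuns; i++) {
      work.run(); // not timed: this only triggers class loading and JIT compilation
    }
    long[] millis = new long[timedRuns];
    for (int i = 0; i < timedRuns; i++) {
      long start = System.nanoTime();
      work.run();
      millis[i] = (System.nanoTime() - start) / 1_000_000;
    }
    return millis;
  }

  /** Builds a synthetic XML document with the given number of child elements. */
  static byte[] sampleXml(int elements) {
    StringBuilder xml = new StringBuilder("<root>");
    for (int i = 0; i < elements; i++) {
      xml.append("<item id=\"").append(i).append("\"/>");
    }
    return xml.append("</root>").toString().getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] xml = sampleXml(200_000);
    Runnable parse = () -> {
      try {
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml), new DefaultHandler());
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    };
    // The first batch starts cold; the second also benefits from the first batch having run.
    System.out.println("no warm-up:    " + Arrays.toString(measure(parse, 0, 5)));
    System.out.println("after warm-up: " + Arrays.toString(measure(parse, 20, 5)));
  }
}
```

The first entry of the cold batch is typically the largest, which is the effect a single timed run cannot separate from the cost of the load itself.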

Ed Merks
Professional Support: https://www.macromodeling.com/
Re: Loading different models concurrently synchronisation. [message #1804732 is a reply to message #1804728] Sat, 30 March 2019 17:56
Ed Willink
Hi

Parallel execution is on my to-do list for OCL/QVTd so I had a quick look. 4 threads is actually 10% slower for me.

I changed "exampleURI" to a distinct name per thread to avoid a shared external file.

Using VisualVM with 4 threads it appears that only 1.25 are actually "Running" the others are "Monitor".

Using the Windows Task Manager it seems that my 4 CPUs average 50% busy for 1 or 4 threads, so it would appear that Java 8 is able to distribute your one thread example to multiple CPUs. I didn't know it was that clever.

The "Monitor" in VisualVM suggests to me that some routine, perhaps in EMF, perhaps in the SAX Parser, perhaps in java.io, perhaps in ..., is synchronized contrary to our expectations.

Quote:
I could understand a rare malfunction as e.g. the same SAXParser is allocated to two different threads concurrently.


Correction: After investigating the "synchronized" usage in emf.core, emf.core.xmi, emf.common, I see that the Parser Pool is well-synchronized.

EMF is mostly unsynchronized, but anything to do with Date, Regex or ECollections hits synchronization.

The EPackageRegistryImpl.getRegistry synchronizes accesses to the classLoaderToRegistryMap which for my 38000 line test file incurs 7 synchronizations per file.

CommonUtil.synchronizedPowerOf31 incurs a single synchronization.

The above might be improved but they do not account for the lack of speed-up.

Running the four threads under the debugger and stopping arbitrarily hit a point where there is a huge amount of class loading going on. Why? The threads had been running long enough to warm up. The problem arises with a dynamic metamodel. EFactoryImpl.create invokes "if (eSuperType.getInstanceClass() != null)" which needs to catch a ClassNotFoundException (CNFE) for every model reference. The underlying classloader access may well incur synchronization.

Switching to use a registered model such as Ecore; UML.ecore is the largest Ecore available, but it seems equally resistant to speed-up, although at 1.6 seconds it may not be big enough. Stopping at an arbitrary point, all 4 threads seem to be hard at work reading without monitors.

Using an artificial 500,000-line Ecore file slows things up a bit so that the 4 threads is nearly a two-fold speed-up. It looks as if you are not looking at a big enough problem to swamp the overheads. Stopping the 4-thread 500,000-line Ecore run arbitrarily shows all threads "Running" at all times, except one which has switched to finalizer. Another confounding topic: GC is messing up the observations.

Conclusion: EMF probably can be sped up 4-fold on four threads, but only if you take enormous care to use big models and manage the memory.

Regards

Ed Willink
Re: Loading different models concurrently synchronisation. [message #1804737 is a reply to message #1804706] Sat, 30 March 2019 19:40
Erdal Karaca
Also, do not forget to set the IDtoEObject lookup map:

xmiResourceImpl.setIntrinsicIDToEObjectMap(new HashMap<>())

This may improve resource loading time.
Re: Loading different models concurrently synchronisation. [message #1804747 is a reply to message #1804737] Sun, 31 March 2019 06:15
Ed Merks
As Erdal suggests, there are a number of options/ways to improve the load performance of an individual call.

But let's be really concrete and make sure we are measuring what we think we are measuring. Here's a modified "test" that anyone can run:
package snippet;


import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.EPackage;
import org.eclipse.emf.ecore.EcorePackage;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.ResourceSet;
import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;
import org.eclipse.emf.ecore.util.EcoreUtil;
import org.eclipse.emf.ecore.xmi.impl.EcoreResourceFactoryImpl;


public class Snippet
{
  private static final Resource.Factory ECORE_RESOURCE_FACTORY = new EcoreResourceFactoryImpl();

  public static void main(String[] args)
  {
    for (int i = 1; i < 16; ++i)
    {
      test(i, i);
    }
  }

  public static void test(int threadCount, int resourceCount)
  {
    try
    {
      ExecutorService exec = Executors.newFixedThreadPool(threadCount);

      ByteArrayOutputStream output = new ByteArrayOutputStream();
      Resource resource = ECORE_RESOURCE_FACTORY.createResource(URI.createURI("Ecore.ecore"));
      for (int i = 0; i < 1000; ++i)
      {
        EPackage ePackage = EcoreUtil.copy(EcorePackage.eINSTANCE);
        resource.getContents().add(ePackage);
      }
      resource.save(output, null);
      byte[] bytes = output.toByteArray();

      long time = System.currentTimeMillis();
      for (int i = 0; i < resourceCount; ++i)
      {
        InputStream input = new ByteArrayInputStream(bytes);
        exec.execute(createRunnable(input));
      }
      exec.shutdown();
      exec.awaitTermination(100, TimeUnit.HOURS);
      System.out.println("" + threadCount + " threads loading " + resourceCount + " resources:" + (System.currentTimeMillis() - time));
    }
    catch (Exception e)
    {
      e.printStackTrace();
    }
  }

  static Runnable createRunnable(InputStream is)
  {
    return new Runnable()
      {
        public void run()
        {
          try
          {
            ResourceSet modelResourceSet = new ResourceSetImpl();
            modelResourceSet.getResourceFactoryRegistry().getExtensionToFactoryMap().put("*", ECORE_RESOURCE_FACTORY);
            Resource modelResource = modelResourceSet.createResource(URI.createFileURI("exampleURI"));
            modelResource.load(is, null);
          }
          catch (Exception e)
          {
            e.printStackTrace();
          }
        }
      };
  }
}
For sample data I just use the Ecore model and create a big resource by creating many copies of EcorePackage.eINSTANCE and save it to get those bytes. Each thread loads those bytes as a resource as before. So how does this perform?
1 threads loading 1 resources:1237
2 threads loading 2 resources:1539
3 threads loading 3 resources:1590
4 threads loading 4 resources:1016
5 threads loading 5 resources:1222
6 threads loading 6 resources:1191
7 threads loading 7 resources:1292
8 threads loading 8 resources:1451
9 threads loading 9 resources:1573
10 threads loading 10 resources:1776
11 threads loading 11 resources:1945
12 threads loading 12 resources:2084
13 threads loading 13 resources:3158
14 threads loading 14 resources:3583
15 threads loading 15 resources:4446

We see here that we can load 8 resources on 8 threads in the same elapsed time as we can load 1 resource on 1 thread. We can also see that the elapsed time starts to go up in a statistically meaningful way only after the thread count exceeds the number of actual available cores on my machine. We can also see that I've not managed memory carefully (nor am I sure how one would do that). Of course one can imagine the garbage collector must be hard at work too, also consuming CPU cycles but that's always the case.
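To make the scaling in that table easier to read, the elapsed times can be converted to time per resource. The sketch below only re-expresses the figures printed above; the class and method names are made up for illustration. Because the 1-thread row was measured first, on a cold JVM, the ratios are indicative rather than a precise speed-up.

```java
class ThroughputTable {

  // Elapsed milliseconds reported above for N threads loading N resources, N = 1..15.
  static final int[] ELAPSED_MS = {
    1237, 1539, 1590, 1016, 1222, 1191, 1292, 1451, 1573, 1776, 1945, 2084, 3158, 3583, 4446
  };

  /** Average elapsed milliseconds per resource when n resources load on n threads. */
  static double msPerResource(int n) {
    return (double) ELAPSED_MS[n - 1] / n;
  }

  /** Throughput relative to the 1-thread, 1-resource row of the same table. */
  static double relativeThroughput(int n) {
    return msPerResource(1) / msPerResource(n);
  }

  public static void main(String[] args) {
    for (int n = 1; n <= ELAPSED_MS.length; n++) {
      System.out.printf("%2d threads: %7.1f ms per resource, %.2fx%n",
          n, msPerResource(n), relativeThroughput(n));
    }
  }
}
```

For the 8-thread row this gives 1451 / 8 = 181.375 ms per resource, roughly 6.8 times the throughput of the first row.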

In any case, it's clear that EMF's resource loading is not resistant to performance improvement via multi-threading; load performance can definitely (and easily) be improved by using multiple threads and that scales exactly in the expected way. In a real life example you will be loading from disk (or from the internet), and in that case the thread will often block waiting for input to arrive. In such cases, you could use more threads than you have CPUs available because each thread will not always be active but rather blocked waiting for the relatively slow input to arrive from the external source.
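One common rule of thumb for picking that thread count, which is not from this thread and is only an assumption here, is cores × (1 + wait time / compute time). The sketch below applies it; `PoolSizing` and `threadsFor` are hypothetical names, and the wait-to-compute ratio has to be estimated by measuring your own loads.

```java
class PoolSizing {

  /**
   * Rule-of-thumb pool size: cores * (1 + wait/compute).
   * A ratio of 0 means purely CPU-bound work, such as loading from a byte array.
   */
  static int threadsFor(int cores, double waitToComputeRatio) {
    if (cores < 1 || waitToComputeRatio < 0) {
      throw new IllegalArgumentException("cores must be >= 1 and ratio >= 0");
    }
    return Math.max(1, (int) Math.round(cores * (1.0 + waitToComputeRatio)));
  }

  public static void main(String[] args) {
    int cores = Runtime.getRuntime().availableProcessors();
    System.out.println("in-memory loads: " + threadsFor(cores, 0.0) + " threads");
    System.out.println("loads that wait as long as they compute: "
        + threadsFor(cores, 1.0) + " threads");
  }
}
```

The result is a starting point for experiments, not a guarantee; the tables in this thread show how quickly measured times diverge from simple expectations.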


Ed Merks
Professional Support: https://www.macromodeling.com/
Re: Loading different models concurrently synchronisation. [message #1804768 is a reply to message #1804747] Mon, 01 April 2019 02:51
Tobias Fox
Hi all,

Thanks for your replies.

I can replicate what you found, Ed, and also noticed the blocked threads as shown in VisualVM. I do have an issue with your test, however. If I alter it as follows:

for (int i = 1; i < 16; ++i)
{
  test(i, i);
}

becomes
for (int i = 4; i > 0 ; --i)
{
    test(i, i);
}


I see the following behaviour, where I would expect each row to have more or less the same value on a PC with 4 cores:

4 threads loading 4 resources:2903
3 threads loading 3 resources:1548
2 threads loading 2 resources:1296
1 threads loading 1 resources:1308

I see the same results on my own models too: the first test always performs significantly worse than the others when executed over several threads. Does this suggest that parallel loading is only beneficial for models that previously have been loaded? Maybe objects sticking around in memory? If this is the case then it's safe to assume that EMF would behave poorly on a one-off concurrent loading of a number of different models/resources?

Regards,
Tobias

[Updated on: Mon, 01 April 2019 03:12]

Re: Loading different models concurrently synchronisation. [message #1804771 is a reply to message #1804768] Mon, 01 April 2019 05:27
Ed Willink
Hi

You are seeing JIT warm-up effects. The JIT optimises Java execution on about the 3000th execution, so you should run a significant workload before any observations you make are trustworthy.

On my 4 CPU machine the direct execution reports:

1 threads loading 1 resources:3157
2 threads loading 2 resources:2390
3 threads loading 3 resources:2203
4 threads loading 4 resources:3439
5 threads loading 5 resources:3765
6 threads loading 6 resources:4484
7 threads loading 7 resources:4703
8 threads loading 8 resources:6875
9 threads loading 9 resources:7656

But if I prefix a 9-thread execution as a warm-up:

9 threads loading 9 resources:10408
1 threads loading 1 resources:1750
2 threads loading 2 resources:2124
3 threads loading 3 resources:3281
4 threads loading 4 resources:2313
5 threads loading 5 resources:4219
6 threads loading 6 resources:5063
7 threads loading 7 resources:5797
8 threads loading 8 resources:6781
9 threads loading 9 resources:8047

you can see that single CPU drops dramatically while others are moderately stable, although disturbingly variable suggesting there is a variability such as GC that needs control.

Adding a System.gc() before each test improves the chances that no GC occurs within the timed code:

9 threads loading 9 resources:10186
1 threads loading 1 resources:2172
2 threads loading 2 resources:2109
3 threads loading 3 resources:2156
4 threads loading 4 resources:2516
5 threads loading 5 resources:3265
6 threads loading 6 resources:4078
7 threads loading 7 resources:7062
8 threads loading 8 resources:5547
9 threads loading 9 resources:6078

1, 2 and 3 are now pretty consistent, then gradually deteriorating. The deviation for 7 clearly shows that these observations are still not independent of confounding effects.

Moving the System.gc() to the line before timing starts, the 7-thread anomaly vanishes/reverses:

9 threads loading 9 resources:10406
1 threads loading 1 resources:2000
2 threads loading 2 resources:1781
3 threads loading 3 resources:2281
4 threads loading 4 resources:2156
5 threads loading 5 resources:2796
6 threads loading 6 resources:3703
7 threads loading 7 resources:4062
8 threads loading 8 resources:6906
9 threads loading 9 resources:8094
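The placement described above, with the collection requested immediately before the clock starts, might look like the sketch below. `GcBeforeTiming` and `timeMillis` are illustrative names, the allocation loop is an arbitrary stand-in workload, and System.gc() is only a request that the JVM is free to ignore, so this reduces rather than removes GC noise.

```java
class GcBeforeTiming {

  /** Requests a collection right before the timed section, then times the work in ms. */
  static long timeMillis(Runnable work) {
    System.gc(); // a hint only: makes a collection inside the timed section less likely
    long start = System.nanoTime();
    work.run();
    return (System.nanoTime() - start) / 1_000_000;
  }

  public static void main(String[] args) {
    Runnable allocate = () -> {
      long checksum = 0;
      for (int i = 0; i < 2_000_000; i++) {
        checksum += new int[16].length; // short-lived garbage, like objects created during a load
      }
      if (checksum == 0) {
        throw new IllegalStateException("unreachable");
      }
    };
    for (int run = 1; run <= 5; run++) {
      System.out.println("run " + run + ": " + timeMillis(allocate) + " ms");
    }
  }
}
```

Setup work such as building the input bytes belongs before the System.gc() call, so that its garbage is collected outside the measurement.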

Regards

Ed Willink
Re: Loading different models concurrently synchronisation. [message #1804773 is a reply to message #1804768] Mon, 01 April 2019 05:56
Ed Merks
If I reverse the loop to
     for (int i = 16; i > 0; --i)

then I see this:
16 threads loading 16 resources:5642
15 threads loading 15 resources:3974
14 threads loading 14 resources:4621
13 threads loading 13 resources:2285
12 threads loading 12 resources:2615
11 threads loading 11 resources:1927
10 threads loading 10 resources:1795
9 threads loading 9 resources:1711
8 threads loading 8 resources:1494
7 threads loading 7 resources:1291
6 threads loading 6 resources:973
5 threads loading 5 resources:919
4 threads loading 4 resources:816
3 threads loading 3 resources:753
2 threads loading 2 resources:798
1 threads loading 1 resources:749


Java performance is always a tricky thing. As I already suggested, if the first run must load the bytes from the *.class files, that will generally happen on one of the threads, blocking the other threads; you can't run the byte code if it hasn't been loaded and you can't load a class in parallel. So yes, many threads might well be blocked waiting for classes to load and so overall it will take longer.

Note the effect here as well that at the end of the test, a single load takes significantly less time than did the initial load on the previous set of tests, i.e., 749 instead of 1237. Java has the Just-In-Time (JIT) compiler that optimizes code on the fly. And of course by this point, the classes are all loaded and JITed.

Then of course there is the issue of garbage collection. That too must happen somewhere in the JVM and isn't for free.

In the end you're drawing broad conclusions about EMF from a test case; the behavior of your actual application will likely be very different. It's definitely safe to assume for any Java application that if class loading must occur for the first time during some operation, that you can't speed that process up as much as you might like with parallel execution of the operation because all threads must wait for the classes to load before the operation can proceed.

If I run with test(1,1) 16 times I see this:
1 threads loading 1 resources:1239
1 threads loading 1 resources:907
1 threads loading 1 resources:787
1 threads loading 1 resources:855
1 threads loading 1 resources:695
1 threads loading 1 resources:696
1 threads loading 1 resources:714
1 threads loading 1 resources:689
1 threads loading 1 resources:691
1 threads loading 1 resources:680
1 threads loading 1 resources:701
1 threads loading 1 resources:687
1 threads loading 1 resources:692
1 threads loading 1 resources:687
1 threads loading 1 resources:706
1 threads loading 1 resources:707
If I do that first, and then run the other way:
1 threads loading 1 resources:1241
1 threads loading 1 resources:877
1 threads loading 1 resources:778
1 threads loading 1 resources:900
1 threads loading 1 resources:685
1 threads loading 1 resources:705
1 threads loading 1 resources:688
1 threads loading 1 resources:688
1 threads loading 1 resources:735
1 threads loading 1 resources:707
1 threads loading 1 resources:697
1 threads loading 1 resources:690
1 threads loading 1 resources:687
1 threads loading 1 resources:684
1 threads loading 1 resources:686
1 threads loading 1 resources:687
16 threads loading 16 resources:6242
15 threads loading 15 resources:2674
14 threads loading 14 resources:2439
13 threads loading 13 resources:2263
12 threads loading 12 resources:2094
11 threads loading 11 resources:1951
10 threads loading 10 resources:1777
9 threads loading 9 resources:1624
8 threads loading 8 resources:1522
7 threads loading 7 resources:1290
6 threads loading 6 resources:987
5 threads loading 5 resources:908
4 threads loading 4 resources:794
3 threads loading 3 resources:802
2 threads loading 2 resources:810
1 threads loading 1 resources:741
Here at the end we can see that we can load 4 resources using 4 threads in effectively the same total time as we can load 1 resource on 1 thread. So it seems safe to conclude that EMF does not itself cause any significant blocking/locking to occur and that parallel loading will in general improve performance significantly. Any first-time effects are primarily inherent in Java itself, as are the effects of the JIT and the garbage collector.

You might try loading some small representative sample of your model (i.e., one that has an instance of every class in your model) initially during initialization to force early class loading.
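A different way to get a similar effect, swapped in here purely as an illustration, is to force class loading by name during start-up instead of loading a sample model. The sketch uses the standard Class.forName(name, initialize, loader) call; `Preloader`, `preload` and the class names in main are assumptions, and unlike a sample load this covers only the classes you list and does nothing for JIT warm-up.

```java
import java.util.ArrayList;
import java.util.List;

class Preloader {

  /** Loads and initializes each named class; returns the names that could not be found. */
  static List<String> preload(ClassLoader loader, String... classNames) {
    List<String> missing = new ArrayList<>();
    for (String name : classNames) {
      try {
        Class.forName(name, true, loader); // true = also run static initializers
      } catch (ClassNotFoundException e) {
        missing.add(name);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Placeholder names; a real application would list its own generated model classes.
    List<String> missing = preload(
        Preloader.class.getClassLoader(),
        "javax.xml.parsers.SAXParserFactory",
        "java.util.concurrent.ConcurrentHashMap",
        "example.DoesNotExist");
    System.out.println("could not preload: " + missing);
  }
}
```

Calling this once on the main thread before submitting work to the pool moves the one-off class loading cost out of the parallel section.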


Ed Merks
Professional Support: https://www.macromodeling.com/
Re: Loading different models concurrently synchronisation. [message #1804817 is a reply to message #1804773] Mon, 01 April 2019 16:55
Tobias Fox
Hi all, thanks again for your continued and detailed replies.

To both Eds - it seems you are right, thank you for bringing that to my attention.
I am now experiencing good speedup when loading several smaller models before the parallel execution.

Thanks so much.
Tobias