Eclipse Community Forums: TMF (Xtext) » Performance issue in QueuedBuildData

Help

Home

Home » Modeling » TMF (Xtext) » Performance issue in QueuedBuildData

Show: Today's Messages :: Show Polls :: Message Navigator

Performance issue in QueuedBuildData [message #1739605]

Wed, 03 August 2016 08:54

Akos Kitta

Messages: 25
Registered: November 2015

Junior Member

Hi there,

I have some performance issue in QueuedBuildData.

What am I trying to do?

I have more than 500k very simple resources in the workspace, balanced into 256 projects and I am trying to run a full clean build. (Please do not ask why I am doing that. I just wanted to try out whether Xtext can handle such a number of resources or not. It can. Indeed I had to customize the WorkspaceProjectsStateHelperand the DerivedStateAwareResourceDescriptionManager but it works fine. Pretty neat, isn't it? At least it is cool for me, I was not aware of that capability. Just as a side note, the full clean build time was around 40 minutes in my case.)

Where do I have exactly the issue?

In the QueuedBuildData#queueURI(URI) method. If I understand correctly, neither the uris, nor the values of the projectNameToChangedResource can contain duplicate URIs. The same rule applies on the corresponding containers for the copies. Currently, linked lists are being used for both the uris and the values of the projectNameToChangedResource. Unfortunately, it is pretty expensive to check the contains all the time (with such a number of resources) to avoid URI duplication. I did some profiling and the contains time is roughly the 20% of my overall build time. I was thinking about to replace it with a linked hash set. Actually, I have already done that, but I had to basically copy paste the whole class, replace the linked lists with linked hash sets and bind the QueuedBuildData to my custom implementation. Based on my experiences, it worked fine.

My question is the following:

Is there a more elegant way to customize the QueuedBuildData? If yes, can someone please advise how to achive that? If no, but you believe, it would make sense to provide some sort of mechanism to make it customizable, I would be happy to start working on a patch.

Thanks in advance for any kind of feedbacks!

Cheers,
Akos

Report message to a moderator

Re: Performance issue in QueuedBuildData [message #1739611 is a reply to message #1739605]

Wed, 03 August 2016 09:12

Karsten Thoms

Messages: 762
Registered: July 2009
Location: Dortmund, Germany

Senior Member

Dear Akos,

these are quite interesting observations. Scalability and performance are always things that we want to improve for Xtext. Often, performance issues are not visible when handling "normal" amounts of resources and bottlenecks become visible in scenarios like yours.

QueuedBuildData itself is not designed to handle that vast amount of URIs and you are perfectly right, list.contains() does not scale in that case.

I am unsure if it is necessary to keep insertion order stable. That might be the reason why a LinkedList was chosen initially. It might be worth to consider a UniqueEList instead the LinkedList. A linked hashset does not guarantee access order, and a queue is usually something where you want to maintain the original order.

Please do me a favor and try to use UniqueEList from emf.common. If you have concrete numbers that show performance benefits in large scale, document them and raise a bug (or an issue https://github.com/eclipse/xtext-eclipse/issues). Do also test for small change sets. The normal use case should not degrade in performance, too. Attach profile results which show that the time spent on QueuedBuildData#queueURI benefits from the change.

Thanks,
~Karsten

Need professional support for Xtext, EMF, Eclipse IDE?
Go to: http://devhub.karakun.com
Twitter : @kthoms
Blog : www.karsten-thoms.de

Report message to a moderator

Re: Performance issue in QueuedBuildData [message #1739618 is a reply to message #1739611]

Wed, 03 August 2016 10:49

Akos Kitta

Messages: 25
Registered: November 2015

Junior Member

Dear Karsten,

first off, thank you for your quick reply.

Secondly, I have to admit, I am extremely confused now...

"A linked hashset does not guarantee access order..."

But it guarantees iteration order, right? Or am I wrong? And we need only iteration order, and removing the first element and appending to the end of the collection, right? As I said, I am confused. Please reference the Javadoc of the LinkedHashSet.

" This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order). Note that insertion order is not affected if an element is re-inserted into the set. (An element e is reinserted into a set s if s.add(e) is invoked when s.contains(e) would return true immediately prior to the invocation.) "

Also, you can reference my extremely simple example from the main method.

public static void main(String[] args) {

	LinkedHashSet<String> linkedHashSet = new LinkedHashSet<String>();
	HashSet<String> hashSet = new HashSet<String>();

	List<Integer> toInsert = Arrays.asList(10, 11, 12, 88, 89, 90, 99);

	System.out.println("Item insertion order:\n" + Arrays.toString(toInsert.toArray()) + "\n");
	for (int i : toInsert) {
		linkedHashSet.add(Integer.toString(i));
		hashSet.add(Integer.toString(i));
	}

	System.out.println(LinkedHashSet.class.getSimpleName() + " iteration order:\n"
			+ Arrays.toString(linkedHashSet.toArray()) + "\n");
	System.out.println(
			HashSet.class.getSimpleName() + " iteration order:\n" + Arrays.toString(hashSet.toArray()) + "\n");

	System.out.println("____________________________________________________________________");
	System.out.println("Reverse the order of added items and re-add them again. (In reverse order.)");
	Collections.reverse(toInsert);

	System.out.println("Item insertion order:\n" + Arrays.toString(toInsert.toArray()) + "\n");
	for (int i : toInsert) {
		linkedHashSet.add(Integer.toString(i));
		hashSet.add(Integer.toString(i));
	}

	System.out.println(LinkedHashSet.class.getSimpleName() + " iteration order:\n"
			+ Arrays.toString(linkedHashSet.toArray()) + "\n");
	System.out.println(HashSet.class.getSimpleName() + " iteration order:\n" + Arrays.toString(hashSet.toArray()));

}

Furthemore, I believe using a UniqueEList does not change anything. The contains method is still checked at the base class's org.eclipse.emf.common.util.BasicEList.contains(Object) method. Which is again, an iteration (O(n)).

I was rather thinking about such a solution:

import java.util.AbstractQueue;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

import org.junit.Assert;
import org.junit.Test;

public class LinkedHashQueueTest extends Assert {

	@Test
	public void checkIterationOrder() {

		final Queue<String> queue = new LinkedList<String>();
		final LinkedHashSet<String> linkedHashSet = new LinkedHashSet<String>();

		final List<Integer> toInsert = Arrays.asList(10, 11, 12, 88, 89, 90, 99);

		for (final int i : toInsert) {
			linkedHashSet.add(Integer.toString(i));
			queue.add(Integer.toString(i));
		}

		final Queue<String> hashQueue = new LinkedHashQueue<String>(linkedHashSet);

		final ArrayList<String> queuePollOrder = new ArrayList<String>();
		final ArrayList<String> hashQueuePollOrder = new ArrayList<String>();

		while (!queue.isEmpty()) {
			queuePollOrder.add(queue.poll());
		}

		while (!hashQueue.isEmpty()) {
			hashQueuePollOrder.add(hashQueue.poll());
		}

		assertTrue(queue.isEmpty());
		assertTrue(hashQueue.isEmpty());

		final String[] expecteds = new String[] { "10", "11", "12", "88", "89", "90", "99" };

		assertArrayEquals(expecteds, queuePollOrder.toArray());
		assertArrayEquals(expecteds, hashQueuePollOrder.toArray());
		assertArrayEquals(queuePollOrder.toArray(), hashQueuePollOrder.toArray());

	}

	public class LinkedHashQueue<E> extends AbstractQueue<E> {

		private final LinkedHashSet<E> delegate;

		public LinkedHashQueue(final LinkedHashSet<E> delegate) {
			this.delegate = delegate;
		}

		@Override
		public boolean offer(final E e) {
			return delegate.add(e);
		}

		@Override
		public E poll() {
			final Iterator<E> itr = delegate.iterator();
			if (itr.hasNext()) {
				final E next = itr.next();
				itr.remove();
				return next;
			}
			return null;
		}

		@Override
		public E peek() {
			final Iterator<E> itr = delegate.iterator();
			if (itr.hasNext()) {
				return itr.next();
			}
			return null;
		}

		@Override
		public Iterator<E> iterator() {
			return delegate.iterator();
		}

		@Override
		public int size() {
			return delegate.size();
		}

		/**
		 * This is overridden due to performance issues. To avoid iteration and
		 * check containment with a simple hash.
		 * 
		 * <p>
		 * {@inheritDoc}
		 */
		@Override
		public boolean contains(final Object o) {
			return delegate.contains(o);
		}

	}

}

I know, I have attached a bit too much code to this thread, but I believe this thread definitely requires upfront clarification.

As always any kind of feedback is welcome.

Cheers,
Akos

Report message to a moderator

Re: Performance issue in QueuedBuildData [message #1739621 is a reply to message #1739618]

Wed, 03 August 2016 11:02

Karsten Thoms

Messages: 762
Registered: July 2009
Location: Dortmund, Germany

Senior Member

Dear Akos,

you are perfectly right, sorry for confusing you. I did not have in mind that LinkedHashSet guarantees iteration order, which is what we want to achieve.

So it seems that your approach is promising. Could you open an issue for that to discuss your proposal? Even better would be to additional open a Pull Request with a proposed change.

Best wishes,
~Karsten

Report message to a moderator

Re: Performance issue in QueuedBuildData [message #1739623 is a reply to message #1739621]

Wed, 03 August 2016 11:17

Akos Kitta

Messages: 25
Registered: November 2015

Junior Member

Dear Karsten,

sure. Once I have the issue and the PR, I will post them here as well.

Cheers,
akos

Report message to a moderator

Re: Performance issue in QueuedBuildData [message #1739707 is a reply to message #1739623]

Thu, 04 August 2016 07:59

Akos Kitta

Messages: 25
Registered: November 2015

Junior Member

Github issue is available at: https://github.com/eclipse/xtext-eclipse/issues/40

Report message to a moderator

Re: Performance issue in QueuedBuildData [message #1833741 is a reply to message #1739707]

Fri, 23 October 2020 05:55

Neeraj Bhusare

Messages: 177
Registered: July 2009
Location: Canada

Senior Member

> Indeed I had to customize the WorkspaceProjectsStateHelper and the DerivedStateAwareResourceDescriptionManager but it works fine. Pretty neat, isn't it? At least it is cool for me, I was not aware of that capability

Dear Akos,
I know this thread is very old. In case you still have the above code, could you please provide more details or an example of the customizations that did above? On Github, I found two examples where "DerivedStateAwareResourceDescriptionManager" is subclassed/customized (links below).

https://github.com/dsldevkit/dsl-devkit/blob/61bbf102f6ce37948b1b51de2e5a0120da12d86b/com.avaloq.tools.ddk.xtext/src/com/avaloq/tools/ddk/xtext/resource/AbstractCachingResourceDescriptionManager.java
https://github.com/olegsergeyev/lab/blob/51c5b5e484ff2a325726dea4eb601f2c061ba2ec/plugins/org.eclipse.emf.ecore.xcore/src/org/eclipse/emf/ecore/xcore/scoping/XcoreResourceDescriptionManager.java

Thanks in advance.

Twitter : @NeerajBhusare
Blog : https://nbhusare.github.io/
Best regards, Neeraj

[Updated on: Fri, 23 October 2020 06:01]

Report message to a moderator