Eclipse Community Forums: TMF (Xtext) » Performance issue in QueuedBuildData

Home » Modeling » TMF (Xtext) » Performance issue in QueuedBuildData

Performance issue in QueuedBuildData [message #1739605]

Wed, 03 August 2016 04:54

Eclipse User

Hi there,

I have some performance issue in QueuedBuildData.

What am I trying to do?

I have more than 500k very simple resources in the workspace, balanced into 256 projects and I am trying to run a full clean build. (Please do not ask why I am doing that. I just wanted to try out whether Xtext can handle such a number of resources or not. It can. Indeed I had to customize the WorkspaceProjectsStateHelperand the DerivedStateAwareResourceDescriptionManager but it works fine. Pretty neat, isn't it? At least it is cool for me, I was not aware of that capability. Just as a side note, the full clean build time was around 40 minutes in my case.)

Where do I have exactly the issue?

In the QueuedBuildData#queueURI(URI) method. If I understand correctly, neither the uris, nor the values of the projectNameToChangedResource can contain duplicate URIs. The same rule applies on the corresponding containers for the copies. Currently, linked lists are being used for both the uris and the values of the projectNameToChangedResource. Unfortunately, it is pretty expensive to check the contains all the time (with such a number of resources) to avoid URI duplication. I did some profiling and the contains time is roughly the 20% of my overall build time. I was thinking about to replace it with a linked hash set. Actually, I have already done that, but I had to basically copy paste the whole class, replace the linked lists with linked hash sets and bind the QueuedBuildData to my custom implementation. Based on my experiences, it worked fine.

My question is the following:

Is there a more elegant way to customize the QueuedBuildData? If yes, can someone please advise how to achive that? If no, but you believe, it would make sense to provide some sort of mechanism to make it customizable, I would be happy to start working on a patch.

Thanks in advance for any kind of feedbacks!

Cheers,
Akos

Re: Performance issue in QueuedBuildData [message #1739611 is a reply to message #1739605]

Wed, 03 August 2016 05:12

Eclipse User

Dear Akos,

these are quite interesting observations. Scalability and performance are always things that we want to improve for Xtext. Often, performance issues are not visible when handling "normal" amounts of resources and bottlenecks become visible in scenarios like yours.

QueuedBuildData itself is not designed to handle that vast amount of URIs and you are perfectly right, list.contains() does not scale in that case.

I am unsure if it is necessary to keep insertion order stable. That might be the reason why a LinkedList was chosen initially. It might be worth to consider a UniqueEList instead the LinkedList. A linked hashset does not guarantee access order, and a queue is usually something where you want to maintain the original order.

Please do me a favor and try to use UniqueEList from emf.common. If you have concrete numbers that show performance benefits in large scale, document them and raise a bug (or an issue https://github.com/eclipse/xtext-eclipse/issues). Do also test for small change sets. The normal use case should not degrade in performance, too. Attach profile results which show that the time spent on QueuedBuildData#queueURI benefits from the change.

Thanks,
~Karsten

Re: Performance issue in QueuedBuildData [message #1739618 is a reply to message #1739611]

Wed, 03 August 2016 06:49

Eclipse User

Dear Karsten,

first off, thank you for your quick reply.

Secondly, I have to admit, I am extremely confused now...

"A linked hashset does not guarantee access order..."

But it guarantees iteration order, right? Or am I wrong? And we need only iteration order, and removing the first element and appending to the end of the collection, right? As I said, I am confused. Please reference the Javadoc of the LinkedHashSet.

" This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order). Note that insertion order is not affected if an element is re-inserted into the set. (An element e is reinserted into a set s if s.add(e) is invoked when s.contains(e) would return true immediately prior to the invocation.) "

Also, you can reference my extremely simple example from the main method.

public static void main(String[] args) {

	LinkedHashSet<String> linkedHashSet = new LinkedHashSet<String>();
	HashSet<String> hashSet = new HashSet<String>();

	List<Integer> toInsert = Arrays.asList(10, 11, 12, 88, 89, 90, 99);

	System.out.println("Item insertion order:\n" + Arrays.toString(toInsert.toArray()) + "\n");
	for (int i : toInsert) {
		linkedHashSet.add(Integer.toString(i));
		hashSet.add(Integer.toString(i));
	}

	System.out.println(LinkedHashSet.class.getSimpleName() + " iteration order:\n"
			+ Arrays.toString(linkedHashSet.toArray()) + "\n");
	System.out.println(
			HashSet.class.getSimpleName() + " iteration order:\n" + Arrays.toString(hashSet.toArray()) + "\n");

	System.out.println("____________________________________________________________________");
	System.out.println("Reverse the order of added items and re-add them again. (In reverse order.)");
	Collections.reverse(toInsert);

	System.out.println("Item insertion order:\n" + Arrays.toString(toInsert.toArray()) + "\n");
	for (int i : toInsert) {
		linkedHashSet.add(Integer.toString(i));
		hashSet.add(Integer.toString(i));
	}

	System.out.println(LinkedHashSet.class.getSimpleName() + " iteration order:\n"
			+ Arrays.toString(linkedHashSet.toArray()) + "\n");
	System.out.println(HashSet.class.getSimpleName() + " iteration order:\n" + Arrays.toString(hashSet.toArray()));

}

Furthemore, I believe using a UniqueEList does not change anything. The contains method is still checked at the base class's org.eclipse.emf.common.util.BasicEList.contains(Object) method. Which is again, an iteration (O(n)).

I was rather thinking about such a solution:

import java.util.AbstractQueue;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

import org.junit.Assert;
import org.junit.Test;

public class LinkedHashQueueTest extends Assert {

	@Test
	public void checkIterationOrder() {

		final Queue<String> queue = new LinkedList<String>();
		final LinkedHashSet<String> linkedHashSet = new LinkedHashSet<String>();

		final List<Integer> toInsert = Arrays.asList(10, 11, 12, 88, 89, 90, 99);

		for (final int i : toInsert) {
			linkedHashSet.add(Integer.toString(i));
			queue.add(Integer.toString(i));
		}

		final Queue<String> hashQueue = new LinkedHashQueue<String>(linkedHashSet);

		final ArrayList<String> queuePollOrder = new ArrayList<String>();
		final ArrayList<String> hashQueuePollOrder = new ArrayList<String>();

		while (!queue.isEmpty()) {
			queuePollOrder.add(queue.poll());
		}

		while (!hashQueue.isEmpty()) {
			hashQueuePollOrder.add(hashQueue.poll());
		}

		assertTrue(queue.isEmpty());
		assertTrue(hashQueue.isEmpty());

		final String[] expecteds = new String[] { "10", "11", "12", "88", "89", "90", "99" };

		assertArrayEquals(expecteds, queuePollOrder.toArray());
		assertArrayEquals(expecteds, hashQueuePollOrder.toArray());
		assertArrayEquals(queuePollOrder.toArray(), hashQueuePollOrder.toArray());

	}

	public class LinkedHashQueue<E> extends AbstractQueue<E> {

		private final LinkedHashSet<E> delegate;

		public LinkedHashQueue(final LinkedHashSet<E> delegate) {
			this.delegate = delegate;
		}

		@Override
		public boolean offer(final E e) {
			return delegate.add(e);
		}

		@Override
		public E poll() {
			final Iterator<E> itr = delegate.iterator();
			if (itr.hasNext()) {
				final E next = itr.next();
				itr.remove();
				return next;
			}
			return null;
		}

		@Override
		public E peek() {
			final Iterator<E> itr = delegate.iterator();
			if (itr.hasNext()) {
				return itr.next();
			}
			return null;
		}

		@Override
		public Iterator<E> iterator() {
			return delegate.iterator();
		}

		@Override
		public int size() {
			return delegate.size();
		}

		/**
		 * This is overridden due to performance issues. To avoid iteration and
		 * check containment with a simple hash.
		 * 
		 * <p>
		 * {@inheritDoc}
		 */
		@Override
		public boolean contains(final Object o) {
			return delegate.contains(o);
		}

	}

}

I know, I have attached a bit too much code to this thread, but I believe this thread definitely requires upfront clarification.

As always any kind of feedback is welcome.

Cheers,
Akos

Re: Performance issue in QueuedBuildData [message #1739621 is a reply to message #1739618]

Wed, 03 August 2016 07:02

Eclipse User

Dear Akos,

you are perfectly right, sorry for confusing you. I did not have in mind that LinkedHashSet guarantees iteration order, which is what we want to achieve.

So it seems that your approach is promising. Could you open an issue for that to discuss your proposal? Even better would be to additional open a Pull Request with a proposed change.

Best wishes,
~Karsten

Re: Performance issue in QueuedBuildData [message #1739623 is a reply to message #1739621]

Wed, 03 August 2016 07:17

Eclipse User

Dear Karsten,

sure. Once I have the issue and the PR, I will post them here as well.

Cheers,
akos

Re: Performance issue in QueuedBuildData [message #1739707 is a reply to message #1739623]

Thu, 04 August 2016 03:59

Eclipse User

Github issue is available at: https://github.com/eclipse/xtext-eclipse/issues/40

Re: Performance issue in QueuedBuildData [message #1833741 is a reply to message #1739707]

Fri, 23 October 2020 01:55

Eclipse User

> Indeed I had to customize the WorkspaceProjectsStateHelper and the DerivedStateAwareResourceDescriptionManager but it works fine. Pretty neat, isn't it? At least it is cool for me, I was not aware of that capability

Dear Akos,
I know this thread is very old. In case you still have the above code, could you please provide more details or an example of the customizations that did above? On Github, I found two examples where "DerivedStateAwareResourceDescriptionManager" is subclassed/customized (links below).

https://github.com/dsldevkit/dsl-devkit/blob/61bbf102f6ce37948b1b51de2e5a0120da12d86b/com.avaloq.tools.ddk.xtext/src/com/avaloq/tools/ddk/xtext/resource/AbstractCachingResourceDescriptionManager.java
https://github.com/olegsergeyev/lab/blob/51c5b5e484ff2a325726dea4eb601f2c061ba2ec/plugins/org.eclipse.emf.ecore.xcore/src/org/eclipse/emf/ecore/xcore/scoping/XcoreResourceDescriptionManager.java

Thanks in advance.

[Updated on: Fri, 23 October 2020 02:01] by Moderator

Re: Performance issue in QueuedBuildData [message #1833742 is a reply to message #1833741]

Fri, 23 October 2020 02:21

Eclipse User

@ Neeraj what is slow for your with DerivedStateAwareResourceDescriptionManager
please note: QueuedBuildData is fixed meanwhile

Previous Topic:	refresh dependecies before importing the project on eclipse
Next Topic:	xtend code

Goto Forum:

-=] Back to Top [=-

Current Time: Wed Jul 23 08:24:35 EDT 2025

.:: Contact :: Home ::.

Breadcrumbs

Sign up to our Newsletter