Eclipse Community Forums
Performance degradation [message #508653] Tue, 19 January 2010 18:00
Matthew Pocock
Messages: 18
Registered: January 2010
Junior Member
Hi,

I have an application that parses a file and creates numerous objects (<100,000 but >20,000) from it. The entire read operation must either succeed or fail, so this demarcates a single transaction.

To begin with, the application runs with about 60-70% of the CPU in Java, and 30-40% in postgres. However, over time this degrades to under 1% postgres and 99% Java. With persistence, the application can take up to 2 hours to run. When I replace the DAO with one that manages the objects in-memory, it runs in perhaps 2 seconds.

I am using the latest (non-milestone) releases of EclipseLink and Spring in this project. Upgrading from EclipseLink 1.* to 2.*, or from Spring 2.5 to 3.*, has not resolved the performance issue. I have not yet tried an alternative JPA provider to see whether this is specific to EclipseLink.

The code is part of a larger infrastructure, so it would be a pain to pull out into a test-case.

All ideas welcome.

Matthew
Re: Performance degradation [message #509180 is a reply to message #508653] Thu, 21 January 2010 15:29
James Sutherland
Messages: 1939
Registered: July 2009
Location: Ottawa, Canada
Senior Member

That is very odd.

What exactly are you doing (code), and how have you configured things (persistence.xml, config, etc.)?

"over time this degrades": what do you mean by this? Are you performing this read/insert once and it gets slower as it processes the file, or are you performing the read/insert over and over again and the server eventually gets slower?

If it degrades over the single operation, then you probably need to split up the batch of objects into smaller sets. You can still do this in a single transaction using a flush() then a clear() after say 1000 objects.
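A minimal sketch of that flush-then-clear pattern follows. The `PersistenceContext` interface and `RecordingContext` class are hypothetical stand-ins so the example runs without a JPA provider; in real code the three calls would go to the `EntityManager` itself. The batch size of 1000 is just the figure suggested above, not a tuned value.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedPersist {
    /** Hypothetical stand-in for the three EntityManager methods used here. */
    interface PersistenceContext {
        void persist(Object entity);
        void flush();
        void clear();
    }

    /** Persists all entities, flushing and clearing after every batchSize objects. */
    static void persistInBatches(PersistenceContext em, List<?> entities, int batchSize) {
        int pending = 0;
        for (Object entity : entities) {
            em.persist(entity);
            pending++;
            if (pending == batchSize) {
                em.flush(); // write pending changes inside the still-open transaction
                em.clear(); // detach everything so the context stays small
                pending = 0;
            }
        }
        // Any remaining partial batch is left for the transaction commit to write.
    }

    /** Recording fake (an assumption for illustration) that tracks context size. */
    static class RecordingContext implements PersistenceContext {
        int managed = 0;
        int maxManaged = 0;
        int flushes = 0;
        public void persist(Object entity) {
            managed++;
            maxManaged = Math.max(maxManaged, managed);
        }
        public void flush() { flushes++; }
        public void clear() { managed = 0; }
    }

    public static void main(String[] args) {
        RecordingContext ctx = new RecordingContext();
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            data.add(i);
        }
        persistInBatches(ctx, data, 1000);
        System.out.println("flushes=" + ctx.flushes + ", maxManaged=" + ctx.maxManaged);
    }
}
```

One caveat with a real EntityManager: clear() detaches every managed entity, including ones loaded earlier in the transaction, so references held across a clear() would need to be re-fetched or merged before further use.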

If it degrades over time, then you probably have a memory leak somewhere. Check your cache settings, and ensure your application is releasing its handle on the objects. You may wish to use a memory profiler.

In general with any performance issue it is best to figure out what the issue is first using a performance profiler, such as JProfiler.

There is some performance information on EclipseLink here:

http://wiki.eclipse.org/EclipseLink/Performance


Re: Performance degradation [message #509308 is a reply to message #508653] Thu, 21 January 2010 22:22
Matthew Pocock
> What exactly are you doing (code), and how have you configured things (persistence.xml, config, etc.)?

My persistence.xml is almost empty. In Spring, I turn SQL logging and DDL generation on, and that's about it. It uses RESOURCE_LOCAL transactions. It's about as stripped-down as it can possibly be.

The problem classes are:

@Entity
@Table(uniqueConstraints = @UniqueConstraint(columnNames = { "orfName" }))
@NamedQueries(@NamedQuery(name = "orfByName", query = "select o from Orf o where o.orfName = :orfName"))
public class Orf
        implements Serializable, Comparable<Orf>
{
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Integer id;

    @Basic(fetch = FetchType.EAGER)
    private String orfName;

    ...
}

@Entity
@Table(uniqueConstraints = {@UniqueConstraint(columnNames = {"orf1_id", "orf2_id"})})
@NamedQueries(
{
    @NamedQuery(name = "allPairs", query = "select p from Pair p"),
    @NamedQuery(name = "pairsByOrf", query = "select p from Pair p where p.orf1 = :orf or p.orf2 = :orf"),
    @NamedQuery(name = "pairByOrfs", query = "select p from Pair p where (p.orf1 = :orf1 and p.orf2 = :orf2) or (p.orf2 = :orf1 and p.orf1 = :orf2)")
})
public class Pair
        implements Serializable
{
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Integer pairID;

    @ManyToOne(fetch = FetchType.EAGER)
    private Orf orf1;

    @ManyToOne(fetch = FetchType.EAGER)
    private Orf orf2;

    ...
}

In the dao, pairs are written by this code:

public Pair fetchOrMakePair(Orf orf1, Orf orf2)
{
    // Look for an existing pair in either orientation (see the "pairByOrfs" query).
    List<Pair> res = entityManager.createNamedQuery("pairByOrfs")
            .setParameter("orf1", orf1)
            .setParameter("orf2", orf2)
            .getResultList();
    if (res.isEmpty())
    {
        // None found: create the pair and persist it in the current transaction.
        Pair p = new Pair(orf1, orf2);
        p.normalize();
        entityManager.persist(p);
        return p;
    }
    else
    {
        return res.get(0);
    }
}


I make sure that Orf instances are pre-loaded. Then in the same transaction I add about 50k pairs.

> "over time this degrades": what do you mean by this? Are you performing this read/insert once and it gets slower as it processes the file, or are you performing the read/insert over and over again and the server eventually gets slower?

All the read/inserts are done in a single transaction and then bulk-committed. As more pairs are added to the transaction, the performance degrades.

The same behaviour can be seen with Hibernate, so it's not specific to EclipseLink.

I don't think it is a memory leak of some kind, as once the transaction commits, the performance goes back to normal and all memory is recovered.
Re: Performance degradation [message #509876 is a reply to message #508653] Mon, 25 January 2010 16:18
James Sutherland

Since it degrades over the single operation, you probably need to split up the batch of objects into smaller sets. You can still do this in a single transaction by calling flush() and then clear() after, say, every 1000 objects.

Re: Performance degradation [message #510204 is a reply to message #509876] Tue, 26 January 2010 17:02
Matthew Pocock
I've identified the issue and have a workaround. However, I can't help feeling that it's the sort of thing that should be handled for me.

The performance was being killed by looking up entities by things other than their primary key, within the loop that persisted them. So, for the case of Orf, the orfName has a unique constraint, but the Orf has a numeric ID. I was doing lots of queries to fetch orfs by their name. All the Java cpu was going into this lookup - presumably doing a linear scan, despite this having a unique constraint. When I re-wrote my DAO to keep a local cache map of orfName->Orf, this overhead went away. I did a similar thing for Pair, making a cache map from (int,int) to Pair.
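For readers wanting to try the same workaround, here is a rough sketch of what such DAO-local caches could look like. The class names `NaturalKeyCache`, `Cache` and `PairKey` are invented for illustration and are not the actual DAO code; the key for a pair is assumed to be order-insensitive, mirroring the "pairByOrfs" query, which matches either orientation. In a real DAO the `maker` function would create the entity and call `entityManager.persist` on it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class NaturalKeyCache {
    /** Order-insensitive key for two Orf ids, so (a, b) and (b, a) are equal. */
    static final class PairKey {
        final int lo;
        final int hi;
        PairKey(int a, int b) {
            lo = Math.min(a, b);
            hi = Math.max(a, b);
        }
        @Override public boolean equals(Object o) {
            return o instanceof PairKey && ((PairKey) o).lo == lo && ((PairKey) o).hi == hi;
        }
        @Override public int hashCode() {
            return 31 * lo + hi;
        }
    }

    /** Fetch-or-make cache keyed by a unique, non-primary-key attribute. */
    static final class Cache<K, V> {
        private final Map<K, V> byKey = new HashMap<>();

        /** Returns the cached value, or builds it once with maker and remembers it. */
        V fetchOrMake(K key, Function<K, V> maker) {
            return byKey.computeIfAbsent(key, maker);
        }

        int size() {
            return byKey.size();
        }
    }

    public static void main(String[] args) {
        int[] made = {0}; // counts how often the maker actually ran
        Cache<PairKey, String> pairs = new Cache<>();
        Function<PairKey, String> maker = k -> {
            made[0]++; // a real DAO would construct and persist a Pair here
            return "Pair(" + k.lo + "," + k.hi + ")";
        };
        pairs.fetchOrMake(new PairKey(1, 2), maker);
        pairs.fetchOrMake(new PairKey(2, 1), maker); // same pair, other orientation
        System.out.println("cached=" + pairs.size() + ", made=" + made[0]);
    }
}
```

Such a cache only stays correct for as long as the entities it holds remain managed, so it would typically be scoped to the transaction and discarded afterwards.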

The time taken to run the app without these caches is about 1030 minutes. The time with these caches is about 2 minutes. In both cases, I was looking for objects by unique constraints within the same transaction as they were created.

The same behaviour is seen in both Hibernate and EclipseLink, so I guess this is a performance issue in how in-transaction objects are searched by something fairly low-down in JPA.

Matthew (who now feels dirty for maintaining caches of objects in-memory)
Re: Performance degradation [message #510759 is a reply to message #510204] Thu, 28 January 2010 14:59
James Sutherland

Objects are only cached by their Id (primary key) in the persistence context. Any JPA Query is required by default to first "flush" to the database any change made to any of the objects in the persistence context, and then query the database for the object.

JPA defines a flushMode which can be set on the EntityManager or a Query that will avoid the flush until COMMIT. EclipseLink also supports a persistence unit property and query hint to configure the flush mode. Setting the flush mode to COMMIT would avoid the cost of flushing, which is probably your issue, but then the object would not be written to the database, and your query would not find it (unless you manually called flush after persisting the object).
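To make the trade-off concrete, here is a toy model of the two flush modes. It is not EclipseLink's implementation: `ToyContext` is invented for illustration, and it assumes that each flush has to examine every object in the persistence context for changes, which is what would make a query-per-insert loop grow quadratically. In real JPA code the mode is chosen with `setFlushMode(FlushModeType.COMMIT)` on the EntityManager or on an individual Query.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FlushModeModel {
    enum Mode { AUTO, COMMIT }

    /** Toy persistence context; an assumption-laden model, not a real provider. */
    static class ToyContext {
        final Mode mode;
        final List<String> managed = new ArrayList<>();   // every object in the context
        final List<String> unflushed = new ArrayList<>(); // persisted but not yet written
        final Set<String> database = new HashSet<>();     // rows visible to queries
        long objectsExamined = 0;

        ToyContext(Mode mode) {
            this.mode = mode;
        }

        void persist(String name) {
            managed.add(name);
            unflushed.add(name);
        }

        void flush() {
            // Assumed cost model: a flush checks every managed object for changes.
            objectsExamined += managed.size();
            database.addAll(unflushed);
            unflushed.clear();
        }

        /** Query on a non-Id attribute: flushes first in AUTO mode, then hits the database. */
        boolean existsByName(String name) {
            if (mode == Mode.AUTO) {
                flush();
            }
            return database.contains(name);
        }

        void commit() {
            flush();
        }
    }

    /** Fetch-or-make loop in the style of the DAO shown earlier in the thread. */
    static int load(ToyContext ctx, int n) {
        int created = 0;
        for (int i = 0; i < n; i++) {
            String name = "orf" + i;
            if (!ctx.existsByName(name)) {
                ctx.persist(name);
                created++;
            }
        }
        ctx.commit();
        return created;
    }

    public static void main(String[] args) {
        ToyContext auto = new ToyContext(Mode.AUTO);
        ToyContext commit = new ToyContext(Mode.COMMIT);
        load(auto, 1000);
        load(commit, 1000);
        System.out.println("AUTO examined " + auto.objectsExamined
                + " objects, COMMIT examined " + commit.objectsExamined);
    }
}
```

In this model the COMMIT context does far less work, but a query issued right after persist() does not see the new object until flush() is called by hand, which is the caveat described above.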

Another option is to use conforming in EclipseLink, which allows a query to find an object in memory. You would also need to set the query type to make it a ReadObject query to avoid accessing the database though.

In general, your solution of caching the objects by their alternate Id is probably best. Otherwise you could change your queries to use the real Id, or change your mapping's Id to what you are using in queries.



Re: Performance degradation [message #510775 is a reply to message #510759] Thu, 28 January 2010 15:27
Matthew Pocock
> Objects are only cached by their Id (primary key) in the persistence context.

It's a pity that implementers do not additionally cache by unique constraints. I have never looked into the guts of any JPA implementation, so I have no idea how feasible this would be. However, given my experience, I find it likely that a lot of applications would suddenly start going a lot faster if this were done.