Performance degradation [message #508653]
Tue, 19 January 2010 13:00
Eclipse User
Hi,
I have an application that parses a file and creates numerous objects (<100,000 but >20,000) from it. The entire read operation must either succeed or fail, so this demarcates a single transaction.
To begin with, the application runs with about 60-70% of the CPU in Java, and 30-40% in postgres. However, over time this degrades to under 1% postgres and 99% Java. With persistence, the application can take up to 2 hours to run. When I replace the DAO with one that manages the objects in-memory, it runs in perhaps 2 seconds.
I am using the latest (non-milestone) releases of EclipseLink and Spring in this project. The issue persisted when upgrading from EclipseLink 1.x to 2.x, and from Spring 2.5 to 3.x. I have not yet tried an alternative JPA provider to see whether this is specific to EclipseLink.
The code is part of a larger infrastructure, so it would be a pain to pull out into a test-case.
All ideas welcome.
Matthew
Re: Performance degradation [message #509308 is a reply to message #508653]
Thu, 21 January 2010 17:22
Eclipse User
= What exactly are you doing (code) and how have you configured things (persistence.xml, config, etc.)?
My persistence.xml is almost empty. In Spring, I turn SQL logging and DDL generation on, and that's about it. It uses RESOURCE_LOCAL transactions. It's about as stripped-down as it can possibly be.
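For what it's worth, a minimal RESOURCE_LOCAL persistence.xml of the kind described might look like the following (a sketch: the unit name and explicit class listing are illustrative, and the actual data source is wired up by Spring, not shown here):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence" version="2.0">
  <!-- Unit name is illustrative; Spring supplies the DataSource. -->
  <persistence-unit name="orfPU" transaction-type="RESOURCE_LOCAL">
    <class>Orf</class>
    <class>Pair</class>
  </persistence-unit>
</persistence>
```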
The problem classes are:
@Entity
@Table(uniqueConstraints = @UniqueConstraint(columnNames = { "orfName" }))
@NamedQueries(@NamedQuery(name = "orfByName", query = "select o from Orf o where o.orfName = :orfName"))
public class Orf implements Serializable, Comparable<Orf>
{
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Integer id;

    @Basic(fetch = FetchType.EAGER)
    private String orfName;

    ...
}
@Entity
@Table(uniqueConstraints = {@UniqueConstraint(columnNames = {"orf1_id", "orf2_id"})})
@NamedQueries(
{
    @NamedQuery(name = "allPairs", query = "select p from Pair p"),
    @NamedQuery(name = "pairsByOrf", query = "select p from Pair p where p.orf1 = :orf or p.orf2 = :orf"),
    @NamedQuery(name = "pairByOrfs", query = "select p from Pair p where (p.orf1 = :orf1 and p.orf2 = :orf2) or (p.orf2 = :orf1 and p.orf1 = :orf2)")
})
public class Pair implements Serializable
{
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Integer pairID;

    @ManyToOne(fetch = FetchType.EAGER)
    private Orf orf1;

    @ManyToOne(fetch = FetchType.EAGER)
    private Orf orf2;

    ...
}
In the DAO, pairs are written by this code:
public Pair fetchOrMakePair(Orf orf1, Orf orf2)
{
    List<Pair> res = entityManager.createNamedQuery("pairByOrfs")
        .setParameter("orf1", orf1)
        .setParameter("orf2", orf2)
        .getResultList();
    if (res.isEmpty())
    {
        Pair p = new Pair(orf1, orf2);
        p.normalize();
        entityManager.persist(p);
        return p;
    }
    else
    {
        return res.get(0);
    }
}
I make sure that Orf instances are pre-loaded. Then in the same transaction I add about 50k pairs.
= "over time this degrades": what do you mean by this? Are you performing this read/insert once and it gets slower as it processes the file, or are you performing the read/insert over and over again and the server eventually gets slower?
All the read/inserts are done in a single transaction and then bulk-committed. As more pairs are added to the transaction, the performance degrades.
The same behaviour can be seen with Hibernate. It's not specific to EclipseLink.
I don't think it is a memory leak of some kind, as once the transaction commits, the performance goes back to normal and all memory is recovered.
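A back-of-the-envelope model of the slowdown (my assumption, not verified against EclipseLink internals): if every in-transaction query costs one pass over the persistence context, and the context grows by one pair per iteration, the total work is quadratic in the number of pairs, which would match the steady degradation:

```java
public class QuadraticModel {
    // Toy model (assumption, not EclipseLink internals): each query scans
    // the whole persistence context once, and the context grows by one
    // entity per iteration, so the total work is 1 + 2 + ... + n,
    // i.e. n * (n + 1) / 2, which is quadratic in n.
    static long workFor(int pairs) {
        long total = 0;
        for (int contextSize = 1; contextSize <= pairs; contextSize++) {
            total += contextSize; // one query, one pass over the context
        }
        return total;
    }

    public static void main(String[] args) {
        // For ~50k pairs the model predicts over a billion entity visits.
        System.out.println(workFor(50_000));
    }
}
```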
Re: Performance degradation [message #510204 is a reply to message #509876]
Tue, 26 January 2010 12:02
Eclipse User
I've identified the issue and have a workaround. However, I can't help feeling that it's the sort of thing that should be handled for me.
The performance was being killed by looking up entities by something other than their primary key, inside the loop that persisted them. In the case of Orf, orfName has a unique constraint, but the Orf's primary key is a numeric ID, and I was doing lots of queries to fetch Orfs by name. All the Java CPU was going into this lookup, presumably a linear scan, despite the unique constraint. When I rewrote my DAO to keep a local cache map of orfName->Orf, this overhead went away. I did a similar thing for Pair, making a cache map from (int,int) to Pair.
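The workaround can be sketched roughly like this (the names are illustrative, not the real DAO, and the database fallback stands in for running the "orfByName" named query): cache entities by their unique constraint for the life of the transaction, so each orfName hits the database at most once.

```java
import java.util.HashMap;
import java.util.Map;

// Rough sketch of the workaround (illustrative names, not the actual DAO).
public class CachingOrfDao {
    public static class Orf {
        public final String orfName;
        public Orf(String orfName) { this.orfName = orfName; }
    }

    // In-transaction cache keyed by the unique constraint, not the PK.
    private final Map<String, Orf> byName = new HashMap<>();
    private int databaseLookups = 0;

    // Stand-in for running the "orfByName" named query via the EntityManager.
    private Orf loadFromDatabase(String orfName) {
        databaseLookups++;
        return new Orf(orfName);
    }

    public Orf fetchOrf(String orfName) {
        // Falls through to the database only on a cache miss.
        return byName.computeIfAbsent(orfName, this::loadFromDatabase);
    }

    public int databaseLookups() { return databaseLookups; }
}
```

The cache must be discarded (or kept in sync) across transaction boundaries, since entities persisted in a rolled-back transaction would otherwise linger in it.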
The time taken to run the app without these caches is about 1030 minutes. The time with these caches is about 2 minutes. In both cases, I was looking for objects by unique constraints within the same transaction as they were created.
The same behaviour is seen in both Hibernate and EclipseLink, so I guess this is a performance issue in how in-transaction objects are searched, somewhere fairly low down in JPA.
Matthew (who now feels dirty for maintaining caches of objects in-memory)
Re: Performance degradation [message #510775 is a reply to message #510759]
Thu, 28 January 2010 10:27
Eclipse User
= Objects are only cached by their Id (primary key) in the persistence context.
It's a pity that implementers do not additionally cache by unique constraints. I have never looked into the guts of any JPA implementation, so I have no idea how feasible this would be. However, given my experience, I suspect a lot of applications would suddenly run a lot faster if this were done.