[eclipselink-dev] Some ideas for EclipseLink 3.0.0

Hello,

I'm not sure if this is the correct place for this kind of mail - if not, please let me know where to post it instead.

We have been working with EclipseLink for a long time, and I'd like to share some thoughts from our work with it and
our proposals for the upcoming EclipseLink release 3.0.0.
I'm sorry that it got a bit long, but hopefully it will be of some help to you. Most of the proposals are performance or correctness related, and I'm quite confident that a significant number of users would benefit from the following improvements:


1. Implicitly use (OUTER) JOIN FETCH for EAGERly loaded ManyToOne references

If an entity contains a ManyToOne reference with FetchType.EAGER (default) or when the corresponding weaving technique is disabled, EclipseLink issues a separate (sub-)query for each of those references when retrieving it. In case of complex hierarchies of those ManyToOne references, this may result in a dramatic performance drop as the fetch of one simple query might cause many additional queries and round trips to the database.

IMHO, EclipseLink should implicitly use an (OUTER) JOIN FETCH for fetching EAGER references
(maybe as a configurable option of the Session/EntityManager).
This should affect all queries irrespective of the source of such a query (e.g. find(), transparent indirection containers, Query).

To avoid too big / deep JOIN FETCH queries, there should be a limit (ideally configurable) on how many references (or levels)
should be considered for implicit JOIN FETCH queries.

We had scenarios where the introduction of a JOIN FETCH increased single-threaded throughput by a factor of 5-10. I would love to see that kind of "tuning" done automatically in EclipseLink. (BTW, LAZY loaded references only help here in cases where the ManyToOne reference is actually NOT accessed; otherwise, the performance impact matches that of eagerly fetched ManyToOne references.)
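
To illustrate where the round trips come from, here is a minimal sketch in plain Java. It does not use EclipseLink at all: FakeDb, FetchStrategies and the order/customer data are hypothetical stand-ins, and caching is ignored. It only counts how many "queries" each strategy issues for the same result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a database; it only counts round trips.
class FakeDb {
    private final Map<Integer, Integer> orderToCustomer = new LinkedHashMap<>();
    private final Map<Integer, String> customerNames = new HashMap<>();
    int roundTrips = 0;

    void addCustomer(int id, String name) { customerNames.put(id, name); }
    void addOrder(int orderId, int customerId) { orderToCustomer.put(orderId, customerId); }

    // Stands for: SELECT o.id, o.customer_id FROM orders o
    List<int[]> selectOrders() {
        roundTrips++;
        List<int[]> rows = new ArrayList<>();
        orderToCustomer.forEach((o, c) -> rows.add(new int[] {o, c}));
        return rows;
    }

    // Stands for: SELECT c.name FROM customer c WHERE c.id = ?
    String selectCustomerName(int customerId) {
        roundTrips++;
        return customerNames.get(customerId);
    }

    // Stands for: SELECT o.id, c.name FROM orders o LEFT OUTER JOIN customer c ON ...
    Map<Integer, String> selectOrdersJoinCustomer() {
        roundTrips++;
        Map<Integer, String> result = new LinkedHashMap<>();
        orderToCustomer.forEach((o, c) -> result.put(o, customerNames.get(c)));
        return result;
    }
}

class FetchStrategies {
    // One query for the orders plus one query per ManyToOne reference.
    static Map<Integer, String> loadWithSeparateQueries(FakeDb db) {
        Map<Integer, String> result = new LinkedHashMap<>();
        for (int[] row : db.selectOrders()) {
            result.put(row[0], db.selectCustomerName(row[1]));
        }
        return result;
    }

    // A single joined query returns the same data in one round trip.
    static Map<Integer, String> loadWithJoin(FakeDb db) {
        return db.selectOrdersJoinCustomer();
    }
}
```

With 10 orders, the first strategy needs 11 round trips and the joined one needs 1, which is the effect a manually written "LEFT JOIN FETCH" in JPQL achieves today.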


2. Enable a way to provide query hints to transparent indirection containers

Currently, I'm not aware of a way to provide query hints for transparent indirection containers like IndirectSet. This prevents us from providing performance-related query hints like join/batch fetches or FetchSize (ideally with the option of specifying them globally, e.g. use a fetchSize of xyz for all indirect container loads).


3. Improve sequence caching

Using sequence caching is a good idea in many scenarios.
However, there's still room for improvement:

- Introduce a configuration option to share the sequence cache across transactions. Currently, it looks like each transaction has its own sequence cache. This lowers the cache efficiency (and causes unnecessary database round trips) for transactions with only a few persists.
In most cases, especially where sequence values are retrieved
from a globally synchronized database sequence generator, there's no reason why the sequence cache should not be shared across transactions/client sessions.

- Expose the sequence cache through an official API so that non-JPA code can also benefit from EclipseLink sequence caching,
thereby improving its efficiency even more.
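
As a rough sketch of what I mean by a shared cache (plain Java, not existing EclipseLink API): SharedSequenceCache, its blockFetcher and the block size are hypothetical names. The assumption is that one database call reserves a whole block of values, e.g. from a sequence with a matching increment.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: one cache instance shared by all transactions/sessions.
class SharedSequenceCache {
    // Reserves a new block in the database and returns its first value
    // (assumed to cost exactly one round trip).
    private final LongSupplier blockFetcher;
    private final int blockSize;
    private long next;   // next value to hand out
    private long limit;  // exclusive upper bound of the current block

    SharedSequenceCache(LongSupplier blockFetcher, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockFetcher = blockFetcher;
        this.blockSize = blockSize;
    }

    // Thread-safe: callers from different transactions draw from the same block.
    synchronized long nextValue() {
        if (next >= limit) {
            next = blockFetcher.getAsLong();
            limit = next + blockSize;
        }
        return next++;
    }
}
```

Because the block outlives a single transaction, many small transactions with one or two persists each would share one round trip per block instead of paying one round trip each.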


4. Make EntityManager.getReference() database roundtrip free

Currently, there is little difference in behavior between EntityManager.find() and EntityManager.getReference(). Both issue database calls, which is not required for the latter according to the JPA spec. This lowers the efficiency of getReference(). Imagine that you just want to link an entity (with a known PK) to a newly persisted one - getReference() would be perfect for that job without the need to fetch anything from the database (e.g. by returning a proxy).

Other JPA implementations do better here (Hibernate for example).
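
To show the proxy idea, here is a minimal sketch using a JDK dynamic proxy. It is not how EclipseLink or Hibernate implement it: Customer, LazyReferences and the loader callback are hypothetical, and a real provider would proxy entity classes rather than interfaces.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.LongFunction;

// Hypothetical entity interface used only for this sketch.
interface Customer {
    long getId();
    String getName();
}

class LazyReferences {
    // Returns a proxy that knows its id but loads the real object only
    // when something other than getId() is called on it.
    @SuppressWarnings("unchecked")
    static <T> T reference(Class<T> type, long id, LongFunction<T> loader) {
        InvocationHandler handler = new InvocationHandler() {
            private T target; // loaded lazily, at most once

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                // The primary key is already known, so no database access is needed.
                if (method.getName().equals("getId") && method.getParameterCount() == 0) {
                    return id;
                }
                // Any other call (including toString/equals/hashCode) triggers the load.
                if (target == null) {
                    target = loader.apply(id);
                    if (target == null) {
                        throw new IllegalStateException("No row found for id " + id);
                    }
                }
                return method.invoke(target, args);
            }
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }
}
```

Linking such a reference to a newly persisted entity only needs the id for the foreign key column, so the loader would never run in that scenario.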


5. Add support for retrieving detached entities within transactions

Many entities retrieved by a JPA provider are not changed at all.
Often, this is known at development time. If EclipseLink supported some kind of hint (e.g. READ_DETACHED) to return a detached entity (even within a transaction), a significant amount of work and memory for building and managing the backup clone could be saved. Initially, I thought QueryHints.READ_ONLY would be exactly what I need; however, according to the documentation, it can only be used with the shared cache and for non-transactional queries. Neither applies in our use case.


6. Avoid StackOverflowError for certain entity models

Imagine a single entity which represents some kind of linked list element. Each record/entity references the previous record to define the chain
(by using an EAGERly fetched ManyToOne reference).
The very first element has a null reference to the previous record.

Even though this kind of data structure is perfectly valid, EclipseLink has some issues with it: When the "chain" grows, EclipseLink will sooner or later throw a StackOverflowError. The reason for this is that eagerly fetched references are processed using a recursion-based approach. For each element, the stack grows, and we had cases where a 30-element chain already caused a StackOverflowError with default stack size settings in an enterprise-grade Linux environment. I'm not aware of any workaround (besides using a lazy previous reference or increasing the stack size) to avoid that issue.
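
The general remedy would be to replace the recursion with an explicit work list. Here is a minimal sketch in plain Java: ChainNode, ChainBuilder and the id -> previous-id map are hypothetical and stand in for the rows of such an entity. It says nothing about EclipseLink's actual object-building code, and it assumes the chain is acyclic.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

// Hypothetical linked-list style entity: each element points to its predecessor.
class ChainNode {
    final long id;
    final ChainNode previous; // null for the very first element

    ChainNode(long id, ChainNode previous) {
        this.id = id;
        this.previous = previous;
    }

    // Counts the elements with a loop, so it is safe for long chains.
    static int length(ChainNode node) {
        int n = 0;
        for (ChainNode c = node; c != null; c = c.previous) {
            n++;
        }
        return n;
    }
}

class ChainBuilder {
    // Recursive variant: one stack frame per element, so the depth of the
    // call stack grows with the length of the chain.
    static ChainNode buildRecursive(Map<Long, Long> rows, Long id) {
        if (id == null) {
            return null;
        }
        return new ChainNode(id, buildRecursive(rows, rows.get(id)));
    }

    // Iterative variant: the pending ids live on the heap, the call stack
    // stays flat regardless of the chain length.
    static ChainNode buildIterative(Map<Long, Long> rows, Long id) {
        Deque<Long> pending = new ArrayDeque<>();
        for (Long cur = id; cur != null; cur = rows.get(cur)) {
            pending.push(cur); // the first element of the chain ends up on top
        }
        ChainNode node = null;
        while (!pending.isEmpty()) {
            node = new ChainNode(pending.pop(), node);
        }
        return node;
    }
}
```

Both variants produce the same object chain; only the iterative one is independent of the thread's stack size.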


7. Exploit parallelism of CPU bound tasks

Some tasks within EclipseLink are quite CPU-intensive, e.g. change tracking calculations or creating backup clones in large transactions. Throughput could be significantly increased in certain scenarios if EclipseLink exploited the parallelism of such tasks by using multiple threads
(should be configurable).


8. Weaving: Eliminate need for a backup clone in certain scenarios

This is something more experimental:
For entities with a significant number of mappings, the backup clone adds significant CPU and memory overhead. In certain scenarios where no or only a few fields change and where weaving is used, alternative methods might perform much better here, e.g. by storing the original values in the original clone before changing them
(so that the original clone contains both database and changed values).
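
A minimal sketch of that idea, written by hand instead of woven: TrackedPerson and its methods are hypothetical and not related to any existing EclipseLink class. The entity remembers the database value of an attribute the first time a setter changes it, so unchanged entities carry no copy at all.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical entity as it might look after weaving its setters.
class TrackedPerson {
    private String name;
    private int age;
    // attribute name -> value as loaded from the database;
    // stays null until the first real change.
    private Map<String, Object> originalValues;

    TrackedPerson(String name, int age) {
        this.name = name;
        this.age = age;
    }

    void setName(String newName) {
        recordOriginal("name", this.name, newName);
        this.name = newName;
    }

    void setAge(int newAge) {
        recordOriginal("age", this.age, newAge);
        this.age = newAge;
    }

    // Keeps only the first (database) value per attribute.
    private void recordOriginal(String attribute, Object oldValue, Object newValue) {
        if (Objects.equals(oldValue, newValue)) {
            return; // no change, nothing to remember
        }
        if (originalValues == null) {
            originalValues = new HashMap<>();
        }
        // containsKey instead of putIfAbsent so that a null original is preserved.
        if (!originalValues.containsKey(attribute)) {
            originalValues.put(attribute, oldValue);
        }
    }

    // Original values of all attributes that currently differ from the database state.
    Map<String, Object> changedOriginals() {
        Map<String, Object> result = new HashMap<>();
        if (originalValues != null) {
            originalValues.forEach((attribute, original) -> {
                if (!Objects.equals(original, currentValue(attribute))) {
                    result.put(attribute, original);
                }
            });
        }
        return result;
    }

    private Object currentValue(String attribute) {
        switch (attribute) {
            case "name": return name;
            case "age": return age;
            default: throw new IllegalArgumentException(attribute);
        }
    }
}
```

The memory cost is proportional to the number of changed attributes rather than to the number of mappings, which is where the saving over a full backup clone would come from.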


9. Address bug reports that affect correctness

Correctness should be the most crucial feature for any ORM.
Please have a look at corresponding bug reports that affect correctness, e.g.:

349477 (42 votes)
391279 (35 votes)
371743 (16 votes)
247662 (15 votes)
416837 (12 votes)
467470 (12 votes)


10. Care about startup time

EclipseLink takes (relatively) long to start up with large persistence units and/or classpaths. Most of the time is spent in I/O operations, which could be avoided in many scenarios (maybe configurable).

In certain scenarios with a short-lived EntityManagerFactory (e.g. unit/automated testing), it does matter significantly whether EclipseLink needs 1 or 3 seconds to start up (please also take a look at bug 352845).


11. Address open bug reports

Currently, there are ~155 open, unresolved and unassigned critical/blocker bug reports for EclipseLink in Bugzilla.
The same is true for feature requests with a significant number of votes.
I'm sure the community would appreciate it if some of them finally got a response.


Finally, I'd like to thank all involved developers/companies for their great work related to EclipseLink!


Regards,
Patric




