[eclipselink-dev] Some ideas for EclipseLink 3.0.0
Hello,
I'm not sure if this is the correct place for this kind of mail - if 
not, please let me know where to post it instead.
We have been working with EclipseLink for a long time, and I'd like to 
share some thoughts from our work with it, along with
our proposals for the upcoming EclipseLink release 3.0.0.
I'm sorry that this got a bit long, but hopefully it will be of some 
help to you.
Most of the proposals are performance or correctness related, and I'm 
quite confident that a significant number of users would benefit from 
the following improvements:
1. Implicitly use (OUTER) JOIN FETCH for EAGERly loaded ManyToOne 
references
If an entity contains a ManyToOne reference with FetchType.EAGER 
(the default), or if the weaving required for lazy loading is disabled,
EclipseLink issues a separate (sub-)query for each of those references 
when retrieving the entity.
With complex hierarchies of such ManyToOne references, this may 
result in a dramatic performance drop, as fetching the result of
one simple query might cause many additional queries and round trips to 
the database.
IMHO, EclipseLink should implicitly use an (OUTER) JOIN FETCH for 
fetching EAGER references
(maybe as a configurable option of the Session/EntityManager).
This should affect all queries irrespective of the source of such a 
query (e.g. find(), transparent indirection containers, Query).
To avoid overly large or deep JOIN FETCH queries, there should be a limit 
(ideally configurable) on how many references (or levels)
are considered for implicit JOIN FETCH queries.
We had scenarios where the introduction of a JOIN FETCH increased 
single-threaded throughput by a factor of 5-10.
I would love to see that kind of "tuning" done automatically in 
EclipseLink.
(BTW, LAZY loaded references can only help here in cases where the 
ManyToOne reference is actually NOT accessed; otherwise the
performance impact matches that of eagerly fetched ManyToOne 
references.)
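To illustrate the round-trip arithmetic behind this proposal, here is a toy sketch in plain Java. It is not EclipseLink code: the class RoundTripModel, its fields and the order/customer data are made up for illustration, and it assumes that every ManyToOne target is distinct and not yet cached.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model that only counts simulated round trips; not EclipseLink code. */
final class RoundTripModel {
    // Hypothetical data: order id -> customer id (the ManyToOne), customer id -> name.
    final Map<Long, Long> orderToCustomer = new LinkedHashMap<>();
    final Map<Long, String> customerNames = new LinkedHashMap<>();
    int roundTrips;

    /** One query for the orders, then one more query per ManyToOne reference. */
    List<String> loadWithSeparateQueries() {
        roundTrips++; // SELECT ... FROM ORDERS
        List<String> rows = new ArrayList<>();
        for (Map.Entry<Long, Long> order : orderToCustomer.entrySet()) {
            roundTrips++; // SELECT ... FROM CUSTOMER WHERE ID = ?
            rows.add(order.getKey() + ":" + customerNames.get(order.getValue()));
        }
        return rows;
    }

    /** A single joined query returns orders and customers together. */
    List<String> loadWithJoinFetch() {
        roundTrips++; // SELECT ... FROM ORDERS LEFT OUTER JOIN CUSTOMER ...
        List<String> rows = new ArrayList<>();
        for (Map.Entry<Long, Long> order : orderToCustomer.entrySet()) {
            rows.add(order.getKey() + ":" + customerNames.get(order.getValue()));
        }
        return rows;
    }
}
```

With 100 orders, the first method counts 101 simulated round trips and the second counts 1, while both return the same rows.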
2. Enable a way to provide query hints to transparent indirection 
containers
Currently, I'm not aware of a way to provide query hints for transparent 
indirection containers like IndirectSet.
This prevents us from providing performance-related query hints such as 
join/batch fetches or FetchSize
(ideally with the option of specifying them globally, e.g. use a fetchSize 
of xyz for all indirect container loads).
3. Improve sequence caching
Using sequence caching is a good idea in many scenarios.
However, there's still room for improvement:
- Introduce a configuration option to share the sequence cache across 
transactions. Currently, it looks like each transaction has its own 
sequence cache.
This lowers the cache efficiency (and causes unnecessary database round 
trips) for transactions with only a few persists.
In most cases, especially where sequence values are retrieved
from a globally synchronized database sequence generator, there's no 
reason why the sequence cache should not be shared across 
transactions/client sessions.
- Expose the sequence cache through an official API so that non-JPA code 
can also benefit from EclipseLink sequence caching,
hence improving its efficiency even more.
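A minimal sketch of what a shared sequence cache could look like, in plain Java. The class SharedSequenceCache and its methods are hypothetical, not an existing EclipseLink API. The sketch assumes a database sequence whose increment equals the block size, so that one round trip reserves a whole block of values; the LongSupplier stands in for that round trip.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sequence cache shared by all transactions.
 * Assumes one call to fetchBlockStart reserves the values
 * [start, start + blockSize - 1] in the database.
 */
final class SharedSequenceCache {
    private final LongSupplier fetchBlockStart; // stands in for the database round trip
    private final int blockSize;
    private long next;      // next value to hand out
    private long remaining; // values left in the current block

    SharedSequenceCache(LongSupplier fetchBlockStart, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.fetchBlockStart = fetchBlockStart;
        this.blockSize = blockSize;
    }

    /** Thread-safe: callers from any transaction draw from the same block. */
    synchronized long nextValue() {
        if (remaining == 0) {
            next = fetchBlockStart.getAsLong(); // one simulated round trip per block
            remaining = blockSize;
        }
        remaining--;
        return next++;
    }
}
```

Because the cache outlives a single transaction, a transaction with only one or two persists no longer pays for a round trip of its own, and non-JPA code could call nextValue() as well.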
4. Make EntityManager.getReference() database roundtrip free
Currently, there is little difference in behavior between 
EntityManager.find() and EntityManager.getReference().
Both issue database calls, which the JPA spec does not require for the 
latter. This lowers the efficiency of getReference().
Imagine that you just want to link an entity (with a known PK) to a 
newly persisted one: getReference() would be perfect for that job
without the need to fetch anything from the database (e.g. by 
returning a proxy).
Other JPA implementations do better here (Hibernate, for example).
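The proxy idea can be sketched in plain Java as follows. Customer and CustomerReference are hypothetical names, and the LongFunction stands in for the database fetch; this only illustrates the lazy-state behavior the JPA spec permits for getReference(), not how any provider actually implements it.

```java
import java.util.NoSuchElementException;
import java.util.function.LongFunction;

/** Hypothetical entity interface that the reference object stands in for. */
interface Customer {
    long getId();
    String getName();
}

/** Knows only the primary key; loads the real state on first non-id access. */
final class CustomerReference implements Customer {
    private final long id;
    private final LongFunction<Customer> loader; // stands in for the database fetch
    private Customer target;

    CustomerReference(long id, LongFunction<Customer> loader) {
        this.id = id;
        this.loader = loader;
    }

    @Override
    public long getId() {
        return id; // answered from the known PK, no round trip
    }

    @Override
    public String getName() {
        return target().getName();
    }

    private Customer target() {
        if (target == null) {
            target = loader.apply(id);
            if (target == null) {
                // JPA would signal this with EntityNotFoundException on first state access.
                throw new NoSuchElementException("No Customer with id " + id);
            }
        }
        return target;
    }
}
```

Linking such a reference to a newly persisted entity only needs getId() for the foreign key, so in this sketch no fetch happens at all in that scenario.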
5. Add support for retrieving detached entities within transactions
Many entities retrieved by a JPA provider are not changed at all.
Often, this is known at development time. If EclipseLink supported 
some kind of hint (e.g. READ_DETACHED)
to return a detached entity (even within a transaction), a significant 
amount of work and memory for building and managing the backup clone 
could be saved.
Initially, I thought QueryHints.READ_ONLY would be exactly what I 
need; however, according to the documentation
it can only be used with the shared cache and only for 
non-transactional queries. Neither applies in our use case.
6. Avoid StackOverflowError for certain entity models
Imagine a single entity which represents some kind of linked list 
element. Each record/entity references the previous record to define 
the chain
(using an EAGERly fetched ManyToOne reference).
The very first element has a null reference to the previous record.
Even though this kind of data structure is perfectly valid, EclipseLink 
has some issues with it:
As the "chain" grows, EclipseLink will sooner or later throw a 
StackOverflowError.
The reason for this is that eagerly fetched references are processed 
using a recursion-based approach.
The stack grows with each element, and we had cases where a 
30-element chain already caused a StackOverflowError with default stack 
size
settings in an enterprise-grade Linux environment. I'm not aware of any 
workaround
(besides using a lazy previous reference or increasing the stack size) to 
avoid that issue.
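For illustration, here is a minimal sketch in plain Java of resolving such a chain with a loop instead of recursion, so the stack depth stays constant regardless of chain length. ChainRow, ChainEntity and ChainLoader are hypothetical names, the Map stands in for the table, and the sketch assumes the chain is acyclic. It says nothing about how EclipseLink's object builder is structured internally.

```java
import java.util.Map;

/** Hypothetical row of a self-referencing table; previousId is null for the first element. */
final class ChainRow {
    final long id;
    final Long previousId;

    ChainRow(long id, Long previousId) {
        this.id = id;
        this.previousId = previousId;
    }
}

/** Hypothetical entity built from a row. */
final class ChainEntity {
    final long id;
    ChainEntity previous;

    ChainEntity(long id) {
        this.id = id;
    }
}

final class ChainLoader {
    /** Builds the entity for startId plus all its predecessors without recursion. */
    static ChainEntity load(Map<Long, ChainRow> table, long startId) {
        ChainRow row = table.get(startId);
        if (row == null) {
            return null;
        }
        ChainEntity head = new ChainEntity(row.id);
        ChainEntity current = head;
        // Follow the "previous" reference in a loop; assumes no cycles.
        while (row.previousId != null) {
            ChainRow previousRow = table.get(row.previousId);
            if (previousRow == null) {
                break; // dangling reference: stop and leave previous as null
            }
            ChainEntity previous = new ChainEntity(previousRow.id);
            current.previous = previous;
            current = previous;
            row = previousRow;
        }
        return head;
    }
}
```

More general object graphs would need an explicit work queue rather than a single loop, but the principle is the same: pending work lives on the heap, not on the call stack.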
7. Exploit parallelism of CPU bound tasks
Some tasks within EclipseLink are quite CPU-intensive, e.g. change 
tracking calculations or creating backup clones in large transactions.
Throughput could be significantly increased in certain scenarios if 
EclipseLink exploited the parallelism of such tasks by using multiple 
threads
(this should be configurable).
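As a rough sketch of the idea in plain Java, the comparison of working copies against their backup copies can be spread over several threads with a parallel stream. ParallelChangeDetection is a hypothetical class, and modelling each object as an Object[] of attribute values is a simplifying assumption; EclipseLink's real change calculation is more involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical helper: each object is modelled as an array of attribute values. */
final class ParallelChangeDetection {
    /** Returns the indexes of working copies that differ from their backup copies. */
    static List<Integer> changedIndexes(List<Object[]> working,
                                        List<Object[]> backup,
                                        boolean parallel) {
        IntStream indexes = IntStream.range(0, working.size());
        if (parallel) {
            indexes = indexes.parallel(); // the configurable switch
        }
        return indexes
                .filter(i -> !Arrays.equals(working.get(i), backup.get(i)))
                .boxed()
                .collect(Collectors.toList()); // encounter order is preserved
    }
}
```

Each comparison is independent of the others, which is what makes this kind of task a candidate for parallel execution. Whether it pays off depends on the transaction size, so a configuration switch like the boolean above seems sensible.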
8. Weaving: Eliminate need for a backup clone in certain scenarios
This is something more experimental:
For entities with a significant number of mappings, the backup clone 
adds significant CPU and memory overhead.
In certain scenarios where no or only a few fields change and where 
weaving is used, alternative methods might perform much better here,
e.g. storing the original values in the original clone before 
changing them
(so that the original clone contains both the database values and the 
changed values).
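A minimal sketch of that idea in plain Java: the setters (as weaving might rewrite them) remember the database value the first time a field is changed, so no full backup clone is needed. TrackedPerson and its methods are hypothetical and only illustrate the principle.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical entity whose setters record the original value on first change. */
final class TrackedPerson {
    private String name;
    private int age;
    // Allocated lazily, so unchanged entities carry no extra state.
    private Map<String, Object> originals;

    TrackedPerson(String name, int age) {
        this.name = name;
        this.age = age;
    }

    String getName() { return name; }
    int getAge() { return age; }

    void setName(String newName) {
        if (!Objects.equals(name, newName)) {
            remember("name", name);
            name = newName;
        }
    }

    void setAge(int newAge) {
        if (age != newAge) {
            remember("age", age);
            age = newAge;
        }
    }

    private void remember(String field, Object databaseValue) {
        if (originals == null) {
            originals = new LinkedHashMap<>();
        }
        if (!originals.containsKey(field)) { // keep only the first (database) value
            originals.put(field, databaseValue);
        }
    }

    /** Database values of the fields changed so far; empty if nothing changed. */
    Map<String, Object> originalValues() {
        return originals == null
                ? Collections.<String, Object>emptyMap()
                : Collections.unmodifiableMap(originals);
    }
}
```

The memory cost here is proportional to the number of changed fields rather than to the number of mappings, which matches the scenario of entities with many mappings and few changes.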
9. Address bug reports that affect correctness
Correctness should be the most crucial feature for any ORM.
Please have a look at the bug reports that affect correctness, 
e.g.:
349477 (42 votes)
391279 (35 votes)
371743 (16 votes)
247662 (15 votes)
416837 (12 votes)
467470 (12 votes)
10. Care about startup time
EclipseLink takes (relatively) long to start up with large 
persistence units and/or classpaths.
Most of the time is spent on I/O operations, which can be avoided in 
many scenarios (maybe configurable).
In certain short-lived EntityManagerFactory scenarios (e.g. 
unit/automated testing), it matters significantly
whether EclipseLink needs 1 or 3 seconds to start up (please also take a 
look at bug 352845).
11. Address open bug reports
Currently, there are ~155 open, unresolved and unassigned 
critical/blocker bug reports for EclipseLink in Bugzilla.
The same is true for feature requests with a significant number of votes.
I'm sure the community would appreciate it if they finally got some 
response to some of them.
Finally, I'd like to thank all involved developers/companies for their 
great work related to EclipseLink!
Regards,
Patric