Hi all,
I believe we are hitting a case of cache corruption. We can reproduce it in a test case of our application with ~70% reliability.
How the issue manifests itself: within a transaction, a flush of the EntityManager triggers a foreign key violation on another entity that was added (and successfully updated) within the same transaction.
What we did to trigger the issue: we started passing IDs to the asynchronous workers (letting them fetch the entity themselves) instead of passing them the entity directly. This puts a lot more work onto the entity manager, causing it to corrupt its cache (I believe). When we revert that modification, the issue does not happen, which leads us to believe that using EclipseLink asynchronously more than before is what triggers the issue.
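To make the change concrete, here is a minimal sketch of the two hand-off styles. It is not our application code: `HandoffSketch`, `Party`, `STORE`, `processEntity` and `processById` are hypothetical names, and an in-memory map stands in for the EntityManager lookup so that the example runs without a JPA provider.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-ins; in the real application the lookup goes through the EntityManager.
class HandoffSketch {

    // Simplified entity holding only the fields mentioned in this post.
    static final class Party {
        final long id;
        final Long oldPartyId; // reference to an earlier party, may be null

        Party(long id, Long oldPartyId) {
            this.id = id;
            this.oldPartyId = oldPartyId;
        }
    }

    // In-memory stand-in for the lookup a worker would do (assumption, not EclipseLink API).
    static final Map<Long, Party> STORE = new ConcurrentHashMap<>();

    // Before: the caller hands the already-loaded entity to the worker.
    static CompletableFuture<Long> processEntity(ExecutorService pool, Party party) {
        return CompletableFuture.supplyAsync(() -> party.id, pool);
    }

    // After: the caller hands over only the id; the worker fetches the entity itself.
    static CompletableFuture<Long> processById(ExecutorService pool, long id) {
        return CompletableFuture.supplyAsync(() -> {
            Party party = STORE.get(id); // extra lookup on the worker thread
            if (party == null) {
                throw new IllegalStateException("party " + id + " not found");
            }
            return party.id;
        }, pool);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Party first = new Party(640L, null);
            STORE.put(first.id, first);
            System.out.println(processEntity(pool, first).join());
            System.out.println(processById(pool, 640L).join());
        } finally {
            pool.shutdown();
        }
    }
}
```

The only difference between the two variants is where the lookup happens: in the second one, every worker performs its own fetch on its own thread, which is the additional load described above.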
Enclosed is an example of the logs generated while the issue occurs. The log is trimmed to the period in which the erroneous unit of work (1298690816) is active. Some other units of work are visible; those are the asynchronous workers mentioned above. Let me go through it briefly:
An entity of type slaughter_party is first inserted (id: 640). That entity is then queried and updated a few times. Later, another entity of the same type is created with a (supposed) reference to the first one (old_party_id). The flush of that second entity claims that the first one does not exist.
Sometimes the same issue (a foreign key violation) happens with different objects, but it is always related to a missing entity of type slaughter_party.
There is another cache error happening in this log (probably related): an entity of type join_partner_business (partner 635, business 607) is inserted with id 639 after checking that it doesn’t exist (select count …). The same procedure runs again a few moments later and decides to insert a new join_partner_business (id 647) for the same relation. In runs where everything works, that second join entity is (rightfully) never inserted.
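The check-then-insert procedure is roughly the following. This is a simplified sketch, not our real code: `JoinCheckSketch`, `Join`, `count` and `ensureJoin` are hypothetical names, and an in-memory list stands in for the join_partner_business table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the join_partner_business table.
class JoinCheckSketch {

    static final class Join {
        final long id;
        final long partnerId;
        final long businessId;

        Join(long id, long partnerId, long businessId) {
            this.id = id;
            this.partnerId = partnerId;
            this.businessId = businessId;
        }
    }

    private final List<Join> table = new ArrayList<>();
    private long nextId;

    JoinCheckSketch(long firstId) {
        this.nextId = firstId;
    }

    // Equivalent of the "select count" check for one (partner, business) pair.
    synchronized long count(long partnerId, long businessId) {
        return table.stream()
                .filter(j -> j.partnerId == partnerId && j.businessId == businessId)
                .count();
    }

    // Inserts the relation only if it does not exist yet; returns true if a row was inserted.
    synchronized boolean ensureJoin(long partnerId, long businessId) {
        if (count(partnerId, businessId) > 0) {
            return false; // relation already present, nothing to do
        }
        table.add(new Join(nextId++, partnerId, businessId));
        return true;
    }

    synchronized int size() {
        return table.size();
    }

    public static void main(String[] args) {
        JoinCheckSketch db = new JoinCheckSketch(639L);
        System.out.println(db.ensureJoin(635L, 607L)); // first call inserts
        System.out.println(db.ensureJoin(635L, 607L)); // second call finds the row and skips
    }
}
```

In the sketch, the second call is a no-op because its check sees the row written by the first call. In the failing runs, the second check apparently does not see the first insert, so a duplicate row is written.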
I tried various DB backends (HSQL in-memory, H2 in-memory and H2 file), as those are quite easy to set up; they all trigger the same issue.
I hope you can shed some light on our issue and point me to my mistake, as I don’t really want to believe that multi-threaded EclipseLink is fundamentally broken.