Eclipse Community Forums: EclipseLink » Replication/union partitioning and elasticity

Replication/union partitioning and elasticity [message #967625]

Thu, 01 November 2012 22:37

Miroslav Kandic

Messages: 7
Registered: October 2012

Junior Member

I am testing EclipseLink partitioning as a solution for the so-called "big data" problem, where terabytes of data are distributed/partitioned over hundreds or thousands of partitions (DBs).
One important aspect I wanted to test is how EclipseLink partitioning supports the elasticity of the platform.

I started with two partitions and created one entity that has a replication-partitioning policy. As expected, EclipseLink created two replicas of my entity.
Then I added a third partition: I created the same schema in a third DB, created a new data source in my app server, configured persistence.xml accordingly, and added the third partition to the list of pools in the replication-partitioning policy definition.
After starting the application server, I retrieved my previously created object (only one DB was contacted, as expected), updated one of its properties, and called JPA persist.
What I noticed is that EclipseLink sent an UPDATE SQL command to all 3 DBs, not recognizing that the entity does not exist in one of them and that it has to send an INSERT SQL command there instead.

Is this a bug, or am I doing something wrong?

Re: Replication/union partitioning and elasticity [message #968477 is a reply to message #967625]

Fri, 02 November 2012 13:41

Chris Delahunt

Messages: 1389
Registered: July 2009

Senior Member

I'm not sure how you would want this to be handled other than by throwing an exception. If for some reason data doesn't exist in one of the databases, it would be wrong for EclipseLink to reinsert it when this transaction expected it to be there. There is no telling which database's view is correct, and you could be forcing in stale data.
If you are using replication, then the new partition needs to have a copy of the data from the others. EclipseLink will replicate the changes it makes, but it cannot resync the databases for you if there are differences from outside sources.
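The behaviour argued for here, treating a missing row as an error rather than as a cue to insert, can be sketched as a check over the row counts each replica's UPDATE returned. This is a minimal sketch under stated assumptions: the caller is assumed to have collected the JDBC update count (the `int` returned by `Statement.executeUpdate`) for each connection pool, and `ReplicatedUpdateCheck` and `ReplicaDivergenceException` are hypothetical names, not EclipseLink API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical exception type; not part of EclipseLink. */
class ReplicaDivergenceException extends RuntimeException {
    ReplicaDivergenceException(String message) {
        super(message);
    }
}

/** Hypothetical helper; not part of EclipseLink. */
class ReplicatedUpdateCheck {
    /**
     * updateCounts maps each connection-pool name to the row count its UPDATE
     * returned. Every replica is expected to have changed exactly one row;
     * anything else is reported, not repaired.
     */
    static void verify(Map<String, Integer> updateCounts) {
        List<String> outOfSync = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : updateCounts.entrySet()) {
            if (entry.getValue() != 1) {
                outOfSync.add(entry.getKey());
            }
        }
        if (!outOfSync.isEmpty()) {
            // No attempt to decide which replica is right: fail the operation instead.
            throw new ReplicaDivergenceException("Replicas out of sync: " + outOfSync);
        }
    }
}
```

With counts `{node1=1, node2=1, node3=0}` the check throws and names `node3`, leaving it to the application (or an administrator) to decide which copy is correct.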

Re: Replication/union partitioning and elasticity [message #968721 is a reply to message #967625]

Fri, 02 November 2012 17:53

Miroslav Kandic

Messages: 7
Registered: October 2012

Junior Member

Maybe I misunderstood the idea behind the recent movement in the JPA community manifested by the Hibernate Shards, EclipseLink partitioning and OpenJPA Slice projects.
I understood this movement as a solution for "big data" that uses the existing RDBMS technologies.
For most enterprises, which have invested a lot of money in Oracle, DB2, etc., it is not acceptable to replace all those engines (and engineers) with new distributed and elastic databases such as Google Spanner/F1 and Hadoop/Impala.
These new database technologies support elasticity, that is, they dynamically apply all policies to a newly introduced node/partition. Some of them can even do dynamic resharding.

If my understanding is not too ambitious, then JPA providers should support some aspects of elasticity. It is not a trivial goal, but can we check how difficult it would be?

From the application's perspective, EclipseLink JPA should ensure that the specified policy is always applied.
If entity X is partitioned by a replication-partitioning policy, that means entity X should exist in every partition/pool returned by the policy.
IMO, in this particular case EclipseLink should check the return value of the JDBC UPDATE operation, and if it is 0 (the UPDATE command changed 0 records) it should do an SQL INSERT in that partition/pool. But, as you said, entity X can have a reference (an FK at the DB level) to entity Y, and inserting just object X can lead to an FK constraint violation.
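The update-then-insert fallback proposed here can be sketched in a few lines. This is a minimal in-memory illustration, not how EclipseLink behaves (it sends the UPDATE to every replica, as described in the first post); `PartitionStore`, `InMemoryStore` and `UpdateOrInsertReplicator` are hypothetical names, and `update` mirrors the JDBC convention of returning the number of rows changed.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for one partition's table; not an EclipseLink type. */
interface PartitionStore {
    /** Returns the number of rows changed, like JDBC's executeUpdate. */
    int update(long id, String value);

    void insert(long id, String value);
}

/** In-memory PartitionStore used only to illustrate the idea. */
class InMemoryStore implements PartitionStore {
    final Map<Long, String> rows = new HashMap<>();

    public int update(long id, String value) {
        if (!rows.containsKey(id)) {
            return 0; // UPDATE ... WHERE ID = ? matched nothing
        }
        rows.put(id, value);
        return 1;
    }

    public void insert(long id, String value) {
        if (rows.putIfAbsent(id, value) != null) {
            throw new IllegalStateException("Duplicate key " + id);
        }
    }
}

/** Hypothetical replicator implementing the proposed fallback. */
class UpdateOrInsertReplicator {
    /**
     * Sends the UPDATE to every store returned by the replication policy and,
     * where it changed 0 rows, issues an INSERT instead.
     * Returns the statement used per store, in iteration order.
     */
    static Map<String, String> write(Map<String, PartitionStore> stores, long id, String value) {
        Map<String, String> actions = new LinkedHashMap<>();
        for (Map.Entry<String, PartitionStore> entry : stores.entrySet()) {
            int changed = entry.getValue().update(id, value);
            if (changed == 0) {
                entry.getValue().insert(id, value);
                actions.put(entry.getKey(), "INSERT");
            } else {
                actions.put(entry.getKey(), "UPDATE");
            }
        }
        return actions;
    }
}
```

The fallback INSERT writes only this one row, so it runs into exactly the foreign-key problem mentioned above whenever the row references a parent that is also missing from that partition.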

If we propose/enforce a data design where data consistency must be preserved within every partition and consistency checking is left to the partition's RDBMS, then we cannot have (virtual/fake) foreign keys across partitions. In other words, a tree of entity instances is always in the same partition. If we enforce/propose this design, we can derive some rules: a) a child of a partitioned entity must also be partitioned by the same policy, and b) the parent of a replicated entity must also be replicated.
In the example above, that means that if entity X is replicated, then entity Y must also be replicated. So, if we have to insert entity X into a new partition, we also have to insert entity Y, and that has to be done before we insert entity X.
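The "insert Y before X" rule amounts to ordering entity types so that referenced (parent) types come before the types that reference them, i.e. a topological sort of the foreign-key graph. A minimal sketch using Kahn's algorithm, assuming the dependency map has already been extracted from the O/R mapping metadata; `InsertOrder` is a hypothetical helper, not an EclipseLink class.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; the dependency map would come from the mapping metadata. */
class InsertOrder {
    /**
     * references maps an entity type to the types its foreign keys point at.
     * Returns the types ordered so that each referenced (parent) type comes
     * before every type that references it (Kahn's topological sort).
     */
    static List<String> parentsFirst(Map<String, List<String>> references) {
        Map<String, Integer> pendingParents = new HashMap<>();       // parents not yet emitted
        Map<String, List<String>> referencedBy = new HashMap<>();    // parent -> referencing types
        for (Map.Entry<String, List<String>> entry : references.entrySet()) {
            pendingParents.merge(entry.getKey(), entry.getValue().size(), Integer::sum);
            for (String parent : entry.getValue()) {
                pendingParents.putIfAbsent(parent, 0);
                referencedBy.computeIfAbsent(parent, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : pendingParents.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String type = ready.poll();
            order.add(type);
            for (String child : referencedBy.getOrDefault(type, Collections.emptyList())) {
                if (pendingParents.merge(child, -1, Integer::sum) == 0) {
                    ready.add(child);
                }
            }
        }
        if (order.size() != pendingParents.size()) {
            throw new IllegalStateException("Cyclic foreign keys: no parent-first order exists");
        }
        return order;
    }
}
```

For `{X=[Y]}` it returns `[Y, X]`. Cyclic foreign keys have no parent-first order, so the sketch throws rather than guessing.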

Since we have access to the metadata (O/R mappings), we can get all the necessary information about relationships and replication policies, and, IMO, this should not be difficult to implement.

If we do not implement this feature within JPA, then application developers have to implement it in some cluster management application that populates all the necessary data in a newly added partition before that partition is made available.

Re: Replication/union partitioning and elasticity [message #968992 is a reply to message #968721]

Fri, 02 November 2012 22:50

Chris Delahunt

Messages: 1389
Registered: July 2009

Senior Member

I am not sure my point was clear. You should file an enhancement request for the functionality if you feel it will help. I was trying to point out that this is not the way it was implemented, and that the functionality you are asking for might be dangerous in production and detrimental to performance.

What you are proposing assumes that the partitions are not in sync. I'm not convinced anyone would want the JPA provider deciding which one is correct and then fixing the others in transactions meant for application operations. Suppose DB X stops for some reason and doesn't get notified that entity Y was removed in a transaction. If it is allowed to come back up without replicating the missing transactions, this feature could push that stale data into the other databases without warning, something that should set off alarm bells. There is no way to know which DB is right and which might be incorrect, and if this feature pushes the bad data everywhere, fixing it might require rolling back every transaction to the point the offending server was brought back up.

If this is something the application needs, then the application needs to decide what should be done when a discrepancy is detected, in a more sophisticated way than a policy file. Just knowing that an update failed should not give the provider a reason to try inserting the data. The expense of constantly checking that each query returns exactly the same results on each DB, and keeping them the same, isn't going to be just a "check whether rows were modified, then insert" operation. Compare that with the cost of replicating the data upfront: a one-time cost that ensures all databases are always in sync, so that any of them can be relied on if something happens, versus paying a price on every application operation and still not being sure that any individual DB is really the true representation of the data.

Re: Replication/union partitioning and elasticity [message #972615 is a reply to message #967625]

Mon, 05 November 2012 19:23

Miroslav Kandic

Messages: 7
Registered: October 2012

Junior Member

I tried to present a simplified version of the use case I am working on, and that is why you understood the problem as an issue of the initial setup of a newly introduced partition (DB).
Let me explain my use case in full detail.

My application is a multitenant network management system in which each tenant has devices managed by the application.
I have to support grouping of devices across tenants, where one device group can have many devices and one device can be a member of many groups. The attached class diagram shows the major entities and their relationships. The M:N relationship between DeviceGroup and Device is implemented by the GroupMember entity. The association between DeviceGroup and GroupMember is a composition, and in the ORM mapping it has cascade=ALL.

Devices are partitioned by the tenantId property, and the partitioning policy is defined by a mapping table that explicitly maps each tenant to a partition; that is, this table has two columns: tenantId and partitionId.
Since the predefined partitioning policies do not support this kind of partitioning, I developed my own partitioning policy driven by this mapping table.

Since the GroupMember entity references the Device entity, it has to be partitioned by Device.tenantId too. I developed a partitioning policy that does that: it gets the corresponding Device instance and then, based on its tenantId value and the mapping table, returns the appropriate list of Accessors.
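The routing described above can be sketched independently of EclipseLink's API. This assumes the mapping table has been loaded into a `Map` from tenantId to connection-pool name; `Device`, `GroupMember` and `TenantMappingPolicy` are simplified hypothetical classes. A real policy would presumably extend EclipseLink's `PartitioningPolicy` and return `Accessor` objects, which this sketch does not attempt.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified, hypothetical stand-in for the Device entity. */
class Device {
    final long id;
    final String tenantId;

    Device(long id, String tenantId) {
        this.id = id;
        this.tenantId = tenantId;
    }
}

/** Simplified, hypothetical stand-in for the GroupMember entity. */
class GroupMember {
    final long id;
    final Device device;

    GroupMember(long id, Device device) {
        this.id = id;
        this.device = device;
    }
}

/** Hypothetical routing logic driven by the tenant-to-partition mapping table. */
class TenantMappingPolicy {
    // Contents of the mapping table: tenantId -> partition (connection-pool) name.
    private final Map<String, String> tenantToPartition;

    TenantMappingPolicy(Map<String, String> tenantToPartition) {
        this.tenantToPartition = new HashMap<>(tenantToPartition);
    }

    /** A Device goes to the partition mapped to its tenant. */
    String partitionFor(Device device) {
        String partition = tenantToPartition.get(device.tenantId);
        if (partition == null) {
            throw new IllegalArgumentException("No partition mapped for tenant " + device.tenantId);
        }
        return partition;
    }

    /** A GroupMember follows the Device it references. */
    String partitionFor(GroupMember member) {
        return partitionFor(member.device);
    }
}
```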

Since DeviceGroup is the parent of a partitioned entity, it has to be replicated to all partitions in which its members exist. I developed my own replication policy that analyzes the members and returns the appropriate list of Accessors accordingly.
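The DeviceGroup rule, replicate to the union of the members' partitions, can be sketched the same way. Again the classes are simplified and hypothetical (members are held directly, without the GroupMember entity), and the mapping table is assumed to be available as a `Map`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Simplified, hypothetical stand-in for the Device entity. */
class Device {
    final String tenantId;

    Device(String tenantId) {
        this.tenantId = tenantId;
    }
}

/** Simplified DeviceGroup: members are held directly, GroupMember is elided. */
class DeviceGroup {
    final List<Device> members = new ArrayList<>();
}

/** Hypothetical "replicate to wherever the members are" policy. */
class MemberReplicationPolicy {
    private final Map<String, String> tenantToPartition;

    MemberReplicationPolicy(Map<String, String> tenantToPartition) {
        this.tenantToPartition = new HashMap<>(tenantToPartition);
    }

    /** Union of the partitions of all current members, in sorted order. */
    SortedSet<String> partitionsFor(DeviceGroup group) {
        SortedSet<String> partitions = new TreeSet<>();
        for (Device member : group.members) {
            String partition = tenantToPartition.get(member.tenantId);
            if (partition == null) {
                throw new IllegalArgumentException("No partition mapped for tenant " + member.tenantId);
            }
            partitions.add(partition);
        }
        return partitions;
    }
}
```

Because the result is computed from the current members, adding a member from T3 changes the answer for an existing group from `[P1, P2]` to `[P1, P2, P3]`, which is the situation the rest of the thread is about.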

I have 3 tenants and 3 partitions where T1 is located in P1, T2 in P2 and T3 in P3. Each tenant has many devices.

I started by creating a DeviceGroup with one member device from partition P1 and one from partition P2. Everything went fine: the DeviceGroup instance was replicated to partitions P1 and P2, the appropriate GroupMember instances were created in P1 and P2, and all referential integrity constraints were obeyed.

Then I ran the next test: I added a new device, located in partition P3, to the already created DeviceGroup. When I called EntityManager's persist(deviceGroup), I got a referential integrity constraint violation in partition P3. EclipseLink tried to insert the GroupMember record in P3, but the corresponding DeviceGroup record did not exist there.

IMO, EclipseLink should recognize that the DeviceGroup record does not exist in partition P3 and insert it. My replication policy for DeviceGroup returned all 3 accessors (for P1, P2 and P3), but EclipseLink generated the UPDATE SQL statement for all of them instead of generating an INSERT statement for partition P3.

Attachment: PartitioningTest_Main.jpeg
(Size: 35.42KB, Downloaded 279 times)

[Updated on: Mon, 05 November 2012 19:24]

Re: Replication/union partitioning and elasticity [message #973654 is a reply to message #972615]

Tue, 06 November 2012 14:06

James Sutherland

Messages: 1939
Registered: July 2009
Location: Ottawa, Canada

Senior Member

If you add a new database node, you are responsible for cloning any data that you want replicated from your other databases. Your database may provide tools to help you do this.

If you are using an Oracle RAC, then you can add and remove nodes without requiring anything special (and usage of replication would not be required).

EclipseLink cannot replicate an entire database at runtime, and you would not want to do this at runtime either.

James : Wiki : Book : Blog : Twitter

Re: Replication/union partitioning and elasticity [message #974104 is a reply to message #967625]

Tue, 06 November 2012 21:38

Miroslav Kandic

Messages: 7
Registered: October 2012

Junior Member

Please put aside the use case described in my first post; it is a special case of the use case described in my previous post.

The use case from my previous post is not about adding a new database node; there are three DB nodes all the time.
Again, I want to replicate DeviceGroup entities only to the nodes where their members exist, not to all nodes.
I created a new DeviceGroup instance with two members, one located in partition 1 and the second in partition 2. That worked fine: my DeviceGroup instance was replicated to both partitions, 1 and 2.
But when I added a third member, which is in partition 3, I expected EclipseLink to replicate the DeviceGroup instance to partition 3 too.

There is a workaround: replicate each entity that has a replication policy to all nodes, instead of only to the nodes where the members of its collection exist, but that is not what I wanted to do. Adopting this strategy, I can also address the first use case, when a new DB partition is added; in that case I have to develop a DB setup utility that replicates all tables annotated with a replication policy.
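A setup utility of the kind mentioned here could copy each replicated table from an existing partition into the new one, parents first. A minimal in-memory sketch, assuming the list of replicated tables is already in parent-first order (for example, derived from the mapping metadata); `NewPartitionSeeder` and its table/row model are hypothetical. In practice this would run against the databases themselves, or be replaced by the database's own copy tools, before the new partition is made available.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical setup utility; a partition is modelled as table -> (primary key -> row). */
class NewPartitionSeeder {
    /**
     * Copies every row of each replicated table from an existing partition into
     * the new one. tablesParentFirst must list referenced tables before the
     * tables that reference them, so FK constraints hold during the copy.
     * Rows already present in the target are left untouched.
     * Returns the number of rows inserted.
     */
    static int seed(List<String> tablesParentFirst,
                    Map<String, Map<Long, String>> source,
                    Map<String, Map<Long, String>> target) {
        int inserted = 0;
        for (String table : tablesParentFirst) {
            Map<Long, String> sourceRows = source.getOrDefault(table, Map.of());
            Map<Long, String> targetRows = target.computeIfAbsent(table, k -> new LinkedHashMap<>());
            for (Map.Entry<Long, String> row : sourceRows.entrySet()) {
                if (targetRows.putIfAbsent(row.getKey(), row.getValue()) == null) {
                    inserted++;
                }
            }
        }
        return inserted;
    }
}
```

Because existing rows are skipped, running the seeder a second time against the same target inserts nothing.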

Re: Replication/union partitioning and elasticity [message #982943 is a reply to message #974104]

Tue, 13 November 2012 14:58

James Sutherland

Messages: 1939
Registered: July 2009
Location: Ottawa, Canada

Senior Member

How are you replicating your DeviceGroup to only partitions 1 and 2? What partitioning policy are you using?
What is the object that has a relationship to the DeviceGroup, how is it partitioned, and how is its relationship partitioned?

In general, I'm not sure dynamically changing where objects are replicated is advisable. You should always replicate a specific object to the same nodes. If another object has a relationship to it, the relationship or queries should be partitioned to use the correct node.

James : Wiki : Book : Blog : Twitter

Re: Replication/union partitioning and elasticity [message #986276 is a reply to message #967625]

Mon, 19 November 2012 17:07

Miroslav Kandic

Messages: 7
Registered: October 2012

Junior Member

I am replicating DeviceGroup to each partition in which the DeviceGroup has members (Device), in order to satisfy local referential integrity constraints. I am using my own policy, which analyzes the members (Device) and returns the set of partitions in which members exist.
Initially I had two members, one from partition 1 and one from partition 2, and my policy returned these two partitions. That works fine: DeviceGroup is replicated to these two partitions. Then I updated the collection of members, adding a third member which is in partition 3. My policy now returned 3 partitions (1, 2 and 3), but EclipseLink did not replicate DeviceGroup to partition 3.

Re: Replication/union partitioning and elasticity [message #986491 is a reply to message #986276]

Tue, 20 November 2012 16:02

James Sutherland

Messages: 1939
Registered: July 2009
Location: Ottawa, Canada

Senior Member

You cannot change the partitions for an existing object. I assume EclipseLink did replicate it to all 3, but it replicated the UPDATE as it is an existing object.

If you wish to add partitions to existing objects you will need to replicate the INSERT of the data to the new partition yourself. Otherwise you could always replicate the object to all partitions if you want to be able to access on other partitions in the future.
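Replicating the INSERT yourself could look roughly like this: before the new GroupMember is persisted, compare the partitions the replication policy now returns with the partitions that already hold the DeviceGroup row, and insert the row where it is missing. This is an in-memory sketch with hypothetical names (`ManualReplication`, `insertWhereMissing`); in a real application each `Set<Long>` would be a lookup against that partition, and the insert a real SQL INSERT issued before the GroupMember insert so that the foreign key is satisfied.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper; not part of EclipseLink. */
class ManualReplication {
    /**
     * deviceGroupRows: partition name -> DeviceGroup ids present in that
     * partition's table (an in-memory stand-in for querying each partition).
     * Inserts the group id wherever the policy requires it but it is missing,
     * and returns the partitions that received the INSERT.
     */
    static SortedSet<String> insertWhereMissing(long groupId, Set<String> requiredPartitions,
                                                Map<String, Set<Long>> deviceGroupRows) {
        SortedSet<String> inserted = new TreeSet<>();
        for (String partition : requiredPartitions) {
            Set<Long> rows = deviceGroupRows.computeIfAbsent(partition, k -> new HashSet<>());
            if (rows.add(groupId)) { // add() returns false if the row was already there
                inserted.add(partition);
            }
        }
        return inserted;
    }
}
```

For a group already stored in P1 and P2 and a policy result of P1, P2 and P3, only P3 receives the insert; EclipseLink's replicated UPDATE then finds the row on all three partitions.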

You could log an enhancement request to have EclipseLink automatically detect a failed update on one partition and insert the data there instead.

James : Wiki : Book : Blog : Twitter
