Replication/union partitioning and elasticity [message #967625] |
Thu, 01 November 2012 18:37  |
Eclipse User |
|
|
|
I am testing EclipseLink Partition as a solution for the so-called "big data" problem, where terabytes of data are distributed/partitioned over hundreds or thousands of partitions (DBs).
One important aspect I wanted to test is how EclipseLink Partition supports the elasticity of the platform.
I started with two partitions and created one entity with a replication-partitioning policy. EclipseLink, as expected, created two replicas of my entity.
Then I added a third partition: I created the same schema in a third DB, created a new data source in my app server, configured persistence.xml accordingly, and added the third partition to the list of pools in the replication-partitioning policy definition.
After starting the application server I retrieved my previously created object (I saw that only one DB was contacted, as expected), updated one of its properties, and called JPA persist.
What I noticed is that EclipseLink sent an UPDATE SQL command to all 3 DBs, not recognizing that the entity does not exist in one of them and that it has to send an INSERT SQL command there instead.
Is this a bug, or am I doing something wrong?
|
|
|
|
Re: Replication/union partitioning and elasticity [message #968721 is a reply to message #967625] |
Fri, 02 November 2012 13:53   |
Eclipse User |
|
|
|
Maybe I did not correctly understand the idea behind the recent movement in the JPA community manifested by the Hibernate Shard, EclipseLink Partition and OpenJPA Slice projects.
I understood this movement as a solution for "big data" which uses the existing RDBMS technologies.
For most enterprises, which have invested a lot of money in Oracle, DB2, etc., it is not acceptable to replace all those engines (and engineers) with new distributed and elastic databases such as Google Spanner/F1, Hadoop/Impala, etc.
These new database technologies support elasticity, that is, they dynamically apply all policies to a newly introduced node/partition. Some of them can even do dynamic resharding.
If my understanding is not too ambitious, then JPA providers should support some aspects of elasticity. It is not a trivial goal, but can we check how difficult it is?
From the application's perspective, EclipseLink JPA should ensure that the specified policy is always applied.
If entity X is partitioned by a replication-partitioning policy, that means entity X should exist in every partition/pool returned by the policy.
IMO, in this particular case EclipseLink should check the return value of the JDBC UPDATE operation, and if it is 0 (the UPDATE command has changed 0 records) it should do an SQL INSERT in that partition/pool. But, as you said, entity X can have a reference (an FK at the DB level) to entity Y, and inserting just object X can lead to an FK constraint violation.
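To make the proposed fallback concrete, here is a minimal sketch of "UPDATE, and INSERT if 0 rows changed". It is not EclipseLink code: `FakePartition` and `UpdateOrInsertSketch` are hypothetical names, and each partition is simulated by an in-memory map rather than a real JDBC connection.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for one partition holding a single table keyed by id. */
final class FakePartition {
    private final Map<Long, String> rows = new HashMap<>();

    /** Mimics an SQL UPDATE: returns the number of rows changed (0 or 1). */
    int update(long id, String value) {
        if (!rows.containsKey(id)) {
            return 0;
        }
        rows.put(id, value);
        return 1;
    }

    /** Mimics an SQL INSERT: fails on a duplicate primary key. */
    void insert(long id, String value) {
        if (rows.putIfAbsent(id, value) != null) {
            throw new IllegalStateException("duplicate key " + id);
        }
    }

    String get(long id) {
        return rows.get(id);
    }
}

final class UpdateOrInsertSketch {
    /**
     * Replicates a write to every partition. Where the UPDATE changes 0 rows,
     * an INSERT is issued instead. Returns the names of the partitions that
     * needed the INSERT.
     */
    static List<String> replicate(Map<String, FakePartition> partitions, long id, String value) {
        List<String> inserted = new ArrayList<>();
        for (Map.Entry<String, FakePartition> entry : partitions.entrySet()) {
            if (entry.getValue().update(id, value) == 0) {
                entry.getValue().insert(id, value);
                inserted.add(entry.getKey());
            }
        }
        return inserted;
    }
}
```

One caveat with this approach: an update count of 0 does not always mean "row missing". With optimistic locking, an UPDATE whose WHERE clause includes the version column also returns 0 when the version has changed, so a real implementation would have to tell those two cases apart.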
If we propose/enforce a data design where data consistency must be preserved in every partition and the consistency check is left to the partition's database, then we cannot have (virtual/fake) foreign keys across partitions. In other words, a tree of entity instances is always in the same partition. If we enforce/propose this design, we can derive some rules: a) a child of a partitioned entity must also be partitioned by the same policy; b) a parent of a replicated entity must also be replicated.
In the example above, that means if entity X is replicated then entity Y must also be replicated. So, if we have to insert entity X into a new partition, we also have to insert entity Y, and that has to be done before we insert entity X.
Since we have access to the metadata (O/R mappings), we can get all the necessary information about relationships and replication policies and, IMO, this should not be difficult to implement.
If we do not implement this feature within JPA, then application developers have to implement it in some cluster management application which populates all the necessary data in a newly added partition before that partition is made available.
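The "insert Y before X" ordering can be sketched as a depth-first walk over the foreign-key references. This is only an illustration under assumptions: `InsertOrderSketch` is a hypothetical name, and the `dependsOn` map stands in for whatever the O/R mapping metadata would provide.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class InsertOrderSketch {
    /**
     * dependsOn maps an entity name to the entities it references by foreign key.
     * Returns an order in which every referenced entity comes before the
     * entities that reference it.
     */
    static List<String> insertOrder(Map<String, List<String>> dependsOn) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String entity : dependsOn.keySet()) {
            visit(entity, dependsOn, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String entity, Map<String, List<String>> dependsOn,
                              Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(entity)) {
            return;
        }
        if (!inProgress.add(entity)) {
            // A cycle cannot be resolved by ordering alone.
            throw new IllegalStateException("cyclic foreign keys at " + entity);
        }
        for (String referenced : dependsOn.getOrDefault(entity, List.of())) {
            visit(referenced, dependsOn, done, inProgress, order);
        }
        inProgress.remove(entity);
        done.add(entity);
        order.add(entity);
    }
}
```

For the example above, `insertOrder(Map.of("X", List.of("Y")))` puts Y ahead of X, so Y would be inserted into the new partition first.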
|
|
|
|
Re: Replication/union partitioning and elasticity [message #972615 is a reply to message #967625] |
Mon, 05 November 2012 14:23   |
Eclipse User |
|
|
|
I tried to present a simplified version of the use case I am working on, and that is the reason why you understood the problem as an issue with the initial setup of a newly introduced partition (DB).
Let me explain my use case in full detail.
My application is a multitenant network management system where the tenants have devices managed by my application.
I have to support grouping of devices across tenants, where one device group can have many devices and one device can be a member of many groups. The attached class diagram represents the major entities and their relationships. The M:N relationship between DeviceGroup and Device is implemented by the GroupMember entity. The association between DeviceGroup and GroupMember is a composition, and in the ORM it has cascade=ALL.
Devices are partitioned by the tenantId property, and the partitioning policy is defined by a mapping table which explicitly maps each tenant to a partition; that is, this table has two columns: tenantId and partitionId.
Since the predefined partitioning policies do not support this kind of partitioning, I developed my own partitioning policy which is driven by this mapping table.
Since the GroupMember entity references the Device entity, it has to be partitioned by Device.tenantId too. I developed a partitioning policy which does that: it gets the corresponding instance of Device and then, based on the tenantId value and the mapping table, returns the appropriate list of Accessor.
Since DeviceGroup is the parent of a partitioned entity, it has to be replicated to all partitions in which its members exist. I developed my own replication policy which analyzes the members and returns the appropriate list of Accessor accordingly.
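Stripped of the EclipseLink specifics, the routing logic described above amounts to roughly the following. This is a simplified sketch, not my actual policy classes: `TenantPartitioningSketch` is a hypothetical name, the mapping table is a plain in-memory map, and partition names are returned where the real policies return a list of Accessor.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TenantPartitioningSketch {
    /** Stand-in for the mapping table (tenantId, partitionId). */
    private final Map<String, String> tenantToPartition;

    TenantPartitioningSketch(Map<String, String> tenantToPartition) {
        this.tenantToPartition = tenantToPartition;
    }

    /** Partition of a Device, and of any GroupMember that references that Device. */
    String partitionForTenant(String tenantId) {
        String partition = tenantToPartition.get(tenantId);
        if (partition == null) {
            throw new IllegalArgumentException("no partition mapped for tenant " + tenantId);
        }
        return partition;
    }

    /** Partitions a DeviceGroup must be replicated to: one per distinct member tenant. */
    Set<String> partitionsForGroup(List<String> memberTenantIds) {
        Set<String> partitions = new TreeSet<>();
        for (String tenantId : memberTenantIds) {
            partitions.add(partitionForTenant(tenantId));
        }
        return partitions;
    }
}
```

Note that the result of `partitionsForGroup` changes whenever a member from a new tenant is added, which is exactly the situation in the test described next.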
I have 3 tenants and 3 partitions, where T1 is located in P1, T2 in P2 and T3 in P3. Each tenant has many devices.
I started by creating a DeviceGroup which has one member device from partition P1 and one from partition P2. Everything went fine: the DeviceGroup instance was replicated to partitions P1 and P2, the appropriate instances of GroupMember were created in P1 and P2, and all referential integrity constraints were obeyed.
Then I ran the next test: I added a new device to the already created DeviceGroup, where that new member device is located in partition P3. When I called EntityManager's persist(deviceGroup), I got a referential integrity constraint violation in partition P3. EclipseLink tried to insert the GroupMember record in P3, but the corresponding DeviceGroup record did not exist there.
IMO, EclipseLink should recognize that the DeviceGroup record does not exist in partition P3 and should insert it. My replication policy for DeviceGroup returned all 3 accessors (for P1, P2 and P3), but EclipseLink generated an UPDATE SQL statement for all of them instead of generating an INSERT statement for partition P3.
[Updated on: Mon, 05 November 2012 14:24] by Moderator
|
|
|
|
|
|
|
Re: Replication/union partitioning and elasticity [message #986491 is a reply to message #986276] |
Tue, 20 November 2012 11:02  |
Eclipse User |
|
|
|
You cannot change the partitions for an existing object. I assume EclipseLink did replicate it to all 3, but it replicated the UPDATE, as it is an existing object.
If you wish to add partitions to existing objects, you will need to replicate the INSERT of the data to the new partition yourself. Otherwise, you could always replicate the object to all partitions if you want to be able to access it on other partitions in the future.
You could log an enhancement request to have EclipseLink automatically detect a failed update on one partition and insert the data instead.
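As a rough sketch of what "replicate the INSERT yourself" might involve on the application side, the code below splits the target partitions into those that need an INSERT and those that only need the UPDATE. It assumes the application can tell which partitions already hold the object (for example from its members before the change). `ReplicaWritePlan` is a hypothetical helper, not an EclipseLink API.

```java
import java.util.Set;
import java.util.TreeSet;

final class ReplicaWritePlan {
    /** Partitions where the object is missing and must be inserted by the application. */
    final Set<String> insertInto;
    /** Partitions that already hold the object, where the replicated UPDATE is correct. */
    final Set<String> update;

    private ReplicaWritePlan(Set<String> insertInto, Set<String> update) {
        this.insertInto = insertInto;
        this.update = update;
    }

    /**
     * alreadyStoredIn: partitions known to hold the object before the change.
     * requiredIn: partitions the replication policy now returns.
     */
    static ReplicaWritePlan plan(Set<String> alreadyStoredIn, Set<String> requiredIn) {
        Set<String> insertInto = new TreeSet<>(requiredIn);
        insertInto.removeAll(alreadyStoredIn);
        Set<String> update = new TreeSet<>(requiredIn);
        update.retainAll(alreadyStoredIn);
        return new ReplicaWritePlan(insertInto, update);
    }
}
```

For the DeviceGroup case in this thread, `plan(Set.of("P1", "P2"), Set.of("P1", "P2", "P3"))` yields P3 as the only partition to insert into; the application would write the DeviceGroup row there before persisting the new GroupMember.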
|
|
|