Re: Replication/union partitioning and elasticity [message #968721 is a reply to message #967625] |
Fri, 02 November 2012 17:53 |
Miroslav Kandic Messages: 7 Registered: October 2012 |
Junior Member |
|
|
Maybe I have misunderstood the idea behind the recent movement in the JPA community, manifested by the Hibernate Shards, EclipseLink Partitioning and OpenJPA Slice projects.
I understood this movement as a solution for "big data" that uses existing RDBMS technologies.
For most enterprises, which have invested a lot of money in Oracle, DB2, etc., it is not acceptable to replace all those engines (and engineers) with new distributed and elastic databases such as Google Spanner/F1, Hadoop/Impala, etc.
These new database technologies support elasticity, that is, they dynamically apply all policies to a newly introduced node/partition. Some of them can even do dynamic resharding.
If my understanding is not too ambitious, then JPA providers should support some aspects of elasticity. It is not a trivial goal, but can we check how difficult it is?
From the application's perspective, EclipseLink JPA should ensure that the specified policy is always applied.
If entity X is partitioned by a replication partitioning policy, that means entity X should exist in every partition/pool returned by the policy.
IMO, in this particular case EclipseLink should check the return value of the JDBC UPDATE operation and, if it is 0 (the UPDATE command changed 0 records), it should do an SQL INSERT in that partition/pool. But, as you said, entity X can have a reference (an FK at the DB level) to entity Y, and inserting just object X can lead to an FK constraint violation.
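To make the proposal concrete, here is a minimal sketch of the "UPDATE first, INSERT if nothing was updated" idea. It is not EclipseLink code: `PartitionTable` and `ReplicatedWriter` are hypothetical names, and each partition is simulated by an in-memory map so the example runs on its own. It also ignores the FK problem mentioned above, and in a real setup an update count of 0 may have other causes (for example an optimistic-lock conflict), so treat it only as an illustration of the control flow.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one table in one partition: primary key -> row value.
final class PartitionTable {
    final String name;
    private final Map<Long, String> rows = new HashMap<>();

    PartitionTable(String name) {
        this.name = name;
    }

    // Mimics the update count of a JDBC UPDATE: number of rows changed.
    int update(long id, String value) {
        if (!rows.containsKey(id)) {
            return 0;
        }
        rows.put(id, value);
        return 1;
    }

    // Mimics an INSERT; a real database would report a duplicate-key error here.
    void insert(long id, String value) {
        if (rows.putIfAbsent(id, value) != null) {
            throw new IllegalStateException("duplicate key " + id + " in " + name);
        }
    }

    String get(long id) {
        return rows.get(id);
    }
}

final class ReplicatedWriter {
    // Writes the row to every target partition, inserting wherever the
    // UPDATE matched no row. Returns how many partitions needed an INSERT.
    static int writeToAll(List<PartitionTable> targets, long id, String value) {
        int inserted = 0;
        for (PartitionTable partition : targets) {
            if (partition.update(id, value) == 0) {
                partition.insert(id, value);
                inserted++;
            }
        }
        return inserted;
    }
}
```

With P1 already holding the row and P2 empty, `writeToAll` updates P1, inserts into P2, and reports one insert.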
If we propose/enforce a data design where data consistency must be preserved in every partition and the consistency check is delegated to the partition's RDB, then we cannot have (virtual/fake) foreign keys across partitions. In other words, a tree of entity instances is always in the same partition. If we enforce/propose this design then we can derive some conclusions/rules: a) a child of a partitioned entity must also be partitioned by the same policy, b) a parent of a replicated entity must also be replicated.
In the example above, that means if entity X is replicated then entity Y must also be replicated. So, if we have to insert entity X in a new partition we also have to insert entity Y, and that has to be done before we insert entity X.
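The "insert Y before X" rule generalizes to a depth-first walk over the foreign-key references. The sketch below assumes the relationships have already been read from the O/R metadata into a plain map (entity name to the entities it references); `InsertOrder` and that map are hypothetical and not part of any JPA provider.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class InsertOrder {
    // references maps an entity name to the entities it holds foreign keys to.
    // Returns the entities that must be inserted, referenced entities first.
    static List<String> forEntity(String entity, Map<String, List<String>> references) {
        Set<String> ordered = new LinkedHashSet<>();
        visit(entity, references, ordered, new LinkedHashSet<>());
        return new ArrayList<>(ordered);
    }

    private static void visit(String entity, Map<String, List<String>> references,
                              Set<String> ordered, Set<String> inProgress) {
        if (ordered.contains(entity)) {
            return; // already scheduled for insert
        }
        if (!inProgress.add(entity)) {
            throw new IllegalStateException("cyclic foreign keys at " + entity);
        }
        // Schedule everything this entity references before the entity itself.
        for (String parent : references.getOrDefault(entity, Collections.emptyList())) {
            visit(parent, references, ordered, inProgress);
        }
        inProgress.remove(entity);
        ordered.add(entity);
    }
}
```

For X referencing Y, `forEntity("X", ...)` yields Y first and X second, which is the order in which the rows would have to be inserted into the new partition.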
Since we have access to the metadata (O/R mappings), we can get all the necessary information about relationships and replication policies and, IMO, this should not be difficult to implement.
If we do not implement this feature within JPA, then application developers have to implement it within some cluster management application which populates all the necessary data in a newly added partition before that partition is made available.
|
|
|
|
Re: Replication/union partitioning and elasticity [message #972615 is a reply to message #967625] |
Mon, 05 November 2012 19:23 |
Miroslav Kandic Messages: 7 Registered: October 2012 |
Junior Member |
|
|
I tried to present a simplified version of the use case I am working on, and that is the reason why you understood the problem as an issue of the initial setup of a newly introduced partition (DB).
Let me explain my use case in full detail.
My application is a multitenant network management system where the tenants have devices managed by my application.
I have to support grouping of devices across tenants, where one device group can have many devices and one device can be a member of many groups. The attached class diagram represents the major entities and their relationships. The M:N relationship between DeviceGroup and Device is implemented by the GroupMember entity. The association between DeviceGroup and GroupMember is a composition, and in the ORM it has cascade=ALL.
Devices are partitioned by the tenantId property, and the partitioning policy is defined by a mapping table which defines an explicit mapping of tenant to partition, that is, this table has two properties: tenantId, partitionId.
Since the predefined partitioning policies do not support this kind of partitioning, I developed my own partitioning policy which is driven by this mapping table.
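The core of such a mapping-table-driven policy is a plain lookup from tenantId to partitionId. The sketch below shows only that lookup; it does not use the EclipseLink API, and `TenantPartitionMap` with its string identifiers is a hypothetical stand-in for the mapping table described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory copy of the mapping table (tenantId, partitionId).
final class TenantPartitionMap {
    private final Map<String, String> partitionByTenant = new HashMap<>();

    // Records one row of the mapping table.
    void assign(String tenantId, String partitionId) {
        partitionByTenant.put(tenantId, partitionId);
    }

    // Resolves the partition that holds a tenant's devices.
    String partitionFor(String tenantId) {
        String partitionId = partitionByTenant.get(tenantId);
        if (partitionId == null) {
            throw new IllegalArgumentException("no partition mapped for tenant " + tenantId);
        }
        return partitionId;
    }
}
```

A custom policy would presumably translate the returned partitionId into the connection pool (and Accessor) for that partition; failing fast on an unmapped tenant avoids silently writing to a default partition.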
Since the GroupMember entity references the Device entity, it has to be partitioned by Device.tenantId too. I developed a partitioning policy which does that, that is, it gets the corresponding instance of Device and then, based on the tenantId value and the mapping table, returns the appropriate list of Accessor.
Since DeviceGroup is the parent of a partitioned entity, it has to be replicated to all partitions in which its members exist. I developed my own replication policy which analyzes the members and returns the appropriate list of Accessor accordingly.
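The "analyze the members" step can be pictured as collecting the distinct partitions of the member devices' tenants. This is only a sketch of that calculation under simplifying assumptions: members are represented by their tenantId strings, the mapping table by a plain map, and `GroupReplicationTargets` is a hypothetical name rather than the actual policy class.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class GroupReplicationTargets {
    // Returns the distinct partitions a DeviceGroup must be replicated to,
    // given the tenantId of each member device and the tenant-to-partition map.
    static Set<String> forMembers(List<String> memberTenantIds,
                                  Map<String, String> partitionByTenant) {
        Set<String> targets = new TreeSet<>(); // sorted, no duplicates
        for (String tenantId : memberTenantIds) {
            String partitionId = partitionByTenant.get(tenantId);
            if (partitionId == null) {
                throw new IllegalArgumentException("no partition mapped for tenant " + tenantId);
            }
            targets.add(partitionId);
        }
        return targets;
    }
}
```

Note that the result changes whenever a member from a previously unused partition is added, which is exactly the situation in the test that follows.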
I have 3 tenants and 3 partitions, where T1 is located in P1, T2 in P2 and T3 in P3. Each tenant has many devices.
I started by creating a DeviceGroup which has one member device from partition P1 and one from partition P2. Everything went fine: the DeviceGroup instance was replicated to partitions P1 and P2, the appropriate instances of GroupMember were created in P1 and P2, and all referential integrity constraints were obeyed.
Then I ran the next test: I added a new device to the already created DeviceGroup, where that new member device is located in partition P3. When I called EntityManager's persist(deviceGroup) I got a referential integrity constraint violation in partition P3. EclipseLink tried to insert the GroupMember record in P3, but the corresponding DeviceGroup record did not exist there.
IMO, EclipseLink should recognize that the DeviceGroup record does not exist in partition P3 and should insert it. My replication policy for DeviceGroup returned all 3 accessors (for P1, P2 and P3), but EclipseLink generated the UPDATE SQL statement for all of them instead of generating the INSERT statement for partition P3.
[Updated on: Mon, 05 November 2012 19:24]
|
|
|
Re: Replication/union partitioning and elasticity [message #973654 is a reply to message #972615] |
Tue, 06 November 2012 14:06 |
|
If you add a new database node, you are responsible for cloning any data that you want replicated from your other databases. Your database may provide tools to help you do this.
If you are using an Oracle RAC, then you can add and remove nodes without requiring anything special (and usage of replication would not be required).
EclipseLink cannot replicate an entire database at runtime, and you would not want to do this at runtime either.
James : Wiki : Book : Blog : Twitter
|
|
|
|
Re: Replication/union partitioning and elasticity [message #982943 is a reply to message #974104] |
Tue, 13 November 2012 14:58 |
|
How are you replicating your DeviceGroup to only partitions 1 and 2? What partitioning policy are you using?
What is the object that has a relationship to the DeviceGroup, how is it partitioned, and how is its relationship partitioned?
In general, I'm not sure dynamically changing where objects are replicated is advisable. You should always replicate a specific object to the same nodes. If you have a relationship to the object from another object, the relationship or queries should be partitioned to use the correct node.
James : Wiki : Book : Blog : Twitter
|
|
|
|
Re: Replication/union partitioning and elasticity [message #986491 is a reply to message #986276] |
Tue, 20 November 2012 16:02 |
|
You cannot change the partitions for an existing object. I assume EclipseLink did replicate it to all 3, but it replicated the UPDATE, as it is an existing object.
If you wish to add partitions to existing objects, you will need to replicate the INSERT of the data to the new partition yourself. Otherwise, you could always replicate the object to all partitions if you want to be able to access it on other partitions in the future.
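As a rough illustration of the first option, the application would need to work out which partitions are new for the object before deciding where to issue the INSERT itself. The sketch below shows only that set calculation; `NewPartitions` is a hypothetical helper, and how the application tracks which partitions already hold the object is left as an assumption.

```java
import java.util.Set;
import java.util.TreeSet;

final class NewPartitions {
    // Partitions the object must now live in, minus those that already hold it.
    // The remainder is where the application has to issue the INSERT itself.
    static Set<String> toInsertInto(Set<String> targetsAfterChange, Set<String> alreadyHolding) {
        Set<String> missing = new TreeSet<>(targetsAfterChange);
        missing.removeAll(alreadyHolding);
        return missing;
    }
}
```

In the scenario from this thread, targets P1, P2 and P3 with the DeviceGroup already present in P1 and P2 leave only P3, so the DeviceGroup row would be inserted there before the GroupMember row.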
You could log some sort of enhancement request to have EclipseLink somehow automatically detect a failed update on one partition and insert the data instead.
James : Wiki : Book : Blog : Twitter