
Re: [eclipselink-dev] Code review: initial partitioning support, bug#328937

Samba,

 

  I think being forced to deal with policies within policies would be more complex.  Also, the policies need to be integrated; they are not independent.  Basic replication just replicates to a set of nodes.  Range partitioning (or value, or hash) that also wanted to do replication would need a set of connection pools to replicate to for each range; there would not be a single connection pool, so having separate policies would not help.  Having a multi policy still might be useful, and the partitioning framework would allow for this: you could create your own partitioning policy that contains multiple policies and decides which policy to use based on the query.
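As a rough illustration of that last idea, here is a minimal, self-contained sketch of a policy that contains multiple policies and decides per query; the Query, RoutingPolicy, and CompositePolicy names are hypothetical stand-ins for illustration only, not EclipseLink's actual PartitioningPolicy API:

```java
import java.util.List;
import java.util.Map;

public class CompositePolicyDemo {
    // Hypothetical stand-in: EclipseLink's real API works on queries
    // and connection pools/accessors; we model pools as plain names.
    interface Query { String getName(); }

    // A policy maps a query to the connection pool names it should use.
    interface RoutingPolicy {
        List<String> getConnectionPools(Query query);
    }

    // A "multi" policy: holds several policies and decides which one
    // to apply based on the query, falling back to a default policy.
    static class CompositePolicy implements RoutingPolicy {
        private final Map<String, RoutingPolicy> policiesByQueryName;
        private final RoutingPolicy defaultPolicy;

        CompositePolicy(Map<String, RoutingPolicy> policiesByQueryName,
                        RoutingPolicy defaultPolicy) {
            this.policiesByQueryName = policiesByQueryName;
            this.defaultPolicy = defaultPolicy;
        }

        @Override
        public List<String> getConnectionPools(Query query) {
            return policiesByQueryName
                    .getOrDefault(query.getName(), defaultPolicy)
                    .getConnectionPools(query);
        }
    }

    public static void main(String[] args) {
        RoutingPolicy readOne = query -> List.of("node1");
        RoutingPolicy replicateAll = query -> List.of("node1", "node2");
        CompositePolicy policy = new CompositePolicy(
                Map.of("insertOrder", replicateAll), readOne);

        System.out.println(policy.getConnectionPools(() -> "insertOrder"));
        System.out.println(policy.getConnectionPools(() -> "findOrder"));
    }
}
```

The point of the sketch is only that the framework's single-policy hook is enough to build policy composition on top of, without baking policies-within-policies into the core.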

 

 It could be named LoadBalancingPolicy instead of PartitioningPolicy, but our focus is currently on partitioning, and right now in the industry it is the bigger buzzword.  Load balancing is much simpler than data partitioning; you don't need an ORM to do load balancing, as most application servers have some sort of load balancing and failover support just using a JDBC DataSource.  Data partitioning is a good fit with ORM because we control and understand the data, and you can't do data partitioning without knowledge and context of the data, so partitioning is our focus.

 

 

 

-----Original Message-----
From: Samba [mailto:saasira@xxxxxxxxx]
Sent: Wednesday, November 17, 2010 12:33 PM
To: Dev mailing list for Eclipse Persistence Services
Subject: Re: [eclipselink-dev] Code review: initial partitioning support, bug#328937

 

James,

 

I think I was not understood clearly here, so I thought of giving a few more details for clarification.

 

I'm saying that the ClassDescriptor class should have an instance of LoadBalancingPolicy (a new name ???) instead of PartitioningPolicy. We can load balance using partitioning or replication, and hence using appropriate names would make the future course clear.

 

which would look something like:

 

public abstract class LoadBalancingPolicy {

     private PartitioningPolicy partitioningPolicy;
     private ReplicationPolicy replicationPolicy;

     //getters and setters

     // minimum common code
}

A load balancing scheme may decide to implement either of these or even both (which is a very lofty goal, I understand).

 

 

Then, we can have implementations like RoundRobinLoadBalancingPolicy, ClusterLoadBalancingPolicy, etc., which are only replication based, as well as RangeBasedLoadBalancingPolicy, ValueBasedLoadBalancingPolicy, etc., which are only partition based.

 

I'm just trying to make the terminology clear so that the implementations can concentrate on what features to implement in a particular load balancing policy.

 

 

Thanks and Regards,

Samba

 

 

On Wed, Nov 17, 2010 at 7:48 AM, James Sutherland <JAMES.SUTHERLAND@xxxxxxxxxx> wrote:

Hello Samba,

  Thank you for your comments; please also submit them to the design document discussion page so they can be tracked accordingly.

 

  Database clustering is obviously a big, complex area, and this is our first entrance into this space.  We will not support everything imaginable in our first release.  I think the policies outlined in the design doc cover the common use cases, and are probably even a little too ambitious for a first release.

 

  What you seem to be requesting is support for both partitioning and replication... and load balancing for the same data.  As you can understand, this would be pretty complex.  Scenarios like this are something the design of the partitioning framework is capable of supporting, but given the complexity, this is not something we will directly support in our first release.  To implement this you can define your own PartitioningPolicy subclass (or subclass the policy that matches your requirements the closest).  Then for a given query you can define, in your own code, which databases to send the request to.  You would need to define which set of servers each range should go to.  For a read request, you could load balance across these servers.  For a write request, you could write to each of the servers.  If you detect an error in one of the servers, you can fail over to the other.
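A minimal, self-contained sketch of that approach follows; the class and method names are hypothetical illustrations rather than EclipseLink API.  Each id range maps to a set of replica pools; writes fan out to every member of the owning range, while reads round-robin across it (a caller that detects an error can simply retry, landing on the next replica):

```java
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

public class RangeReplicationDemo {
    // Hypothetical custom policy: each range of ids owns a *set* of
    // replicated connection pools rather than a single pool.
    static class RangeReplicationPolicy {
        // Range start id -> replica pool names for that range (floor lookup).
        private final TreeMap<Integer, List<String>> ranges = new TreeMap<>();
        private final AtomicInteger readCounter = new AtomicInteger();

        void addRange(int startId, List<String> replicaPools) {
            ranges.put(startId, replicaPools);
        }

        // Write requests are sent to every replica of the owning range.
        List<String> getPoolsForWrite(int id) {
            return ranges.floorEntry(id).getValue();
        }

        // Read requests load-balance round-robin across the replicas;
        // on an error the caller retries, failing over to the next replica.
        String getPoolForRead(int id) {
            List<String> replicas = ranges.floorEntry(id).getValue();
            int next = Math.floorMod(readCounter.getAndIncrement(), replicas.size());
            return replicas.get(next);
        }
    }

    public static void main(String[] args) {
        RangeReplicationPolicy policy = new RangeReplicationPolicy();
        policy.addRange(0, List.of("rangeA-1", "rangeA-2"));
        policy.addRange(100, List.of("rangeB-1"));

        System.out.println(policy.getPoolsForWrite(50));   // both rangeA replicas
        System.out.println(policy.getPoolForRead(50));     // alternates between them
        System.out.println(policy.getPoolForRead(50));
        System.out.println(policy.getPoolsForWrite(150));
    }
}
```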

 

  I understand that the root class name PartitioningPolicy does not adequately describe everything it can do.  But naming the class PartitioningLoadBalancingReplicationAndFailoverPolicy would be a little too wordy; I think partitioning is the main use case, hence its name.

 

 

-----Original Message-----
From: Samba [mailto:saasira@xxxxxxxxx]
Sent: Wednesday, November 17, 2010 10:33 AM
To: Dev mailing list for Eclipse Persistence Services
Subject: Re: [eclipselink-dev] Code review: initial partitioning support, bug#328937

 

Hi James,

 

    I have a few comments on the implementation; please read these as constructive criticism.

 

 

A PartitioningPolicy is supposed to be used for partitioning both read queries and write statements, such that queries get distributed and end up at different database instances.

 

A ReplicationPolicy, as noted in the comments above the class, is intended to duplicate all writes across all the configured replication nodes.

A LoadBalancingPolicy can be an implementation of either of the above or a combination of both the partitioning and the replication features. So, we need to create a base class for LoadBalancingPolicy that can be extended to support various ways of load balancing by utilizing replication or partitioning or both.

 

The difference I'm trying to bring out is that a ReplicationPolicy cannot be an extension of PartitioningPolicy.

 

Similarly, a LoadBalancingPolicy can have an instance each of PartitioningPolicy and ReplicationPolicy but it is not an extension of either of these; instead it can only be a composition of these two features.

 

How we provide load balancing can be dealt with in implementations like:

 

        1. RoundRobinLoadBalancingPolicy, ClusterLoadBalancingPolicy, etc. are possible if the data is only
            replicated and not partitioned; they support active/passive fail-over. Their primary purpose is to support
            fail-over and optionally to reduce load on a single server. These implementations rely on having
            ReplicationPolicy implementations and cannot have a PartitioningPolicy instance.

        2. RangeLoadBalancingPolicy, HashLoadBalancingPolicy, etc. are possible by partitioning the data across
            several nodes; their primary purpose is to provide scalability and performance. However, we can also
            replicate each partitioned node and thus support passive fail-over. These policies will primarily rely
            on having a PartitioningPolicy implementation, but can optionally also include ReplicationPolicy
            features in order to support fail-over in addition to scale and performance.

 

 

I hope I'm making some sense here :)

 

Thanks and Regards,
Samba

 

 

On Mon, Nov 8, 2010 at 7:57 AM, James Sutherland <JAMES.SUTHERLAND@xxxxxxxxxx> wrote:

Code review: initial partitioning support, bug#328937

 

https://bugs.eclipse.org/bugs/show_bug.cgi?id=328937

 

design doc,

http://wiki.eclipse.org/EclipseLink/DesignDocs/328937

 

 

Changes:

- added partitioningPolicy to ClassDescriptor

- added null check to FetchGroupManager to avoid null-pointer on failed deploy

- added PartitioningPolicy abstract class, defines getConnections API

- added ReplicationPolicy, replicates writes to multiple pools

- added pool reference to Accessor, so it knows where it came from

- added acquire/release connection logging

- added partitioningPolicy to AbstractSession

- changed AbstractSession accessor to Collection accessors (and updated references)

- changed transaction to work with multiple accessors (2 stage commit)

- changed call execution to work with multiple accessors

- made client sessions (isolated, exclusive) execute calls consistently, added support for partitioning

- added @Overrides to sessions, some micro

- fixed finally connection release in ReferenceMapping

- changed DatabaseQuery accessor to Collection accessors

- changed SessionBroker getAccessor API to use same getAccessor for partitioning, only a single call, pass query

- added setURL to DatabaseLogin

- changed ClientSession writeAccessor to Map writeAccessors keyed on pool name

- changed ClientSession connection to be lazily assigned

- changed getAccessor on ClientSession to assign a connection if in a transaction, to support backward compatibility and internal usage

- changed ServerSession call execution to support partitioning

- added JPA partitioned model

- changed JPA test framework methods to be instance methods and use the inherited getPersistenceUnitName defined in the test, to avoid common mistakes in non-default unit tests

- added JPA partitioning test switch using derby "cluster" and round robin and replication; tests only run on Derby as they need to create multiple databases

- added batch-fetch example

- added partitioned, and isolated partitioned version of UnitOfWork test model, uses "virtual rack" (multiple connection pools to the same database)

 

 

Code review: Andrei (pending)


_______________________________________________
eclipselink-dev mailing list
eclipselink-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/eclipselink-dev

 



 

