Eclipse Community Forums
Home » Modeling » EMF » [CDO] Some thoughts on enhancements
[CDO] Some thoughts on enhancements [message #425311] Sat, 22 November 2008 11:01
Stefan Winkler
Messages: 301
Registered: July 2009
Location: Germany
Senior Member
Hi,

during and around ESE a few ideas about CDO enhancements popped into my
mind.
I'm interested in feedback. What do you think of the following issues:

1. Serverless operation

I'm a user of CDO's large-model-handling capabilities. I like the way
CDO materializes just those parts of a model in memory that are
currently needed and makes it possible for the rest to be GC'd. I don't
use the multi-user/multi-client features, which is why I only have one
session and use the JVM connector.
However, I wondered if it would be possible to eliminate the Net4j
layer completely, so that the session is directly wired to a store.
This would decrease the communication and memory overhead caused by
the cascaded revision caches (client- and server-side). So what do you
think?
Do the existing interfaces support connecting the session directly to
the internal-container "server-side" revision cache? If this worked,
maybe CDO would be an excellent back-end for the E4 platform, which --
if I understood correctly at ESE -- is more or less completely based
on models.
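To make the idea concrete, here is a deliberately simplified sketch (plain illustrative classes, not the actual CDO/Net4j API) contrasting today's layering -- a client-side revision cache in front of a server-side one, with a connector in between -- against a session wired directly to the store's cache:

```java
import java.util.HashMap;
import java.util.Map;

public class DirectWiringSketch {

    /** Stand-in for a revision cache; CDO keeps one on each side today. */
    static class RevisionCache {
        final Map<Long, String> revisions = new HashMap<>();
        int lookups = 0;

        String get(long id) { lookups++; return revisions.get(id); }
        void put(long id, String rev) { revisions.put(id, rev); }
    }

    /** Current shape: client-cache misses fall through to the server cache. */
    static class CascadedSession {
        final RevisionCache clientCache = new RevisionCache();
        final RevisionCache serverCache;
        CascadedSession(RevisionCache serverCache) { this.serverCache = serverCache; }

        String load(long id) {
            String rev = clientCache.get(id);
            if (rev == null) {
                rev = serverCache.get(id); // transferred over the connector
                clientCache.put(id, rev);  // now held twice in memory
            }
            return rev;
        }
    }

    /** Serverless shape: the session shares the store-side cache directly. */
    static class DirectSession {
        final RevisionCache storeCache;
        DirectSession(RevisionCache storeCache) { this.storeCache = storeCache; }

        String load(long id) { return storeCache.get(id); } // one copy, no transfer
    }

    public static void main(String[] args) {
        RevisionCache store = new RevisionCache();
        store.put(1L, "rev-1");

        CascadedSession cascaded = new CascadedSession(store);
        cascaded.load(1L);
        System.out.println("cascaded duplicates: " + cascaded.clientCache.revisions.size());

        DirectSession direct = new DirectSession(store);
        System.out.println("direct: " + direct.load(1L) + ", no client-side copy");
    }
}
```

The point of the sketch is only the memory/communication argument: in the cascaded shape every revision ends up in both caches, while in the direct shape it exists exactly once.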


2. Query abstraction

I don't like two things in the current backend implementation:
a) Mapping Strategies pass around SQL where-parts as strings and
b) Queries are passed around as query-language specific strings.

Both issues could be addressed by an abstract query: an object that
centrally represents a query. Maybe this could be handled the way
regular expressions are handled:
- The user creates a query in a supported query language
- The query abstraction layer compiles the query into the abstract
representation
- At the store backend, the query is translated to the store-native
representation

So instead of

Client -----{OCL}-----> Server ------{OCL}-------> Store [parses and
translates to e.g.SQL]----> DB/MEM/...

we would have

Client --{OCL}-> AbstractQuery(AQ) ---> Server ---{AQ}--> Store
[translates to e.g.SQL]---->DB/MEM/...

If this is implemented efficiently, store-internal queries (like the
above-mentioned where-parts) could also be replaced by AQ
representations.
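A minimal sketch of what such an AQ could look like (hypothetical classes for illustration, not existing CDO interfaces): the query is compiled once into a small object tree, and each back-end translates that tree into its native form -- here, a SQL WHERE fragment:

```java
public class AbstractQuerySketch {

    /** Common, language-independent query representation. */
    interface AQ {
        String toSql(); // a real design would use one visitor per back-end
    }

    /** Leaf node: a single attribute comparison. */
    record Compare(String attribute, String op, Object value) implements AQ {
        public String toSql() {
            Object v = value instanceof String ? "'" + value + "'" : value;
            return attribute + " " + op + " " + v;
        }
    }

    /** Inner node: conjunction of two sub-queries. */
    record And(AQ left, AQ right) implements AQ {
        public String toSql() {
            return "(" + left.toSql() + " AND " + right.toSql() + ")";
        }
    }

    public static void main(String[] args) {
        // e.g. an OCL-ish condition: name = 'foo' and version > 3
        AQ query = new And(new Compare("name", "=", "foo"),
                           new Compare("version", ">", 3));
        System.out.println("WHERE " + query.toSql());
        // A MEM store could walk the same tree and evaluate it in memory
        // instead of producing SQL.
    }
}
```

The client-side compiler (OCL, XPath, ...) and the per-store translators would plug in at the two ends of this tree; the tree itself is the only thing passed over the wire.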


3. More intelligent mapping of references in DBStore

Currently, MultiReferences are stored as (source_id, version, index,
target_id) tuples.
If an object O gets one reference added and committed in each of 1000
iterations, as in
for (i in 0..999) { o.getRef().add(foo[i]); commit; }
this results in 1 + 2 + 3 + ... + 1000 = 500,500 rows in total. This
does not scale well, as revisions with a large number of references take
much longer to be written -- even when the reference list itself has
barely changed.
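As a quick sanity check on the numbers (illustrative code, not DBStore internals): if every new revision re-writes its complete reference list, n single-append commits write n*(n+1)/2 rows.

```java
public class RowGrowth {

    /** Total rows written when each of 'commits' revisions stores its full list. */
    static long totalRowsWritten(int commits) {
        long rows = 0;
        for (int version = 1; version <= commits; version++) {
            rows += version; // revision 'version' stores 'version' references
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(totalRowsWritten(1000)); // 500500
        System.out.println(1000L * 1001L / 2);      // closed form, same value
    }
}
```

So the write cost is quadratic in the number of commits, even though the logical change per commit is a single appended reference.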

There was the idea of converting the reference list to a string
representation and storing it like an attribute. However, depending on
the database system used, this results in a similar overhead, plus the
representation has to be created and parsed. (And, by the way, it
violates the basic principle of atomic DB values -- 1NF, in short.)

My alternative idea would be to express the reference table as
(source_id, version_created, version_revised, index, target_id).
version_created is the source version in which the reference was
created.
version_revised is 0 as long as the reference still exists; if it is
non-zero, the reference was revised (removed) with the creation of that
version. This leads to no writes at all if the references are unchanged,
O(1) if a reference is appended, and only O(n) if a reference is removed
(because the subsequent indices have to be updated).
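A sketch of how reads and appends would work under that scheme (illustrative classes, not the actual DBStore schema code): a row is valid from version_created up to, but excluding, version_revised, with 0 meaning "still valid".

```java
import java.util.ArrayList;
import java.util.List;

public class RefTableSketch {

    static class Row {
        final int index, versionCreated, targetId;
        int versionRevised = 0; // 0 = reference still exists
        Row(int index, int versionCreated, int targetId) {
            this.index = index;
            this.versionCreated = versionCreated;
            this.targetId = targetId;
        }
    }

    final List<Row> table = new ArrayList<>();

    /** Appending in version v writes exactly one row: O(1). */
    void append(int version, int targetId) {
        table.add(new Row(sizeAt(version), version, targetId));
    }

    /** Reading the list at version v filters rows by their validity range. */
    List<Integer> listAt(int version) {
        List<Integer> result = new ArrayList<>();
        for (Row r : table) {
            if (r.versionCreated <= version
                    && (r.versionRevised == 0 || r.versionRevised > version)) {
                result.add(r.targetId);
            }
        }
        return result;
    }

    int sizeAt(int version) { return listAt(version).size(); }

    // Removal (not shown) would set versionRevised on one row and rewrite
    // the indices of the following rows: the O(n) case.

    public static void main(String[] args) {
        RefTableSketch refs = new RefTableSketch();
        refs.append(1, 100); // version 1: [100]
        refs.append(2, 200); // version 2: [100, 200] -- only one new row
        System.out.println(refs.listAt(1));   // [100]
        System.out.println(refs.listAt(2));   // [100, 200]
        System.out.println(refs.table.size()); // 2 rows total, not 1 + 2 = 3
    }
}
```

With this layout, committing an unchanged list writes nothing, because all existing rows simply remain valid for the new version.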


4. Solution for OutOfMemoryErrors during huge commits

This one is a quote from Eike and is here just as a reminder.
If newObjects and dirtyObjects grow because of a very large
transaction, it might be necessary to "swap out" those objects to
prevent OOMEs (or transmit them to the server, or even store them
temporarily in the store).
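One possible shape for such a swap-out, as a hypothetical sketch (no such CDO class exists): once the number of dirty objects in a transaction passes a threshold, further object states are written to a temporary directory instead of being kept on the heap.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class SwapOutSketch {
    final int threshold;
    final Map<Long, String> inMemory = new LinkedHashMap<>();
    final Path spillDir;

    SwapOutSketch(int threshold) throws IOException {
        this.threshold = threshold;
        this.spillDir = Files.createTempDirectory("cdo-spill");
    }

    void put(long id, String state) throws IOException {
        if (inMemory.size() < threshold) {
            inMemory.put(id, state);
        } else {
            // Beyond the threshold, keep only a file on disk, not heap memory.
            Files.writeString(spillDir.resolve(Long.toString(id)), state);
        }
    }

    String get(long id) throws IOException {
        String state = inMemory.get(id);
        if (state != null) {
            return state;
        }
        Path file = spillDir.resolve(Long.toString(id));
        return Files.exists(file) ? Files.readString(file) : null;
    }

    public static void main(String[] args) throws IOException {
        SwapOutSketch dirty = new SwapOutSketch(2);
        for (long id = 1; id <= 5; id++) {
            dirty.put(id, "state-" + id);
        }
        System.out.println(dirty.inMemory.size()); // 2: heap usage is bounded
        System.out.println(dirty.get(5));          // state-5, read back from disk
    }
}
```

The heap footprint stays bounded by the threshold regardless of transaction size, at the cost of one disk round-trip per spilled object.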


So, discussion is opened :-)

Cheers,
Stefan
Re: [CDO] Some thoughts on enhancements [message #425314 is a reply to message #425311] Sat, 22 November 2008 12:28
Eike Stepper
Messages: 6483
Registered: July 2009
Senior Member
Stefan,

Comments below...



Stefan Winkler schrieb:
> Hi,
>
> during and around ESE a few ideas about CDO enhancements popped into
> my mind.
> I'm interested in feedback. What do you think of the following issues:
>
> 1. Serverless operation
>
> I'm a user of CDO's large-model-handling capabilities. I like the way,
> CDO materializes just those parts of a model in memory which are
> currently needed and makes it possible that the rest is GC'd. I don't
> use the multi-user/multi-client features. This is why I only have one
> session and use the JVM connector.
> However, I wondered, if it would be possible to eliminate the Net4j
> layer completely in a way that the session is directly wired to a
> store. This would decrease communication and memory overhead which is
> caused by the cascaded revision caches (client- and server-side). So
> what do you think?
> Do the existing interfaces support connecting the session directly to
> the internal-container-"server-side"-revision cache?
This requires a deep refactoring of the CDO internals, but I also think
this is a good idea!

> If this would work, maybe CDO would be an excellent back-end for the
> E4 platform, which is more or less completely based on models if I
> understood that correctly at ESE.
Interesting idea, and I have already planned a session with Boris ;-)

>
>
> 2. Query abstraction
>
> I don't like two things in the current backend implementation:
You really mean the *DBStore* back-end!

> a) Mapping Strategies pass around SQL where-parts as strings and
What's bad about this?
It could and should be changed with only local impact...

> b) Queries are passed around as query-language specific strings.
>
> Both issues could be addressed by an abstracted query, which is an
> object that centrally represents a query. Maybe this could be handled
> in the way regular expressions are handled.
> - The user creates a query in a supported query language
> - The query abstraction layer compiles the query into the abstract
> representation
> - At the store backend, the query is translated to the store-native
> representation
>
> So instead of
>
> Client -----{OCL}-----> Server ------{OCL}-------> Store [parses and
> translates to e.g.SQL]----> DB/MEM/...
>
> we would have
>
> Client --{OCL}-> AbstractQuery(AQ) ---> Server ---{AQ}--> Store
> [translates to e.g.SQL]---->DB/MEM/...
>
> If this is implemented in an efficient way, also store-internal
> queries (like the above-mentioned where-parts) could be replaced by AQ
> representations.
I don't see much value (which does not necessarily imply there is
none!) in producing additional translations for languages that are
back-end specific anyway.
But I think a common (i.e. back-end independent) query language like OCL
or XPath is a good idea. We already have a bugzilla for this:

245658: [Query] Provide OCL query language
https://bugs.eclipse.org/bugs/show_bug.cgi?id=245658

>
>
> 3. More intelligent mapping of references in DBStore
>
> Currently, MultiReferences are stored in (source_id, version, index,
> target_id) tuples.
> If in 1000 iterations an object O is added a reference and committed
> as in
> for(i in 0..999) { o.getRef().add(foo[i]); commit; }
> this would result in 1 + 2 + 3 + ... + 1000 = 500,500 rows in total. This does not
> scale well, as revisions with a large number of references do need
> much longer to be written -- even if the reference list has not been
> changed.
This is a known issue of the DBStore ;-(
I fully agree that we should have a more scalable option, too.

>
> There was the idea of converting the reference list to a string
> representation and store this like an attribute. However, depending on
> the database system used, this results in a similar overhead, plus the
> representation has to be created and parsed. (And BTW it violates the
> basic principle of atomic DB rows - or 1NF in short).
>
> My alternative idea would be to express the reference table as
> (source_id, version_created, version_revised, index, target_id).
> version_created would be the source version, for which the reference
> has been created.
> version_revised would be 0 if the reference still exists; if it is
> non-zero, it has been revised with the
> creation of the given version. This would lead to no operation if the
> references are unchanged,
> O(1) if a reference is appended and only O(n) if a reference is
> removed (and the subsequent indices are
> updated).
That sounds like a good idea.
Would you agree that it's better to add a general, non-auditing mode to
the DBStore first?

>
>
> 4. Solution for OutOfMemoryErrors during huge commits
>
> This one is a quote of Eike and is here just as a reminder.
> If newObjects and dirtyObjects grow, because of a very large
> transaction, it might be necessary to "swap out" those objects to
> prevent OOME-s. (or transmitted to the server, or even temporarily
> stored into the store).
Nothing to add from my side :P

Cheers
/Eike

----
http://thegordian.blogspot.com


>
>
> So, discussion is opened :-)
>
> Cheers,
> Stefan


Re: [CDO] Some thoughts on enhancements [message #425315 is a reply to message #425311] Sat, 22 November 2008 12:40
Simon Mc Duff
Messages: 596
Registered: July 2009
Senior Member
Stefan Winkler wrote:
> Hi,
>
> during and around ESE a few ideas about CDO enhancements popped into my
> mind.
> I'm interested in feedback. What do you think of the following issues:
>
> 1. Serverless operation
>
> I'm a user of CDO's large-model-handling capabilities. I like the way,
> CDO materializes just those parts of a model in memory which are
> currently needed and makes it possible that the rest is GC'd. I don't
> use the multi-user/multi-client features. This is why I only have one
> session and use the JVM connector.
> However, I wondered, if it would be possible to eliminate the Net4j
> layer completely in a way that the session is directly wired to a store.
> This would decrease communication and memory overhead which is caused by
> the cascaded revision caches (client- and server-side). So what do you
> think?
> Do the existing interfaces support connecting the session directly to
> the internal-container-"server-side"-revision cache? If this would work,
> maybe CDO would be an excellent back-end for the E4 platform, which is
> more or less completely based on models if I understood that correctly
> at ESE.
>
>
I don't see how it can be done without extra work each time we create a
new signal. Did you measure the time spent in the JVM connector to see
what gain we would get?
Signals are used in many different places at the moment. We could maybe
use a strategy for all of them... It is feasible, I believe, with some
refactoring... but I would like to measure the gain first!


> 2. Query abstraction
>
> I don't like two things in the current backend implementation:
> a) Mapping Strategies pass around SQL where-parts as strings and
> b) Queries are passed around as query-language specific strings.
>
> Both issues could be addressed by an abstracted query, which is an object
> that centrally represents a query. Maybe this could be handled in the
> way regular expressions are handled.
> - The user creates a query in a supported query language
> - The query abstraction layer compiles the query into the abstract
> representation
> - At the store backend, the query is translated to the store-native
> representation
>
> So instead of
>
> Client -----{OCL}-----> Server ------{OCL}-------> Store [parses and
> translates to e.g.SQL]----> DB/MEM/...
>
> we would have
>
> Client --{OCL}-> AbstractQuery(AQ) ---> Server ---{AQ}--> Store
> [translates to e.g.SQL]---->DB/MEM/...
>
> If this is implemented in an efficient way, also store-internal queries
> (like the above-mentioned where-parts) could be replaced by AQ
> representations.
>
In general, it is a good idea to have objects for queries... but I
didn't want to create a query framework! It can get very complicated, so
I prefer to concentrate on requirements and solutions.
Can you formulate the requirements/problems that end users have with the
current approach? Then we could address them. Maybe it will lead to what
you described.

>
> 3. More intelligent mapping of references in DBStore
>
> Currently, MultiReferences are stored in (source_id, version, index,
> target_id) tuples.
> If in 1000 iterations an object O is added a reference and committed as in
> for(i in 0..999) { o.getRef().add(foo[i]); commit; }
> this would result in 1 + 2 + 3 + ... + 1000 = 500,500 rows in total. This does not
> scale well, as revisions with a large number of references do need much
> longer to be written -- even if the reference list has not been changed.
>
> There was the idea of converting the reference list to a string
> representation and store this like an attribute. However, depending on
> the database system used, this results in a similar overhead, plus the
> representation has to be created and parsed. (And BTW it violates the
> basic principle of atomic DB rows - or 1NF in short).
>
> My alternative idea would be to express the reference table as
> (source_id, version_created, version_revised, index, target_id).
> version_created would be the source version, for which the reference has
> been created.
> version_revised would be 0 if the reference still exists; if it is
> non-zero, it has been revised with the
> creation of the given version. This would lead to no operation if the
> references are unchanged,
> O(1) if a reference is appended and only O(n) if a reference is removed
> (and the subsequent indices are
> updated).

Did you think of keeping only the last version plus inverse deltas to
go back to a specific version? That would eliminate most of the problems
you describe.
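That idea can be sketched like this (illustrative classes only, not CDO code): only the latest reference list is stored, plus one inverse delta per version; reading an old version replays the inverse deltas backwards from the latest state.

```java
import java.util.ArrayList;
import java.util.List;

public class InverseDeltaSketch {
    interface Delta { void applyTo(List<Integer> list); }

    final List<Integer> latest = new ArrayList<>();
    final List<Delta> inverses = new ArrayList<>(); // inverses.get(i) undoes version i+1
    int version = 0;

    /** Commit that appends one reference; records the inverse (a removal). */
    void commitAdd(int target) {
        final int index = latest.size();
        latest.add(target);
        inverses.add(list -> list.remove(index)); // inverse of "add at end"
        version++;
    }

    /** Reconstruct the list as it looked at an older version. */
    List<Integer> listAt(int wanted) {
        List<Integer> result = new ArrayList<>(latest);
        for (int v = version; v > wanted; v--) {
            inverses.get(v - 1).applyTo(result);
        }
        return result;
    }

    public static void main(String[] args) {
        InverseDeltaSketch refs = new InverseDeltaSketch();
        refs.commitAdd(100); // version 1
        refs.commitAdd(200); // version 2
        refs.commitAdd(300); // version 3
        System.out.println(refs.listAt(3)); // [100, 200, 300]
        System.out.println(refs.listAt(1)); // [100]
    }
}
```

The trade-off versus the validity-range table is clear from the sketch: current-version reads and appends stay cheap, while reading an old version costs one delta replay per intervening version.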

>
>
> 4. Solution for OutOfMemoryErrors during huge commits
>
> This one is a quote of Eike and is here just as a reminder.
> If newObjects and dirtyObjects grow, because of a very large
> transaction, it might be necessary to "swap out" those objects to
> prevent OOME-s. (or transmitted to the server, or even temporarily
> stored into the store).
>
Yes, this is a common strategy used by databases when too many objects
are modified. It is an excellent idea!

>
> So, discussion is opened :-)
>
> Cheers,
> Stefan
Re: [CDO] Some thoughts on enhancements [message #425359 is a reply to message #425314] Sat, 22 November 2008 12:48
Thomas Schindl
Messages: 6523
Registered: July 2009
Senior Member
Eike Stepper schrieb:
> Stefan,
>
> Comments below...
>
>
>
> Stefan Winkler schrieb:
>> Hi,
>>
>> during and around ESE a few ideas about CDO enhancements popped into
>> my mind.
>> I'm interested in feedback. What do you think of the following issues:
>>
>> 1. Serverless operation
>>
>> I'm a user of CDO's large-model-handling capabilities. I like the way,
>> CDO materializes just those parts of a model in memory which are
>> currently needed and makes it possible that the rest is GC'd. I don't
>> use the multi-user/multi-client features. This is why I only have one
>> session and use the JVM connector.
>> However, I wondered, if it would be possible to eliminate the Net4j
>> layer completely in a way that the session is directly wired to a
>> store. This would decrease communication and memory overhead which is
>> caused by the cascaded revision caches (client- and server-side). So
>> what do you think?
>> Do the existing interfaces support connecting the session directly to
>> the internal-container-"server-side"-revision cache?
> This requires a deep refactoring of the CDO internals, but I also think
> this is a good idea!
>
>> If this would work, maybe CDO would be an excellent back-end for the
>> E4 platform, which is more or less completely based on models if I
>> understood that correctly at ESE.
> Interesting idea and I already planned for a session with Boris ;-)
>

As one of the people on the E4 Modeled Workbench team, I'm also
interested in taking part in this discussion. I already had the same
idea some time ago but no time to implement it (there have been more
important working areas). It fits my vision of RCP applications and of
collaborative RCP application development and deployment in the future.

Think about what developing and deploying would look like if not only
the workbench itself were modeled using Ecore but also the rest of the
UI you are currently looking at (the vision of a live DOM backing the
declaratively defined UI) :-)

Tom

--
B e s t S o l u t i o n . at
--------------------------------------------------------------------
Tom Schindl JFace-Committer
--------------------------------------------------------------------
Re: [CDO] Some thoughts on enhancements [message #425360 is a reply to message #425359] Sat, 22 November 2008 12:56
Eike Stepper
Messages: 6483
Registered: July 2009
Senior Member
Tom,

I'd appreciate it if you'd participate in our telecon. Ed and Simon
will also be on board ;-)
I'll keep you informed about the schedule...

Cheers
/Eike

----
http://thegordian.blogspot.com



Tom Schindl schrieb:
> As one of the people on the E4 Modeled Workbench team, I'm also
> interested in taking part in this discussion. [...]
