[CDO] Some thoughts on enhancements [message #425311]
Sat, 22 November 2008 11:01
Stefan Winkler    Messages: 307    Registered: July 2009    Location: Germany
Senior Member
Hi,
during and around ESE, a few ideas for CDO enhancements popped into my
mind.
I'm interested in feedback. What do you think of the following issues?
1. Serverless operation
I'm a user of CDO's large-model-handling capabilities. I like the way
CDO materializes only those parts of a model in memory that are
currently needed, making it possible for the rest to be GC'd. I don't
use the multi-user/multi-client features, which is why I only have one
session and use the JVM connector.
However, I wondered whether it would be possible to eliminate the Net4j
layer completely, so that the session is directly wired to a store.
This would reduce the communication and memory overhead caused by the
cascaded revision caches (client- and server-side). So what do you
think?
Do the existing interfaces support connecting the session directly to
the internal-container "server-side" revision cache? If this worked,
maybe CDO would be an excellent back end for the E4 platform, which is
more or less completely based on models, if I understood that correctly
at ESE.
2. Query abstraction
I don't like two things about the current back-end implementation:
a) Mapping Strategies pass around SQL WHERE parts as strings, and
b) queries are passed around as query-language-specific strings.
Both issues could be addressed by an abstract query: an object that
centrally represents a query. Maybe this could be handled the way
regular expressions are handled:
- The user creates a query in a supported query language.
- The query abstraction layer compiles the query into the abstract
representation.
- At the store back end, the query is translated to the store-native
representation.
So instead of

Client --{OCL}--> Server --{OCL}--> Store [parses and translates to e.g. SQL] --> DB/MEM/...

we would have

Client --{OCL}--> AbstractQuery (AQ) --> Server --{AQ}--> Store [translates to e.g. SQL] --> DB/MEM/...
If this is implemented efficiently, store-internal queries (like the
above-mentioned WHERE parts) could also be replaced by AQ
representations.
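The pipeline described above could be sketched as a small expression tree with per-back-end translators. This is purely an illustrative sketch, not CDO API; all type and method names (QueryNode, SqlWhereVisitor, etc.) are hypothetical:

```java
import java.util.List;

// Hypothetical abstract query (AQ) tree. A front end would compile a
// query language (e.g. OCL) into this form; each store back end then
// translates it to its native representation.
interface QueryNode {
    <T> T accept(QueryVisitor<T> visitor);
}

record Comparison(String feature, String operator, Object value) implements QueryNode {
    public <T> T accept(QueryVisitor<T> v) { return v.visitComparison(this); }
}

record And(List<QueryNode> operands) implements QueryNode {
    public <T> T accept(QueryVisitor<T> v) { return v.visitAnd(this); }
}

interface QueryVisitor<T> {
    T visitComparison(Comparison c);
    T visitAnd(And a);
}

// A DBStore-style back end could translate the tree into a parameterized
// SQL WHERE part, replacing the string passing criticized above.
class SqlWhereVisitor implements QueryVisitor<String> {
    public String visitComparison(Comparison c) {
        return c.feature() + " " + c.operator() + " ?";
    }
    public String visitAnd(And a) {
        return String.join(" AND ",
            a.operands().stream().map(o -> o.accept(this)).toList());
    }
}

public class AbstractQueryDemo {
    public static void main(String[] args) {
        QueryNode q = new And(List.of(
            new Comparison("name", "=", "foo"),
            new Comparison("version", ">", 3)));
        System.out.println(q.accept(new SqlWhereVisitor()));
        // prints: name = ? AND version > ?
    }
}
```

An in-memory store could instead translate the same tree into a predicate over revisions, so the AQ stays back-end independent.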
3. More intelligent mapping of references in DBStore
Currently, MultiReferences are stored as (source_id, version, index,
target_id) tuples.
If an object o gets one additional reference per commit over 1000
iterations, as in
for (i in 0..999) { o.getRef().add(foo[i]); commit; }
this results in 1 + 2 + 3 + ... + 1000 = 500,500 rows in total. This does
not scale well, as revisions with a large number of references take much
longer to be written -- even if the reference list has not been changed.
There was the idea of converting the reference list to a string
representation and storing it like an attribute. However, depending on
the database system used, this results in a similar overhead, plus the
representation has to be created and parsed. (And, by the way, it
violates the basic principle of atomic column values -- first normal
form, 1NF for short.)
My alternative idea would be to express the reference table as
(source_id, version_created, version_revised, index, target_id).
version_created would be the source version in which the reference was
created.
version_revised would be 0 as long as the reference still exists; if it
is non-zero, the reference was revised with the creation of the given
version. This would lead to no writes if the references are unchanged,
O(1) if a reference is appended, and only O(n) if a reference is removed
(and the subsequent indices are updated).
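The proposed scheme can be simulated in memory to see the write savings. This is a hedged sketch, not the DBStore implementation; the class and field names are assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed reference table
// (source_id, version_created, version_revised, index, target_id).
// version_revised == 0 means the row is still current.
class RefRow {
    final long sourceId, targetId;
    final int versionCreated, index;
    int versionRevised; // 0 = reference still exists

    RefRow(long sourceId, int versionCreated, int index, long targetId) {
        this.sourceId = sourceId;
        this.versionCreated = versionCreated;
        this.index = index;
        this.targetId = targetId;
    }
}

class RefTable {
    final List<RefRow> rows = new ArrayList<>();
    int writes; // number of row writes performed

    // Appending a reference writes exactly one new row: O(1) per commit.
    void append(long sourceId, int newVersion, int index, long targetId) {
        rows.add(new RefRow(sourceId, newVersion, index, targetId));
        writes++;
    }

    // Reading a revision selects the rows valid for that version.
    List<RefRow> read(long sourceId, int version) {
        return rows.stream().filter(r -> r.sourceId == sourceId
            && r.versionCreated <= version
            && (r.versionRevised == 0 || r.versionRevised > version)).toList();
    }
}

public class RefMappingDemo {
    public static void main(String[] args) {
        RefTable t = new RefTable();
        int n = 1000;
        for (int i = 0; i < n; i++) {
            t.append(1L, i + 1, i, 100L + i); // commit i creates version i+1
        }
        System.out.println(t.writes);             // 1000 row writes in total
        System.out.println(t.read(1L, n).size()); // full list of 1000 refs
        // The current (source_id, version, index, target_id) mapping would
        // rewrite the whole list per commit: 1 + 2 + ... + 1000 = 500500 rows.
    }
}
```

Removal would additionally set version_revised on the removed row and rewrite the rows whose indices shift, which is the O(n) case mentioned above.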
4. Solution for OutOfMemoryErrors during huge commits
This one is a quote from Eike and is here just as a reminder:
if newObjects and dirtyObjects grow because of a very large
transaction, it might be necessary to "swap out" those objects to
prevent OOMEs (or transmit them to the server, or even store them
temporarily in the store).
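One way to sketch such a swap-out: keep dirty objects on the heap up to a threshold, then spill further ones to a temporary file. This is only an illustration of the idea, not CDO code; SpillingDirtySet and its members are invented names:

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Sketch: a dirty-object set that spills to disk above a threshold,
// so a huge transaction cannot exhaust the heap.
class SpillingDirtySet<T extends Serializable> {
    private final int threshold;
    private final List<T> inMemory = new ArrayList<>();
    private File spillFile;
    private ObjectOutputStream spillOut;
    private int spilled;

    SpillingDirtySet(int threshold) { this.threshold = threshold; }

    void add(T object) throws IOException {
        if (inMemory.size() < threshold) {
            inMemory.add(object); // still room on the heap
        } else {
            if (spillOut == null) {
                spillFile = File.createTempFile("dirty", ".bin");
                spillFile.deleteOnExit();
                spillOut = new ObjectOutputStream(new FileOutputStream(spillFile));
            }
            spillOut.writeObject(object); // serialize instead of retaining
            spilled++;
        }
    }

    int size() { return inMemory.size() + spilled; }
    int inMemorySize() { return inMemory.size(); }
}

public class SpillDemo {
    public static void main(String[] args) throws IOException {
        SpillingDirtySet<String> dirty = new SpillingDirtySet<>(100);
        for (int i = 0; i < 250; i++) {
            dirty.add("object-" + i);
        }
        System.out.println(dirty.size());         // 250
        System.out.println(dirty.inMemorySize()); // 100
    }
}
```

On commit, the spilled portion would be streamed back (or, as suggested above, streamed to the server or the store directly) instead of ever being fully resident.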
So, the discussion is open :-)
Cheers,
Stefan
Re: [CDO] Some thoughts on enhancements [message #425314 is a reply to message #425311]
Sat, 22 November 2008 12:28
Stefan,
Comments below...
Stefan Winkler schrieb:
> Hi,
>
> during and around ESE a few ideas about CDO enhancements popped into
> my mind.
> I'm interested in feedback. How do you think of the following issues:
>
> 1. Serverless operation
>
> I'm a user of CDO's large-model-handling capabilities. I like the way,
> CDO materializes just those parts of a model in memory which are
> currently needed and makes it possible that the rest is GC'd. I don't
> use the multi-user/multi-client features. This is why I only have one
> session and use the JVM connector.
> However, I wondered, if it would be possible to eliminate the Net4j
> layer completely in a way that the session is directly wired to a
> store. This would decrease communication and memory overhead which is
> caused by the cascaded revision caches (client- and server-side). So
> what do you think?
> Do the existing interfaces support connecting the session directly to
> the internal-container-"server-side"-revision cache?
This requires a deep refactoring of the CDO internals, but I also think
this is a good idea!
> If this would work, maybe CDO would be an excellent back-end for the
> E4 platform, which is more or less completely based on models if I
> understood that correctly at ESE.
Interesting idea and I already planned for a session with Boris ;-)
>
>
> 2. Query abstraction
>
> I don't like two things in the current backend implementation:
You really mean the *DBStore* back-end!
> a) Mapping Strategies pass around SQL where-parts as strings and
What's bad about this?
It could and should be changed with only local impact...
> b) Queries are passed around as query-language specific strings.
>
> Both issues could be addressed by an abstract query, which is an
> object that centrally represents a query. Maybe this could be handled
> in the way regular expressions are handled.
> - The user creates a query in a supported query language
> - The query abstraction layer compiles the query into the abstract
> representation
> - At the store backend, the query is translated to the store-native
> representation
>
> So instead of
>
> Client -----{OCL}-----> Server ------{OCL}-------> Store [parses and
> translates to e.g.SQL]----> DB/MEM/...
>
> we would have
>
> Client --{OCL}-> AbstractQuery(AQ) ---> Server ---{AQ}--> Store
> [translates to e.g.SQL]---->DB/MEM/...
>
> If this is implemented in an efficient way, also store-internal
> queries (like the above-mentioned where-parts) could be replaced by AQ
> representations.
I don't see much value (which does not necessarily imply there is
none!) in producing additional translations for languages that are
back-end specific anyway.
But I think a common (i.e. back-end independent) query language like OCL
or XPath is a good idea. We already have a bugzilla for this:
245658: [Query] Provide OCL query language
https://bugs.eclipse.org/bugs/show_bug.cgi?id=245658
>
>
> 3. More intelligent mapping of references in DBStore
>
> Currently, MultiReferences are stored in (source_id, version, index,
> target_id) tuples.
> If in 1000 iterations an object O is added a reference and committed
> as in
> for(i in 0..999) { o.getRef().add(foo[i]); commit; }
> this would result in 1+2+3+4+5+6+...+999 rows in total. This does not
> scale well, as revisions with a large number of references do need
> much longer to be written -- even if the reference list has not been
> changed.
This is a known issue of the DBStore ;-(
I fully agree that we should have a more scalable option, too.
>
> There was the idea of converting the reference list to a string
> representation and store this like an attribute. However, depending on
> the database system used, this results in a similar overhead, plus the
> representation has to be created and parsed. (And BTW it violates the
> basic principle of atomic DB rows - or 1NF in short).
>
> My alternative idea would be to express the reference table as
> (source_id, version_created, version_revised, index, target_id).
> version_created would be the source version, for which the reference
> has been created.
> version_revised would be 0 if the reference still exists and if not
> ==0 it has been revised with the
> creation of the given version. This would lead to no operation if the
> references are unchanged,
> O(1) if a reference is appended and only O(n) if a reference is
> removed (and the subsequent indices are
> updated).
That sounds like a good idea.
Would you agree that it's better to add a general, non-auditing mode to
the DBStore first?
>
>
> 4. Solution for OutOfMemoryErrors during huge commits
>
> This one is a quote of Eike and is here just as a reminder.
> If newObjects and dirtyObjects grow, because of a very large
> transaction, it might be necessary to "swap out" those objects to
> prevent OOME-s. (or transmitted to the server, or even temporarily
> stored into the store).
Nothing to add from my side :P
>
>
> So, discussion is opened :-)
>
> Cheers,
> Stefan
Cheers
/Eike
----
http://www.esc-net.de
http://thegordian.blogspot.com
http://twitter.com/eikestepper
Re: [CDO] Some thoughts on enhancements [message #425315 is a reply to message #425311]
Sat, 22 November 2008 12:40
Simon Mc Duff    Messages: 596    Registered: July 2009
Senior Member
Stefan Winkler wrote:
> Hi,
>
> during and around ESE a few ideas about CDO enhancements popped into my
> mind.
> I'm interested in feedback. How do you think of the following issues:
>
> 1. Serverless operation
>
> I'm a user of CDO's large-model-handling capabilities. I like the way,
> CDO materializes just those parts of a model in memory which are
> currently needed and makes it possible that the rest is GC'd. I don't
> use the multi-user/multi-client features. This is why I only have one
> session and use the JVM connector.
> However, I wondered, if it would be possible to eliminate the Net4j
> layer completely in a way that the session is directly wired to a store.
> This would decrease communication and memory overhead which is caused by
> the cascaded revision caches (client- and server-side). So what do you
> think?
> Do the existing interfaces support connecting the session directly to
> the internal-container-"server-side"-revision cache? If this would work,
> maybe CDO would be an excellent back-end for the E4 platform, which is
> more or less completely based on models if I understood that correctly
> at ESE.
>
>
I don't see how it can be done without extra work each time we create a
new signal. Did you measure the time spent in the JVM connector to see
the gain we would have?
Signals are used in many different places at the moment. We could maybe
use a strategy for all of them... It is feasible, I believe, with some
refactoring... but I would like to measure the gain first!
> 2. Query abstraction
>
> I don't like two things in the current backend implementation:
> a) Mapping Strategies pass around SQL where-parts as strings and
> b) Queries are passed around as query-language specific strings.
>
> Both issues could be addressed by an abstract query, which is an object
> that centrally represents a query. Maybe this could be handled in the
> way regular expressions are handled.
> - The user creates a query in a supported query language
> - The query abstraction layer compiles the query into the abstract
> representation
> - At the store backend, the query is translated to the store-native
> representation
>
> So instead of
>
> Client -----{OCL}-----> Server ------{OCL}-------> Store [parses and
> translates to e.g.SQL]----> DB/MEM/...
>
> we would have
>
> Client --{OCL}-> AbstractQuery(AQ) ---> Server ---{AQ}--> Store
> [translates to e.g.SQL]---->DB/MEM/...
>
> If this is implemented in an efficient way, also store-internal queries
> (like the above-mentioned where-parts) could be replaced by AQ
> representations.
>
In general, it is a good idea to have objects for queries... but I
didn't want to create a query framework!! It can get very complicated,
so I prefer to concentrate on requirements/solutions.
Can you formulate the requirements/problems the current approach causes
for end users? Then we could address them. Maybe it will lead to what
you described.
>
> 3. More intelligent mapping of references in DBStore
>
> Currently, MultiReferences are stored in (source_id, version, index,
> target_id) tuples.
> If in 1000 iterations an object O is added a reference and committed as in
> for(i in 0..999) { o.getRef().add(foo[i]); commit; }
> this would result in 1+2+3+4+5+6+...+999 rows in total. This does not
> scale well, as revisions with a large number of references do need much
> longer to be written -- even if the reference list has not been changed.
>
> There was the idea of converting the reference list to a string
> representation and store this like an attribute. However, depending on
> the database system used, this results in a similar overhead, plus the
> representation has to be created and parsed. (And BTW it violates the
> basic principle of atomic DB rows - or 1NF in short).
>
> My alternative idea would be to express the reference table as
> (source_id, version_created, version_revised, index, target_id).
> version_created would be the source version, for which the reference has
> been created.
> version_revised would be 0 if the reference still exists and if not ==0
> it has been revised with the
> creation of the given version. This would lead to no operation if the
> references are unchanged,
> O(1) if a reference is appended and only O(n) if a reference is removed
> (and the subsequent indices are
> updated).
Did you think of keeping only the last version and using inverse deltas
to go back to a specific version? That would eliminate most of the
problems you have.
>
>
> 4. Solution for OutOfMemoryErrors during huge commits
>
> This one is a quote of Eike and is here just as a reminder.
> If newObjects and dirtyObjects grow, because of a very large
> transaction, it might be necessary to "swap out" those objects to
> prevent OOME-s. (or transmitted to the server, or even temporarily
> stored into the store).
>
Yes, this is a common strategy used by databases when too many objects
are modified. It is an excellent idea!
>
> So, discussion is opened :-)
>
> Cheers,
> Stefan
Re: [CDO] Some thoughts on enhancements [message #425360 is a reply to message #425359]
Sat, 22 November 2008 12:56
Tom,
I'd appreciate it if you'd participate in our telecon. Ed and Simon
will also be on board ;-)
I'll keep you informed about the schedule...
Cheers
/Eike
----
http://thegordian.blogspot.com
Tom Schindl schrieb:
> Eike Stepper schrieb:
>
>> Stefan,
>>
>> Comments below...
>>
>>
>>
>> Stefan Winkler schrieb:
>>
>>> Hi,
>>>
>>> during and around ESE a few ideas about CDO enhancements popped into
>>> my mind.
>>> I'm interested in feedback. How do you think of the following issues:
>>>
>>> 1. Serverless operation
>>>
>>> I'm a user of CDO's large-model-handling capabilities. I like the way,
>>> CDO materializes just those parts of a model in memory which are
>>> currently needed and makes it possible that the rest is GC'd. I don't
>>> use the multi-user/multi-client features. This is why I only have one
>>> session and use the JVM connector.
>>> However, I wondered, if it would be possible to eliminate the Net4j
>>> layer completely in a way that the session is directly wired to a
>>> store. This would decrease communication and memory overhead which is
>>> caused by the cascaded revision caches (client- and server-side). So
>>> what do you think?
>>> Do the existing interfaces support connecting the session directly to
>>> the internal-container-"server-side"-revision cache?
>>>
>> This requires a deep refactoring of the CDO internals, but I also think
>> this is a good idea!
>>
>>
>>> If this would work, maybe CDO would be an excellent back-end for the
>>> E4 platform, which is more or less completely based on models if I
>>> understood that correctly at ESE.
>>>
>> Interesting idea and I already planned for a session with Boris ;-)
>>
>>
>
> As one of the guys on the E4 Modeled Workbench team, I'm also
> interested in taking part in this discussion. I already had this idea
> some time ago but had no time to implement it (there have been more
> important working areas). It fits my vision of RCP application
> development, and of collaborative RCP application development and
> deployment, in the future.
>
> Think about what developing and deploying would look like if not only
> the workbench itself were modeled using Ecore, but also the rest of the
> UI you are currently looking at (the vision of a live DOM backing the
> declaratively defined UI) :-)
>
> Tom
>
>