Eclipse Community Forums: EMF "Technology" (Ecore Tools, EMFatic, etc)

teneo and volume data [message #60972]

Sun, 12 November 2006 17:55

Andre Pareis

Hi,

I'm using Teneo to persist fairly large models, a million objects or more.
The idea is to use the EMF editor to make some small changes to the data and
then invoke save on the editor so that the resource/Hibernate writes the
changes to the database.

However, I noticed that when the save operation starts, the model is read
completely from the DB. This will not be feasible for me in the future, when I
have more data in the system than just test data.

What is happening when I run save? Is it the validator that does not
distinguish between fully loaded objects and proxies? Is it the CASCADE='all'
(which includes 'save')?

Can you give me some advice on what I can/have to do in order to make Teneo work
for larger models?

Thanks
Andre

Re: teneo and volume data [message #61065 is a reply to message #60972]

Mon, 13 November 2006 08:50

Martin Taal

Hi Andre,
It is the validator. At save time the validateContents method in HibernateResource (or actually its
superclass StoreResource) is called. This method calls StoreResource.validateObject(EObject
eObject), which uses the Diagnostician.INSTANCE.validate(eObject) method. That method in turn
eventually calls doValidateContents in the Diagnostician, which calls eObject.eContents(), iterates
over the contents and recursively calls validate on each child. That iteration is what pulls the
content in from the database.
To prevent everything from being loaded, a non-loading validator is needed. However, not loading
an EList can let through instances that are invalid against the schema/model (e.g. if minOccurs>0).

This is a missing feature. Please enter a Bugzilla and I will add a non-resolving Diagnostician.

EReferences which have minOccurs>0 will then always be loaded to prevent invalid objects. This
works most of the time, as in my experience most maxOccurs=unbounded relations have minOccurs=0 (but
not always).
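
The difference between the resolving and the non-resolving approach can be sketched in plain Java.
This is only an illustration under stated assumptions, not Teneo or EMF code: LazyChildren, Node,
lazyTree and the load counter are hypothetical stand-ins for a lazily loaded EList, an EObject and
the database round trip.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class NonResolvingValidationSketch {

    /** Hypothetical stand-in for a lazily loaded containment list. */
    static final class LazyChildren {
        private final Supplier<List<Node>> loader;
        private List<Node> loaded; // stays null until the list is first accessed

        LazyChildren(Supplier<List<Node>> loader) { this.loader = loader; }

        boolean isLoaded() { return loaded != null; }

        List<Node> get() {
            if (loaded == null) {
                loaded = loader.get(); // simulates the database round trip
            }
            return loaded;
        }
    }

    /** Hypothetical stand-in for a model object with one containment reference. */
    static final class Node {
        final String name;
        final int minOccurs;
        final LazyChildren children;

        Node(String name, int minOccurs, LazyChildren children) {
            this.name = name;
            this.minOccurs = minOccurs;
            this.children = children;
        }
    }

    /** Builds a tree whose child lists are only created (and counted) on access. */
    static Node lazyTree(String name, int depth, int fanOut, int minOccurs, int[] loadCounter) {
        Supplier<List<Node>> loader = () -> {
            loadCounter[0]++;
            List<Node> kids = new ArrayList<>();
            if (depth > 0) {
                for (int i = 0; i < fanOut; i++) {
                    kids.add(lazyTree(name + "." + i, depth - 1, fanOut, minOccurs, loadCounter));
                }
            }
            return kids;
        };
        return new Node(name, minOccurs, new LazyChildren(loader));
    }

    static void checkLowerBound(Node node, List<Node> kids, List<String> problems) {
        if (kids.size() < node.minOccurs) {
            problems.add(node.name + ": needs at least " + node.minOccurs + " child(ren)");
        }
    }

    /** Resolving validation: always walks the contents, so every list gets loaded. */
    static void validateResolving(Node node, List<String> problems) {
        List<Node> kids = node.children.get();
        checkLowerBound(node, kids, problems);
        for (Node kid : kids) {
            validateResolving(kid, problems);
        }
    }

    /** Non-resolving validation: skips lists that are unloaded and optional. */
    static void validateNonResolving(Node node, List<String> problems) {
        if (!node.children.isLoaded() && node.minOccurs == 0) {
            return; // never touched and minOccurs=0: skip without loading
        }
        List<Node> kids = node.children.get(); // loaded already, or required by minOccurs>0
        checkLowerBound(node, kids, problems);
        for (Node kid : kids) {
            validateNonResolving(kid, problems);
        }
    }

    public static void main(String[] args) {
        int[] loads = {0};
        validateResolving(lazyTree("root", 2, 3, 0, loads), new ArrayList<>());
        System.out.println("resolving validation, loads: " + loads[0]);

        loads[0] = 0;
        Node root = lazyTree("root", 2, 3, 0, loads);
        root.children.get(); // the editor touched exactly one list
        validateNonResolving(root, new ArrayList<>());
        System.out.println("non-resolving validation, loads: " + loads[0]);
    }
}
```

In this sketch a list with minOccurs>0 is still loaded during validation, which mirrors the
trade-off described above: only optional, untouched lists are skipped.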

gr. Martin

--

With Regards, Martin Taal

Springsite/Elver.org
Office: Hardwareweg 4, 3821 BV Amersfoort
Postal: Nassaulaan 7, 3941 EC Doorn
The Netherlands
Tel: +31 (0)84 420 2397
Fax: +31 (0)84 225 9307
Mail: mtaal@springsite.com - mtaal@elver.org
Web: www.springsite.com - www.elver.org

Re: teneo and volume data [message #61136 is a reply to message #61065]

Mon, 13 November 2006 22:15

Andre Pareis

Hi Martin,

thank you for the explanation. Now at least I know what happens at save time and
can do something about it.

I am well aware of the possible negative outcome if an invalid model is stored
in a database schema that expects valid data and enforces this through SQL
constraints added to the DB schema.

However, I am a friend of a somewhat different way of thinking: I prefer to give
the OO model the lead over the database. The relational database is IMO purely
durable storage for my model. Validation, constraint checking (for instance
using OCL) etc. happen against the model only. The DB is good at storing and
retrieving data, but that's it. Following this paradigm, I hardly ever use
minOccurs>0, as this would require storing only "complete" models. But the
models I want to store should go through to the persistent store, no matter
whether they are complete or not. The model is refined incrementally until it is
complete, but that can take time and many DB sessions. That's why I would prefer
a database that applies no constraints at all, but I do know that most people's
thinking is exactly the other way around (the DB is in the lead), so I try my
best to stay on my little island by just not using minOccurs>0 :).

But, anyhow, a non-resolving validation concept would be pretty cool given the
amount of data I want to store. Or at least some pluggability, or the ability to
switch validation off completely.

I think I can already accomplish that today by overriding the
validateContents() method. I will give that a try. If I do it this way,
no action is required from your side and we can skip the bug report,
unless you want to provide the non-resolving validator for performance reasons;
in that case I can of course file it. Just drop me another line and I will do it.

Thanks
Andre

Re: teneo and volume data [message #61184 is a reply to message #61136]

Mon, 13 November 2006 22:41

Martin Taal

Hi Andre,
I have already solved this (with non-loading validation) and checked it into CVS (there was also
another bug which caused unnecessary loads). If you get the latest version (of most plugins) you can
try it out.
By overriding the validateContents method you can prevent the whole validation from occurring, which
saves some time when saving.
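
As a rough illustration of that override, here is a plain-Java sketch. It does not use the Teneo
classes: SketchStoreResource is a hypothetical stand-in for StoreResource, and the signature of
validateContents shown here is an assumption that should be checked against the Teneo sources.

```java
import java.util.ArrayList;
import java.util.List;

class OverrideValidationDemo {

    /** Hypothetical stand-in for a store-backed resource; not Teneo's StoreResource. */
    static class SketchStoreResource {
        final List<String> contents = new ArrayList<>();
        final List<String> log = new ArrayList<>();

        /** Save validates first and then persists, as described in this thread. */
        final void save() {
            validateContents();
            log.add("persisted " + contents.size() + " object(s)");
        }

        /** Default behaviour: visit every object, which is the expensive part. */
        protected void validateContents() {
            for (String object : contents) {
                log.add("validated " + object);
            }
        }
    }

    /** Subclass that opts out of save-time validation entirely. */
    static class NoValidationResource extends SketchStoreResource {
        @Override
        protected void validateContents() {
            // Intentionally empty: nothing is visited, so nothing has to be loaded.
        }
    }

    public static void main(String[] args) {
        SketchStoreResource resource = new NoValidationResource();
        resource.contents.add("a");
        resource.contents.add("b");
        resource.save();
        System.out.println(resource.log);
    }
}
```

With Teneo itself the same idea would mean subclassing HibernateResource and making sure the
editor's resource set creates that subclass. The price is that no model constraint is checked at
save time any more.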

I think your island is not so small :-). The topic you touch on is interesting (where to apply/maintain
constraints). Although DB constraints are good from an ACID point of view, my main issue with DB
constraints is that it is difficult to translate an SQL exception into something meaningful in the
application/UI layer. In addition, there are business rules which are difficult to model in a DB
schema. In the end this means that most constraints need to be checked in the higher app layers anyway.
And as you point out, relational DB schemas are restrictive for flexible models. For my part, I am
very interested in combining structured data (like a product model) with more unstructured data (the
content of the product manual, for example), whereby a relational DB is more suited to the first
while an XML-like DB is more suited to the second.

gr. Martin
