Eclipse Community Forums
[CDO] Random access of long, flat lists of large data [message #1277136] Tue, 25 March 2014 14:54
Matt Mursko
Hi all,

Our application requires storing a long, flat list of large CLOB data items and retrieving them in random-access fashion. Each CLOB is typically several MB. To reduce client overhead when dealing with long, flat lists, we decided to store each CLOB in its own dedicated CDOResource, all of which are contained in a parent CDOResourceFolder. Each CLOB is uniquely identifiable to the client, so we've built our resource URIs as follows:
String path = "uri://repo/folder/${UUID}";

Our theory was that, in order to read or write a CLOB, a client could simply use the well-known URI to get the resource and operate on it:
CDOView.hasResource(path) and CDOView.getResource(path).

For small numbers of data items, this works well. One of our larger data sets contains approximately 4500 CLOB data item resources. For data sets of this size, we began to notice a lot of processing that we had hoped to avoid by using dedicated resources.

On the client side, we see the expected tracing:
AbstractCDOView.hasResource(String)
AbstractCDOView.getResourceNodeID(String) -- this method effectively splits the URI path into segments and begins to load the parent CDOResourceFolder.
...
AbstractCDOView.getLocalRevision(CDOID)
CDOViewImpl.getRevision(CDOID, true)

The server then starts to retrieve the revision of the CDOResourceFolder. The features of the folder are loaded, which in our case is the list of ~4500 resources, exactly the list processing overhead we were attempting to avoid. The initial load of this CDOResourceFolder is especially expensive in terms of time. Once the data is cached, subsequent accesses are much faster. This behavior is somewhat obvious to us now, given that a CDOResource and CDOResourceFolder are CDOObjects themselves.

Do you know of anything we could do to make our approach more performant, other than priming the cache? Is there a way to truly randomly access a given CDOResource, without having to iterate over the list of siblings? Do you have any recommendations for best practices for random access of long, flat lists of large data objects outside of the implementation we've chosen?
Re: [CDO] Random access of long, flat lists of large data [message #1277219 is a reply to message #1277136] Tue, 25 March 2014 17:27
Erdal Karaca
Sounds like you have rewritten what CDO already supports...
Did you have a look at the test cases?
For example:
org.eclipse.emf.cdo.tests.ResourceTest.testTextResource()

Re: [CDO] Random access of long, flat lists of large data [message #1277714 is a reply to message #1277136] Wed, 26 March 2014 11:09
Matt Mursko
Hi Erdal,

Thank you for the reply; I did not know about CDOTextResource. Unfortunately, I do not think this will help with the behavior we are seeing with a large, flat list of CDOResources inside a CDOResourceFolder. It appears that AbstractCDOView.getTextResource() follows the same call stack as getResource(), and the retrieval of a single CDOResource from the CDOResourceFolder is causing our problems. We are seeing the entire list of CDOResources loaded by the client.

The segment of code that does this is here:
protected synchronized CDOID getResourceNodeID(CDOID folderID, String name)

...

    InternalCDORevision folderRevision = getLocalRevision(folderID);
    EClass resourceFolderClass = EresourcePackage.eINSTANCE.getCDOResourceFolder();
    if (folderRevision.getEClass() != resourceFolderClass)
    {
      throw new CDOException(MessageFormat.format(Messages.getString("CDOViewImpl.4"), folderID)); //$NON-NLS-1$
    }

    EReference nodesFeature = EresourcePackage.eINSTANCE.getCDOResourceFolder_Nodes();
    EAttribute nameFeature = EresourcePackage.eINSTANCE.getCDOResourceNode_Name();

    CDOList list;
    boolean bypassPermissionChecks = folderRevision.bypassPermissionChecks(true);

    try
    {
      list = folderRevision.getList(nodesFeature);
    }
    finally
    {
      folderRevision.bypassPermissionChecks(bypassPermissionChecks);
    }

    CDOStore store = getStore();
    int size = list.size();
    for (int i = 0; i < size; i++)
    {
      Object value = list.get(i);
      value = store.resolveProxy(folderRevision, nodesFeature, i, value);

      CDOID childID = (CDOID)convertObjectToID(value);
      InternalCDORevision childRevision = getLocalRevision(childID);
      String childName = (String)childRevision.get(nameFeature, 0);
      if (name.equals(childName))
      {
        return childID;
      }
    }

...


As you can see, the client CDOView will iterate over each CDOResource child from the store, looking for a name match. This becomes problematic in our use case of ~4,500 CDOResources within the CDOResourceFolder.

We are looking for a mechanism to retrieve a known CDOResource from a CDOResourceFolder directly from the server, without loading all of the child CDOResources in the client and then iterating through them.
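
To make the cost of the name scan concrete, here is a minimal, self-contained sketch that models only the shape of the loop quoted above. It does not use CDO at all: the class FlatFolderScan, its method indexOfChild, and the "item-N" names are hypothetical stand-ins, and each inspected entry corresponds roughly to one child revision lookup in the real code.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a flat folder whose children are found by a linear name scan. */
final class FlatFolderScan {
    /** Number of child entries inspected by the most recent lookup. */
    static int inspected;

    /** Returns the index of the child with the given name, or -1 if absent. */
    static int indexOfChild(List<String> childNames, String name) {
        inspected = 0;
        for (int i = 0; i < childNames.size(); i++) {
            inspected++; // stands in for one child revision lookup
            if (name.equals(childNames.get(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical folder with 4500 children named item-0 .. item-4499.
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 4500; i++) {
            names.add("item-" + i);
        }

        indexOfChild(names, "item-0");
        System.out.println("first child:   inspected " + inspected);

        indexOfChild(names, "item-4499");
        System.out.println("last child:    inspected " + inspected);

        indexOfChild(names, "no-such-resource");
        System.out.println("missing child: inspected " + inspected);
    }
}
```

In this model a lookup for a missing name, or for the last child, touches every entry, which matches what we observe for hasResource() on a folder of this size.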

Thanks,
Matt
Re: [CDO] Random access of long, flat lists of large data [message #1277733 is a reply to message #1277714] Wed, 26 March 2014 11:38
Eike Stepper
On 26.03.2014 12:09, Matt Mursko wrote:
> Hi Erdal,
>
> Thank you for the reply; I did not know about CDOTextResource. Unfortunately, I do not think this will help the
> behavior we are seeing with a large, flat list of CDOResources inside a CDOResourceFolder. It appears that
> AbstractCDOView.getTextResource() follows the same call stack as getResource(), and the retrieval of a single
> CDOResource from the CDOResourceFolder is causing our problems. We are seeing the entire list of CDOResources loaded
> by the client:
Yes, a CDOTextResource is almost identical to a normal model resource that contains an EObject with a CDOClob attribute.
You just save one object but lose the ability to add other structural features.

What you're probably after is CDO's partial collection loading:

CDOSession session = openSession();
session.options().setCollectionLoadingPolicy(CDOUtil.createCollectionLoadingPolicy(10, 10));

This is one of CDO's advanced options, and it may not work properly with other advanced functions such as
the new undo detection in 4.3.
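
As a rough illustration of what chunked loading buys in a name scan, the following self-contained sketch models a list whose elements are fetched in chunks on demand. It is a toy model only and not how CDO implements its collection loading policy: the class ChunkedList, its fields, and the "remote" function are hypothetical, and the two size parameters merely mirror the two arguments passed to createCollectionLoadingPolicy above.

```java
import java.util.function.IntFunction;

/** Toy model of a remote list that is fetched in chunks instead of all at once. */
final class ChunkedList {
    private final String[] elements;          // null means "not loaded yet"
    private final int resolveChunkSize;
    private final IntFunction<String> remote; // stands in for a server request
    private int loadedCount;
    private int requests;

    ChunkedList(int size, int initialChunkSize, int resolveChunkSize, IntFunction<String> remote) {
        if (size < 0 || initialChunkSize < 0 || resolveChunkSize < 1) {
            throw new IllegalArgumentException("invalid sizes");
        }
        this.elements = new String[size];
        this.resolveChunkSize = resolveChunkSize;
        this.remote = remote;
        load(0, initialChunkSize); // the first chunk arrives with the container
    }

    /** Loads up to count elements starting at from; counts as one request. */
    private void load(int from, int count) {
        int to = Math.min(elements.length, from + count);
        for (int i = from; i < to; i++) {
            if (elements[i] == null) {
                elements[i] = remote.apply(i); // assumed to return non-null
                loadedCount++;
            }
        }
        requests++;
    }

    String get(int index) {
        if (elements[index] == null) {
            load(index, resolveChunkSize); // fetch the next chunk on demand
        }
        return elements[index];
    }

    /** Linear scan by name; stops at the first match, so later chunks stay unloaded. */
    int indexOf(String name) {
        for (int i = 0; i < elements.length; i++) {
            if (name.equals(get(i))) {
                return i;
            }
        }
        return -1;
    }

    int loadedCount() { return loadedCount; }
    int requests() { return requests; }

    public static void main(String[] args) {
        ChunkedList list = new ChunkedList(4500, 10, 10, i -> "item-" + i);
        int index = list.indexOf("item-25");
        System.out.println("found at " + index + ", loaded " + list.loadedCount()
            + " elements in " + list.requests() + " requests");
    }
}
```

In this model an early match leaves most of the list unloaded, but a late or missing name still walks every chunk, at the price of many small requests. Chunking therefore helps the initial load more than it helps the worst-case lookup.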

If possible, it is preferable to avoid these large flat lists. You may want to have a look at
CDOBalancedTree.addObject(), which helps to conveniently build deeper structures of objects. Please also note that you
can use CDOView.createQuery() to find entries in this tree without navigating it and thereby loading container
objects with large lists.
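
One hand-rolled way to get a deeper structure, sketched here under assumptions and without using CDOBalancedTree, is to derive intermediate folder names from the leading hex digits of each UUID, so that no single folder holds thousands of children. The class ShardedPaths, the method pathFor, and the root name "folder" are hypothetical; the sketch only builds path strings and leaves creating the folders and resources to the application.

```java
import java.util.UUID;

/** Builds nested resource paths so that no single folder holds every item. */
final class ShardedPaths {

    /**
     * Returns "/root/aa/bb/.../uuid", where each intermediate segment is the
     * next pair of hex digits of the UUID. With levels == 1 there are at most
     * 256 subfolders under the root.
     */
    static String pathFor(String root, UUID id, int levels) {
        if (levels < 0 || levels > 16) {
            throw new IllegalArgumentException("levels must be between 0 and 16");
        }
        String hex = id.toString().replace("-", ""); // 32 hex digits
        StringBuilder path = new StringBuilder("/").append(root);
        for (int i = 0; i < levels; i++) {
            path.append('/').append(hex, 2 * i, 2 * i + 2);
        }
        return path.append('/').append(id).toString();
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        System.out.println(pathFor("folder", id, 1));
        System.out.println(pathFor("folder", id, 2));
    }
}
```

With one level and about 4500 randomly generated UUIDs, each subfolder would hold roughly 18 resources on average, so a name scan per path segment touches at most 256 entries in the root plus a handful in the subfolder. This assumes the leading UUID digits are evenly distributed, which is reasonable for random UUIDs.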

Cheers
/Eike

----
http://www.esc-net.de
http://thegordian.blogspot.com
http://twitter.com/eikestepper




