[CDO] Random access of long, flat lists of large data [message #1277136]
Tue, 25 March 2014 14:54
Matt Mursko Messages: 5 Registered: October 2012
Junior Member
Hi all,
Our application needs to store a long, flat list of large CLOB data items and retrieve them in a random-access fashion. These CLOBs are typically several MB each. To minimize client overhead when dealing with long, flat lists, we decided to store each CLOB in its own dedicated CDOResource, all contained in a single parent CDOResourceFolder. Each CLOB is uniquely identifiable to the client, so we've built our resource URIs as follows:
String path = "uri://repo/folder/${UUID}";
Our theory was that, to read or write a CLOB, a client can simply use the well-known URI to get the resource and operate on it:
CDOView.hasResource(path) and CDOView.getResource(path).
For small numbers of data items, this works well. However, one of our larger data sets contains approximately 4500 CLOB resources, and for data sets of this size we began to see much of the processing we had hoped to avoid by using dedicated resources.
On the client side, we see the expected tracing:
AbstractCDOView.hasResource(String)
AbstractCDOView.getResourceNodeID(String) -- this method effectively splits the URI path into segments and begins to load the parent CDOResourceFolder.
...
AbstractCDOView.getLocalRevision(CDOID)
CDOViewImpl.getRevision(CDOID, true)
The server then retrieves the revision of the CDOResourceFolder. The folder's features are loaded, which in our case means the list of ~4500 resources: exactly the list-processing overhead we were trying to avoid. The initial load of this CDOResourceFolder is especially expensive; once the data is cached, subsequent accesses are much faster. In hindsight this behavior is not surprising, since CDOResource and CDOResourceFolder are CDOObjects themselves.
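To make the cost concrete, here is a minimal Java sketch of a path lookup against a flat folder. FlatFolderLookupSketch, Node, resolve() and the inspected counter are hypothetical stand-ins, not CDO classes; the model only assumes what the trace above suggests, namely that the path is split into segments and each folder's children are examined one by one until a name matches, with every examined child counted as one load.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a resource tree (NOT CDO classes): each folder keeps a
// flat list of children, and a lookup must read each child's name to compare.
class FlatFolderLookupSketch {

  static final class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name) {
      this.name = name;
    }
  }

  // Children inspected so far; stands in for "child revisions loaded".
  static int inspected = 0;

  // Resolves a path such as "/folder/uuid-42" one segment at a time.
  static Node resolve(Node root, String path) {
    Node current = root;
    for (String segment : path.split("/")) {
      if (segment.isEmpty()) {
        continue; // skip the empty segment produced by a leading slash
      }
      Node match = null;
      for (Node child : current.children) {
        inspected++; // the child must be loaded before its name can be read
        if (segment.equals(child.name)) {
          match = child;
          break;
        }
      }
      if (match == null) {
        return null; // no node with that name in this folder
      }
      current = match;
    }
    return current;
  }

  // Builds root -> "folder" -> size flat children named uuid-0 .. uuid-(size-1).
  static Node buildFlatFolder(int size) {
    Node root = new Node("");
    Node folder = new Node("folder");
    root.children.add(folder);
    for (int i = 0; i < size; i++) {
      folder.children.add(new Node("uuid-" + i));
    }
    return root;
  }

  public static void main(String[] args) {
    Node root = buildFlatFolder(4500);
    inspected = 0;
    Node hit = resolve(root, "/folder/uuid-4499");
    // prints "uuid-4499 found after inspecting 4501 nodes"
    System.out.println(hit.name + " found after inspecting " + inspected + " nodes");
  }
}
```

In this model, finding the last of 4500 siblings inspects 4501 nodes (one for folder, 4500 for its children), whereas the first sibling costs only 2; on a cold cache the cost grows linearly with the number of siblings.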
Do you know of anything we could do to make our approach perform better, other than priming the cache? Is there a way to truly randomly access a given CDOResource without iterating over the list of its siblings? And do you have any recommended best practices for random access of long, flat lists of large data objects, other than the approach we've chosen?
Re: [CDO] Random access of long, flat lists of large data [message #1277219 is a reply to message #1277136]
Tue, 25 March 2014 17:27
Erdal Karaca Messages: 854 Registered: July 2009
Senior Member
Sounds like you have rewritten what CDO already supports...
Did you have a look at the test cases?
For example:
org.eclipse.emf.cdo.tests.ResourceTest.testTextResource()
Re: [CDO] Random access of long, flat lists of large data [message #1277714 is a reply to message #1277136]
Wed, 26 March 2014 11:09
Matt Mursko Messages: 5 Registered: October 2012
Junior Member
Hi Erdal,
Thank you for the reply; I did not know about CDOTextResource. Unfortunately, I don't think it will help with the behavior we are seeing with a large, flat list of CDOResources inside a CDOResourceFolder. AbstractCDOView.getTextResource() appears to follow the same call stack as getResource(), and it is the retrieval of a single CDOResource from the CDOResourceFolder that causes our problems: we see the entire list of CDOResources being loaded by the client.
The code that does this is here:
protected synchronized CDOID getResourceNodeID(CDOID folderID, String name)
...
  InternalCDORevision folderRevision = getLocalRevision(folderID);
  EClass resourceFolderClass = EresourcePackage.eINSTANCE.getCDOResourceFolder();
  if (folderRevision.getEClass() != resourceFolderClass)
  {
    throw new CDOException(MessageFormat.format(Messages.getString("CDOViewImpl.4"), folderID)); //$NON-NLS-1$
  }

  EReference nodesFeature = EresourcePackage.eINSTANCE.getCDOResourceFolder_Nodes();
  EAttribute nameFeature = EresourcePackage.eINSTANCE.getCDOResourceNode_Name();

  CDOList list;
  boolean bypassPermissionChecks = folderRevision.bypassPermissionChecks(true);

  try
  {
    list = folderRevision.getList(nodesFeature);
  }
  finally
  {
    folderRevision.bypassPermissionChecks(bypassPermissionChecks);
  }

  CDOStore store = getStore();
  int size = list.size();
  for (int i = 0; i < size; i++)
  {
    Object value = list.get(i);
    value = store.resolveProxy(folderRevision, nodesFeature, i, value);

    CDOID childID = (CDOID)convertObjectToID(value);
    InternalCDORevision childRevision = getLocalRevision(childID);
    String childName = (String)childRevision.get(nameFeature, 0);
    if (name.equals(childName))
    {
      return childID;
    }
  }
...
As you can see, the client CDOView will iterate over each CDOResource child from the store, looking for a name match. This becomes problematic in our use case of ~4,500 CDOResources within the CDOResourceFolder.
We are looking for a mechanism to retrieve a known CDOResource from a CDOResourceFolder directly from the server, without loading all of the child CDOResources in the client and then iterating through them.
Thanks,
Matt
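Conceptually, the mechanism asked for here is a lookup by name that does not touch the siblings. The following hypothetical Java sketch (FolderNameIndexSketch and its methods are illustrative names, not a CDO API) shows the idea as a plain name-to-ID hash map; whoever owns such an index, presumably the server or a store-level query, would have to keep it in sync with every add and remove in the folder.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (NOT a CDO API): an index from child name to object ID,
// built once, so one child can be found without reading every sibling.
class FolderNameIndexSketch {

  private final Map<String, Long> idsByName = new HashMap<>();

  FolderNameIndexSketch(List<String> names, List<Long> ids) {
    if (names.size() != ids.size()) {
      throw new IllegalArgumentException("names and ids must have the same size");
    }
    for (int i = 0; i < names.size(); i++) {
      idsByName.put(names.get(i), ids.get(i));
    }
  }

  // Expected O(1): a hash lookup instead of a scan over all siblings.
  Long lookup(String name) {
    return idsByName.get(name); // null when no child has that name
  }

  // Convenience: n children named uuid-0 .. uuid-(n-1) with IDs 0 .. n-1.
  static FolderNameIndexSketch ofSize(int n) {
    List<String> names = new ArrayList<>();
    List<Long> ids = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      names.add("uuid-" + i);
      ids.add((long) i);
    }
    return new FolderNameIndexSketch(names, ids);
  }

  public static void main(String[] args) {
    FolderNameIndexSketch index = ofSize(4500);
    System.out.println("uuid-4499 -> " + index.lookup("uuid-4499"));
  }
}
```

The trade-off is that the index itself has to be built and maintained, which is only worthwhile where it can live next to the data rather than being rebuilt by each client.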
Re: [CDO] Random access of long, flat lists of large data [message #1277733 is a reply to message #1277714]
Wed, 26 March 2014 11:38
On 26.03.2014 12:09, Matt Mursko wrote:
> Hi Erdal,
>
> Thank you for the reply; I did not know about CDOTextResource. Unfortunately, I do not think this will help the
> behavior we are seeing with a large, flat list of CDOResources inside a CDOResourceFolder. It appears that
> AbstractCDOView.getTextResource() follows the same call stack as getResource(), and the retrieval of a single
> CDOResource from the CDOResourceFolder is causing our problems. We are seeing the entire list of CDOResources loaded
> by the client:
Yes, a CDOTextResource is almost identical to a normal model resource that contains an EObject with a CDOClob attribute.
You just save one object, but you lose the ability to add other structural features.
What you're probably after is CDO's partial collection loading:
CDOSession session = openSession();
session.options().setCollectionLoadingPolicy(CDOUtil.createCollectionLoadingPolicy(10, 10));
This is one of CDO's advanced options, and it may not work properly with other advanced features such as
the new undo detection in 4.3.
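Partial collection loading transfers a list in chunks rather than all at once. Assuming the two arguments of createCollectionLoadingPolicy(10, 10) are the initial chunk size and the chunk size fetched when an unloaded element is accessed, the following hypothetical Java model (ChunkedListSketch is not CDO code, and fetching forward from the missing index is an assumption of this sketch) shows what that buys for a linear name scan like the one in getResourceNodeID().

```java
// Hypothetical model (NOT CDO code) of chunked collection loading: list
// elements are copied from a "server" array to the client in chunks.
class ChunkedListSketch {

  private final long[] serverSide; // the full list as stored on the server
  private final Long[] loaded;     // client copy; null = not transferred yet
  private final int resolveChunkSize;
  int fetches = 0;                 // simulated server round trips

  ChunkedListSketch(long[] serverSide, int initialChunkSize, int resolveChunkSize) {
    if (resolveChunkSize < 1) {
      throw new IllegalArgumentException("resolveChunkSize must be >= 1");
    }
    this.serverSide = serverSide;
    this.loaded = new Long[serverSide.length];
    this.resolveChunkSize = resolveChunkSize;
    fetchRange(0, initialChunkSize); // initial chunk arrives with the owning object
  }

  private void fetchRange(int from, int count) {
    int to = Math.min(serverSide.length, from + count);
    for (int i = from; i < to; i++) {
      loaded[i] = serverSide[i];
    }
    fetches++;
  }

  int size() {
    return serverSide.length;
  }

  long get(int index) {
    if (loaded[index] == null) {
      // Assumption of this model: fetch forward starting at the missing index.
      fetchRange(index, resolveChunkSize);
    }
    return loaded[index];
  }

  // Linear scan, like a name lookup that compares every element in turn.
  int indexOf(long wanted) {
    for (int i = 0; i < size(); i++) {
      if (get(i) == wanted) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    long[] ids = new long[4500];
    for (int i = 0; i < ids.length; i++) {
      ids[i] = i;
    }
    ChunkedListSketch list = new ChunkedListSketch(ids, 10, 10);
    int position = list.indexOf(4499);
    System.out.println("found at " + position + " after " + list.fetches + " fetches");
  }
}
```

Scanning to the last of 4500 elements with chunk sizes of 10 takes 450 simulated fetches (the initial chunk plus 449 resolve chunks), and a match at index 25 needs only 3. In this model chunking batches and defers the transfer, but a scan that has to reach the end still touches every element.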
If possible, it's preferable to avoid such large, flat lists. You may want to have a look at
CDOBalancedTree.addObject(), which helps to conveniently build deeper structures of objects. Please also note that you
can use CDOView.createQuery() to find entries in this tree without navigating it and thereby loading container
objects with large lists.
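A simple way to get a deeper structure with plain resource folders is to derive intermediate folder names from the UUID itself. The Java helper below is a hypothetical sketch of that idea, not what CDOBalancedTree does internally; the /folder base path and the two-hex-digits-per-level scheme are assumptions chosen for illustration.

```java
import java.util.UUID;

// Hypothetical helper (NOT CDOBalancedTree): spreads UUID-named resources over
// nested folders named after leading hex digits of the UUID.
class ShardedPathSketch {

  // levels = 2 turns 3fa85f64-... into basePath/3f/a8/3fa85f64-...
  static String shardedPath(String basePath, UUID id, int levels) {
    String hex = id.toString().replace("-", ""); // 32 lowercase hex digits
    if (levels < 0 || 2 * levels > hex.length()) {
      throw new IllegalArgumentException("levels must be between 0 and 16");
    }
    StringBuilder path = new StringBuilder(basePath);
    for (int level = 0; level < levels; level++) {
      // two hex digits per level => at most 256 subfolders per folder
      path.append('/').append(hex, 2 * level, 2 * level + 2);
    }
    return path.append('/').append(id).toString();
  }

  public static void main(String[] args) {
    UUID id = UUID.fromString("3fa85f64-5717-4562-b3fc-2c963f66afa6");
    System.out.println(shardedPath("/folder", id, 1));
    System.out.println(shardedPath("/folder", id, 2));
  }
}
```

With one level, a folder holds at most 256 subfolders, and ~4500 random UUIDs average 4500 / 256, or about 17.6, resources per leaf folder, so a lookup typically scans a few hundred siblings at most rather than thousands. The intermediate folders still have to be created when a resource is first added.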
Cheers
/Eike
----
http://www.esc-net.de
http://thegordian.blogspot.com
http://twitter.com/eikestepper