Dynamic model load performance (Loading XML using the schemaLocation for XSD)
Dynamic model load performance [message #765899] Wed, 14 December 2011 21:25
Don Laidlaw
I am trying to load an XML resource and rely on the feature that automatically creates a dynamic model for the XML by looking at the schemaLocation attribute on the root node. The performance of this is quite slow. I do expect the first load of a new schema to be slow, as the XSD needs to be loaded and converted into a dynamic model. However, subsequent loads with the same schema are also quite slow. For example, an XML document of about 300 lines takes about 8 seconds to load without using any significant CPU; in fact, the CPU usage is barely detectable. The CPU is a new i7 in a MacBook Pro.

The following code describes the process being used:
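import java.io.IOException;
import java.io.InputStream;

import org.eclipse.emf.ecore.EObject;
import org.eclipse.emf.ecore.EPackage;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.ResourceSet;
import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;
import org.eclipse.emf.ecore.xmi.impl.GenericXMLResourceFactoryImpl;
import org.eclipse.xsd.util.XSDResourceFactoryImpl;

public class ShredService {

	public ShredService() {
		// Register factories for the XML instance documents and their XSD schemas.
		Resource.Factory.Registry.INSTANCE.getExtensionToFactoryMap().
			put("xml",  new GenericXMLResourceFactoryImpl());
		Resource.Factory.Registry.INSTANCE.getExtensionToFactoryMap().
			put("xsd",  new XSDResourceFactoryImpl());
	}

	public EObject load(InputStream inputStream) throws IOException {
		ResourceSet resourceSet = new ResourceSetImpl();
		Resource resource = resourceSet.createResource(
			org.eclipse.emf.common.util.URI.createURI("*.xml"));

		resource.load(inputStream, null);

		// The first content object is the document root; its first child is the root element.
		EObject document = resource.getContents().get(0);
		EObject rootNode = document.eContents().get(0);

		saveDynamicPackages(resourceSet);

		return rootNode;
	}

	// Copy the dynamic packages created during the load into the global registry.
	public void saveDynamicPackages(ResourceSet resourceSet) {
		for (String key : resourceSet.getPackageRegistry().keySet()) {
			EPackage.Registry.INSTANCE.put(key, resourceSet.getPackageRegistry().get(key));
		}
	}
}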


I suspect that when loading the XML a second time, there is still some network activity happening to get the xsd from the location specified in the schemaLocation. If I am right on that, is there a way to stop this check if the namespace is already available in the global package registry?

If I am wrong in my guess, any other ideas??

Thanks,
Don Laidlaw
Infor
Re: Dynamic model load performance [message #766094 is a reply to message #765899] Thu, 15 December 2011 08:19
Ed Merks
Don,

Comments below.

On 14/12/2011 10:25 PM, Don Laidlaw wrote:
> I am trying to load an XML resource while depending on the feature to
> automatically create a dynamic model for the XML by looking at the
> schemaLocation attribute on the root node. The performance of this is
> quite slow.
The schema could be a lot larger than the instance...
> I do expect the first load of a new schema to be slow as the xsd needs
> to be loaded and converted into a dynamic model. However, subsequent
> loads with the same schema are also quite slow.
It seems likely that the schema will be loaded each time unless you've
taken steps to reuse it. Have you used a performance analyzer tool?
Does XMLHandler.getPackage do a lot of work each time?
> For example, an XML document of about 300 lines takes about 8 seconds
> to load, without using any CPU resources in any significant way. In
> fact, the CPU usage is barely detectable at all. The CPU is a new i7
> in a macbook pro.
That makes it sound I/O bound, doesn't it?
>
> I suspect that when loading the XML a second time, there is still some
> network activity happening to get the xsd from the location specified
> in the schemaLocation. If I am right on that, is there a way to stop
> this check if the namespace is already available in the global package
> registry?
Set a breakpoint in XMLHandler.getPackage and see why it would load the
schema multiple times. If it finds the package in the registry, it
shouldn't try to load it a second time.
>
> If I am wrong in my guess, any other ideas??
>
> Thanks,
> Don Laidlaw
> Infor


Ed Merks
Professional Support: https://www.macromodeling.com/
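
The I/O-bound suspicion above can be checked cheaply before setting up a profiler, by comparing wall-clock time with CPU time around the load call. The following is a minimal sketch using only the JDK. `LoadTimer` and `measure` are hypothetical names, and the `Runnable` stands in for the `resource.load(...)` call from the original post. It only counts CPU time on the calling thread, so it assumes the parsing runs on that thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: compares wall-clock and CPU time of a task on the current thread. */
final class LoadTimer {

    /** Returns {wallNanos, cpuNanos}; cpuNanos is -1 if thread CPU time is not supported. */
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = threads.isCurrentThreadCpuTimeSupported();
        long cpuStart = cpuSupported ? threads.getCurrentThreadCpuTime() : -1L;
        long wallStart = System.nanoTime();
        task.run();
        long wall = System.nanoTime() - wallStart;
        long cpu = cpuSupported ? threads.getCurrentThreadCpuTime() - cpuStart : -1L;
        return new long[] { wall, cpu };
    }

    public static void main(String[] args) {
        // Thread.sleep stands in for a load that mostly waits on the network.
        long[] t = measure(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.printf("wall=%d ms, cpu=%d ms%n", t[0] / 1_000_000, t[1] / 1_000_000);
    }
}
```

If the wall-clock figure is in seconds while the CPU figure stays in milliseconds, the time is going to waiting (for example on a remote schema fetch) rather than to parsing or model building.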
Re: Dynamic model load performance [message #766310 is a reply to message #766094] Thu, 15 December 2011 14:37
Don Laidlaw
Thanks Ed,

Yes, the schema is much, much larger than the XML document. The schema is the OAGIS schema.

However, there is no XMLHandler.getPackage method; I think you meant XMLHandler.getPackageForURI(String). For the second document using the same schema, most of the time is spent in XML parsing before the first call to XMLHandler.getPackageForURI(String). That first call comes near the end of the total time spent in the resource.load(inputStream, null) call. I will take a look at the XMLHandler class to see if anything else in there could be taking some time.

-Don
Re: Dynamic model load performance [message #766343 is a reply to message #766310] Thu, 15 December 2011 15:27
Don Laidlaw
I have narrowed this down. It turns out that the XMLHandler will always reload the schema into a dynamic model for every document. The call stack looks like the following for the root element of the XML document when handling the second document with the same schema.
DefaultEcoreBuilder.generate(Map<String,URI>) line: 96	
SAXXMLHandler(XMLHandler).processSchemaLocations(String, String) line: 1707	
SAXXMLHandler(XMLHandler).handleTopLocations(String, String) line: 1748	
SAXXMLHandler(XMLHandler).createObjectByType(String, String, boolean) line: 1296	
SAXXMLHandler(XMLHandler).createTopObject(String, String) line: 1468	
SAXXMLHandler(XMLHandler).processElement(String, String, String) line: 1019	
SAXXMLHandler(XMLHandler).startElement(String, String, String) line: 1001	
SAXXMLHandler(XMLHandler).startElement(String, String, String, Attributes) line: 712	
SAXParserImpl$JAXPSAXParser(AbstractSAXParser).startElement(QName, XMLAttributes, Augmentations) line: 501	
XMLDTDValidator.startElement(QName, XMLAttributes, Augmentations) line: 767	


The XMLHandler.processSchemaLocations method has no test to see if the namespace is already loaded before attempting to load it again. The DefaultEcoreBuilder.generate method does the actual loading of the schema into a dynamic model.

It would be a simple matter to not call the ecoreBuilder.generate method if all the namespaces were already available. So would this be considered a bug or an enhancement request?

-Don
Re: Dynamic model load performance [message #766375 is a reply to message #766343] Thu, 15 December 2011 16:41
Ed Merks
Don,

There certainly ought to be a nice way to be able to reuse the same
model multiple times. Whether it's a bug or an enhancement is
debatable. Certainly if you use XMLOptions to reuse an Ecore builder,
you'd hope and likely expect that to act like a cache for reuse. Have
you tried that? (It's not clear at a glance that it would work; it looks
like a new resource set would be created that loads the schema each
time.) It's not clear that just ignoring the schema location and using
a registered model is the right idea, because you might well want to
dynamically load a more up-to-date version of the model/schema...





Ed Merks
Professional Support: https://www.macromodeling.com/
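
The trade-off raised above, reusing an already registered model versus picking up a newer version of the schema, can be handled by letting cached entries expire or by invalidating them explicitly. The following is a minimal sketch in plain Java, not EMF code. `ExpiringSchemaCache`, its `loader` function and the ten-minute limit are all hypothetical, and the model type `M` stands in for whatever the schema is converted into.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache: reuses one model per namespace, reloading once an entry is older than maxAge. */
final class ExpiringSchemaCache<M> {

    private static final class Entry<M> {
        final M model;
        final Instant loadedAt;

        Entry(M model, Instant loadedAt) {
            this.model = model;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<String, Entry<M>> entries = new HashMap<>();
    private final Duration maxAge;
    private final Function<String, M> loader;

    ExpiringSchemaCache(Duration maxAge, Function<String, M> loader) {
        this.maxAge = maxAge;
        this.loader = loader;
    }

    /** Returns the cached model, calling the loader if the entry is missing or expired at 'now'. */
    M get(String namespace, Instant now) {
        Entry<M> entry = entries.get(namespace);
        if (entry == null || Duration.between(entry.loadedAt, now).compareTo(maxAge) > 0) {
            entry = new Entry<>(loader.apply(namespace), now);
            entries.put(namespace, entry);
        }
        return entry.model;
    }

    /** Drops the entry so that the next get() for this namespace reloads it. */
    void invalidate(String namespace) {
        entries.remove(namespace);
    }

    public static void main(String[] args) {
        ExpiringSchemaCache<String> cache =
            new ExpiringSchemaCache<>(Duration.ofMinutes(10), ns -> "model for " + ns);
        System.out.println(cache.get("urn:example", Instant.now()));
    }
}
```

Passing the current time in as a parameter keeps the expiry rule easy to exercise without waiting; a real integration would have to decide how long a loaded schema may be trusted.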
Re: Dynamic model load performance [message #766445 is a reply to message #766375] Thu, 15 December 2011 20:05
Don Laidlaw
Reusing the ecoreBuilder did not help at all. But I did find a solution to my problem. I instead extended the DefaultEcoreBuilder class as follows:
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.EPackage;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.util.ExtendedMetaData;
import org.eclipse.emf.ecore.xmi.impl.DefaultEcoreBuilder;

/**
 * An EcoreBuilder that checks to see if the package namespaces
 * are already loaded into the global package registry. If they are
 * then the URI for that namespace is not loaded. The effect is to cache
 * the namespaces in the global registry.
 * 
 * @author donlaidlaw
 * 
 */
public class SchemaCachingEcoreBuilder extends DefaultEcoreBuilder {

	public SchemaCachingEcoreBuilder(ExtendedMetaData extendedMetaData) {
		super(extendedMetaData);
	}

	@Override
	public Collection<? extends Resource> generate(
			Map<String, URI> targetNamespaceToURI) throws Exception {
		if (targetNamespaceToURI != null
				&& XSD_ECORE_BUILDER_CONSTRUCTOR != null
				&& XSD_ECORE_BUILDER_GENERATE_RESOURCES_METHOD != null) {
			Map<String, URI> notCachedMap = new HashMap<String, URI>();
			for (Map.Entry<String, URI> entry : targetNamespaceToURI.entrySet()) {
				if (EPackage.Registry.INSTANCE.getEPackage(entry.getKey()) == null) {
					notCachedMap.put(entry.getKey(), entry.getValue());
				}
			}
			if (!notCachedMap.isEmpty())
				return doOriginalGenerate(notCachedMap);
		}
		return Collections.emptyList();
	}
	
	public Collection<? extends Resource> doOriginalGenerate(Map<String, URI> targetNamespaceToURI) throws Exception {
		Object ecoreBuilder = XSD_ECORE_BUILDER_CONSTRUCTOR
				.newInstance(new Object[] { extendedMetaData,
						XSD_ECORE_BUILDER_OPTIONS });
		@SuppressWarnings("unchecked")
		Collection<? extends Resource> result = (Collection<? extends Resource>) XSD_ECORE_BUILDER_GENERATE_RESOURCES_METHOD
				.invoke(ecoreBuilder,
						new Object[] { targetNamespaceToURI.values() });
		return result;
	}
}


I simply added the new ecore builder above to the XMLOptions so that that builder is used instead of the default.

The only problem with this one is that the previously loaded schemas must be put into the global package registry. The DefaultEcoreBuilder does not include a reference to the resource or resourceSet so we cannot check the package registry in the resourceSet.

If the same code was put in the XMLHandler.processSchemaLocations method then you could also check the resourceSet and bypass the call to the builder completely.

Thanks for all your help. Let me know if you want me to do anything else with this.

-Don
Re: Dynamic model load performance [message #766670 is a reply to message #766445] Fri, 16 December 2011 08:35
Ed Merks
Don,

Comments below.


On 15/12/2011 9:05 PM, Don Laidlaw wrote:
> Reusing the ecoreBuilder did not help at all. But I did find a
> solution to my problem.
>
> I simply added the new ecore builder above to the XMLOptions so that
> that builder is used instead of the default.
Yes, it makes good sense to reuse it...
>
> The only problem with this one is that the previously loaded schemas
> must be put into the global package registry.
Another approach is to specialize the actual XSDEcoreBuilder; the one in
XMI is just a reflective facade to avoid direct dependencies of EMF on XSD.
> The DefaultEcoreBuilder does not include a reference to the resource
> or resourceSet so we cannot check the package registry in the
> resourceSet.
If you specialized it, you could specialize createResourceSet so that it
only ever creates one and reuses it; that might be important if
subsequent instances might use schemas that refer back to
already-processed schemas.

Note that the extended metadata instance that's passed in to the
constructor is used as a package registry, so if you create an extended
meta data instance for your instance resource set's package registry,
things should automatically be registered there...
>
> If the same code was put in the XMLHandler.processSchemaLocations
> method then you could also check the resourceSet and bypass the call
> to the builder completely.
It will certainly be nice to include support for something along these
lines in the core framework so please open a bugzilla with a test case
illustrating the problem along with any patches or add-ons that are
proving helpful.
>
> Thanks for all your help. Let me know if you want me to do anything
> else with this.
Thanks for taking the initiative to investigate solutions to the problem!
>
> -Don


Ed Merks
Professional Support: https://www.macromodeling.com/
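
The lookup order discussed above (the resource set's own package registry first, then the global one) can be sketched with plain maps. This is an illustration only and does not use EMF's registry classes; `DelegatingRegistry`, `register` and `lookup` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a package registry that falls back to a parent registry. */
final class DelegatingRegistry {

    private final Map<String, Object> local = new HashMap<>();
    private final DelegatingRegistry parent; // null for the top-level (global) registry

    DelegatingRegistry(DelegatingRegistry parent) {
        this.parent = parent;
    }

    void register(String nsURI, Object pkg) {
        local.put(nsURI, pkg);
    }

    /** Looks in the local map first, then up the parent chain; returns null if unknown. */
    Object lookup(String nsURI) {
        Object pkg = local.get(nsURI);
        if (pkg == null && parent != null) {
            return parent.lookup(nsURI);
        }
        return pkg;
    }

    public static void main(String[] args) {
        DelegatingRegistry global = new DelegatingRegistry(null);
        DelegatingRegistry perResourceSet = new DelegatingRegistry(global);
        global.register("urn:shared", "shared package");
        perResourceSet.register("urn:local", "local package");
        System.out.println(perResourceSet.lookup("urn:shared"));
        System.out.println(global.lookup("urn:local"));
    }
}
```

With this arrangement a schema registered only for one resource set stays invisible to others, while anything placed in the global registry is found from everywhere, which is the distinction the caching builder in this thread has to work around.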