Problem serialization of model to encrypted binary file [message #1067279] Mon, 08 July 2013 08:02
Niels Brouwers
Hi,

We have been serializing EMF models to binary files for some time now. The binary models are exported to the execution environment. Binary files are compared with their predecessors to allow for incremental upgrading of the execution environment.

The problem we are facing now is that the binary serializations of identical models (identity confirmed by EMF Compare and by saving the binary models as XMI) sometimes differ from each other. We serialize a set of 100+ models, and about 20 of them turn out not to be identical to their predecessors. Unfortunately, this set of non-identical files is not constant and seems to vary randomly between runs.

Unfortunately, it is a strong requirement to only upgrade the execution environment when files are really functionally different.

Any help in tackling this problem is much appreciated. Can someone provide a solution, or point us in the right direction?

Thanks!


Kind regards,
Niels Brouwers.
Re: Problem serialization of model to encrypted binary file [message #1067285 is a reply to message #1067279] Mon, 08 July 2013 08:13
Ed Merks
Niels,

Comments below.


On 08/07/2013 10:03 AM, Niels Brouwers wrote:
> Hi,
>
> We have been serializing EMF models to binary files for some while
> now. The binary models are exported to the execution environment.
> Binary files are compared with their predecessors to allow for
> incremental upgrading of the execution environment.
>
> The problem we are facing now is that binary serialization of
> identical models, validated by EMF Compare and by saving the binary
> models as XMI, are sometimes different to each other.
Hmmm. It would be hard to see what the difference is... Are there XMI
IDs involved? Those are stored in a HashMap, so the iteration order is
not well defined...
> We have a set of about 100+ models which are being serialized, and
> from this set it seems that about 20 models of them are not identical
> to their predecessors. Unfortunately, this set of non-identical files
> is not constant, and seems to be randomly different.
> Unfortunately, it is a strong requirement to only upgrade the
> execution environment when files are really functionally different.
>
> Help is much appreciated to tackle this problem. Can someone please
> help us? Provide a solution, or guide us into the correct direction?
I guess I'll wait to know whether extrinsic IDs are involved...
>
> Thanks!


Ed Merks
Professional Support: https://www.macromodeling.com/
Re: Problem serialization of model to encrypted binary file [message #1067324 is a reply to message #1067285] Mon, 08 July 2013 10:03
Niels Brouwers
Hi Ed,

yes, extrinsic IDs are indeed involved:
	
@Override
protected boolean useUUIDs() {
	return true;
}


I believe the CRC of the file used as input for a QVTO transformation is used to determine the UUIDs. As such, if the input file is identical and the transformation produces deterministic output, the UUIDs of the objects should be deterministic as well.
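
The reasoning above (identical input plus a deterministic transformation gives identical UUIDs) can be sketched in plain Java. This is only an illustration under assumptions, not the actual QVTO mechanism: the class DeterministicIds and the methods crcOf and idFor are hypothetical names, and combining the checksum with an object index is an assumed scheme.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import java.util.zip.CRC32;

// Hypothetical helper for illustration; not part of EMF or QVTO.
class DeterministicIds {

    // CRC-32 of the input bytes: identical input yields an identical checksum.
    static long crcOf(byte[] input) {
        CRC32 crc = new CRC32();
        crc.update(input);
        return crc.getValue();
    }

    // Name-based UUID derived from the checksum and the object's position
    // in the output (assumed scheme). No randomness is involved, so the
    // same arguments always give the same UUID.
    static UUID idFor(long inputCrc, int objectIndex) {
        String name = Long.toHexString(inputCrc) + ":" + objectIndex;
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
    }
}
```

With such a scheme the IDs only stay stable if the transformation also visits the objects in the same order on every run.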

Furthermore, these are load and save options of the RealXmiResourceImpl class, which is derived from XmiResourceImpl:

	
private void setOptions() {
	// Make binary data deterministic
	eObjectToIDMap = new LinkedHashMap<EObject, String>();
		
	URIHandler uriHandler = new XmiUriHandeler();

	// Update default de-serialization (load) options.
	Map<Object, Object> loadOptions = getDefaultLoadOptions();

	// Recommended load options for performance
	loadOptions.put(OPTION_DEFER_ATTACHMENT, true);
	loadOptions.put(OPTION_DEFER_IDREF_RESOLUTION, true);
	loadOptions.put(OPTION_USE_DEPRECATED_METHODS, false);
	loadOptions.put(OPTION_USE_PARSER_POOL, parserPool);
	loadOptions.put(OPTION_USE_XML_NAME_TO_FEATURE_MAP,
			nameToFeatureMap.get());

	// Other options
	loadOptions.put(OPTION_URI_HANDLER, uriHandler);

	// Update default serialization (save) options.
	Map<Object, Object> saveOptions = getDefaultSaveOptions();

	// Recommended save options for performance
	saveOptions.put(OPTION_CONFIGURATION_CACHE, true);
	saveOptions.put(OPTION_USE_CACHED_LOOKUP_TABLE, lookupTable.get());

	// Other options
	saveOptions.put(OPTION_URI_HANDLER, uriHandler);

	saveOptions.put(OPTION_KEEP_DEFAULT_CONTENT, true);
	saveOptions.put(OPTION_DECLARE_XML, true);
	saveOptions.put(OPTION_PROCESS_DANGLING_HREF,
			OPTION_PROCESS_DANGLING_HREF_RECORD);
	saveOptions.put(OPTION_SCHEMA_LOCATION, true);
	saveOptions.put(OPTION_USE_XMI_TYPE, true);
	saveOptions.put(OPTION_SAVE_TYPE_INFORMATION, true);
	saveOptions.put(OPTION_SKIP_ESCAPE_URI, false);
	saveOptions.put(OPTION_ENCODING, XMI_ENCODING);

	// Set XML encoding to the encoding defined for XMI, if necessary.
	if (!getEncoding().equals(XMI_ENCODING))
		setEncoding(XMI_ENCODING);

	setIntrinsicIDToEObjectMap(new LinkedHashMap<String, EObject>());
}


The actual class used for (de-)serialization is derived from RealXmiResourceImpl and adds encryption, BinaryResourceImpl:

private void setOptions() {
	// Load options: binary format with the encryption cipher
	getDefaultLoadOptions().put(OPTION_BINARY, true);
	getDefaultLoadOptions().put(BinaryResourceImpl.OPTION_VERSION,
			Version.VERSION_1_1);
	getDefaultLoadOptions().put(
			BinaryResourceImpl.OPTION_STYLE_BINARY_ENUMERATOR, true);
	getDefaultLoadOptions().put(
			BinaryResourceImpl.OPTION_STYLE_PROXY_ATTRIBUTES, true);
	try {
		getDefaultLoadOptions().put(OPTION_CIPHER, new DESCipherImpl(getKey()));
	} catch (Exception e) {
		// TODO Auto-generated catch block
		e.printStackTrace();
	}

	// Save options: same binary settings as for loading
	getDefaultSaveOptions().put(OPTION_BINARY, true);
	getDefaultSaveOptions().put(BinaryResourceImpl.OPTION_VERSION,
			Version.VERSION_1_1);
	getDefaultSaveOptions().put(
			BinaryResourceImpl.OPTION_STYLE_BINARY_ENUMERATOR, true);
	getDefaultSaveOptions().put(
			BinaryResourceImpl.OPTION_STYLE_PROXY_ATTRIBUTES, true);
	// Disable xml-formatting
	getDefaultSaveOptions().put(OPTION_FORMATTED, false);
	try {
		getDefaultSaveOptions().put(OPTION_CIPHER, new DESCipherImpl(getKey()));
	} catch (Exception e) {
		// TODO Auto-generated catch block
		e.printStackTrace();
	}
	super.init();
}


Kind regards,
Niels Brouwers.

[Updated on: Mon, 08 July 2013 10:13]


Re: Problem serialization of model to encrypted binary file [message #1067326 is a reply to message #1067324] Mon, 08 July 2013 10:15
Ed Merks
Niels,

So very likely the problem is related to that. Something to try is to
override this method of XMLResourceImpl to use a LinkedHashMap instead
of just a HashMap.

public Map<EObject, String> getEObjectToIDMap()
{
  if (eObjectToIDMap == null)
  {
    eObjectToIDMap = new HashMap<EObject, String>();
  }

  return eObjectToIDMap;
}

That should have a more consistent order compared to something that
depends on the hash order of the EObject hash codes. Please let me know
how that works out. Perhaps this should be changed in the base framework.
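
The ordering difference can be reproduced without EMF. The following is a minimal sketch, assuming a stand-in class Node that, like a typical EObject, does not override hashCode and therefore relies on the identity hash. IdMapOrderDemo, Node, fill and idsInIterationOrder are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IdMapOrderDemo {

    // Stand-in for an EObject: no equals/hashCode override, so HashMap
    // placement depends on the identity hash, which can vary between runs.
    static final class Node {
        final String name;
        Node(String name) { this.name = name; }
    }

    // Puts `count` fresh nodes into the given map with IDs id0, id1, ...
    static Map<Node, String> fill(Map<Node, String> map, int count) {
        for (int i = 0; i < count; i++) {
            map.put(new Node("n" + i), "id" + i);
        }
        return map;
    }

    // The IDs in the order the map iterates over its entries.
    static List<String> idsInIterationOrder(Map<Node, String> map) {
        return new ArrayList<>(map.values());
    }

    public static void main(String[] args) {
        // HashMap order depends on hash codes and may change from run to run.
        System.out.println("HashMap:       "
                + idsInIterationOrder(fill(new HashMap<>(), 5)));
        // LinkedHashMap always iterates in insertion order.
        System.out.println("LinkedHashMap: "
                + idsInIterationOrder(fill(new LinkedHashMap<>(), 5)));
    }
}
```

Running it a few times should show the LinkedHashMap line staying the same, while the HashMap line is free to change.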



Ed Merks
Professional Support: https://www.macromodeling.com/
Re: Problem serialization of model to encrypted binary file [message #1067333 is a reply to message #1067326] Mon, 08 July 2013 10:38
Niels Brouwers
Hi Ed,

thanks for your fast responses. In this case a bit too fast, as I edited my previous post after I had accidentally submitted it incomplete. As you can see in my previous (edited) post, we already set some options to make the output deterministic.

Although we didn't do it by overriding the getEObjectToIDMap() method, we basically set the protected member variable eObjectToIDMap of the XMIResourceImpl to a LinkedHashMap in our derived class whenever the options are set during construction of the object.

It might indeed be a good idea to always use a LinkedHashMap.

Any more suggestions?


Kind regards,
Niels Brouwers.
Re: Problem serialization of model to encrypted binary file [message #1067337 is a reply to message #1067333] Mon, 08 July 2013 10:50
Ed Merks
Niels,

It seems to me everything else is deterministically ordered by the
traversal of the EObjects and their features, so I don't have other
theories. Have you diffed the binaries to see where/how they differ?
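
One way to do such a diff is sketched below. It assumes Java 9 or later for Arrays.mismatch. BinaryDiff, firstDifference and hexWindow are hypothetical helper names, and the byte arrays would come from something like Files.readAllBytes on the two unencrypted output files.

```java
import java.util.Arrays;

class BinaryDiff {

    // Offset of the first differing byte, or -1 if the arrays are identical.
    // If one array is a prefix of the other, the shorter length is returned.
    static int firstDifference(byte[] left, byte[] right) {
        return Arrays.mismatch(left, right);
    }

    // Hex dump of the bytes within `radius` positions of `offset`, to show
    // the context around a difference.
    static String hexWindow(byte[] data, int offset, int radius) {
        int from = Math.max(0, offset - radius);
        int to = Math.min(data.length, offset + radius + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < to; i++) {
            sb.append(String.format("%02x ", data[i] & 0xff));
        }
        return sb.toString().trim();
    }
}
```

Dumping the window around the first difference in both files should make it easier to see which part of the stream varies, for example a URI or an ID.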



Ed Merks
Professional Support: https://www.macromodeling.com/
Re: Problem serialization of model to encrypted binary file [message #1067390 is a reply to message #1067337] Mon, 08 July 2013 13:40
Niels Brouwers
Hi Ed,

We've done additional testing. The first test was to disable the encryption options shown in the code in my previous posts and run the same transformations in a batch twice. The same number of output models turned out to differ from the previous run.

Next, we set the output format to XMI and executed the transformations in a batch twice again. We have found all output models to be binary identical to the previous run.

Our conclusion until now is that the difference is somehow caused during the serialization from the model in memory to the binary format.

When comparing two non-identical binary models with each other, we see small differences at the places where model elements from another resource are referenced. Sometimes the first part of the URI (containing the file) is missing and replaced by a token. Sometimes the token in both files is just different.

At this point it may also be noteworthy that we created our own URI converter, which allows us to find models according to a certain scheme within two distinct execution environments.

Any more ideas?



Kind regards,
Niels Brouwers.
Re: Problem serialization of model to encrypted binary file [message #1067412 is a reply to message #1067390] Mon, 08 July 2013 14:30
Ed Merks
Niels,

Comments below.

On 08/07/2013 3:40 PM, Niels Brouwers wrote:
> Hi Ed,
>
> We've done additional testing. First test was to disable the
> encryption options mentioned in the code in my previous posts and run
> the same transformations in a batch twice. The same amount of output
> models were determined to be different compared to a previous run.
> Next, we set the output format to XMI and executed the transformations
> in a batch twice again. We have found all output models to be binary
> identical to the previous run.
>
> Our conclusion until now is that the difference is somehow caused
> during the serialization from the model in memory to the binary format.
I see.
> When comparing two non-identical binary models with each other, we see
> small differences at the place were model elements from another
> resource are being referenced.
The referenced resource has the same URI but somehow the proxy URI being
serialized is a little different?
> Sometimes the first part of the URI (containing the file) is missing
> and replaced by a token.
Which version of EMF are you using? In
org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl.EObjectOutputStream.writeURI(URI,
String), even in older versions of EMF, that method generally uses a URI
table so it will write the full URI only the first time, and after that
write out a compressed int for the repeated occurrence.
> Sometimes the token in both files is just different.
I would suggest adding print statements to the writeURI method to see if
each case produces the same sequence of URI/fragment pairs.
> At this point it may also be note worthy that we created our own URI
> converter, which allows us to find models according to a certain
> scheme within two distinct execution environments.
Another thing to be sure about is whether the URIs of all the referenced
resources are identical in each case; of course those URIs are used to
encode proxy references for the cross document references. Such a
problem would be clear from the printed traces of all the calls to
writeURI...
>
> Any more ideas?
>
>
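
The URI table behaviour described in the comments above (full URI on first use, a short token afterwards) can be pictured with a simplified sketch. This is not the BinaryResourceImpl code: UriTableSketch and its text-based output format are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration only; the real binary format is different.
class UriTableSketch {

    private final Map<String, Integer> table = new LinkedHashMap<>();
    private final List<String> output = new ArrayList<>();

    // First occurrence: emit a new index together with the full URI.
    // Repeated occurrence: emit only the index (the "token").
    void writeURI(String uri) {
        Integer index = table.get(uri);
        if (index == null) {
            index = table.size();
            table.put(uri, index);
            output.add(index + ":" + uri);
        } else {
            output.add(String.valueOf(index));
        }
    }

    List<String> output() {
        return output;
    }

    // Encodes a sequence of URIs with a fresh table.
    static List<String> encode(String... uris) {
        UriTableSketch sketch = new UriTableSketch();
        for (String uri : uris) {
            sketch.writeURI(uri);
        }
        return sketch.output();
    }
}
```

In this picture, if one run refers to a resource through a different URI than the previous run, both the full URI and every later token shift. That would look like the differences described in the quoted text.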


Ed Merks
Professional Support: https://www.macromodeling.com/
Re: Problem serialization of model to encrypted binary file [message #1067992 is a reply to message #1067412] Thu, 11 July 2013 15:13
Niels Brouwers
Hi Ed,

I finally found time to modify the BinaryResourceImpl and use the modified version to serialize the models to a binary file, to see what is causing the difference in the binary output.

It seems that the difference is indeed caused when a URI is serialized. Apparently we have multiple references to the same model, which can be reached through multiple paths on the filesystem. Sometimes one path ends up in the URI, sometimes another. We are not sure whether that is caused by our custom-written URI converter or by a non-determinism in the transformation.

So, I am pretty sure it is something we are doing wrong. Hopefully, we can fix it ourselves without any further assistance.

Thanks for your help!


Kind regards,
Niels Brouwers.
Re: Problem serialization of model to encrypted binary file [message #1068010 is a reply to message #1067992] Thu, 11 July 2013 16:13
Ed Merks
Niels,

With URI mapping as used by URIConverter.normalize it is possible that
two different URIs will normalize to the same final URI and the resource
could have either of those URIs and behave much the same. Once you've
serialized an undesirable URI, it can tend to show up because the URI on
the resource will be that of the first attempt to load it, i.e., the
first proxy resolve.

So one approach to consider is using EcoreUtil.resolveAll on the
resource set. Look at the URIs of all the resources. For any that you
don't consider the "canonical form of the URI", set it to what it should
be and then save all the resources again.
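
What "canonical form" means depends on the setup. As a minimal sketch using plain java.net.URI rather than EMF's URI class, one could map known alias prefixes to a single preferred prefix. CanonicalUris, map and canonicalize are hypothetical names, and the example prefixes are invented. In EMF the resulting value would then be applied with Resource.setURI before saving.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

class CanonicalUris {

    // Alias prefix -> canonical prefix (assumed, project-specific mapping).
    private final Map<String, String> prefixMap = new LinkedHashMap<>();

    void map(String aliasPrefix, String canonicalPrefix) {
        prefixMap.put(aliasPrefix, canonicalPrefix);
    }

    // Removes "." and ".." segments, then rewrites the first matching alias
    // prefix, so that equivalent URIs end up as the same string.
    String canonicalize(String uri) {
        String normalized = URI.create(uri).normalize().toString();
        for (Map.Entry<String, String> entry : prefixMap.entrySet()) {
            if (normalized.startsWith(entry.getKey())) {
                return entry.getValue()
                        + normalized.substring(entry.getKey().length());
            }
        }
        return normalized;
    }
}
```

A resource whose URI differs from the canonicalized value is a candidate for correction before the resources are saved again.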



Ed Merks
Professional Support: https://www.macromodeling.com/