EMF Performance Tips

Last updated: July 6, 2005

Authors: Dave Steinberg, Elena Litani

Reduce memory footprint of EMF models using boolean flags

The generator provides an option to combine boolean flags into an int field in class implementations. This applies to the values of boolean attributes and the flags representing whether unsettable features are set. This can significantly reduce the memory consumed by instances of models with classes containing a number of such features. To enable this option, specify a field name in the "Boolean Flags Field" property, under "Model Feature Defaults", of a generator model.
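The following is a minimal hand-written sketch of the idea, not the generator's actual output. It assumes a hypothetical class ItemImpl with two boolean attributes, visible and locked, where locked is unsettable. One int field carries both attribute values and the "is set" marker for locked. Since an int has 32 bits, a single field can hold up to 32 such flags.

```java
/**
 * Hypothetical sketch of packing boolean state into one int field.
 * Names and layout are illustrative; generated EMF code differs in detail.
 */
class ItemImpl {
    // Single int holding all boolean state instead of one boolean field per flag.
    protected int itemFlags = 0;

    // One bit per boolean attribute value...
    protected static final int VISIBLE_EFLAG = 1 << 0;
    protected static final int LOCKED_EFLAG = 1 << 1;
    // ...and one bit recording whether the unsettable 'locked' feature is set.
    protected static final int LOCKED_ESETFLAG = 1 << 2;

    public boolean isVisible() {
        return (itemFlags & VISIBLE_EFLAG) != 0;
    }

    public void setVisible(boolean value) {
        if (value) itemFlags |= VISIBLE_EFLAG; else itemFlags &= ~VISIBLE_EFLAG;
    }

    public boolean isLocked() {
        return (itemFlags & LOCKED_EFLAG) != 0;
    }

    public void setLocked(boolean value) {
        if (value) itemFlags |= LOCKED_EFLAG; else itemFlags &= ~LOCKED_EFLAG;
        itemFlags |= LOCKED_ESETFLAG; // any explicit set marks the feature as set
    }

    public boolean isSetLocked() {
        return (itemFlags & LOCKED_ESETFLAG) != 0;
    }

    public void unsetLocked() {
        // Restore the default value (false) and clear the "is set" marker.
        itemFlags &= ~(LOCKED_EFLAG | LOCKED_ESETFLAG);
    }
}
```

Note that setting locked to false still marks it as set, which is what distinguishes an explicit false from the unset default.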

The "Boolean Flags Reserved Bits" property can be used to indicate that the int field is actually inherited from a base object implementation, and to specify the number of bits reserved for use by the base. For example, Ecore uses eFlags, which is defined on EObjectImpl, for its boolean flags, with 8 bits reserved for the base. Note that, though it's unlikely that EObjectImpl will require more than 8 bits of this field, we do reserve the right to break compatibility by exceeding this number, if necessary. So, it is safer to use a separate int field for the boolean flags in your model, which can still yield substantial memory savings.
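The sketch below, again with hypothetical names (BaseObject, PartImpl, and their constants are not EMF classes), shows what reserving bits means. A base class owns an int field and keeps its low 8 bits, and the subclass numbers its own flags from bit 8 upward. If the base class later started using bit 8, the two would silently collide, which is the compatibility risk described above.

```java
/** Hypothetical base class that owns an int flags field and reserves its low 8 bits. */
class BaseObject {
    // Bits 0-7 belong to the base class.
    protected static final int RESERVED_BITS = 8;
    protected static final int BASE_OWNED_FLAG = 1 << 0; // example base-owned bit
    protected int flags = BASE_OWNED_FLAG;
}

/** Subclass that stores its own booleans in the inherited field, above the reserved bits. */
class PartImpl extends BaseObject {
    protected static final int REQUIRED_EFLAG = 1 << (RESERVED_BITS + 0); // bit 8
    protected static final int SHARED_EFLAG = 1 << (RESERVED_BITS + 1);   // bit 9

    public boolean isRequired() {
        return (flags & REQUIRED_EFLAG) != 0;
    }

    public void setRequired(boolean v) {
        flags = v ? flags | REQUIRED_EFLAG : flags & ~REQUIRED_EFLAG;
    }

    public boolean isShared() {
        return (flags & SHARED_EFLAG) != 0;
    }

    public void setShared(boolean v) {
        flags = v ? flags | SHARED_EFLAG : flags & ~SHARED_EFLAG;
    }
}
```

The subclass never touches bits 0-7, so the base class's state survives every subclass update.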

Improve performance of save and load operations

The following options, defined as constants on org.eclipse.emf.ecore.xmi.XMLResource, can be passed to load(InputStream, Map) and save(OutputStream, Map) to improve the performance of a load or save operation, by decreasing either its running time or the amount of memory it uses.

Load options

OPTION_DEFER_IDREF_RESOLUTION

Option value: Boolean.

This option can be enabled to defer resolving references within a resource until the whole document has been parsed. The default strategy is to try to resolve each reference as it is encountered and then, at the end, resolve any that failed. This wastes time looking up forward references to objects that have not been parsed yet, which, if you're using intrinsic IDs, can involve iterating over every object in the resource.
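To make that cost concrete, here is a toy model in Java. The class IdRefResolutionModel and the IDs "o0", "o1", ... are hypothetical, and this is not how EMF's loader is implemented. Each parsed object has an intrinsic ID and refers to one other ID, and lookups scan the objects linearly, as with intrinsic IDs. The model counts how many objects are examined under the default strategy and under deferred resolution.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: counts objects examined by linear ID lookups under two resolution strategies. */
final class IdRefResolutionModel {
    private long examined;

    /** Linear lookup over the given objects, counting every object examined. */
    private boolean lookUp(List<String> objects, String id) {
        for (String candidate : objects) {
            examined++;
            if (candidate.equals(id)) {
                return true;
            }
        }
        return false;
    }

    /** Default strategy: resolve each reference when encountered, retry failures at the end. */
    static long eager(String[] refs) {
        IdRefResolutionModel m = new IdRefResolutionModel();
        List<String> parsed = new ArrayList<>();
        List<String> failed = new ArrayList<>();
        for (int i = 0; i < refs.length; i++) {
            parsed.add("o" + i);              // object i (ID "o" + i) has now been parsed
            if (!m.lookUp(parsed, refs[i])) {
                failed.add(refs[i]);          // forward reference: target not parsed yet
            }
        }
        for (String id : failed) {
            m.lookUp(parsed, id);             // second attempt once the document is complete
        }
        return m.examined;
    }

    /** Deferred strategy: parse everything first, then resolve each reference once. */
    static long deferred(String[] refs) {
        IdRefResolutionModel m = new IdRefResolutionModel();
        List<String> parsed = new ArrayList<>();
        for (int i = 0; i < refs.length; i++) {
            parsed.add("o" + i);
        }
        for (String id : refs) {
            m.lookUp(parsed, id);
        }
        return m.examined;
    }
}
```

With four objects that each refer to the next one ({"o1", "o2", "o3", "o0"}), eager resolution examines 16 objects and deferred resolution examines 10. When every reference points backward, the two counts are equal, because no lookup fails.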

OPTION_USE_PARSER_POOL

Option value: org.eclipse.emf.ecore.xmi.XMLParserPool.

We strongly encourage setting this option for EMF load operations. It provides a parser pool from which SAXParser instances are created and reused. XMLParserPool defines a simple interface for obtaining and releasing parsers based on their features and properties. Specifying a parser pool, such as an instance of the default implementation, XMLParserPoolImpl, can improve performance dramatically when a resource performs repeated loads. A single parser pool can also be shared among multiple resources, and the default implementation is thread-safe.

OPTION_USE_XML_NAME_TO_FEATURE_MAP

Option value: java.util.Map.

This option can be used to share the cache of mappings between qualified XML names (namespace + local name) and corresponding Ecore features across invocations of load(), or even among resources. It can take some time to determine these associations, since they can be affected by ExtendedMetaData or an XMLMap, so they are cached during a load. If you use this option to specify the same map for several loads, that instance will be used as the cache, improving the performance for all but the first. You can share a single map among multiple resources, unless they load different models with conflicting qualified names.
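The three load options can be combined in a small helper. FastLoadOptions is a hypothetical class, not part of EMF. It keeps one XMLParserPoolImpl and one name-to-feature map for all loads, so every load after the first benefits from both. The sketch assumes that all resources hold instances of the same model. Because a plain HashMap is not synchronized, it also assumes that loads run on a single thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.xmi.XMLParserPool;
import org.eclipse.emf.ecore.xmi.XMLResource;
import org.eclipse.emf.ecore.xmi.impl.XMLParserPoolImpl;

/** Hypothetical helper that reuses one parser pool and one name-to-feature cache for every load. */
final class FastLoadOptions {
    // One parser pool shared by all resources.
    private static final XMLParserPool PARSER_POOL = new XMLParserPoolImpl();

    // Shared XML-name-to-feature cache; only valid while all loads use the same model.
    // HashMap is unsynchronized, so this sketch assumes single-threaded loading.
    private static final Map<Object, Object> NAME_TO_FEATURE_CACHE = new HashMap<>();

    private FastLoadOptions() {}

    /** Returns a fresh options map that points at the shared pool and cache. */
    static Map<Object, Object> create() {
        Map<Object, Object> options = new HashMap<>();
        options.put(XMLResource.OPTION_DEFER_IDREF_RESOLUTION, Boolean.TRUE);
        options.put(XMLResource.OPTION_USE_PARSER_POOL, PARSER_POOL);
        options.put(XMLResource.OPTION_USE_XML_NAME_TO_FEATURE_MAP, NAME_TO_FEATURE_CACHE);
        return options;
    }

    /** Loads the stream into the resource using the shared options. */
    static void load(Resource resource, InputStream in) throws IOException {
        resource.load(in, create());
    }
}
```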

Save options

OPTION_FLUSH_THRESHOLD

Option value: Integer.

This option can be used to specify a maximum number of characters to allow in the output stream before flushing it. This reduces the memory used in serializing a large file, but it is slower than the default behavior.

OPTION_USE_CACHED_LOOKUP_TABLE

Option value: java.util.List.

This option provides a placeholder list in which information about the structure of the model is cached, using qualified XML names as keys. When possible, the same list should be shared among resources, unless you intend to serialize instances of different models.

OPTION_CONFIGURATION_CACHE

Option value: Boolean.

This option can be enabled to cache and reuse generic data across repeated save operations, avoiding the cost of reinitializing it. This option is safe to use when serializing instances of the same model.

OPTION_FORMATTED

Option value: Boolean.

Setting this option to false disables formatting of documents, omitting whitespace and line breaks, to improve the performance of saving and, subsequently, of loading the resulting document. Fewer bytes have to be written and read, but the serialized document is less human-readable.

OPTION_USE_FILE_BUFFER

Option value: Boolean.

This option can be enabled to accumulate output during serialization in a temporary file, rather than an in-memory buffer. This reduces the memory used in serializing a large file, but it is slower than the default behavior.
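The save options can likewise be gathered in one helper. FastSaveOptions is a hypothetical class, not part of EMF. It shares a single lookup-table list across saves, which assumes that every saved resource holds instances of the same model and that saves run on a single thread, since ArrayList is not synchronized. The two options that lower memory use at the cost of speed are added only when the caller asks for them.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.xmi.XMLResource;

/** Hypothetical helper that builds save options tuned for speed, optionally for low memory. */
final class FastSaveOptions {
    // Shared lookup table; only valid while all saves serialize the same model.
    // ArrayList is unsynchronized, so this sketch assumes single-threaded saving.
    private static final List<Object> LOOKUP_TABLE = new ArrayList<>();

    private FastSaveOptions() {}

    /**
     * @param flushThreshold maximum characters buffered before flushing, or null to keep the default
     * @param useFileBuffer  true to accumulate output in a temporary file instead of memory
     */
    static Map<Object, Object> create(Integer flushThreshold, boolean useFileBuffer) {
        Map<Object, Object> options = new HashMap<>();
        options.put(XMLResource.OPTION_USE_CACHED_LOOKUP_TABLE, LOOKUP_TABLE);
        options.put(XMLResource.OPTION_CONFIGURATION_CACHE, Boolean.TRUE);
        // Unformatted output: fewer bytes to write and read, but less readable.
        options.put(XMLResource.OPTION_FORMATTED, Boolean.FALSE);
        // Memory-saving options; both are slower than the default behavior.
        if (flushThreshold != null) {
            options.put(XMLResource.OPTION_FLUSH_THRESHOLD, flushThreshold);
        }
        if (useFileBuffer) {
            options.put(XMLResource.OPTION_USE_FILE_BUFFER, Boolean.TRUE);
        }
        return options;
    }

    /** Saves the resource with the fast defaults (no flush threshold, no file buffer). */
    static void save(Resource resource, OutputStream out) throws IOException {
        resource.save(out, create(null, false));
    }
}
```

For a very large resource, a caller might pass something like create(1024 * 1024, true). That threshold is an arbitrary illustration, not a recommended value.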