Eclipse Community Forums
Load Performance [message #64987] Sat, 26 November 2005 13:18
David Carver
What is the quickest way, performance-wise, to load a schema into the
model? I have schemas that have multiple includes and multiple imports.
Some of the includes are included multiple times; that can be worked out.
The imports are imported only once.

When loading a schema, I noticed that it can take between 4 and 5 seconds
to load. When I've loaded the same schemas into other tools that use the
XSD API, they don't seem to take that long. I am going to put some timing
code in to see exactly where the slowdown is occurring, but I'd like to
know what the fastest method is to load the schemas.
Re: Load Performance [message #64997 is a reply to message #64987] Sat, 26 November 2005 16:07
Ed Merks
Originally posted by: merks.ca.ibm.com

David,

The XSD model isn't highly performance tuned. It isn't really
performance tuned at all. There are many expensive and repeated
traversals of the model.

The XSDMainExample from the examples download supports a -validate
option that shows how one typically loads schemas...


Re: Load Performance [message #65003 is a reply to message #64997] Sat, 26 November 2005 19:10
David Carver
Thanks, Ed. That code confirmed my suspicion that the
XSDResourceImpl.load method is the bottleneck. I ran a time
comparison, looking at the time directly before and after calling the load
method, and it showed that the call took 7 seconds to complete.

Is there any active performance tuning going on with the XSD Infoset, or
has development dropped off for more pressing matters? If I can get some
time, I may try to dig deeper into the XSDResourceImpl implementation and
see if I can find where the bottleneck is during that process.
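
A before/after measurement of the kind described above could be sketched roughly as follows. This is only an illustration: LoadTimer, Timed and simulatedLoad are hypothetical names, and simulatedLoad is a stand-in for the real load call, not part of the XSD API.

```java
import java.util.function.Supplier;

final class LoadTimer {

    /** Value returned by a timed call together with its elapsed wall-clock time. */
    static final class Timed<T> {
        final T value;
        final long elapsedNanos;

        Timed(T value, long elapsedNanos) {
            this.value = value;
            this.elapsedNanos = elapsedNanos;
        }

        double elapsedMillis() {
            return elapsedNanos / 1_000_000.0;
        }
    }

    /** Runs the action once and records how long it took. */
    static <T> Timed<T> time(Supplier<T> action) {
        long start = System.nanoTime();
        T value = action.get();
        return new Timed<>(value, System.nanoTime() - start);
    }

    /** Hypothetical stand-in for the real schema load; it only burns a little CPU. */
    static int simulatedLoad(String schemaName) {
        int checksum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            checksum = 31 * checksum + schemaName.hashCode() + i;
        }
        return checksum;
    }

    public static void main(String[] args) {
        // Time several runs in the same JVM to see whether the cost stays constant.
        for (int run = 1; run <= 3; run++) {
            Timed<Integer> result = time(() -> simulatedLoad("example.xsd"));
            System.out.printf("run %d: %.2f ms%n", run, result.elapsedMillis());
        }
    }
}
```

Timing more than one run in the same JVM helps show whether the cost is the same on every call or concentrated in the first one.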
Re: Load Performance [message #65025 is a reply to message #65003] Sat, 26 November 2005 20:04
Ed Merks
Originally posted by: merks.ca.ibm.com

David,

Keep in mind that the first load of a user schema also involves loading
the schema for schemas, which is cached and reused once loaded for the
first time, so you might be measuring a large one-time initialization
step. Elena has just completed some work that supports pooling of
parsers.

http://bugs.eclipse.org/bugs/show_bug.cgi?id=117763

Much time is being wasted building parsers again and again when they
can be reused.

In addition, the new package literals support in EMF was used to improve
the speed of access to the metadata that is heavily used while building
and analyzing instances:

http://bugs.eclipse.org/bugs/show_bug.cgi?id=117353

So there is a little bit of active tuning going on, but there isn't much
time for it.

Contributions are most welcome.
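
To illustrate the general idea of reusing parsers instead of rebuilding them, here is a minimal sketch using the standard JAXP DocumentBuilder. It is not the change made in the bug above; ParserPool and its methods are hypothetical names chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class ParserPool {
    private final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    private final Deque<DocumentBuilder> idle = new ArrayDeque<>();
    private int created = 0;

    ParserPool() {
        factory.setNamespaceAware(true);
    }

    /** Hands out an idle parser if one exists, otherwise builds a new one. */
    synchronized DocumentBuilder acquire() {
        DocumentBuilder builder = idle.pollFirst();
        if (builder == null) {
            try {
                builder = factory.newDocumentBuilder();
            } catch (ParserConfigurationException e) {
                throw new IllegalStateException(e);
            }
            created++;
        }
        return builder;
    }

    /** Returns a parser to the pool so that a later parse can reuse it. */
    synchronized void release(DocumentBuilder builder) {
        idle.addFirst(builder);
    }

    /** Number of parsers actually constructed so far. */
    synchronized int createdCount() {
        return created;
    }

    /** Parses an in-memory document with a pooled parser. */
    Document parse(String xml) {
        DocumentBuilder builder = acquire();
        try {
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        } finally {
            release(builder);
        }
    }

    public static void main(String[] args) {
        ParserPool pool = new ParserPool();
        String schema = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\"/>";
        for (int i = 0; i < 3; i++) {
            pool.parse(schema);
        }
        System.out.println("parsers created: " + pool.createdCount());
    }
}
```

With sequential use, every parse after the first picks up the parser that was released by the previous one, so only one parser is ever constructed.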




Re: Load Performance [message #65029 is a reply to message #65025] Sat, 26 November 2005 21:16
David Carver
Thanks, Ed. I don't think the caching enhancements would help with the
issue I'm experiencing, as the utility I've written is called from an Ant
task which passes a file set of schemas. As each file in the fileset is
processed, it is loaded individually, processed, and then the new file is
written out. Basically it's done this way so I can support handling one
file or an unlimited number of files. It appears that the 7 seconds is
pretty constant given the way the program is currently coded to work.

Eventually, I'm going to have 74 schemas that this will be processing. So
at 7 seconds a load, that is a little over 8 1/2 minutes just in load time
for all the schemas. Average time from load to write (i.e. load schema,
process, write) is about 15-18 seconds per schema, so we are looking at
up to 22 minutes to process all 74 in batch mode. Not bad, but not that
good either.
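
The estimate above works out as follows: 74 x 7 s = 518 s, or about 8.6 minutes of load time; 74 x 15 s = 1110 s (18.5 minutes) and 74 x 18 s = 1332 s (22.2 minutes) for the full load-process-write cycle. A small sketch of the same arithmetic, where BatchEstimate is a hypothetical helper name:

```java
import java.util.Locale;

final class BatchEstimate {

    /** Total batch time in minutes for a number of schemas at a fixed per-schema cost. */
    static double minutes(int schemaCount, double secondsPerSchema) {
        return schemaCount * secondsPerSchema / 60.0;
    }

    public static void main(String[] args) {
        int schemas = 74; // figure taken from the post above
        System.out.printf(Locale.ROOT, "load only:        %.1f min%n", minutes(schemas, 7));
        System.out.printf(Locale.ROOT, "full cycle (low):  %.1f min%n", minutes(schemas, 15));
        System.out.printf(Locale.ROOT, "full cycle (high): %.1f min%n", minutes(schemas, 18));
    }
}
```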

If I get a chance I'll try to take a look at the XSD code and see if I
can identify the major bottleneck, but for now I'll live with it.
The alternative is to rewrite at the DOM level and have to deal with all
that mess. I'd much rather deal with a schema-specific API than a generic
one.
Re: Load Performance [message #65038 is a reply to message #65029] Sun, 27 November 2005 08:22
Ed Merks
Originally posted by: merks.ca.ibm.com

David,

Each load of a schema involves creating a parser (two in fact) that can
be reused within that same JVM once the load is done, so I'm sure
there's something to gain from this. It's also important to keep in
mind that typically schemas aren't independent, so loading one schema
typically loads many schemas. As such, loading 74 schemas one at a time
separately might load some of the same schemas 74 times, and 74
times 74 gets to be large. Certainly folks using the XSDEcoreBuilder
have found that loading all the schemas and converting them at once is
far faster than applying XSDEcoreBuilder to each schema separately. You
most likely will benefit greatly from using a single resource set to
process all your instances...
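
The effect of sharing loaded schemas across files can be shown with a small simulation. This sketch does not use the XSD or EMF APIs: SharedSchemaCache and the dependency map are hypothetical, and the cache only plays the role that a single shared resource set would play in a real program.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SharedSchemaCache {
    // Hypothetical dependency graph: schema name -> schemas it imports or includes.
    private final Map<String, List<String>> dependencies;
    private final Set<String> loaded = new HashSet<>();
    private int parseCount = 0;

    SharedSchemaCache(Map<String, List<String>> dependencies) {
        this.dependencies = dependencies;
    }

    /** Loads a schema and everything it depends on, skipping anything already cached. */
    void load(String schema) {
        if (!loaded.add(schema)) {
            return; // already loaded through this cache
        }
        parseCount++;
        for (String dependency : dependencies.getOrDefault(schema, Collections.emptyList())) {
            load(dependency);
        }
    }

    int parseCount() {
        return parseCount;
    }

    /** One fresh cache per file: shared dependencies are parsed again for every file. */
    static int parsesWithFreshCachePerFile(Map<String, List<String>> deps, List<String> files) {
        int total = 0;
        for (String file : files) {
            SharedSchemaCache cache = new SharedSchemaCache(deps);
            cache.load(file);
            total += cache.parseCount();
        }
        return total;
    }

    /** One cache for the whole batch: each schema is parsed at most once. */
    static int parsesWithSharedCache(Map<String, List<String>> deps, List<String> files) {
        SharedSchemaCache cache = new SharedSchemaCache(deps);
        for (String file : files) {
            cache.load(file);
        }
        return cache.parseCount();
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("a.xsd", Arrays.asList("common.xsd"));
        deps.put("b.xsd", Arrays.asList("common.xsd"));
        deps.put("c.xsd", Arrays.asList("common.xsd"));
        deps.put("common.xsd", Arrays.asList("types.xsd"));
        List<String> files = Arrays.asList("a.xsd", "b.xsd", "c.xsd");

        System.out.println("fresh cache per file: " + parsesWithFreshCachePerFile(deps, files));
        System.out.println("shared cache:         " + parsesWithSharedCache(deps, files));
    }
}
```

In this toy graph, three files that all pull in the same two shared schemas cost 9 parses when each file gets its own cache and 5 when the cache is shared. The gap widens as the number of files and shared dependencies grows.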

