xs list parsing incorrect [message #846629]
Mon, 16 April 2012 11:42
Csaba Koncz Messages: 49 Registered: July 2009
Member
Hi all,
I have an Ecore model created by importing an XML Schema. The source schema defines a simple type as a list whose item type maps to an enumeration.
I recently upgraded the EMF libraries from version 2.5 to 2.7 and started to see strange loading failures for XML documents that used to load without problems. I tracked the problem down to XML attribute values like this:
allowedDirections="NORTH WEST  SOUTH"
That is, there are two spaces between WEST and SOUTH instead of one. Although unintentional, this attribute value is perfectly valid according to the schema, since simple list types are supposed to collapse whitespace. The older EMF code generator produces code that parses the attribute value using StringTokenizer, whereas the newer ones use String.split() for the same task. However, the two methods are not equivalent for the input shown above: the new code finds an extra empty token between WEST and SOUTH, which results in a load failure.
As a workaround I specify "Runtime Version"=2.5 in my genmodel. However, I would rather not get stuck on this version. Is there a better solution?
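The difference described above can be reproduced outside EMF with a few lines of plain Java. This is only a sketch: the class and method names are made up for illustration and are not the actual generated EMF code, and the delimiter set is assumed to be space, tab, newline, carriage return and form feed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; not part of EMF or of any generated model code.
final class ListTokenDemo {

    // StringTokenizer treats a run of delimiters as a single separator,
    // so it never returns empty tokens.
    static List<String> withTokenizer(String value) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(value, " \t\n\r\f");
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // String.split with a single-character class splits at every delimiter,
    // so two adjacent spaces produce an empty token between them.
    static List<String> withSplit(String value) {
        return Arrays.asList(value.split("[ \t\n\r\f]"));
    }

    public static void main(String[] args) {
        String value = "NORTH WEST  SOUTH"; // two spaces before SOUTH
        System.out.println(withTokenizer(value)); // [NORTH, WEST, SOUTH]
        System.out.println(withSplit(value));     // [NORTH, WEST, , SOUTH]
    }
}
```

The empty string in the second result is the token that presumably cannot be converted to an enumeration literal and so causes the load failure.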
Thank you in advance,
Csaba
Re: xs list parsing incorrect [message #846718 is a reply to message #846629]
Mon, 16 April 2012 13:29
Ed Merks Messages: 33212 Registered: July 2009
Senior Member
Csaba,
Comments below.
On 16/04/2012 1:42 PM, Csaba Koncz wrote:
> Hi all,
>
> I have an ECore model created using XML Schema import. The source
> schema has a simple type defined as a list whose item type maps to an
> enumeration.
> Recently I have upgraded the EMF libraries from version 2.5 to 2.7 and
> started to experience strange loading failures for XML documents that
> used to load without problems before. Tracked down the problem to XML
> attribute values like this:
>
> allowedDirections="NORTH WEST  SOUTH"
>
> i.e. between WEST and SOUTH there are two spaces instead of one.
> Although unintentional, this attribute is totally valid according to
> the schema, as simple list types are supposed to collapse white
> spaces. The older EMF code generator generates code that parses the
> attribute value using StringTokenizer, the newer ones use
> String.split() for the same task. However, the two methods are not
> totally equivalent for the input shown above: the new code will find
> an extra empty token between WEST and SOUTH, which results in a load
> failure.
Hmm. I didn't realize that.
>
> As a workaround I specify a "Runtime Version"=2.5 in my genmodel.
> However, I would not like to get stuck with this version, is there a
> better solution?
Please file a bug report, ideally with a minimal test case, and I'll
look into fixing it for EMF 2.8 (i.e., the upcoming release). Probably
I need to change
return value.split("[ \t\n\r\f]");
to
return value.split("[ \t\n\r\f]+");
While I'm at it, it's also probably much better for performance if I
compile this regular expression just once, and reuse the compiled form.
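A minimal sketch of that idea in plain Java, assuming a hypothetical helper class (the names WhitespaceListSplitter and WHITESPACE_RUN are illustrative and are not what EMF actually generates). It also skips empty tokens, which is an extra precaution beyond the one-character change above: even with the "+" quantifier, a value that starts with whitespace still yields one empty leading token from split().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; not the actual EMF implementation.
final class WhitespaceListSplitter {

    // Compiled once and reused. Pattern instances are immutable and
    // safe to share between threads.
    private static final Pattern WHITESPACE_RUN = Pattern.compile("[ \t\n\r\f]+");

    static List<String> split(String value) {
        List<String> tokens = new ArrayList<>();
        for (String token : WHITESPACE_RUN.split(value)) {
            // Leading whitespace (or an empty input) produces an empty
            // first element, so filter it out.
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("NORTH WEST  SOUTH"));   // [NORTH, WEST, SOUTH]
        System.out.println(split("  NORTH\tWEST \n"));    // [NORTH, WEST]
    }
}
```

Calling value.split(regex) compiles the expression on every call for a pattern like this one, so holding the compiled Pattern in a static field avoids that repeated work when many attribute values are loaded.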
> Thank you in advance,
> Csaba
>
Ed Merks
Professional Support: https://www.macromodeling.com/