Eclipse Community Forums
XML Encoding attribute [message #48567] Fri, 02 July 2004 14:06
Eclipse User
Originally posted by: NOpcooperSPAM.uk.ibm.com

Hi,

I'm writing some XSD based code at the moment that is going to end up
being executed on a platform that uses an EBCDIC based filesystem. The
XSD code all seems to work fine in this environment except in one
fairly common but erroneous use-case.

It's quite common for folk to ftp XML files in text mode. If this
happens and a codepage conversion also takes place then you end up with
an XML file that may contain an invalid encoding attribute (such as
<?xml version="1.0" encoding="UTF-8"?>) when the file is actually
something else (such as EBCDIC).

If I edit my schema documents by hand to correct the encoding then all's
well. If the schema is ftp'ed in binary mode then I would expect it to
work fine too. But if I ftp the schema in text mode and don't correct
the encoding statement then my XSD based code has problems parsing the
schema.

Is there any way to turn off support for the encoding tag? I know that
most XML parsers will allow this to be done but when I use the XSD code
I don't have direct access to the underlying XML parser. Any ideas?

Thanks,

Paul.
Re: XML Encoding attribute [message #48686 is a reply to message #48567] Fri, 02 July 2004 19:25
Eclipse User
Originally posted by: invalid.soft-gems.net

Paul Cooper wrote

>Is there any way to turn off support for the encoding tag?

I'm not sure if turning off the encoding tag is a solution. AFAIK an
instance without encoding is considered as encoded in UTF-8. Why don't you
store your XML/XSD files with the right encoding in the first place?

Mike
--
www.soft-gems.net
Re: XML Encoding attribute [message #48716 is a reply to message #48686] Sun, 04 July 2004 12:26
Eclipse User
Originally posted by: NOpcooperSPAM.uk.ibm.com

Mike Lischke wrote:

> I'm not sure if turning off the encoding tag is a solution. AFAIK an
> instance without encoding is considered as encoded in UTF-8. Why don't
> you store your XML/XSD files with the right encoding in the first place?

The algorithm for determining the encoding to use is (or so I'm told)
quite complex. After all, the XML parser has to be reasonably sure about
what the codepage is before it can even go looking for the encoding tag.
The algorithm involves (for example) determining if the document is
encoded using little endian or big endian notations and whether there is
a UTF-16 byte order marker. This is documented in appendix F of the
third edition of the XML 1.0 spec.
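
To make the first step of that detection concrete, here is a minimal sketch in Java of the kind of byte sniffing Appendix F describes. It is only an illustration: the class name EncodingSniffer, the method sniff, and the returned labels are hypothetical and not part of XSD or of any parser API, and the table is deliberately incomplete (for example, the UCS-4 patterns without a byte order mark are left out).

```java
/**
 * Hypothetical helper (not part of XSD or any parser API): guesses the
 * encoding family of an XML entity from its first four bytes, loosely
 * following the table in Appendix F of the XML 1.0 specification.
 */
final class EncodingSniffer {

    /** Returns the byte at index i as an unsigned value, or -1 if the input is too short. */
    private static int at(byte[] data, int i) {
        return i < data.length ? data[i] & 0xFF : -1;
    }

    static String sniff(byte[] head) {
        int b0 = at(head, 0), b1 = at(head, 1), b2 = at(head, 2), b3 = at(head, 3);

        // Byte order marks. The four-byte UCS-4 marks are tested before UTF-16,
        // because FF FE 00 00 also starts with the UTF-16LE mark.
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0xFE && b3 == 0xFF) return "UCS-4BE";
        if (b0 == 0xFF && b1 == 0xFE && b2 == 0x00 && b3 == 0x00) return "UCS-4LE";
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";

        // No byte order mark: look at how the start of "<?xml" would be encoded.
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";
        // "<?xm" in ASCII: the parser must read the declaration to learn the exact encoding.
        if (b0 == 0x3C && b1 == 0x3F && b2 == 0x78 && b3 == 0x6D) return "ASCII-compatible";
        // "<?xm" in EBCDIC: again the declaration says which EBCDIC codepage is meant.
        if (b0 == 0x4C && b1 == 0x6F && b2 == 0xA7 && b3 == 0x94) return "EBCDIC";

        // Nothing matched: treat as UTF-8 without an encoding declaration.
        return "UTF-8";
    }

    public static void main(String[] args) {
        byte[] ebcdicStart = {0x4C, 0x6F, (byte) 0xA7, (byte) 0x94};
        System.out.println(sniff(ebcdicStart));
    }
}
```

Note that this only narrows things down to a family. For the ASCII-compatible and EBCDIC cases the parser still reads the encoding declaration afterwards, which is exactly where a stale encoding="UTF-8" causes trouble.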

In my test code if I just remove the encoding then the XSD toolkit is
able to parse the file just fine despite it being stored in an EBCDIC
variant. I'm told that the normal thing to do on my platform is to tell
the XML parser to ignore the encoding tag, but so far as I know I can't
do that with XSD.

Given that I am discussing an error case I could indeed require that the
users of my code correct the encoding specified within the schemas they
copy to the platform by hand, but this is error-prone. If I could just
turn off support for the encoding attribute then a significant class of
user error would disappear. The risk involved in ignoring the encoding
attribute is low as odd characters have to be specified using XML escape
sequences anyway.

Thanks,

Paul.
Re: XML Encoding attribute [message #48746 is a reply to message #48716] Tue, 06 July 2004 13:36
Eclipse User
Originally posted by: merks.ca.ibm.com

Paul,

There's currently no support for setting features or properties on the
underlying SAX parser that's used. Such support could be added in the
future if you want to open a Bugzilla feature request.


Paul Cooper wrote:

> Mike Lischke wrote:
>
> > I'm not sure if turning off the encoding tag is a solution. AFAIK an
> > instance without encoding is considered as encoded in UTF-8. Why don't
> > you store your XML/XSD files with the right encoding in the first place?
>
> The algorithm for determining the encoding to use is (or so I'm told)
> quite complex. After all, the XML parser has to be reasonably sure about
> what the codepage is before it can even go looking for the encoding tag.
> The algorithm involves (for example) determining if the document is
> encoded using little endian or big endian notations and whether there is
> a UTF-16 byte order marker. This is documented in appendix F of the
> third edition of the XML 1.0 spec.
>
> In my test code if I just remove the encoding then the XSD toolkit is
> able to parse the file just fine despite it being stored in an EBCDIC
> variant. I'm told that the normal thing to do on my platform is to tell
> the XML parser to ignore the encoding tag, but so far as I know I can't
> do that with XSD.
>
> Given that I am discussing an error case I could indeed require that the
> users of my code correct the encoding specified within the schemas they
> copy to the platform by hand, but this is error-prone. If I could just
> turn off support for the encoding attribute then a significant class of
> user error would disappear. The risk involved in ignoring the encoding
> attribute is low as odd characters have to be specified using XML escape
> sequences anyway.
>
> Thanks,
>
> Paul.
Re: XML Encoding attribute [message #48864 is a reply to message #48567] Fri, 09 July 2004 20:02
Eclipse User
Originally posted by: dunkelpeter.gmx.net

Paul Cooper <NOpcooperSPAM@uk.ibm.com> wrote:

>I'm writing some XSD based code at the moment that is going to end up
>being executed on a platform that uses an EBCDIC based filesystem. The
>XSD code all seems to work fine in this environment except in one
>fairly common but erroneous use-case.

Are you using Java on z/OS USS (Unix System Services)?
For this environment my German customers use the EBCDIC codepage "1047
USA". For TSO, ISPF, JES etc. they use EBCDIC "273 Deutsch". A file
transfer from Windows/Unix to USS via TSO never works, because many
characters ("{}[]@...") are not translated correctly. A transfer via
ftp (Unix directly to USS) in "text" mode never works properly either.

>It's quite common for folk to ftp XML files in text mode. If this
>happens and a codepage conversion also takes place then you end up with
>an XML file that may contain an invalid encoding attribute (such as
><?xml version="1.0" encoding="UTF-8"?>) when the file is actually
>something else (such as EBCDIC).

The Java I/O routines (InputStreamReader) can handle an explicit
encoding.
Use "ISO-8859-1" or any other 8-bit encoding (UTF-8 is a special case)
and transfer (ftp) everything in binary mode. For special characters use
the HTML4 entities.
Do not use the automatic translation from Unicode (Java) to EBCDIC (USS).
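
As a rough illustration of the InputStreamReader approach, here is a minimal sketch using plain JAXP. It relies on the documented SAX behaviour that a parser given a character stream through InputSource reads that stream directly and disregards the encoding declaration. The class ReaderParseDemo and the method parseAs are hypothetical names for this example, and the sketch does not show how (or whether) a Reader can be handed to the XSD loader itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical demo class: parses XML bytes with a caller-supplied charset. */
final class ReaderParseDemo {

    /**
     * Decodes the bytes with the charset the caller knows to be correct and
     * hands the parser a character stream, so the declared encoding is not
     * used to decode the document.
     */
    static Document parseAs(byte[] data, Charset actual)
            throws ParserConfigurationException, SAXException, IOException {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(data), actual);
        InputSource source = new InputSource(reader);
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }

    public static void main(String[] args) throws Exception {
        // The declaration claims UTF-8, but the bytes are really ISO-8859-1.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><note>caf\u00e9</note>";
        byte[] data = xml.getBytes(StandardCharsets.ISO_8859_1);

        Document doc = parseAs(data, StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

The example uses ISO-8859-1 only because every Java runtime ships it. For a file that was converted to EBCDIC in transit, the same call would presumably take something like Charset.forName("IBM1047"), assuming the runtime provides that charset.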

You cannot read the files with oedit or obrowse, because these tools
use EBCDIC and your file is encoded in ASCII. But every Java program
on every platform can work with this file.

>Is there any way to turn off support for the encoding tag? I know that
>most XML parsers will allow this to be done but when I use the XSD code
>I don't have direct access to the underlying XML parser. Any ideas?

Which XML parser are you using?

Peter

www.dunkelpeter.de