Eclipse Community Forums
Two problems using LPG [message #652461] Fri, 04 February 2011 03:35
Ken Walter
Messages: 13
Registered: February 2010
Junior Member
When saving a *.g? file with auto build on, the LPG error files *.i are
not erased, so new errors are appended and fixed errors do not go away.
I must turn auto build off and do a clean after each save.

LPG error messages are apparently in terms of bytes not characters.
Since my *.g files are UTF-8 with many double-byte characters, the error markers are way down the file, giving no clue where the error is located. :(

Too bad LPG doesn't include characters or tokens in their error messages.
Re: Two problems using LPG [message #653481 is a reply to message #652461] Wed, 09 February 2011 21:28
Robert M. Fuhrer
Messages: 294
Registered: July 2009
Senior Member
On 2/3/11 10:35 PM, Ken Walter wrote:
> When saving a *.g? file with auto build the LPG error files *.i are
> not erased resulting in new errors being appended so that fixed errors do not go away.
> Must turn auto build off and do a clean after each save.

When you say "error" files do you mean the "listing" files, which use a ".l" (letter
ell) file name extension? If not, I have to confess I'm not familiar with such files.

We probably should clean those out at the beginning of each build. Should be simple to
do. The next release will definitely do that.

> LPG error messages are apparently in terms of bytes not characters.
> Since my *.g files are UTF8 with many double byte characters, the error markers are way down the file giving no clue
> where the error is located. :(
> Too bad LPG doesn't include characters or tokens in their error messages.

Ah, yes, I believe it's true that LPG's error messages report byte, not character,
offsets for their locations. I'll have to talk w/ Philippe to see about addressing
that...

Oh, and thanks for the bug reports!

--
Cheers,
-- Bob

--------------------------------
Robert M. Fuhrer
Research Staff Member
Programming Technologies Dept.
IBM T.J. Watson Research Center

IDE Meta-tooling Platform Project Lead (http://www.eclipse.org/imp)
X10: Productive High-Performance Parallel Programming (http://x10.sf.net)
Re: Two problems using LPG [message #653482 is a reply to message #652461] Wed, 09 February 2011 21:30
Robert M. Fuhrer
On 2/3/11 10:35 PM, Ken Walter wrote:
> Too bad LPG doesn't include characters or tokens in their error messages.

Oh, and please voice your concerns here to the LPG mailing list on SourceForge
to make sure Philippe Charles (the primary LPG author) hears you!

--
Cheers,
-- Bob

Re: Two problems using LPG [message #653493 is a reply to message #653481] Wed, 09 February 2011 23:06
Robert M. Fuhrer
On 2/9/11 4:28 PM, Robert M. Fuhrer wrote:
> On 2/3/11 10:35 PM, Ken Walter wrote:
>> When saving a *.g? file with auto build the LPG error files *.i are
>> not erased resulting in new errors being appended so that fixed errors do not go away.
>> Must turn auto build off and do a clean after each save.
>
> When you say "error" files do you mean the "listing" files, which use a ".l" (letter
> ell) file name extension? If not, I have to confess I'm not familiar with such files.
>
> We probably should clean those out at the beginning of each build. Should be simple to
> do. The next release will definitely do that.

I just chatted w/ Philippe, and he says:

1) There are no *.i files, only *.l files.
2) The listing files are cleared out on each invocation.

So can you confirm that you're really seeing the messages accumulate in these files?
And, if so, can you give us a few more details as to platform, IMP and LPG versions,
and so on?

>> LPG error messages are apparently in terms of bytes not characters.
>> Since my *.g files are UTF8 with many double byte characters, the error markers are way down the file giving no clue
>> where the error is located. :(
>> Too bad LPG doesn't include characters or tokens in their error messages.
>
> Ah, yes, I believe it's true that LPG's error messages report byte, not character,
> offsets for their locations. I'll have to talk w/ Philippe to see about addressing
> that...

What's happening here is that LPG doesn't actually know or use the encoding of the
input files, since the only characters it treats specially (e.g. ':', '%' and so on)
are valid ASCII characters. So it reads the grammar files and treats bytes as characters,
and it basically works out. Mostly. Except for the issue you discovered wrt the error
messages. Since in LPG's view of the input stream characters are bytes, the positions
that it includes in error messages are always byte offsets. If the grammar file happens
to be encoded in ASCII, everything is fine, since byte offsets are also character offsets.

If, on the other hand, the grammar uses any other encoding, the message positions are
still byte offsets, but the LPG IDE interprets these as character offsets (since that's
basically what the rest of the Eclipse text framework expects). Hence the mismatch.
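
To make the mismatch concrete, here is a minimal Python sketch of mapping a byte offset back to a character offset. This is not LPG or IMP code: the function names are hypothetical, and it assumes the grammar file is UTF-8. The second function counts UTF-16 code units, on the assumption that the consumer is a Java-based editor whose text offsets are measured in Java chars.

```python
def byte_to_char_offset(data: bytes, byte_offset: int, encoding: str = "utf-8") -> int:
    """Map a byte offset into `data` to a character (code point) offset.

    Hypothetical helper for illustration only; not part of LPG or IMP.
    """
    if not 0 <= byte_offset <= len(data):
        raise ValueError("byte offset out of range")
    # Decode only the bytes before the offset. errors="ignore" drops
    # undecodable bytes, including a partial trailing sequence if the
    # offset lands in the middle of a multi-byte character.
    prefix = data[:byte_offset].decode(encoding, errors="ignore")
    return len(prefix)


def byte_to_utf16_offset(data: bytes, byte_offset: int, encoding: str = "utf-8") -> int:
    """Same mapping, but counted in UTF-16 code units (Java `char`s)."""
    if not 0 <= byte_offset <= len(data):
        raise ValueError("byte offset out of range")
    prefix = data[:byte_offset].decode(encoding, errors="ignore")
    # Each UTF-16 code unit is two bytes; characters outside the BMP take two units.
    return len(prefix.encode("utf-16-le")) // 2


grammar = "αβγ: x".encode("utf-8")   # each Greek letter is two bytes in UTF-8
print(byte_to_char_offset(grammar, 6))  # the ':' sits at byte 6 but character 3
```

For a pure ASCII file both functions return the byte offset unchanged, which is why the problem only shows up once the grammar contains non-ASCII characters.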

We've been wanting to fix both the LPG generator and the runtime so that:

- When the input stream uses an encoding like UTF-8 or UTF-16, and it has a proper
"byte order mark" (BOM) at the beginning of the stream (per the Unicode standard),
LPG will use the indicated encoding.

- When the input stream has no BOM, the LPG client must indicate in some other way
which encoding to use. In the case of a parser using the LPG runtime, this would
be done via some as-yet unavailable API. In the case of the LPG generator itself,
this would be specified via a cmd-line option.
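
The two cases above could be sketched roughly as follows. This is only an illustration in Python, not the planned LPG API or command-line option: `pick_encoding` and its `fallback` parameter are hypothetical names, and only the UTF-8 and UTF-16 BOMs mentioned above are checked (UTF-32 is deliberately left out).

```python
import codecs
from typing import Optional, Tuple

# BOMs for the encodings discussed above; UTF-32 is not handled in this sketch.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def pick_encoding(data: bytes, fallback: Optional[str] = None) -> Tuple[str, int]:
    """Return (encoding name, number of BOM bytes to skip).

    Hypothetical helper: if the stream starts with a known BOM, use the
    encoding it indicates; otherwise the caller must supply one.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    if fallback is None:
        raise ValueError("no BOM found and no encoding supplied by the client")
    return fallback, 0


raw = codecs.BOM_UTF8 + "αβγ".encode("utf-8")
encoding, skip = pick_encoding(raw)
text = raw[skip:].decode(encoding)  # decoded text, without the BOM
```

Raising an error when there is neither a BOM nor a client-supplied encoding is one possible design choice; silently assuming ASCII would be another, and would match the current behavior more closely.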

Once this is done, LPG can report positions in its error messages as proper character
offsets, and the LPG IDE will be able to handle grammar files in non-ASCII encodings.

Again, I'd encourage you to cast your vote over on the LPG SourceForge mailing list
for this enhancement.

--
Cheers,
-- Bob
