Two problems using LPG (An auto-build and an error flagging utf8 problem)
Re: Two problems using LPG [message #653481 is a reply to message #652461]
Wed, 09 February 2011 21:28
Robert M. Fuhrer (Messages: 294, Registered: July 2009, Senior Member)

On 2/3/11 10:35 PM, Ken Walter wrote:
> When saving a *.g? file with auto build the LPG error files *.i are
> not erased, resulting in new errors being appended so that fixed errors do not go away.
> Must turn auto build off and do a clean after each save.
When you say "error" files do you mean the "listing" files, which use a ".l" (letter
ell) file name extension? If not, I have to confess I'm not familiar with such files.
We probably should clean those out at the beginning of each build. Should be simple to
do. The next release will definitely do that.
> LPG error messages are apparently in terms of bytes not characters.
> Since my *.g files are UTF8 with many double byte characters, the error markers are way down the file giving no clue
> where the error is located. :(
> Too bad LPG doesn't include characters or tokens in their error messages.
Ah, yes, I believe it's true that LPG's error messages report byte, not character,
offsets for their locations. I'll have to talk w/ Philippe to see about addressing
that...
Oh, and thanks for the bug reports!
--
Cheers,
-- Bob
--------------------------------
Robert M. Fuhrer
Research Staff Member
Programming Technologies Dept.
IBM T.J. Watson Research Center
IDE Meta-tooling Platform Project Lead (http://www.eclipse.org/imp)
X10: Productive High-Performance Parallel Programming (http://x10.sf.net)

Re: Two problems using LPG [message #653493 is a reply to message #653481]
Wed, 09 February 2011 23:06
Robert M. Fuhrer (Messages: 294, Registered: July 2009, Senior Member)

On 2/9/11 4:28 PM, Robert M. Fuhrer wrote:
> On 2/3/11 10:35 PM, Ken Walter wrote:
>> When saving a *.g? file with auto build the LPG error files *.i are
>> not erased, resulting in new errors being appended so that fixed errors do not go away.
>> Must turn auto build off and do a clean after each save.
>
> When you say "error" files do you mean the "listing" files, which use a ".l" (letter
> ell) file name extension? If not, I have to confess I'm not familiar with such files.
>
> We probably should clean those out at the beginning of each build. Should be simple to
> do. The next release will definitely do that.
I just chatted w/ Philippe, and he says:
1) There are no *.i files, only *.l files.
2) The listing files are cleared out on each invocation.
So can you confirm that you're really seeing the messages accumulate in these files?
And, if so, can you give us a few more details as to platform, IMP and LPG versions,
and so on?
>> LPG error messages are apparently in terms of bytes not characters.
>> Since my *.g files are UTF8 with many double byte characters, the error markers are way down the file giving no clue
>> where the error is located. :(
>> Too bad LPG doesn't include characters or tokens in their error messages.
>
> Ah, yes, I believe it's true that LPG's error messages report byte, not character,
> offsets for their locations. I'll have to talk w/ Philippe to see about addressing
> that...
What's happening here is that LPG doesn't actually know or use the encoding of the
input files, since the only characters it treats specially (e.g. ':', '%' and so on)
are valid ASCII characters. So it reads the grammar files and treats bytes as characters,
and it basically works out. Mostly. Except for the issue you discovered wrt the error
messages. Since in LPG's view of the input stream characters are bytes, the positions
that it includes in error messages are always byte offsets. If the grammar file happens
to be encoded in ASCII, everything is fine, since byte offsets are also character offsets.
If, on the other hand, the grammar uses any other encoding, the message positions are
still byte offsets, but the LPG IDE interprets these as character offsets (since that's
basically what the rest of the Eclipse text framework expects). Hence the mismatch.
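
To make the mismatch concrete, here is a minimal sketch of how a byte offset could be mapped back to a character offset when the file's encoding is known. It is only an illustration, not anything LPG or the LPG IDE actually does; the function name `byte_to_char_offset` and the sample grammar line are made up for this example.

```python
def byte_to_char_offset(data: bytes, byte_offset: int, encoding: str = "utf-8") -> int:
    """Map a byte offset into `data` to the index of the character at that position.

    Hypothetical helper for illustration only; it assumes the caller already
    knows the file's encoding.
    """
    if not 0 <= byte_offset <= len(data):
        raise ValueError("byte offset out of range")
    # Decode only the bytes before the offset. errors="ignore" drops a trailing
    # partial multi-byte sequence, so an offset that lands in the middle of a
    # character maps to the index of that character.
    return len(data[:byte_offset].decode(encoding, errors="ignore"))


# A one-line "grammar" whose first character takes two bytes in UTF-8.
text = "é ::= x"
data = text.encode("utf-8")

byte_pos = data.index(b"x")                    # position as a byte-oriented tool sees it
char_pos = byte_to_char_offset(data, byte_pos)

print(byte_pos, char_pos)                      # the two differ by one here
print(text[char_pos])                          # the character the marker should point at
```

Every multi-byte character before the error pushes the byte offset one or more positions further past the character offset, which is why the markers drift "way down the file" in a grammar with many non-ASCII characters. Note that this sketch counts Unicode code points; Java strings (and so Eclipse documents) count UTF-16 code units, so characters outside the Basic Multilingual Plane would need a further adjustment.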
We've been wanting to fix both the LPG generator and the runtime so that:
- When the input stream uses an encoding like UTF-8 or UTF-16, and it has a proper
"byte order mark" (BOM) at the beginning of the stream (per the Unicode standard),
LPG will use the indicated encoding.
- When the input stream has no BOM, the LPG client must indicate in some other way
which encoding to use. In the case of a parser using the LPG runtime, this would
be done via some as-yet unavailable API. In the case of the LPG generator itself,
this would be specified via a cmd-line option.
Once this is done, LPG can report positions in its error messages as proper character
offsets, and the LPG IDE will be able to handle grammar files in non-ASCII encodings.
Again, I'd encourage you to cast your vote over on the LPG SourceForge mailing list
for this enhancement.
--
Cheers,
-- Bob