Hello Mike,
Thanks for the explanation. The @ sign now arrives as an identifier at the token mapper class and is translated to the proper LPG lexer token. The ScannerExtensionConfiguration used by the LR parser must be copied over to the new project just to have supportAtSignInIdentifiers() return true – the original class in the LR parser cannot be extended, and I have seen that it adds some macros.
But still, this is a workaround for this specific case. I cannot say whether the '@' character is the only such example, or whether the Lexer could also prevent other character sequences from being sent to the parser.
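For the record, the copied configuration boils down to one overridden method. Below is a self-contained sketch – the interface here is a stand-in I made up for illustration, not the real CDT IScannerExtensionConfiguration:

```java
// Stand-in for the scanner extension configuration; only the flag
// discussed in this thread is modelled. Not the real CDT interface.
interface ScannerConfig {
    boolean supportAtSignInIdentifiers();
}

// Behaviour of the original LR parser configuration.
class DefaultScannerConfig implements ScannerConfig {
    public boolean supportAtSignInIdentifiers() { return false; }
}

// The copied class: identical except that this one flag is flipped,
// so '@' reaches the token mapper as an identifier token.
class CopiedScannerConfig extends DefaultScannerConfig {
    @Override
    public boolean supportAtSignInIdentifiers() { return true; }
}
```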
Are there any plans to unify the lexer and parser in the future? I feel that being forced to use a different lexer partly defeats the purpose of the LR parser project, which I really like. Consider that once the token issue was fixed, it took me 15 minutes to add the required rules to the C99 grammar and get my custom extension working properly. I still have to figure out how to patch the PDOM C99 parser to do the same thing, and even if I did, it would be more difficult to keep the parser updated and to ensure adherence to a specific set of rules.
One idea could be to have the Lexer play nicely with different parsers by ensuring that every character in the input is passed to the parser as a token – possibly using a generic "unrecognized element" token. The token mapper class in the LR plugin would then perform all further recognition, without the need to configure the preprocessor in a specific way.
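A rough sketch of what I mean – every name here is invented for illustration, and the tokenization rules are deliberately simplistic; the point is only that no character is ever dropped:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "unrecognized element" idea: anything the lexer cannot
// classify becomes an UNKNOWN token carrying the raw text, so a
// parser-side mapper can decide what to do with it.
class UToken {
    enum Kind { IDENT, UNKNOWN }
    final Kind kind;
    final String text;
    UToken(Kind k, String t) { kind = k; text = t; }
}

class ForgivingLexer {
    static List<UToken> lex(String input) {
        List<UToken> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isJavaIdentifierStart(c)) {
                int j = i;
                while (j < input.length()
                        && Character.isJavaIdentifierPart(input.charAt(j))) j++;
                out.add(new UToken(UToken.Kind.IDENT, input.substring(i, j)));
                i = j;
            } else {
                // Unrecognized character: forward it instead of dropping it.
                out.add(new UToken(UToken.Kind.UNKNOWN, String.valueOf(c)));
                i++;
            }
        }
        return out;
    }
}
```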
Would this be a viable approach?
/Mario
From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-bounces@xxxxxxxxxxx] On Behalf Of Mike Kucera
Sent: 3 March 2010 17:17
To: CDT General developers list.
Cc: CDT General developers list.; cdt-dev-bounces@xxxxxxxxxxx
Subject: Re: [cdt-dev] LR parser and token generation
Your understanding of the situation is exactly correct. I think normally with LPG you would provide a grammar for both the lexer and the parser parts. But in our situation we have a preprocessor sitting between the lexer and the parser, which complicates things terribly. So instead the LR parser reuses the lexer/preprocessor from the CDT core. This is also necessary because the CPreprocessor class has a lot of critical functionality, but it does make adding new tokens other than keywords difficult. Worst case, you might have to provide a patch to add support for the new token to the core.
Since the LR parser is not using a lexer generated by LPG, there needs to be a token map that maps the tokens from the core to the tokens that LPG requires. If all you need to support is the @ sign, then you may be in luck. The CDT lexer has an option to support @ in identifiers; if this option is turned on, then the @ sign alone should be returned as an identifier token, which you can then intercept and turn into the LPG token type that you want.
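The interception could look roughly like this. This is a self-contained sketch, not the real code: CoreToken stands in for the core's IToken, and the LPG token kinds (including the new one) are invented numbers:

```java
// Minimal stand-in for a core preprocessor token: kind plus image.
class CoreToken {
    static final int IDENTIFIER = 1;
    final int kind;
    final String image;
    CoreToken(int kind, String image) { this.kind = kind; this.image = image; }
}

// With the @-in-identifiers option on, '@' arrives as an identifier
// whose image is exactly "@"; the token map can promote it to a
// dedicated (hypothetical) LPG token kind.
class AtSignTokenMap {
    static final int TK_identifier = 10; // stand-in LPG token kinds
    static final int TK_AtSign     = 11; // hypothetical new token

    static int map(CoreToken t) {
        if (t.kind == CoreToken.IDENTIFIER) {
            return "@".equals(t.image) ? TK_AtSign : TK_identifier;
        }
        return -1; // other token kinds elided in this sketch
    }
}
```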
Mike Kucera
Software Developer
Eclipse CDT/PTP
IBM Toronto
mkucera@xxxxxxxxxx
From: "Mario Pierro" <Mario.Pierro@xxxxxxx>
To: "CDT General developers list." <cdt-dev@xxxxxxxxxxx>
Date: 03/03/2010 09:32 AM
Subject: [cdt-dev] LR parser and token generation
Hello,
Another question on LR parser customization...
I am trying to add some custom extensions to the C99 language as specified in the LR parser plugin. The extensions require both additional keywords and additional grammar rules.
My ILanguage implementation extends the C99Language class and provides the custom C99Parser via its getParser() method. Additional keywords are added via a custom ICLanguageKeywords implementation (as described in http://dev.eclipse.org/mhonarc/lists/cdt-dev/msg15788.html) which extends CLanguageKeywords and adds the new ones.
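The shape of that setup, sketched with stub types so it stands alone (C99Language, getParser() and ICLanguageKeywords are real CDT names, but the classes below are stand-ins, and "__myext" is a made-up extension keyword):

```java
// Stub hierarchy mirroring the setup described above.
interface ParserStub {}

class C99ParserStub implements ParserStub {}       // generated C99 parser
class ExtendedC99Parser extends C99ParserStub {}   // parser with extra rules

class C99LanguageStub {
    public ParserStub getParser() { return new C99ParserStub(); }
    public String[] getKeywords() { return new String[] { "int", "return" }; }
}

class ExtendedLanguage extends C99LanguageStub {
    @Override
    public ParserStub getParser() { return new ExtendedC99Parser(); }

    @Override
    public String[] getKeywords() {
        // base keywords plus the hypothetical extension keyword
        String[] base = super.getKeywords();
        String[] all = java.util.Arrays.copyOf(base, base.length + 1);
        all[base.length] = "__myext";
        return all;
    }
}
```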
From what I understood, my custom parser will process tokens which have been produced by the CPreprocessor / Lexer classes – as the PDOM parser does – and use a customized version of the DOMToC99TokenMap class to map the preprocessor tokens (IToken interface) to the tokens in the generated C99Parsersym class.
So if the parser defines new tokens, the CPreprocessor needs to know about them as well. If I got it right, this can be done by having the language class supply an implementation of IScannerExtensionConfiguration, which associates the extended keywords to token ids in the IExtensionToken interface in its addKeyword(char[], int) method.
Alternatively, the lexer can ignore the extensions altogether, and the customized DOMToC99TokenMap class can determine if e.g. an "identifier" token supplied by the lexer is actually an "extended keyword" token in the parser.
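This alternative might be sketched as follows – again with stand-in names and invented token ids, not the real DOMToC99TokenMap:

```java
import java.util.Map;

// Sketch of the parser-side alternative: the lexer keeps emitting
// plain identifiers, and the customized token map promotes the ones
// that are extended keywords.
class KeywordAwareTokenMap {
    static final int TK_identifier = 10;
    static final int TK_myKeyword  = 11; // hypothetical extension keyword

    private static final Map<String, Integer> EXTENDED_KEYWORDS =
            Map.of("__mykeyword", TK_myKeyword);

    static int mapIdentifier(String image) {
        return EXTENDED_KEYWORDS.getOrDefault(image, TK_identifier);
    }
}
```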
A customized LR parser will thus be dependent on the tokens generated by the preprocessor, no matter what its grammar specifies. Circumventing this might be difficult: some characters might never be recognized, as the Lexer might not generate any token for them at all (e.g. the '@' char).
I would like to use the same grammar for the lexer and the parser, so that the token set is the same.
Is this possible? Am I getting something terribly wrong here?
Thank you for your help!
/Mario
_______________________________________________
cdt-dev mailing list
cdt-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/cdt-dev