Hi,
I disagree; the preprocessor should just do what the specification asks it to do:
Create preprocessor number tokens.
That is because multiple languages use the same specification for their preprocessing (at least C, C++ and Objective-C); however, each language is free to interpret
the pp-number tokens as it likes. I'd like to leave the path open to use it for whatever language you need it for.
The preprocessor itself needs to interpret pp-number tokens within #if preprocessing directives and can use its own algorithm for that. What it has to do is
not 100% specified; however, it clearly should attempt to interpret as many of the integral numbers as possible and report an error for the others.
However, there is no need to do the interpretation twice: a number is interpreted either by the preprocessor (in #if directives) or by the parser, everywhere else.
Markus.
From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-bounces@xxxxxxxxxxx]
On Behalf Of Richard
Sent: Wednesday, June 13, 2012 7:04 PM
To: CDT General developers list.
Subject: Re: [cdt-dev] Suggestions for dealing with tests
The problem with doing that is that you then have to parse the number twice (or more) to determine what kind of number it is. In CDT, the lexer first returns the number, possibly with a suffix. The preprocessor then has to check the suffix
to see whether it's a UDL (UDLs aren't allowed in preprocessor directives), and the parser then breaks the number down further to check that it has the correct format depending on whether it's an int/hex/binary/etc. My proposal is to ignore the suffixes in the lexer
and return a number token and an identifier. The check for number validity can be done in the lexer, and the checks on the suffix can be done in the preprocessor and/or the parser.
An update on the progress I've made since this discussion began:
- I've moved the lexing of numbers into the lexer
- The lexer doesn't return the suffix as part of the number
- Added parser and scanner options for UDLs
- Added a suffix to CPPASTLiteralExpression
I'm still working through some bugs, but it's mostly there. The only questions I have now are how to pass more information from the lexer to the parser, because it would be helpful if the lexer could tell the parser whether the number is float/hex/int/hexfloat/etc.,
and how to use IASTImplicitNameOwner; I'm having trouble finding an example of how to use it.
On 13 June 2012 00:28, Corbat Thomas (tcorbat@xxxxxx) <tcorbat@xxxxxx> wrote:
Hm... Actually, I understood it differently, regarding the handling of numbers.
The grammar for pp-number (preprocessor numbers) is as follows [lex.ppnumber]:
pp-number:
    digit
    . digit
    pp-number digit
    pp-number identifier-nondigit
    pp-number e sign
    pp-number E sign
    pp-number .
Therefore, everything starting with a digit, or a dot followed by a digit, is a number. Thus .55.4h5ze+E-5gg
would be a pp-number, even though it is not convertible into a meaningful floating-point number.
I guess what Markus suggested was to rip the part distinguishing the kinds of numbers out of the
preprocessor, leaving it yielding pp-numbers, and to move that logic into the parser.
Regarding the tests: at the moment I have 17 tests that error in a suite that tests included files (I forget which it is exactly), and a few that fail because
there aren't any tests in the suite (base classes, I think). I'm assuming the latter is because I'm running the suite incorrectly: I'm right-clicking on the package and choosing Run As -> JUnit Plugin Test.
For the UDL implementation, I understand what you're saying about not introducing new elements, which is what my first attempt tried to do and had problems
with; but looking at it again, I think I understand how you're suggesting to implement it.
Yes, dealing with the suffix is the parser's task. By the standard, it's even the task of the parser to distinguish between the various kinds of number literals.
I don't know exactly why this is done in the preprocessor. It would actually be nice to move this task away from it, which would collapse the various kinds of number tokens into one preprocessor number token as described in 2.10.
So, if I understand correctly, the lexer should return two tokens for numbers with suffixes: the number and an identifier, and it would be the job of the parser
to combine these into one AST node. I agree, that does seem like the best approach. It would mean that the places with more-or-less the same number-parsing code (I've counted at least two: Lexer.java and CPreprocessor.java) can be consolidated in the
place it belongs, the lexer. The AST token can then be used in CPreprocessor to determine whether the token is a UDL, by adding an extra method, isUserDefined, and maybe other methods such as getNumberType, which would return INT, HEX, BINARY, etc. The other
method I can see being useful is isMalformed(), for use in CPreprocessor.
I think the code for all this has already been written between Corbat and myself; it just needs to be put in the right places and tested.
On 11 June 2012 07:08, Corbat Thomas (tcorbat@xxxxxx) <tcorbat@xxxxxx>
wrote:
I had a closer look at the standard and I agree with you:
pp-number covers any number, including user-defined literals. But for preprocessing, character
and string literals are distinct from user-defined character and string literals, considering the preprocessing tokens (from translation phase 3).
Looking at translation phase 7 (conversion of preprocessing tokens to tokens), user-defined literals
are distinct from the other kinds of literals [lex.literal.kinds], with the subcategories int, float, char and string.
From the description, the conversion of the tokens seems to me like a pre-step of phase 7. But I guess
it could be done on the fly, while parsing, as well.
Hi,
A user-defined literal is still of kind integer, char, floating-point or string. So it’d be natural
to add the property isUserDefined() to IASTLiteralExpression.
Yes, dealing with the suffix is the parser's task. By the standard, it's even the task of the parser
to distinguish between the various kinds of number literals. I don't know exactly why this is done in the preprocessor. It would actually be nice to move this task away from it, which would collapse the various kinds of number tokens into one preprocessor number
token as described in 2.10.
However, the LR parsers probably rely on receiving the tokens as they are delivered today. So we
could have an option for whether the preprocessor shall classify the number tokens or not. For the GNUCPPSourceParser we could then move the classification into the parser.
Exactly, CPPASTLiteralExpression.getExpressionType() needs to be changed. Plus, as mentioned before,
by using IASTImplicitNameOwner the literal expression can provide the binding to the operator that is called to create the object denoted by the literal.
Markus.
Hi
Regarding the UDL implementation:
So your suggestion, Markus, is to lex user-defined literals as tSTRING, tCHAR, tINTEGER and tFLOATINGPT?
Shall IASTExpression get a new kind for user defined literals?
Recognizing the suffix will then be the task of the parser, right?
I guess getExpressionType() in CPPASTLiteralExpression must be extended to return the type of the
resolved literal operator.
That probably makes the implementation a bit easier.
Regards
Thomas
Hi,
The preprocessor is not run for a specific language. In case there needs to be different behavior
for C and C++, you need to introduce an option, which should be controlled via IScannerExtensionConfiguration (GPPLanguage and GCCLanguage provide different objects for this configuration). The test case needs to be elaborated to test with the two different
configurations.
For the case where we allow user-defined literals, the checkNumber method needs to behave differently.
I also think that the classification of the number literals in the lexer needs to be changed; however, it can be done such that it works independently of whether user-defined literals are allowed or not.
My recommendation is to introduce neither new kinds of tokens nor a new IASTNode. I think
it is sufficient to let CPPASTLiteralExpression implement IASTImplicitNameOwner, which allows it to provide the references to the implicit function calls.
The tests nested in org.eclipse.cdt.core.suite.AutomatedIntegrationSuite should all pass.
Markus.
From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-bounces@xxxxxxxxxxx]
On Behalf Of Richard
Sent: Saturday, June 09, 2012 10:35 PM
To: CDT General developers list.
Subject: [cdt-dev] Suggestions for dealing with tests
Hello all,
I've been working on getting user-defined literals working syntactically for the past week. I believe I'm in the home stretch, as I'm now working through the failing test cases under
org.eclipse.cdt.core.tests.
I have a few questions regarding this.
- Firstly, how do I deal with tests that are supposed to fail in C mode but pass in C++ mode?
The current test that demonstrates this is PreprocessorTests.testGCC43BinaryNumbers.
There are 5 binary literals that are tested for failing: 0b012, 0b01b, 0b1111e01, 0b1111p10, 0b10010.10010.
With UDLs, the first and the last of these should still fail, while the middle three can be considered binary literals with UDL suffixes. But in C mode, or in C++ without UDLs, they should all
fail. Is there a way to test which language the current test is being run in?
- Lastly, I've been testing my branch against master, and I've noticed there are a fair number of tests with errors or failures. Is this expected, or do I have my project set up
incorrectly?
_______________________________________________
cdt-dev mailing list
cdt-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/cdt-dev