Init string parsing: Design

Parse Trees:


Here is the UML (in the attached PDF file) for the parse tree design. (Init String UML Diagram)


-----------------------------------------------------------------------------

NOTE: This is a conglomeration of the notes from the ve-dev mailing list. It is not necessarily complete nor easy to follow.


The parse tree design. The parse tree will be modeled after the AST node tree and defined in EMF, so that it can be modeled and serialized out. It won't be an exact duplicate of the AST structure because AST has far more node types than we need. We will take the necessary handful, plus some extensions, and use those in the VE.

One necessary extension would be for this example:

JPanel jPanel1 = new JPanel();
jPanel1.setBackground(jPanel1.getForeground());

In the parse tree for the background property, the AST node would have "jPanel1" as the receiver of the getForeground() method. When we try to instantiate this in the VE as a remote proxy object, "jPanel1" means nothing. So there will be a different node type that points to an EMF object instead. When the AST tree is converted into the EMF tree, the name node is converted into an EMF node with a pointer to the EMF object that represents jPanel1 in the model. The instantiation code can find the proxy for the EMF object because it knows about the model, but it doesn't know anything about a variable called "jPanel1".
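The conversion of a receiver name into a model reference could look roughly like the following. This is only a sketch under assumptions: RefExpression, RefName, RefInstance and ReceiverConverter are hypothetical names invented for illustration, and a plain Object stands in for the EMF object.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical parse-tree expression node; not the real VE/EMF class. */
interface RefExpression {}

/** A plain name that could not be tied to a model object (for example a class name). */
final class RefName implements RefExpression {
    final String name;
    RefName(String name) { this.name = name; }
}

/** A node that points at the model object itself instead of the variable name. */
final class RefInstance implements RefExpression {
    final Object modelObject; // stands in for the EMF object in this sketch
    RefInstance(Object modelObject) { this.modelObject = modelObject; }
}

/** Maps source variable names to model objects while converting receivers. */
final class ReceiverConverter {
    private final Map<String, Object> modelObjectsByVariable = new HashMap<>();

    /** Records that a source variable (e.g. "jPanel1") corresponds to a model object. */
    void register(String variableName, Object modelObject) {
        modelObjectsByVariable.put(variableName, modelObject);
    }

    /** Returns a model reference when the identifier is a known variable, else a name node. */
    RefExpression convertReceiver(String identifier) {
        Object modelObject = modelObjectsByVariable.get(identifier);
        if (modelObject != null) {
            return new RefInstance(modelObject);
        }
        return new RefName(identifier);
    }
}
```

With this shape the instantiation side never sees the string "jPanel1"; it only sees the object it can look up a proxy for.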


As a start I'm going through all of the test cases we use for testing the init string parser. The parse tree should be able to handle these cases as a subset of what it can do. I'm going to see what minimum set of AST nodes is needed for this, and whether that set can be trimmed down or combined in some way so the nodes are less numerous. Then we will address the new interesting cases, such as references to local variables or methods.

For the test cases, here are the unique cases showing the minimal set of nodes used:

TestCase                                                                Nodes

"Frog/123"                                                                StringLiteral(Frog/123)
String.valueOf(10)                                        MethodInvocation
                                                                                expression: SimpleName(String) (i.e. the receiver of the method)
                                                                                methodName: SimpleName(valueOf)
                                                                                arguments:
                                                                                        NumberLiteral(10) (Interesting to note, it is the string "10" not the value 10. May need to decide how to parse this into an actual number).
"Frog\"prince\"123"                                                StringLiteral(Frog"prince"123) (Interesting, there is getEscapedValue() which is the original value, and getLiteralValue which turns the escapes into the actual characters)
"Frog\\prince\\123"                                                StringLiteral(Frog\prince\123)
"Frog".length()                                                        MethodInvocation
                                                                                expression: StringLiteral(Frog)
                                                                                methodName: SimpleName(length)

'a'                                                                        CharacterLiteral(a) (Same interesting thing as StringLiteral, escaped versus actual value)
new Character('a')                                                ClassInstanceCreation
                                                                                name: SimpleName(Character)
                                                                                arguments:
                                                                                        CharacterLiteral(a)

'asdf'                                                                Never get an expression because it is invalid, an IProblem is created. How do we handle this?
null                                                                        NullLiteral
false                                                                        BooleanLiteral(false)
Boolean.TRUE                                                                QualifiedName(SimpleName(Boolean), SimpleName(TRUE)) This is interesting because this is actually a field access. However, AST can't tell. According to the docs, AST may return either a FieldAccess or a QualifiedName; which one is undefined.
10                                                                        NumberLiteral(10)
(short)10                                                                 CastExpression
                                                                                type: PrimitiveType(PrimitiveTypeCode(short))
                                                                                expression: NumberLiteral(10)
(short)-10                                                                CastExpression
                                                                                type: PrimitiveType(PrimitiveTypeCode(short))
                                                                                expression: PrefixExpression
                                                                                        operator: PrefixExpression.Operator(MINUS)
                                                                                        operand: NumberLiteral(10)
10d                                                                        NumberLiteral(10d) Again not evaluated at all.
-10d                                                                        PrefixExpression (interesting, not a NumberLiteral that is negative)
                                                                                operator: PrefixExpression.Operator(MINUS)
                                                                                operand: NumberLiteral(10d)
new Float((float)10)                                                ClassInstanceCreation
                                                                                name: SimpleName(Float)
                                                                                arguments:
                                                                                        CastExpression
                                                                                                type: PrimitiveType(PrimitiveTypeCode(float))
                                                                                                expression: NumberLiteral(10)
new Float( (float) 10 )                                                Same as above, the spaces are ignored.
(String)null                                                        CastExpression
                                                                                type: SimpleType(String)
                                                                                expression: NullLiteral
new javax.swing.JLabel( (String) null)                        ClassInstanceCreation
                                                                                name: QualifiedName(QN(SN(javax),SN(swing)), SimpleName(JLabel))
                                                                                arguments:
                                                                                        CastExpression
                                                                                                type: SimpleType(String)
                                                                                                expression: NullLiteral
(java.lang.String)org.eclipse.jem.tests.proxy.initParser.NavigationParameters.getReversed("Frog")
                                                                        CastExpression
                                                                                type: SimpleType(java.lang.String)
                                                                                expression: MethodInvocation

                                                                                        expression: QualifiedName(org.eclipse.jem.tests.proxy.initParser.NavigationParameters)
                                                                                        methodName: SimpleName(getReversed)
                                                                                        arguments:
                                                                                                StringLiteral(Frog)
new javax.swing.table.DefaultTableModel(){}                ClassInstanceCreation
                                                                                name: QualifiedName(javax.swing.table.DefaultTableModel)

                                                                                anonymousClassDeclaration: AnonymousClassDeclaration ... (includes body declaration etc. We don't currently support this, but we do throw a specific exception so it can be noted)
new org.eclipse.jem.tests.proxy.initParser.NavigationParameters().set((float)12,(float)24,(float)50)
                                                                        MethodInvocation
                                                                                expression: ClassInstanceCreation

                                                                                        name: QualifiedName(org.eclipse.jem.tests.proxy.initParser.NavigationParameters)
                                                                                name: SimpleName(set)
                                                                                arguments: (3 standard cast expression like for (float) 10 )
((new org.eclipse.jem.tests.proxy.initParser.NavigationParameters(3)).setElemAt("accountStatementDetails",0))
                                                                        ParenthesizedExpression
                                                                                expression: MethodInvocation
                                                                                        expression: ParenthesizedExpression

                                                                                                expression: ClassInstanceCreation (standard type like we have above)
                                                                                        name: SimpleName(setElemAt)
                                                                                        arguments: (StringLiteral and NumberLiteral)
new String[2]                                                        ArrayCreation:
                                                                                type: ArrayType
                                                                                        componentType: SimpleName(String)
                                                                                        dimensions: 1 (i.e. how many [] are there)
                                                                                dimensions: NumberLiteral(2) (i.e. the value filled into each [])
new String[2][2]                                                        ArrayCreation
                                                                                type: ArrayType
                                                                                        componentType: ArrayType
                                                                                                componentType: SimpleName(String)
                                                                                                dimensions: 1
                                                                                        dimensions: 2
                                                                                dimensions: 2 NumberLiterals.
new int[] {-2,3}                                                        ArrayCreation
                                                                                type: ArrayType
                                                                                        componentType: PrimitiveType(int)
                                                                                        dimensions: 1

                                                                                dimensions: (none, interesting because there is one dimension, but nothing specific in it)
                                                                                initializer: ArrayInitializer
                                                                                        expressions:
                                                                                                PrefixExpression(-2)
                                                                                                NumberLiteral(3)
new int[][] { { 2 , -3 } , { 4 , 5 } }                        ArrayCreation:
                                                                                type: ArrayType(int[][])
                                                                                initializer: ArrayInitializer
                                                                                        expressions:
                                                                                                ArrayInitializer(NumberLiteral(2), PrefixExpression(-3))
                                                                                                ArrayInitializer(NumberLiteral(4), NumberLiteral(5))
new org.eclipse.jem.tests.proxy.initParser.NavigationParameters(new int[][] {{1,2,3},{3,4,5}})
                                                                        ClassInstanceCreation (std)
                                                                                arguments: ArrayCreation (std)
new int[3][]                                                        ArrayCreation:
                                                                                type: ArrayType(int[][])
                                                                                dimensions: NumberLiteral(3)
new org.eclipse.jem.tests.proxy.initParser.NavigationParameters().set(false, false)                
                                                                        MethodInvocation:
                                                                                expression: ClassInstanceCreation(std)
                                                                                name: set
                                                                                arguments: 2 BooleanLiterals

So here's the list of used ASTNodes: Expression, StringLiteral, MethodInvocation, SimpleName, QualifiedName, NumberLiteral, CharacterLiteral, BooleanLiteral, ClassInstanceCreation, NullLiteral, CastExpression, PrimitiveType, SimpleType, PrefixExpression, AnonymousClassDeclaration, ParenthesizedExpression, ArrayCreation, ArrayType, and ArrayInitializer.
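The table notes that a NumberLiteral carries the token text ("10", "10d") rather than a value. One possible way to turn the token into a number is sketched below. NumberTokenSketch is a hypothetical helper, and the sketch deliberately assumes only decimal and hex tokens with optional l/f/d suffixes; octal, binary, underscores and hex floating point are not handled.

```java
/** Hypothetical helper: turns a Java number-literal token into a boxed value. */
final class NumberTokenSketch {
    private NumberTokenSketch() {}

    static Number toNumber(String token) {
        String t = token.toLowerCase();
        if (t.startsWith("0x")) {
            // Hex digits include 'd' and 'f', so only 'l' can be a suffix here.
            if (t.endsWith("l")) {
                return Long.valueOf(Long.parseUnsignedLong(t.substring(2, t.length() - 1), 16));
            }
            return Integer.valueOf(Integer.parseUnsignedInt(t.substring(2), 16));
        }
        char last = t.charAt(t.length() - 1);
        String body = t.substring(0, t.length() - 1);
        switch (last) {
            case 'l': return Long.valueOf(body);
            case 'f': return Float.valueOf(body);
            case 'd': return Double.valueOf(body);
            default: break;
        }
        // No suffix: a decimal point or exponent means double, otherwise int.
        if (t.indexOf('.') >= 0 || t.indexOf('e') >= 0) {
            return Double.valueOf(t);
        }
        return Integer.valueOf(t);
    }
}
```

Keeping the original token in the node and converting only at evaluation time would leave code generation free to emit the literal exactly as it was written.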

SimpleName/QualifiedName: I don't see a need for us to distinguish between the two in our nodes. The bigger problem is that, except for the method name, we get these in place of a class name. What we really want for these is a Type. So I think we don't need Simple/QualifiedName at all. A string attribute on MethodInvocation for the method name is sufficient, and for all of the other uses we want a TypeLiteral instead.

ParenthesizedExpression: We don't need this for allocation purposes, but code gen requires it so that the parentheses can be added to the generated code. It's not needed for allocation because the expression it wraps is already singled out in the node. But the parentheses are important in the generated code so that the wrapped expression is grouped correctly.
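To illustrate why the parenthesized node matters only for code generation, here is a small sketch. The ParenExpr, ParenNumber, ParenMinus and ParenGroup types are hypothetical, and evaluation is reduced to long arithmetic for brevity.

```java
/** Hypothetical node type with one method for allocation and one for code generation. */
interface ParenExpr {
    long evaluate();   // what allocation cares about
    String toSource(); // what code generation cares about
}

final class ParenNumber implements ParenExpr {
    final String token;
    ParenNumber(String token) { this.token = token; }
    public long evaluate() { return Long.parseLong(token); }
    public String toSource() { return token; }
}

final class ParenMinus implements ParenExpr {
    final ParenExpr operand;
    ParenMinus(ParenExpr operand) { this.operand = operand; }
    public long evaluate() { return -operand.evaluate(); }
    public String toSource() { return "-" + operand.toSource(); }
}

/** The parenthesized node: transparent for evaluation, visible in generated source. */
final class ParenGroup implements ParenExpr {
    final ParenExpr inner;
    ParenGroup(ParenExpr inner) { this.inner = inner; }
    public long evaluate() { return inner.evaluate(); }
    public String toSource() { return "(" + inner.toSource() + ")"; }
}
```

Building minus(group(minus(10))) generates "-(-10)", while dropping the group node would generate "--10", which Java reads as a decrement rather than a double negation.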

PrefixExpression: These aren't needed when dealing with just number literals and plus/minus, like -10, but they are needed if we intend to deal with expressions like -(3+4). The prefix operators are +, -, ++, --, ~, and !. "++" and "--" are problematic because they need to work against primitive variables. I don't think we can currently support this concept because it introduces a severe order dependency, and we don't have the concept of a primitive variable where the object itself can be changed (i.e. assignment, other than the original assignment making it a part). My suggestion is that, on parsing, a PrefixExpression that has a number literal as the operand and an operator of plus or minus just gets turned into a NumberLiteral with the plus/minus attached to the literal. This will make the xmi easier to read and to process, since the number converters can already handle plus/minus as part of the string. I would leave "~" and "!" as prefix expressions even if the operand is a literal. Even though we could easily turn the operand into the appropriate literal, we actually need to know the operation for code generation purposes. For example, if it was "~0x00", we shouldn't change it to "0xffffffff" even though they are equivalent for an int, because code generation would then put out "0xffffffff". That could make the code harder to read, because the first form is often used in bit flags and it makes it clearer which bits are being affected.
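The folding rule suggested for plus/minus could be sketched as follows. FoldExpr, FoldNumber, FoldPrefix and PrefixFolding are hypothetical names; the guard against an already signed literal is an added assumption so that "- -10" is not folded into the token "--10".

```java
/** Hypothetical parse-tree nodes for this sketch. */
interface FoldExpr {}

final class FoldNumber implements FoldExpr {
    final String token; // literal text, possibly with a leading sign after folding
    FoldNumber(String token) { this.token = token; }
}

final class FoldPrefix implements FoldExpr {
    final String operator; // "+", "-", "~" or "!"
    final FoldExpr operand;
    FoldPrefix(String operator, FoldExpr operand) {
        this.operator = operator;
        this.operand = operand;
    }
}

final class PrefixFolding {
    private PrefixFolding() {}

    /** Folds +/- on an unsigned number literal into the literal; leaves everything else alone. */
    static FoldExpr fold(FoldPrefix prefix) {
        boolean plusOrMinus = prefix.operator.equals("+") || prefix.operator.equals("-");
        if (plusOrMinus && prefix.operand instanceof FoldNumber) {
            String token = ((FoldNumber) prefix.operand).token;
            boolean alreadySigned = token.startsWith("+") || token.startsWith("-");
            if (!alreadySigned) {
                return new FoldNumber(prefix.operator + token);
            }
        }
        // "~" and "!" stay as prefix expressions so code generation can reproduce them.
        return prefix;
    }
}
```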

PrimitiveType/Type/ArrayType: Not sure if we need to distinguish between the three. We need to see what it looks like if we don't, and then add them back in if we do. For codegen the string describing the type is sufficient, and the same goes for allocation; we can easily tell the difference just from the string. In fact we know exactly where Types need to show up, and we can simply use EMF string attributes there instead. Types will never be an expression.
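Telling the three kinds apart from the string alone might look like the sketch below. TypeStrings and its Kind enum are hypothetical, and the sketch assumes source-style type names such as "int", "java.lang.String" or "int[][]".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper that classifies a type string without a separate Type node. */
final class TypeStrings {
    enum Kind { PRIMITIVE, CLASS, ARRAY }

    private static final Set<String> PRIMITIVES = new HashSet<>(Arrays.asList(
            "boolean", "byte", "char", "short", "int", "long", "float", "double"));

    private TypeStrings() {}

    static Kind classify(String typeName) {
        String t = typeName.trim();
        if (t.endsWith("]")) {
            return Kind.ARRAY;
        }
        return PRIMITIVES.contains(t) ? Kind.PRIMITIVE : Kind.CLASS;
    }

    /** Counts the "[]" pairs, i.e. the number of array dimensions in the string. */
    static int dimensions(String typeName) {
        int count = 0;
        for (int i = 0; i < typeName.length(); i++) {
            if (typeName.charAt(i) == '[') {
                count++;
            }
        }
        return count;
    }
}
```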

TypeLiteral: However, a type literal is different. This is something like get(XYZ.class). Here XYZ.class is an expression, so the whole thing is a TypeLiteral. We haven't used that as a test case, but we should allow it.

AnonymousClassDeclaration: This is a tough one. We didn't actually support this; we just recognized it and didn't allow it. This was to distinguish it from a generic "too complicated" message. Need to think about this. It's not even really an expression, it just occurs inside of an expression, so we may see it.

Here are the nodes of interest that we currently don't handle: ArrayAccess, Assignment, ConditionalExpression, FieldAccess, InfixExpression (e.g. 4 + 3), InstanceofExpression, PostfixExpression, SuperFieldAccess, SuperMethodInvocation, ThisExpression, VariableDeclarationExpression. These are all expressions. There are also statements and declarations.

Of these, here are the expressions I don't think we can handle, at least for now:

Assignment: This is of the form (3 + (x = 4)). We can't handle it for the same reason we can't handle ++x. We may want to allow it but only evaluate the expression on the right and ignore the 'x=' part. None of our property editors or palette entries would probably ever create something of this form, though it may come in from parsing the code. One possibility is that, on parsing the code, the Assignment expression is turned into just the expression that is on the right. We won't be round-tripping the code; any change from the model side will create an entirely new expression, which won't include the assignment.

PostfixExpression: This is of the form x++. We can't handle it for the same reason we can't handle ++x.
SuperFieldAccess: This is of the form super.XYZ. Not sure how we can handle this.
SuperMethodInvocation: This is of the form super.xyz(). Not sure how we can handle this.
VariableDeclarationExpression: Not necessary to handle. This only occurs in for() statements, and we don't process those.
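The Assignment possibility mentioned above, keeping only the right-hand side when converting parsed code, could be sketched like this. StripExpr, StripNumber, StripAssignment, StripParen and AssignmentStripper are hypothetical types, and only the node kinds needed for the illustration are modeled.

```java
/** Hypothetical nodes as they might come out of the parser, before conversion. */
interface StripExpr {}

final class StripNumber implements StripExpr {
    final String token;
    StripNumber(String token) { this.token = token; }
}

final class StripAssignment implements StripExpr {
    final String leftName;  // the "x" in "x = 4"
    final StripExpr right;  // the "4" in "x = 4"
    StripAssignment(String leftName, StripExpr right) {
        this.leftName = leftName;
        this.right = right;
    }
}

final class StripParen implements StripExpr {
    final StripExpr inner;
    StripParen(StripExpr inner) { this.inner = inner; }
}

final class AssignmentStripper {
    private AssignmentStripper() {}

    /** Replaces every assignment with its right-hand side, keeping parentheses. */
    static StripExpr strip(StripExpr expr) {
        if (expr instanceof StripAssignment) {
            return strip(((StripAssignment) expr).right);
        }
        if (expr instanceof StripParen) {
            return new StripParen(strip(((StripParen) expr).inner));
        }
        return expr;
    }
}
```

Chained assignments such as x = y = 4 collapse to the innermost right-hand side, which matches the idea that the model never needs the assignment targets.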


---------------------------------------------------------------------------


Some changes from the previous note are:

  • We have a special expression node called "InvalidExpression". This will be used, with a message in it, whenever the expression is too complicated. Yup, we can still get expressions that are too complicated. Though they can be parsed, there is no way for us to evaluate them. An anonymous class declaration is one example.
  • We're not supporting anonymous declaration. Instead, if this is seen during AST conversion we will create an InvalidExpression with a message that we don't handle anonymous classes.
  • We created an expression node called "Name". This is used in place of SimpleName and QualifiedName. Turns out we still need these because they can show up as expressions.
  • Types aren't a separate node. It turns out that every place where a type is required is spelled out explicitly, so we can simply use an EMF string attribute as the type there. We already have code that is smart enough to take a string and figure out quickly if it is a primitive, class, or array class.
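The handling of anonymous classes during AST conversion could be sketched as below. All names here (ConvCreationInput, ConvNode, ConvInvalidExpression, ConvClassInstanceCreation, AstConversionSketch) are hypothetical, the input class is only a stand-in for what the AST reports, and the message text is made up for illustration.

```java
/** Hypothetical stand-in for what the AST reports about a "new X(...)" expression. */
final class ConvCreationInput {
    final String typeName;
    final boolean hasAnonymousBody; // true for "new X(){}"
    ConvCreationInput(String typeName, boolean hasAnonymousBody) {
        this.typeName = typeName;
        this.hasAnonymousBody = hasAnonymousBody;
    }
}

/** Hypothetical parse-tree node produced by the conversion. */
interface ConvNode {}

/** Carries a message explaining why the expression cannot be evaluated. */
final class ConvInvalidExpression implements ConvNode {
    final String message;
    ConvInvalidExpression(String message) { this.message = message; }
}

/** The type is held as a plain string attribute instead of a separate Type node. */
final class ConvClassInstanceCreation implements ConvNode {
    final String type;
    ConvClassInstanceCreation(String type) { this.type = type; }
}

final class AstConversionSketch {
    private AstConversionSketch() {}

    static ConvNode convert(ConvCreationInput input) {
        if (input.hasAnonymousBody) {
            // Specific message, so the user does not just see "too complicated".
            return new ConvInvalidExpression(
                    "Anonymous classes are not handled: " + input.typeName);
        }
        return new ConvClassInstanceCreation(input.typeName);
    }
}
```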