Init string parsing: Design
Parse Trees:
Here is the UML (in the attached PDF
file) for the parse tree design (Init String UML Diagram).
-----------------------------------------------------------------------------
NOTE: This is a conglomeration of the notes from the ve-dev mailing list. It is not necessarily complete nor easy to follow.
The parse tree design.
The parse tree will be modeled after the AST node tree. It will be in EMF.
This way it can be modeled and serialized out. It won't be an exact duplicate
of the AST structure because there are far too many nodes in AST for what
we need. We will take only the necessary handful, plus some extensions, for
use in the VE.
One necessary extension would be for
this example:
JPanel jPanel1 = new JPanel();
jPanel1.setBackground(jPanel1.getForeground());
In the parse tree for the background
property, the AST node would have "jPanel1" as the receiver of
the getForeground() method. When we try to instantiate this in the VE as
a remote proxy object, "jPanel1" means nothing. So there will
be a different node that points to an EMF object instead. When the AST
tree is converted into the EMF tree, the name node is converted into an
EMF node with a pointer to the EMF object that, in the model, corresponds
to jPanel1. The instantiation code can find the proxy for the EMF object
because it knows about that, but it doesn't know anything about a variable
called "jPanel1".
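As a rough illustration of the conversion described above, here is a minimal Java sketch. It does not use EMF or the JDT AST; ModelObject, InstanceReference, MethodCall and convert are hypothetical names invented for this example, and a plain map stands in for whatever lookup the real converter would use to go from a variable name to its model object.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of resolving a variable name to a model object during conversion. */
final class NameResolutionSketch {

    /** Stand-in for an EMF model object, e.g. the JPanel instance in the model. */
    static final class ModelObject {
        final String typeName;
        ModelObject(String typeName) { this.typeName = typeName; }
    }

    /** Marker for parse-tree nodes. */
    interface Node {}

    /** Receiver node that points at a model object instead of carrying a variable name. */
    static final class InstanceReference implements Node {
        final ModelObject target;
        InstanceReference(ModelObject target) { this.target = target; }
    }

    /** A no-argument method call on a receiver node. */
    static final class MethodCall implements Node {
        final Node receiver;
        final String methodName;
        MethodCall(Node receiver, String methodName) {
            this.receiver = receiver;
            this.methodName = methodName;
        }
    }

    /** Converts "receiverName.methodName()" into a tree that no longer mentions the variable name. */
    static MethodCall convert(String receiverName, String methodName,
                              Map<String, ModelObject> variables) {
        ModelObject target = variables.get(receiverName);
        if (target == null) {
            throw new IllegalArgumentException("No model object for variable: " + receiverName);
        }
        return new MethodCall(new InstanceReference(target), methodName);
    }

    /** Builds the tree for the argument in jPanel1.setBackground(jPanel1.getForeground()). */
    static MethodCall example() {
        Map<String, ModelObject> variables = new HashMap<>();
        variables.put("jPanel1", new ModelObject("javax.swing.JPanel"));
        return convert("jPanel1", "getForeground", variables);
    }
}
```

The point of the sketch is only that the resulting tree holds a direct reference to the model object, so instantiation never needs to know the name "jPanel1".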
As a start I'm going through all of
the test cases we use for testing the init string parser. The parse tree
should be able to handle these cases as a subset of what it can do. I'm
going to see what minimum set of AST nodes will be needed for this, and
see if it can be trimmed down or combined in some way to make them less numerous.
Then we will address the new interesting cases, such as references to local
variables or methods.
For the test cases, here are the unique
cases showing the minimal set of nodes used:
TestCase
    Nodes
"Frog/123"
    StringLiteral(Frog/123)
String.valueOf(10)
    MethodInvocation
        expression: SimpleName(String) (i.e. the receiver of the method)
        methodName: SimpleName(valueOf)
        arguments:
            NumberLiteral(10)
    (Interesting to note, it is the string "10" not the value 10.
    May need to decide how to parse this into an actual number).
"Frog\"prince\"123"
    StringLiteral(Frog"prince"123)
    (Interesting, there is getEscapedValue() which is the original value, and
    getLiteralValue() which turns the escapes into the actual characters)
"Frog\\prince\\123"
    StringLiteral(Frog\prince\123)
"Frog".length()
    MethodInvocation
        expression: StringLiteral(Frog)
        methodName: SimpleName(length)
'a'
    CharacterLiteral(a)
    (Same interesting thing as StringLiteral, escaped versus actual value)
new Character('a')
    ClassInstanceCreation
        name: SimpleName(Character)
        arguments:
            CharacterLiteral(a)
'asdf'
    We never get an expression because it is invalid; an IProblem is created
    instead. How do we handle this?
null
    NullLiteral
false
    BooleanLiteral(false)
Boolean.TRUE
    QualifiedName(SimpleName(Boolean), SimpleName(TRUE))
    (This is interesting because it is actually a field access, but AST
    can't tell. According to the docs, AST may return either a FieldAccess or
    a QualifiedName; which one is undefined.)
10
    NumberLiteral(10)
(short)10
    CastExpression
        type: PrimitiveType(PrimitiveTypeCode(short))
        expression: NumberLiteral(10)
(short)-10
    CastExpression
        type: PrimitiveType(PrimitiveTypeCode(short))
        expression: PrefixExpression
            operator: PrefixExpression.Operator(MINUS)
            operand: NumberLiteral(10)
10d
    NumberLiteral(10d)
    (Again, not evaluated at all.)
-10d
    PrefixExpression (interesting, not a NumberLiteral that is negative)
        operator: PrefixExpression.Operator(MINUS)
        operand: NumberLiteral(10d)
new Float((float)10)
    ClassInstanceCreation
        name: SimpleName(Float)
        arguments:
            CastExpression
                type: PrimitiveType(PrimitiveTypeCode(float))
                expression: NumberLiteral(10)
new Float( (float) 10 )
    Same as above, the spaces are ignored.
(String)null
    CastExpression
        type: SimpleType(String)
        expression: NullLiteral
new javax.swing.JLabel( (String) null)
    ClassInstanceCreation
        name: QualifiedName(QN(SN(javax),SN(swing)), SimpleName(JLabel))
        arguments:
            CastExpression
                type: SimpleType(String)
                expression: NullLiteral
(java.lang.String)org.eclipse.jem.tests.proxy.initParser.NavigationParameters.getReversed("Frog")
    CastExpression
        type: SimpleType(java.lang.String)
        expression: MethodInvocation
            expression: QualifiedName(org.eclipse.jem.tests.proxy.initParser.NavigationParameters)
            methodName: SimpleName(getReversed)
            arguments:
                StringLiteral(Frog)
new javax.swing.table.DefaultTableModel(){}
    ClassInstanceCreation
        name: QualifiedName(javax.swing.table.DefaultTableModel)
        anonymousClassDeclaration: AnonymousClassDeclaration
            ... (includes body declarations etc. We don't currently support this, but
            we do throw a specific exception so it can be noted)
new org.eclipse.jem.tests.proxy.initParser.NavigationParameters().set((float)12,(float)24,(float)50)
    MethodInvocation
        expression: ClassInstanceCreation
            name: QualifiedName(org.eclipse.jem.tests.proxy.initParser.NavigationParameters)
        name: SimpleName(set)
        arguments: (3 standard cast expressions, like for (float) 10)
((new org.eclipse.jem.tests.proxy.initParser.NavigationParameters(3)).setElemAt("accountStatementDetails",0))
    ParenthesizedExpression
        expression: MethodInvocation
            expression: ParenthesizedExpression
                expression: ClassInstanceCreation (standard type like we have above)
            name: SimpleName(setElemAt)
            arguments: (StringLiteral and NumberLiteral)
new String[2]
    ArrayCreation
        type: ArrayType
            componentType: SimpleName(String)
            dimensions: 1 (i.e. how many [] there are)
        dimensions: NumberLiteral(2) (i.e. the value filled into each [])
new String[2][2]
    ArrayCreation
        type: ArrayType
            componentType: ArrayType
                componentType: SimpleName(String)
                dimensions: 1
            dimensions: 2
        dimensions: 2 NumberLiterals.
new int[] {-2,3}
    ArrayCreation
        type: ArrayType
            componentType: PrimitiveType(int)
            dimensions: 1
        dimensions: (none, interesting because there is one dimension, but
        nothing specific in it)
        initializer: ArrayInitializer
            expressions:
                PrefixExpression(-2)
                NumberLiteral(3)
new int[][] { { 2 , -3 } , { 4 , 5 } }
    ArrayCreation
        type: ArrayType(int[][])
        initializer: ArrayInitializer
            expressions:
                ArrayInitializer(NumberLiteral(2), PrefixExpression(-3))
                ArrayInitializer(NumberLiteral(4), NumberLiteral(5))
new org.eclipse.jem.tests.proxy.initParser.NavigationParameters(new int[][] {{1,2,3},{3,4,5}})
    ClassInstanceCreation (std)
        arguments: ArrayCreation (std)
new int[3][]
    ArrayCreation
        type: ArrayType(int[][])
        dimensions: NumberLiteral(3)
new org.eclipse.jem.tests.proxy.initParser.NavigationParameters().set(false, false)
    MethodInvocation
        expression: ClassInstanceCreation (std)
        name: set
        arguments: 2 BooleanLiterals
So here's the list of used ASTNodes:
Expression, StringLiteral, MethodInvocation, SimpleName, QualifiedName,
NumberLiteral, CharacterLiteral, BooleanLiteral, ClassInstanceCreation, NullLiteral,
CastExpression, PrimitiveType, SimpleType, PrefixExpression, AnonymousClassDeclaration,
ParenthesizedExpression, ArrayCreation, ArrayType, and ArrayInitializer.
SimpleName/QualifiedName: I don't
see a need for us to distinguish between the two in our nodes. The bigger
problem is that we get these, except for the method name, in place of a class
name. What we really want for these is a Type. So I think we don't need
Simple/QualifiedName at all. Just a string attribute for the MethodInvocation
method name is sufficient. For all of the other uses we want a TypeLiteral
instead.
ParenthesizedExpression: We don't
need this for allocation purposes, but code gen would require it so that
the parentheses can be added to the generated code. It's not needed for
allocation because the expression it wraps is already singled out in the
node. But the parentheses are important for the code so that the wrapped
expression can be deduced correctly.
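The split between allocation and code generation can be sketched as follows. This is only an illustration under assumed names: Raw, Parenthesized, toSource and forAllocation are hypothetical and do not come from the actual parse tree model.

```java
/** Hypothetical sketch: parentheses matter for code generation but not for allocation. */
final class ParenthesizedSketch {

    interface Node {
        /** Source text that code generation would emit for this node. */
        String toSource();
    }

    /** Leaf holding already-generated source, standing in for any other expression. */
    static final class Raw implements Node {
        final String source;
        Raw(String source) { this.source = source; }
        public String toSource() { return source; }
    }

    /** Wrapper that exists only so code generation can put the parentheses back. */
    static final class Parenthesized implements Node {
        final Node inner;
        Parenthesized(Node inner) { this.inner = inner; }
        public String toSource() { return "(" + inner.toSource() + ")"; }
    }

    /** Allocation only cares about the wrapped expression, however deeply it is nested. */
    static Node forAllocation(Node node) {
        Node current = node;
        while (current instanceof Parenthesized) {
            current = ((Parenthesized) current).inner;
        }
        return current;
    }
}
```

Here forAllocation skips any number of wrappers, while toSource reproduces them exactly as they appeared.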
PrefixExpression: These aren't needed
when dealing with just number literals and plus/minus, like -10, but they
are needed if we intend to deal with expressions like -(3+4). The
prefix operators are +, -, ++, --, ~, and !.
"++" and "--" are problematic because
they need to work against primitive variables. I don't think we can currently
support this concept because it introduces a severe order dependency and
we don't have the concept of a primitive variable where the object itself
can be changed (i.e. assignment, other than the original assignment making
it a part). My suggestion is that, on parsing, a PrefixExpression that
has a number literal as the operand and an operator of plus or minus
just gets turned into a NumberLiteral with the plus/minus attached to
the literal. This will make the XMI easier to read and to process, since the
number converters can already handle plus/minus as part of the string.
I would leave "~" and "!" as prefix expressions
even if the operand is a literal. Even though we could
easily turn the operand into the appropriate literal, we actually need to
know the operation for code generation purposes (i.e. if it was "~0x00",
then we shouldn't change it to "0xff" even though they are equivalent,
because code generation would then put out "0xff". This could
make it harder to read because the first form is often used in bit flags
and it makes it clearer what bits are being affected).
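A minimal sketch of the folding suggested above, together with one possible way to turn the literal's text into an actual number. Everything here is an assumption made for illustration: the node classes, convertPrefix and toNumber are hypothetical, and toNumber is not the existing number converter, just a stand-in built on the standard decode and valueOf methods.

```java
/** Hypothetical sketch of folding signed number literals during AST conversion. */
final class PrefixFoldingSketch {

    interface Node {}

    /** Number literal kept as text, the way the AST hands it over. */
    static final class NumberLiteral implements Node {
        final String token;
        NumberLiteral(String token) { this.token = token; }
    }

    static final class PrefixExpression implements Node {
        final String operator;
        final Node operand;
        PrefixExpression(String operator, Node operand) {
            this.operator = operator;
            this.operand = operand;
        }
    }

    /** Folds +/- into an unsigned number literal; every other case stays a PrefixExpression. */
    static Node convertPrefix(String operator, Node operand) {
        boolean isSign = operator.equals("+") || operator.equals("-");
        if (isSign && operand instanceof NumberLiteral) {
            String token = ((NumberLiteral) operand).token;
            char first = token.charAt(0);
            if (first != '+' && first != '-') { // avoid producing tokens such as "--10"
                return new NumberLiteral(operator + token);
            }
        }
        return new PrefixExpression(operator, operand);
    }

    /** Assumed converter: picks a numeric type from the literal's suffix and shape. */
    static Number toNumber(String token) {
        String t = token.trim();
        String unsigned = (t.startsWith("+") || t.startsWith("-")) ? t.substring(1) : t;
        boolean hex = unsigned.startsWith("0x") || unsigned.startsWith("0X");
        char last = Character.toLowerCase(t.charAt(t.length() - 1));
        if (last == 'l') {
            return Long.decode(t.substring(0, t.length() - 1));
        }
        if (hex) { // checked before the f/d suffixes, which are also hex digits
            return Integer.decode(t);
        }
        if (last == 'f') {
            return Float.valueOf(t);
        }
        if (last == 'd' || t.indexOf('.') >= 0 || t.indexOf('e') >= 0 || t.indexOf('E') >= 0) {
            return Double.valueOf(t);
        }
        return Integer.decode(t);
    }
}
```

With this sketch, "-" applied to NumberLiteral(10d) becomes NumberLiteral(-10d), while "~" applied to NumberLiteral(0x00) stays a PrefixExpression so that code generation can reproduce the original form.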
PrimitiveType/Type/ArrayType:
Not sure if we need to distinguish between the three. We need to see what
it looks like if we don't, and then add them back in if we do. This is because
for codegen, the string describing the type is sufficient; the same goes for allocation.
We can easily tell the difference just from the string. In fact we know
exactly where Types need to show up, and we can simply use EMF string attributes
instead. Types will never be an expression.
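To illustrate how the kind of type can be read off the string alone, here is a small sketch. It is not the existing VE code; the class name, the Kind enum and classify are hypothetical, and the rule is deliberately simple (a trailing "]" means array, a Java primitive keyword means primitive, anything else is treated as a class name).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of telling primitive, array and class types apart from a type string. */
final class TypeStringSketch {

    enum Kind { PRIMITIVE, ARRAY, CLASS }

    private static final Set<String> PRIMITIVES = new HashSet<>(Arrays.asList(
            "boolean", "byte", "char", "short", "int", "long", "float", "double"));

    /** Classifies a type string such as "int", "int[][]" or "java.lang.String". */
    static Kind classify(String typeName) {
        String t = typeName.trim();
        if (t.endsWith("]")) {
            return Kind.ARRAY; // checked first so that "int[]" is an array, not a primitive
        }
        if (PRIMITIVES.contains(t)) {
            return Kind.PRIMITIVE;
        }
        return Kind.CLASS;
    }
}
```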
TypeLiteral: However, a type literal
is different. This is something like get(XYZ.class). Here XYZ.class
is an expression, so the whole thing is a TypeLiteral. We haven't used
that as a test case, but we should allow it.
AnonymousClassDeclaration: This is
a tough one. We didn't actually support this; we just recognized it and didn't
allow it. This was to distinguish it from some generic "too complicated" message.
Need to think about this. It's not even really an expression, it just occurs
inside of an expression, so we may see it.
Here are the nodes of interest that
we currently don't handle: ArrayAccess, Assignment, ConditionalExpression,
FieldAccess, InfixExpression (e.g. 4 + 3), InstanceofExpression, PostfixExpression,
SuperFieldAccess, SuperMethodInvocation, ThisExpression, VariableDeclarationExpression.
These are all expressions. There are also statements and declarations.
Of these expressions, here are the ones I don't think
we can handle, at least for now:
Assignment: This is of the form
(3 + (x = 4)). We can't handle it for the same reason we can't handle ++x. We may want
to allow it but only enable the expression evaluation on the right and
ignore the 'x=' part. None of our property editors or palette entries would
probably ever create something of this form, though it may come in from
parsing the code. One possibility is that on parsing the code, the Assignment
expression is turned into just the expression that is on the right. We
won't be round-tripping the code; any changes from the model side will
create an entirely new expression, which won't include the assignment.
PostfixExpression: This
is of the form x++. We can't handle it for the same reason we can't handle ++x.
SuperFieldAccess: This is of
the form super.XYZ. Not sure how we can handle this.
SuperMethodInvocation: This is
of the form super.xyz(). Not sure how we can handle this.
VariableDeclarationExpression:
Not necessary to handle. This only occurs in for() statements, and
we don't process those.
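The possibility mentioned under Assignment, keeping only the right-hand expression, could look roughly like the sketch below. The node classes and dropAssignment are hypothetical names for this example, and the sketch only handles an assignment at the top of the expression; an assignment nested inside another expression, as in (3 + (x = 4)), would need the same replacement applied while walking the tree.

```java
/** Hypothetical sketch of reducing an assignment to the expression on its right. */
final class AssignmentSketch {

    interface Node {}

    /** Stand-in for any value expression. */
    static final class Literal implements Node {
        final String token;
        Literal(String token) { this.token = token; }
    }

    /** An expression of the form "variableName = rightHandSide". */
    static final class Assignment implements Node {
        final String variableName;
        final Node rightHandSide;
        Assignment(String variableName, Node rightHandSide) {
            this.variableName = variableName;
            this.rightHandSide = rightHandSide;
        }
    }

    /** Drops the "x =" part, repeating for chained assignments such as x = y = 4. */
    static Node dropAssignment(Node node) {
        Node current = node;
        while (current instanceof Assignment) {
            current = ((Assignment) current).rightHandSide;
        }
        return current;
    }
}
```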
---------------------------------------------------------------------------
Some changes from the previous note
are:
- We have a special expression node called
"InvalidExpression". This will be used with a message in it whenever
the expression is too complicated. Yup, we still can get too complicated
expressions. Though they can be parsed, there is no way for us to evaluate
them. For example an anonymous declaration.
- We're not supporting anonymous declaration.
Instead, if this is seen during AST conversion we will create an InvalidExpression
with a message that we don't handle anonymous classes.
- We created an expression node called
"Name". This is used in place of SimpleName and QualifiedName.
Turns out we still need these because they can show up as expressions.
- Types aren't a separate node. It turns
out that wherever a type is required, it is spelled out explicitly, so we can
simply use an EMF string attribute as the type instead. We already have
code that is smart enough to take a string and figure out quickly if it
is a primitive, class, or array class.
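As an illustration of the InvalidExpression idea from the list above, the sketch below shows a conversion step that produces an InvalidExpression with a message when it meets an anonymous class. The class names mirror the node names used in these notes, but the code itself, including convertCreation and the wording of the message, is a hypothetical stand-in rather than the actual converter.

```java
/** Hypothetical sketch of reporting an unsupported construct as an InvalidExpression node. */
final class InvalidExpressionSketch {

    interface Node {}

    /** Normal result for "new TypeName(...)". */
    static final class InstanceCreation implements Node {
        final String typeName;
        InstanceCreation(String typeName) { this.typeName = typeName; }
    }

    /** Placeholder node carrying the reason the expression cannot be evaluated. */
    static final class InvalidExpression implements Node {
        final String message;
        InvalidExpression(String message) { this.message = message; }
    }

    /** Converts a class instance creation, rejecting anonymous class bodies. */
    static Node convertCreation(String typeName, boolean hasAnonymousBody) {
        if (hasAnonymousBody) {
            return new InvalidExpression("Cannot handle anonymous class for " + typeName);
        }
        return new InstanceCreation(typeName);
    }
}
```

Returning a node instead of throwing keeps the rest of the tree intact, so the message can be shown wherever the expression would have been used.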