Eclipse Community Forums: TMF (Xtext) » Linux kernel (ARM) dts parser.

Home » Modeling » TMF (Xtext) » Linux kernel (ARM) dts parser.(Convert lex/yacc grammar to Xtext)

Show: Today's Messages :: Show Polls :: Message Navigator

Linux kernel (ARM) dts parser. [message #1706331]

Tue, 25 August 2015 22:36

Mauro Condarelli

Messages: 428
Registered: September 2009

Senior Member

I am trying to convert the dts parser found in Linux kernel "DTC" (Device Tree Compiler) to Xtext in order to implement a sensible editor, if possible.

I am working with available sources (https://git.kernel.org/cgit/utils/dtc/dtc.git/tree/dtc-lexer.l and https://git.kernel.org/cgit/utils/dtc/dtc.git/tree/dtc-parser.y).

The problem is that I get a zillion left-recursion errors (and a fair share of "An unassigned rule call is not allowed, when the 'current' was already created.", but I hope to understand how to handle those), and I do not want to modify the grammar too much, to avoid losing sync with a moving target.

I am aware of the "backtrack=true" option, but even enabling it does not solve many of the problems.

What is the suggested course of action here?

Note: I know lex/yacc much better than ANTLR/Xtext (to be precise: this is the second project where I use Xtext, and the first one did not have strict requirements on the grammar).

Please advise.
Best Regards
Mauro

Re: Linux kernel (ARM) dts parser. [message #1706377 is a reply to message #1706331]

Wed, 26 August 2015 11:38

Stefan Oehme

Messages: 159
Registered: April 2010
Location: Kiel

Senior Member

Hi Mauro,

without seeing your Xtext grammar there is no way we can help you.

Cheers,
Stefan

Re: Linux kernel (ARM) dts parser. [message #1706447 is a reply to message #1706377]

Wed, 26 August 2015 22:35

Mauro Condarelli

Thanks,
I found a semi-formal description of the grammar, so I'm rewriting from scratch, without trying to convert the lex/yacc code.
That seems much easier (and more readable too!).

A couple of specific questions (should I open separate questions?):

- Is it possible to insert "semantic" rules?
(e.g.: define "byte" as "unsigned int < 256")

- Is it possible to forbid whitespace between two tokens?
(e.g.: addr: '@'UNSIGNEDINT; with no space after the '@')

TiA
Mauro

Re: Linux kernel (ARM) dts parser. [message #1706470 is a reply to message #1706447]

Thu, 27 August 2015 07:11

Stefan Oehme

Hi Mauro,

I guess at first you were trying to copy-paste the yacc-grammar? That won't work since Xtext has a different syntax. Doing it from scratch is the way to go =)

Whitespace: To control which tokens are ignored inside a rule, use the "hidden" clause, e.g.:

Address hidden(): //<-- empty argument list means no tokens are hidden
  '@' UNSIGNEDINT
;

Semantic rules are handled in your language's validator. Keep your grammar simple and lenient. That makes it easier to maintain and allows you to give more meaningful error messages.

Cheers,
Stefan
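
The "check it in the validator" advice above can be sketched in plain Java. This is only a minimal illustration: `ByteRangeCheck` and `check` are hypothetical names, not part of Xtext. In a real Xtext project the same test would presumably live in a method annotated with `@Check` in the language's validator class, which would report the returned message as an error on the offending element.

```java
import java.util.Optional;

/** Hypothetical helper; in an Xtext project this logic would sit in a validator method. */
final class ByteRangeCheck {
    /** Largest value that fits into an unsigned byte. */
    static final long MAX_BYTE = 0xFF;

    private ByteRangeCheck() {}

    /**
     * Returns an error message if the value does not fit into an unsigned byte,
     * or an empty Optional if the value is fine.
     */
    static Optional<String> check(long value) {
        if (value < 0 || value > MAX_BYTE) {
            return Optional.of("Value " + value + " does not fit into a byte (0..255)");
        }
        return Optional.empty();
    }
}
```

With this split the grammar only needs one lenient number rule, and the user gets a message that names the actual problem instead of a generic parse error.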

Re: Linux kernel (ARM) dts parser. [message #1706551 is a reply to message #1706470]

Thu, 27 August 2015 15:48

Mauro Condarelli

Hi Stefan,
Thanks for your answer.

Stefan Oehme wrote on Thu, 27 August 2015 03:11

Hi Mauro,
I guess at first you were trying to copy-paste the yacc-grammar?

No, I'm aware of the syntactic differences, but I was trying a "one to one" translation to minimize the chance of implementing "something different".

Unfortunately there's no real description of the language (at least not one that matches the yacc code). You would have to peruse the sources (links in my first message) and compare them with the "specification" I found (https://www.power.org/wp-content/uploads/2012/06/Power_ePAPR_APPROVED_v1.1.pdf --> "Appendix A Device Tree Source Format (version 1)").

Yacc code is (IMHO) very messy and difficult to interpret (and definitely left-recursive!).
"Specification" seems to fall very short.

Quote:

Doing it from scratch is the way to go =)

That's what I'm doing now, but I fear I will diverge from yacc code (which is the "Ultimate Gold Standard", since it belongs to the compiler that will be used to compile the files!).

Any advice welcome.
TiA
Mauro

Re: Linux kernel (ARM) dts parser. [message #1706557 is a reply to message #1706551]

Thu, 27 August 2015 16:41

Stefan Oehme

I guess reverse engineering from the yacc grammar and lots of examples will be your best bet. And a big test suite =)

Re: Linux kernel (ARM) dts parser. [message #1706572 is a reply to message #1706557]

Thu, 27 August 2015 18:33

Mauro Condarelli

Thanks Stefan,
I guessed that, and I have lots of "real life examples" in the Linux kernel ;)

I have trouble understanding what's going on, though.

My current (VERY sketchy) grammar is:

grammar it.condarelli.devicetree.DeviceTree hidden(WS, ML_COMMENT, SL_COMMENT)// with org.eclipse.xtext.common.Terminals

generate deviceTree "http://www.condarelli.it/devicetree/DeviceTree"

import "http://www.eclipse.org/emf/2002/Ecore" as ecore

DeviceTree:
	(includes+=include)* '/dts-v1/' ';' (includes+=include)* memresl=memresl '/' '{' node=node '}' ';';

preproc:
	include;

include:
	EOL '#' 'include' (includesys | includeusr) EOL;

includesys:
	'<' FILESPEC '>';

includeusr:
	'"' FILESPEC '"';

memresl:
	{memresl} (memresl+=memres)*;

memres:
	'/memreserve/' address=INT32 length=intval ';';

node:
	propdefl=propdefl childl=childl;

propdefl:
	{propdefl} (propdefl+=propdef)*;

propdef:
	{propdef} label? NAME ('=' value=value)?;

value:
	array | v64 | {value} STRING | bytestring;

v64:
	'<' first=v32 second=v32 '>';

v32:
	INT32;

array:
	'<' (array+=byte)+ '>';

bytestring:
	'[' (bytestring+=INT8)+ ']';

childl:
	{childl} (childl+=child)*;

child:
	label NAME address '{' node '}';

byte:
	INT8;

word:
	INT16;

intval:
	INT8 | INT16 | INT32;

address hidden():
	'@' INT32;

label hidden():
	NAME ':';

//============================================= TERMINALS ==

terminal STRING:
	'"' ('\\' . /* 'b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\' */ | !('\\' | '"'))* '"';

terminal INT8:
	('0' ('x' | 'X'))? HEXDIGIT HEXDIGIT;

terminal INT16:
	('0' ('x' | 'X'))? HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT;

terminal INT32:
	('0' ('x' | 'X'))? HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT;

terminal HEXDIGIT:
	('0'..'9' | 'a'..'f' | 'A'..'F');

terminal NAME:
	('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*;

terminal SPEC:
	('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '-')+;

terminal FILESPEC:
	(SPEC | '/' | '.')+;

terminal EOL:
	('\r'? '\n')?;

//terminal TOEOL:
//	!('\n' | '\r')* ('\r'? '\n')?;

//terminal ID  		: '^'?('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
//terminal INT returns ecore::EInt: ('0'..'9')+;
terminal ML_COMMENT	: '/*' -> '*/';
terminal SL_COMMENT 	: '//' !('\n'|'\r')* ('\r'? '\n')?;
terminal WS			: (' '|'\t'|'\r'|'\n')+;
terminal ANY_OTHER: .;

and I am testing it with the following excerpt:

/*
 * aks-cdu.dts - Device Tree file for AK signal CDU
 */

/dts-v1/;

#include "ge863-pro3.dtsi"

/ {
	chosen {
		bootargs = "console=ttyS0,115200 ubi.mtd=4 root=ubi0:rootfs rootfstype=ubifs";
	};

	clocks {
		slow_xtal {
			clock-frequency = <32768>;
		};
	};
};

It gives me an error at >>#<<include ... saying: "mismatched input '#' expecting '/'"
BUT if I comment out the line (//#include ...) THEN I get an error at >>/<< {... saying "mismatched input '/' expecting '#'" !?!

It seems I am doing something very wrong.
Can you help me understand?

TiA
Mauro

Re: Linux kernel (ARM) dts parser. [message #1706623 is a reply to message #1706572]

Fri, 28 August 2015 08:30

Stefan Oehme

Hey Mauro,

here are some suggestions:

- don't try to parse whitespace in a rule that ignores whitespace (if the whitespace is significant to the semantics, it should not be hidden. If it is just a coding convention, you can add a check in the validator)
- use terminals sparingly, just have one integer type and check value ranges in the validator (For example, 111 is a valid 8-bit integer, but would have been rejected by your grammar)
- the STRING rule clashed with the UserInclude rule. This is because as soon as the lexer sees a '"', it will consume a STRING token, but the UserInclude rule expected something different.
- don't nest your rules too deep. Every nested rule call will also lead to a nested element in the AST, making it hard to navigate (the childl and memresl rules were superfluous for instance)
- the grammar was missing some expected characters like ';', so it couldn't match the example
- try to follow the Xtext naming conventions (upper camel for parser rules, upper underscore for terminal rules), that makes it easier for people to help you
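
The point about fixed-width integer terminals can be illustrated with a small Java sketch. The regular expressions below are only approximations of the INT8, INT16 and INT32 terminals from the first grammar, and a full-string regex match is not the same thing as ANTLR lexing, so treat this as an illustration rather than a model of the generated lexer. `IntegerLiteralDemo` and its members are hypothetical names.

```java
import java.util.regex.Pattern;

/** Hypothetical demo comparing fixed-width integer terminals with one lenient rule. */
final class IntegerLiteralDemo {
    // Regex approximations of the fixed-width terminals from the first grammar.
    static final Pattern INT8 = Pattern.compile("(0[xX])?[0-9a-fA-F]{2}");
    static final Pattern INT16 = Pattern.compile("(0[xX])?[0-9a-fA-F]{4}");
    static final Pattern INT32 = Pattern.compile("(0[xX])?[0-9a-fA-F]{8}");

    // Lenient alternative: any decimal literal or any 0x-prefixed hex literal.
    static final Pattern LENIENT = Pattern.compile("[0-9]+|0[xX][0-9a-fA-F]+");

    private IntegerLiteralDemo() {}

    /** True if the whole text is exactly one of the fixed-width literals. */
    static boolean matchesFixedWidth(String text) {
        return INT8.matcher(text).matches()
                || INT16.matcher(text).matches()
                || INT32.matcher(text).matches();
    }

    /** True if the whole text is a decimal or hex literal of any length. */
    static boolean matchesLenient(String text) {
        return LENIENT.matcher(text).matches();
    }
}
```

Neither "111" nor the "32768" from the test file has 2, 4 or 8 digits, so neither fits any fixed-width pattern, while both fit the lenient one. Range restrictions can then be enforced after parsing.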

See below a greatly simplified grammar that successfully matches your example. I made it inherit from terminals and use the rules from terminals as much as possible, as that gives you value converters and syntax highlighting out of the box. For your own types like SysFile and HEX, you will need to write value converters so that your AST contains the values you would expect. For instance, you probably want your HEX values converted to integers in the AST.

Cheers,
Stefan

grammar it.condarelli.devicetree.DeviceTree with org.eclipse.xtext.common.Terminals

generate deviceTree "http://www.condarelli.it/devicetree/DeviceTree"

import "http://www.eclipse.org/emf/2002/Ecore" as ecore

DeviceTree:
	(includes+=Include)* '/dts-v1/' ';' (includes+=Include)* (memres+=Memres)* '/' node=Node;

Include:
	'#' 'include' file=FileSpec;

Memres:
	'/memreserve/' address=Number length=Number ';';

Node:
	{Node} '{' (properties+=Property)* '}' ';';

Property:
	CompoundProperty | SimpleProperty;

CompoundProperty:
	label=Label? name=ID address=Address? node=Node;

SimpleProperty:
	label=Label? name=ID ('=' value=Value)? ';';

Value:
	Array | StringValue | Bytestring;

StringValue:
	value=STRING;

Array:
	'<' components+=Number+ '>';

Bytestring:
	'[' bytes+=Number+ ']';

Address hidden():
	'@' Number;

Label hidden():
	ID ':';

Number:
	INT | HEX;

FileSpec:
	SysFile | UserFile;

SysFile:
	'<' (ID | '/' | '.')* '>';

UserFile:
	STRING;

terminal ID:
	('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '-' | '0'..'9')*;

terminal HEX:
	('0x' | '0X') ('0'..'9' | 'a'..'f' | 'A'..'F' | '_')+;

[Updated on: Fri, 28 August 2015 08:31]
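
The conversion that a value converter for the HEX terminal would have to perform can be sketched as follows. This is plain Java without any Xtext dependency; `HexLiteral`, `toValue` and `toText` are hypothetical names. In an Xtext project the same logic would presumably be wrapped in an `IValueConverter` and registered for the HEX rule. Treating '_' as a digit separator to be ignored is an assumption: the grammar above only says that '_' may appear in a HEX token.

```java
/** Hypothetical conversion logic for tokens of the HEX terminal. */
final class HexLiteral {
    private HexLiteral() {}

    /** Converts token text such as "0x1F" or "0X00_ff" to its numeric value. */
    static long toValue(String text) {
        if (text == null || !(text.startsWith("0x") || text.startsWith("0X"))) {
            throw new IllegalArgumentException("Not a HEX literal: " + text);
        }
        // Assumption: '_' is only a visual separator and carries no value.
        String digits = text.substring(2).replace("_", "");
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("HEX literal has no digits: " + text);
        }
        // Long.parseLong throws NumberFormatException (an IllegalArgumentException)
        // for non-hex characters or for values that do not fit into a long.
        return Long.parseLong(digits, 16);
    }

    /** Converts a non-negative value back to token text, for serialization. */
    static String toText(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("Negative values are not supported: " + value);
        }
        return "0x" + Long.toHexString(value);
    }
}
```

Using `long` rather than `int` keeps 32-bit cell values such as 0xffffffff representable without sign problems.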

Re: Linux kernel (ARM) dts parser. [message #1706702 is a reply to message #1706623]

Fri, 28 August 2015 20:08

Stefan Oehme

By the way, the #include rule was giving me lots of headaches, since it will always clash with Strings and Arrays. Then I realized that #include is not part of your language per se, but is just taken from the C preprocessor.

So instead I made includes a terminal and added them to the hidden tokens. This way they no longer conflict with other rules and they can appear at any point in the file (before, they were only allowed at specific points at the top of the file).

terminal INCLUDE:
	// consume the rest of the line, like SL_COMMENT does
	'#include' !('\n' | '\r')* ('\r'? '\n')?
;
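
One consequence of hiding includes is that they no longer show up as elements in the AST, so any tooling that wants the included file name has to read it from the token text. A minimal Java sketch of that extraction follows. `IncludeDirective` and `target` are hypothetical names, and the assumption is that the token text starts with `#include` followed by either a quoted or an angle-bracketed file name.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that reads the file name out of an include token's text. */
final class IncludeDirective {
    // Group 1 captures a "quoted" name, group 2 captures an <angle-bracketed> name.
    private static final Pattern INCLUDE =
            Pattern.compile("#include\\s*(?:\"([^\"\\r\\n]+)\"|<([^>\\r\\n]+)>)");

    private IncludeDirective() {}

    /** Returns the included file name, or empty if the text is not a well-formed include. */
    static Optional<String> target(String tokenText) {
        Matcher m = INCLUDE.matcher(tokenText);
        if (!m.lookingAt()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1) != null ? m.group(1) : m.group(2));
    }
}
```

For the test file in this thread, the text `#include "ge863-pro3.dtsi"` would yield `ge863-pro3.dtsi`.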
