Partial parsing of large files [message #735310]
Tue, 11 October 2011 14:07
Moritz (Junior Member, Messages: 22, Registered: July 2011)
Hi,
Description:
I use Xtext to build a user interface for something like a state machine. Xtext has helped a lot so far with parsing the description of the machine. I now have the following use case:
The file with the simulation results of the state machine is text-based and very large (say 100k to millions of lines). The content is very simple, though. It contains a header, many entries of the same format (for different steps), and an index:
Model:
'#Header' header = Header
'#Data' entries += Entry*
'#Table' index = Index
;
There are no references whatsoever. I started to use Xtext for this file, too, because it helps a lot in building the parser. Currently I try to use the generated parser to read the Header and the Table. As a second step, I want to parse several items at different positions in the file. I can't hold the complete tree in memory, so I try to parse single Entries.
Question:
As stated in the FAQ (http://wiki.eclipse.org/Xtext/FAQ#How_do_I_load_my_model_in_a_standalone_Java_application.C2.A0.3F), I provided the file as a stream so that I can set the position in the file arbitrarily (and thereby start the parsing at any point). I can also start the parser with a given Rule (see http://www.eclipse.org/forums/index.php/mv/msg/242621/730622/#msg_730622); I understand that this is not what Xtext is intended for, but it is still helpful. When the parsing is done, I get the expected Entry model back. There is an error message "missing EOF" that I could accept, too. The problem is that the parser seems to parse an enormous amount of text after my Entry rule (more than a few thousand, maybe everything), even though the rule has already been parsed perfectly. I read about this problem (probably the lookahead) in http://www.eclipse.org/forums/index.php/mv/msg/17750/59272/#msg_59272 and I wonder if my goal can be achieved with the files generated by Xtext.
Does partial parsing help me here? I couldn't get it to work so far...
Manipulating the initial lookahead didn't help either.
Maybe I can inject an EOF into the stream, but then I would have to know when the passed Rule (an Entry) has been parsed successfully.
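One way to inject an EOF without knowing in advance where the Entry ends would be a Reader wrapper that reports end-of-file as soon as it sees the marker that starts the next entry or section. This is only a sketch; the class name and sentinel handling are my own, and everything before the final sentinel character is still delivered, so a dangling fragment would surface as the kind of trailing error described above:

```java
import java.io.IOException;
import java.io.Reader;

/**
 * Wraps another Reader and reports end-of-file as soon as a sentinel
 * string (e.g. the marker that starts the next entry) has been seen,
 * so the lexer stops instead of tokenizing the rest of the file.
 * All characters before the final sentinel character are delivered.
 */
class EofInjectingReader extends Reader {

    private final Reader delegate;
    private final String sentinel;
    private final StringBuilder window = new StringBuilder();
    private boolean eof = false;

    EofInjectingReader(Reader delegate, String sentinel) {
        this.delegate = delegate;
        this.sentinel = sentinel;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (eof) {
            return -1;
        }
        int n = 0;
        while (n < len) {
            int c = delegate.read();
            if (c < 0) {
                break;
            }
            // remember only as many characters as the sentinel is long
            window.append((char) c);
            if (window.length() > sentinel.length()) {
                window.deleteCharAt(0);
            }
            if (window.toString().equals(sentinel)) {
                eof = true; // pretend the stream ends here
                break;
            }
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}
```

The wrapped reader could then be handed to the parser in the same way the full stream was before.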
I read that there is a great speed enhancement in 2.0.1, but I don't think it will help in my case.
I would appreciate any suggestions here.
Thank you!
[Updated on: Tue, 11 October 2011 14:15]
Re: Partial parsing of large files [message #735336 is a reply to message #735310]
Tue, 11 October 2011 14:27
Sebastian Zarnekow (Senior Member, Messages: 3118, Registered: July 2009)
Hi Moritz,
parsing is done in two steps. The first is lexing, where the complete stream is read into memory and split into tokens. There is no communication channel from the parser back to the lexer, so the parser cannot indicate when to stop lexing. If it is possible to apply some heuristics to cut the trailing parts of the stream prior to passing it to the parser, that would help.
Best regards,
Sebastian
--
Need professional support for Eclipse Modeling?
Go visit: http://xtext.itemis.com
On 11.10.11 16:07, Moritz wrote:
> [...]
Re: Partial parsing of large files [message #735364 is a reply to message #735336]
Tue, 11 October 2011 15:34
Moritz (Junior Member, Messages: 22, Registered: July 2011)
Sebastian Zarnekow wrote on Tue, 11 October 2011 10:27:
> [...]
Hi Sebastian,
thank you for the explanation. I thought the lexer passed the tokens directly to the parser.
Yes, there is a possible and simple heuristic: one item is between 15 and 30 lines long, so I could simply cut off the file stream after a bounded number of characters. The lexer would stop, the parser would return the object, and I could examine (or ignore) the remaining parsing errors.
I think this is what you suggested. Thank you for that, I will go for it.
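The cut-off described above could be sketched like this, assuming the byte offset of an entry comes from the '#Table' index (the class and method names are hypothetical):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Reads a bounded window out of a large file: seek to the byte offset
 * where an entry starts (e.g. taken from the '#Table' index) and read
 * at most maxLines lines. With entries between 15 and 30 lines, a
 * window of 30 lines is guaranteed to cover one entry, and the lexer
 * only ever sees this window instead of the whole file.
 */
class EntryWindow {

    static String read(RandomAccessFile file, long offset, int maxLines) throws IOException {
        file.seek(offset);
        StringBuilder window = new StringBuilder();
        for (int i = 0; i < maxLines; i++) {
            // readLine decodes bytes as Latin-1, which is fine for ASCII data
            String line = file.readLine();
            if (line == null) {
                break; // end of file reached early
            }
            window.append(line).append('\n');
        }
        return window.toString();
    }
}
```

The returned string can then be parsed via a StringReader, and any errors after the entry are confined to at most a few lines.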
Anyway, I thought there might be a better solution? With a hand-written parser, AFAIK, I would be able to parse single rules. Can I use the generated model in another way to call a parser?
Best regards,
Moritz