Re: [asciidoc-lang-dev] Whitespace handling



On Tue, 9 Mar 2021 at 04:06, Sylvain Leroux <sylvain@xxxxxxxxxxx> wrote:
On 07/03/2021 02:11, Lex Trotman wrote:
> On Fri, 5 Mar 2021 at 23:30, Sylvain Leroux <sylvain@xxxxxxxxxxx> wrote:
>
>
>
>     On 03/03/2021 02:14, Lex Trotman wrote:
>     > Interesting question since the spacing is context, not part of the
>     > markup itself, just like the character on the other side.
>     Correct. In my own experiments, to identify the context, I used the
>     lookahead/lookbehind features of the Parsing expression grammar
>     (PEG -- [1]) I implemented. This adds some "context sensibility" on
>     top of an
>     otherwise context-free grammar.
>
>
>     [1]: http://www.inf.puc-rio.br/%7Eroberto/docs/peg.pdf
>
>
> Yes, I'm also using a PEG, but for my experiments I separated the lexer
> (LEG) and the parser (PEG) so there can be a clear table of markup
> tokens.  Those tokens do indeed have copious uses of previous and
> following non-consuming operators :-).

Ahh, I see I wasn't clear enough: I use PEG for the specification, not the implementation. The lexer is implemented as simply a boringly huge C++ switch, and the parser is a set of recursive C++ functions that basically mirror the PEG.  Remember mine is an experiment, so using plain code allows me to play with all sorts of things, several of which turned out to be bad ideas, but now I know that :-).
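
A minimal sketch of what a switch-based line lexer of this kind might look like, assuming C++17. It is not the actual lexer: the names `TokenKind`, `Token` and `lex_line` are hypothetical, only the `=` case is shown, and the rule that a delimiter needs at least four characters is an assumption made for the example.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical token kinds; illustrative names, not taken from the real lexer.
enum class TokenKind { SectionStart, BlockDelimiter, TextLine };

struct Token {
    TokenKind kind;
    int value;  // count of relevant characters: section level or delimiter length
};

// Classify one line (without its trailing newline) by switching on its
// first character. A real lexer would have many more cases.
Token lex_line(std::string_view line) {
    if (line.empty()) return {TokenKind::TextLine, 0};
    switch (line.front()) {
    case '=': {
        std::size_t n = 0;
        while (n < line.size() && line[n] == '=') ++n;
        // Nothing but '=' on the line, and at least 4 of them (assumed minimum).
        if (n == line.size() && n >= 4)
            return {TokenKind::BlockDelimiter, static_cast<int>(n)};
        // Non-consuming lookahead: a run of '=' must be followed by a space.
        if (n < line.size() && line[n] == ' ')
            return {TokenKind::SectionStart, static_cast<int>(n)};
        return {TokenKind::TextLine, 0};
    }
    default:
        return {TokenKind::TextLine, 0};
    }
}
```

Here `lex_line("== Title")` yields a `SectionStart` token with value 2, and `lex_line("======")` yields a `BlockDelimiter` token with value 6.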

Of course, since the PEG is only text in the spec, it's cheap for me to extend it :-)
 
>
> I have found I needed to extend the PEG to handle nesting of sections
> and lists without writing out a limited depth set, how do you address it?
>
For now, I use two separate PEGs. One is for the inline parser, the
other one for block-level parsing.

The PEG for the inline parser is quite stable, and I don't encounter
significant difficulties when adding new features.

The PEG for the block-level parsing is a different beast, though. It
replaces a hand-written parser (itself rewritten several times). For now, it is
used as a tokenizer rather than a recursive descent parser. And the
actual block hierarchy construction is delegated to a stateful object in
the spirit of the factory method pattern
(https://en.wikipedia.org/wiki/Factory_method_pattern).

I tried to implement a recursive grammar for the block-level parser. But
I struggled to find a way to match the delimiters in nested blocks
like in:

  ====

   ======

   ======

  ====


Ahh yes, another extension I made was that all non-terminals and tokens can return a numeric value (if they match), and I added a simple syntax for assigning that value if it's needed and comparing it to the parameter passed to the non-terminal.  Tokens return a count of relevant characters that varies from token to token, but is effectively the level for section/list tokens or the length of the delimiter for block delimiters.  The (simplified) syntax for a "section" would be:

Section(level) ::= token_level = <start_line_equals> (:token_level==level:) Markup_text_line Section_contents *Section(level+1)

where <start_line_equals> is the token from the lexer, (:expression:) is a test that fails the non-terminal if the expression is false, and name= assigns the value to name. These have the obvious implementation in code.
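
One way the rule above could map onto a recursive function is sketched below. This is an illustration under simplifying assumptions, not the actual parser: the `Kind`, `Token` and `Parser` names are hypothetical, the title line (Markup_text_line) is folded into the section-start token, and Section_contents is reduced to a run of plain text tokens.

```cpp
#include <cstddef>
#include <vector>

enum class Kind { SectionStart, Text };  // hypothetical token kinds

struct Token {
    Kind kind;
    int value;  // for SectionStart: the number of '=' characters
};

struct Parser {
    const std::vector<Token>& tokens;
    std::size_t pos = 0;
    int sections = 0;  // sections matched so far, kept only for demonstration

    // Simplified form of:
    //   Section(level) ::= token_level = <start_line_equals>
    //                      (:token_level==level:)
    //                      Section_contents *Section(level+1)
    bool section(int level) {
        if (pos >= tokens.size() || tokens[pos].kind != Kind::SectionStart)
            return false;
        const int token_level = tokens[pos].value;  // token_level = <...>
        if (token_level != level)                   // (:token_level==level:)
            return false;                           // fail without consuming
        ++pos;
        while (pos < tokens.size() && tokens[pos].kind == Kind::Text)
            ++pos;                                  // Section_contents (simplified)
        ++sections;
        while (section(level + 1)) {}               // *Section(level+1)
        return true;
    }
};
```

Given a token stream for a level 1 section containing two level 2 sections and followed by another level 1 section, `section(1)` consumes the first section with both children and stops at the second level 1 token, because the level test fails there without consuming input.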

I suspect there is an elegant way of doing that since PEG "can count",
but I have yet to find out how. FWIW, I discovered PEG with this project, so
my knowledge of the technology is still fragile.

I'm not aware that a formal PEG (https://en.wikipedia.org/wiki/Parsing_expression_grammar) can count in a way that lets it check context, e.g. whether the number of equals signs is less than, equal to, or greater than the current section level.  Of course, any implementation of PEG in a programming language can probably use that language for the purpose, but the specification needs either some formal extension or the use of (shudder) words.  I'm in no way pushing my extensions; it just "works for me"(TM).
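
For the nested-delimiter case, here is a sketch of how plain code can use the delimiter length as the "count", assuming a closing delimiter must have exactly the same length as its opener. The names `delimiter_length` and `find_close` are hypothetical, leading whitespace is not handled, and the minimum length of four is an assumption; neither parser discussed here is claimed to work this way.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Length of a line made only of '=' (4 or more, an assumed minimum), else 0.
std::size_t delimiter_length(const std::string& line) {
    if (line.size() < 4) return 0;
    for (char c : line)
        if (c != '=') return 0;
    return line.size();
}

// Given the index of an opening delimiter line, return the index of the
// matching closing delimiter, or lines.size() if the block is unterminated.
std::size_t find_close(const std::vector<std::string>& lines, std::size_t open) {
    const std::size_t want = delimiter_length(lines[open]);
    std::size_t i = open + 1;
    while (i < lines.size()) {
        const std::size_t len = delimiter_length(lines[i]);
        if (len == want) return i;  // same length: closes this block
        if (len != 0) {             // different length: opens a nested block
            const std::size_t inner = find_close(lines, i);
            if (inner == lines.size()) return inner;  // nested block unterminated
            i = inner + 1;          // continue after the nested block
        } else {
            ++i;                    // ordinary content line
        }
    }
    return lines.size();
}
```

With the lines `====`, `======`, `text`, `======`, `====`, calling `find_close(lines, 0)` returns 4 and `find_close(lines, 1)` returns 3.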

At the moment the spec and the code are out of sync and the code crashes; I want to fix that before I push to GitHub "soon".

Cheers
Lex
 

Regards,
- Sylvain

