Re: [asciidoc-lang-dev] Whitespace handling



On Fri, 5 Mar 2021 at 23:30, Sylvain Leroux <sylvain@xxxxxxxxxxx> wrote:


On 03/03/2021 02:14, Lex Trotman wrote:
> Interesting question since the spacing is context, not part of the
> markup itself, just like the character on the other side.  
Correct. In my own experiments, to identify the context, I used the
lookahead/lookbehind features of the Parsing Expression Grammar (PEG --
[1]) I implemented. This adds some "context sensitivity" on top of an
otherwise context-free grammar.


[1]: http://www.inf.puc-rio.br/%7Eroberto/docs/peg.pdf

Yes, I'm also using a PEG, but for my experiments I separated the lexer (LEG) and the parser (PEG) so there can be a clear table of markup tokens. Those tokens do indeed make copious use of non-consuming lookbehind and lookahead operators :-).
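
To make the idea of non-consuming context concrete, here is a minimal sketch using Python's `re` module rather than a PEG or LEG tool. The rule it encodes (a constrained `*strong*` span must not touch a word character on the outside, nor a space on the inside) is an assumption for illustration, not the token table from either experiment. The names `CONSTRAINED_STRONG` and `find_strong` are hypothetical.

```python
import re

# Assumed rule, for illustration only: the context on either side is
# checked but never consumed, so it stays available to the next token.
CONSTRAINED_STRONG = re.compile(
    r"(?<![\w*])"   # lookbehind: no word character or '*' before the opener
    r"\*(?=\S)"     # opener, then lookahead: next character is not a space
    r"(.+?)"        # span content, matched lazily
    r"(?<=\S)\*"    # lookbehind: content does not end in a space, then closer
    r"(?![\w*])"    # lookahead: no word character or '*' after the closer
)


def find_strong(text):
    """Return the contents of every constrained strong span in text."""
    return [m.group(1) for m in CONSTRAINED_STRONG.finditer(text)]
```

With this sketch, `find_strong("a *bold* word")` finds one span, while `2*3*4` and `a *bold*x` yield none because a word character touches the marks. For `str` patterns Python's `\w` is Unicode-aware by default, so accented and non-Latin letters also count as word context here.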

I have found I needed to extend the PEG to handle nesting of sections and lists without writing out a set of rules limited to a fixed depth. How do you address it?
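
For readers unfamiliar with the problem, one common way to handle unbounded nesting outside the grammar is a depth stack. The sketch below is an assumption for illustration, not the PEG extension mentioned above. It treats the number of `*` characters in a list marker as the depth, and the function name `nest_list_items` is hypothetical.

```python
def nest_list_items(lines):
    """Build a nested tree from lines such as '* a' and '** b'.

    Depth is the number of '*' in the marker; a stack of open items
    replaces a fixed set of per-depth grammar rules.
    """
    root = {"text": None, "children": []}
    stack = [(0, root)]  # (depth, node) pairs, outermost first
    for line in lines:
        marker, _, text = line.partition(" ")
        if set(marker) != {"*"}:
            continue  # not a list item in this simplified sketch
        depth = len(marker)
        node = {"text": text.strip(), "children": []}
        # Close every open item at the same or a deeper level.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root["children"]
```

Calling `nest_list_items(["* a", "** b", "*** c", "* d"])` gives two top-level items, with `c` nested under `b` under `a`. No maximum depth is written anywhere in the code.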
 


>
> Thinking about it, (as well as some defined code points) the non-spacing
> context character must be able to be any Unicode letter code point or it
> prevents the markup being used on some non-English languages, and so I
> don't see why the spacing context should not be any code point with the
> appropriate spacing Unicode property as well.  If non-ASCII context on
> one side is valid, there is no reason it should not be valid on both sides.
Which spacing/non-spacing characters are valid around constrained markup
needs clarification, in my view, especially if we consider non-Latin
scripts. This is something we should discuss in its own thread, shouldn't we?

Yes, thread started.
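
As a rough illustration of the Unicode point above, the context test can be phrased in terms of Unicode general categories instead of ASCII ranges. This is a sketch under stated assumptions: which categories count as "word" or "spacing" context is exactly what the new thread has to decide, and the function names are hypothetical.

```python
import unicodedata


def is_word_context(ch):
    """Assumed rule: letters (L*), marks (M*) and numbers (N*) are word context."""
    return unicodedata.category(ch)[0] in ("L", "M", "N")


def is_spacing_context(ch):
    """Assumed rule: any space separator (Zs) or common control whitespace."""
    return unicodedata.category(ch) == "Zs" or ch in "\t\n\r"


def can_open_constrained(prev_ch, next_ch):
    """Check both sides of an opening mark; None means start or end of input."""
    before_ok = prev_ch is None or not is_word_context(prev_ch)
    after_ok = next_ch is not None and not is_spacing_context(next_ch)
    return before_ok and after_ok
```

Under these assumptions an ideographic space (U+3000) before the mark and a CJK letter after it is accepted, just as an ASCII space and an ASCII letter would be.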

Cheers
Lex
 
PS Sylvain, can you please configure your mailer to reply only to the list? Otherwise we get two replies to the same mail, and it's easy to reply to the wrong one and go off-list.
