Re: [asciidoc-lang-dev] Whitespace handling
  • From: Sylvain Leroux <sylvain@xxxxxxxxxxx>
  • Date: Tue, 9 Mar 2021 09:08:43 +0100
  • Delivered-to: asciidoc-lang-dev@xxxxxxxxxxx
  • Openpgp: preference=signencrypt
  • User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0 Thunderbird/60.9.0

On 09/03/2021 01:20, Lex Trotman wrote:
> [...] Remember mine is an experiment, so using
> plain code allows me to play with all sorts of things, several of which
> turned out to be bad ideas, but now I know that :-).

Same thing here. The difference is I implemented my own PEG engine. But
all this is highly experimental to say the least.

> Of course since the PEG is only text in the spec it's cheap for me to
> extend it :-)
>     >
>     > I have found I needed to extend the PEG to handle nesting of sections
>     > and lists without writing out a limited depth set, how do you
>     address it?
>     >
>     For now, I use two separate PEGs. One is for the inline parser, the
>     other one for block-level parsing.
>     The PEG for the inline parser is quite stable, and I don't encounter
>     significant difficulties when adding new features.
>     The PEG for the block-level parsing is a different beast, though. It
>     replaces a (many times rewritten) hand-written parser. For now, it is
>     used as a tokenizer rather than a recursive descent parser. And the
>     actual block hierarchy construction is delegated to a stateful object in
>     the spirit of the factory method pattern
>     (
>     I tried to implement a recursive grammar for the block-level parser. But
>     I struggled at finding a way to match the delimiter in nested blocks
>     like in:
>       ====
>        ======
>        ======
>       ====
> Ahh yes another extension I did was that all non-terminals and tokens
> can return a numeric value (if they match) and I added a simple syntax
> for assigning that value if it's needed, and comparing it to the
> parameter passed to the non-terminal.  Tokens return a count of relevant
> characters that varies from token to token but is effectively the level
> for section/list tokens or length of delimiter for block delimiters. 
> The (simplified) syntax  for a "section" would be:
> Section(level) ::= token_level = <start_line_equals>
> (:token_level==level:) Markup_text_line Section_contents *Section(level+1)
> where <start_line_equals> is the token from the lexer, (:expression:) is
> a test that will fail the non-terminal if it fails, and name= assigns
> the value to name. These have the obvious implementation in code.
>     I suspect there is an elegant way of doing that since PEG "can count",
>     but I have yet to find how. FWIW, I discovered PEG with this project, so
>     my knowledge of the technology is still fragile.

> I'm not aware that a formal PEG
> ( can count in
> a way that it can check context, eg number of equals is less, equal or
> more than current section level.  Of course any implementation of PEG in
> a programming language probably can use that language for the purpose,
> but the specification either needs some formal extension or the use of
> (shudder) words.  I'm in no way pushing my extensions, it just "works
> for me"(TM).
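
The Section(level) rule quoted above can be read as a recursive-descent
function that takes the expected level as a parameter. What follows is only
a rough sketch of that reading, not Lex's actual code: the names HEADING and
parse_section, the dictionary layout, and the choice to count the level as
the number of leading equals signs are all assumptions made for illustration.

```python
import re

# Hypothetical heading token: one or more '=' followed by a space and a title.
HEADING = re.compile(r"^(=+) +(.*)$")

def parse_section(lines, pos, level):
    """Try to match Section(level) at lines[pos].

    Returns (node, new_pos) on success, or None if the rule fails.
    """
    if pos >= len(lines):
        return None
    m = HEADING.match(lines[pos])
    # token_level = <start_line_equals> (:token_level==level:)
    if m is None or len(m.group(1)) != level:
        return None
    node = {"level": level, "title": m.group(2), "body": [], "children": []}
    pos += 1
    # Section_contents: every following line that is not a heading
    while pos < len(lines) and HEADING.match(lines[pos]) is None:
        node["body"].append(lines[pos])
        pos += 1
    # *Section(level+1): zero or more subsections exactly one level deeper
    while True:
        child = parse_section(lines, pos, level + 1)
        if child is None:
            break
        sub, pos = child
        node["children"].append(sub)
    return node, pos

doc = ["= Doc", "intro", "== A", "text a", "=== A1", "== B"]
tree, end = parse_section(doc, 0, 1)
```

The level check is the only place where context is needed: a heading with
too few or too many equals signs makes the rule fail, so control returns to
the enclosing section, which then tries it at its own level.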

When I first came across the "PEG" acronym, I looked it up on Wikipedia.
I didn't find the article very enlightening. Fortunately, someone on the
V8 regex team pointed me toward the work of Roberto Ierusalimschy of
PUC-Rio. He and his colleagues wrote great articles about Parsing
Expression Grammars. My PEG engine is an implementation of the virtual
machine described in "A Text Pattern-Matching Tool based on Parsing
Expression Grammars (2008)"

Also take a look at "Converting regexes to Parsing Expression Grammars
(2010)"

When I said "PEG can count", I meant it can match balanced expressions of
arbitrary depth, something _formal_ regular expressions cannot (though
many implementations support some form of recursion as an extension).
Since blocks in AsciiDoc are defined as nested balanced markups, I
suspect we could find a way to express that without requiring extensions
to PEGs. But my attempts at doing that were unfruitful. And I lack
experience in the field to spot a possible flaw in my reasoning.
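
To make the difficulty concrete, here is a minimal sketch of the nested
delimiter example from earlier in the thread, written with an explicit stack
rather than a grammar. It is not the code from either project and it does not
settle whether a pure PEG can express the same thing. The function name
nest_blocks, the dictionary layout, and the rule that a delimiter is a line
of four or more equals signs are assumptions made for illustration.

```python
def nest_blocks(lines):
    """Build a tree of delimited blocks from a list of lines.

    A delimiter line closes the innermost open block only if it is
    identical to the delimiter that opened it; otherwise it opens a
    new nested block.
    """
    root = {"delim": None, "children": []}
    stack = [root]  # stack[-1] is the innermost block still open
    for line in lines:
        text = line.strip()
        is_delim = len(text) >= 4 and set(text) == {"="}
        if is_delim and stack[-1]["delim"] == text:
            stack.pop()  # same length as the opener: close the block
        elif is_delim:
            block = {"delim": text, "children": []}
            stack[-1]["children"].append(block)
            stack.append(block)  # different length: open a nested block
        else:
            stack[-1]["children"].append(text)
    if len(stack) != 1:
        raise ValueError("unclosed block: " + stack[-1]["delim"])
    return root

tree = nest_blocks(["====", " ======", "inner text", " ======", "===="])
```

The comparison against stack[-1]["delim"] is the contextual part: whether a
given delimiter line opens or closes a block depends on what is currently
open, which is the piece that is hard to state in a context-free rule.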

> At the moment the spec and the code are out of sync and the code
> crashes, I want to fix that before I push to github "soon".

My code doesn't crash (thanks to JS), and all tests pass (most of the
time), but I'm not sure this can be useful to anyone. Anyhow, it's on
github. If you want to take a look, the inline parser is here:
Feel free to comment ;)

- Sylvain

Attachment: signature.asc
Description: OpenPGP digital signature
