Re: [asciidoc-lang-dev] Whitespace handling
  • From: Sylvain Leroux <sylvain@xxxxxxxxxxx>
  • Date: Fri, 12 Mar 2021 11:30:14 +0100

On 12/03/2021 00:09, Lex Trotman wrote:
>
> On Tue, 9 Mar 2021 at 19:30, Sylvain Leroux <sylvain@xxxxxxxxxxx
> <mailto:sylvain@xxxxxxxxxxx>> wrote:
>
>     On 09/03/2021 01:20, Lex Trotman wrote:
>     > Ahh yes another extension I did was that all non-terminals and tokens
>     > can return a numeric value (if they match) and I added a simple syntax
>     > for assigning that value if its needed, and comparing it to the
>     > parameter passed to the non-terminal.  Tokens return a count of relevant
>     > characters that varies from token to token but is effectively the level
>     > for section/list tokens or length of delimiter for block delimiters.
>     > The (simplified) syntax  for a "section" would be:
>     >
>     > Section(level) ::= token_level = <start_line_equals>
>     > (:token_level==level:) Markup_text_line Section_contents *Section(level+1)
>     >
>     > where <line_start_equals> is the token from the lexer, (:expression:) is
>     > a test that will fail the non-terminal if it fails, and name= assigns
>     > the value to name. These have the obvious implementation in code.
>     >
>     Yes, that's a problem for a PEG-based _implementation_. For the specs, I
>     suppose we can add such guard clauses without causing too much harm.
>
>
> On thinking about it some more, I guess "can count" is true for any
> recursive language, [...]

"can count" is a statement I heard years ago in a CS course. To be a
little bit more formal, as far as I remember:
* a regular language is recognized by a finite state automaton. So it can
handle a^n without bound (one term repeated an arbitrary number of times),
but it cannot match one count against another.
* a context-free language is recognized by a pushdown automaton, and thus
it can count without bound (a^n)(b^n) (two terms repeated the same
number of times).
* a context-sensitive language is recognized by a linear bounded
automaton. It can count without bound (a^n)(b^n)(c^n) and similar
languages with more letters.
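
To make the "can count" idea a bit more concrete, here is a rough Python
sketch with one recognizer per class. The function names are invented for
illustration, and the third function simply compares against a rebuilt
string instead of simulating a linear bounded automaton, so read it as a
picture of what each class has to remember, not as a formal construction.

```python
import re


def is_a_star(s: str) -> bool:
    # Regular: a finite automaton (here a regex) is enough,
    # because nothing about n has to be remembered.
    return re.fullmatch(r"a*", s) is not None


def is_anbn(s: str) -> bool:
    # Context-free: simulate a pushdown automaton with an explicit stack.
    stack = []
    i = 0
    while i < len(s) and s[i] == "a":
        stack.append("a")  # push one symbol per 'a'
        i += 1
    while i < len(s) and s[i] == "b" and stack:
        stack.pop()        # pop one symbol per 'b'
        i += 1
    # Accept only if the input is exhausted and every 'a' was matched.
    return i == len(s) and not stack


def is_anbncn(s: str) -> bool:
    # a^n b^n c^n is not context-free: a single stack cannot check
    # both equalities. Here we just compare three equal-sized runs.
    n, remainder = divmod(len(s), 3)
    return remainder == 0 and s == "a" * n + "b" * n + "c" * n
```

For instance, is_anbn("aabb") is accepted while is_anbn("aab") is not.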


Context-sensitive languages are a proper subset of recursive languages.
And context-free languages are a proper subset of context-sensitive
languages. So if you have an automaton for a recursive language, it's
*more than* sufficient to recognize (a^n)(b^n).

Take all that with the (huge) grain of salt it deserves, given that my
student years are far behind me. Now, back to our topic:


> [...] if you recurse on each character of a block
> delimiter then return on each character of the next you should be back
> where you started, otherwise its not a matching delimiter. But
> testing of section and list depth is trickier. Its the way some
> functional languages do looping (and C++ Template Metaprogramming).  But
> that sort of recursion isn't really the sort of thing I would try using
> in a specification, only full Computer Scientists would understand, and
> we want the specification to be clear to writers checking edge cases of
> their documents I would think.
>

I tried something like that a couple of days ago:


block := '=' block '=' | block-content
block-content := EOL block-list
block-list := (text EOL | block EOL ) block-list | ε
                          ^^^^^
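
For reference, the three rules above can be transcribed almost literally
into a recursive descent recognizer. This is only a sketch: it assumes
PEG-style ordered choice, it assumes that `text` is a non-empty run of
characters other than '=' and newline (the grammar above does not define
it), and all function names are made up for the example.

```python
from typing import Optional


def parse_block(s: str, i: int) -> Optional[int]:
    # block := '=' block '=' | block-content
    if i < len(s) and s[i] == "=":
        j = parse_block(s, i + 1)
        if j is not None and j < len(s) and s[j] == "=":
            return j + 1
    # First alternative failed: backtrack and try the second one.
    return parse_block_content(s, i)


def parse_block_content(s: str, i: int) -> Optional[int]:
    # block-content := EOL block-list
    if i < len(s) and s[i] == "\n":
        return parse_block_list(s, i + 1)
    return None


def parse_text(s: str, i: int) -> Optional[int]:
    # Assumption: text is one or more characters that are not '=' or EOL.
    j = i
    while j < len(s) and s[j] not in "=\n":
        j += 1
    return j if j > i else None


def parse_block_list(s: str, i: int) -> int:
    # block-list := (text EOL | block EOL) block-list | ε
    while True:
        j = parse_text(s, i)
        if j is None:
            j = parse_block(s, i)
        if j is None or j >= len(s) or s[j] != "\n":
            return i  # ε: stop before the item that did not fit
        i = j + 1


def recognize(s: str) -> bool:
    # The whole input must be consumed by a single block.
    return parse_block(s, 0) == len(s)
```

With these assumptions, recognize("==\nhello\n==") succeeds and
recognize("==\nhello\n=") fails, so matching delimiter lengths falls out
of the recursion. However, recognize("=\n=\ninner\n=\n=") also succeeds:
nothing in the rules prevents a nested delimiter from having the same
length as an enclosing one.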

As you said, the tricky part is dealing with nested blocks. I
didn't find a way to handle the constraint "the nested block delimiter's
length must be different from the length of the delimiter for any
enclosing block." For that you need to keep track of all open parent
blocks, and that didn't fit well with my approach to the problem. This
issue is somewhat related to the question I asked yesterday
(https://www.eclipse.org/lists/asciidoc-lang-dev/msg00141.html).


At this point, I gave up trying to implement an RD parser for block
parsing. I just tokenize the input and use a stateful object to
construct the tree from the token stream. The good thing is that, this
way, I get rid of backtracking while parsing at the block level. And after
all, this solution is not that inelegant ;)
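
A minimal sketch of that idea in Python, under loud assumptions: a
delimiter is any line made only of '=' characters, a delimiter with the
same length as the innermost open block closes it, and reusing the length
of an outer enclosing block is treated as an error. The names (tokenize,
Block, TreeBuilder, build) are invented for the example and this is not
my actual code.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union

Token = Tuple[str, Union[int, str]]  # ("DELIM", length) or ("TEXT", line)


def tokenize(source: str) -> Iterator[Token]:
    # Assumption: a delimiter line contains only '=' characters.
    for line in source.splitlines():
        if line and set(line) == {"="}:
            yield ("DELIM", len(line))
        else:
            yield ("TEXT", line)


@dataclass
class Block:
    delim_len: int
    children: List[Union["Block", str]] = field(default_factory=list)


class TreeBuilder:
    def __init__(self) -> None:
        self.root = Block(delim_len=0)  # the document itself, never closed
        self.stack = [self.root]        # all currently open blocks

    def feed(self, token: Token) -> None:
        kind, value = token
        current = self.stack[-1]
        if kind == "TEXT":
            current.children.append(value)
        elif value == current.delim_len:
            self.stack.pop()            # closes the innermost block
        elif any(b.delim_len == value for b in self.stack):
            raise ValueError(f"delimiter length {value} reuses an enclosing block's")
        else:
            block = Block(delim_len=value)
            current.children.append(block)
            self.stack.append(block)    # opens a nested block

    def close(self) -> Block:
        if len(self.stack) != 1:
            raise ValueError("unterminated block")
        return self.root


def build(source: str) -> Block:
    builder = TreeBuilder()
    for token in tokenize(source):
        builder.feed(token)
    return builder.close()
```

Each token is looked at exactly once, and the stack of open blocks is
what makes the "different length from any enclosing block" check a
one-liner.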


- Sylvain

