Re: [asciidoc-lang-dev] Text Markup, syntax and parsing thereof

From: Lex Trotman <exciidoc@xxxxxxxxx>
Date: Mon, 8 Mar 2021 17:03:05 +1000
Delivered-to: asciidoc-lang-dev@xxxxxxxxxxx

...

1. Constrained markup uses a single character at each end as markup, so to avoid clashes with content these characters are only valid markup in constrained situations. The general intention is that they surround words or groups of words, with the opening before a word and the closing after a word. Start of word is defined as a space followed by a letter-like character, and end of word is defined as a letter-like character followed by a space or certain punctuation.

This was my No. 1 confusion initially. There are no spaces between words/characters in CJKV, so the concept doesn't really apply. Better to just have one markup to cover all.

Yes, the definition of constrained markups (single * _ # etc) is not suitable for languages/scripts where the concept of "word" and "word separator" does not exist. Unconstrained markups (doubled ** __ ##) should however work for those languages/scripts. If there are places where unconstrained does not work, that should be addressed in the specification (or in an implementation if it's a bug in one of those).

But removing the constrained markups that work in many languages/scripts simply because they don't work in all does not seem appropriate to me, especially as they are heavily used in existing documents and fit with the AsciiDoc lightweight readable markup goal.
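
To make the difference concrete, here is a minimal Python sketch of the two rules for bold. The regular expressions are my own simplification and an assumption, not what any AsciiDoc implementation or the specification actually uses, and `bold_spans` is a hypothetical helper name. It shows the single `*` being rejected inside unspaced CJK text while the doubled `**` is still recognised.

```python
import re

# Simplified, hypothetical rules; real AsciiDoc processors are more elaborate.
# Constrained: the opening "*" must not follow a word character and the
# closing "*" must not be followed by one.
CONSTRAINED_BOLD = re.compile(r"(?<![\w*])\*(\S|\S.*?\S)\*(?![\w*])")
# Unconstrained: a doubled "**" is markup wherever it appears.
UNCONSTRAINED_BOLD = re.compile(r"\*\*(.+?)\*\*")


def bold_spans(text):
    """Return (constrained, unconstrained) lists of bold content in text."""
    unconstrained = UNCONSTRAINED_BOLD.findall(text)
    # Drop the doubled form first so it is not rescanned as constrained markup.
    remainder = UNCONSTRAINED_BOLD.sub("", text)
    constrained = CONSTRAINED_BOLD.findall(remainder)
    return constrained, unconstrained


print(bold_spans("a *bold* word"))       # constrained markup between spaces
print(bold_spans("这是*重要*的文字"))      # single "*" touches word characters
print(bold_spans("这是**重要**的文字"))    # doubled "**" needs no word boundary
```

Python's `\w` matches CJK characters in `str` patterns, which is why the single `*` in the second call is treated as literal text under this sketch.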

...

4. Attribute lists are currently allowed on highlight (#) markups only. Should they be allowed on other markups? The use-case is that currently nesting only of differing types of markup is allowed so highlights don't nest so attributes cannot be specified on nested markup, whereas attributes on all markup would allow `[.arole]#foo [.brole]_blah_ footoo#` to be specified.

5. A recursive definition would allow nesting of the same markup:

5.a. So long as it's inside a different markup it can be recognised as nested, eg it is possible to allow `*foo _blah *footoo* blah_ foo*`. How useful that is depends on the backend, but for example in HTML I'm sure it's possible to use CSS to select and style the nested `footoo` as something other than just bold.

5.b. If attribute lists are allowed on markup other than highlight, then applying a role makes it even easier to style nested markups.

5.c. An alternative or additional method of providing nesting is to recognise that an attribute list can be used to distinguish an opening markup from a closing markup, so nesting of the same markup becomes possible, eg `[.red]#foo [.green]#blah# foo#` as `<highlight, class=red>foo <highlight, class=green>blah</highlight> foo</highlight>` is possible. Attributes and nesting of different markups would also make it easier for humans to match opening and closing markups, eg `[.red]#lots and lots of text [.green]*lots and lots more text* so this text is far away from the opening markups#`
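
As a rough illustration of 5.c, the sketch below treats `[...]#` as always opening a highlight and a bare `#` as closing the innermost open one (or opening one if none is open). This is only an assumed rule for the purpose of the example, not the behaviour of any current implementation; the `TOKEN` pattern and `parse_highlights` are hypothetical names, and the constrained/unconstrained boundary checks are deliberately left out.

```python
import re

# Tokens: an attribute list glued to "#", a bare "#", or a run of other text.
TOKEN = re.compile(r"\[(?P<attrs>[^\]\n]*)\]#|(?P<hash>#)|(?P<text>[^#\[]+|\[)")


def parse_highlights(source):
    """Parse nested highlight spans into a tree of {"attrs", "children"} dicts."""
    root = {"attrs": None, "children": []}
    stack = [root]
    for match in TOKEN.finditer(source):
        if match.group("attrs") is not None:
            # "[...]#" can only be an opening markup, so it always nests.
            node = {"attrs": match.group("attrs"), "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif match.group("hash"):
            if len(stack) > 1:
                # A bare "#" closes the innermost open span.
                stack.pop()
            else:
                # Nothing is open, so a bare "#" opens a span without attributes.
                node = {"attrs": "", "children": []}
                stack[-1]["children"].append(node)
                stack.append(node)
        else:
            stack[-1]["children"].append(match.group("text"))
    if len(stack) != 1:
        raise ValueError("unclosed highlight markup")
    return root["children"]


tree = parse_highlights("[.red]#foo [.green]#blah# foo#")
print(tree)
```

The point of the sketch is only that the attribute list removes the open/close ambiguity; without it, the third `#` in the example could not be told apart from a closing markup.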

Regarding 4 and 5c, when attributes come in, I find it a little harder to read (maybe it's just me), and the attributes come before the markup, which is different from the inline macros.

AsciiDoc currently is fairly consistent, all entities have attributes specified before the entity except macros and directives (the xyz: type markups).

There are a number of reasons why attributes occur before most other entities:

1. Some entities like sections, lists etc have no end markup and several nested entities can finish at the same place, so it would be hard to decide which entity an attribute list applies to if it was after the entity, and so the attribute list is before the entity.

2. Some entities allow control of the markup that is recognised in them by the notorious `subs=` attribute. That has to be encountered before the content is parsed to know what to recognise, unless the cost of multiple-pass recognition is forced on all implementations.

3. So, since some entities have a reasonable need to have the attribute lists before the entity, and no entity has a requirement for the attribute list to be after the entity, most entities have them before for consistency.
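
Reason 2 can be sketched in Python as a single-pass renderer. Everything here is an assumption made for illustration: each non-attribute line is treated as a one-line block, `subs=` accepts only a single name or `none`, and `render`, `SUBSTITUTIONS` and `DEFAULT_SUBS` are hypothetical names rather than anything from an existing processor.

```python
import html
import re

DEFAULT_SUBS = ("specialchars", "quotes")
SUBSTITUTIONS = {
    "specialchars": lambda s: html.escape(s, quote=False),
    # Simplified constrained bold only.
    "quotes": lambda s: re.sub(r"(?<!\w)\*(\S.*?)\*(?!\w)", r"<strong>\1</strong>", s),
}


def render(lines):
    """Render one-line blocks in a single pass, honouring a preceding [subs=...]."""
    out = []
    pending = {}
    for line in lines:
        attrlist = re.fullmatch(r"\[(.*)\]", line)
        if attrlist:
            # The attribute list is seen first, so it can steer the next block.
            pending = dict(
                item.split("=", 1)
                for item in attrlist.group(1).split(",")
                if "=" in item
            )
            continue
        subs = pending.get("subs")
        if subs is None:
            names = DEFAULT_SUBS
        elif subs == "none":
            names = ()
        else:
            names = (subs,)  # an unknown name raises KeyError below
        for name in names:
            line = SUBSTITUTIONS[name](line)
        out.append(line)
        pending = {}  # attributes apply to the following block only
    return out


print(render(["*a* < b", "[subs=none]", "*a* < b"]))
```

If the attribute list came after the block instead, the renderer would have to buffer the raw content until it saw the list, or parse the content and then redo it, which is the multiple-pass cost mentioned above.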

The inconsistency is macros and directives, which are not normal markup; they are indications of special processing and they don't need the attributes before they are recognised. But most need a "target" which is not AsciiDoc markup text, eg include:: needs a filename, https: needs a URL, and ifdef:: needs a list of attribute names. So the attribute list is placed after the target to act as a delimiter for the part that is not markup. Yes, current implementations scan some targets as marked-up text and that causes issues, but correcting that is a separate issue that the specification should address. To fit in with the lightweight readable goal, some macros/directives allow alternative delimiters if no attribute list is needed, eg the target for https: is delimited by a space as an alternative to an empty attribute list, but that is not the case for all macros/directives.
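
A small Python sketch of the "attribute list delimits the target" idea follows. The pattern is a simplification I am assuming for illustration (it handles only the bracketed form, not the space-delimited alternative, and does no escaping), and `split_macro` is a hypothetical helper name; the file name and URL are placeholders.

```python
import re

# name, one or two colons, a target with no "[" or whitespace, then "[attrs]".
MACRO = re.compile(r"(?P<name>\w+)::?(?P<target>[^\[\s]*)\[(?P<attrs>[^\]]*)\]")


def split_macro(text):
    """Split a macro/directive into (name, target, attrs), or return None."""
    match = MACRO.fullmatch(text)
    if match is None:
        return None
    return match.group("name"), match.group("target"), match.group("attrs")


print(split_macro("include::chapter.adoc[leveloffset=1]"))
print(split_macro("ifdef::foo,bar[]"))
print(split_macro("https://example.org[Example]"))
```

In each case the target is taken verbatim up to the opening `[`, so nothing in it needs to be scanned as marked-up text.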

Probably it is better to allow attributes only in the long notation? As I proposed in another email thread, short and long notations (inline macro) could be used for all inline markups.

As I said above, AsciiDoc is intended to be a lightweight readable markup, cases where long notation is required (ie is the _only_ way to do it) should be kept to the minimum.

Although you personally find prefix notation harder to read, others find it easier and/or have a personal preference to know that special circumstances apply to the entity before they read it. Unfortunately there is no "right" answer, or life would be much simpler.

Cheers

Lex

...

Follow-Ups:
- Re: [asciidoc-lang-dev] Text Markup, syntax and parsing thereof
  - From: Sylvain Leroux

References:
- [asciidoc-lang-dev] Text Markup, syntax and parsing thereof
  - From: Lex Trotman
- Re: [asciidoc-lang-dev] Text Markup, syntax and parsing thereof
  - From: 马旋（MA Xuan）
