Thoughts, questions:
1.a. The constraints are context around the markup, not part of the token, so it does not violate the requirement that markup be ASCII for the context to allow any Unicode.
1.b. As proposed in another thread, an initial "letter like" character could be defined as a Unicode class L code point, and a final "letter like" character as a Unicode class L code point followed by any number of Unicode class M code points (combining characters such as accents).
1.c. Currently the punctuation allowed is (,;".?!), which are common English punctuation marks but do not include any non-English punctuation.
1.d. So should common non-English punctuation be allowed, and if so which?
1.e. Should all Unicode category P punctuation be allowed?
1.e. Should punctuation be allowed before the initial constrained markup?
1.f. Should only Unicode category Pi and Ps be allowed before and Pf and Pe and Po after?
1.g. What is "space" (here I'm talking in the context of constrained markup, there is another thread that addresses it more generally), eg Unicode category Zs (https://www.compart.com/en/unicode/category/Zs) and AsciiDoc line separators?
1.h. Or, since unconstrained markup is available, should the specification be conservative about what is allowed to bound constrained markup? The markup characters (*_`#~^) are uncommon in general English text but tend to occur when talking about programming code and math, and I don't know how common they are in other languages. The rules are intended to minimise nuisance recognition of such uses as markup, so the more contexts in which markup is allowed, the more nuisance occurrences are likely.
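To make the questions in 1.b, 1.f and 1.g concrete, here is a minimal Python sketch of one possible set of context rules. It assumes (these are options raised above, not decisions) that "letter like" is class L plus trailing class M, that only Pi/Ps punctuation may precede an opening mark, that only Pf/Pe/Po may follow a closing mark, and that "space" means category Zs plus newline characters. The function names are made up for illustration.

```python
import unicodedata

def is_space(ch):
    # Assumption: "space" is Unicode category Zs plus line breaks (see 1.g).
    return ch in "\n\r" or unicodedata.category(ch) == "Zs"

def starts_letter_like(text, i):
    # Initial "letter like": a single class L code point (see 1.b).
    return i < len(text) and unicodedata.category(text[i]).startswith("L")

def ends_letter_like(text, i):
    # Final "letter like": text[:i] ends in a class L code point
    # followed by zero or more class M code points (see 1.b).
    j = i - 1
    while j >= 0 and unicodedata.category(text[j]).startswith("M"):
        j -= 1
    return j >= 0 and unicodedata.category(text[j]).startswith("L")

def may_open(text, i):
    # text[i] is a candidate opening mark.
    before_ok = (
        i == 0
        or is_space(text[i - 1])
        or unicodedata.category(text[i - 1]) in ("Pi", "Ps")  # option from 1.f
    )
    return before_ok and starts_letter_like(text, i + 1)

def may_close(text, i):
    # text[i] is a candidate closing mark.
    after = text[i + 1] if i + 1 < len(text) else None
    after_ok = (
        after is None
        or is_space(after)
        or unicodedata.category(after) in ("Pf", "Pe", "Po")  # option from 1.f
    )
    return after_ok and ends_letter_like(text, i)
```

Under these assumptions `«*foo*»` and `(*foo*)` are recognised, since « is Pi, ( is Ps, » is Pf and ) is Pe, while the `*` in `2*3*4` is not, since a digit is neither space nor opening punctuation. Note that the currently allowed marks (,;".?!) are all category Po, so the Po option after a closing mark already covers them.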
2. Escaping of unwanted markup (see 1.h.): backslash? But if all ASCII punctuation is allowed in the context, that may interfere with using backslash as the escape character.
3. Current implementations of AsciiDoc do not parse text markup, and it does not exist in the DOM. Instead, direct substitution in a specific order is used, meaning backend issues reach well forward into the implementation. Also, markup characters that occur in a legal context but have no matching open/close markup are currently silently left as text.
3.a. It is proposed that the specification deprecate this mechanism and move to a recursive definition that parses the text markup into the DOM. But that definition will interpret overlaps in a different manner, so it isn't backward compatible. For example `*foo _blah* bletch_` could currently parse as `<bold>foo _blah</bold> bletch_` or `*foo <italic>blah* bletch</italic>` or the illegal `<bold>foo <italic>blah</bold> bletch</italic>`, depending on the order in which the markup is substituted and whether substitution ignores previous markup.
3.b. A recursive definition would define the parse based on the order of the markup in the source rather than some order in the implementation, and would prevent recognition of overlaps, so the above is always `<bold>foo _blah</bold> bletch_`, since the bold opening is recognised first, there is no closing underscore inside the bold markup, and there is no opening underscore outside it.
3.c. This also allows the option of warning about possibly unmatched markup (the underscores above), which is useful since it is easy for humans to miss a single character left in the text when proofreading.
3.d. A recursive definition allows nesting restrictions to be relaxed (see 5).
3.e. Parsing into the DOM allows the semantics to be separately defined for backends rather than as part of the language syntax.
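The behaviour described in 3.b and 3.c can be illustrated with a small Python sketch. This is not any existing implementation's algorithm, just one way a recursive, source-order parse might look. It assumes only `*` and `_` exist, ignores the constrained-context rules from item 1, pairs each mark with the next occurrence of the same mark (so it does not handle same-markup nesting from item 5), and uses invented names (`parse`, `render`, `MARKS`).

```python
# Hypothetical mapping of mark characters to node names, for illustration only.
MARKS = {"*": "bold", "_": "italic"}

def parse(text, warnings=None):
    """Return (nodes, warnings); a node is a str or a (name, children) tuple."""
    if warnings is None:
        warnings = []
    nodes, buf, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch in MARKS:
            close = text.find(ch, i + 1)
            if close != -1:
                if buf:
                    nodes.append("".join(buf))
                    buf = []
                # Recurse into the span only: marks inside it can never
                # pair with marks outside it, so overlaps cannot occur.
                children, _ = parse(text[i + 1:close], warnings)
                nodes.append((MARKS[ch], children))
                i = close + 1
                continue
            # No closer in this span: keep as text, but warn (see 3.c).
            warnings.append(f"unmatched {ch!r}")
        buf.append(ch)
        i += 1
    if buf:
        nodes.append("".join(buf))
    return nodes, warnings

def render(nodes):
    """Serialise the tree using the pseudo-tags from the examples above."""
    out = []
    for node in nodes:
        if isinstance(node, str):
            out.append(node)
        else:
            name, children = node
            out.append(f"<{name}>{render(children)}</{name}>")
    return "".join(out)

nodes, warnings = parse("*foo _blah* bletch_")
print(render(nodes))   # <bold>foo _blah</bold> bletch_
print(warnings)        # two "unmatched '_'" entries
```

Because the result is a tree rather than substituted text, a backend can decide separately what `bold` or `italic` means (3.e), and the two warnings flag exactly the stray underscores a proofreader might miss.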
4. Attribute lists are currently allowed on highlight (#) markup only. Should they be allowed on other markup? The use-case: currently only differing types of markup may nest, so highlights cannot nest inside highlights, and therefore attributes cannot be specified on nested markup. Attributes on all markup would allow `[.arole]#foo [.brole]_blah_ footoo#` to be specified.
5. A recursive definition would allow nesting of the same markup:
5.a. So long as it's inside a different markup it can be recognised as nested, eg it is possible to allow `*foo _blah *footoo* blah_ foo*`. How useful that is depends on the backend, but for example in HTML I'm sure it's possible to use CSS to select and style the nested `footoo` as something other than just bold.
5.b. If attribute lists are allowed on markup other than highlight, then applying a role allows styling to be applied to nested markup even more easily.
5.c. An alternative or additional method of providing nesting is to recognise that an attribute list can be used to distinguish an opening markup from a closing markup, so nesting of the same markup becomes possible, eg `[.red]#foo [.green]#blah# foo#` as `<highlight, class=red>foo <highlight, class=green>blah</highlight> foo</highlight>` is possible. Attributes and nesting of different markups would also make it easier for humans to match opening and closing markups, eg `[.red]#lots and lots of text [.green]*lots and lots more text* so this text is far away from the opening markups#`
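As a rough illustration of 5.c, the following Python sketch treats a `#` that is immediately preceded by an attribute list as an opener and a bare `#` as the closer of the innermost open highlight. It is a simplification under stated assumptions: every opener carries a single role of the form `[.name]`, only `#` is handled, openers left unclosed at the end of the text are simply kept as nodes, and the names `TOKEN` and `parse_highlights` are invented here.

```python
import re

# Either "[.role]#" (an opener, by assumption) or a bare "#" (a closer).
TOKEN = re.compile(r"\[\.([\w-]+)\]#|#")

def parse_highlights(text):
    """Return a tree of str and ("highlight", role, children) nodes."""
    root = []
    stack = [root]  # stack[-1] is the child list currently being filled
    pos = 0
    for m in TOKEN.finditer(text):
        if m.start() > pos:
            stack[-1].append(text[pos:m.start()])
        pos = m.end()
        role = m.group(1)
        if role is not None:
            # The attribute list marks this "#" as an opener.
            node = ("highlight", role, [])
            stack[-1].append(node)
            stack.append(node[2])
        elif len(stack) > 1:
            # Bare "#": close the innermost open highlight.
            stack.pop()
        else:
            # Bare "#" with nothing open: leave it as text.
            stack[-1].append("#")
    if pos < len(text):
        stack[-1].append(text[pos:])
    return root

tree = parse_highlights("[.red]#foo [.green]#blah# foo#")
print(tree)
```

For the example from 5.c this yields a `red` highlight containing `foo `, a nested `green` highlight containing `blah`, and ` foo`. A real definition would still need to decide what a bare `#` opener means and how this interacts with the context rules in item 1.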