Re: [asciidoc-lang-dev] Avoiding Implementation Specifics Thoughts

On Tue, 23 Feb 2021 at 06:39, David Jencks <david.a.jencks@xxxxxxxxx> wrote:
I personally think it would be great if everyone who is working on an AsciiDoc grammar or parser would continually publish the current state of their work, no matter how little it does and how much it does wrong.

Yes, but I'm old enough and grumpy enough to know that very incomplete works generally only get comments that they are incomplete, so it is necessary to at least get the work to the point where it proves the theory.

> On Feb 22, 2021, at 4:03 AM, Lex Trotman <exciidoc@xxxxxxxxx> wrote:
> One thing that has been mentioned several times is the concept of "extension points".  This needs careful definition and consideration if it's to be included in any specification, to avoid implementation specifics.
> Asciidoctor makes good use of the dynamism of its development environment, Ruby, to allow customisation.  And no markup is likely to cover every eventuality, so some extensibility seems important (although as devil's advocate I point out that using extensions likely makes documents _not_ Asciidoc Specification compliant).

Well, since the behavior of Asciidoctor is the current spec, and Asciidoctor supports extensions, they are spec compliant :-)  More seriously, this is a big question.  Perhaps another way to state it is, "Does the set of installed extensions affect whether a document is grammatical?"  This is especially complicated by inline macros defined using a regex.

Your :-) is noted, but that is why Dan had the language separated from the implementation before the initial upload, in which the extension point concept is not mentioned AFAICT.

Current extension methods are very implementation specific.  People built large extension tool sets around Asciidoc Python and then Asciidoctor, but they are not portable and, as shown by the Asciidoc Python experience, they are then left behind when the centre of gravity shifts to another implementation.  It would therefore be good if the specification included a standardised extension mechanism to help portability.

> But languages like C or other fully compiled languages are not so amenable to being changed dynamically.  It would seem to be not a good thing to specify the capability in a way that implementations in those languages cannot reasonably be made to comply with.

You can always use CORBA…. but there must be something more usable by now.

I'm not sure that supporting CORBA counts as "reasonable", nor that such an implementation would be performant, but the IDL is certainly a candidate for specifying standard APIs such as the DOM interface and maybe standard extension capabilities.  At least then extensions could be portable between implementations in the same or compatible languages, e.g. C in C++.

> Another implementation specific is the handling of the simplest markup, constrained and unconstrained quotes.  The current implementations (AFAIK all of them) perform the recognition and replacement in a fixed order, rather than the order that the markup occurs in the document (this is the source of the problem, regardless of whether you use regexes or lexers to recognise the tokens).
> Replacing that with a recursive descent parser giving an in-order AST is simple (I have an experimental one already that I hope to publish to GitHub in a few weeks, depending on how my "real" world goes; it has lots of other experiments too :-) but the results are different to existing processors.  So for Asciidoc 1.0 this might need to stay as is for compatibility, and documents that depend on it could be deprecated ready for Asciidoc 2.0 that changes the processing order.
> The same issue occurs at the next higher level as well, the ordered recognition of inline markup, special, quotes, replacements, etc and its sidekick, the infamous "subs=" attribute.  That makes the Asciidoc language not only context dependent (in structures like sections and lists), but "subs=" is _content_ dependent.  Not many (actually not any AFAIK) programming languages allow the source code to specify which language constructs are allowed in parts of the program, making normal formal computer language methods difficult to apply.  Imagine if a programming language source could say "no, the `while` construct isn't to be parsed in this part of the program", that is what "subs=" does.
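
The fixed-order versus document-order difference for quotes described in the quoted text above can be illustrated with a minimal sketch. It assumes a toy markup with only `*strong*` and `_emphasis_` and ignores the real constrained/unconstrained rules; the function names `fixed_order` and `in_order` are hypothetical and do not come from any existing processor. The in-order version renders directly instead of building an AST, to keep it short.

```python
import re

# Hypothetical toy markup: '*' for strong, '_' for emphasis.
TAGS = {"*": "b", "_": "em"}


def fixed_order(text):
    """Substitute in a fixed order: all strong spans first, then all
    emphasis spans, regardless of where they occur in the text."""
    text = re.sub(r"\*(.+?)\*", r"<b>\1</b>", text)
    text = re.sub(r"_(.+?)_", r"<em>\1</em>", text)
    return text


def in_order(text):
    """Scan left to right; a delimiter opens a span only if its closer
    exists inside the span currently being parsed, and the span's
    content is parsed recursively."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        close = text.find(ch, i + 1) if ch in TAGS else -1
        if close > i + 1:
            tag = TAGS[ch]
            inner = in_order(text[i + 1:close])
            out.append(f"<{tag}>{inner}</{tag}>")
            i = close + 1
        else:
            # Not a delimiter, or no usable closer: keep it literal.
            out.append(ch)
            i += 1
    return "".join(out)


sample = "_a *b_ c*"
print(fixed_order(sample))  # <em>a <b>b</em> c</b>
print(in_order(sample))     # <em>a *b</em> c*
```

For properly nested input such as `*a _b_ c*` both functions agree; they diverge only on overlapping markup, where the fixed-order version emits mis-nested tags. That is the kind of result change that would need a compatibility decision.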

I’ve wondered if Antlr modes can provide a solution to this, but haven’t seriously investigated.

I'm not an Antlrist, but a quick google seems to suggest its modes are a context handling method.  Specifying the link from content to context/modes/whatever you call 'em still seems to need English if it's not to be expressed in a language specific manner (i.e. in Java/C/whatever).  But maybe a simple pseudocode would suffice for specification purposes.

I should emphasise my subtle wording difference above: the "subs=" is no longer a "substitute" controller but a "parsing" controller, and again that change can produce different results to current implementations.
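
As a rough illustration of "subs=" acting as a parsing controller, here is a minimal sketch in which the set of enabled constructs is passed to the inline parser and decides which constructs are recognised at all. The names `parse_inline`, `QUOTES` and `REPLACEMENTS` are hypothetical, only two substitution groups (`quotes` and `replacements`) are modelled, and the quote handling is a toy that ignores the real constrained/unconstrained rules.

```python
# Hypothetical tables for two substitution groups.
QUOTES = {"*": "b", "_": "em"}
REPLACEMENTS = {"(C)": "\u00a9", "...": "\u2026"}


def match_replacement(text, i):
    """Return (source, replacement) if a replacement starts at index i."""
    for src, dst in REPLACEMENTS.items():
        if text.startswith(src, i):
            return src, dst
    return None


def parse_inline(text, subs):
    """Parse text in document order, recognising only the construct
    groups named in `subs` (e.g. {"quotes", "replacements"})."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if "quotes" in subs and ch in QUOTES:
            close = text.find(ch, i + 1)
            if close > i + 1:
                tag = QUOTES[ch]
                inner = parse_inline(text[i + 1:close], subs)
                out.append(f"<{tag}>{inner}</{tag}>")
                i = close + 1
                continue
        if "replacements" in subs:
            hit = match_replacement(text, i)
            if hit is not None:
                src, dst = hit
                out.append(dst)
                i += len(src)
                continue
        # Construct disabled or not present: emit the character as-is.
        out.append(ch)
        i += 1
    return "".join(out)


print(parse_inline("*x* (C)", {"quotes", "replacements"}))
print(parse_inline("*x* (C)", {"replacements"}))
print(parse_inline("*x* (C)", set()))
```

With both groups enabled the strong span and the copyright sign are recognised; with only `replacements` the asterisks stay literal; with an empty set the text passes through unchanged. The point of the sketch is that `subs` selects grammar rules during a single pass instead of selecting substitution passes to run afterwards.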


> It will be interesting to see how this is formalised.

Yes indeed!

Thanks for posting!
David Jencks
> Cheers
> Lex
> _______________________________________________
> asciidoc-lang-dev mailing list
> asciidoc-lang-dev@xxxxxxxxxxx
> To change your delivery options, retrieve your password, or unsubscribe from this list, visit