Re: [asciidoc-lang-dev] Avoiding Implementation Specifics Thoughts

Hi all,

I agree this is a long way off, but while we're on the subject...

I don't think it is possible to create a well-defined API in quite that way.
First, you have to consider where the API might be used.
That gives you at least two environments:
      web based - where JavaScript is the only acceptable common denominator (until WASM is more mature, unless you want to mandate asm.js)
      native    - where C is still top dog as the most portable API language, meaning you lose or have to wrap 'modern' functionality (OO, automatic memory management, etc.)
                  (though in practice you could get away with C++ on any gcc platform except highly constrained embedded platforms)

What you could do instead is define a reference API in a specific language and encourage people to implement the same API (by wrapping lowest common denominators like the above).
You would then build that to the highest quality possible, but in layers, so that implementors can gradually increase their level of compliance (like DOM Level 1 and Level 2 for XML).
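As a rough illustration of layered compliance, here is a minimal Python sketch assuming a hypothetical two-level API: level 1 only parses source text into an ASG (a plain dict here), and level 2 adds registration of extension routines. The names `Level1`, `Level2`, `parse`, `register_extension` and the toy splitter classes are invented for illustration; they are not part of any AsciiDoc specification or of Asciidoctor.

```python
from typing import Callable, Protocol, runtime_checkable


@runtime_checkable
class Level1(Protocol):
    """Hypothetical compliance level 1: turn source text into an ASG (a dict here)."""
    def parse(self, source: str) -> dict: ...


@runtime_checkable
class Level2(Level1, Protocol):
    """Hypothetical compliance level 2: level 1 plus extension registration."""
    def register_extension(self, name: str, routine: Callable[[dict], None]) -> None: ...


def compliance_level(impl: object) -> int:
    """Highest level whose methods `impl` provides (method presence only, not signatures)."""
    if isinstance(impl, Level2):
        return 2
    if isinstance(impl, Level1):
        return 1
    return 0


class ParagraphSplitter:
    """Toy level-1 implementation: each blank-line-separated chunk is a paragraph."""
    def parse(self, source: str) -> dict:
        blocks = [{"type": "paragraph", "text": p.strip()}
                  for p in source.split("\n\n") if p.strip()]
        return {"type": "document", "blocks": blocks}


class ExtensibleSplitter(ParagraphSplitter):
    """Toy level-2 implementation: runs registered routines over the parsed ASG."""
    def __init__(self) -> None:
        self.routines: dict[str, Callable[[dict], None]] = {}

    def register_extension(self, name: str, routine: Callable[[dict], None]) -> None:
        self.routines[name] = routine

    def parse(self, source: str) -> dict:
        asg = super().parse(source)
        for routine in self.routines.values():
            routine(asg)
        return asg
```

An implementation in another language would expose the same two levels through whatever wrapping that language needs; the point of the layering is that a level-1 implementation is still useful and testable on its own.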

What I think is actually more important here is the domain model / ASG. This needs to be designed to reflect the semantics and should not be coupled too tightly to the syntax.
The model for extensions should be in terms of external 'routines' that manipulate the ASG, possibly subject to some limitations.
The extensions are then, in principle, as portable as the low-level API is.
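A minimal Python sketch of extensions as routines over the ASG, assuming a hypothetical `Node` shape (kind, text, attributes, children) rather than any real processor's model. The example routine only ever sees and edits nodes, never source text, which is what would make it portable in principle to any implementation exposing the same tree. `apply_routines` and `todo_to_admonition` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    kind: str
    text: str = ""
    attrs: dict = field(default_factory=dict)
    children: list["Node"] = field(default_factory=list)


Routine = Callable[[Node], None]


def apply_routines(root: Node, routines: list[Routine]) -> Node:
    """Visit every node depth-first (pre-order) and let each routine modify it in place."""
    stack = [root]
    while stack:
        node = stack.pop()
        for routine in routines:
            routine(node)
        # Push children reversed so they are visited in document order.
        stack.extend(reversed(node.children))
    return root


def todo_to_admonition(node: Node) -> None:
    """Hypothetical extension: a paragraph starting with "TODO:" becomes a NOTE admonition."""
    if node.kind == "paragraph" and node.text.startswith("TODO:"):
        node.kind = "admonition"
        node.attrs["name"] = "note"
        node.text = node.text[len("TODO:"):].lstrip()
```

The "limitations" mentioned above could be enforced in `apply_routines`, for example by only passing a routine the node kinds it has declared interest in.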

I strongly agree with @David Jencks

"Cross (computer) language portability has a long and unsuccessful career in programming"
"We have a giant challenge to develop a grammar.  Let's not add some known-to-be-unsolvable problems to our task list :-)"

With one caveat: I think the semantics are far more important than the syntax.
The reason I use asciidoc rather than, say, markdown is that its semantics go furthest, in principle, towards 'doing anything with text'. It's like docbook without the ugly XML syntax.

Representing "meaning" is an even deeper rabbit hole than cross-language portability, as it's AI-complete, but there is a good foundation to work from.
You don't need to go all the way down provided you have good abstraction mechanisms.

These should be based on the semantics wherever possible rather than on text substitution.
Text substitution is an important use case, however. I would like to be able to use asciidoc as a templating language instead of something like Mustache.
Text substitution seems easier to naive (and even not so naive) implementers, but it stores up a wealth of problems for the future, like the C preprocessor.
Some kind of module mechanism will need to augment or replace includes as soon as possible (but likely in V2).
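To make the preprocessor comparison concrete, here is a small Python sketch, not modelled on any particular AsciiDoc processor, of one problem that text substitution stores up: if substituted output is rescanned, an attribute value that merely contains reference syntax is expanded again, and mutually referencing values never settle. Treating each value as literal data once it is inserted avoids both. The `{name}` reference syntax mirrors AsciiDoc attribute references; the function names and the `max_passes` limit are hypothetical.

```python
import re

# Matches an attribute reference such as {user}.
REF = re.compile(r"\{([A-Za-z0-9_-]+)\}")


def rescanning_substitute(text: str, attrs: dict[str, str], max_passes: int = 10) -> str:
    """Naive approach: substitute, then rescan the output, like a macro preprocessor."""
    for _ in range(max_passes):
        new = REF.sub(lambda m: attrs.get(m.group(1), m.group(0)), text)
        if new == text:
            return new
        text = new
    raise RecursionError("substitution did not converge")


def single_pass_substitute(text: str, attrs: dict[str, str]) -> str:
    """Each reference is resolved once; inserted values are treated as literal data."""
    return REF.sub(lambda m: attrs.get(m.group(1), m.group(0)), text)
```

With `attrs = {"hint": "type {user} here", "user": "admin"}`, the rescanning version turns `Tip: {hint}` into `Tip: type admin here`, whereas the single-pass version keeps the author's literal `{user}`. Unknown references are left untouched in both.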

There is a fundamental tension between trying to write freely, trying to make text presentable, and trying to fully represent the semantics.
The weapon against complexity is abstraction.
The web industry has learned there are several layers, including:
Likewise, the software industry has explored many kinds of abstraction:
     Hygienic macros

The first thing is to stabilize the specifications for V1, which is enough work in itself, but beyond that these matters need to be taken very seriously,
lest an even better semantic markup topple asciidoc from its throne.


Bruce  (randomly delurking user / collaborator)

On Tuesday, February 23, 2021, 6:58:38 PM GMT, Mattias Holm <lorrden@xxxxxxxxxxxxx> wrote:

Why not JavaScript or WASM with a well-defined API? A C or C++ implementation of AsciiDoc could easily integrate with existing JS runtimes (as could Asciidoctor in Ruby, and with the JS version it would be "for free").

There is enough prior work in browsers to establish that compatibility between implementations will be practical. Such an undertaking would certainly be challenging, but would IMO be of great benefit to the asciidoc ecosystem.

But I think everyone agrees that this is for later.


On 23 Feb 2021, at 18:25, David Jencks <david.a.jencks@xxxxxxxxx> wrote:

My bringing up CORBA was a bit facetious…. it would be a bit like trying to dust your Limoges china with a bulldozer. I don’t know how linking, dynamic or otherwise, works with C, but I believe CORBA would avoid that. If you want to write an AsciiDoc implementation in C, I’m fine with you also having to compile in the extensions you want to use… IIRC there’s a Linux distro (Gentoo?) that compiles everything from source as you install it.

Cross (computer) language portability has a long and unsuccessful career in programming…. REST is the current favorite AFAIK.  If the general programming community can’t solve this problem, I think that’s a hint that it’s outside any reasonable bounds for this specification. Not only do I think the spec shouldn’t consider trying to specify anything about cross-language extension portability, I don’t think it should specify much for any particular language.  It’s fine with me if extensions have to be written for a particular implementation in, say, Java.  For instance, I would be strongly opposed to requiring use of the ServiceLoader mechanism for loading extensions, when IMO OSGi with declarative services provides an infinitely superior alternative.

What’s important with extensions is to specify where and when they can run, what information they get, and, what they can modify.
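Reading those three questions (where and when, what information, what may be modified) as a contract, a minimal Python sketch might look like the following. For brevity it assumes extensions only touch document attributes rather than ASG nodes; `Phase`, `ExtensionSpec` and `run_phase` are hypothetical names, and a real specification would have to define the phases and permissions itself.

```python
from dataclasses import dataclass
from enum import Enum
from types import MappingProxyType
from typing import Callable, Mapping


class Phase(Enum):
    """When an extension may run (hypothetical phases)."""
    AFTER_PARSE = "after_parse"
    BEFORE_CONVERT = "before_convert"


@dataclass(frozen=True)
class ExtensionSpec:
    name: str
    phase: Phase              # where/when it runs
    reads: frozenset[str]     # attribute names it is allowed to see
    writes: frozenset[str]    # attribute names it is allowed to change
    routine: Callable[[Mapping[str, str]], dict[str, str]]


def run_phase(phase: Phase, extensions: list[ExtensionSpec], attributes: dict[str, str]) -> dict[str, str]:
    """Run every extension registered for `phase`, enforcing its read/write contract."""
    for ext in extensions:
        if ext.phase is not phase:
            continue
        # Read-only view restricted to the declared inputs.
        visible = MappingProxyType({k: v for k, v in attributes.items() if k in ext.reads})
        changes = ext.routine(visible)
        forbidden = set(changes) - ext.writes
        if forbidden:
            raise PermissionError(f"{ext.name} may not modify {sorted(forbidden)}")
        attributes.update(changes)
    return attributes
```

Nothing here depends on how the routine was loaded (ServiceLoader, OSGi, compiled in), which matches the idea of specifying the contract and leaving the loading mechanism to each implementation.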

We have a giant challenge to develop a grammar.  Let's not add some known-to-be-unsolvable problems to our task list :-)

David Jencks

On Feb 23, 2021, at 3:24 AM, Lex Trotman <exciidoc@xxxxxxxxx> wrote:

On Tue, 23 Feb 2021 at 06:39, David Jencks <david.a.jencks@xxxxxxxxx> wrote:
I personally think it would be great if everyone who is working on an AsciiDoc grammar or parser would continually publish the current state of their work, no matter how little it does and how much it does wrong.

Yes, but I'm old enough and grumpy enough to know that very incomplete works generally only get comments that they are incomplete, so getting it working at least to the point where it proves the theory is necessary.

> On Feb 22, 2021, at 4:03 AM, Lex Trotman <exciidoc@xxxxxxxxx> wrote:
> One thing that has been mentioned several times is the concept of "extension points".  This needs careful definition and consideration if it's to be included in any specification, to avoid implementation specifics.
> Asciidoctor makes good use of the dynamism of its development environment, Ruby, to allow customisation.  And no markup is likely to cover every eventuality, so some extensibility seems important (although as devil's advocate I point out that using extensions likely makes documents _not_ Asciidoc Specification compliant).

Well, since the behavior of Asciidoctor is the current spec, and Asciidoctor supports extensions, they are spec compliant :-)  More seriously, this is a big question.  Perhaps another way to state it is, “Do the installed extensions affect whether a document is grammatical?”  This is especially complicated by inline macros defined using a regex.
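A small Python sketch of why regex-defined inline macros make this hard: the same source line yields a different token stream depending on which macros are installed. The `issue` macro and its pattern are hypothetical, and macro names are assumed to be valid Python identifiers because they double as regex group names.

```python
import re


def tokenize(text: str, macros: dict[str, str]) -> list[tuple]:
    """Split `text` into text and macro tokens using whichever macro regexes are installed."""
    if not macros:
        return [("text", text)]
    # One alternation with a named group per installed macro.
    pattern = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in macros.items()))
    tokens, pos = [], 0
    for m in pattern.finditer(text):
        if m.start() > pos:
            tokens.append(("text", text[pos:m.start()]))
        # Identify which installed macro produced this match.
        name = next(n for n in macros if m.group(n) is not None)
        tokens.append(("macro", name, m.group(0)))
        pos = m.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens
```

With no macros installed, `See issue:42[] for details.` is a single text token; with `{"issue": r"issue:\d+\[[^\]]*\]"}` installed it becomes text, a macro token, and text. A grammar that must decide what is "grammatical" would have to take the installed set as an input.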

Your :-) is noted, but that is why Dan separated the language from the implementation before the initial upload, in which the extension point concept is not mentioned AFAICT.

Current extension methods are very implementation specific, people built large extension tool sets around Asciidoc Python and then Asciidoctor, but they are not portable and as shown by the Asciidoc Python experience, they are then left behind when the centre of gravity shifts to another implementation.  It would therefore be good if the specification included a standardised extension mechanism to help portability.
> But languages like C or other fully compiled languages are not so amenable to being changed dynamically.  It would seem to be not a good thing to specify the capability in a way that implementations in those languages cannot reasonably be made to comply.

You can always use CORBA…. but there must be something more usable by now.

I'm not sure that supporting CORBA counts as "reasonable", or that such an implementation would be performant, but the IDL is certainly a candidate for specifying standard APIs such as the DOM interface and maybe standard extension capabilities.  At least then extensions could be portable between implementations in the same or compatible languages, e.g. C in C++.
> Another implementation specific is the handling of the simplest markup, constrained and unconstrained quotes.  The current implementations (AFAIK all of them) perform the recognition and replacement in a fixed order, rather than in the order that the markup occurs in the document (this is the source of the problem, regardless of whether you use regexes or lexers to recognise the tokens).
> Replacing that with a recursive descent parser giving an in-order AST is simple (I have an experimental one already that I hope to publish to github in a few weeks, depending on how my "real" world goes, it has lots of other experiments too :-) but the results are different to existing processors.  So for Asciidoc 1.0 this might need to stay as is for compatibility, and documents that depend on it could be deprecated ready for Asciidoc 2.0 that changes the processing order.
> The same issue occurs at the next higher level as well, the ordered recognition of inline markup, special, quotes, replacements, etc and its sidekick, the infamous "subs=" attribute.  That makes the Asciidoc language not only context dependent (in structures like sections and lists), but "subs=" is _content_ dependent.  Not many (actually not any AFAIK) programming languages allow the source code to specify which language constructs are allowed in parts of the program, making normal formal computer language methods difficult to apply.  Imagine if a programming language source could say "no, the `while` construct isn't to be parsed in this part of the program", that is what "subs=" does.
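A minimal Python sketch of the ordering difference described above, using only two simplified single-character delimiters (`*` for strong, `_` for emphasis) and ignoring AsciiDoc's constrained/unconstrained boundary rules. `fixed_order` applies one regex per markup type in a fixed sequence, which is the general strategy described, not any processor's actual code; `parse_in_order` is a small recursive descent parser that handles delimiters in document order and keeps a delimiter as literal text when an enclosing span closes first. Neither is Lex's experimental parser.

```python
import re

MARKS = {"*": "strong", "_": "em"}


def fixed_order(text: str) -> str:
    """Replace each markup type in turn: strong first, then emphasis."""
    text = re.sub(r"\*(.+?)\*", r"<strong>\1</strong>", text)
    return re.sub(r"_(.+?)_", r"<em>\1</em>", text)


def _parse(text: str, i: int, open_marks: tuple) -> tuple:
    """Parse from index i. Returns (html, index after closer), or (html, None) if the
    innermost open span never closes. At top level it returns the final index."""
    out = []
    while i < len(text):
        ch = text[i]
        if open_marks and ch == open_marks[-1]:
            return "".join(out), i + 1      # innermost span closes here
        if ch in open_marks:
            return "".join(out), None       # an enclosing span closes first
        if ch in MARKS:
            inner, end = _parse(text, i + 1, open_marks + (ch,))
            if end is not None:
                out.append(f"<{MARKS[ch]}>{inner}</{MARKS[ch]}>")
                i = end
                continue
            # Unclosed span: fall through and keep the delimiter as literal text.
        out.append(ch)
        i += 1
    return "".join(out), (i if not open_marks else None)


def parse_in_order(text: str) -> str:
    return _parse(text, 0, ())[0]
```

Both agree on well-nested input such as `*a _b_ c*`. On crossing delimiters such as `*a _b* c_`, `fixed_order` produces the crossed tags `<strong>a <em>b</strong> c</em>`, while `parse_in_order` yields `<strong>a _b</strong> c_`. The sketch rescans after an unclosed span, so its worst case is quadratic.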

I’ve wondered if Antlr modes can provide a solution to this, but haven’t seriously investigated.

I'm not an ANTLR-ist, but a quick Google search suggests its modes are a context-handling mechanism; specifying the link from content to context/modes/whatever you call 'em still seems to need English if it's not to be expressed in a language-specific manner (i.e. in Java/C/whatever).  But maybe simple pseudocode would suffice for specification purposes.

I should emphasise my subtle wording difference above: "subs=" is no longer a "substitute" controller but a "parsing" controller, and again that change can produce different results from current implementations.
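A minimal Python sketch of "subs=" as a parsing controller: the block's subs list selects which grammar rules are active, the text is tokenized once, left to right, and an attribute value becomes a text leaf that is never reparsed. The two rules, their patterns and `parse_block` are simplified and hypothetical; they only borrow the names of AsciiDoc substitution groups.

```python
import re

# Hypothetical grammar rules, keyed by substitution-group name.
RULES = {
    "quotes": re.compile(r"\*(?P<strong>[^*]+)\*"),
    "attributes": re.compile(r"\{(?P<attr>[\w-]+)\}"),
}


def parse_block(text: str, subs: list[str], attrs: dict[str, str]) -> list[tuple[str, str]]:
    """Tokenize once, left to right, recognising only the constructs named in `subs`."""
    active = [RULES[name] for name in subs]
    nodes, i = [], 0
    while i < len(text):
        # Earliest match among the active rules at or after position i.
        found = [m for m in (rx.search(text, i) for rx in active) if m is not None]
        if not found:
            nodes.append(("text", text[i:]))
            break
        m = min(found, key=lambda match: match.start())
        if m.start() > i:
            nodes.append(("text", text[i:m.start()]))
        if m.lastgroup == "strong":
            nodes.append(("strong", m.group("strong")))
        else:
            # Attribute value becomes a leaf; it is not parsed again.
            nodes.append(("text", attrs.get(m.group("attr"), m.group(0))))
        i = m.end()
    return nodes
```

With `attrs = {"product": "*Widget*"}`, `parse_block("Buy {product} *now*", ["quotes", "attributes"], attrs)` yields a strong node only for `now`; the asterisks in the attribute value stay literal text, which is the kind of result that may differ from a processor applying substitutions one after another.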


> It will be interesting to see how this is formalised.

Yes indeed!

Thanks for posting!
David Jencks
> Cheers
> Lex
> _______________________________________________
> asciidoc-lang-dev mailing list
> asciidoc-lang-dev@xxxxxxxxxxx
> To change your delivery options, retrieve your password, or unsubscribe from this list, visit
