Re: [asciidoc-lang-dev] Whitespace handling

> Sylvain wrote:
> What about the spaces around constrained markup (strong, emphasis, ...)? May we safely assume only ASCII is used there too?

Currently, yes. But this is something we may have to reconsider as we evolve the language. There may be scenarios where we need to treat a no-break space adjacent to a formatting mark just like we would a normal space. But this is a detail I just don't think we're ready for yet (though we will get there for sure).

> Since trailing spacing has no effect in AsciiDoc, it won't be a breaking change if we keep it up to the DOM for all blocks. According to my experiments, it would also slightly simplify the grammar for the inline parser.

When I mentioned how Asciidoctor works, I didn't mean to imply that its approach is required for all implementations. Asciidoctor uses a line-oriented parser. Since trailing space is insignificant (I would argue even in verbatim blocks), Asciidoctor made the decision to strip it eagerly before parsing, and users are now familiar with this behavior. This drastically simplifies the parsing approach that Asciidoctor uses. But it's not necessarily right for all implementations. At this point, we are definitely faced with a choice about whether to specify when trailing spaces (including the newline) are removed, if ever. We'll have to think carefully about the implications. I happen to think that it makes things a lot simpler for the ecosystem if lines are already normalized by the time they reach the extensions. But I haven't really considered the counterargument at length, so I can't make a definitive statement right now.
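
To make the eager approach concrete, here is a minimal sketch in Python of what "normalize the lines before parsing" could look like. It is not Asciidoctor's actual code. The function name `normalize_lines` is hypothetical, and it assumes that only ASCII space and tab count as trailing whitespace, which matches the "ASCII only, for now" position earlier in this thread.

```python
import re

# Assumption: the line endings to recognize are CRLF, lone CR, and LF.
_EOL = re.compile(r"\r\n|\r|\n")


def normalize_lines(source: str) -> list[str]:
    """Split source into lines and strip trailing ASCII spaces and tabs.

    Hypothetical helper illustrating eager normalization; the EOL
    encoding of the source document is discarded at this point.
    """
    lines = _EOL.split(source)
    # A final line terminator yields one empty trailing element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    # Only ASCII space and tab are stripped, so a trailing no-break
    # space (U+00A0) is deliberately left in place.
    return [line.rstrip(" \t") for line in lines]
```

With something like this in front of the parser, neither the block parser nor the extensions ever see trailing spaces or the original EOL encoding. The trade-off is that this information is gone for good, which is exactly the choice still to be made for the specification.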

> We agreed that "end of line" should make its way up to the DOM. Do you think we should normalize the internal representation for the EOL so processors won't have to deal with its actual encoding in the source document?

I just don't know yet. I still think it is way too early to make that determination. What we do know is that sequences of spaces and newlines should not be visible in normal paragraph text in the output document. That I feel confident in saying is a requirement. If you can guarantee that contract, then for now I'd say it doesn't matter what goes on inside your parser.
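
As a rough illustration of that contract, a converter could collapse whitespace runs at the point where paragraph text is emitted, regardless of what the parser kept in the DOM. This is a sketch under my own assumptions, not a specified behavior: the name `render_paragraph_text` is hypothetical, and treating only ASCII space, tab, CR, and LF as collapsible is one possible reading.

```python
import re

# Assumption: a "sequence of spaces and newlines" means any run of
# ASCII space, tab, CR, or LF characters.
_RUN = re.compile(r"[ \t\r\n]+")


def render_paragraph_text(text: str) -> str:
    """Collapse each whitespace run to a single space for output.

    Hypothetical output-side helper; it accepts paragraph text with
    whatever EOL encoding and trailing spaces the parser preserved.
    """
    # Replace every run with one space, then drop the single space that
    # may remain at either end of the paragraph.
    return _RUN.sub(" ", text).strip(" ")
```

For example, `render_paragraph_text("one  two\r\nthree \n four")` returns `"one two three four"`, while no-break spaces pass through untouched.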

Best Regards,

-Dan

--
Dan Allen, Vice President | OpenDevise Inc.
Pronouns: he, him, his
Content ∙ Strategy ∙ Community
opendevise.com