Re: [asciidoc-lang-dev] Whitespace handling



On Wed, 3 Mar 2021 at 11:09, Dan Allen <dan@xxxxxxxxxxxxxx> wrote:
>>> Lex:
>>> "Asciidoc spacing characters" is fine by me, if a bit long, but {asc} can fix that :-)
>> Dan wrote:
>> Excellent. We will certainly define it.
> Sylvain wrote:
> FWIW, it suits me well too ;)

I've been informed by someone well-versed in Unicode that "spacing mark" is already well-defined in Unicode (https://unicode.org/glossary/#spacing_mark) and trying to apply a different meaning to it could prove confusing. I'm not abandoning the option yet, but it did prompt me to consider an alternative.

"invisible characters" (perhaps "invisibles" for short)

The reason I like this is because spaces, tabs, and newlines really are there in the text. They just happen to be invisible. And for people who are less familiar with the nomenclature of ASCII and Unicode, it's intuitive because it's self-describing. Something is clearly there, it's just invisible.


Well, spaces are not actually invisible: they don't make a mark on the screen or paper, but they certainly are visible between words.  To me the common meaning of "invisible" doesn't work in that case, and I doubt most readers will check the glossary for the AsciiDoc-specific meaning of "invisibles".  There are also many other "invisible" characters with no visible presence that we don't (I would think) intend to behave as spacing after section or list markers, or beside quote markup. In ASCII alone there are most of the control characters and DEL, although some of the controls, e.g. \n and \r, will be in the set in some cases.
  
Since we are talking about invisible characters specific to AsciiDoc, we would still have to qualify it as AsciiDoc invisible characters.


In which case, if it is always qualified, we might as well use "AsciiDoc spacing". IIUC it's "spacing mark" that is the Unicode term, and it doesn't apply to the context we have (trust Unicode to make life complicated).
 
It might read something like this:

"A section title consists of a level marker followed by at least one invisible character followed by a non-empty title."

As I said above:

=== foo

has visible space between the equals and the "foo".

I just realised at this point in my reply: in this location, should anything other than the Unicode "space separator" category (Zs, https://www.compart.com/en/unicode/category/Zs) be acceptable?

Maybe we can base a term on "separator", e.g. AsciiDoc Separator, Separator Markup, or something like "Aseparator", which is clearly not a word in common use and can't be confused with one?
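
To make the Zs question concrete, here is a rough Python sketch of what "level marker, then at least one space separator, then a non-empty title" could look like if only Zs characters were accepted. It is an illustration of the question, not a description of how any AsciiDoc processor behaves; the function names are made up for this example.

```python
import unicodedata


def is_space_separator(ch: str) -> bool:
    """True if ch is in the Unicode general category Zs (space separator)."""
    return unicodedata.category(ch) == "Zs"


def parse_section_title(line: str):
    """Hypothetical helper: return (level, title), or None if the line
    does not match "marker + one or more Zs characters + title"."""
    # Count the leading '=' characters that make up the level marker.
    level = 0
    while level < len(line) and line[level] == "=":
        level += 1
    if level == 0:
        return None

    # Require at least one Zs character after the marker.
    rest = line[level:]
    gap = 0
    while gap < len(rest) and is_space_separator(rest[gap]):
        gap += 1

    title = rest[gap:]
    if gap == 0 or not title:
        return None
    return level, title
```

Under this assumption "=== foo" is a level 3 title, and so is a line that uses a no-break space (U+00A0) after the marker, because that character is also in Zs. A tab after the marker is rejected, since tab is a control character (Cc), not a space separator. Whether that is the behaviour we want is exactly the open question.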
 

Just an option to consider.

The main challenge that remains, either way, is that we have two definitions of the invisible/spacing character group. One includes the newline character, the other doesn't. So we might need "in-line invisible character" in the previous example (or "in-line invisible" for short).

Indeed, there may need to be more than one defined, named character set.  I would suggest that defined names like this should be non-words like Aseparator, or at least start with a capital, like Separator, so they are visually different to the same word in common usage.  Capitalisation is a technique in common use in many formal legal and specification documents I have worked with: if it's capitalised, it carries the defined meaning.
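
As a sketch of what two named sets might look like, assuming the in-line set is just space and tab and the wider set adds the newline (the names and the exact membership below are placeholders, not proposals):

```python
# Hypothetical named sets; the membership is an assumption for illustration.
INLINE_SEPARATOR = frozenset({" ", "\t"})
SEPARATOR = INLINE_SEPARATOR | frozenset({"\n"})


def is_inline_separator(ch: str) -> bool:
    """True for characters allowed as spacing within a single line."""
    return ch in INLINE_SEPARATOR


def is_separator(ch: str) -> bool:
    """True for in-line spacing characters and the newline."""
    return ch in SEPARATOR
```

Defining the wider set in terms of the narrower one keeps the two from drifting apart, whatever they end up being called.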
 
Cheers
Lex


Best Regards,

-Dan

--
Dan Allen, Vice President | OpenDevise Inc.
Pronouns: he, him, his
Content ∙ Strategy ∙ Community
opendevise.com
