Formatting (Pretty Printing)

A formatter can be implemented via the IFormatter (src) service. Technically speaking, a formatter is a Token Stream which inserts/removes/modifies hidden tokens (white space, line-breaks, comments).

The formatter is invoked during the serialization phase and when the user triggers formatting in the editor (for example, using the CTRL+SHIFT+F shortcut).

Xtext ships with two formatters:

A declarative formatter can be implemented by subclassing AbstractDeclarativeFormatter (src), as shown in the following example:

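import org.eclipse.xtext.Keyword;
import org.eclipse.xtext.formatting.impl.AbstractDeclarativeFormatter;
import org.eclipse.xtext.formatting.impl.FormattingConfig;
import org.eclipse.xtext.util.Pair;
// plus the generated ExampleLanguageGrammarAccess of your language

public class ExampleFormatter extends AbstractDeclarativeFormatter {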

  @Override
  protected void configureFormatting(FormattingConfig c) {
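    // getGrammarAccess() returns the generic IGrammarAccess, so cast to the generated type
    ExampleLanguageGrammarAccess f = (ExampleLanguageGrammarAccess) getGrammarAccess();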
    
    c.setAutoLinewrap(120);
    
    // find common keywords and specify formatting for them
    for (Pair<Keyword, Keyword> pair : f.findKeywordPairs("(", ")")) {
      c.setNoSpace().after(pair.getFirst());
      c.setNoSpace().before(pair.getSecond());
    }
    for (Keyword comma : f.findKeywords(",")) {
      c.setNoSpace().before(comma);
    }

    // formatting for grammar rule Line
    c.setLinewrap(2).after(f.getLineAccess().getSemicolonKeyword_1());
    c.setNoSpace().before(f.getLineAccess().getSemicolonKeyword_1());
    
    // formatting for grammar rule TestIndentation
    c.setIndentationIncrement().after(
        f.getTestIndentationAccess().getLeftCurlyBracketKeyword_1());
    c.setIndentationDecrement().before(
        f.getTestIndentationAccess().getRightCurlyBracketKeyword_3());
    c.setLinewrap().after(
        f.getTestIndentationAccess().getLeftCurlyBracketKeyword_1());
    c.setLinewrap().after(
        f.getTestIndentationAccess().getRightCurlyBracketKeyword_3());
    
    // formatting for grammar rule Param
    c.setNoLinewrap().around(f.getParamAccess().getColonKeyword_1());
    c.setNoSpace().around(f.getParamAccess().getColonKeyword_1());
    
    // formatting for comments
    c.setLinewrap(0, 1, 2).before(f.getSL_COMMENTRule());
    c.setLinewrap(0, 1, 2).before(f.getML_COMMENTRule());
    c.setLinewrap(0, 1, 1).after(f.getML_COMMENTRule());
  }
}

The formatter has to implement the method configureFormatting(...) which declaratively sets up a FormattingConfig (src).

The FormattingConfig (src) consists of general settings and a set of formatting instructions:

General FormattingConfig Settings

setAutoLinewrap(int) defines the number of characters after which a line-break should be dynamically inserted between two tokens. The instructions setNoLinewrap().???(), setNoSpace().???() and setSpace(space).???() suppress this behavior locally, where ??? stands for one of the matching criteria described below (before, after, and so on). The default is 80.
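
The effect of the limit can be pictured with a small standalone model. This is only a sketch and not Xtext's implementation: the class AutoLinewrapSketch and its format method are hypothetical names, tokens are plain strings, and the model assumes a break is inserted whenever appending the next token (plus its separating space) would exceed the limit.

```java
import java.util.List;

// Hypothetical toy model of automatic line wrapping; not part of Xtext.
final class AutoLinewrapSketch {

    /**
     * Joins tokens with single spaces and starts a new line whenever
     * appending the next token would make the current line longer than
     * maxLineLength characters.
     */
    static String format(List<String> tokens, int maxLineLength) {
        StringBuilder out = new StringBuilder();
        int lineLength = 0;
        boolean first = true;
        for (String token : tokens) {
            if (first) {
                // the very first token never needs a separator
                out.append(token);
                lineLength = token.length();
                first = false;
            } else if (lineLength + 1 + token.length() > maxLineLength) {
                // the token does not fit: break the line between the two tokens
                out.append('\n').append(token);
                lineLength = token.length();
            } else {
                // the token fits: separate it with the default single space
                out.append(' ').append(token);
                lineLength += 1 + token.length();
            }
        }
        return out.toString();
    }
}
```

In this model a limit of 7 places "aaa bbb" on one line and pushes a following "ccc" onto the next, while a limit of 80 keeps all three tokens together.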

FormattingConfig Instructions

By default, the declarative formatter inserts one white space between two tokens. Instructions can be used to specify a different behavior. They consist of two parts: when to apply the instruction and what to do.

To understand when an instruction is applied, think of a stream of tokens in which each token is associated with the corresponding grammar element. The instructions are matched against these grammar elements. The following matching criteria exist:

  • after(element): The instruction is applied after the grammar element has been matched. For example, if your grammar uses the keyword ";" to end lines, this can instruct the formatter to insert a line break after the semicolon.
  • before(element): The instruction is executed before the matched element. For example, if your grammar contains lists which separate their values with the keyword ",", you can instruct the formatter to suppress the white space before the comma.
  • around(element): This is the same as before(element) combined with after(element).
  • between(left, right): This matches if right directly follows left in the document. There may be no other tokens in between left and right.
  • bounds(left, right): This is the same as after(left) combined with before(right).
  • range(start, end): The rule is enabled when start is matched, and disabled when end is matched. Thereby, the rule is active for the complete region which is surrounded by start and end.
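
How before(element) and after(element) interact with the default single space can be sketched with a standalone model. It is a simplification under stated assumptions: the class NoSpaceSketch is hypothetical, and it matches on the token text, whereas the real formatter matches on grammar elements.

```java
import java.util.List;
import java.util.Set;

// Hypothetical toy model of "no space" instructions; not part of Xtext.
final class NoSpaceSketch {

    /**
     * Writes one space between two tokens unless a "no space" instruction
     * applies after the previous token or before the current one.
     */
    static String format(List<String> tokens,
                         Set<String> noSpaceBefore,
                         Set<String> noSpaceAfter) {
        StringBuilder out = new StringBuilder();
        String previous = null;
        for (String token : tokens) {
            if (previous != null) {
                boolean suppressed = noSpaceAfter.contains(previous)
                        || noSpaceBefore.contains(token);
                if (!suppressed) {
                    out.append(' '); // default behavior: a single white space
                }
            }
            out.append(token);
            previous = token;
        }
        return out.toString();
    }
}
```

With "no space after" registered for "(" and "no space before" registered for ")", "," and ";", the token stream f ( a , b ) ; is written as f (a, b); in this model.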

The term tokens is used slightly differently here than in the parser/lexer. Here, a token is a keyword or the string that is matched by a terminal rule, data type rule or cross-reference. In the terminology of the lexer, a data type rule can match a composition of multiple tokens.

The parameter element can be a grammar's AbstractElement (src) or a grammar's AbstractRule (src). All grammar rules and almost all abstract elements can be matched. This includes rule calls, parser rules, groups and alternatives. The semantics of before(element), after(element), etc. for rule calls and parser rules correspond to the points at which the parser would "pass" this part of the grammar. The stack of called rules is taken into account. Formatting instructions cannot be assigned to the following abstract elements:

  • Actions. E.g. {MyAction} or {MyAction.myFeature=current}.
  • Grammar elements nested in data type rules. This is due to the fact that tokens matched by a data type rule are treated as atomic by the serializer. To format these tokens, please implement a ValueConverter.
  • Grammar elements nested in CrossReference (src).

After having explained how rules can be activated, this is what they can do:

  • setIndentationIncrement() increments indentation by one unit at this position. Whether one unit consists of one tab-character or spaces is defined by IIndentationInformation (src). The default implementation consults Eclipse's IPreferenceStore.
  • setIndentationDecrement() decrements indentation by one unit.
  • setLinewrap(): Inserts a line-wrap at this position.
  • setLinewrap(int count): Inserts count line-wraps at this position.
  • setLinewrap(int min, int def, int max): If the number of line-wraps present at this position before formatting can be determined (i.e. when a node model is present), that number is adjusted to lie within the interval [min, max] and is then reused. In all other cases def line-wraps are inserted. Example: setLinewrap(0, 0, 1) will preserve existing line-wraps, but won't allow more than one line-wrap between two tokens.
  • setNoLinewrap(): Suppresses automatic line wrap, which may occur when the line's length exceeds the defined limit.
  • setSpace(String space): Inserts the string space at this position. If you use this to insert anything other than white space, tabs or newlines, a small puppy will die somewhere in this world.
  • setNoSpace(): Suppresses the white space between tokens at this position. Be aware that between some tokens a white space is required to maintain a valid concrete syntax.
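
The min/def/max rule of setLinewrap(int min, int def, int max) can be written down as a small standalone function. This is a sketch of the rule as described above, not code taken from Xtext; the class LinewrapSketch and the method resolve are hypothetical names, and an empty OptionalInt stands for the case where no node model is available.

```java
import java.util.OptionalInt;

// Hypothetical toy model of setLinewrap(min, def, max); not part of Xtext.
final class LinewrapSketch {

    /**
     * Returns the number of line-wraps to emit: the existing count clamped
     * to [min, max] if it is known, otherwise the default count.
     */
    static int resolve(OptionalInt existing, int min, int def, int max) {
        if (!existing.isPresent()) {
            return def; // nothing to preserve, fall back to the default
        }
        return Math.max(min, Math.min(max, existing.getAsInt()));
    }
}
```

For setLinewrap(0, 0, 1), three existing line-wraps are reduced to one, zero or one existing line-wraps are kept as they are, and text without a node model receives no line-wrap.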

Grammar Element Finders

Sometimes, if a grammar contains many similar elements for which the same formatting instructions ought to apply, it can be tedious to specify them for each grammar element individually. The IGrammarAccess (src) provides convenience methods for this. The find methods are available for the grammar and for each parser rule.

  • findKeywords(String... keywords) returns all keywords that equal one of the parameters.
  • findKeywordPairs(String leftKw, String rightKw) returns tuples of keywords from the same grammar rule. Pairs are matched nested and sequentially. Example: for Rule: '(' name=ID ('(' foo=ID ')') ')' | '(' bar=ID ')' the call findKeywordPairs("(", ")") returns three pairs.
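
What "nested and sequentially" means can be illustrated with a stack-based matcher over a flat list of keywords. This is a sketch under assumptions, not the algorithm used by Xtext: the class KeywordPairSketch is hypothetical, the keywords of a rule are taken in textual order, and alternatives are ignored.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model of nested/sequential pair matching; not part of Xtext.
final class KeywordPairSketch {

    /**
     * Returns {leftIndex, rightIndex} pairs. Each right keyword is paired
     * with the most recent left keyword that is still unmatched.
     */
    static List<int[]> findPairs(List<String> keywords, String left, String right) {
        Deque<Integer> open = new ArrayDeque<>();
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < keywords.size(); i++) {
            String keyword = keywords.get(i);
            if (keyword.equals(left)) {
                open.push(i); // remember the position of the opening keyword
            } else if (keyword.equals(right) && !open.isEmpty()) {
                pairs.add(new int[] { open.pop(), i }); // close the innermost open keyword
            }
        }
        return pairs;
    }
}
```

For the keywords of the example rule in textual order, ( ( ) ) ( ), the matcher yields three pairs: the inner pair at positions 1 and 2, the enclosing pair at positions 0 and 3, and the sequential pair at positions 4 and 5.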