I was inspired by Chris Okasaki’s blog article about mandatory indentation. Not that indentation could be made mandatory in Erlang – it would break way too much code – but the idea of inserting indentation tokens into the token stream seemed simple enough that I at least had to try it.

I made a copy of erl_scan.erl (named erl_scan_ind.erl) and modified it to emit indentation tokens. Then I extended the Erlang grammar in erl_parse.yrl. All the old rules remain, but some new rules were added to account for the indentation tokens. For example:


clause_body -> '->' exprs : '$2'.

becomes:


clause_body -> '->' exprs : '$2'.
clause_body -> '->' 'IN' exprs 'OUT' : '$3'.

The indentation tokens I used were:

  • ‘IN’ for indent
  • ‘OUT’ for outdent (one for each matching indent)
  • ‘ALIGN’ for when the next line keeps the same indentation
  • ‘END’ when indentation goes back to zero
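
To make the four tokens concrete, here is a minimal sketch of how they could be derived from the indentation width of each non-blank line. This is not the code in erl_scan_ind.erl: the module name indent_sketch, the function indent_tokens/1 and the idea of working on a plain list of column widths are all assumptions made for illustration. It also assumes that an outdent to a non-zero level emits only ‘OUT’ tokens, with no ‘ALIGN’ after them, and that consecutive lines at column zero emit nothing.

```erlang
-module(indent_sketch).
-export([indent_tokens/1]).

%% Widths is the indentation (in columns) of each successive non-blank line.
%% Returns the indentation tokens, in order, that would be inserted
%% between those lines. A stack of open indentation levels is kept,
%% with 0 always at the bottom.
indent_tokens(Widths) ->
    lists:reverse(walk(Widths, [0], [])).

walk([], _Stack, Acc) ->
    Acc;
%% Deeper than the current level: open a new level.
walk([W | Ws], [Top | _] = Stack, Acc) when W > Top ->
    walk(Ws, [W | Stack], ['IN' | Acc]);
%% Column zero while already at column zero: nothing to emit (assumption).
walk([0 | Ws], [0], Acc) ->
    walk(Ws, [0], Acc);
%% Same level as the previous line.
walk([W | Ws], [W | _] = Stack, Acc) ->
    walk(Ws, Stack, ['ALIGN' | Acc]);
%% Back to column zero from any depth: a single 'END'.
walk([0 | Ws], _Stack, Acc) ->
    walk(Ws, [0], ['END' | Acc]);
%% Shallower, but not zero: one 'OUT' per level that is closed.
walk([W | Ws], Stack, Acc) ->
    {Stack1, Acc1} = outdent(W, Stack, Acc),
    walk(Ws, Stack1, Acc1).

outdent(W, [Top | Rest], Acc) when W < Top ->
    outdent(W, Rest, ['OUT' | Acc]);
outdent(W, [W | _] = Stack, Acc) ->
    {Stack, Acc};
outdent(W, Stack, Acc) ->
    %% Outdent to a column that was never opened: treat it as a new level.
    {[W | Stack], Acc}.
```

For a function whose head is at column 0, whose body is at column 4 and whose dot is back at column 0, indent_sketch:indent_tokens([0, 4, 0]) returns ['IN', 'END'].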

So a sequence of expressions could be written without commas, based on the following rule:


exprs -> expr : ['$1'].
exprs -> expr ',' exprs : ['$1' | '$3'].
exprs -> expr 'ALIGN' exprs : ['$1' | '$3'].

My test program, which I was eventually able to compile, looked like this:

-module(test).

-compile(export_all).
-scan(indentation).

f(X) ->
    X+2
.

g(X) ->
    X+4
.

h(X) ->
    Y = case X of
          a ->
            {a}
          b ->
            {b}
    end
    Y
.

(Note especially that the final ‘end’ must be aligned with the Y, rather than the ‘case’. Perhaps this could be avoided…?)

The ending dots don’t have to be on their own line. Getting rid of them was too hard for me, since ‘dot’ is the end token for the Erlang grammar.

The -scan(indentation). attribute tells epp to switch to the indentation-sensitive scanner. -scan(normal). tells it to switch to the normal scanner.

I soon realised that I had to normalize the indentation tokens at the end of the scan. A few oddities were introduced, like inserting an ‘OUT’ token before each dot (and corresponding additions to the grammar). But for the most part, the additions to the grammar seemed fairly logical. The parser seems to handle all the old code, although I should probably recompile the whole OTP source tree before making such a claim.
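
As a rough illustration of what such a normalization pass might look like, the sketch below closes every indentation level that is still open when a dot is reached, and drops ‘END’ tokens. It is a guess at the general shape, not the actual pass in erl_scan_ind.erl: the module name normalize_sketch, the function normalize/1, and the representation of tokens as bare atoms (real erl_scan tokens are tuples carrying position information) are all assumptions.

```erlang
-module(normalize_sketch).
-export([normalize/1]).

%% Tokens is a simplified token list in which indentation tokens are the
%% atoms 'IN', 'OUT' and 'END', and the form terminator is the atom dot.
%% Every other element is passed through untouched.
normalize(Tokens) ->
    norm(Tokens, 0, []).

%% Depth counts the 'IN' tokens that have not yet been matched by an 'OUT'.
norm([], _Depth, Acc) ->
    lists:reverse(Acc);
norm(['IN' | Ts], Depth, Acc) ->
    norm(Ts, Depth + 1, ['IN' | Acc]);
norm(['OUT' | Ts], Depth, Acc) ->
    norm(Ts, max(Depth - 1, 0), ['OUT' | Acc]);
norm(['END' | Ts], Depth, Acc) ->
    %% Dropped here (assumption): the dot takes care of closing levels.
    norm(Ts, Depth, Acc);
norm([dot | Ts], Depth, Acc) ->
    %% Emit one 'OUT' per open level, then the dot itself.
    Outs = lists:duplicate(Depth, 'OUT'),
    norm(Ts, 0, [dot | Outs] ++ Acc);
norm([T | Ts], Depth, Acc) ->
    norm(Ts, Depth, [T | Acc]).
```

With this sketch, a dot on its own line and a dot at the end of the last expression normalize to the same stream: both normalize_sketch:normalize(['IN', x, 'END', dot]) and normalize_sketch:normalize(['IN', x, dot, 'END']) return ['IN', x, 'OUT', dot], which is the shape a rule like clause_body -> '->' 'IN' exprs 'OUT' expects.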

The code (based on OTP R12B-1) can be found at http://svn.ulf.wiger.net/indent/trunk

The grammar is still contaminated with some debug statements, which allowed me to print the productions as they were identified. They should of course be removed eventually.

I’m not convinced that this is really a good idea, but at least I had fun doing it.