March 2008


So, I considered the comments (thank you, all), and thought I’d have another go at making the ending ‘dot’ optional.

I decided to introduce another token, ‘GAP’, to denote an empty line. The scanner in its current state will most likely not handle empty lines that contain white space, and the code is starting to look a bit confused. Oh well…
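Here is a minimal sketch of how blank lines could become ‘GAP’ tokens, including lines that hold only white space. The module and function names (gap_sketch, line_tokens/1) are hypothetical, and this is not the code in erl_scan_ind.erl. It works on whole lines rather than real scanner tokens, uses the string functions from OTP 20 and later, and collapses a run of blank lines into a single ‘GAP’ so that the form rule sees one token.

```erlang
-module(gap_sketch).
-export([line_tokens/1]).

%% Hypothetical helper (not the code in erl_scan_ind.erl): split source
%% text into lines, turn every blank line into 'GAP' (a line holding only
%% spaces or tabs counts as blank), and keep other lines as {line, Text}.
line_tokens(Text) ->
    Lines = string:split(Text, "\n", all),
    collapse([classify(Line) || Line <- Lines]).

classify(Line) ->
    case string:is_empty(string:trim(Line)) of
        true -> 'GAP';
        false -> {line, Line}
    end.

%% A run of blank lines yields a single 'GAP', so the rule
%% form -> function 'GAP' sees exactly one token between functions.
collapse(['GAP', 'GAP' | Rest]) -> collapse(['GAP' | Rest]);
collapse([Token | Rest]) -> [Token | collapse(Rest)];
collapse([]) -> [].
```

For example, `gap_sketch:line_tokens("f(X) ->\n    X+2\n\n  \ng(X) ->")` yields a single ‘GAP’ between the two functions, even though the second blank line contains two spaces.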

The top-level rules for a function now become:


form -> function dot : '$1'.
form -> function 'GAP' : '$1'.

and the rules for alternative function clauses are as before:

function_clauses -> function_clause : ['$1'].
function_clauses ->
   function_clause ';' function_clauses : ['$1'|'$3'].
function_clauses -> 
  function_clause 'OUT' : ['$1'].
function_clauses ->
  function_clause 'END' function_clauses : ['$1'|'$3'].

The first two rules are the original rules for indentation-insensitive code. The last two handle the indentation tokens: clauses may be separated by ‘END’ (indentation back to zero) instead of ‘;’, and the list may be closed by ‘OUT’. The ‘OUT’ token is there for symmetry, matching the ‘IN’ token after the arrow in function_body. Remember that indentation tokens are normalized in the scanner.
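To see how the four function_clauses rules fit together, here is a hand-written analogue. The real parser is generated by yecc from erl_parse.yrl, so this is only an illustration; each already-parsed clause is represented by a hypothetical {clause, Term} item, and the function returns the clause list together with the remaining tokens.

```erlang
-module(clauses_sketch).
-export([function_clauses/1]).

%% Illustration only: a hand-written analogue of the four yecc rules for
%% function_clauses. Each already-parsed clause is a hypothetical
%% {clause, Term} item. Returns {Clauses, RemainingTokens}.
function_clauses([{clause, C}, Sep | Rest]) when Sep =:= ';'; Sep =:= 'END' ->
    %% function_clause ';' function_clauses
    %% function_clause 'END' function_clauses
    {Cs, Rest1} = function_clauses(Rest),
    {[C | Cs], Rest1};
function_clauses([{clause, C}, 'OUT' | Rest]) ->
    %% function_clause 'OUT'
    {[C], Rest};
function_clauses([{clause, C} | Rest]) ->
    %% function_clause
    {[C], Rest}.
```

With the input `[{clause,a},'END',{clause,b},'OUT','GAP']` the sketch returns `{[a,b],['GAP']}`, leaving the ‘GAP’ for the form rule to consume.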

The test program now looks like this:

-module(test).

-compile(export_all).
-scan(indentation).

f(X) ->
    X+2

g(X) ->
    X+4
.

h(X) ->
    Y = case X of
          a ->
            {a}
          b ->
            {b}
    end
    Y

i(a) -> a
i(b) -> b

test() ->
    2 = f(0),
    4 = f(2),
    4 = g(0),
    8 = g(4),
    {a} = h(a),
    {b} = h(b),
    a = i(a),
    b = i(b),
    ok.

A little bit better. The ‘end’ tokens are still needed, though. One thing at a time…


I was inspired by Chris Okasaki’s blog article about mandatory indentation. Not that indentation could be made mandatory in Erlang (it would break far too much code), but the idea of inserting indentation tokens into the token stream seemed simple enough that I at least had to try it.

I made a copy of erl_scan.erl (named erl_scan_ind.erl) and made it emit indentation tokens. Then I extended the Erlang grammar in erl_parse.yrl. All the old rules remain, but some new rules were added to account for the indentation tokens. For example:


clause_body -> '->' exprs : '$2'.

becomes:


clause_body -> '->' exprs : '$2'.
clause_body -> '->' 'IN' exprs 'OUT' : '$3'.

The indentation tokens I used were:

  • ‘IN’ for indent
  • ‘OUT’ for outdent (one for each matching indent)
  • ‘ALIGN’ for when the next line keeps the same indentation
  • ‘END’ when indentation goes back to zero
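One way to compute these tokens is to keep a stack of open indentation levels, similar to the INDENT/DEDENT stack in Python's tokenizer. The sketch below is an assumption about how such a scanner could work, not a description of erl_scan_ind.erl. It takes {Column, Text} pairs for the non-blank lines and emits ‘IN’ when a line is indented deeper, ‘ALIGN’ when it stays at the same non-zero level, one ‘OUT’ per closed level when it moves left, and ‘OUT’s followed by ‘END’ when it returns to column zero. It does not emit ‘ALIGN’ after an outdent, and it closes any levels still open at the end of input. The module name indent_sketch is hypothetical.

```erlang
-module(indent_sketch).
-export([tokens/1]).

%% Hypothetical layout pass (not the code in erl_scan_ind.erl).
%% Lines is a list of {Column, Text} pairs for the non-blank lines.
%% The stack holds the open indentation levels, innermost first;
%% its bottom element is always 0.
tokens(Lines) ->
    walk(Lines, [0], first).

walk([], Stack, _Pos) ->
    %% End of input: close every level that is still open.
    ['OUT' || Level <- Stack, Level > 0];
walk([{Col, Text} | Rest], Stack, Pos) ->
    {Layout, Stack1} = layout(Col, Stack, Pos),
    Layout ++ [{text, Text} | walk(Rest, Stack1, next)].

%% Nothing precedes the first line when it starts at column 0.
layout(0, [0], first) ->
    {[], [0]};
%% Deeper than the innermost open level: open a new one.
layout(Col, [Top | _] = Stack, _Pos) when Col > Top ->
    {['IN'], [Col | Stack]};
%% Back at column zero: close all open levels, then 'END'.
layout(0, Stack, _Pos) ->
    {Outs, [0]} = pop(0, Stack, []),
    {Outs ++ ['END'], [0]};
%% Same non-zero level as the innermost open one.
layout(Col, [Col | _] = Stack, _Pos) ->
    {['ALIGN'], Stack};
%% Moved left to a non-zero column: it must match an open level.
layout(Col, Stack, _Pos) ->
    case pop(Col, Stack, []) of
        {Outs, [Col | _] = Stack1} -> {Outs, Stack1};
        {_Outs, _Stack} -> erlang:error({inconsistent_indentation, Col})
    end.

%% Pop every level deeper than Col, collecting one 'OUT' per level.
pop(Col, [Top | Rest], Outs) when Top > Col ->
    pop(Col, Rest, ['OUT' | Outs]);
pop(_Col, Stack, Outs) ->
    {Outs, Stack}.
```

For the two-clause function i/1 (both lines at column 0) this gives `[{text,"i(a) -> a"},'END',{text,"i(b) -> b"}]`. A line that moves left to a column that was never opened raises an `inconsistent_indentation` error instead of guessing.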

So a sequence of expressions could be written without commas, based on the following rules:


exprs -> expr : ['$1'].
exprs -> expr ',' exprs : ['$1' | '$3'].
exprs -> expr 'ALIGN' exprs : ['$1' | '$3'].
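These three rules can be mirrored by a small recursive-descent function in which a comma and an ‘ALIGN’ token are interchangeable separators. It is an illustrative sketch only: expressions are assumed to be pre-parsed into hypothetical {expr, E} items, the module name is made up, and the real work is done by the yecc-generated parser.

```erlang
-module(exprs_sketch).
-export([exprs/1]).

%% Illustration of:
%%   exprs -> expr : ['$1'].
%%   exprs -> expr ',' exprs : ['$1' | '$3'].
%%   exprs -> expr 'ALIGN' exprs : ['$1' | '$3'].
%% Expressions are hypothetical {expr, E} items.
%% Returns {Exprs, RemainingTokens}.
exprs([{expr, E}, Sep | Rest]) when Sep =:= ','; Sep =:= 'ALIGN' ->
    {Es, Rest1} = exprs(Rest),
    {[E | Es], Rest1};
exprs([{expr, E} | Rest]) ->
    {[E], Rest}.
```

Since both separators lead to the same production, they can even be mixed within one body: `[{expr,a},',',{expr,b},'ALIGN',{expr,c},dot]` gives `{[a,b,c],[dot]}`.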

My test program, which I was eventually able to compile, looked like this:

-module(test).

-compile(export_all).
-scan(indentation).

f(X) ->
    X+2
.

g(X) ->
    X+4
.

h(X) ->
    Y = case X of
          a ->
            {a}
          b ->
            {b}
    end
    Y
.

(Note especially that the final ‘end’ must be aligned with the Y, rather than the ‘case’. Perhaps this could be avoided…?)

The ending dots don’t have to be on their own line. Getting rid of them altogether was too hard for me, since ‘dot’ is the end symbol of the Erlang grammar.

The -scan(indentation). attribute tells epp to switch to the indentation-sensitive scanner; -scan(normal). tells it to switch back to the normal scanner.
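A sketch of the mode switch, assuming the source has already been split into one string per form (epp does its own splitting, which is not shown). Each form is scanned with erl_scan:string/1 and parsed with erl_parse:parse_form/1 only to detect a -scan(Mode) attribute, which parses as an ordinary {attribute, Anno, scan, Mode} form; every other form is tagged with the mode in effect. The module name scan_switch_sketch is hypothetical, and in indentation mode the real system would hand the form to erl_scan_ind rather than just tag it.

```erlang
-module(scan_switch_sketch).
-export([modes/1]).

%% Hypothetical sketch: Forms is a list of source strings, one per form.
%% Tag each form with the scanner mode in effect; a -scan(Mode) attribute
%% switches the mode for the forms after it and is not tagged itself.
modes(Forms) ->
    modes(Forms, normal).

modes([], _Mode) ->
    [];
modes([Form | Rest], Mode) ->
    case scan_attribute(Form) of
        {ok, NewMode} -> modes(Rest, NewMode);
        none -> [{Mode, Form} | modes(Rest, Mode)]
    end.

%% Detect -scan(indentation). / -scan(normal). with the standard scanner
%% and parser; anything that fails to scan or parse is not an attribute.
scan_attribute(Form) ->
    case erl_scan:string(Form) of
        {ok, Tokens, _EndLocation} ->
            case erl_parse:parse_form(Tokens) of
                {ok, {attribute, _Anno, scan, Mode}}
                  when Mode =:= indentation; Mode =:= normal ->
                    {ok, Mode};
                _ ->
                    none
            end;
        _ ->
            none
    end.
```

Detecting the attribute with the standard parser works because the -scan(...) lines themselves are ordinary Erlang syntax, even when the forms around them are not.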

I soon realised that I had to normalize the indentation tokens at the end of the scan. A few oddities were introduced, like inserting an ‘OUT’ token before each dot (and corresponding additions to the grammar). But for the most part, the additions to the grammar seemed fairly logical. The parser seems to handle all the old code, even though I should perhaps try recompiling the whole OTP source tree before making such a claim.
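One possible normalization, sketched here under stated assumptions, is to close every level still open when a form ends: before each dot, emit one ‘OUT’ per unmatched ‘IN’. This is not necessarily the rule erl_scan_ind.erl applies (the post only says that an ‘OUT’ is inserted before each dot). Plain atoms stand in for real scanner tokens, which also carry line numbers, and the module name normalize_sketch is hypothetical.

```erlang
-module(normalize_sketch).
-export([close_before_dot/1]).

%% Hypothetical normalization pass: before every dot, emit one 'OUT' for
%% each 'IN' still open in the current form, so that every form reaches
%% the parser with balanced indentation tokens.
close_before_dot(Tokens) ->
    close_before_dot(Tokens, 0).

close_before_dot([], _Open) ->
    [];
close_before_dot([dot | Rest], Open) ->
    lists:duplicate(Open, 'OUT') ++ [dot | close_before_dot(Rest, 0)];
close_before_dot(['IN' | Rest], Open) ->
    ['IN' | close_before_dot(Rest, Open + 1)];
close_before_dot(['OUT' | Rest], Open) when Open > 0 ->
    ['OUT' | close_before_dot(Rest, Open - 1)];
close_before_dot([Token | Rest], Open) ->
    %% Any other token (including a stray 'OUT') passes through unchanged.
    [Token | close_before_dot(Rest, Open)].
```

For example, `[f, '->', 'IN', x, dot]` becomes `[f, '->', 'IN', x, 'OUT', dot]`, which matches the clause_body rule `'->' 'IN' exprs 'OUT'`.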

The code (based on OTP R12B-1) can be found at http://svn.ulf.wiger.net/indent/trunk

The grammar is still contaminated with some debug statements, which allowed me to print the productions as they were identified. They should of course be removed eventually.

I’m not convinced that this is really a good idea, but at least I had fun doing it.
