Indentation-sensitive Erlang

I was inspired by Chris Okasaki’s blog article about mandatory indentation. Not that indentation could be made mandatory in Erlang, since it would break far too much code, but the idea of inserting indentation tokens into the token stream seemed simple enough that I had to at least try it.

I made a copy of erl_scan.erl (named erl_scan_ind.erl) and made it figure out indentation tokens. Then I added to the Erlang grammar in erl_parse.yrl. All the old rules remain, but some new rules were added to account for indentation tokens. For example:


clause_body -> '->' exprs : '$2'.

becomes:


clause_body -> '->' exprs : '$2'.
clause_body -> '->' 'IN' exprs 'OUT' : '$3'.

The indentation tokens I used were:

  • ‘IN’ for indent
  • ‘OUT’ for outdent (one for each matching indent)
  • ‘ALIGN’ for when the next line keeps the same indentation
  • ‘END’ when indentation goes back to zero

So a sequence of expressions could be written without commas, thanks to the last of the following rules:


exprs -> expr : ['$1'].
exprs -> expr ',' exprs : ['$1' | '$3'].
exprs -> expr 'ALIGN' exprs : ['$1' | '$3'].
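
To make the token scheme a bit more concrete, here is a minimal sketch of how the four tokens could be derived from the indentation widths of the lines in one form. It is not the code from erl_scan_ind.erl: the module name indent_sketch and the function indent_tokens/1 are made up for illustration, and two behaviours are assumptions on my part, namely that a return to column zero yields a single 'END' instead of a run of 'OUT' tokens, and that a partial outdent yields only 'OUT' tokens with no trailing 'ALIGN'.

```erlang
-module(indent_sketch).
-export([indent_tokens/1]).

%% Hypothetical helper, not taken from erl_scan_ind.erl.
%% Input:  the indentation width (leading spaces) of each non-blank
%%         line of a single form, in order.
%% Output: the indentation tokens emitted in front of each line after
%%         the first one.
indent_tokens([]) ->
    [];
indent_tokens([First | Rest]) ->
    walk(Rest, [First]).

%% The second argument is a stack of open indentation widths,
%% innermost first.
walk([], _Stack) ->
    [];
walk([0 | T], [Top | _]) when Top > 0 ->
    %% Assumption: back to column zero gives one 'END' and resets the stack.
    ['END' | walk(T, [0])];
walk([W | T], [Top | _] = Stack) when W > Top ->
    %% Deeper than the current level: open a new block.
    ['IN' | walk(T, [W | Stack])];
walk([W | T], [W | _] = Stack) ->
    %% Same level as the previous line.
    ['ALIGN' | walk(T, Stack)];
walk([W | T], Stack) ->
    %% Shallower: one 'OUT' for every open level deeper than W.
    {Outs, Stack1} = pop(W, Stack, []),
    Outs ++ walk(T, Stack1).

pop(W, [Top | Rest], Acc) when Top > W ->
    pop(W, Rest, ['OUT' | Acc]);
pop(_W, Stack, Acc) ->
    {Acc, Stack}.
```

For a body indented by four spaces followed by a dot in column zero, indent_sketch:indent_tokens([0, 4, 0]) returns ['IN', 'END']. The sketch does no validation, so an outdent to a width that was never opened is silently accepted.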

My test program, which I was eventually able to compile, looked like this:

-module(test).

-compile(export_all).
-scan(indentation).

f(X) ->
    X+2
.

g(X) ->
    X+4
.

h(X) ->
    Y = case X of
          a ->
            {a}
          b ->
            {b}
    end
    Y
.

(Note especially that the final ‘end’ must be aligned with the Y, rather than the ‘case’. Perhaps this could be avoided…?)

The ending dots don’t have to be on their own line. Getting rid of them was too hard for me, since ‘dot’ is the end token for the Erlang grammar.

The -scan(indentation). attribute tells epp to switch to the indentation-sensitive scanner. -scan(normal). tells it to switch to the normal scanner.
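
The switching can be pictured as a mode that is carried along from one form to the next. The following is only a sketch of that idea and not how epp is actually patched: the module scan_mode_sketch and the function tag_modes/1 are hypothetical, and the forms are assumed to be in the usual {attribute, Line, Name, Value} shape.

```erlang
-module(scan_mode_sketch).
-export([tag_modes/1]).

%% Hypothetical illustration only; epp's real internals differ.
%% Pairs each form with the scanner mode in effect after that form
%% has been read, starting from the normal scanner.
tag_modes(Forms) ->
    tag_modes(Forms, normal).

tag_modes([{attribute, _Line, scan, Mode} = Form | Rest], _Old)
  when Mode =:= indentation; Mode =:= normal ->
    %% A -scan(...) attribute changes the mode for the forms after it.
    [{Form, Mode} | tag_modes(Rest, Mode)];
tag_modes([Form | Rest], Mode) ->
    %% Any other form leaves the mode untouched.
    [{Form, Mode} | tag_modes(Rest, Mode)];
tag_modes([], _Mode) ->
    [].
```

The point of the sketch is simply that the attribute affects the forms that follow it, so a file can mix both styles.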

I soon realised that I had to normalize the indentation tokens at the end of the scan. A few oddities were introduced, like inserting an ‘OUT’ token before each dot (with corresponding additions to the grammar). But for the most part, the additions to the grammar seemed fairly logical. The parser seems to handle all the old code, although I should probably try recompiling the whole OTP source tree before making such a claim.

The code (based on OTP R12B-1) can be found at http://svn.ulf.wiger.net/indent/trunk

The grammar is still contaminated with some debug statements, which allowed me to print the productions as they were identified. They should of course be removed eventually.

I’m not convinced that this is really a good idea, but at least I had fun doing it.

10 thoughts on “Indentation-sensitive Erlang”

  1. Interesting hack! I loathe indentation-based syntax (after all, the tab is an invisible character, not a wise denotation for important structure), but it’s really cool that you were able to do this without a complete rewrite of the parser. Is it possible that we could use a similar trick to add other useful features to Erlang, like a syntax-level records fix, or literal syntax like { name: “Tom”, age: 26 } for proplists?

  2. Reads a lot like Haskell now… At least with Emacs as a crutch, Haskell’s indentation rules do help with the visual detection of errors.

  3. Cool experiment!

    “(Note especially that the final ‘end’ must be aligned with the Y, rather than the ‘case’. Perhaps this could be avoided…?)”

    A fairly typical approach would be to use the indentation to allow you to omit the ‘end’ altogether.

  4. “The ending dots don’t have to be on their own line. Getting rid of them was too hard for me, since ‘dot’ is the end token for the Erlang grammar.”

    but it seems so easy; you’d need to generate an END token when indentation goes to zero, or @ eof?
    what am i missing?

  5. I did try to omit the ‘end’, but was unsuccessful in my first attempt. Perhaps there is a clever way to formulate the grammar so that it works, but I have yet to spot it. I welcome any help in the matter.

    The problem with getting rid of the ‘dot’ was that the erlang compiler scans one form at a time, and the form ends with a dot. I guess the scanner could signal end-of-form at either a ‘dot’ or an ‘END’, whichever comes first…

  6. Oh, yes, here was another problem:
    Alternative function clauses are separated by ‘;’, which with indentation could be replaced by an ‘END’. But if ‘END’ also terminates the last function clause, we get a syntax error, unless we change the grammar in much the same way as we would if we wanted every function clause to end with a ‘dot’.

    So what’s the solution? Perhaps we should have a token denoting an empty line? Then, successive function clauses could be separated by newlines (without ‘;’), and the end of the function is marked by an empty line?

  7. @thomas lackner:
    Using tab char should definitely be avoided. Just use normal spaces for indentations.

  8. Python’s required indentation has some value for the final code, and does avoid the need for end-brackets. I have found it both irritating and dangerous in refactoring. It is all too easy to move some code and have it accidentally be syntactically correct when it should have been indented or outdented. Now the code is pretty and wrong. I prefer pretty printers.

  9. @Orielt – Could you expand on what it is that doesn’t work? I just checked it with Safari/Windows, and it wasn’t immediately obvious to me what would be wrong (posting this comment from Safari, btw).
