So, maybe I’m thick enough not to realise in advance when I’m barking up the wrong tree, or perhaps I just like to follow things through to the bitter end, just to know what exactly didn’t work…?

I’ve gone one more round with my indentation-sensitive Erlang scanner/parser. For a while, it looked like I was winning, but eventually, I had to admit defeat in the face of funs and record constructors.

Again, the approach I thought I’d take was to add indentation tokens to the normal scanner, and then add rule clauses to the normal grammar so that it understands both indentation-sensitive code and all the code you’re accustomed to writing. The normal grammar is LALR(1) (the parser is generated with yecc), which means that the additions have to be quite symmetrical in order to work.
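To make the idea of indentation tokens concrete, here is a minimal sketch of a layout pass. It is not the code from the repository: the module name `indent_sketch`, the function `layout/1` and the token names `indent`, `dedent` and `align` are all made up for illustration, and it works on plain source lines rather than on the output of the real scanner. It assumes spaces only (no tabs) and skips blank lines.

```erlang
-module(indent_sketch).
-export([layout/1]).

%% Hypothetical sketch: map a list of source lines to layout tokens.
%% {indent, L}  line L is indented deeper than the enclosing block
%% {dedent, L}  a block was closed before line L
%% {align, L}   line L starts at the same column as the current block
layout(Lines) ->
    layout(Lines, 1, [0], []).

layout([], N, Stack, Acc) ->
    %% Close every block that is still open at end of input.
    Dedents = [{dedent, N} || _ <- tl(Stack)],
    lists:reverse(Acc, Dedents);
layout([Line | Rest], N, Stack, Acc) ->
    case is_blank(Line) of
        true ->
            layout(Rest, N + 1, Stack, Acc);
        false ->
            Col = leading_spaces(Line),
            {Toks, Stack1} = compare(Col, N, Stack),
            layout(Rest, N + 1, Stack1, lists:reverse(Toks, Acc))
    end.

%% Compare the column of a line with the stack of open block columns.
compare(Col, N, [Top | _] = Stack) when Col > Top ->
    {[{indent, N}], [Col | Stack]};
compare(Col, N, [Col | _] = Stack) ->
    {[{align, N}], Stack};
compare(Col, N, [_ | Stack]) ->
    %% Col is smaller than the innermost block column: pop and retry.
    {Toks, Stack1} = compare(Col, N, Stack),
    {[{dedent, N} | Toks], Stack1}.

is_blank(Line) ->
    lists:all(fun(C) -> C =:= $\s end, Line).

leading_spaces(Line) ->
    length(lists:takewhile(fun(C) -> C =:= $\s end, Line)).
```

In a scheme like this, the grammar would presumably treat `align` roughly like a separator and `indent`/`dedent` like block delimiters, next to the ordinary commas, semicolons and `end` keywords. Having both forms accepted side by side is where the symmetry requirement starts to bite.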

This seemed to work in just about all the places that mattered, but I was stumped by funs and record constructors. The main problem with funs was that you could no longer write them in the conventional way (failing one of my preconditions), and record constructors usually cannot be written without including a few commas in a way that was …uhm, less than obvious.

I include the test module, which pretty much illustrates what works, and what doesn’t. At this point, I doubt that I can achieve better than a 90% solution, which is probably just good enough to eventually drive people crazy.

It’s been fun, though. I have no particular craving for indentation-sensitive syntax myself, so I thought I’d tackle this as a learning experience. Not for the first time, I feel like concluding that retrofitting concepts onto existing programming languages is usually very difficult to do well.

The code is at http://svn.ulf.wiger.net/indent/branches/0.3

-module(test).

-record(r,{a,b}).

-compile(export_all).
-scan(indentation).

f(X) ->
    X+2


g(X) ->
    X+4
.          % dot can either be 'outdented' or
           % terminating last line

g1(X) ->
    X+4.


h(X) ->
    Y = case X of
          a ->
            {a}
          b ->
            {b}
    end          % end is optional
    Y

h1(X) ->
    Y = case X of
          a ->
            {a}
          b ->
            {b}
    Y

h2(X) ->
    case X of
        a ->
            1
        b ->
            2

h3(X) ->
    case X of
        a -> 1;  % must use semicolon here :-(
        b -> 2
             

i(a) -> a
i(b) -> b


j(A,B) ->
    {A
    B}

j1(A,B,C) ->
    {A
     B     % indents must be greater than 1;
     C}    % else they count as aligned

k() ->
    S = "a string "
        "spanning multiple "
        "lines"
    {S}


l0() ->
    fun(1) -> 1; (2) -> 2 end.

l1() ->
    fun            % must break line here and indent.  :-(
      (1) ->
            1
      (2) ->
            2

%%% This, alas, doesn't work :-(
%%%
%%% l2() ->
%%%     fun(1) ->
%%%             1;
%%%        (2) ->
%%%             2
%%%     end

m() ->
    #r{a = 1, b = 2}.

m1() ->
    R = #r{a = 1}
    R#r.a

%%% indentation syntax works poorly for record assignment.
%%% Both commas are needed :-(
m2() ->
    R = #r{a = 1,
           b = 2},
    R

test() ->
    2 = f(0),
    4 = f(2),
    4 = g(0),
    4 = g1(0),
    8 = g(4),
    {a} = h(a),
    {b} = h(b),
    {a} = h1(a),
    {b} = h1(b),
    1 = h2(a),
    2 = h2(b),
    1 = h3(a),
    2 = h3(b),
    a = i(a),
    b = i(b),
    {a,b} = j(a,b),
    {a,b,c} = j1(a,b,c),
    {"a string spanning multiple lines"} = k(),
    F0 = l0(), 1 = F0(1), 2 = F0(2),
    F1 = l1(), 1 = F1(1), 2 = F1(2),
    {r,1,2} = m(),
    1 = m1(),
    ok.