Extending the Erlang shell (part 2)

In my previous post, I mentioned the difference between forms and expressions in Erlang. This article will show how to handle macros and module declarations from within the Erlang shell.

Bypassing the expression parser

The Erlang shell fetches Erlang expressions from the line editor by calling io:parse_erl_exprs(Prompt). The first thing we need to do is to change this, so that we get a lower-level representation. Luckily, io:scan_erl_exprs(Prompt) exists. It allows us to preempt the expression parser, and potentially do something different.

I chose to use the ‘@’ character to break out of the expression parser. For one thing, it stands out, and clearly doesn’t belong to any Erlang expression. The token scanner expects a full stop to follow, so the shortest example of this new mode would be:


1> @.
lists:reverse([1,2,3]).
@
[3,2,1]
2>

When a line starting with ‘@’ is detected, the user is given an empty line, for further input. Data is then collected until we enter another ‘@’ on an empty line.

By default, we’re still parsing expressions, and the shell will treat this just as it would treat a normal multi-line shell command.

So far, we haven’t gained much, except perhaps freedom from the repeating prompt. We can still use e.g. v(1) and e(1) to fetch the value of this command, or to repeat it.

Defining a module from the shell

Now for something truly useful. Let’s use an example from Joe Armstrong’s book, Programming Erlang


2> @ type(module).
-module(geometry).
-export([area/1]).
area({rectangle, Width, Ht}) -> Width * Ht;
area({circle, R})            -> 3.14159 * R * R.
@
{module,geometry}
3>
 geometry:area({rectangle, 10, 5}).
50

We had to add a type modifier to the command, in order to let it know that we intended to enter forms, rather than expressions. The shell collects the input, and then compiles and loads the module. There are three types: module, exprs (default), and def. The def type is for teaching the shell to remember record and macro definitions.

Now for a slightly more advanced example. Mats Cronqvist won the Obfuscated Erlang Competition 2007 with this little beauty:


4> @ type(module).
-module(xXx).
-author('mats cronqvist').
-export([x/1]).

-define(x,x(.
-define(Xx,X.
-define(X,X)

->. ?x?X?Xx.
@

{module,xXx}

All well and good, but what does it actually translate to? We can find out, by adding another modifier: return(pretty). While we’re at it, we will use shorthand notation for referring to the previous command, so we don’t have to type it in again:


5> @(4), return(pretty).
"-file(\"shell_tmp.erl\", 1).\n-module(xXx).\n-author('mats cronqvist').\n-export([x/1]).\nx(X) ->\n X.\n"
6>
io:format("~s~n", [v(5)]).
-file("shell_tmp.erl", 1).
-module(xXx).
-author('mats cronqvist').
-export([x/1]).
x(X) ->
    X.

ok

There are three return types: result (default), parsed, and pretty. The parsed return type is primarily for those who work with parse transform and code generation tools. The way to test what parsed forms correspond to a given code snippet today, is to write a test module, compile it with debug info, and then use beam_lib to extract the code. Most likely, the shell will truncate the output, so you will then have to explicitly print it again to see it all.

Supporting other parsers

Since we parameterized the parsing step already, it is a small step to allow for other parsers as well. The mod(Mod) modifier allows us to call another parser than the standard Erlang parser. The shell will call Mod:shell_parse(Text, Bindings, Env), and it expects the following behaviour:


Mod:shell_parse(Text::string(), Bindings, Env) ->
    {erl_eval, Exprs} |
    {erl_forms, Forms} |
    {erl_defines, Forms, Macros} |
    {result, Result, NewBindings} |
    {error, Reason}

Here’s an example using scheme syntax. It’s a simple 8-line addition to Luke Gorrie’s Lersp code.


Eshell V5.5.4 (abort with ^G)
1>
@ mod(lersp).
(+ 3 5)
@
8

Back to expressions: Variable scoping

The default mode is exprs, which means expression parsing, just like the shell normally does. However, all variables used while in “extended mode” are local:


2> X = 17.
17
3>
@.
X.
@
** exited: {1,erl_lint,{unbound_var,'X'}} **

(The error messages could be cleaned up a bit, but hey – this is a prototype.)

When re-running expressions in the shell, this can actually be an advantage, as we can refrain from having to forget and re-bind variables each time.

If we really do want to use existing variables, we can, through the import(V1,...) modifier. Note that we’re being a bit un-Erlangish here. The modifier doesn’t have a fixed number of arguments. Example:


4> @ import(X).
X.
@
17

Instead of importing variables, we can declare them on the command line:


6> @ X = 3, Y = 5.
X + Y.
@
8

When recalling an extended command, we can change some of the declarations:


7> @(6), Y = 7.
10

We can also export variables bound within the expression list:


8> @(7), export(Y).
10
9>
Y.
7

Defining macros in the shell

In part 1, I showed how records can be defined in the shell. We have now added the ability to define and use macros with local scope within an extended shell command. What if I want to tell the shell to remember macros just like it does records? QuickCheck, for example, uses macros extensively.


10> mr(os:getenv("EQC") ++ "/include/eqc.hrl").
['DELAY',
 'FORALL',
 'FORCE',
 'IMPLIES',
 'LAZY',
 'LET',
 'SHRINK',
 'SHRINKWHILE',
 'SIZED',
 'SUCHTHAT',
 'SUCHTHATMAYBE',
 'WHENFAIL']
11>
@ type(module), report, export_all.
-module(x).
-import(eqc_gen,[int/0,list/1]).
prop_lists_reverse() ->
  ?FORALL(L, list(int()),
    ?FORALL(M, list(int()),
      lists:reverse(L++M) ==
        lists:reverse(L) ++ lists:reverse(M))).
@
{module,x}
12>
eqc:quickcheck(x:prop_lists_reverse()).
....Failed! After 5 tests.
[0]
[1]
false

One may note that we didn’t have an -include() line in our code. The pre-processor found the macro definitions anyway, thanks to a small hack in epp. The changes I made to epp.erl were (a) to allow epp to process an already open file, and (b) to accept an anonymous function that resolves macros not defined in the module source.

We also used a new shell command, mr(Hrl), to read a source file and extract macro definitions, similarly to what we could already do with records. We cannot easily provide a function, md(Name, Tokens) to define a macro in the shell, since the shell parses the command as an expression (remember, macros are not expressions.)

Instead, we can use the type(def) modifier, and define the macro in an extended command:


13> @ type(def).
-define(HELLO, "hello world").
-record(r, {a,b}).
@
[{records,[r]},{macros,['HELLO']}]
14>
@.
#r{a = ?HELLO}.
@
#r{a = "hello world",b = undefined}

As you can see, records can also be defined this way, which is convenient, since it lets us copy and paste normal record definitions into the shell, without having to rewrite them.

Finally

I don’t work for the OTP team, so these are just my private hacks. One can of course imagine plenty of improvements.

The code is available here:
svn co http://svn.ulf.wiger.net/ext_shell/trunk

Extending the Erlang shell (part 1)

I started playing around with different ways of extending the Erlang shell. The reasons, as if a reason were needed to start hacking around, were as follows:

  • When writing his book, Joe Armstrong received a lot of frustrated comments about why it wasn’t possible to declare a module directly in the Erlang shell
  • Having watched Richard Carlsson’s presentation on the array module at EUC 07, I thought I’d try to implement a way to clean up some of the output from the shell.
  • John Hughes commented that it would be really nice (for QuickCheck users) to be able to use macros in the shell.

Understanding the shell

The Erlang shell is line-oriented and expression-oriented. That last bit may warrant some explanation.

Erlang has two levels of grammar: forms and expressions. A form is a module-level construct, and is either a module attribute (e.g. ‘-module(m).’, ‘-record(r, {a,b}).’, etc.) or a function declaration. Expressions are one level underneath: the arguments of a function, and the function body, consist of expressions.

Macros are neither forms nor expressions. They are pre-processor directives, and are understood only by the Erlang pre-processor, epp. Once the form or expression reaches the parser, macros have already been replaced by their corresponding tokens. A macro definition does not have to be syntactically complete on its own, so we may not even be able to trick the parser into understanding it.

This all makes it a bit tricky to support record- and macro definitions in the shell. Recall that neither are expressions, and the shell only deals with expressions.

Still, the shell (sort of) understands records already. It provides a set of built-in functions for importing, defining and listing records:


rd(R,D) -- define a record
rf() -- remove all record information
rf(R) -- remove record information about R
rl() -- display all record information
rl(R) -- display record information about R
rp(Term) -- display Term using the shell's record information
rr(File) -- read record information from File (wildcards allowed)
rr(F,R) -- read selected record information from file(s)
rr(F,R,O) -- read selected record information with options

A quirk that can be particularly annoying at times is that, when inserting a multi-line command, the shell repeats the prompt at each continuation line:


3> F = fun(X) ->
3>           X + 2
3>     end.
#Fun<erl_eval .6.72228031>

This makes it difficult to copy a command from the shell, and reuse it later.

Extended pretty-printing

The Erlang pretty-printer uses a few tricks to try to make the output a bit easier on the eye. It tries to print lists of bytes as text strings, and, if records have been defined in the shell, formats records using record syntax. It also tries to indent tagged tuples for better readability.


5> {tag, "element 2 of the tuple", "element 3 of the tuple", "element 4 of the tuple"}.
{tag,"element 2 of the tuple",
     "element 3 of the tuple",
     "element 4 of the tuple"}
6>
rd(r, {a,b}).
r
7> #r{a = 17, b = [x]}.
#r{a = 17,b = [x]}

This is done using an undocumented function called io_lib_pretty:print(Term, Column, LineLength, Depth, RecDefFun). The fun RecDefFun is called when the pretty-printer finds a tagged record, in order to determine whether it is a known record.

My extension

I made the function io_lib_pretty:print/5 a bit more generic, so that it could be provided a function that recognizes any type of data structure and describes a compact representation. I also extended the shell with functions to load such funs (output filters):


fa(Tag, Fun) - registers a filter fun
fl() - lists filter funs
fr(Tag) - removes a filter fun

To illustrate, let’s look at the dict module. It is a fine module, implementing a functional hash table with O(1) access. However, when pretty-printed, a dict data structure looks pretty ugly:


1> dict:from_list([{N,a} || N < - lists:seq(1,5)]). {dict,5,
     16,
     16,
     8,
     80,
     48,
     {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
     {{[],
       [[3|a]],
       [],
       [],
       [],
       [],
       [[2|a]],
       [[5|a]],
       [],
       [],
       [],
       [[1|a]],
       [[4|a]],
       [],
       [],
       []}}}

An output filter that recognizes a dict structure would look like this:


fun(#dict{}=D) ->
       {custom,dict,true,dict:dict_to_list(D)};
   (_) ->
       no
end

And using the new shell functionality, we would see this:


Eshell V5.5.4 (abort with ^G)
1>
rr(code:which(dict)).
[dict]
2>
fa(dict,fun(#dict{}=D) -> {custom,dict,true,dict:dict_to_list(D)}; (_) -> no end).
true
3>
dict:from_list([{N,a} || N <- lists:seq(1,5)]).
<|dict:[{3,a},{2,a},{5,a},{1,a},{4,a}]|>

The shell will still apply the record pretty printing filter, if none of the user-provided filters match, but the user-provided filters are tried first, in alphabetical order (by tag). The following filter might be used to clean up the output from xmerl:


4> rr(code:which(xmerl)).
[xmerl_event,
...,
xmlText]
7>
fa(xml,fun(#xmlElement{}=E) -> {custom,xml,true,xmerl_lib:simplify_element(E)}; (_) -> no end).
true
8>
xmerl_scan:file("/home/uwiger/dev/flex2/samples/restaurant/build.xml").
{<|xml:{project,[{name,"restaurant"},{default,"war"},{basedir,"."}],
                ["\n\t\t\n\t",
                 {target,[{name,"compile"}],
                         ["\n\t\t",
                          {javac,[{srcdir,"WEB-INF/src"},
                                  {destdir,"WEB-INF/classes"},
                                  {includes,"**/*.java"}],
                                 []},
                          "\n\t"]},
                 "\n\t\n\t",
                 {target,[{name,"war"},
                          {depends,"compile"},
                          {description,"Creates a WAR for running the web services"}],
                         ["\n\t\t",
                          {jar,[{destfile,"restaurant.war"},
                                {basedir,"${basedir}"},
                                {includes,"WEB-INF/**, recentReviews.jsp"}],
                               []},
                          "\n\t"]},
                 "\n\n"]}|>,
 []}

For reference, the build.xml file looks like this:


<project name="restaurant" default="war" basedir=".">
  <target name="compile">
    <javac srcdir="WEB-INF/src" destdir="WEB-INF/classes" includes="**/*.java"/>
  </target>
  <target name="war" depends="compile" description="Creates a WAR for running the web services">
    <jar destfile="restaurant.war" basedir="${basedir}" includes="WEB-INF/**, recentReviews.jsp"/>
  </target>
</project>

Next article will describe how to enter module declarations, macros and alternative syntax in the Erlang shell: