John Hughes: Testing with QuickCheck

From a Functional Programming seminar at Ericsson, 21 February 2008

(See The full article for more details).

John Hughes
“Testing with QuickCheck” (Download) (Slides)

What is Erlang-Style Concurrency?

Every now and then, a blog article appears on how to do “Erlang-Style Concurrency” in language X. I think this is a very good thing, as it indicates that people are starting to take a long hard look at concurrency.

However, there doesn’t seem to be any authoritative definition of the term “Erlang-Style Concurrency”. I thought I’d at least try to give my view.

Joe Armstrong writes the following in his book, Programming Erlang:

“In Erlang:

  • Creating and destroying processes is very fast.
  • Sending messages between processes is very fast.
  • Processess behave the same way on all operating systems.
  • We can have very large numbers of processes.
  • Processes share no memory and are completely independent.
  • The only way for processes to interact is through message passing.”

I guess this is about as concise and authoritative definition as any. However, it’s not the whole story.
I would add the following items:

  • Message passing is asynchronous.
  • Processes can monitor each other.
  • It is possible to selectively receive messages.
  • Remote processes appear largely the same as local processes.

So this roughly describes how concurrency works in Erlang. Are all these bullets necessary for Erlang-style Concurrency? Perhaps not.

I would pick the following items as necessary for Erlang-style Concurrency:

  • Fast process creation/destruction
  • Ability to support >> 10 000 concurrent processes with largely unchanged characteristics.
  • Fast asynchronous message passing.
  • Copying message-passing semantics (share-nothing concurrency).
  • Process monitoring.
  • Selective message reception.

A few words to explain the above:

Speed and scalability
In order for concurrency to be useful as a fundamental modelling paradigm, the programmer must feel that processes are cheap enough you can efficiently create as many processes as the problem calls for. If there is any single defining characteristic of Erlang-style Concurrency, it is that you should be able to model your application after the natural concurrency patterns present in your problem. If creating a process is perceived as expensive, programmers will reuse existing processes instead; if message passing is perceived to be expensive, various techniques will be invented to avoid sending messages. These workarounds are nearly always detrimental.

There will always be limits. Erlang was designed for agent-style concurrency; not for massive data parallelism. You can “only” have ca 120 million concurrent processes in Erlang, provided you have enough memory. Personally, I’ve verified that Erlang performs consistently up to 20 million concurrent processes (I didn’t have enough memory to go further.) The cost of creating an Erlang process (ca 5-10 us) will be prohibitive for some applications, but it’s at least an order of magnitude cheaper than creating UNIX processes or POSIX threads. It’s reasonable to expect Erlang-style Concurrency to use roughly the same process granularity as Erlang does.

Asynchronous message passing
It has been argued that synchronous message passing is a better building block than asynchronous message passing. It is probably true that synchronous message passing is much easier to reason about, but asynchronous communication (“send and pray”) feels more intuitive in a distributed environment (in which case environments based on synchronous message passing will resort to some form of asynchronous communication as well). Anyway, it is reasonable to argue that asynchronous communication is a defining characteristic of Erlang-style Concurrency.

Copying semantics
Note that this doesn’t necessarily mean that messages must be copied, but they must act as if they were. There are several reasons why this is important:

  • For reliability reasons, processes must not share memory
  • In the distributed case, copying is inevitable, and we want as similar semantics as possible between local and distributed message passing.

It is perhaps too strong to require that processes share nothing, as this would rule out Erlang-style concurrency in most existing languages (even languages like Scala make it possible to share data between processes). Let’s just say that the Erlang style is to make it easier to use share-nothing concurrency than to use sharing.

Process monitoring
This is the enabler of “out-of-band” error handling, or the “let it crash” philosophy, which is very typical for Erlang. The idea is that you program for the correct case, and rely on supervising processes for error handling. This has turned out to be a beautiful way of handling fault tolerance in large systems.

Selective message reception
There are many ways to achieve selective message reception, but for complex multi-way concurrency, you have to support at least one of them. This is something often overlooked when approaching concurrency. It’s not obviously needed in simple programs, but when it is needed, you will suffer complexity explosion if you don’t have it.

I would suggest that ability to support selective message reception is sufficient for Erlang-style concurrency. It can be done either by matching on a single message queue (Erlang, OSE Delta), or by using different mailboxes/channels (Haskell, .NET, UNIX sockets). Pick one, and find a suitable example to experiment with. You may use my presentation “Structured Network Programming” (also attached as pdf) as inspiration, if you like.

Extending the Erlang shell (part 2)

In my previous post, I mentioned the difference between forms and expressions in Erlang. This article will show how to handle macros and module declarations from within the Erlang shell.

Bypassing the expression parser

The Erlang shell fetches Erlang expressions from the line editor by calling io:parse_erl_exprs(Prompt). The first thing we need to do is to change this, so that we get a lower-level representation. Luckily, io:scan_erl_exprs(Prompt) exists. It allows us to preempt the expression parser, and potentially do something different.

I chose to use the ‘@’ character to break out of the expression parser. For one thing, it stands out, and clearly doesn’t belong to any Erlang expression. The token scanner expects a full stop to follow, so the shortest example of this new mode would be:


1> @.
lists:reverse([1,2,3]).
@
[3,2,1]
2>

When a line starting with ‘@’ is detected, the user is given an empty line, for further input. Data is then collected until we enter another ‘@’ on an empty line.

By default, we’re still parsing expressions, and the shell will treat this just as it would treat a normal multi-line shell command.

So far, we haven’t gained much, except perhaps freedom from the repeating prompt. We can still use e.g. v(1) and e(1) to fetch the value of this command, or to repeat it.

Defining a module from the shell

Now for something truly useful. Let’s use an example from Joe Armstrong’s book, Programming Erlang


2> @ type(module).
-module(geometry).
-export([area/1]).
area({rectangle, Width, Ht}) -> Width * Ht;
area({circle, R})            -> 3.14159 * R * R.
@
{module,geometry}
3>
 geometry:area({rectangle, 10, 5}).
50

We had to add a type modifier to the command, in order to let it know that we intended to enter forms, rather than expressions. The shell collects the input, and then compiles and loads the module. There are three types: module, exprs (default), and def. The def type is for teaching the shell to remember record and macro definitions.

Now for a slightly more advanced example. Mats Cronqvist won the Obfuscated Erlang Competition 2007 with this little beauty:


4> @ type(module).
-module(xXx).
-author('mats cronqvist').
-export([x/1]).

-define(x,x(.
-define(Xx,X.
-define(X,X)

->. ?x?X?Xx.
@

{module,xXx}

All well and good, but what does it actually translate to? We can find out, by adding another modifier: return(pretty). While we’re at it, we will use shorthand notation for referring to the previous command, so we don’t have to type it in again:


5> @(4), return(pretty).
"-file(\"shell_tmp.erl\", 1).\n-module(xXx).\n-author('mats cronqvist').\n-export([x/1]).\nx(X) ->\n X.\n"
6>
io:format("~s~n", [v(5)]).
-file("shell_tmp.erl", 1).
-module(xXx).
-author('mats cronqvist').
-export([x/1]).
x(X) ->
    X.

ok

There are three return types: result (default), parsed, and pretty. The parsed return type is primarily for those who work with parse transform and code generation tools. The way to test what parsed forms correspond to a given code snippet today, is to write a test module, compile it with debug info, and then use beam_lib to extract the code. Most likely, the shell will truncate the output, so you will then have to explicitly print it again to see it all.

Supporting other parsers

Since we parameterized the parsing step already, it is a small step to allow for other parsers as well. The mod(Mod) modifier allows us to call another parser than the standard Erlang parser. The shell will call Mod:shell_parse(Text, Bindings, Env), and it expects the following behaviour:


Mod:shell_parse(Text::string(), Bindings, Env) ->
    {erl_eval, Exprs} |
    {erl_forms, Forms} |
    {erl_defines, Forms, Macros} |
    {result, Result, NewBindings} |
    {error, Reason}

Here’s an example using scheme syntax. It’s a simple 8-line addition to Luke Gorrie’s Lersp code.


Eshell V5.5.4 (abort with ^G)
1>
@ mod(lersp).
(+ 3 5)
@
8

Back to expressions: Variable scoping

The default mode is exprs, which means expression parsing, just like the shell normally does. However, all variables used while in “extended mode” are local:


2> X = 17.
17
3>
@.
X.
@
** exited: {1,erl_lint,{unbound_var,'X'}} **

(The error messages could be cleaned up a bit, but hey – this is a prototype.)

When re-running expressions in the shell, this can actually be an advantage, as we can refrain from having to forget and re-bind variables each time.

If we really do want to use existing variables, we can, through the import(V1,...) modifier. Note that we’re being a bit un-Erlangish here. The modifier doesn’t have a fixed number of arguments. Example:


4> @ import(X).
X.
@
17

Instead of importing variables, we can declare them on the command line:


6> @ X = 3, Y = 5.
X + Y.
@
8

When recalling an extended command, we can change some of the declarations:


7> @(6), Y = 7.
10

We can also export variables bound within the expression list:


8> @(7), export(Y).
10
9>
Y.
7

Defining macros in the shell

In part 1, I showed how records can be defined in the shell. We have now added the ability to define and use macros with local scope within an extended shell command. What if I want to tell the shell to remember macros just like it does records? QuickCheck, for example, uses macros extensively.


10> mr(os:getenv("EQC") ++ "/include/eqc.hrl").
['DELAY',
 'FORALL',
 'FORCE',
 'IMPLIES',
 'LAZY',
 'LET',
 'SHRINK',
 'SHRINKWHILE',
 'SIZED',
 'SUCHTHAT',
 'SUCHTHATMAYBE',
 'WHENFAIL']
11>
@ type(module), report, export_all.
-module(x).
-import(eqc_gen,[int/0,list/1]).
prop_lists_reverse() ->
  ?FORALL(L, list(int()),
    ?FORALL(M, list(int()),
      lists:reverse(L++M) ==
        lists:reverse(L) ++ lists:reverse(M))).
@
{module,x}
12>
eqc:quickcheck(x:prop_lists_reverse()).
....Failed! After 5 tests.
[0]
[1]
false

One may note that we didn’t have an -include() line in our code. The pre-processor found the macro definitions anyway, thanks to a small hack in epp. The changes I made to epp.erl were (a) to allow epp to process an already open file, and (b) to accept an anonymous function that resolves macros not defined in the module source.

We also used a new shell command, mr(Hrl), to read a source file and extract macro definitions, similarly to what we could already do with records. We cannot easily provide a function, md(Name, Tokens) to define a macro in the shell, since the shell parses the command as an expression (remember, macros are not expressions.)

Instead, we can use the type(def) modifier, and define the macro in an extended command:


13> @ type(def).
-define(HELLO, "hello world").
-record(r, {a,b}).
@
[{records,[r]},{macros,['HELLO']}]
14>
@.
#r{a = ?HELLO}.
@
#r{a = "hello world",b = undefined}

As you can see, records can also be defined this way, which is convenient, since it lets us copy and paste normal record definitions into the shell, without having to rewrite them.

Finally

I don’t work for the OTP team, so these are just my private hacks. One can of course imagine plenty of improvements.

The code is available here:
svn co http://svn.ulf.wiger.net/ext_shell/trunk

Extending the Erlang shell (part 1)

I started playing around with different ways of extending the Erlang shell. The reasons, as if a reason were needed to start hacking around, were as follows:

  • When writing his book, Joe Armstrong received a lot of frustrated comments about why it wasn’t possible to declare a module directly in the Erlang shell
  • Having watched Richard Carlsson’s presentation on the array module at EUC 07, I thought I’d try to implement a way to clean up some of the output from the shell.
  • John Hughes commented that it would be really nice (for QuickCheck users) to be able to use macros in the shell.

Understanding the shell

The Erlang shell is line-oriented and expression-oriented. That last bit may warrant some explanation.

Erlang has two levels of grammar: forms and expressions. A form is a module-level construct, and is either a module attribute (e.g. ‘-module(m).’, ‘-record(r, {a,b}).’, etc.) or a function declaration. Expressions are one level underneath: the arguments of a function, and the function body, consist of expressions.

Macros are neither forms nor expressions. They are pre-processor directives, and are understood only by the Erlang pre-processor, epp. Once the form or expression reaches the parser, macros have already been replaced by their corresponding tokens. A macro definition does not have to be syntactically complete on its own, so we may not even be able to trick the parser into understanding it.

This all makes it a bit tricky to support record- and macro definitions in the shell. Recall that neither are expressions, and the shell only deals with expressions.

Still, the shell (sort of) understands records already. It provides a set of built-in functions for importing, defining and listing records:


rd(R,D) -- define a record
rf() -- remove all record information
rf(R) -- remove record information about R
rl() -- display all record information
rl(R) -- display record information about R
rp(Term) -- display Term using the shell's record information
rr(File) -- read record information from File (wildcards allowed)
rr(F,R) -- read selected record information from file(s)
rr(F,R,O) -- read selected record information with options

A quirk that can be particularly annoying at times is that, when inserting a multi-line command, the shell repeats the prompt at each continuation line:


3> F = fun(X) ->
3>           X + 2
3>     end.
#Fun<erl_eval .6.72228031>

This makes it difficult to copy a command from the shell, and reuse it later.

Extended pretty-printing

The Erlang pretty-printer uses a few tricks to try to make the output a bit easier on the eye. It tries to print lists of bytes as text strings, and, if records have been defined in the shell, formats records using record syntax. It also tries to indent tagged tuples for better readability.


5> {tag, "element 2 of the tuple", "element 3 of the tuple", "element 4 of the tuple"}.
{tag,"element 2 of the tuple",
     "element 3 of the tuple",
     "element 4 of the tuple"}
6>
rd(r, {a,b}).
r
7> #r{a = 17, b = [x]}.
#r{a = 17,b = [x]}

This is done using an undocumented function called io_lib_pretty:print(Term, Column, LineLength, Depth, RecDefFun). The fun RecDefFun is called when the pretty-printer finds a tagged record, in order to determine whether it is a known record.

My extension

I made the function io_lib_pretty:print/5 a bit more generic, so that it could be provided a function that recognizes any type of data structure and describes a compact representation. I also extended the shell with functions to load such funs (output filters):


fa(Tag, Fun) - registers a filter fun
fl() - lists filter funs
fr(Tag) - removes a filter fun

To illustrate, let’s look at the dict module. It is a fine module, implementing a functional hash table with O(1) access. However, when pretty-printed, a dict data structure looks pretty ugly:


1> dict:from_list([{N,a} || N < - lists:seq(1,5)]). {dict,5,
     16,
     16,
     8,
     80,
     48,
     {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
     {{[],
       [[3|a]],
       [],
       [],
       [],
       [],
       [[2|a]],
       [[5|a]],
       [],
       [],
       [],
       [[1|a]],
       [[4|a]],
       [],
       [],
       []}}}

An output filter that recognizes a dict structure would look like this:


fun(#dict{}=D) ->
       {custom,dict,true,dict:dict_to_list(D)};
   (_) ->
       no
end

And using the new shell functionality, we would see this:


Eshell V5.5.4 (abort with ^G)
1>
rr(code:which(dict)).
[dict]
2>
fa(dict,fun(#dict{}=D) -> {custom,dict,true,dict:dict_to_list(D)}; (_) -> no end).
true
3>
dict:from_list([{N,a} || N <- lists:seq(1,5)]).
<|dict:[{3,a},{2,a},{5,a},{1,a},{4,a}]|>

The shell will still apply the record pretty printing filter, if none of the user-provided filters match, but the user-provided filters are tried first, in alphabetical order (by tag). The following filter might be used to clean up the output from xmerl:


4> rr(code:which(xmerl)).
[xmerl_event,
...,
xmlText]
7>
fa(xml,fun(#xmlElement{}=E) -> {custom,xml,true,xmerl_lib:simplify_element(E)}; (_) -> no end).
true
8>
xmerl_scan:file("/home/uwiger/dev/flex2/samples/restaurant/build.xml").
{<|xml:{project,[{name,"restaurant"},{default,"war"},{basedir,"."}],
                ["\n\t\t\n\t",
                 {target,[{name,"compile"}],
                         ["\n\t\t",
                          {javac,[{srcdir,"WEB-INF/src"},
                                  {destdir,"WEB-INF/classes"},
                                  {includes,"**/*.java"}],
                                 []},
                          "\n\t"]},
                 "\n\t\n\t",
                 {target,[{name,"war"},
                          {depends,"compile"},
                          {description,"Creates a WAR for running the web services"}],
                         ["\n\t\t",
                          {jar,[{destfile,"restaurant.war"},
                                {basedir,"${basedir}"},
                                {includes,"WEB-INF/**, recentReviews.jsp"}],
                               []},
                          "\n\t"]},
                 "\n\n"]}|>,
 []}

For reference, the build.xml file looks like this:


<project name="restaurant" default="war" basedir=".">
  <target name="compile">
    <javac srcdir="WEB-INF/src" destdir="WEB-INF/classes" includes="**/*.java"/>
  </target>
  <target name="war" depends="compile" description="Creates a WAR for running the web services">
    <jar destfile="restaurant.war" basedir="${basedir}" includes="WEB-INF/**, recentReviews.jsp"/>
  </target>
</project>

Next article will describe how to enter module declarations, macros and alternative syntax in the Erlang shell:

erlhive coming along

I presented erlhive at this year’s Erlang User Conference. Now there is a sourceforge project, http://erlhive.sourceforge.net for it.

For a web framework, I admit that we’re still a bit weak on the “web” part, but Joe has moved his web page template system into erlhive, and has a yaws-based authenticating front-end. We’re working on a basic multi-user blog as a first demo.

It’s a fun division of labour: Joe works on the front-end, and I work on the back-end. One of the cool additions to the back-end is the erlhive_shell, a version of the erlang shell that allows you to do only the stuff that’s legal in erlhive code. It offers the usual command history, record formatting, etc. of the erlang shell, and does tab completion very nicely – expanding erlhive usernames and accessible modules.

I also added built-in commands for tracing in the erlhive_shell. Calling trace(calls) turns on call tracing, and trace(off) turns it off. While on, a complete call trace is presented for each expression evaluated in the shell.

A fun project for a volunteer might be to write some JavaScript making the shell accessible through a web browser…

Rdbms beta-testing

I thought I’d use this blog to post temporary versions of the new ‘rdbms’.

The documentation for ‘rdbms’ is incomplete, but what there is, you can read here:
rdbms.html

The old documentation, while now slightly outdated, actually describes some things
better than the new docs (e.g. the semantics of referential integrity):
rdbms-1_5.pdf

For those who have trouble using sourceforge(*), you can download a copy of rdbms from here:
rdbms-1.991.tgz

The code has been “de-jungerlified”, in that the Makefiles have been replaced with standalone counterparts; ‘make all’ should work… at least on unix.

(*) Normally, you should fetch ‘rdbms’ from http://jungerl.sourceforge.net