Hello Erlang!


It’s been a while since I’ve posted anything, again. So here’s something!

I first heard about Erlang a few years back now but I was having too much fun learning PHP at the time and didn’t look into it very far. I’ve noticed lately that Erlang seems to be popping up more often, so I thought it was about time I gave it a look.

Erlang, created by Joe Armstrong and originating from Ericsson, is an open-source language designed with concurrency, distribution and the communication between independent, isolated processes as the main focus. Spawning processes is a quick and cheap operation in Erlang and a single Erlang application could run across multiple machines just as easily as it could run on a single machine.

Here are some observations from the point of view of a PHP/Zend developer. I won’t even go into performance, one of Erlang’s more talked-about features. I haven’t run any benchmarks, and I’m not particularly inclined to do so. All I’ll say is, it’s fast.

This is by no means an “Introduction to Erlang”, it’s just some ramblings. If you want a proper thought-through and organised introduction, check the links at the end of this post.

The Erlang Mentality

Erlang does seem to require you to re-learn how you think about problems to a degree. Or at least I’m having to. The mantra of “lots of small processes” needs a bit of hammering home for me. I mean, tiny processes. As far as I can tell, if each process only had a single task, then that’d probably be just fine. It’s all about distribution, doing things concurrently, asynchronously, and mostly, it’s all about the communication between isolated processes.

Another Erlang mantra is “let it crash”. Since processes in Erlang are all isolated, the idea is that it’s better just to let the process crash and log as much as you can about why it crashed than it is to attempt a recovery, potentially with broken data.

In the case of an IRC bot, I would spawn a parser process for each line received from the IRC server and pass it the line. The parser would parse the line and send back a native Erlang term representing the parsed IRC packet. Since the parser processes only have a single responsibility, “couldn’t parse line” and “crashed while trying to parse line” are essentially the same message. So we don’t bother detecting lines that can’t be parsed; we just let the process die and log why it died.
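To make that concrete, here’s a minimal sketch of how the parser-per-line idea might look. The module name irc_parse_demo and its toy parse/1 are hypothetical. The parser only understands PING and prefixed messages, and anything else fails to match and crashes the parser process. spawn_monitor/1 lets the caller find out about the crash through a 'DOWN' message, so it never has to handle parse errors itself.

```erlang
-module(irc_parse_demo).
-export([parse_line/1]).

%% Spawn a short-lived parser process for one IRC line, then wait for either
%% its parsed result or the monitor's 'DOWN' message if it crashed.
parse_line(Line) ->
    Parent = self(),
    {Pid, Ref} = spawn_monitor(fun() -> Parent ! {parsed, self(), parse(Line)} end),
    receive
        {parsed, Pid, Term} ->
            %% Success: drop the monitor and any 'DOWN' message already queued.
            erlang:demonitor(Ref, [flush]),
            {ok, Term};
        {'DOWN', Ref, process, Pid, Reason} ->
            %% The parser crashed: log why and move on, no recovery attempt.
            io:format("parser crashed on ~p: ~p~n", [Line, Reason]),
            {error, Reason}
    end.

%% A deliberately tiny, hypothetical parser. Anything it does not
%% recognise simply fails to match, which crashes the parser process.
parse("PING :" ++ Server) ->
    {ping, Server};
parse(":" ++ Rest) ->
    [Prefix, Command | Params] = string:tokens(Rest, " "),
    {message, Prefix, Command, Params}.
```

A line like "garbage" matches neither clause. The caller therefore gets {error, {function_clause, ...}} back, and the crash report doubles as the “couldn’t parse line” message.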

Functional Language

Way back when I first started in Object Orientated development, I started with Objective-C. I didn’t stick with it. I found it quite confusing at the time, but thinking back, that was probably because the Cocoa/Objective-C example of “Currency Converter” is also MVC, which just confused matters further! I do remember mention of “message passing” in reference to Objective-C’s Smalltalk inspiration, but it meant nothing to me since Smalltalk is “a bit before my time” and I couldn’t see any message passing, all I could see were function/method calls, so I ignored it and muddled on.

Once it clicked with me that what Erlang calls “processes” are in fact “objects”, what I had read years previously about message passing suddenly made sense and I realised that Erlang’s more object orientated than any other language I’ve used! Oh, and that everything I thought I knew about object-orientated programming is wrong. Ok, it’s not wrong, just pretty useless to me in Erlang!

It’s isolation and message-passing that’s key in OOP, just as it is in Erlang. I was very pleased when I later found this interview with Ralph Johnson and Joe Armstrong on the state of Object Orientated Programming. It’s nice when one’s gut instinct is reinforced by the people who actually know what they’re talking about.
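Here’s a rough sketch of that “process as object” idea using nothing beyond plain spawn and receive. Real code would more likely use OTP’s gen_server. The counter_demo module is hypothetical: the loop argument is the object’s private state, and increment/1 and value/1 play the part of methods by sending messages.

```erlang
-module(counter_demo).
-export([start/0, increment/1, value/1]).

%% The "constructor": spawn a process whose private state is the loop argument.
start() ->
    spawn(fun() -> loop(0) end).

%% The "methods" are just messages; callers never touch the state directly.
increment(Counter) ->
    Counter ! increment,
    ok.

value(Counter) ->
    Counter ! {value, self()},
    receive
        {Counter, Value} -> Value
    end.

%% The object itself: wait for a message, react, recurse with the new state.
loop(Count) ->
    receive
        increment ->
            loop(Count + 1);
        {value, From} ->
            From ! {self(), Count},
            loop(Count)
    end.
```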

Side effects are pretty important in Erlang too, so it’s not as functional as all that anyway. You send messages to processes, you can print to the shell. That’s just the way it works.

Ugly Syntax

When I first started looking at screens of Erlang code, I thought to myself, “It ain’t pretty”. I totally take it back. Erlang’s syntax borrows heavily from Prolog rather than the C-like “curly bracket” languages (of which PHP is a member), so if that’s what you’re used to then yeah, it may seem a bit weird at first. But it makes sense to me from both a logical and a natural-language point of view. I found it surprisingly easy to switch to Erlang’s syntax, to the point where I keep inadvertently using it in other languages! I’d expected the habit of ending every statement with a semicolon and a return to be the hardest one to break.

Yes, occasionally it bites you when you’re copying and pasting code. But I’ve always been anti-copy-paste, so in my opinion you and I deserve it for copying and pasting code in the first place! Anything that helps us break that habit is A Good Thing. Sure, it has its moments when it can be difficult to read, but no worse than some of the PHP or JavaScript that I’ve seen! Even in those cases, it tends to be a whitespace issue, and that’s subjective. Often what’s “messy” to one developer is “concise” to another.
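If you haven’t seen it yet, this is roughly where the copy-and-paste pain comes from. In this hypothetical punctuation_demo module, commas separate expressions, semicolons separate the clauses of a function, and a full stop ends the definition. Moving or pasting the last clause therefore means fixing up that punctuation by hand.

```erlang
-module(punctuation_demo).
-export([describe/1]).

%% Commas separate expressions within a clause body,
%% semicolons separate the clauses of one function,
%% and a full stop ends the whole function definition.
describe(N) when N < 0 ->
    Sign = negative,
    {Sign, -N};
describe(0) ->
    zero;
describe(N) ->
    Sign = positive,
    {Sign, N}.
```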

Despite this, when it comes to the bare code alone, with no comments and no real use of whitespace to speak of, I have found Erlang code a lot easier to follow than the equivalent software in a typical object orientated language. So far I’ve read and understood the inner workings of an HTTP server, an OSC server and an MQTT broker quite quickly. In the latter two cases, that included getting to grips with the protocols themselves!

Pattern Matching

Wow, I love it. It’s just a beautiful way to express software in my opinion. But again, a bit of a shock from a PHP point of view. I think the best way for me to describe it is “Switch()es everywhere!”, except that you can switch() on an entire data structure. A bit like PHP switch abuse, except sanctified!

Take a simple example: a function called status_string/1 that converts HTTP status codes to message strings.

The /1 specifies the function’s arity, which is the number of arguments the function takes. The arity is not part of the name, but it still distinguishes functions. Arity isn’t something that’s given much thought in PHP. In Erlang, though, a function called “status_string” that takes no arguments and a function called “status_string” that takes one argument are different functions. If you wanted to emulate optional arguments, you’d write a lower-arity function for each optional argument that fills in a default and calls the next one up, a bit like method overloading in Java.
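Here’s a small sketch of that wrapper pattern using a hypothetical greet function. greet/0, greet/1 and greet/2 are three distinct functions, and each lower-arity version supplies a default and calls the next one up.

```erlang
-module(greet_demo).
-export([greet/0, greet/1, greet/2]).

%% greet/0, greet/1 and greet/2 are separate functions that happen to
%% share a name. The lower-arity versions fill in a default and delegate.
greet() ->
    greet("world").

greet(Name) ->
    greet(Name, "Hello").

greet(Name, Greeting) ->
    Greeting ++ ", " ++ Name ++ "!".
```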

status_string( 200 ) -> "OK";
status_string( 403 ) -> "Forbidden";
status_string( 404 ) -> "Not Found";
status_string( 500 ) -> "Internal Server Error";
status_string( Unknown ) when is_integer( Unknown ) ->
    Error = { unknown_status_code, Unknown },
    io:format( "Oh No! ~p~n", [ Error ] ),
    throw( Error ).

It’s like having a built-in switch() on every function definition! It gets right to the point. Less syntax, more power. You’ll notice that the last clause of that function doesn’t match against a fixed value; it uses an unbound variable instead. Whatever occupies that position in the structure (in this case just a single value, the entire argument) is bound to that variable. Something a switch() can’t do!

We can also use guards, which is the “when is_integer( Unknown )” bit. Admittedly they’re a little more complicated than they first appear, and you can only use certain built-in functions in them. Here the guard gives us a “catch all integers” clause, and you can also use guards to ensure an argument is within a certain range and things like that. They allow you to fine-tune your patterns, removing the need for extra validation steps inside the function. If you were to pass, say, a string into status_string/1, we would just let it crash with a “function_clause” error, since none of the clauses allow for an argument that isn’t an integer. The function doesn’t accept that input at all, rather than accepting it and then rejecting it. Conceptually, that makes a lot of sense to me!
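Here’s a sketch of guards doing range checks, using a hypothetical status_class/1 as a companion to status_string/1. The safe_class/1 wrapper exists only to show that an argument no clause accepts raises a function_clause error. In real code you’d usually just let it crash.

```erlang
-module(status_class_demo).
-export([status_class/1, safe_class/1]).

%% Guards narrow each clause to a range of integers, so no validation
%% code is needed inside the function bodies.
status_class(Code) when is_integer(Code), Code >= 200, Code < 300 -> success;
status_class(Code) when is_integer(Code), Code >= 300, Code < 400 -> redirection;
status_class(Code) when is_integer(Code), Code >= 400, Code < 500 -> client_error;
status_class(Code) when is_integer(Code), Code >= 500, Code < 600 -> server_error.

%% For demonstration only: show what happens when no clause matches.
safe_class(Input) ->
    try status_class(Input)
    catch
        error:function_clause -> {rejected, Input}
    end.
```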

It doesn’t stop there though! You can pattern match against binaries and bitstrings too! I’m not going to go into it here since it’s quite fiddly, but I will say that I’m finding it a great introduction to binary protocols!
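For a taste, here’s a minimal sketch against a made-up wire format, not any real protocol. Each packet has a 4-bit type, 4 bits of flags, a 16-bit big-endian length and then the payload. Note how Len, bound earlier in the pattern, sizes the payload segment within the same match.

```erlang
-module(bits_demo).
-export([parse_packet/1]).

%% Hypothetical wire format: a 4-bit type, 4 bits of flags, a 16-bit
%% big-endian payload length, then exactly that many payload bytes.
parse_packet(<<Type:4, Flags:4, Len:16, Payload:Len/binary, Rest/binary>>) ->
    {ok, {Type, Flags, Payload}, Rest};
parse_packet(Incomplete) when is_binary(Incomplete) ->
    %% Not enough bytes yet: hand the buffer back to wait for more data.
    {more, Incomplete}.
```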

Yes, I can see it being open to abuse, and I think it would make my soul cry if I saw patterns matching against tuples of tuples, lists of tuples and tuples of lists. But as long as you follow the Erlang principle of “lots of small processes”, the structures you’re dealing with shouldn’t get very big anyway.

String Handling

Virtually non-existent. I won’t lie to you. There’s no string data type; strings are just lists of integers (Latin1). That makes sense, but it’s a bit of a shock when you’ve come from the world of PHP. The string module accommodates a lot of the things you’d want to do. Since strings are just integer lists, you can also use the lists module to handle them in other respects.
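Here’s a quick sketch showing that a string really is just a list. io_lib:printable_list/1 is the kind of heuristic check that decides whether a list of integers looks like text, which is why the shell prints [72,105] as "Hi". The module name string_list_demo is just for illustration.

```erlang
-module(string_list_demo).
-export([facts/0]).

facts() ->
    %% A string literal is just syntax for a list of character codes.
    Same = ("Hi" =:= [72, 105]),
    %% Ordinary list functions work on strings directly.
    Reversed = lists:reverse("abc"),
    %% Whether a list "is" a string is only ever a guess.
    Printable = io_lib:printable_list([72, 105]),
    {Same, Reversed, Printable}.
```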

Personally, I sometimes find it a bit confusing knowing which module a function is in. It requires me to stop and think about exactly what it is I want to do with the string. To tokenise a string, you’d use string:tokens/2. To strip certain leading characters, you’d use something more like lists:dropwhile/2, with a callback fun that checks each character in turn. To remove a character wherever it appears, you’d use lists:filter/2 or a “list comprehension” instead.
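Here’s a sketch of those approaches side by side. The module and function names are hypothetical. string:tokens/2 splits on any of the given separator characters, and lists:dropwhile/2 only removes characters from the front while its fun returns true. A list comprehension filters a character out wherever it appears.

```erlang
-module(string_ops_demo).
-export([tokens/1, strip_leading_spaces/1, remove_char/2]).

%% Split on any of the separator characters given in the second argument.
tokens(Line) ->
    string:tokens(Line, " ,").

%% lists:dropwhile/2 only drops from the front, while the predicate holds.
strip_leading_spaces(S) ->
    lists:dropwhile(fun(C) -> C =:= $\s end, S).

%% A list comprehension keeps every character that passes the filter,
%% so it removes Char wherever it appears.
remove_char(Char, S) ->
    [C || C <- S, C =/= Char].
```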

There does seem to be a variety of techniques for dealing with strings, including some UTF types in the bit syntax that I haven’t delved into too far yet. I’m still not sure how you’re supposed to tell the difference between a string and a list of strings, however. I guess the answer is that I probably don’t need to.

Serial

Apparently there’s no native ability to use a serial port. I was quite surprised, but when I thought about it, it does make sense. Erlang runs on a VM, and it’s big on isolation. Why would it get its hands dirty dealing with hardware-level stuff? It tackles the high-level problems. It’s not meant for fiddling with hardware, and all of that is abstracted away from it.

As such, it has absolutely no problem with handing stuff off to external code. The external code can be written in a more suitable language, typically C, and Erlang can maintain its isolation. You can open “ports” to external system processes that run outside Erlang. Erlang also supports NIFs (Native Implemented Functions), which let you implement an Erlang function in C as if it were a Built-In Function (BIF). NIFs are a bit like a PHP extension, though the impression I’m getting is that they should only be a last-resort solution! So all in all I guess it’s not really Erlang’s problem; get C to deal with it!
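Here’s a minimal sketch of the port side of things, assuming a Unix-like system. A real serial helper would be your own external C program; echo just stands in for it here, and the port_demo module is hypothetical. The external program’s output arrives as ordinary Erlang messages, so it runs as a separate OS process outside the VM.

```erlang
-module(port_demo).
-export([run/1]).

%% Run an external OS command through a port and collect its output lines
%% until it exits. Assumes a Unix-like system where Command exists on PATH.
run(Command) ->
    Port = open_port({spawn, Command}, [exit_status, {line, 1024}]),
    collect(Port, []).

collect(Port, Acc) ->
    receive
        {Port, {data, {eol, Line}}} ->
            collect(Port, [Line | Acc]);
        {Port, {data, {noeol, Partial}}} ->
            %% Pieces of over-long or unterminated lines; this sketch
            %% simply keeps them as separate entries.
            collect(Port, [Partial | Acc]);
        {Port, {exit_status, Status}} ->
            {Status, lists:reverse(Acc)}
    end.
```

For example, port_demo:run("echo hello") should come back as {0, ["hello"]}.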

Undocumented Functions

There seem to be an awful lot of undocumented functions, as you’d expect. But what’s unclear is whether or not you’re allowed to use them. I’m sure I read somewhere that you absolutely shouldn’t and that they can change with no notice. Which seems totally sensible. But then I’ve seen examples where people are using them, such as for accepting sockets asynchronously, and claiming to be using them in production systems. So I really don’t know where I stand on that one yet. I’ll stick with the documented ones I think!

Conclusion

I love Erlang. It’s not for everyone, that’s for sure, but for the kind of software I’m most interested in, back-end server type stuff, it seems perfectly suited. I don’t think I would build a website with Erlang, but I would definitely use it to build a web service.

Resources

I can whole-heartedly recommend Erlang and OTP in Action, it’s well written and is easy to hop around from section to section (provided you have the basics down). It’s proving to be a great pick-up-put-down reference as well as an easy and engaging continuous read. It also includes a real world example application that starts out as a simple cache server and evolves as topics such as distribution and resource discovery are introduced, all the way to packaging up and deploying a release.

Free eBook about Erlang: Learn You Some Erlang

Interview with Ralph Johnson and Joe Armstrong on the state of OOP

Beebole Erlang Web Applications (with Mochiweb) screencast

Part one of a ten part lecture given by Joe Armstrong