Thursday, June 13, 2013

The beauty of FP (and Scala)

I recently came across a coding problem that beautifully demonstrates the benefits of functional programming and - in this case - Scala. The following (Python) code shows a simplified version of the problem.

numbers1 = [1,2,3]
numbers2 = [1,2,3]
pairs = [(n1,n2) for n1 in numbers1 for n2 in number2]
processed = [process(n1,n2) for n1,n2 in pairs]
for p in processed: print p,  

As you can see it is trivial code that pairs the numbers in both lists, processes the pairs and prints out the results of the process function. Obviously, one could simply write

for n1 in numbers1:
  for n2 in numbers2:
    print process(n1,n2) 

and it would be better code but the previous example resembles the structure of the real, more complex code that has several additional processing steps.

To come back to the previous example, there are two possible problems: 1) let's assume it is not a few numbers but many 2) and that the process function is computationally expensive. For large number-lists memory consumption becomes an issue (since all pairs are stored in a list) and the processing will be time consuming.

Let's address the first problem first. We can easily replace list comprehensions by generators and would then iterate over number-pairs instead of storing them and their results. Easy.

pairs = ((n1,n2) for n1 in numbers1 for n2 in number2)
processed = (process(n1,n2) for n1,n2 in pairs)
for p in processed: print p,  

Problem two is harder. Let's assume the order of the results is of no importance and on a multi-core computer it would be nice to distribute the processing over the cores. But as far as I am aware there is no really elegant way to achieve this with Python. I'll show you soon what I mean by "elegant". Let's switch to Scala.

  
  val numbers1 = List(1,2,3)
  val numbers2 = List(1,2,3)
  val pairs = for(n1 <- numbers1; n2 <- numbers2) yield (n1,n2)
  val processed = pairs map { case (n1,n2) => process(n1,n2)}
  processed foreach (print)

As you can see the Scala code does the same as the first Python program. To change from list comprehensions to generators/iterators in the Scala code it is sufficient to transform the first list into an iterator (using toIterator).

  
  val numbers1 = List(1,2,3).toIterator
  val numbers2 = List(1,2,3)
  val pairs = for(n1 <- numbers1; n2 <- numbers2) yield (n1,n2)
  val processed = pairs map {case (n1,n2) => process(n1,n2)}
  processed foreach (print)

In the Python version we had to replace two list comprehensions by generators. In the Scala version we need to modify only one list. pairs and processed automatically become iterators because their inputs are iterators. In languages without generators this simple change would require substantial restructuring of the code.

Let's assume memory is not the problem but we want to speed things up. We simply add .par after pair to convert the list to a parallel collection and the processing is now performed concurrently. Beautiful!

    
  val numbers1 = List(1,2,3)
  val numbers2 = List(1,2,3)
  val pairs = for(n1 <- numbers1; n2 <- numbers2) yield (n1,n2)
  val processed = pairs.par map {case (n1,n2) => process(n1,n2)}
  processed foreach (print)

Can we have both - speed and low memory consumption? Not without effort. A parallel collection cannot be an iterator and vice versa. We could use futures but it wouldn't be trivial. However, this example shows how easy it is to tune functional code or parts of code to be memory efficient or fast/concurrent.

Tuesday, June 4, 2013

You shall not use tabs

There have been many debates whether spaces or tabs are better for indentation to structure code. There are good arguments for both sides but no clear winner, which indicates that in the end it doesn't really matter (The hotter the argument the less the value).

Where everyone agrees is that they should be used consistently and that the layout of code is important. Mixing spaces and tabs will lead to messed up layouts and will cause problems in languages where indentation has meaning, e.g. Python.

I have a slight preference for space over tabs for three reasons:

  • Spaces are absolute while tabs are relative. When switching editors the indentation does not change. The true intent/indentation is preserved.
  • Many IDEs support the automatic conversion from tabs to spaces but usually not the other way around.
  • Spaces constitute the lowest common denominator. Every editor must support spaces. Tabs are optional.

None of the above reasons is especially strong but if it has to be spaces or tabs, I vote for tabs.

ps. For more coding laws see here.

Sunday, June 2, 2013

The holy laws of coding

Well, here we go. My holy laws of coding. Let there be irritation, indignation and infuriation.

  1. You shall not use tabs
  2. You shall not use bad names
  3. You shall not ignore the idioms and style guide
  4. You shall not write more than 80 characters per line

  5. You shall not write more than 10 lines per function
  6. You shall not write more than 10 functions per class/module/file
  7. You shall not duplicate code

  8. You shall not use a file when you can use a data structure
  9. You shall not ignore the libraries
  10. You shall not miss to refactor your code

To help coding novices, I designed 10 "holy laws", which I think are useful guides to write "good" code (whatever THAT is). I prefer to call them laws because rules are too easily broken but they can and should be broken IF there is a good reason.

Why are they important? What are the benefits? More details will follow in subsequent posts but as an outline: I think there are three main issues beginners struggle with that these laws try to address.

  • Readability
  • Complexity
  • Efficiency

Readability: Due to lack of experience beginners tend to write code that is difficult to read by other and themselves. The first four laws aim to improve readability.

I believe the mind of an experienced programmer works a bit like that of an experienced chess player. An advanced player sees a position and with one look acquires a rough understanding of its possibilities and flaws. Similarly, an advanced programmer often can get an understanding what a piece of code does, just be glancing at it. However, if one swaps chess pieces with jelly beans or writes code that violates the common style, understanding is much harder to gain.

Complexity: What causes beginners and multi-million dollar projects to fail is unmanaged complexity. A function with thousand lines of code or hundred functions with 10 lines: to the machine it is the same but to the human mind it is entirely different. Complex problems not broken into small and independent pieces are a certain recipe for disaster. Rules 5 to 7 help to manage complexity and violating them will limit the difficulty of problems one can solve.

Efficiency: Programming novices tend to ignore available libraries or data structures and stick to simple file-manipulation approaches that are inefficient and error prone. Rules 8 to 10 aim to increase efficiency and robustness, and also foster the progression of skills.

Finally have a look at How To Write Unmaintainable Code for great examples of bad code ;)