Category: functional programming

Functional Programming with Python – Part 2 – Useful python constructs

Posted by – March 1, 2010

In Functional Programming with Python – Part 1, I focused on providing an overview. In this post I shall focus on the core python language constructs supporting functional programming. If you are experienced pythonista, you may choose to skip this post (and wait for the next post in this series :) )

Sequences in python are not immutable :

When using sequences in python the thing to be noted is that sequences are not immutable. This provides you with the following options.

a) Use immutable sequence types : This is only possible by defining different types for sequences than the ones built into the language.
b) Ignore all methods on the sequences which modify them
c) Waive functional programming tenet of immutability

My preferred option when explicitly focusing on writing functional code is to use b)

Sequence types in python

The most commonly used sequence types in python are :

  • Tuple : Immutable, Ordered, Indexed collection represented as (val1, val2)
  • List : Ordered collection represented as [val1, val2]
  • Dictionary : A key indexed collection represented as { key1: val1, key2 : val2 }
  • Set : An unordered collection allowing no duplicates. Represented as set([val1,val2])
  • str and unicode : While primarily meant to serv as ascii and unicode strings, these data structures also act as sequences of characters.Represented as “abcd” or ‘abcd’

Of the above only tuples are immutable.

There are other sequences used less often as well. An example is frozenset which is an immutable set.

Simple iteration
The for construct allows you to loop through a sequence. eg.

1
2
3
one_to_five = [1,2,3,4,5]
for num in one_to_five :
    print num

Iterators on dictionaries

Unlike tuples, lists and sets where iterators essentially traverse through the sequence constituents, there are a number of different iterators on dictionaries. These are :

1
2
3
4
5
6
7
8
9
d = {1:"One", 2: "Two", 3:"Three"}
# keys
for val in d : print val
# an alternative for keys
for val in d.keys() : print val
# values
for val in d.values() : print val
# key, value tuples
for key,val in d.items() : print key, val

Slices

Slices allow creation of a subset on a sequence. They take the syntax [start:stop:step] with the caveat that a negative value for start or stop indicates an index from the end of the sequence measured backwards, whereas a negative step indicates a step in the reverse direction. The following should quickly indicate the use of slices

1
2
3
4
5
6
7
8
9
10
seq=[0,1,2,3,4,5,6,7]

#first 3
print seq[:3]
# last 3
print seq[-3:]
# 2nd to second last
print seq[1:-1]
# reverse the sequence
print seq[::-1]

The iterator protocol
Since python is an object oriented language as well, it provides strong support for allowing iteration over an object internals. Any object in python can behave like a sequence by providing the following :

a. It must implement the __iter__() method which in turn supports the iterator protocol
b. The iterator protocol requires the returned object, an iterator, to support the following ie. an __iter__() method returning self. and a next() method which returns the next element in the sequence, or raise a StopIteration in case the end of iteration is reached.

I’ve tested iterators which do not themselves have an __iter__() method and it still works, but I still do not give in to the temptation of not defining the next() method alone since that would be inconsistent with the documented python specifications

Let us examine a simple class indicating a range. Note that this is just for demonstration since python itself has better constructs for a range.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# An iterator object to sequentially offer the next number
# in a sequence. Raises StopIteration on reaching the end
class RangeIterator(object):
    def __init__(self,begin,end):
        self._next = begin - 1
        self.end = end
    def next(self):
        self._next += 1
        if self._next < self.end :
            return self._next
        else :
            raise StopIteration
           
# A class which allows sequential iteration over a range of
# values (begin inclusive, end exclusive)
class Range(object):
    def __init__(self,begin,end):
        self.begin = begin
        self.end = end
    def __iter__(self):
        return RangeIterator(self.begin, self.end)

oneToFive = Range(1,5)

In the above example, __iter__() on Range returns an instance of a RangeIterator.

The way this capability is used is as follows :

1
2
3
4
5
for number in oneToFive :
    print 'Next number is : %s' % number

print 'Is 2 in oneToFive', 2 in oneToFive
print 'Is 9 in oneToFive', 9 in oneToFive

The output you get with it is :

1
2
3
4
5
6
Next number is : 1
Next number is : 2
Next number is : 3
Next number is : 4
Is 2 in oneToFive True
Is 9 in oneToFive False

Generators

In the above situation, the state necessary for the iteration (ie.begin and end attributes) need to be stored. This was stored as a class member. Python also provides a function based construct called a generator. In case of a generator, the function should just do a yield instead of a return. Python implicitly provides a next() method which resumes where the last yield left off and implicitly raises a StopIteration when the function completes. So representing the above class as a function,

1
2
3
4
5
6
7
8
9
10
11
def my_range(begin, end):
    current = begin
    while current < end :
        yield current
        current += 1
       
for number in my_range(1,5) :
    print 'Next number is : %s' % number

print 'Is 2 in oneToFive', 2 in my_range(1,5)
print 'Is 9 in oneToFive', 9 in my_range(1,5)

I do not document the results since these are identical to the earlier code using a RangeIterator.

Note: I used my_range() and not range() since there already exists another range() already provided by python

An interesting point to realise is that in both the above situations, the next element to be returned in the sequence was being computed dynamically. To put it in the terminology better consistent with functional programming, it was being lazily evaluated. While python has no mechanism of lazy evaluation of functions, the ability of functions or objects to lazily evaluate the return values are sufficiently adequate to get the benefits of lazy evaluation at least from the perspective of cpu utilisation only on demand, and minimal memory utilisation.

List comprehensions

List comprehensions are one of the most powerful functional programming constructs in python. To quote from python documentation (even as I leave out a very important part of the full quote .. to be covered later),

Each list comprehension consists of an expression followed by a for clause, then zero or more for or if clauses. The result will be a list resulting from evaluating the expression in the context of the for and if clauses which follow it. If the expression would evaluate to a tuple, it must be parenthesized.

A sample list comprehension is one which returns a sequence of even numbers between 0 and 9 :

1
2
3
4
# A list comprehension that returns even numbers between 0 and 9
even_comprehension = (num for num in range(10) if num % 2 == 0)
print type(even_comprehension)
print tuple(even_comprehension)

The output one gets on running the code above is

1
2
type 'generator'
(0, 2, 4, 6, 8)

Note that the list comprehension returns a generator which then returns a sequence containing 0, 2, 4, 6 and 8.

The for and if statements can be deeply nested.

lambda

Lambdas are anonymous functions with a constraint. Their body can only be a single expression which is also the return value of the function. I will demonstrate their usage in the next section.

map, filter and reduce

Three of the functional programming constructs which probably have aroused substantial discussions within the python community are map, filter and reduce.

map takes a sequence, applies a function of each of its value and returns a sequence of the result of the function. Thus if one were to use map with a function which computes the square on a sequence, the result would be a sequence of the squares. Thus

1
2
def square(num): return num * num
print tuple(map(square,range(5)))

results into

1
(0, 1, 4, 9, 16)

I mentioned earlier I will demonstrate usage of a lambda. In this case I could use a lambda by defining the square function anonymously in place as follows :

1
print tuple(map(lambda num : num * num,range(5)))

filter takes a predicate and returns only the elements in the sequence for whom the predicate evaluates to true.

Thus

1
print filter(lambda x : x % 2 == 0, range(5))

results in the following output

1
[0, 2, 4]

Finally the most controversial of them all – reduce. Starting with an initial value, reduce reduces the sequence down to a single value by applying a function on each of the elements in the sequence along with the current manifestation of the reduced value. Thus if I wanted to compute the sum of squares of numbers 0 through 4, I could use the following

1
print reduce(lambda reduced,num : reduced + (num * num), range(5), 0)

In the above example, 0 as the right most argument is the initial value. range(5) is the sequence of numbers from 0 thru 4. Finally the anonymous lambda takes two parameters – the first is always the current value of the reduced value (or initial value in case it is being invoked for the very first time) and the second parameter is the element in the sequence. The return value of the function is the value which will get passed to the subsequent invocation of the reduce function with the next element in the sequence as the new reduced value. The reduced value as returned finally by the function is then the returned value from reduce

The functional programming folks are likely to find this an extremely natural expression. Yet reduce resulted in a substantial debate within the python community. With the result that reduce is now being removed from python 3.0. (Strictly speaking it is being removed from python core but will be just an import statement away as a part of the functools package). See The fate of reduce() in Python 3000. Why so controversial ? Simply because the usage of map, filter, reduce above could be rewritten as

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
#map
seq = []
for num in range(5) :
    seq = seq + [num * num]
print seq

#filter
seq = []
for num in range(5) :
    if num % 2 == 0 :
        seq = seq + [num]
print seq

#reduce
total = 0
for num in range(5) :
    total = total + (num * num)
print total

Just look at the two blocks of code and figure out which is easier to read (especially look at reduce). One of the areas where a large number of pythonistas and lispists are likely to disagree is the tradeoffs between brevity and easy readability. Python code is often english like (well, at least to the extent that a programming language can be) and some pythonistas do not like the terse syntax of lisp thats hard to follow.

I earlier mentioned that I left out a part of the description of list comprehensions from python documentation. Here’s that part of the quote.

List comprehensions provide a concise way to create lists without resorting to use of map(), filter() and/or lambda. The resulting list definition tends often to be clearer than lists built using those constructs.

Having said that I do believe many including myself will continue to use map, filter, reduce

Other helpful functions in python core that are helpful for functional programming are :

  • all : Returns True if all elements in a sequence are true
  • any : Returns True if any element in a sequence is true
  • enumerate : Returns a sequence containing tuples of element index and element
  • max : Returns the maximum value in a sequence
  • min : Returns the minimum value in a sequence
  • range : Return a sequence for given start, end and step values
  • sorted : Returns a sorted form of the sequence. It is possible to specify comparator functions, or key value for sorting, or change direction of sort
  • xrange : Same as range except it generates the next sequence element only on demand (lazy evaluation) thus helping conserve memory or work with infinite sequences.

We’ve seen many of the constructs that are typically useful for functional programming. I left out one big part – the itertools package. This is not a part of python core (its a package which is available with a default python installation). Its a large library and substantially helps functional programming. That along with some more sample python programs shall be the focus of my next blog post in the series. At this point in time I anticipate at least a few more parts after that to focus on a) Immutability b) Concurrency and c) Sample usage.

Hope you found this useful and keep the feedback coming so that I can factor it into the subsequent posts.

Functional Programming with Python – Part 1

Posted by – February 23, 2010

Lately there has been a substantial increase in interest and activity in Functional Programming. Functional Programming is sufficiently different from the conventional mainstream programming style called Imperative Programming to warrant some discussion on what it is, before we delve into the specifics of how it can be used in Python.

What is Functional Programming?

To quote from Wikipedia,

In computer science, functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It emphasizes the application of functions, in contrast to the imperative programming style, which emphasizes changes in state.

For beginners, one of the most fluent starter pages I would recommend for the history and specifics of functional programming is Functional Programming For The Rest of Us. This is a must read article which provides the reader with a good overview without getting too much into the nitty gritties of functional programming.

Since a detailed discussion on functional programming (henceforth referred to FP) is beyond the scope of this post, I will just briefly summarise the most critical elements of FP.

  • Functions as the basic building blocks : Unsurprisingly FP requires the construction and usage of functions as the basic building units. In the simplest terms “int add(int x, int y) { return x + y; }” is a simple addition function written in ‘C’. It takes two parameters x and y, adds them, and returns the result. This is a rather obvious and simple case but I stated it since I would like to refer back to it subsequently in this post.
  • Functional Programming prefers functions without side effects : The add function above is a good example of a function without side effects. A function is said to be without side effects if the only changes it makes are those that are manifested in the return values. In other words such a function cannot change any global variables, write to the console, update the database etc. A fairly related term is referential transparency. A function is said to be referentially transparent if its invocation can be substituted by the return value in a program without impacting the program in any other way.
  • Immutability : Pure functional programming often requires you to deal with immutable data structures. Thus the value of any variable is not open to modification (Thus they are called values and not variables). This aspect complements the functions without side effects. Thus the way most changes to state are implemented are not by modifying an object in place (which is how imperative programming deals with it) but by cloning the data structure with some of the values getting modified and the modified data structure being returned by the function. Java programmers are aware of the immutability of the String instances wherein any modifications to the string result in a new String instance being created. Imagine the same happening to all the datatypes across the program.

Benefits of Functional Programming :

Some of the nice benefits (I am tempted to say side effects) of functional programming are :

  • Superior ability to deal with concurrency (multi threading) : Threaded programs are nasty to write. And nastier to debug. In an imperative environment, you not only have to deal with data structures being modified in place by some other parts of the program, in a threaded environment such modifications can happen using peer threads, even as your current thread whose logic you are focusing on is attempting to exercise that logic. Its an extremely unpredictable environment which has resulted in a number of how-to’s for safe threaded programming using constructs such as locks, mutexes etc. Functional programming deals with the issue far more elegantly. Instead of controlling and managing unpredictability, it takes it out completely. Because a data structure once constructed will not be modified and because the source of the modifications can be clearly located to the function which instantiated the datastructure, the unpredictability of data changing right under you is gone. This can be a little expensive to manage and FP does sometimes come up with some compromises (or cool features depending on how you view it) such as Software Transactional Memory but a discussion on that is completely beyond the scope of this post.
  • Easier testing and debugging : Because modifications to data are contained and because a function communicates with the context outside it only via its return values, testing and debugging become far easier. You essentially need to focus on testing each function individually. Similarly during debugging you need to be able to quickly locate the function likely to have the problem, after which you can easily focus on the function to be able to quickly resolve the issue. Mocking out functions can also help testing each function in isolation. In general because of fewer side effects, testing under functional programming is often a lot easier, and the importance of having to do “integration” testing and “module” testing is lesser since testing functions in isolation is likely to identify most issues, far more than in typical imperative programming.

Why Python?

Python is not the best functional programming language. But it was not meant to be. Python is a multi paradigm language. Want to write good old ‘C’ style procedural code? Python will do it for you. C++/Java style object oriented code? Python is at your service as well. Functional Programming ? As this series of posts is about to demonstrate – Python can do a decent job at it as well. Python is probably the most productive language I have worked with (across a variety of different types of programming requirements). Add to that the fact that python is a language thats extremely easy to learn, suffers from excellent readability, has fairly good web frameworks such as django, has excellent mathematical and statistical libraries such as numpy, and cool network oriented frameworks such as twisted. Python may not be the right choice if you want to write 100% FP. But if you want to learn more of FP or use FP techniques along with other paradigms Python’s capabilities are screaming to be heard.

Sample Program :

I debated whether I should introduce various elements of functional programming using python in detail and then put it all together in a sample program all in future blog posts of this series, or whether I should start with a sample program which cover various aspects of function programming in this post and then explain various aspects in much more detail in future posts. For better or for worse, I have chosen the latter option. That means I shall be explaining one sample program and shall leave it to future posts in this series to get into greater details.

The sample program I have chosen is that of a simple calculator. A typical calculator supports simple unary or binary mathematical operators and performs floating point operations. Without much ado we now get into the sample program.

Immutable Data :

1
2
3
4
5
from collections import namedtuple

Context = namedtuple('Context','stack, current, op')
def default_context():
    return Context([],0.0,None)

Python is not particularly strong at immutable data. However one of the data structures, a tuple is immutable. A namedtuple is another data structure which supports both tuple like access through indices or through named elements in the tuple. For the calculator I shall need a Context which contains a stack for storing any incomplete operations, an attribute current reflecting the current value being shown on the screen and an op which might reflect a pending operation which is typically required for binary operators where the second value still needs to be provided. While namedtuple is a reasonable construct for simple tuple like objects, it would be helpful to have immutable objects as well – but thats to be covered in a future post.

Simple Functions

1
2
3
4
5
6
def add(x,y): return x + y
def sub(x,y): return x - y
def mult(x,y): return x * y
def div(x,y): return x/ y
def reverse_sign(x): return -1 * x
def pow(x,y): return x ** y

There’s not much to describe here. The functions should be self explanatory. For purpose of emphasis I would like to note that in the above code, “add” is now an entry in the namespace which is a reference to a function. This reference can be passed around, assigned to other entries. Thus the code below should work (though it does not form a part of the calculator program). This is to demonstrate how python treats attributes and functions virtually identically consistent with the Uniform access principle.

1
2
3
4
otheradd = add
add = sub
assert otheradd(7,3) == 10
assert add(7,3) == 4

Currying
Currying is a treatment afforded in functional programming which allows a function of n parameters to be treated as a sequence of n sequential functions each of one parameter.

1
2
3
from functools import partial

square = partial(pow,y=2)

Here partial is a function reference. square now refers to another function with its y parameter value being anchored to 2

Invoking functions dynamically

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
unary_functions = {'!' : reverse_sign, '@' : square }

def handle_unary_op(ctx,x):
    return ctx._replace(current = unary_functions[x](ctx.current), op = None)

binary_functions = {'+' : add, '-' : sub, '*' : mult, '/' : div}

def handle_binary_op(ctx,x):
    return ctx._replace(op = binary_functions[x])

def handle_float(ctx,x):
    if not ctx.op :
        return ctx._replace(current = x)
    else :
        return ctx._replace(current = ctx.op(ctx.current,x), op = None)

Note that I created a unary_functions dict (or dictionary or hashmap) where the key is the character which represents the function and the value is the reference to the function.

Also note that in the handle_unary_opfunction, I invoke ctx._replace method. On a named tuple it creates another tuple based on the existing namedtuple data, but with some of the values modified as specified in the keyword paramters passed to _replace. After looking up the appropriate unary function ie. unary_functions[x], I also invoke it on the current value ie. unary_functions[x](ctx.current). I also defined another dict for binary operators. The handle_binary_op method reflects how the op in the context is set to the appropriate binary function that should be triggered after the subsequent value is known.

Finally the handle_float function either sets the current value to the incoming value or in case the current operator is already set it applies the binary operator to the current value and the incoming value and replaces current with the computed value.

Additional Code
When I wrote the calculator program, I wrote the functionality to introduce braces. However that functionality is not particularly important in this explanation. So it is being listed here for completeness.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
def start_brace(ctx):
    newstack = ctx.stack
    newstack.append((ctx.current,ctx.op))
    return ctx._replace(
                stack = newstack,
                current = 0.0, op = None)

def end_brace(ctx):
    stack = ctx.stack
    current = ctx.current
    oldcurrent, oldop = stack.pop()
   
    oldctx = Context(stack,oldcurrent,oldop)
    return process_key(oldctx,current)

tokens = { '(': start_brace, ')' : end_brace}

def handle_tokens(ctx,x):
    return tokens[x](ctx)

Processing one key
I must confess I started off using key to represent the keystrokes, but along the way the key can also represent a complete floating point number (not just a single keystroke). Thus the key parameter can refer to a single character operator or a sequence of characters representing a floating point number

1
2
3
4
5
6
7
8
9
10
11
12
13
function_groups = {
    tuple(unary_functions.keys()) : handle_unary_op,
    tuple(binary_functions.keys()) : handle_binary_op,
    tuple(tokens.keys()) : handle_tokens
}

def process_key(ctx,key):
    if isinstance(key,(types.FloatType,types.IntType, types.LongType)) :
        return handle_float(ctx,key)
    elif isinstance(key,(types.StringType)) :
        for function_class in function_groups :
            if key in function_class : return function_groups[function_class](ctx,key)
    return ctx

In this case I set up a dictionary where the key is a tuple of all the keys representing a particular class of a function. Note that the .keys() method is a method which returns a list of all the keys in a dictionary. However since list is mutable, it cannot get used as a key into the overall hashmap, hence I convert it into a tuple.

The process_key function takes the incoming key, passes it handle_float if it is a number, or treats it as an operator. If it is the latter it searches for it in all the keys of each operator groups, and if it finds a match, it locates the corresponding handler function from the map and invokes it. Finally in case no match is found it ignores the key.

Processing a sequence of keys

1
2
def process_keys(keys):
    return reduce(lambda ctx,key : process_key(ctx,key), keys, default_context())

Here you see a reduce function being invoked. This belongs to the family of map and filter functions which are used extensively in functional programming. I shall attempt to briefly explain it here, but this family of functions in addition to a number of others will again be dealt with in a future blog post.

To interpret the usage read the above reduce statement right to left. Thus we start with a default context, and for each key in the sequence of keys, we invoke a lambda (thats like an anonymous function), which calls process key with the context and the key. Note that the first parameter to the lambda is either the initial value (the default context) or the return value of the last process_key (which is also a context) and the key is each key in the keys sequence injected sequentially.

To further make it easy I re-represent the same function below differently which is much more readable and easier to understand. This shows one more strength of python. Because of its focus on readability, it actually can be used to write functional programs are much more readable by a large mass of programmers than most of the functional programming languages themselves (readability being subjectively interpreted by me as what is most natural for english or similar language speaking people).

1
2
3
4
5
def process_keys(keys):
    ctx = default_context()
    for key in keys :
        ctx = process_key(ctx,key)
    return ctx

Usage
As a mechanism to conduct some rudimentary tests on the code written so far, the following code is introduced. Here you can get an overall feel of the program.

1
2
3
4
5
6
if __name__ == "__main__" :
    assert process_keys((2,'+', 3)).current == 5
    assert process_keys((2, '!', '+', 5)).current == 3
    assert process_keys((2, '@')).current == 4
    assert process_keys((2,'+',3,'*',5)).current == 25
    assert process_keys((2,'+','(',3,'*',5,')')).current == 17

Some more slightly advanced Functional Programming

Finally to tickle your interest even more, here’s a slightly more advanced usage of functional programming constructs. Here I shall add all numbers between 1 through 10.

1
2
3
4
5
6
7
8
9
10
from itertools import chain

def plus_num_seq(n):
    count = 1
    while count <= n :
        yield '+', count
        count += 1

keys = list(chain.from_iterable(plus_num_seq(10)))[1:]
assert process_keys(keys) == 55

The plus_num_seq is a generator. Note the usage of the yield statement. Thus it will continuously generate tuples with the first element of the tuple being the ‘+’ character and the second being the number with the number varying from values 1 through n. The chain.from_iterable flattens the generated list (it thus has 20 items for n = 10, each alternate one being the ‘+’ character starting with the first items). Since we do not need the very first ‘+’ character, I removed it using the [1:] slice operator.

Just like reduce this style of code is quite typical of functional programming. Thats something I shall detail upon much more in future posts.

Hope you enjoyed the post. Keep the feedback coming so I can better structure the subsequent posts based on the feedback.

Note: The full source for the calculator calculator.py can be accessed here.

Concise python code

Posted by – January 6, 2010

I love python for a number of reasons. Brevity is one of them. Readability is another. However writing readable concise code can take some time to getting used to. This post discusses some code snippets especially in the context of some of the comparisons done between python and clojure – another language which excels at brevity and shares yet another feature with python, ie. people either love or hate its syntax, they are rarely indifferent to it.

Please note : The intent of this post is not to compare LOC of python with other languages. It is meant to demonstrate python code implementation which helps describe the capabilities of python in terms of writing clean, concise, readable code. This post continuously refers to other posts and may require you to read the referred posts in conjunction with this post to get maximum value.

Code and ceremony

The post Java vs Clojure – Lets talk ceremony! describes a simple problem which requires 28 LOC in Java and 1 or 9 in clojure. It is a simple logic which creates a list, populates it with four values (two of them identical) and creates a list with unique items. There are two alternative clojure implementations, one in 1 line of code and another in 9 – the former using a built in construct in the language which especially helps to write a more concise logic, and the latter by writing code similar to the Java code. Without any further ado, here are the two alternative python implementations

a. By using built in capabilities – In this case we use the set collection

1
print set(("foo","bar","baz","foo"))

b. Now here’s where we will not use a built in operator which automatically chooses the unique elements but again write code similar to the java code. This code is as follows :

1
print reduce(lambda x,y:  x + [y] if y not in x else x ,("foo","bar","baz","foo"),[])

There – both of them are exactly one line long.

Project Euler solutions
The post Python vs Clojure – Evolving refers to many sample python implementations solving some of the problems documented on project euler site. We shall revisit these python implementations here.

a. Find all numbers dividable by 3 or 5 in 0 < x < 1000:
In this case the clojure code is 4 lines long where as the python implementation is 3 lines long

1
print sum(i for i in range(1,1000) if i%3 == 0 or i%5 == 0)

Again the implementation is exactly 1 line long – no lambdas, no filters etc.

b. Get the sum of all even Fibonaccis below 4 mill.

The suggested python implementation is about 17 lines long. The implementation below 7.

1
2
3
4
5
6
7
8
9
from itertools import takewhile

def fib():
    x,y = 1,1
    while True :
        x,y = y,x+y
        yield x

print sum(i for i in takewhile(lambda x : x < 4000000,fib()) if i%2 == 0)

c. Finding Palindromes
The suggested python solution uses about 23 lines of code, to 6 in clojure and takes 15 seconds almost 300% slower than the clojure code (emphasis from the post referred to).

Here’s another alternative implementation.

1
print max(x * y for x in xrange(100,1000) for y in xrange(100,1000)  if str(x*y) == str(x*y)[::-1])

Again if you notice – exactly one line of code. And incidentally on my notebook it clocks in about 0.85 seconds which makes the clojure implementation about 470% slower than the python implementation. (emphasis mine)

Top Rank Per Group

In yet another post Python vs Clojure – Reloaded, the author discusses the Rosetta Top Rank Per Group implementation of python. Unsurprisingly the python implementation uses 11 lines of code (not including the initialisation) compared to clojure’s 6.

One of the commenters suggests an eminently readable and simple python solution in about 6 lines of code. The code snippet is as follows

1
2
3
4
5
6
7
8
9
10
11
from itertools import groupby
from operator import itemgetter

data = [dict(name="Tyler Bennett", id="E10297", salary=32000, department="D101"), ...]

department = itemgetter('department')
for dep, persons in groupby(sorted(data, key=department), department):
    print "\nDepartment", dep
    print " Employee Name   Employee ID     Salary          Department"      
    for person in sorted(persons, key=itemgetter('salary'):
        print "%(name)-15s %(id)-15s %(salary)-15s %(department)-15s"

An alternative far less readable solutions which shaves off one more line but certainly not preferred to the solution above (am just listing it since it shaves off the step of having to create an intermediate data structure) is

1
2
3
4
5
6
7
8
9
10
import operator
from itertools import groupby
   
data = [('Employee Name', 'Employee ID', 'Salary', 'Department'),....]

print reduce(lambda x,y : x + y,
         (('\nDepartment %s\n  Employee Name   Employee ID     Salary          Department  ' % dept +
           reduce(lambda x,y : x + ("\n  " + "%-15s " * len(data[0]) % y),
                  sorted(recs, key=lambda rec: -rec[-2])[:3],''))
         for dept, recs in groupby(sorted(data[1:],key=operator.itemgetter(3)), lambda x : x[3])),'')

Some might wonder whether I am joining multiple lines of code into one just to reduce the line count. The code is consistent with the idiomatic usage of python and also with the way python list comprehensions are documented in the formal python proposal for list comprehensions PEP 202 : List Comprehensions. Besides python is a whitespace sensitive language and the compiler disallows multiple lines of code in one line without using the ‘;’ separator which I have most definitely not used in the suggested solutions.

PS: There’s of course an interesting image at the end of the post Python vs Clojure – Reloaded indicating the perception of how clojure code speaks for itself ;) . The reason I mention it is that I believe if one is expressing a strong opinion as the one reflected by the picture, one better be far more careful and thorough with verifying the veracity of the assertions appropriateness of the samples compared to. Let me clearly state that I do not know that the clojure loc compared to is an appropriate target so do not infer this post to be a Python vs Clojure LOC as much as a post which emphasises that Python is a good vehicle to write concise readable code. FWIW I am learning clojure and look forward to building more competence with it.

Update: There is another nice post by Stephan Schmidt – Anatomy of a Flawed Clojure vs. Scala LOC Comparison which does reflect an opinion in the context of Scala and a post authored on the same blog as the ones I refer to above.

A brush with Functional Programming and Scala

Posted by – April 13, 2009

Initial Struggles

A few months ago I was working with using Python for complex data processing scenarios. This was stuff I was very well versed in Java, but was working hard at to make sure that when I wrote code in Python, it would at least pass as half decent pythonic code. My initial struggles with idiomatic Python, soon became a strong interest in functional programming. I initially considered it to probably have a niche applicability in traditional business transaction processing space and of primary importance in algorithmic code, and thats the state where I almost left it. Curiously even after reaching that conclusion I did persist in attempting to figure out how it could be used in a manner where it would be helpful to have functional programming constructs in typical transaction processing scenarios.

Is functional programming useful in traditional business transaction processing ?

One of the aspects of the struggle was that I didn’t want to have the functional programming code merely replace the existing code. I wanted to figure out if it would be possible to replace it in a manner where it would make a substantial positive difference, and that hunt proved a little elusive for a while. Another aspect was the fact that the traditional OO code depended so intensively on state, that to actually try to figure out how the same would work under Functional Programming was really requiring a substantial leap in terms of thought, a leap I had to really struggle with. Finally my language of choice was Python, a dynamically typed language, and some of the literature surrounding functional programming really assumed static typing, especially that related to patterns and constructs in Haskell (eg. Monads). Such patterns were difficult to apply since the underlying expected static typing support was simply at odds with a language like Python.

It was after persisting for a few weeks, that I started to see how one could apply these techniques effectively. I also started figuring out how to Pythonise constructs such as Monads. Once I was past the initial rather steep hurdles, the going became a lot easier, and the progress much much faster. In at least a subset of some typical financial transaction processing scenarios I did get some extremely encouraging (to myself I call them jaw dropping) results. The resultant code suddenly seemed smaller, far far more intuitive to understand and extremely easy to change.

Working with Object Oriented and Functional Programming simultaneously

I am not sure whether it was due to my abilities and experience or due to my inabilities to visualise, I decided that pure functional programming was simply not a choice in this context and thus one would need to combine traditional OO and FP constructs. Having reached that decision, it became a little easier to identify the areas which could be better leveraged by FP. However I believe I still have some ways to go before being able to be confident of writing OO and FP together the way each of them should be.

Scala

My focus on functional programming and fondness to Java also led me to investigate Scala. I made the mistake of actually reading the reference documentation on its web site first. My initial reactions were that it simply was a weird syntax language which was unlikely to be of help in reducing development time even when writing FP code. Thankfully, I did not give up at that stage and continued to read a lot more. I soon realised that type inferencing and other aspects (eg. no getter setters) really resulted in some substantial savings compared to traditional Java code (it probably cuts down the code size to half). And the syntax really wasn’t so cryptic after all, it just required a day to get used to. Moreover one had the entire arsenal of the the FP constructs along with traditional OO constructs to deploy. I did write some processing code and was very happy at the results I got. In this current state, I am excited about learning more about Scala as a language that would be an alternative to Java. It gives me some of the capabilities of more productive languages such as Python, while continuing to leverage the performance and stability of the JVM. Note that Scala’s compatibility with the JVM should be thought of differently from that of dynamic language implementations such as Jython on the JVM. Given Scala’s static typing capabilities, its runtime structures are far more similar to Java and thus are able to retain somewhat similar performance and memory footprint characteristics as traditional Java programs. I suspect the same is unlikely to be found for other dynamic languages on the JVM.

Concluding Thoughts

Prior to this entire exercise, in my mind there were two primary individual preferences to programming languages. For most scenarios, my preferred choice was Python especially due to the sheer speed with which one could write and maintain code, code that was really compact and readable. And if Python runtime performance was not going to be good enough, use Java. Coming out of this exercise, it does seem that there is a high potential that I might want to switch my preference to Scala instead of Java. I have already validated the positive benefits of functional programming and Scala coding in processing scenarios (those that do a lot of computation and processing in the background). I do need to validate the same for typical web based applications, but on the face of things, I believe I have already identified areas where FP can help in typical web application scenarios as well. And now I have two multi paradigm (OO and FP) languages to be able to leverage FP in alongside OO – Python and Scala.

This is something thats definitely going to be my interest area for the next few months and I intend to write many more specific posts on the topic. Should that be of any interest keep listening on this blog’s RSS feed and my twitter stream.