Author:


Functional Programming with Python – Part 2 – Useful python constructs

Posted by – March 1, 2010

In Functional Programming with Python – Part 1, I focused on providing an overview. In this post I shall focus on the core python language constructs supporting functional programming. If you are experienced pythonista, you may choose to skip this post (and wait for the next post in this series :) )

Sequences in python are not immutable :

When using sequences in python the thing to be noted is that sequences are not immutable. This provides you with the following options.

a) Use immutable sequence types : This is only possible by defining different types for sequences than the ones built into the language.
b) Ignore all methods on the sequences which modify them
c) Waive functional programming tenet of immutability

My preferred option when explicitly focusing on writing functional code is to use b)

Sequence types in python

The most commonly used sequence types in python are :

  • Tuple : Immutable, Ordered, Indexed collection represented as (val1, val2)
  • List : Ordered collection represented as [val1, val2]
  • Dictionary : A key indexed collection represented as { key1: val1, key2 : val2 }
  • Set : An unordered collection allowing no duplicates. Represented as set([val1,val2])
  • str and unicode : While primarily meant to serv as ascii and unicode strings, these data structures also act as sequences of characters.Represented as “abcd” or ‘abcd’

Of the above only tuples are immutable.

There are other sequences used less often as well. An example is frozenset which is an immutable set.

Simple iteration
The for construct allows you to loop through a sequence. eg.

1
2
3
one_to_five = [1,2,3,4,5]
for num in one_to_five :
    print num

Iterators on dictionaries

Unlike tuples, lists and sets where iterators essentially traverse through the sequence constituents, there are a number of different iterators on dictionaries. These are :

1
2
3
4
5
6
7
8
9
d = {1:"One", 2: "Two", 3:"Three"}
# keys
for val in d : print val
# an alternative for keys
for val in d.keys() : print val
# values
for val in d.values() : print val
# key, value tuples
for key,val in d.items() : print key, val

Slices

Slices allow creation of a subset on a sequence. They take the syntax [start:stop:step] with the caveat that a negative value for start or stop indicates an index from the end of the sequence measured backwards, whereas a negative step indicates a step in the reverse direction. The following should quickly indicate the use of slices

1
2
3
4
5
6
7
8
9
10
seq=[0,1,2,3,4,5,6,7]

#first 3
print seq[:3]
# last 3
print seq[-3:]
# 2nd to second last
print seq[1:-1]
# reverse the sequence
print seq[::-1]

The iterator protocol
Since python is an object oriented language as well, it provides strong support for allowing iteration over an object internals. Any object in python can behave like a sequence by providing the following :

a. It must implement the __iter__() method which in turn supports the iterator protocol
b. The iterator protocol requires the returned object, an iterator, to support the following ie. an __iter__() method returning self. and a next() method which returns the next element in the sequence, or raise a StopIteration in case the end of iteration is reached.

I’ve tested iterators which do not themselves have an __iter__() method and it still works, but I still do not give in to the temptation of not defining the next() method alone since that would be inconsistent with the documented python specifications

Let us examine a simple class indicating a range. Note that this is just for demonstration since python itself has better constructs for a range.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# An iterator object to sequentially offer the next number
# in a sequence. Raises StopIteration on reaching the end
class RangeIterator(object):
    def __init__(self,begin,end):
        self._next = begin - 1
        self.end = end
    def next(self):
        self._next += 1
        if self._next < self.end :
            return self._next
        else :
            raise StopIteration
           
# A class which allows sequential iteration over a range of
# values (begin inclusive, end exclusive)
class Range(object):
    def __init__(self,begin,end):
        self.begin = begin
        self.end = end
    def __iter__(self):
        return RangeIterator(self.begin, self.end)

oneToFive = Range(1,5)

In the above example, __iter__() on Range returns an instance of a RangeIterator.

The way this capability is used is as follows :

1
2
3
4
5
for number in oneToFive :
    print 'Next number is : %s' % number

print 'Is 2 in oneToFive', 2 in oneToFive
print 'Is 9 in oneToFive', 9 in oneToFive

The output you get with it is :

1
2
3
4
5
6
Next number is : 1
Next number is : 2
Next number is : 3
Next number is : 4
Is 2 in oneToFive True
Is 9 in oneToFive False

Generators

In the above situation, the state necessary for the iteration (ie.begin and end attributes) need to be stored. This was stored as a class member. Python also provides a function based construct called a generator. In case of a generator, the function should just do a yield instead of a return. Python implicitly provides a next() method which resumes where the last yield left off and implicitly raises a StopIteration when the function completes. So representing the above class as a function,

1
2
3
4
5
6
7
8
9
10
11
def my_range(begin, end):
    current = begin
    while current < end :
        yield current
        current += 1
       
for number in my_range(1,5) :
    print 'Next number is : %s' % number

print 'Is 2 in oneToFive', 2 in my_range(1,5)
print 'Is 9 in oneToFive', 9 in my_range(1,5)

I do not document the results since these are identical to the earlier code using a RangeIterator.

Note: I used my_range() and not range() since there already exists another range() already provided by python

An interesting point to realise is that in both the above situations, the next element to be returned in the sequence was being computed dynamically. To put it in the terminology better consistent with functional programming, it was being lazily evaluated. While python has no mechanism of lazy evaluation of functions, the ability of functions or objects to lazily evaluate the return values are sufficiently adequate to get the benefits of lazy evaluation at least from the perspective of cpu utilisation only on demand, and minimal memory utilisation.

List comprehensions

List comprehensions are one of the most powerful functional programming constructs in python. To quote from python documentation (even as I leave out a very important part of the full quote .. to be covered later),

Each list comprehension consists of an expression followed by a for clause, then zero or more for or if clauses. The result will be a list resulting from evaluating the expression in the context of the for and if clauses which follow it. If the expression would evaluate to a tuple, it must be parenthesized.

A sample list comprehension is one which returns a sequence of even numbers between 0 and 9 :

1
2
3
4
# A list comprehension that returns even numbers between 0 and 9
even_comprehension = (num for num in range(10) if num % 2 == 0)
print type(even_comprehension)
print tuple(even_comprehension)

The output one gets on running the code above is

1
2
type 'generator'
(0, 2, 4, 6, 8)

Note that the list comprehension returns a generator which then returns a sequence containing 0, 2, 4, 6 and 8.

The for and if statements can be deeply nested.

lambda

Lambdas are anonymous functions with a constraint. Their body can only be a single expression which is also the return value of the function. I will demonstrate their usage in the next section.

map, filter and reduce

Three of the functional programming constructs which probably have aroused substantial discussions within the python community are map, filter and reduce.

map takes a sequence, applies a function of each of its value and returns a sequence of the result of the function. Thus if one were to use map with a function which computes the square on a sequence, the result would be a sequence of the squares. Thus

1
2
def square(num): return num * num
print tuple(map(square,range(5)))

results into

1
(0, 1, 4, 9, 16)

I mentioned earlier I will demonstrate usage of a lambda. In this case I could use a lambda by defining the square function anonymously in place as follows :

1
print tuple(map(lambda num : num * num,range(5)))

filter takes a predicate and returns only the elements in the sequence for whom the predicate evaluates to true.

Thus

1
print filter(lambda x : x % 2 == 0, range(5))

results in the following output

1
[0, 2, 4]

Finally the most controversial of them all – reduce. Starting with an initial value, reduce reduces the sequence down to a single value by applying a function on each of the elements in the sequence along with the current manifestation of the reduced value. Thus if I wanted to compute the sum of squares of numbers 0 through 4, I could use the following

1
print reduce(lambda reduced,num : reduced + (num * num), range(5), 0)

In the above example, 0 as the right most argument is the initial value. range(5) is the sequence of numbers from 0 thru 4. Finally the anonymous lambda takes two parameters – the first is always the current value of the reduced value (or initial value in case it is being invoked for the very first time) and the second parameter is the element in the sequence. The return value of the function is the value which will get passed to the subsequent invocation of the reduce function with the next element in the sequence as the new reduced value. The reduced value as returned finally by the function is then the returned value from reduce

The functional programming folks are likely to find this an extremely natural expression. Yet reduce resulted in a substantial debate within the python community. With the result that reduce is now being removed from python 3.0. (Strictly speaking it is being removed from python core but will be just an import statement away as a part of the functools package). See The fate of reduce() in Python 3000. Why so controversial ? Simply because the usage of map, filter, reduce above could be rewritten as

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
#map
seq = []
for num in range(5) :
    seq = seq + [num * num]
print seq

#filter
seq = []
for num in range(5) :
    if num % 2 == 0 :
        seq = seq + [num]
print seq

#reduce
total = 0
for num in range(5) :
    total = total + (num * num)
print total

Just look at the two blocks of code and figure out which is easier to read (especially look at reduce). One of the areas where a large number of pythonistas and lispists are likely to disagree is the tradeoffs between brevity and easy readability. Python code is often english like (well, at least to the extent that a programming language can be) and some pythonistas do not like the terse syntax of lisp thats hard to follow.

I earlier mentioned that I left out a part of the description of list comprehensions from python documentation. Here’s that part of the quote.

List comprehensions provide a concise way to create lists without resorting to use of map(), filter() and/or lambda. The resulting list definition tends often to be clearer than lists built using those constructs.

Having said that I do believe many including myself will continue to use map, filter, reduce

Other helpful functions in python core that are helpful for functional programming are :

  • all : Returns True if all elements in a sequence are true
  • any : Returns True if any element in a sequence is true
  • enumerate : Returns a sequence containing tuples of element index and element
  • max : Returns the maximum value in a sequence
  • min : Returns the minimum value in a sequence
  • range : Return a sequence for given start, end and step values
  • sorted : Returns a sorted form of the sequence. It is possible to specify comparator functions, or key value for sorting, or change direction of sort
  • xrange : Same as range except it generates the next sequence element only on demand (lazy evaluation) thus helping conserve memory or work with infinite sequences.

We’ve seen many of the constructs that are typically useful for functional programming. I left out one big part – the itertools package. This is not a part of python core (its a package which is available with a default python installation). Its a large library and substantially helps functional programming. That along with some more sample python programs shall be the focus of my next blog post in the series. At this point in time I anticipate at least a few more parts after that to focus on a) Immutability b) Concurrency and c) Sample usage.

Hope you found this useful and keep the feedback coming so that I can factor it into the subsequent posts.

Functional Programming with Python – Part 1

Posted by – February 23, 2010

Lately there has been a substantial increase in interest and activity in Functional Programming. Functional Programming is sufficiently different from the conventional mainstream programming style called Imperative Programming to warrant some discussion on what it is, before we delve into the specifics of how it can be used in Python.

What is Functional Programming?

To quote from Wikipedia,

In computer science, functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It emphasizes the application of functions, in contrast to the imperative programming style, which emphasizes changes in state.

For beginners, one of the most fluent starter pages I would recommend for the history and specifics of functional programming is Functional Programming For The Rest of Us. This is a must read article which provides the reader with a good overview without getting too much into the nitty gritties of functional programming.

Since a detailed discussion on functional programming (henceforth referred to FP) is beyond the scope of this post, I will just briefly summarise the most critical elements of FP.

  • Functions as the basic building blocks : Unsurprisingly FP requires the construction and usage of functions as the basic building units. In the simplest terms “int add(int x, int y) { return x + y; }” is a simple addition function written in ‘C’. It takes two parameters x and y, adds them, and returns the result. This is a rather obvious and simple case but I stated it since I would like to refer back to it subsequently in this post.
  • Functional Programming prefers functions without side effects : The add function above is a good example of a function without side effects. A function is said to be without side effects if the only changes it makes are those that are manifested in the return values. In other words such a function cannot change any global variables, write to the console, update the database etc. A fairly related term is referential transparency. A function is said to be referentially transparent if its invocation can be substituted by the return value in a program without impacting the program in any other way.
  • Immutability : Pure functional programming often requires you to deal with immutable data structures. Thus the value of any variable is not open to modification (Thus they are called values and not variables). This aspect complements the functions without side effects. Thus the way most changes to state are implemented are not by modifying an object in place (which is how imperative programming deals with it) but by cloning the data structure with some of the values getting modified and the modified data structure being returned by the function. Java programmers are aware of the immutability of the String instances wherein any modifications to the string result in a new String instance being created. Imagine the same happening to all the datatypes across the program.

Benefits of Functional Programming :

Some of the nice benefits (I am tempted to say side effects) of functional programming are :

  • Superior ability to deal with concurrency (multi threading) : Threaded programs are nasty to write. And nastier to debug. In an imperative environment, you not only have to deal with data structures being modified in place by some other parts of the program, in a threaded environment such modifications can happen using peer threads, even as your current thread whose logic you are focusing on is attempting to exercise that logic. Its an extremely unpredictable environment which has resulted in a number of how-to’s for safe threaded programming using constructs such as locks, mutexes etc. Functional programming deals with the issue far more elegantly. Instead of controlling and managing unpredictability, it takes it out completely. Because a data structure once constructed will not be modified and because the source of the modifications can be clearly located to the function which instantiated the datastructure, the unpredictability of data changing right under you is gone. This can be a little expensive to manage and FP does sometimes come up with some compromises (or cool features depending on how you view it) such as Software Transactional Memory but a discussion on that is completely beyond the scope of this post.
  • Easier testing and debugging : Because modifications to data are contained and because a function communicates with the context outside it only via its return values, testing and debugging become far easier. You essentially need to focus on testing each function individually. Similarly during debugging you need to be able to quickly locate the function likely to have the problem, after which you can easily focus on the function to be able to quickly resolve the issue. Mocking out functions can also help testing each function in isolation. In general because of fewer side effects, testing under functional programming is often a lot easier, and the importance of having to do “integration” testing and “module” testing is lesser since testing functions in isolation is likely to identify most issues, far more than in typical imperative programming.

Why Python?

Python is not the best functional programming language. But it was not meant to be. Python is a multi paradigm language. Want to write good old ‘C’ style procedural code? Python will do it for you. C++/Java style object oriented code? Python is at your service as well. Functional Programming ? As this series of posts is about to demonstrate – Python can do a decent job at it as well. Python is probably the most productive language I have worked with (across a variety of different types of programming requirements). Add to that the fact that python is a language thats extremely easy to learn, suffers from excellent readability, has fairly good web frameworks such as django, has excellent mathematical and statistical libraries such as numpy, and cool network oriented frameworks such as twisted. Python may not be the right choice if you want to write 100% FP. But if you want to learn more of FP or use FP techniques along with other paradigms Python’s capabilities are screaming to be heard.

Sample Program :

I debated whether I should introduce various elements of functional programming using python in detail and then put it all together in a sample program all in future blog posts of this series, or whether I should start with a sample program which cover various aspects of function programming in this post and then explain various aspects in much more detail in future posts. For better or for worse, I have chosen the latter option. That means I shall be explaining one sample program and shall leave it to future posts in this series to get into greater details.

The sample program I have chosen is that of a simple calculator. A typical calculator supports simple unary or binary mathematical operators and performs floating point operations. Without much ado we now get into the sample program.

Immutable Data :

1
2
3
4
5
from collections import namedtuple

Context = namedtuple('Context','stack, current, op')
def default_context():
    return Context([],0.0,None)

Python is not particularly strong at immutable data. However one of the data structures, a tuple is immutable. A namedtuple is another data structure which supports both tuple like access through indices or through named elements in the tuple. For the calculator I shall need a Context which contains a stack for storing any incomplete operations, an attribute current reflecting the current value being shown on the screen and an op which might reflect a pending operation which is typically required for binary operators where the second value still needs to be provided. While namedtuple is a reasonable construct for simple tuple like objects, it would be helpful to have immutable objects as well – but thats to be covered in a future post.

Simple Functions

1
2
3
4
5
6
def add(x,y): return x + y
def sub(x,y): return x - y
def mult(x,y): return x * y
def div(x,y): return x/ y
def reverse_sign(x): return -1 * x
def pow(x,y): return x ** y

There’s not much to describe here. The functions should be self explanatory. For purpose of emphasis I would like to note that in the above code, “add” is now an entry in the namespace which is a reference to a function. This reference can be passed around, assigned to other entries. Thus the code below should work (though it does not form a part of the calculator program). This is to demonstrate how python treats attributes and functions virtually identically consistent with the Uniform access principle.

1
2
3
4
otheradd = add
add = sub
assert otheradd(7,3) == 10
assert add(7,3) == 4

Currying
Currying is a treatment afforded in functional programming which allows a function of n parameters to be treated as a sequence of n sequential functions each of one parameter.

1
2
3
from functools import partial

square = partial(pow,y=2)

Here partial is a function reference. square now refers to another function with its y parameter value being anchored to 2

Invoking functions dynamically

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
unary_functions = {'!' : reverse_sign, '@' : square }

def handle_unary_op(ctx,x):
    return ctx._replace(current = unary_functions[x](ctx.current), op = None)

binary_functions = {'+' : add, '-' : sub, '*' : mult, '/' : div}

def handle_binary_op(ctx,x):
    return ctx._replace(op = binary_functions[x])

def handle_float(ctx,x):
    if not ctx.op :
        return ctx._replace(current = x)
    else :
        return ctx._replace(current = ctx.op(ctx.current,x), op = None)

Note that I created a unary_functions dict (or dictionary or hashmap) where the key is the character which represents the function and the value is the reference to the function.

Also note that in the handle_unary_opfunction, I invoke ctx._replace method. On a named tuple it creates another tuple based on the existing namedtuple data, but with some of the values modified as specified in the keyword paramters passed to _replace. After looking up the appropriate unary function ie. unary_functions[x], I also invoke it on the current value ie. unary_functions[x](ctx.current). I also defined another dict for binary operators. The handle_binary_op method reflects how the op in the context is set to the appropriate binary function that should be triggered after the subsequent value is known.

Finally the handle_float function either sets the current value to the incoming value or in case the current operator is already set it applies the binary operator to the current value and the incoming value and replaces current with the computed value.

Additional Code
When I wrote the calculator program, I wrote the functionality to introduce braces. However that functionality is not particularly important in this explanation. So it is being listed here for completeness.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
def start_brace(ctx):
    newstack = ctx.stack
    newstack.append((ctx.current,ctx.op))
    return ctx._replace(
                stack = newstack,
                current = 0.0, op = None)

def end_brace(ctx):
    stack = ctx.stack
    current = ctx.current
    oldcurrent, oldop = stack.pop()
   
    oldctx = Context(stack,oldcurrent,oldop)
    return process_key(oldctx,current)

tokens = { '(': start_brace, ')' : end_brace}

def handle_tokens(ctx,x):
    return tokens[x](ctx)

Processing one key
I must confess I started off using key to represent the keystrokes, but along the way the key can also represent a complete floating point number (not just a single keystroke). Thus the key parameter can refer to a single character operator or a sequence of characters representing a floating point number

1
2
3
4
5
6
7
8
9
10
11
12
13
function_groups = {
    tuple(unary_functions.keys()) : handle_unary_op,
    tuple(binary_functions.keys()) : handle_binary_op,
    tuple(tokens.keys()) : handle_tokens
}

def process_key(ctx,key):
    if isinstance(key,(types.FloatType,types.IntType, types.LongType)) :
        return handle_float(ctx,key)
    elif isinstance(key,(types.StringType)) :
        for function_class in function_groups :
            if key in function_class : return function_groups[function_class](ctx,key)
    return ctx

In this case I set up a dictionary where the key is a tuple of all the keys representing a particular class of a function. Note that the .keys() method is a method which returns a list of all the keys in a dictionary. However since list is mutable, it cannot get used as a key into the overall hashmap, hence I convert it into a tuple.

The process_key function takes the incoming key, passes it handle_float if it is a number, or treats it as an operator. If it is the latter it searches for it in all the keys of each operator groups, and if it finds a match, it locates the corresponding handler function from the map and invokes it. Finally in case no match is found it ignores the key.

Processing a sequence of keys

1
2
def process_keys(keys):
    return reduce(lambda ctx,key : process_key(ctx,key), keys, default_context())

Here you see a reduce function being invoked. This belongs to the family of map and filter functions which are used extensively in functional programming. I shall attempt to briefly explain it here, but this family of functions in addition to a number of others will again be dealt with in a future blog post.

To interpret the usage read the above reduce statement right to left. Thus we start with a default context, and for each key in the sequence of keys, we invoke a lambda (thats like an anonymous function), which calls process key with the context and the key. Note that the first parameter to the lambda is either the initial value (the default context) or the return value of the last process_key (which is also a context) and the key is each key in the keys sequence injected sequentially.

To further make it easy I re-represent the same function below differently which is much more readable and easier to understand. This shows one more strength of python. Because of its focus on readability, it actually can be used to write functional programs are much more readable by a large mass of programmers than most of the functional programming languages themselves (readability being subjectively interpreted by me as what is most natural for english or similar language speaking people).

1
2
3
4
5
def process_keys(keys):
    ctx = default_context()
    for key in keys :
        ctx = process_key(ctx,key)
    return ctx

Usage
As a mechanism to conduct some rudimentary tests on the code written so far, the following code is introduced. Here you can get an overall feel of the program.

1
2
3
4
5
6
if __name__ == "__main__" :
    assert process_keys((2,'+', 3)).current == 5
    assert process_keys((2, '!', '+', 5)).current == 3
    assert process_keys((2, '@')).current == 4
    assert process_keys((2,'+',3,'*',5)).current == 25
    assert process_keys((2,'+','(',3,'*',5,')')).current == 17

Some more slightly advanced Functional Programming

Finally to tickle your interest even more, here’s a slightly more advanced usage of functional programming constructs. Here I shall add all numbers between 1 through 10.

1
2
3
4
5
6
7
8
9
10
from itertools import chain

def plus_num_seq(n):
    count = 1
    while count <= n :
        yield '+', count
        count += 1

keys = list(chain.from_iterable(plus_num_seq(10)))[1:]
assert process_keys(keys) == 55

The plus_num_seq is a generator. Note the usage of the yield statement. Thus it will continuously generate tuples with the first element of the tuple being the ‘+’ character and the second being the number with the number varying from values 1 through n. The chain.from_iterable flattens the generated list (it thus has 20 items for n = 10, each alternate one being the ‘+’ character starting with the first items). Since we do not need the very first ‘+’ character, I removed it using the [1:] slice operator.

Just like reduce this style of code is quite typical of functional programming. Thats something I shall detail upon much more in future posts.

Hope you enjoyed the post. Keep the feedback coming so I can better structure the subsequent posts based on the feedback.

Note: The full source for the calculator calculator.py can be accessed here.

Google Buzz Test Drive Report

Posted by – February 10, 2010

Feature Notes :

Note: Google Buzz had not yet been enabled for me through normal GMail Desktop access. These notes are based on my experience with the same by switching my Firefox user agent to iPhone and accessing the iPhone Google Buzz UI.

  • Has a very facebook / friendfeed like interface.
    • Comments : People can enter comments against a post which appear sequentially below the post.
    • Like : Has the Like feature also supported by facebook / friendfeed
    • Inline images and videos : Images and videos appear inline as a part of the stream.
    • No 140 character limitation : Or at least none that I observed
  • Each conversation becomes a web page. eg. the comment stream against my very first google buzz post.
    • The same conversation is also posted to your inbox. The conversation in the inbox gets dynamically updated as the conversation continues. A conversation that has been read earlier gets marked as unread when more comments are entered against it.
    • The conversation web page becomes an alternative location for participation. If say the conversation link is forwarded to others, they can immediately comment on (or choose to “Like”) the conversation from the web page itself.
  • Private Channeled Messaging. You can mark posts as private but directed to a group of people. These groups are maintained and managed using Google Contacts. To the best of my understanding the future conversation is also constrained to the same group. Such messages are obviously not visible publicly. This is an excellent feature for closed group interactions.
  • Directed messaging is supported using the familiar twitter ‘@’ style. However in this case you use ‘@’ followed by the entire email address. Any posts made with ‘@’ addressing also get delivered to the user’s inbox.
  • Geolocation integration is supported. Thus you can view the public posts being made by people nearby you. Since I used firefox, it used firefox for geolocation support. I imagine and presume it would be using alternative superior mechanisms for mobile based conversations.

Comparison to Twitter :

One of the reasons I test drove Google Buzz was since I was wondering if it was a better Twitter. In many ways it is. But I don’t think thats likely to be universally true. There is the 140 character simplicity to twitter which is its identifying feature. In many ways Google Buzz is a Friendfeed clone – not a twitter clone (with added inbox integration). I always found Friendfeed much superior to twitter for conversations and I suspect those who shared that perception will find Google Buzz far superior to twitter too. However it will run into the same difficulty that Friendfeed did. If the twitter user doesn’t correlate his twitter account to the friendfeed account, it is very laborious in friendfeed to add a user you follow as an imaginary user in order to have his tweets appear in your friends stream.

I think where Buzz will take on Twitter .. and big time is the really large number of people who sign on to twitter, maybe make a few tweets and then say I don’t get it and go away. I suspect most of them will “get it” on Google Buzz – especially if they have any Facebook experience (which is a large large set of people). In that sense while Twitter appealed to only a small set of people, Buzz at least has the potential to appeal to the mainstream .. and in that sense go for the jugular – Facebook’s that is.

Comparison to Facebook :

Since facebook over a period of time has cloned its update stream to be like that of Friendfeed, it should come as no surprise that the Buzz stream is likely to look and feel quite similar. However two big differences stand out. The first is the way the social graph is managed. The social graph unlike facebook is asymmetric. ie. two persons don’t have to agree to be friends to follow each other. In case of Buzz, the social graph is managed through google contacts. You can follow people who have public google profiles or follow anyone in your contacts. Thus if a person doesn’t have a public google profile or his email address is not known to you, you may not be able to follow him/her. You don’t need the other person’s permission to follow. (At least thats the impression I formed, though I could stand corrected). The other big difference is the email integration. The entire stream and conversations are integrated into the gmail account. For most knowledge workers, I think these differences are likely to make for a more positive experience. In that sense – buzz is extremely well suited to be the Facebook of knowledge workers. Whether it can compete with the trivial, frivolous, chatty social networking capabilities of Twitter or Facebook is something I couldn’t form a clear opinion on.

On carving a new market :

Based on my earlier statements, it should be obvious that I believe buzz will carve new markets which did not use Facebook / Twitter / Friendfeed earlier. These markets are :

  • People who tried twitter but didn’t get it
  • People who primarily use email for communications instead of facebook / twitter (for both Personal and Business Networking)
  • People who would like to use Facebook / Twitter like capabilities from a knowledge working / business perspective.
  • People who are just too nervous to use anything new on the net may still find the email integration a continuum of their email exchange rather than a ‘new web site’.

On Privacy :

I think Buzz definitely ranks superior on privacy, given its capability to conduct private directed closed group conversations. As an example, I can imagine many google apps installations which could use buzz for intra company conversations even as they clearly restrict even the public streams not to be visible outside the domain. While I haven’t seen any comment from Google on whether buzz would be available for google apps, I imagine its only a matter of time since it is so particularly well suited for that environment.

On google wave :

Google wave in many ways was a damp squib. It imposed a new usability paradigm even as some of the capabilities such as near real time updates were clearly superb. Google buzz reverses the wave approach by offering a different conversation model which dovetails not only into similar usability expectations as defined by the competition but goes one step beyond by integrating into the killer web application – web mail. I wouldn’t be surprised if it came to light later that Google buzz is a google wave backend with a completely new usability experience.

A note on integration :

While its only a statement of intent at this stage, google has setup a site for the Google Buzz API. However the promise is awesome in terms of multi standard support for variety of standards and protocols support including RSS, Atom, PubSubHubbub, Social Graph API, OAuth, WebFinger and Salmon. It does seem that google will want to support a variety of other alternative and mashup use cases around Google Buzz which might allow for more interesting uses to emerge (as it happened for Facebook and Twitter). Clearly a space to watch with some interest in months to come.

In summary :

Google buzz makes the strongest case for appealing to the power email users and those particularly focused on privacy (eg. knowledge working for businesses). Its strong differentiator is its mail integration. It does offer adequate features to compete with Facebook / Twitter for their existing markets as well. And joins the party with a large number of people – who already have a google account.

Software development is about the middle and not just the end

Posted by – January 14, 2010

Software development often gets similar treatment as other disciplines when it comes to matters of management. Often at a cost. And the cost comes from the fact that, the learnings and best practices gleaned from a number of disciplines are hard to apply in the field of software development, without understanding and dealing with some of the essential differences. And those differences and their implications are what this post explores.

There was an interesting post recently Why program with continuous time?.

Today I read a confusion that I’ve heard many times before about continuous time, which I’d like to bring up, in part because I love continuous time, and in part because there’s a much broader underlying principle of software design.

I don’t see why the lack of continuous streams leaves a “gap”. In the end all streams are discrete

“In the end”, yes. Just as in the end, numbers are displayed as ascii numerals. However, programming is more about the middle than the end, i.e., more about composition than about output. … Similarly, continuity in space and in time is better for composition/modularity, leaving discreteness to the output step.

The post above is completely unrelated to management. Its talking about a programming style called Functional Reactive Programming. But an idea it eloquently brings out is programming is more about the middle than the end, i.e., more about composition than about output. And thats a thought that I would like to expand upon.

While arguably in all disciplines the middle is also important, I would emphasise that it is particularly more so in software development. And that stems from a number of uncertainties that abound in software. The requirements are often not available in adequate detail, and to the extent available, are not stable. Moreover solutions to these problems are not all thought through upfront. Not because people don’t do their homework – its simply because it is actually extremely difficult to visualise all the moving parts, and then anticipate and resolve all the challenges. In fact many of the challenges don’t show up until you start programming. Thus programming often is an exercise in identifying surprises, resolving them, working out algorithms, discovering flaws with them, fixing these flaws only to find more surprises. Rinse and repeat. Basically its a significant exercise in uncertainty management. In such a situation, estimation results in underestimates since these are based on visible not actual challenges and surprises. The solution that does get applied with some success is to use historic estimation errors to compensate current estimates. (Note that thats not always obvious since the estimate could have the compensation already built in).

To add to the problem there is usually no Bridge 2.0, Lathe 2.0 or a Marketing Campaign 2.0 (and to the extent there might be some, these are far less dependent upon the quality of the work that went into the 1.0). Some programmers are continuously focused on 1.0, while many are thinking about the 10.0 even as they work on the 1.0. Optimising for 1.0 alone may not be the right way to approach an initiative. In the good old design methodology one attempted to anticipate and build for a lot of future changes. This requires a lot of extra complexity and development in terms of layering, parameterisation, open extensibility etc. In conventional agile methodologies the focus is much more on the 1.0, but there is a deliberate and definite cost introduced in terms of creation of a large number of automated test cases. These automated test cases is the assurance premium to be paid, in order to provide the comfort to undertake the necessary refactoring challenges when moving from 1.0 to 2.0. Thus whichever way one looks at it, the 1.0 is always suboptimised to allow for a more optimal 2.0, 3.0 etc. (note: I am referring to 1.0 figuratively as the current initiative and higher version numbers to future initiatives building on the current one). But selection of the necessary suboptimisations, with a view to optimise the overall sequence, requires a substantial understanding of the software development processes, the code, the tools etc.

Finally programming has a far far higher human element involved than other disciplines. After all there’s hardly any raw material dependency and the tooling is often a desktop, the cloud and set of open source libraries. Thus it is subject to the vagaries and frailties of human behaviour. Coordination, Communication, Productivity, are all held hostage by this fickle human element. While programming is not unique in this respect, achieving high team productivity is as much a function of morale, motivation, and communication as perhaps any other discipline. Thus progress is often non linearly proportional to such soft factors.

I believe there’s some benefit in evaluating software development management styles with how sports teams are managed. In sports, team performances are rarely consistent. Moreover even though team playing skills are critical, raw talent is also very important. But the analogy is most appropriate when one evaluates the game. The game is often a series of surprises reacted to by immediate on the fly action. Thats where adaptability and ability to counter attack surprises, can be effectively brought to bear. And in sports as in software development, player morale and motivation needs to be enhanced even as the player arrogance or dysfunctional behaviours are contained.

Since different sports are structured differently let me clarify the roles here. In some cases the manager and coach are different. In others they are vested in the same person. In my mind the manager’s role is resource management, broader strategy planning, impediment removal and stakeholder communication. The role of the coach is to enhance team skills, ensure fitness and training, and in case of many sports actively guide the team in refining the on field strategy when the game is in progress. Now lets consider the players. In most cases the players are living the game. They are in the middle. A substantial portion of their interesting professional time is spent in the middle. And when they are doing that, they are continuously dealing with a set of constant surprises, even as they broadly attempt to achieve the deliverables required for the end. A slight difference between sports and software development is that in case of sports the time is fixed and the score is variable. In case of software development at least in the non agile world, the score (feature set) is fixed and the time therefore becomes the variable. But the interesting takeaway from this analogy is that once the game starts, the ends become the background and the game dominates. The players and the developers are all living in the middle.

Why is this particularly relevant ? It is so because it helps us understand the implication of management styles in software development teams performance. If one wants to optimise – one needs to very well understand the middle and not just the ends. And (to come to the point) thats where I have a big issue with software developers, who end up imagining that as they rise up in the hierarchy, they somehow can stop worrying about coding and quality assurance, and instead start focusing only on managing people and deadlines. Thats a mistake. Since the game is played in the middle, you need to be able to help your team play the game. And you need to do it as a coach or a captain. For that, so long as you want to manage an ongoing game, you need to keep on playing and keep on being either sufficiently conversant about whats happening in the middle or be the captain and play the game with the team. Thinking that somehow you’ve grown in stature enough to not worry about the tactics, and focus on things increasingly strategic might actually make you unhelpful. The game requires skill and continuous adaptation – and you will need to be able to play, until you one day decide, you are no longer interested in managing the game but only the periods between.

The non playing managers who attempt to manage the game on the other hand display some interesting smells. They stop understanding the sport. They start believing that since they had played some particular sport many years ago, its easy for them to now manage any other sport. They are unable to appreciate the actual game play that happens. So when required to score a particular number of goals in 90 minutes (say in soccer), they start pushing in more team members (alas, this is a luxury not available to sports teams). Now on field coordination starts getting more difficult. The newer members being relatively junior are the ones who start getting intercepted more often (thus the team keeps on losing the ball more frequently). If the team luck is all run out, the same managers also attempt to run the tactics and gameplay selection. This is based on some fairly old set of experiences. The spectators (customers) start seeing far more confusion on the field. And the objectives change from playing the game well to just scoring the required number of goals. And in the end the spectators may get to see the number of goals promised, but only through a confused dissatisfying game, and that too sometimes only after many bouts of extra time. And soon enough once the non playing managers realise they are unable to effectively control the game play, they try to manage other variables. But really whats starting to happen is that they are hurting the gameplay, and sometimes instead of managing the game, they start managing the spectators with the game on autoplay. Suboptimisation maximised. Alas, sometimes the spectators also get used to this and start forgetting what a good game used to be like.

To summarise, ends are good to have, ends are important, ends provide the background urgency. But software development is not so much about managing the end goals as much as managing the middle. If you want to build good software and help your team build good software, make sure you continue to focus on how software is built not just the quantitative targets. Make sure you live and enjoy the middle. More often than not, the ends you achieve eventually could be amongst the best possible ends that were practical – but more importantly the customers will also go home satisfied. So long as you keep your eye on the ball and don’t lose sight of the middle. And if you want to view it a bit philosophically, in the end we are all dead. Only the middle matters.

Authors Note : This post has been substantially revised from its earlier version from a syntax, grammar and punctuation perspective. And the philosophical note (the last line) was added later.

Concise python code

Posted by – January 6, 2010

I love python for a number of reasons. Brevity is one of them. Readability is another. However writing readable concise code can take some time to getting used to. This post discusses some code snippets especially in the context of some of the comparisons done between python and clojure – another language which excels at brevity and shares yet another feature with python, ie. people either love or hate its syntax, they are rarely indifferent to it.

Please note : The intent of this post is not to compare LOC of python with other languages. It is meant to demonstrate python code implementation which helps describe the capabilities of python in terms of writing clean, concise, readable code. This post continuously refers to other posts and may require you to read the referred posts in conjunction with this post to get maximum value.

Code and ceremony

The post Java vs Clojure – Lets talk ceremony! describes a simple problem which requires 28 LOC in Java and 1 or 9 in clojure. It is a simple logic which creates a list, populates it with four values (two of them identical) and creates a list with unique items. There are two alternative clojure implementations, one in 1 line of code and another in 9 – the former using a built in construct in the language which especially helps to write a more concise logic, and the latter by writing code similar to the Java code. Without any further ado, here are the two alternative python implementations

a. By using built in capabilities – In this case we use the set collection

1
print set(("foo","bar","baz","foo"))

b. Now here’s where we will not use a built in operator which automatically chooses the unique elements but again write code similar to the java code. This code is as follows :

1
print reduce(lambda x,y:  x + [y] if y not in x else x ,("foo","bar","baz","foo"),[])

There – both of them are exactly one line long.

Project Euler solutions
The post Python vs Clojure – Evolving refers to many sample python implementations solving some of the problems documented on project euler site. We shall revisit these python implementations here.

a. Find all numbers dividable by 3 or 5 in 0 < x < 1000:
In this case the clojure code is 4 lines long where as the python implementation is 3 lines long

1
print sum(i for i in range(1,1000) if i%3 == 0 or i%5 == 0)

Again the implementation is exactly 1 line long – no lambdas, no filters etc.

b. Get the sum of all even Fibonaccis below 4 mill.

The suggested python implementation is about 17 lines long. The implementation below 7.

1
2
3
4
5
6
7
8
9
from itertools import takewhile

def fib():
    x,y = 1,1
    while True :
        x,y = y,x+y
        yield x

print sum(i for i in takewhile(lambda x : x < 4000000,fib()) if i%2 == 0)

c. Finding Palindromes
The suggested python solution uses about 23 lines of code, to 6 in clojure and takes 15 seconds almost 300% slower than the clojure code (emphasis from the post referred to).

Here’s another alternative implementation.

1
print max(x * y for x in xrange(100,1000) for y in xrange(100,1000)  if str(x*y) == str(x*y)[::-1])

Again if you notice – exactly one line of code. And incidentally on my notebook it clocks in about 0.85 seconds which makes the clojure implementation about 470% slower than the python implementation. (emphasis mine)

Top Rank Per Group

In yet another post Python vs Clojure – Reloaded, the author discusses the Rosetta Top Rank Per Group implementation of python. Unsurprisingly the python implementation uses 11 lines of code (not including the initialisation) compared to clojure’s 6.

One of the commenters suggests an eminently readable and simple python solution in about 6 lines of code. The code snippet is as follows

1
2
3
4
5
6
7
8
9
10
11
from itertools import groupby
from operator import itemgetter

data = [dict(name="Tyler Bennett", id="E10297", salary=32000, department="D101"), ...]

department = itemgetter('department')
for dep, persons in groupby(sorted(data, key=department), department):
    print "\nDepartment", dep
    print " Employee Name   Employee ID     Salary          Department"      
    for person in sorted(persons, key=itemgetter('salary'):
        print "%(name)-15s %(id)-15s %(salary)-15s %(department)-15s"

An alternative far less readable solutions which shaves off one more line but certainly not preferred to the solution above (am just listing it since it shaves off the step of having to create an intermediate data structure) is

1
2
3
4
5
6
7
8
9
10
import operator
from itertools import groupby
   
data = [('Employee Name', 'Employee ID', 'Salary', 'Department'),....]

print reduce(lambda x,y : x + y,
         (('\nDepartment %s\n  Employee Name   Employee ID     Salary          Department  ' % dept +
           reduce(lambda x,y : x + ("\n  " + "%-15s " * len(data[0]) % y),
                  sorted(recs, key=lambda rec: -rec[-2])[:3],''))
         for dept, recs in groupby(sorted(data[1:],key=operator.itemgetter(3)), lambda x : x[3])),'')

Some might wonder whether I am joining multiple lines of code into one just to reduce the line count. The code is consistent with the idiomatic usage of python and also with the way python list comprehensions are documented in the formal python proposal for list comprehensions PEP 202 : List Comprehensions. Besides python is a whitespace sensitive language and the compiler disallows multiple lines of code in one line without using the ‘;’ separator which I have most definitely not used in the suggested solutions.

PS: There’s of course an interesting image at the end of the post Python vs Clojure – Reloaded indicating the perception of how clojure code speaks for itself ;) . The reason I mention it is that I believe if one is expressing a strong opinion as the one reflected by the picture, one better be far more careful and thorough with verifying the veracity of the assertions appropriateness of the samples compared to. Let me clearly state that I do not know that the clojure loc compared to is an appropriate target so do not infer this post to be a Python vs Clojure LOC as much as a post which emphasises that Python is a good vehicle to write concise readable code. FWIW I am learning clojure and look forward to building more competence with it.

Update: There is another nice post by Stephan Schmidt – Anatomy of a Flawed Clojure vs. Scala LOC Comparison which does reflect an opinion in the context of Scala and a post authored on the same blog as the ones I refer to above.

Post conference recap : The 4th Indicthreads.com conference on Java Technology

Posted by – December 14, 2009

The annual indicthreads.com java technology conference is Pune’s best and possibly one of India’s finest conferences on matters related to Java technologies. I looked forward to attending the same and was not disappointed a bit. The last one was held about 3 days ago on Dec 11th and 12th, and this post reviews my experiences at the same.

As with any other conference usually something or the other isn’t quite working well in the morning, so I soon discovered we had a difficulty with the wireless network being swamped by the usage. There were some important downloads that needed to be completed, so my early morning was spent attempting to get these done .. which meant I missed most of Harshad Oak’s opening session on Java Today.

The next one i attended was Groovy & Grails as a modern scripting language for Web applications by Rohit Nayak. However I soon discovered that it (at least initially) seemed to be a small demo on how to build applications using grails. Since that was something I was familiar with, I moved to the alternative track in progress.

The one I switched to even as it was in progress was Java EE 6: Paving the path for the future by Arun Gupta. Arun had come down from Santa Clara to talk about the new Java EE6 spec and its implementation by Glassfish. Arun talked about a number of additional or changed features in Java EE6 in sufficient detail for anyone who got excited by them to go explore these in further detail. These included web fragments, web profile, EJB 3.1 lite, increased usage of annotations leading to web.xml now being optional, and a number of points on specific JSRs now a part of Java EE6. Some of the things that excited me more about Glassfish were, (a) OSGi modularisation and programmatic control of specific containers (eg Servlet, JRuby/Rails etc.), embeddability, lightweight monitoring. However the one that excited me the most was the support for hot deployment of web apps for development mode by allowing the IDEs to automatically notify the running web app which in turn automatically reloaded the modified classes (even as the sessions continued to be valid). The web app restart cycle in addition to the compile cycle was alway one of my biggest gripes with Java (second only to its verbosity) and that seemed to be going away.

I subsequently attended Getting started with Scala by Mushtaq Ahmed from Thoughtworks. Mushtaq is a business analyst and not a professional programmer, but has been keenly following the developments in Scala for a couple of years (and as I later learnt a bit with Clojure as well). Unlike a typical language capability survey, he talked only about using the language for specific use cases, a decision which I thought made the presentation extremely useful and interesting. The topics he picked up were (a) Functional Programming, (b) DSL building and (c) OOP only if time permitted. He started with an example of programming/modeling the Mars Rover movements and using functions and higher order functions to do the same. Looking back I think he spent lesser time on transitioning from the requirements into the code constructs and in terms of what he was specifically setting out to do in terms of higher order functions. However the demonstrated code was nevertheless interesting and showed some of the power of Scala when used to write primarily function oriented code. The next example he picked up was a Parking Lot attendant problem where he started with a Java code which was a typical implementation of the strategy pattern. He later took it through 7-8 alternative increasingly functional implementations using Scala. This one was much easier to understand and yet again demonstrated the power of Scala quite well in terms of functional programming. Onto DSLs, Mushtaq wrote a simple implementation of a “mywhile” which was a classical “while” loop as an example of using Scala for writing internal DSLs. Finally he demonstrated the awesome power of using the built in support for parser combinators for writing an external DSL, and also showed how a particular google code of summer problem could be solved using Scala (again for writing an external DSL). A very useful and thoroughly enjoyable talk.

The brave speaker for the post lunch session was Rajeev Palanki who dealt both with overall IBM directions on Java and a little about MyDeveloperworks site. In his opinion he thought Java was now (post JDK 1.4) on the plateau of productivity after all the early hype and IBM now focused on Scaling up, Scaling down (making it easier to use at the lower end), Open Innovation (allow for more community driven innovation) and Real Time Java. He emphasised IBMs support to make Java more predictable for real time apps and stated that Java was now usable for Mission Critical applications referring to the fact that Java was now used in a USS Destroyer. He referred to IBMs focus on investing in Java Tooling that worked across different JRE implementations. Tools such as GCMV, MAT, and Java Diagnostic Collector. Finally he talked about the IBM MyDeveloperWorks site at one stage referring to it as the Facebook for Geeks.

The next session was Overview of Scala Based Lift Web Framework by Vikas Hazarati, Director, Technology at Xebia. Another thoroughly enjoyable session. Vikas dealt with a lot of aspects related to the Lift web framework including various aspects related to the mapper, the snippets, usage of actors for comet support etc. I was especially intrigued by Snippets which act as a bridge between the UI and the business logic have a separate abstraction for themselves in the framework and how the construct and functionality in that layer is treated so differently from other frameworks.

I subsequently attended Concurrency: Best Practices by Pramod Nagaraja who works on the IBM JRE and owns the java.nio packages (I think I heard him say owns). He talked about various aspects and best practices related to concurrency and one of the better aspects of the talk was how seemingly safe code can also end up being unsafe. However he finished his session well in time for me to quickly run over and attend the latter half of the next presentation.

Arun Gupta conducted the session Dynamic Languages & Web Frameworks in GlassFish which referred to the support for various non java environments in Glassfish including those for Grails/Groovy, Rails/JRuby, Django/Python et. al. The impression I got was Glassfish is being extremely serious about support for the non java applications as well and is dedicating substantial efforts to make Glassfish the preferred platform for such applications as well. Arun’s blog Miles to go … is most informative for a variety of topics related to Glassfish for both Java and non Java related aspects.

The last talk I attended during the day was Experiences of Fully Distributed Scrum between San Francisco and Gurgaon by Narinder Kumar, again from Xebia. Since a few in the audience were still not aware of agile methodologies (Gasp!), Narinder gave a high level overview of the same before proceeding down the specific set of challenges his team had faced in implementing scrum in a scenario where one team was based in Gurgaon, India and another in San Fransciso, US. To be explicit, he wasn’t describing the typical scrum of scrum approaches but was instead describing a mechanism wherein the entire set of distributed teams would be treated as a single team with a single backlog and common ownership. This required some adjustments such as a meeting where only one person from one of the locations and all from another would take part in a scrum meeting in situations where there were no overlapping working hours. There were a few other such adjustments to the process also described. The presentation ended with some strong metrics which represented how productivity was maintained even as the activities moved from a single location to a distributed model. Both during the presentation and subsequently Narinder described some impressive associations with senior Scrum visionaries and also some serious interest in their modified approach from some important companies. However one limitation I could think of the model was, that it was probably better geared to work where you had developers only in one of the two locations (offshoring). I perceived the model as a little difficult to work if developers were located across all locations (though that could end up being just my view).

The second day started with a Panel Discussion on the topic Turning the Corner between Arun Gupta, Rohit Nayak, Dhananjay Nene (thats yours truly) and moderated by Harshad Oak. It was essentially a discussion about how we saw some of the java and even many non java related technologies evolving over the next few years. I think suffice to say one of the strong agreements clearly was the arrival of Java the polyglot platform as compared to Java the language.

The next session was Developing, deploying and monitoring Java applications using Google App Engine by Narinder Kumar. A very useful session describing the characteristics, opportunities and challenges with using Google App Engine as the deployment platform for Java based applications. One of the take away from the sessions was that subject to specific constraints, it was possible to use GAE as the deployment platform without creating substantial lockins since many of the Java APIs were supported by GAE. However there are a few gotchas along the way in terms of specific constraints eg. using Joins etc.

I must confess at having been a little disappointed with Automating the JEE deployment process by Vikas Hazrati. He went to great depths in terms of what all considerations a typical J2EE deployment monitoring tool should take care of, and clearly demonstrated having spent a lot of time in thinking through many of the issues. However the complexities he started addressing started to get into realms which only a professional J2EE deployment tool writer would get into. That made the talk a little less interesting for me. Besides there was another interesting talk going on simultaneously which I was keen on attending as well.

The other talk I switched to half way was Create Appealing Cross-device Applications for Mobile Devices with Java ME and LWUIT by Biswajit Sarkar (who’s also written a book on the same topic). While keeping things simple, Biswajit explained the capabilities of Java ME. He also described LWUIT which allowed creation of largely similar UI across different mobile platforms. He explained that while the default Java ME used native rendering leading to differing look and feel across mobile handsets just like Java AWT, using LWUIT allowed for a Java Swing like approach where the rendering was performed by the LWUIT library (did he say around 300kb??) thus allowing for a more uniform look and feel. He also showed sample programs and how they worked using LWUIT.

Allahbaksh Asadullah then conducted the session on Implementing Search Functionality With Lucene & Solr, where he talked about the characteristics and usage of Lucene and Solr. It was very explicitly addressed at the very beginners to the topic (an audience I could readily identify myself with) and walked us through the various characteristics of search, the different abstractions, how these abstractions are modeled through the API and how some of these could be overridden to implement custom logic.

How Android is different from other systems – An exploration of the design decisions in Android by Navin Kabra was a session I skipped. However I had attended a similar session by him earlier so hopefully I did not miss much.

However Navin did contribute occasionally into the next session Java For Mobile Devices – Building a client application for the Android platform by Rohit Nayak. Rohit demonstrated an application he is working on along with a lot of the code that forms the application using Eclipse and the Android plugin. A useful insight into how an Android application is constructed.

As the event drew to a close, the prizes were announced including those for the Indicthreads Go Green initiative. A thoroughly enjoyable event, leaving me even more convinced to make sure to attend the next years session making it a third in a row.

Rinse and Repeat with TDD

Posted by – November 25, 2009

I must confess having had a awed love relationship with TDD (Test Driven Development). This is the style where you first write the test cases and then write the code to make them pass. I’ve always felt awed by it. I have always firmly believed that it is the right way to offer sustainable quality. As someone who hasn’t been particularly awed by separate QA teams, I have always imagined TDD to be the silver bullet to offer controlled, sustained, high quality. Well, the silver bullet is a bit of an exaggeration, but sounded nice while I was on a roll :) . A number of times I wrote automated test cases post facto. But when the rubber met the road – I didn’t ever implement TDD.

Its probably a function of style. I like to write a piece of code and then keep on testing it against different scenarios and often keep on remoulding it substantially. My focus is at this stage on the code structure – the design aspects. Trying to do it with TDD always created a mess. The amount of intense refactoring I do always meant I soon got tired of also simultaneously refactoring the test cases and gave up on them soon.

Anyways, I finally think I found a formula that works – at least for me. I do not write production code in the first pass. I just write prototype code. I keep on playing with it, shaping it, challenging it, cajoling it, recasting it. Until I’m satisfied with the overall structure and internal design. Now when thats done, I rewrite it. Completely. Umm, I copy/paste maybe 40-50% of the code. But this time I write it on a completely new fresh set of files. And I write the test cases before I write the code. I find I can now really write tons of test cases and code quite rapidly. And with the tremendous satisfaction that at the end of the day, I now have the control harness to be able to keep on adding new features without having to continuously worry if I broke something else. With substantially detailed test cases, now I can actually feel quite confident that if I did break something – I’ll get to know it. Not next month, not next week, not even tomorrow – in the next few minutes.

So this approach seems to be working for me – the initial efforts have delivered great results and I’m quite satisfied. Now just wondering what to call it – for lack of a better term I shall term it “Rinse and Repeat with TDD”.

So if TDD works for you – great. You have one of the most important coding disciplines nailed down. But if you’re wanting to try to do TDD but haven’t been able to successfully implement it – you may just want to check out the Rinse and Repeat with TDD style. Who knows, you might find it useful too.

Five important trends on the enterprise architect’s radar

Posted by – November 2, 2009

It is no secret that the internet architectures are influencing enterprise architectures. This post attempts to summarise some of the recent trends in the internet space, which seem to be carrying some momentum sufficient enough to influence the enterprise. So without further ado, these trends are :

  1. REST : The Representational State Transfer architecture style builds on the essential elements of those constructs which made the internet so globally scalable. A detailed explanation of the rationale and strengths of REST are completely beyond the scope of this article. If your job requires you to be continuously aware of emergent trends and whether they fit your enterprise architecture needs – this is the one must explore trend.

    Impact : Web based architectures, Service Oriented Architectures, wide availability and immediate usability of data and processing requests (resources) through simple HTTP URIs and minimal integration effort

  2. Interoperable Cloud : The interoperable cloud is the ability to create a private cloud and also leverage a public cloud. This has been made possible by offerings such as the Ubuntu Enterprise Cloud which allows you to build a private cloud or use a public cloud such as Amazon EC2 while being able to access them using the same set of APIs thanks to open source efforts such as Eucalyptus. This allows you the flexibility of initially using either a private or public cloud and then subsequently shifting to the other, or being able to use both simultaneously.

    Impact : Large servers vs cluster of commodity servers, virtualisation, elastic deployments, flexible hardware procurement / provisioning, infrastructure management in organisational hierarchy.

  3. NoSQL : While I am unhappy with the name, it has stuck. This refers to a set of options now available to store your data unconstrained by many RDBMS requirements (eg. flexible schema, key value pairs etc.). Some of the databases also allow you to store data in a distributed manner over a number of servers with an intent to support high availability in write intense scenarios even as they may require you to move towards eventual consistency. These options increase your manouverability / flexibility as an architect even as they require you to meet a different set of challenges.

    Impact : Relational databases, data storage strategies, data distribution strategies, vertical vs. horizontal scalability, transactionalisation, consistency and availability

  4. Polyglotism : Developer costs now occupy an increasing percentage of total costs, development time is being an increasingly dominant factor for time to market, and ability of software to change and adapt quickly to newer demands is now a critical success metric. One of the solutions is to write different parts of the software in a different languages most appropriately suited for concise and rapid coding as well as supporting quick reaction changes to each part appropriately. Thus it is conceivable to have some of the business rules written in a dsl written using jruby and some of the algorithms written in clojure in a software built on the JEE platform.

    Impact :Development culture and processes, minimum developer skill and scalability, risk management for managing required vs. available skills.

  5. Decentralised processing : Thanks to many developments which are leading to increasingly distributed processing including REST and NoSQL, applications will need to be a set of collaborating network based components (we’ve heard this before with distributed objects as well). However especially given some of the lesser guarantees that such architectures can provide around immediate guaranteed processing, latency issues, distributed control and asynchronous processing, a particular piece of business logic may get satisfied in a staggered fashion across a number of collaborating components. This may increase challenges in terms of currency of available data even as it helps actually deliver on the vision of distributed objects and simplifies individual component development. While asynchronous capabilities such as those supported by MQ series and the like have been used in the enterprise for ages, I do anticipate increasing use of lighter messaging constructs such as PubSubHubbub within the enterprise.

    Impact : Application partitioning, network based components, difficulty in supporting fully synchronous workflows.