January, 2010


14
Jan 10

Software development is about the middle and not just the end

Software development often gets similar treatment as other disciplines when it comes to matters of management. Often at a cost. And the cost comes from the fact that, the learnings and best practices gleaned from a number of disciplines are hard to apply in the field of software development, without understanding and dealing with some of the essential differences. And those differences and their implications are what this post explores.

There was an interesting post recently Why program with continuous time?.

Today I read a confusion that I’ve heard many times before about continuous time, which I’d like to bring up, in part because I love continuous time, and in part because there’s a much broader underlying principle of software design.

I don’t see why the lack of continuous streams leaves a “gap”. In the end all streams are discrete

“In the end”, yes. Just as in the end, numbers are displayed as ascii numerals. However, programming is more about the middle than the end, i.e., more about composition than about output. … Similarly, continuity in space and in time is better for composition/modularity, leaving discreteness to the output step.

The post above is completely unrelated to management. Its talking about a programming style called Functional Reactive Programming. But an idea it eloquently brings out is programming is more about the middle than the end, i.e., more about composition than about output. And thats a thought that I would like to expand upon.

While arguably in all disciplines the middle is also important, I would emphasise that it is particularly more so in software development. And that stems from a number of uncertainties that abound in software. The requirements are often not available in adequate detail, and to the extent available, are not stable. Moreover solutions to these problems are not all thought through upfront. Not because people don’t do their homework – its simply because it is actually extremely difficult to visualise all the moving parts, and then anticipate and resolve all the challenges. In fact many of the challenges don’t show up until you start programming. Thus programming often is an exercise in identifying surprises, resolving them, working out algorithms, discovering flaws with them, fixing these flaws only to find more surprises. Rinse and repeat. Basically its a significant exercise in uncertainty management. In such a situation, estimation results in underestimates since these are based on visible not actual challenges and surprises. The solution that does get applied with some success is to use historic estimation errors to compensate current estimates. (Note that thats not always obvious since the estimate could have the compensation already built in).

To add to the problem there is usually no Bridge 2.0, Lathe 2.0 or a Marketing Campaign 2.0 (and to the extent there might be some, these are far less dependent upon the quality of the work that went into the 1.0). Some programmers are continuously focused on 1.0, while many are thinking about the 10.0 even as they work on the 1.0. Optimising for 1.0 alone may not be the right way to approach an initiative. In the good old design methodology one attempted to anticipate and build for a lot of future changes. This requires a lot of extra complexity and development in terms of layering, parameterisation, open extensibility etc. In conventional agile methodologies the focus is much more on the 1.0, but there is a deliberate and definite cost introduced in terms of creation of a large number of automated test cases. These automated test cases is the assurance premium to be paid, in order to provide the comfort to undertake the necessary refactoring challenges when moving from 1.0 to 2.0. Thus whichever way one looks at it, the 1.0 is always suboptimised to allow for a more optimal 2.0, 3.0 etc. (note: I am referring to 1.0 figuratively as the current initiative and higher version numbers to future initiatives building on the current one). But selection of the necessary suboptimisations, with a view to optimise the overall sequence, requires a substantial understanding of the software development processes, the code, the tools etc.

Finally programming has a far far higher human element involved than other disciplines. After all there’s hardly any raw material dependency and the tooling is often a desktop, the cloud and set of open source libraries. Thus it is subject to the vagaries and frailties of human behaviour. Coordination, Communication, Productivity, are all held hostage by this fickle human element. While programming is not unique in this respect, achieving high team productivity is as much a function of morale, motivation, and communication as perhaps any other discipline. Thus progress is often non linearly proportional to such soft factors.

I believe there’s some benefit in evaluating software development management styles with how sports teams are managed. In sports, team performances are rarely consistent. Moreover even though team playing skills are critical, raw talent is also very important. But the analogy is most appropriate when one evaluates the game. The game is often a series of surprises reacted to by immediate on the fly action. Thats where adaptability and ability to counter attack surprises, can be effectively brought to bear. And in sports as in software development, player morale and motivation needs to be enhanced even as the player arrogance or dysfunctional behaviours are contained.

Since different sports are structured differently let me clarify the roles here. In some cases the manager and coach are different. In others they are vested in the same person. In my mind the manager’s role is resource management, broader strategy planning, impediment removal and stakeholder communication. The role of the coach is to enhance team skills, ensure fitness and training, and in case of many sports actively guide the team in refining the on field strategy when the game is in progress. Now lets consider the players. In most cases the players are living the game. They are in the middle. A substantial portion of their interesting professional time is spent in the middle. And when they are doing that, they are continuously dealing with a set of constant surprises, even as they broadly attempt to achieve the deliverables required for the end. A slight difference between sports and software development is that in case of sports the time is fixed and the score is variable. In case of software development at least in the non agile world, the score (feature set) is fixed and the time therefore becomes the variable. But the interesting takeaway from this analogy is that once the game starts, the ends become the background and the game dominates. The players and the developers are all living in the middle.

Why is this particularly relevant ? It is so because it helps us understand the implication of management styles in software development teams performance. If one wants to optimise – one needs to very well understand the middle and not just the ends. And (to come to the point) thats where I have a big issue with software developers, who end up imagining that as they rise up in the hierarchy, they somehow can stop worrying about coding and quality assurance, and instead start focusing only on managing people and deadlines. Thats a mistake. Since the game is played in the middle, you need to be able to help your team play the game. And you need to do it as a coach or a captain. For that, so long as you want to manage an ongoing game, you need to keep on playing and keep on being either sufficiently conversant about whats happening in the middle or be the captain and play the game with the team. Thinking that somehow you’ve grown in stature enough to not worry about the tactics, and focus on things increasingly strategic might actually make you unhelpful. The game requires skill and continuous adaptation – and you will need to be able to play, until you one day decide, you are no longer interested in managing the game but only the periods between.

The non playing managers who attempt to manage the game on the other hand display some interesting smells. They stop understanding the sport. They start believing that since they had played some particular sport many years ago, its easy for them to now manage any other sport. They are unable to appreciate the actual game play that happens. So when required to score a particular number of goals in 90 minutes (say in soccer), they start pushing in more team members (alas, this is a luxury not available to sports teams). Now on field coordination starts getting more difficult. The newer members being relatively junior are the ones who start getting intercepted more often (thus the team keeps on losing the ball more frequently). If the team luck is all run out, the same managers also attempt to run the tactics and gameplay selection. This is based on some fairly old set of experiences. The spectators (customers) start seeing far more confusion on the field. And the objectives change from playing the game well to just scoring the required number of goals. And in the end the spectators may get to see the number of goals promised, but only through a confused dissatisfying game, and that too sometimes only after many bouts of extra time. And soon enough once the non playing managers realise they are unable to effectively control the game play, they try to manage other variables. But really whats starting to happen is that they are hurting the gameplay, and sometimes instead of managing the game, they start managing the spectators with the game on autoplay. Suboptimisation maximised. Alas, sometimes the spectators also get used to this and start forgetting what a good game used to be like.

To summarise, ends are good to have, ends are important, ends provide the background urgency. But software development is not so much about managing the end goals as much as managing the middle. If you want to build good software and help your team build good software, make sure you continue to focus on how software is built not just the quantitative targets. Make sure you live and enjoy the middle. More often than not, the ends you achieve eventually could be amongst the best possible ends that were practical – but more importantly the customers will also go home satisfied. So long as you keep your eye on the ball and don’t lose sight of the middle. And if you want to view it a bit philosophically, in the end we are all dead. Only the middle matters.

Authors Note : This post has been substantially revised from its earlier version from a syntax, grammar and punctuation perspective. And the philosophical note (the last line) was added later.


6
Jan 10

Concise python code

I love python for a number of reasons. Brevity is one of them. Readability is another. However writing readable concise code can take some time to getting used to. This post discusses some code snippets especially in the context of some of the comparisons done between python and clojure – another language which excels at brevity and shares yet another feature with python, ie. people either love or hate its syntax, they are rarely indifferent to it.

Please note : The intent of this post is not to compare LOC of python with other languages. It is meant to demonstrate python code implementation which helps describe the capabilities of python in terms of writing clean, concise, readable code. This post continuously refers to other posts and may require you to read the referred posts in conjunction with this post to get maximum value.

Code and ceremony

The post Java vs Clojure – Lets talk ceremony! describes a simple problem which requires 28 LOC in Java and 1 or 9 in clojure. It is a simple logic which creates a list, populates it with four values (two of them identical) and creates a list with unique items. There are two alternative clojure implementations, one in 1 line of code and another in 9 – the former using a built in construct in the language which especially helps to write a more concise logic, and the latter by writing code similar to the Java code. Without any further ado, here are the two alternative python implementations

a. By using built in capabilities – In this case we use the set collection

1
print set(("foo","bar","baz","foo"))

b. Now here’s where we will not use a built in operator which automatically chooses the unique elements but again write code similar to the java code. This code is as follows :

1
print reduce(lambda x,y:  x + [y] if y not in x else x ,("foo","bar","baz","foo"),[])

There – both of them are exactly one line long.

Project Euler solutions
The post Python vs Clojure – Evolving refers to many sample python implementations solving some of the problems documented on project euler site. We shall revisit these python implementations here.

a. Find all numbers dividable by 3 or 5 in 0 < x < 1000:
In this case the clojure code is 4 lines long where as the python implementation is 3 lines long

1
print sum(i for i in range(1,1000) if i%3 == 0 or i%5 == 0)

Again the implementation is exactly 1 line long – no lambdas, no filters etc.

b. Get the sum of all even Fibonaccis below 4 mill.

The suggested python implementation is about 17 lines long. The implementation below 7.

1
2
3
4
5
6
7
8
9
from itertools import takewhile

def fib():
    x,y = 1,1
    while True :
        x,y = y,x+y
        yield x

print sum(i for i in takewhile(lambda x : x < 4000000,fib()) if i%2 == 0)

c. Finding Palindromes
The suggested python solution uses about 23 lines of code, to 6 in clojure and takes 15 seconds almost 300% slower than the clojure code (emphasis from the post referred to).

Here’s another alternative implementation.

1
print max(x * y for x in xrange(100,1000) for y in xrange(100,1000)  if str(x*y) == str(x*y)[::-1])

Again if you notice – exactly one line of code. And incidentally on my notebook it clocks in about 0.85 seconds which makes the clojure implementation about 470% slower than the python implementation. (emphasis mine)

Top Rank Per Group

In yet another post Python vs Clojure – Reloaded, the author discusses the Rosetta Top Rank Per Group implementation of python. Unsurprisingly the python implementation uses 11 lines of code (not including the initialisation) compared to clojure’s 6.

One of the commenters suggests an eminently readable and simple python solution in about 6 lines of code. The code snippet is as follows

1
2
3
4
5
6
7
8
9
10
11
from itertools import groupby
from operator import itemgetter

data = [dict(name="Tyler Bennett", id="E10297", salary=32000, department="D101"), ...]

department = itemgetter('department')
for dep, persons in groupby(sorted(data, key=department), department):
    print "\nDepartment", dep
    print " Employee Name   Employee ID     Salary          Department"      
    for person in sorted(persons, key=itemgetter('salary'):
        print "%(name)-15s %(id)-15s %(salary)-15s %(department)-15s"

An alternative far less readable solutions which shaves off one more line but certainly not preferred to the solution above (am just listing it since it shaves off the step of having to create an intermediate data structure) is

1
2
3
4
5
6
7
8
9
10
import operator
from itertools import groupby
   
data = [('Employee Name', 'Employee ID', 'Salary', 'Department'),....]

print reduce(lambda x,y : x + y,
         (('\nDepartment %s\n  Employee Name   Employee ID     Salary          Department  ' % dept +
           reduce(lambda x,y : x + ("\n  " + "%-15s " * len(data[0]) % y),
                  sorted(recs, key=lambda rec: -rec[-2])[:3],''))
         for dept, recs in groupby(sorted(data[1:],key=operator.itemgetter(3)), lambda x : x[3])),'')

Some might wonder whether I am joining multiple lines of code into one just to reduce the line count. The code is consistent with the idiomatic usage of python and also with the way python list comprehensions are documented in the formal python proposal for list comprehensions PEP 202 : List Comprehensions. Besides python is a whitespace sensitive language and the compiler disallows multiple lines of code in one line without using the ‘;’ separator which I have most definitely not used in the suggested solutions.

PS: There’s of course an interesting image at the end of the post Python vs Clojure – Reloaded indicating the perception of how clojure code speaks for itself ;) . The reason I mention it is that I believe if one is expressing a strong opinion as the one reflected by the picture, one better be far more careful and thorough with verifying the veracity of the assertions appropriateness of the samples compared to. Let me clearly state that I do not know that the clojure loc compared to is an appropriate target so do not infer this post to be a Python vs Clojure LOC as much as a post which emphasises that Python is a good vehicle to write concise readable code. FWIW I am learning clojure and look forward to building more competence with it.

Update: There is another nice post by Stephan Schmidt – Anatomy of a Flawed Clojure vs. Scala LOC Comparison which does reflect an opinion in the context of Scala and a post authored on the same blog as the ones I refer to above.