Software development is about the middle and not just the end

Posted by – January 14, 2010

Software development often gets similar treatment as other disciplines when it comes to matters of management. Often at a cost. And the cost comes from the fact that, the learnings and best practices gleaned from a number of disciplines are hard to apply in the field of software development, without understanding and dealing with some of the essential differences. And those differences and their implications are what this post explores.

There was an interesting post recently Why program with continuous time?.

Today I read a confusion that I’ve heard many times before about continuous time, which I’d like to bring up, in part because I love continuous time, and in part because there’s a much broader underlying principle of software design.

I don’t see why the lack of continuous streams leaves a “gap”. In the end all streams are discrete

“In the end”, yes. Just as in the end, numbers are displayed as ascii numerals. However, programming is more about the middle than the end, i.e., more about composition than about output. … Similarly, continuity in space and in time is better for composition/modularity, leaving discreteness to the output step.

The post above is completely unrelated to management. Its talking about a programming style called Functional Reactive Programming. But an idea it eloquently brings out is programming is more about the middle than the end, i.e., more about composition than about output. And thats a thought that I would like to expand upon.

While arguably in all disciplines the middle is also important, I would emphasise that it is particularly more so in software development. And that stems from a number of uncertainties that abound in software. The requirements are often not available in adequate detail, and to the extent available, are not stable. Moreover solutions to these problems are not all thought through upfront. Not because people don’t do their homework – its simply because it is actually extremely difficult to visualise all the moving parts, and then anticipate and resolve all the challenges. In fact many of the challenges don’t show up until you start programming. Thus programming often is an exercise in identifying surprises, resolving them, working out algorithms, discovering flaws with them, fixing these flaws only to find more surprises. Rinse and repeat. Basically its a significant exercise in uncertainty management. In such a situation, estimation results in underestimates since these are based on visible not actual challenges and surprises. The solution that does get applied with some success is to use historic estimation errors to compensate current estimates. (Note that thats not always obvious since the estimate could have the compensation already built in).

To add to the problem there is usually no Bridge 2.0, Lathe 2.0 or a Marketing Campaign 2.0 (and to the extent there might be some, these are far less dependent upon the quality of the work that went into the 1.0). Some programmers are continuously focused on 1.0, while many are thinking about the 10.0 even as they work on the 1.0. Optimising for 1.0 alone may not be the right way to approach an initiative. In the good old design methodology one attempted to anticipate and build for a lot of future changes. This requires a lot of extra complexity and development in terms of layering, parameterisation, open extensibility etc. In conventional agile methodologies the focus is much more on the 1.0, but there is a deliberate and definite cost introduced in terms of creation of a large number of automated test cases. These automated test cases is the assurance premium to be paid, in order to provide the comfort to undertake the necessary refactoring challenges when moving from 1.0 to 2.0. Thus whichever way one looks at it, the 1.0 is always suboptimised to allow for a more optimal 2.0, 3.0 etc. (note: I am referring to 1.0 figuratively as the current initiative and higher version numbers to future initiatives building on the current one). But selection of the necessary suboptimisations, with a view to optimise the overall sequence, requires a substantial understanding of the software development processes, the code, the tools etc.

Finally programming has a far far higher human element involved than other disciplines. After all there’s hardly any raw material dependency and the tooling is often a desktop, the cloud and set of open source libraries. Thus it is subject to the vagaries and frailties of human behaviour. Coordination, Communication, Productivity, are all held hostage by this fickle human element. While programming is not unique in this respect, achieving high team productivity is as much a function of morale, motivation, and communication as perhaps any other discipline. Thus progress is often non linearly proportional to such soft factors.

I believe there’s some benefit in evaluating software development management styles with how sports teams are managed. In sports, team performances are rarely consistent. Moreover even though team playing skills are critical, raw talent is also very important. But the analogy is most appropriate when one evaluates the game. The game is often a series of surprises reacted to by immediate on the fly action. Thats where adaptability and ability to counter attack surprises, can be effectively brought to bear. And in sports as in software development, player morale and motivation needs to be enhanced even as the player arrogance or dysfunctional behaviours are contained.

Since different sports are structured differently let me clarify the roles here. In some cases the manager and coach are different. In others they are vested in the same person. In my mind the manager’s role is resource management, broader strategy planning, impediment removal and stakeholder communication. The role of the coach is to enhance team skills, ensure fitness and training, and in case of many sports actively guide the team in refining the on field strategy when the game is in progress. Now lets consider the players. In most cases the players are living the game. They are in the middle. A substantial portion of their interesting professional time is spent in the middle. And when they are doing that, they are continuously dealing with a set of constant surprises, even as they broadly attempt to achieve the deliverables required for the end. A slight difference between sports and software development is that in case of sports the time is fixed and the score is variable. In case of software development at least in the non agile world, the score (feature set) is fixed and the time therefore becomes the variable. But the interesting takeaway from this analogy is that once the game starts, the ends become the background and the game dominates. The players and the developers are all living in the middle.

Why is this particularly relevant ? It is so because it helps us understand the implication of management styles in software development teams performance. If one wants to optimise – one needs to very well understand the middle and not just the ends. And (to come to the point) thats where I have a big issue with software developers, who end up imagining that as they rise up in the hierarchy, they somehow can stop worrying about coding and quality assurance, and instead start focusing only on managing people and deadlines. Thats a mistake. Since the game is played in the middle, you need to be able to help your team play the game. And you need to do it as a coach or a captain. For that, so long as you want to manage an ongoing game, you need to keep on playing and keep on being either sufficiently conversant about whats happening in the middle or be the captain and play the game with the team. Thinking that somehow you’ve grown in stature enough to not worry about the tactics, and focus on things increasingly strategic might actually make you unhelpful. The game requires skill and continuous adaptation – and you will need to be able to play, until you one day decide, you are no longer interested in managing the game but only the periods between.

The non playing managers who attempt to manage the game on the other hand display some interesting smells. They stop understanding the sport. They start believing that since they had played some particular sport many years ago, its easy for them to now manage any other sport. They are unable to appreciate the actual game play that happens. So when required to score a particular number of goals in 90 minutes (say in soccer), they start pushing in more team members (alas, this is a luxury not available to sports teams). Now on field coordination starts getting more difficult. The newer members being relatively junior are the ones who start getting intercepted more often (thus the team keeps on losing the ball more frequently). If the team luck is all run out, the same managers also attempt to run the tactics and gameplay selection. This is based on some fairly old set of experiences. The spectators (customers) start seeing far more confusion on the field. And the objectives change from playing the game well to just scoring the required number of goals. And in the end the spectators may get to see the number of goals promised, but only through a confused dissatisfying game, and that too sometimes only after many bouts of extra time. And soon enough once the non playing managers realise they are unable to effectively control the game play, they try to manage other variables. But really whats starting to happen is that they are hurting the gameplay, and sometimes instead of managing the game, they start managing the spectators with the game on autoplay. Suboptimisation maximised. Alas, sometimes the spectators also get used to this and start forgetting what a good game used to be like.

To summarise, ends are good to have, ends are important, ends provide the background urgency. But software development is not so much about managing the end goals as much as managing the middle. If you want to build good software and help your team build good software, make sure you continue to focus on how software is built not just the quantitative targets. Make sure you live and enjoy the middle. More often than not, the ends you achieve eventually could be amongst the best possible ends that were practical – but more importantly the customers will also go home satisfied. So long as you keep your eye on the ball and don’t lose sight of the middle. And if you want to view it a bit philosophically, in the end we are all dead. Only the middle matters.

Authors Note : This post has been substantially revised from its earlier version from a syntax, grammar and punctuation perspective. And the philosophical note (the last line) was added later.

Concise python code

Posted by – January 6, 2010

I love python for a number of reasons. Brevity is one of them. Readability is another. However writing readable concise code can take some time to getting used to. This post discusses some code snippets especially in the context of some of the comparisons done between python and clojure – another language which excels at brevity and shares yet another feature with python, ie. people either love or hate its syntax, they are rarely indifferent to it.

Please note : The intent of this post is not to compare LOC of python with other languages. It is meant to demonstrate python code implementation which helps describe the capabilities of python in terms of writing clean, concise, readable code. This post continuously refers to other posts and may require you to read the referred posts in conjunction with this post to get maximum value.

Code and ceremony

The post Java vs Clojure – Lets talk ceremony! describes a simple problem which requires 28 LOC in Java and 1 or 9 in clojure. It is a simple logic which creates a list, populates it with four values (two of them identical) and creates a list with unique items. There are two alternative clojure implementations, one in 1 line of code and another in 9 – the former using a built in construct in the language which especially helps to write a more concise logic, and the latter by writing code similar to the Java code. Without any further ado, here are the two alternative python implementations

a. By using built in capabilities – In this case we use the set collection

print set(("foo","bar","baz","foo"))

b. Now here’s where we will not use a built in operator which automatically chooses the unique elements but again write code similar to the java code. This code is as follows :

print reduce(lambda x,y:  x + [y] if y not in x else x ,("foo","bar","baz","foo"),[])

There – both of them are exactly one line long.

Project Euler solutions
The post Python vs Clojure – Evolving refers to many sample python implementations solving some of the problems documented on project euler site. We shall revisit these python implementations here.

a. Find all numbers dividable by 3 or 5 in 0 < x < 1000:
In this case the clojure code is 4 lines long where as the python implementation is 3 lines long

print sum(i for i in range(1,1000) if i%3 == 0 or i%5 == 0)

Again the implementation is exactly 1 line long – no lambdas, no filters etc.

b. Get the sum of all even Fibonaccis below 4 mill.

The suggested python implementation is about 17 lines long. The implementation below 7.

from itertools import takewhile

def fib():
    x,y = 1,1
    while True :
        x,y = y,x+y
        yield x

print sum(i for i in takewhile(lambda x : x < 4000000,fib()) if i%2 == 0)

c. Finding Palindromes
The suggested python solution uses about 23 lines of code, to 6 in clojure and takes 15 seconds almost 300% slower than the clojure code (emphasis from the post referred to).

Here’s another alternative implementation.

print max(x * y for x in xrange(100,1000) for y in xrange(100,1000)  if str(x*y) == str(x*y)[::-1])

Again if you notice – exactly one line of code. And incidentally on my notebook it clocks in about 0.85 seconds which makes the clojure implementation about 470% slower than the python implementation. (emphasis mine)

Top Rank Per Group

In yet another post Python vs Clojure – Reloaded, the author discusses the Rosetta Top Rank Per Group implementation of python. Unsurprisingly the python implementation uses 11 lines of code (not including the initialisation) compared to clojure’s 6.

One of the commenters suggests an eminently readable and simple python solution in about 6 lines of code. The code snippet is as follows

from itertools import groupby
from operator import itemgetter

data = [dict(name="Tyler Bennett", id="E10297", salary=32000, department="D101"), ...]

department = itemgetter('department')
for dep, persons in groupby(sorted(data, key=department), department):
    print "\nDepartment", dep
    print " Employee Name   Employee ID     Salary          Department"      
    for person in sorted(persons, key=itemgetter('salary'):
        print "%(name)-15s %(id)-15s %(salary)-15s %(department)-15s"

An alternative far less readable solutions which shaves off one more line but certainly not preferred to the solution above (am just listing it since it shaves off the step of having to create an intermediate data structure) is

import operator
from itertools import groupby
   
data = [('Employee Name', 'Employee ID', 'Salary', 'Department'),....]

print reduce(lambda x,y : x + y,
         (('\nDepartment %s\n  Employee Name   Employee ID     Salary          Department  ' % dept +
           reduce(lambda x,y : x + ("\n  " + "%-15s " * len(data[0]) % y),
                  sorted(recs, key=lambda rec: -rec[-2])[:3],''))
         for dept, recs in groupby(sorted(data[1:],key=operator.itemgetter(3)), lambda x : x[3])),'')

Some might wonder whether I am joining multiple lines of code into one just to reduce the line count. The code is consistent with the idiomatic usage of python and also with the way python list comprehensions are documented in the formal python proposal for list comprehensions PEP 202 : List Comprehensions. Besides python is a whitespace sensitive language and the compiler disallows multiple lines of code in one line without using the ‘;’ separator which I have most definitely not used in the suggested solutions.

PS: There’s of course an interesting image at the end of the post Python vs Clojure – Reloaded indicating the perception of how clojure code speaks for itself ;) . The reason I mention it is that I believe if one is expressing a strong opinion as the one reflected by the picture, one better be far more careful and thorough with verifying the veracity of the assertions appropriateness of the samples compared to. Let me clearly state that I do not know that the clojure loc compared to is an appropriate target so do not infer this post to be a Python vs Clojure LOC as much as a post which emphasises that Python is a good vehicle to write concise readable code. FWIW I am learning clojure and look forward to building more competence with it.

Update: There is another nice post by Stephan Schmidt – Anatomy of a Flawed Clojure vs. Scala LOC Comparison which does reflect an opinion in the context of Scala and a post authored on the same blog as the ones I refer to above.

Post conference recap : The 4th Indicthreads.com conference on Java Technology

Posted by – December 14, 2009

The annual indicthreads.com java technology conference is Pune’s best and possibly one of India’s finest conferences on matters related to Java technologies. I looked forward to attending the same and was not disappointed a bit. The last one was held about 3 days ago on Dec 11th and 12th, and this post reviews my experiences at the same.

As with any other conference usually something or the other isn’t quite working well in the morning, so I soon discovered we had a difficulty with the wireless network being swamped by the usage. There were some important downloads that needed to be completed, so my early morning was spent attempting to get these done .. which meant I missed most of Harshad Oak’s opening session on Java Today.

The next one i attended was Groovy & Grails as a modern scripting language for Web applications by Rohit Nayak. However I soon discovered that it (at least initially) seemed to be a small demo on how to build applications using grails. Since that was something I was familiar with, I moved to the alternative track in progress.

The one I switched to even as it was in progress was Java EE 6: Paving the path for the future by Arun Gupta. Arun had come down from Santa Clara to talk about the new Java EE6 spec and its implementation by Glassfish. Arun talked about a number of additional or changed features in Java EE6 in sufficient detail for anyone who got excited by them to go explore these in further detail. These included web fragments, web profile, EJB 3.1 lite, increased usage of annotations leading to web.xml now being optional, and a number of points on specific JSRs now a part of Java EE6. Some of the things that excited me more about Glassfish were, (a) OSGi modularisation and programmatic control of specific containers (eg Servlet, JRuby/Rails etc.), embeddability, lightweight monitoring. However the one that excited me the most was the support for hot deployment of web apps for development mode by allowing the IDEs to automatically notify the running web app which in turn automatically reloaded the modified classes (even as the sessions continued to be valid). The web app restart cycle in addition to the compile cycle was alway one of my biggest gripes with Java (second only to its verbosity) and that seemed to be going away.

I subsequently attended Getting started with Scala by Mushtaq Ahmed from Thoughtworks. Mushtaq is a business analyst and not a professional programmer, but has been keenly following the developments in Scala for a couple of years (and as I later learnt a bit with Clojure as well). Unlike a typical language capability survey, he talked only about using the language for specific use cases, a decision which I thought made the presentation extremely useful and interesting. The topics he picked up were (a) Functional Programming, (b) DSL building and (c) OOP only if time permitted. He started with an example of programming/modeling the Mars Rover movements and using functions and higher order functions to do the same. Looking back I think he spent lesser time on transitioning from the requirements into the code constructs and in terms of what he was specifically setting out to do in terms of higher order functions. However the demonstrated code was nevertheless interesting and showed some of the power of Scala when used to write primarily function oriented code. The next example he picked up was a Parking Lot attendant problem where he started with a Java code which was a typical implementation of the strategy pattern. He later took it through 7-8 alternative increasingly functional implementations using Scala. This one was much easier to understand and yet again demonstrated the power of Scala quite well in terms of functional programming. Onto DSLs, Mushtaq wrote a simple implementation of a “mywhile” which was a classical “while” loop as an example of using Scala for writing internal DSLs. Finally he demonstrated the awesome power of using the built in support for parser combinators for writing an external DSL, and also showed how a particular google code of summer problem could be solved using Scala (again for writing an external DSL). A very useful and thoroughly enjoyable talk.

The brave speaker for the post lunch session was Rajeev Palanki who dealt both with overall IBM directions on Java and a little about MyDeveloperworks site. In his opinion he thought Java was now (post JDK 1.4) on the plateau of productivity after all the early hype and IBM now focused on Scaling up, Scaling down (making it easier to use at the lower end), Open Innovation (allow for more community driven innovation) and Real Time Java. He emphasised IBMs support to make Java more predictable for real time apps and stated that Java was now usable for Mission Critical applications referring to the fact that Java was now used in a USS Destroyer. He referred to IBMs focus on investing in Java Tooling that worked across different JRE implementations. Tools such as GCMV, MAT, and Java Diagnostic Collector. Finally he talked about the IBM MyDeveloperWorks site at one stage referring to it as the Facebook for Geeks.

The next session was Overview of Scala Based Lift Web Framework by Vikas Hazarati, Director, Technology at Xebia. Another thoroughly enjoyable session. Vikas dealt with a lot of aspects related to the Lift web framework including various aspects related to the mapper, the snippets, usage of actors for comet support etc. I was especially intrigued by Snippets which act as a bridge between the UI and the business logic have a separate abstraction for themselves in the framework and how the construct and functionality in that layer is treated so differently from other frameworks.

I subsequently attended Concurrency: Best Practices by Pramod Nagaraja who works on the IBM JRE and owns the java.nio packages (I think I heard him say owns). He talked about various aspects and best practices related to concurrency and one of the better aspects of the talk was how seemingly safe code can also end up being unsafe. However he finished his session well in time for me to quickly run over and attend the latter half of the next presentation.

Arun Gupta conducted the session Dynamic Languages & Web Frameworks in GlassFish which referred to the support for various non java environments in Glassfish including those for Grails/Groovy, Rails/JRuby, Django/Python et. al. The impression I got was Glassfish is being extremely serious about support for the non java applications as well and is dedicating substantial efforts to make Glassfish the preferred platform for such applications as well. Arun’s blog Miles to go … is most informative for a variety of topics related to Glassfish for both Java and non Java related aspects.

The last talk I attended during the day was Experiences of Fully Distributed Scrum between San Francisco and Gurgaon by Narinder Kumar, again from Xebia. Since a few in the audience were still not aware of agile methodologies (Gasp!), Narinder gave a high level overview of the same before proceeding down the specific set of challenges his team had faced in implementing scrum in a scenario where one team was based in Gurgaon, India and another in San Fransciso, US. To be explicit, he wasn’t describing the typical scrum of scrum approaches but was instead describing a mechanism wherein the entire set of distributed teams would be treated as a single team with a single backlog and common ownership. This required some adjustments such as a meeting where only one person from one of the locations and all from another would take part in a scrum meeting in situations where there were no overlapping working hours. There were a few other such adjustments to the process also described. The presentation ended with some strong metrics which represented how productivity was maintained even as the activities moved from a single location to a distributed model. Both during the presentation and subsequently Narinder described some impressive associations with senior Scrum visionaries and also some serious interest in their modified approach from some important companies. However one limitation I could think of the model was, that it was probably better geared to work where you had developers only in one of the two locations (offshoring). I perceived the model as a little difficult to work if developers were located across all locations (though that could end up being just my view).

The second day started with a Panel Discussion on the topic Turning the Corner between Arun Gupta, Rohit Nayak, Dhananjay Nene (thats yours truly) and moderated by Harshad Oak. It was essentially a discussion about how we saw some of the java and even many non java related technologies evolving over the next few years. I think suffice to say one of the strong agreements clearly was the arrival of Java the polyglot platform as compared to Java the language.

The next session was Developing, deploying and monitoring Java applications using Google App Engine by Narinder Kumar. A very useful session describing the characteristics, opportunities and challenges with using Google App Engine as the deployment platform for Java based applications. One of the take away from the sessions was that subject to specific constraints, it was possible to use GAE as the deployment platform without creating substantial lockins since many of the Java APIs were supported by GAE. However there are a few gotchas along the way in terms of specific constraints eg. using Joins etc.

I must confess at having been a little disappointed with Automating the JEE deployment process by Vikas Hazrati. He went to great depths in terms of what all considerations a typical J2EE deployment monitoring tool should take care of, and clearly demonstrated having spent a lot of time in thinking through many of the issues. However the complexities he started addressing started to get into realms which only a professional J2EE deployment tool writer would get into. That made the talk a little less interesting for me. Besides there was another interesting talk going on simultaneously which I was keen on attending as well.

The other talk I switched to half way was Create Appealing Cross-device Applications for Mobile Devices with Java ME and LWUIT by Biswajit Sarkar (who’s also written a book on the same topic). While keeping things simple, Biswajit explained the capabilities of Java ME. He also described LWUIT which allowed creation of largely similar UI across different mobile platforms. He explained that while the default Java ME used native rendering leading to differing look and feel across mobile handsets just like Java AWT, using LWUIT allowed for a Java Swing like approach where the rendering was performed by the LWUIT library (did he say around 300kb??) thus allowing for a more uniform look and feel. He also showed sample programs and how they worked using LWUIT.

Allahbaksh Asadullah then conducted the session on Implementing Search Functionality With Lucene & Solr, where he talked about the characteristics and usage of Lucene and Solr. It was very explicitly addressed at the very beginners to the topic (an audience I could readily identify myself with) and walked us through the various characteristics of search, the different abstractions, how these abstractions are modeled through the API and how some of these could be overridden to implement custom logic.

How Android is different from other systems – An exploration of the design decisions in Android by Navin Kabra was a session I skipped. However I had attended a similar session by him earlier so hopefully I did not miss much.

However Navin did contribute occasionally into the next session Java For Mobile Devices – Building a client application for the Android platform by Rohit Nayak. Rohit demonstrated an application he is working on along with a lot of the code that forms the application using Eclipse and the Android plugin. A useful insight into how an Android application is constructed.

As the event drew to a close, the prizes were announced including those for the Indicthreads Go Green initiative. A thoroughly enjoyable event, leaving me even more convinced to make sure to attend the next years session making it a third in a row.

Rinse and Repeat with TDD

Posted by – November 25, 2009

I must confess having had a awed love relationship with TDD (Test Driven Development). This is the style where you first write the test cases and then write the code to make them pass. I’ve always felt awed by it. I have always firmly believed that it is the right way to offer sustainable quality. As someone who hasn’t been particularly awed by separate QA teams, I have always imagined TDD to be the silver bullet to offer controlled, sustained, high quality. Well, the silver bullet is a bit of an exaggeration, but sounded nice while I was on a roll :) . A number of times I wrote automated test cases post facto. But when the rubber met the road – I didn’t ever implement TDD.

Its probably a function of style. I like to write a piece of code and then keep on testing it against different scenarios and often keep on remoulding it substantially. My focus is at this stage on the code structure – the design aspects. Trying to do it with TDD always created a mess. The amount of intense refactoring I do always meant I soon got tired of also simultaneously refactoring the test cases and gave up on them soon.

Anyways, I finally think I found a formula that works – at least for me. I do not write production code in the first pass. I just write prototype code. I keep on playing with it, shaping it, challenging it, cajoling it, recasting it. Until I’m satisfied with the overall structure and internal design. Now when thats done, I rewrite it. Completely. Umm, I copy/paste maybe 40-50% of the code. But this time I write it on a completely new fresh set of files. And I write the test cases before I write the code. I find I can now really write tons of test cases and code quite rapidly. And with the tremendous satisfaction that at the end of the day, I now have the control harness to be able to keep on adding new features without having to continuously worry if I broke something else. With substantially detailed test cases, now I can actually feel quite confident that if I did break something – I’ll get to know it. Not next month, not next week, not even tomorrow – in the next few minutes.

So this approach seems to be working for me – the initial efforts have delivered great results and I’m quite satisfied. Now just wondering what to call it – for lack of a better term I shall term it “Rinse and Repeat with TDD”.

So if TDD works for you – great. You have one of the most important coding disciplines nailed down. But if you’re wanting to try to do TDD but haven’t been able to successfully implement it – you may just want to check out the Rinse and Repeat with TDD style. Who knows, you might find it useful too.

Five important trends on the enterprise architect’s radar

Posted by – November 2, 2009

It is no secret that the internet architectures are influencing enterprise architectures. This post attempts to summarise some of the recent trends in the internet space, which seem to be carrying some momentum sufficient enough to influence the enterprise. So without further ado, these trends are :

  1. REST : The Representational State Transfer architecture style builds on the essential elements of those constructs which made the internet so globally scalable. A detailed explanation of the rationale and strengths of REST are completely beyond the scope of this article. If your job requires you to be continuously aware of emergent trends and whether they fit your enterprise architecture needs – this is the one must explore trend.

    Impact : Web based architectures, Service Oriented Architectures, wide availability and immediate usability of data and processing requests (resources) through simple HTTP URIs and minimal integration effort

  2. Interoperable Cloud : The interoperable cloud is the ability to create a private cloud and also leverage a public cloud. This has been made possible by offerings such as the Ubuntu Enterprise Cloud which allows you to build a private cloud or use a public cloud such as Amazon EC2 while being able to access them using the same set of APIs thanks to open source efforts such as Eucalyptus. This allows you the flexibility of initially using either a private or public cloud and then subsequently shifting to the other, or being able to use both simultaneously.

    Impact : Large servers vs cluster of commodity servers, virtualisation, elastic deployments, flexible hardware procurement / provisioning, infrastructure management in organisational hierarchy.

  3. NoSQL : While I am unhappy with the name, it has stuck. This refers to a set of options now available to store your data unconstrained by many RDBMS requirements (eg. flexible schema, key value pairs etc.). Some of the databases also allow you to store data in a distributed manner over a number of servers with an intent to support high availability in write intense scenarios even as they may require you to move towards eventual consistency. These options increase your manouverability / flexibility as an architect even as they require you to meet a different set of challenges.

    Impact : Relational databases, data storage strategies, data distribution strategies, vertical vs. horizontal scalability, transactionalisation, consistency and availability

  4. Polyglotism : Developer costs now occupy an increasing percentage of total costs, development time is being an increasingly dominant factor for time to market, and ability of software to change and adapt quickly to newer demands is now a critical success metric. One of the solutions is to write different parts of the software in a different languages most appropriately suited for concise and rapid coding as well as supporting quick reaction changes to each part appropriately. Thus it is conceivable to have some of the business rules written in a dsl written using jruby and some of the algorithms written in clojure in a software built on the JEE platform.

    Impact :Development culture and processes, minimum developer skill and scalability, risk management for managing required vs. available skills.

  5. Decentralised processing : Thanks to many developments which are leading to increasingly distributed processing including REST and NoSQL, applications will need to be a set of collaborating network based components (we’ve heard this before with distributed objects as well). However especially given some of the lesser guarantees that such architectures can provide around immediate guaranteed processing, latency issues, distributed control and asynchronous processing, a particular piece of business logic may get satisfied in a staggered fashion across a number of collaborating components. This may increase challenges in terms of currency of available data even as it helps actually deliver on the vision of distributed objects and simplifies individual component development. While asynchronous capabilities such as those supported by MQ series and the like have been used in the enterprise for ages, I do anticipate increasing use of lighter messaging constructs such as PubSubHubbub within the enterprise.

    Impact : Application partitioning, network based components, difficulty in supporting fully synchronous workflows.

Service Oriented Architecture is primarily about business and not technology. Bollocks!

Posted by – October 26, 2009

There’s quite a few times I’ve heard / read a gross oversimplification of architecture in reference to business and technology. And while I believe I understand the ‘essential cause’ which drives such a simplification, I’ve often felt quite frustrated at the resultant impression thats provided by such a simplification. In many ways and forms, it boils down to the statement (not exactly the same since I’m not quoting directly), quite similar to the one below :

Service Oriented Architecture is primarily about business and not technology

This also reflected in the recent SOA Manifesto which states as its very first described value :

Business value over technical strategy

Allow me to straight away start picking some holes into this :

  1. Anything that a business does – whether it is soa, software architecture, building architecture or simple plant and machinery design, to the extent (which is exactly 100%), technology serves the business goals, all technology activities (and non-technology as well) are at the end of the day about achieving business objectives and therefore about business. So why single out architecture? And even more so why single out SOA?
  2. Architecture is also about business. But its not the same as saying its primarily about business and not so much about technology. For a moment lets step away from Software/Hardware Architecture and look at Building Construction Architecture. The legendary creation of Ayn Rand – Howard Roark, for all his eccentricities and seemingly portrayed egocentric and egotistic behaviour did meet the test of business objectives to the extent of making the residents of his creations extremely satisfied. And at no point would you gather the impression that he in any manner put construction technology to any secondary position to his business context and objectives. At the end of the day thats what architecture is. It is not about making one of business or technology more important than or subservient to other. Its about effectively mapping the two to provide a strong technology solution appropriate to the business needs.

I suspect one of the important causes here is that people have forgotten that the A in SOA stands for architecture, and therefore shorn of architecture, business and technology can be seen to be competing in a non win-win form.

So if SOA stands for Service Oriented Architecture, then I must submit that architecture is the art of getting the two working together. And I am of the opinion that an exercise suggesting one is more important than other is an exercise in a field unrelated to architecture. I suspect many practicing software architects will agree with this. I suspect Ayn Rand wouldn’t disagree as well.

Stop calling me NoSQL

Posted by – October 23, 2009

Dear Reader,

Apologies for sending this note to you completely unannounced and out of the blue. However I find myself in a peculiar situation of having a very weird name being dumped upon me. While I am indifferent to the name per se, I am greatly pained as I realise that it is a completely inappropriate name. What is even more confounding is the very bunch of people who have happily assigned me the name and continue to popularise it belong to that class of people some of whom actually are extremely particular about accurate nomenclature and have no hesitation in creating a 100 letter class or function name by concatenating 20 words just to make sure the name is unambiguous and conveys the intent clearly.

Ahh.. but I digress and impose upon you without introducing myself adequately first. I am a data storage style. I am not new, but lately far too many a software engineer have started taking a liking for me. Ever since I have been around, I have with great amounts of jealousy watched my cousin the RDBMS being courted by the finest of engineers (in all honesty there were some fine engineers interested in me too, but far too few compared to my cousin). But lately multiple concurrent developments have made a fair amount of attention come my way too.

You see unlike RDBMS, I don’t require that data be clearly split into tables, columns and rows. I can work with data the way it is most naturally represented. As a tree of individual data fields, lists, arrays, dictionaries etc. Also I do not require that you always clearly define each and every possible schema element before being able to store data corresponding to the schema. I can happily accept a schema dynamically or even work without a schema. Some of my early forms were based on key value pairs stored as B-Trees (eg. Berkeley DB). Over the years people have figured out ways to represent the data as a set of decomposed document elements, store data spread across a cluster, replicate it for better availability and fault tolerance, and even perform post storage processing tasks using map-reduce sequences. But really what separates me from my cousin and other storage systems is that I don’t make demands on the data – I take it in its naturally found form and then store it, replicate it, slice it, dice it and glean information out of it. And therein lies my true identity – I will work with data the way the data is best represented with all its arbitrary inconsistencies and inabilities to always clearly specify a constraining schema. And the engineers who’ve spent time with me seem to have enjoyed it quite a bit.

But the horror of it – they gave me a completely inappropriate moniker – ‘NoSQL’. First and foremost I exist to promote a storage style and thats what identifies me. I work with data in its natural and arbitrary forms. Therefore to make it seem like I represent a lack of something else is utterly missing the point. The SQL in NoSQL stands for Structured Query Language, which depends upon Fixed Structure Relational Data. Since I change the very nature of the data being stored, that SQL is not required or relevant is automatic and inconsequential.

Its like calling a under-the-ocean-mountain_range as NoIgloo. Its dead obvious igloos will not be found there. But calling that mountain range NoIgloo is a big disservice to visitors. You use that as a marketing term, attract people, then tell them that NoIgloo actually has nothing to do with Igloos – its got to do with mountains and oceans, and that they need to first unwind all the confusion they created in their minds due to NoIgloo and then go through a phase of reunderstanding mountains and oceans. And while they came prepared for a possibly warmer place given the name NoIgloo – it actually is a wet place so they need to again change their garments and equipment for the journey. A wholely avoidable situation.

Update: Brad Anderson pointed out this interesting post NoSQL: A Modest Proposal which traces the genesis of my name which leaves me very very disappointed. Almost seems to suggest that people are flocking together and naming me not based on something inherently powerful about me – but as a mechanism to demonise my cousin RDBMS. This is most unfortunate, since we actually end up being useful in very different situations and more often than not are likely to complement each other rather than compete with each other. I do hope a better moniker does prevail over time

What I would like is to see a better / more appropriate name for me. Hmm .. call me free form storage, natural persistence or flexi schema storage or perhaps something else even more appropriate (this blog owner prefers “natural persistence”). Each of these conveys far more about me far more accurately than NoSQL does. Basically please please call me something better than NoSQL. So can I request you to carry forward my plea by further forwarding and retweeting this to your friends and ask them how they can so callously call me by such a silly name when they take the utmost precautions in properly naming their classes and methods. Plead with them to stop doing this and please work with others to give me a better name. I think it will cause less confusion over the coming months and years, and the field of software shall recover its glorious tradition of maintaining precision in communication by using accurate naming.

Sincerely,
The one who doesn’t want to be called NoSQL

PS : As a background to this there was an interesting conversation earlier today between this blog owner dnene and Kent Beck on twitter, where Kent so kindly and graciously helped carry forward the thought process of helping identify my essential characteristics, and it is in no small part, thanks to this conversation that I was able to articulate myself and my grief. I reproduce that conversation below. (Update: though in all likelihood Kent’s intent was to help clarify the thought rather than contest the names. In hindsight, it makes sense to ask for permission to reproduce conversations .. even when such are on the public twitter stream – something that wasn’t done in this case. :( )

Twitter ID Tweet / Message
dnene NoSQL is such an inappropriate name. NoTables at least makes a little more sense.
KentBeck @dnene but what would nosql be called if you wanted to say something positive about it?
dnene @KentBeck Thats a great question .. still thinking .. best thought so far – FlexiStore (though not good enough yet :( )
KentBeck @dnene what can you do with a nosql store that you can’t do with an sql database? why would you be excited to use one?
dnene @KentBeck I see where u r going with this (a) unconstrained & composite storage (b) store resources not records (c) shard/scale horizontally
.@KentBeck I think there is a merit in attempting to define nosql in terms of what it is rather than what it isn’t
KentBeck @dnene there are many more people confused about what datastore to use than who hate sql. the positive approach appeals to the former.
dnene @KentBeck Agreed .. and I’m aware of many more who wonder why we need a different datastore than the RDBMSs. NoSQL as a name doesn’t help.
KentBeck @dnene well, why *do* we need a different data store?
dnene @KentBeck Primary Need : We need support for flexible/arbitrary schemas with complex depths – RDBMSs don’t dance well in this space.
@KentBeck Secondary Need : Support for deferred processing required for analytics (eg. Map/Reduce). RDBMS don’t do too bad a job here
@KentBeck Tertiary Need (not one that I’ve felt strongly yet) : Distributed and horizontally scalable storage on commoditized h/w.
KentBeck @dnene it seems like you’re looking for realistically structured data, not data twisted to fit a formula convenient for mathematicians.
dnene @KentBeck Yes.. thats it! I’m looking for realistically or naturally structured data storage / persistence. Rocks compared to the term nosql
@KentBeck Wonder if the term arbitrarily structured makes sense as well. This has been one heck of a conversation/Q&A so far +1:)
KentBeck @dnene glad you found it helpful. you get bonus points if the opposite of the name you pick is unattractive, a la “structured programming”

NoSQL – A fluid architecture in transition

Posted by – October 21, 2009

Lot of talk about NoSQL. Much of it well deserved. And while lot of the excitement around it is well understood by those in the know, some of it may actually be confusing to those who are relatively new to the matter. This post is actually for the latter group – not to argue for or against NoSQL – just to put it in perspective.

What is NoSQL : Some of the characteristics shared by most if not all the NoSQL engines are as follows :

  1. Schemaless or Hierarchical Schema Storage NoSQL assumes at its very basis a schemaless or a hierarchichal schema storage system. In most cases this consists of a simple key value pair storage. While some storage engines excel at storing small values (LightCloud/Tokyo Cabinet), some are strong at storing large documents (CouchDB).
  2. Distributed storage : This is one of the driving forces of NoSQL growth, though not a distinguishing characteristic of NoSQL. One of the areas these storage systems separate themselves from RDBMS’s is their ability to allow better horizontal scalability. This varies from the simple master-master replication for MongoDB, to multi node sharding using consistent hashing with LightCloud (a la memcached) to a multiple master eventual consistency model of Riak. The basic premise in using some of the NoSQL engines is that storage will scale horizontally.
  3. Support for deferred processing :Many of these engines allow for some degree of deferred processing. Whether this be simple lua scripting in case of LightCloud or map-reduce scripts in case of CouchDB, the general assumption is that some amount of latency in computation times is acceptable, and some of the computations (especially related to analytics based views) will be performed post storage.
  4. Eventual Consistency : This may seem like a necessary feature of all NoSQL storage systems but it isn’t. While clearly some such as CouchDB (in terms of its map-reduce views) and Riak are better placed for supporting and implementing eventual consistency, it is quite feasible to use others such as LightCloud or MongoDB to implement immediate consistency using a single master-master pair. Suffice to note that eventual consistency is not a necessary side effect of using a NoSQL storage system, though it wouldn’t be incompatible for the two to work together.

But the points I would really like to emphasize are :

  • NoSQL is not a direct competitor to RDBMS/SQL : It is actually a solution to many use cases where using RDBMS was perhaps a poor fit. Thus the decision for an architect is not which of the two competing options (RDBMS or NoSQL) Update: one should selectshould be the preferred standard storage strategy, – it simply is which one is the more appropriate storage system for the application under consideration.
  • NoSQL is still at a fluid stage of its development : All the NoSQL storage systems (but for LightCloud/Tokyo Tyrant) are still being quite actively developed. These have not reached v 1.0 (Update: MongoDB is at v 1.1) and it is likely that some time will pass before any of these get beyond the beta and release candidate stage and get a 1.0 in-production stamp. While there is a lot of interest, there still is a substantial amount of experimentation in terms of the right feature sets leading to differently focused developments in different storage systems. To an architect this represents an interesting challenge. I think the way to approach this right now is to not use these in mission critical (eg. life or health impacting) systems, and to focus on reasonable expectation management in terms of ensuring the right kind of SLAs around their availability (simply because many of these haven’t yet been put to intense use in production the way say an Oracle or MySQL have been). This is not an attempt to spread “FUD” about NoSQL – far from it it is an exercise in setting appropriate expectations. i would also recommend that it would be appropriate to evaluate the available NoSQL choices only when reasonable SLAs can be worked out for their usage. It is certainly preferred to using NoSQLs rather than using RDBMS’s in an inappropriate manner (large objects serialised into BLOBs or into name-value pair tables). However, I would suggest that you do not deeply bind yourself into a particular NoSQL engine. The future development of most of the storage systems is still unknown to a certain extent, as is the future landscape including any shakeouts. Should one recommend usage of a NoSQL engine – it is important to have a clear plan for switching over to an alternative engine should a need arise in the future. While this is easier said than done, deciding the appropriate level of abstraction to use (ie. code to the API directly vs. use a layer of abstraction for engine independence) is best left to designer / architect to dwell upon.