Aug 14

This is not a post in defense of outsourcing. However, every so often I come across a rant like Why Outsourcing Sucks, and much as I connect emotionally with such posts about poor software quality, I find the rant and the underlying thinking highly misguided. This post attempts to place outsourcing in an overall economic perspective and explores some future trends and how they are likely to impact software quality.

Before attempting to explore and define the problems around outsourcing, let’s first understand why it exists.

There is something right about outsourcing.

Clearly there has to be something right about outsourcing for it to have come so far. Stating that outsourcing results in crappy code or shitty quality and therefore is wrong questions a collective wisdom of the market, a collective wisdom that while I have reservations about, I will always place above mine. So how does outsourcing help ?

  • Outsourcing has supported the Wall Street model of seeking fiscal efficiencies : Wall Street encourages companies to be fiscally responsible and financially competitive. Quality of products is, in relative terms, an irrelevant parameter. While toplines and bottomlines are to be maximised, quality is meant to be at a level “the market will bear”. For a variety of reasons, including all listed below, outsourcing has helped support better fiscal efficiencies.
  • Outsourcing helps reduce costs (in the short to medium term at least) : This is a no-brainer. However, it is often assumed to be the primary reason for outsourcing, which I am not so sure of.
  • Outsourcing helps reduce risk : It allows companies to have fewer people on their rolls, and makes it much easier for them to ramp up or ramp down as the market grows or shrinks.
  • Outsourcing increases a customer’s negotiability (initially) : Anyone who has attempted to get software delivered by both an internal development team and an external vendor will most certainly tell you that the customer’s negotiating power is much higher, at least in the initial days, with an external vendor than with an internal development team.
  • Outsourcing allows the ability to service an exploding demand : Let’s face it, for all the jobs that outsourcing might have transferred overseas or made redundant, these are a small fraction of the total number of jobs that outsourcing has created. There is only one reason for that - exploding demand for software development and maintenance. If the outsourcing surge had not happened, all the participating economies individually would’ve been worse off than they are today (though the extent of the difference varies considerably across economies).
  • Outsourcing has relatively little to do with quality : Outsourcing is a fiscal phenomenon. Management is a function of business imperatives. Quality is a result of engineering processes. Business imperatives impact engineering processes far more than outsourcing does. These imperatives are not driven by outsourcing alone, but by many other yardsticks which drive outsourcing itself. If you worry about ways to improve software quality, focus on these business imperatives and not on outsourcing alone. Some of these imperatives are quarterly / annual targets, “get it to the market at all costs”, “we need to get in there first, we will fine tune the offering later”, etc.

Relationship to Software Development.

I shall now explore some of the factors I touched on above in the context of Software Development.

  • Cost Reduction : Cost reduction is achieved in many different ways.
    • Transferring jobs to lower paying markets : This is probably the easiest understood mechanism. There’s enough written on this topic for me to add anything more.
    • Transferring jobs to less skilled people : If you want to reduce costs, one way is to hire less skilled people and thus pay them less. One particularly nasty way this manifests itself is that in such scenarios, seasoned and experienced personnel either need to work across a larger number of projects simultaneously (architects, consultants etc.) or need to move out of development and into management (a much more common occurrence, at least in India). There are very real economic underpinnings to why 8 or more out of 10 developers in India are likely to say “project manager” when asked what they would like to be in the next 5 years. Whichever way you look at it, the average skill level deployed on a project, in engineering terms, does go down.
    • Increasing % utilisation of associates : Higher focus on costs leads to a higher focus on billable utilisation. This reduces the amount of time available to learn, study, research, prototype etc.
    To summarise, cost reduction leads to a quicker movement of senior engineers into management roles, a greater breadth of projects for senior engineers who continue in software development, and less ability to focus on learning and research.

  • Risk Reduction : Since outsourcing allows customers to ramp up or down easily, it shifts the onus onto the vendors, who end up keeping a large number of people “on the bench”. This allows project head counts to be increased easily when required. It is well known that adding 10 people to your project will not give you ten times the productivity gain of adding 1 person. However, the ability to ramp up from the bench tempts managers into quickly adding as many people as possible to get the next release out faster. Since people get added in a hurry when the next release is needed quickly, net productivity per person suffers, and the difficulty of rapidly bringing all the new people up to the required skill level results in quality sliding down.

  • Increased customer negotiability : While this is definitely helpful in terms of lowering costs, there are some serious side effects as well. I would argue that even the best amongst us is unlikely to remain unaffected by higher negotiating power. The simple law of nature is that the more negotiating power you have, the more you use it. Vendors are of course perfectly aware that vendor negotiability is directly proportional to the time spent on the project and that it will improve over time. When customer negotiability gets used to an extreme in a competitive scenario, the vendor simply says yes to expectations that are unlikely to be fulfilled. While questionable, this practice has grown to an extent where customers quite often factor it in, and it is often defended with “if I don’t say yes, my competition will”. This sets in motion a negative spiral that only the very brave can even attempt to compensate for. And as the saying goes, fools rush in where angels fear to tread.

  • Exploding demand : This is one of the big reasons why quality has been falling. As demand explodes, it gets harder and harder to maintain average skill levels. It is obvious that average skill and training levels are far lower today than 5 or 10 or 15 years ago (and they have been continuously falling). The combined educational systems of the planet currently have no ability to consistently supply skilled and trained personnel at the pace of the exploding demand. Falling quality is as much a function of the failure of our educational and training systems to keep pace with demand as of anything else.

  • Business Imperatives : In today’s economic environment, managers are rewarded much more for meeting quarterly and annual targets than 5 year targets. We have created this system and it is likely to be around for quite some time to come. There are good reasons why it exists: it allows for much better fiscal risk management and results in superior capital investment decisions. For software developers, however, it creates a conflict, because good software does not get created in a quarter (and if it were to, the target would just get moved by adding much more functionality in the same quarter, given the exploding demand). Even if it saddens us deeply, probably the most appropriate way to deal with it is to recognise its merits in fiscal and economic terms and treat it as a reality that is unchangeable in the short to medium term.

Having said that there is an increasing awareness of the cost of maintaining poor quality software. Customers and Vendors are both finding it increasingly difficult to keep a lid on costs when maintaining and extending such software. For companies where this cost is a high percentage of their total costs, this is threatening their very ability and viability to stay competitive. I think most business managers today understand this and are trying to find a way out of this but only with varying degrees of success.

Why criticism of outsourcing is occasionally misplaced

I have often heard of issues such as differing time zones, cultural mismatches, communication snafus, skill differences etc. cited as big issues with outsourcing. These are all indeed solvable problems. It would be a great underestimation of the ingenuity of today’s business enterprises to assume that they are incapable of solving them. I submit that the reason companies haven’t gotten around to working on these issues as much as they could’ve is simply that these problems are not high up on the radar. As I have attempted to explain, managers are concerned with a plethora of issues, software quality being only one of them. If vendors were to find their business drying up due to poor software quality, some of which results from the aspects of outsourcing listed above, believe me, vendors would fix the problem pronto. They have the skills, they have the ingenuity, but the economic drivers are not strong enough today. This is not a problem with outsourcing per se but with the relative importance of economic drivers.

Another difficulty is in what is perceived to be quality. There is sometimes a difference in how quality is perceived across the development community and business managers. For a developer poor quality is when the software does not work as requested or as stated or as designed for. For a manager, poor quality is when the customer turns away and declines to offer repeat business due to unhappiness with the software.

Finally this issue about lower skill sets. I completely agree at the risk of making a broad generalisation that the average entry level skills in the software development market are far lower in India than in the United States. But this is a manifestation of the fact that India got into the software act a lot later and when the development demand boomed, it fortunately or unfortunately had the ability to deploy a large number of software developers even though at lesser entry skill levels. (The statement above is likely to hold for other highly developed and developing economies as well). If hypothetically India and other developing countries had not got into this act, I have no hesitation in believing that given the booming demand, average skill levels in the industry would’ve gone down sharply in the developed countries as well. This is again an economic event much more a function of the population levels, wage disparities and output of the educational systems. To blame this on outsourcing is to shoot the messenger.

So what’s next

Let me summarise some of the points made so far :
  • Businesses will continue to attempt to be more financially prudent :What this means is that while this prudence benefits us in general terms, it will continue to attempt to increase productivity and drive down costs.
  • Businesses will continue to be focused on quarterly and annual results :Love it or hate it, this system is here to stay for some time. If we can adjust software development to be in tune with it, probably all of us can be better off. Methodologies such as short iterations and frequent releases, YAGNI, continuous refactoring are likely to be better supportive of this environment.
  • Exploding demand for software : This is likely to continue unabated at least for a few more years or a decade. A large part of the world doesn’t yet invest heavily in software development, but if food and energy demands are any indication, as more countries start consuming more software, this demand growth is likely to accelerate.
  • Crappy software is better than no software :This is probably the most difficult and perhaps a most profound thought for many developers. As software demand grows, and as our worldwide ability to keep pace with it shrinks, crappy software is likely to deliver positive economic results compared to lack of software. Combine this with focus on short term results and you will realise that customers are more likely to tolerate buggy software than ever before in times to come, and if customers tolerate it, vendors will find ways to test these tolerance levels.

So will things never improve for software quality ? I think two factors will help, one in the really long term and one in the medium term.

  • Software demand will start slowing and productivity improvements and better training will catch up eventually : While it does look very far away, it will happen one day. We are finding ways to create better tools, improve methodologies, offer better training etc. Software demand will peak one day and the ability to service it will catch up. That’s when customers will start being intolerant of poor quality software. That’s when vendors will most certainly deliver good quality software.
  • SAAS model will help take the load off the demand supply gap : In terms of the economic impact created per developer, the Software as a Service (SAAS) model is far superior to custom application development. (For the purposes of this discussion a SAAS offering is expected to be simultaneously used by a large number of customers.) Development costs are only a small part of the total revenues in the case of SAAS. However, customer biteback in case of defects is much stronger because a larger number of customers use the same software instance. Thus SAAS will take some pressure off development cost reduction, focus on a higher quality threshold from day one, and take some pressure off the demand supply gap. This to me is the biggest positive in the days to come from a software quality perspective. However, even this will take some time to play itself out. When that does happen, SAAS is likely to be the biggest demand killer in terms of the demand for software developer head counts on a worldwide basis.

If you are a developer or user who is frustrated with the quality of software, there are the following things you can do :

  • Recognise that some of the current situations (problems ?) are here to stay and be prepared to deal with them : If you can accept it (and it might be unacceptable for many) it will allow you to deal with the issues in a more rational and non emotional way.
  • You can choose to focus on buying / supplying high quality software : There certainly will be some customers and vendors who will go down this path. Try to identify customers / vendors / partners who share this belief system. Even if quality levels across the industry don’t improve substantially, your individual experiences are likely to be better. However, there is an economic impact to this choice, so make sure you can absorb it by figuring out a way to offer a compelling value proposition to your customers. Individual developers who are particularly upset about quality levels should be willing to work for companies with a higher commitment to quality, along with the concomitant adjustments to individual growth path and economic outlook, if any.
  • Move to SAAS wherever feasible : I am not suggesting this out of any particular fascination for SAAS. There is a built-in argument in SAAS, based on sound economics, which encourages high quality software development. And while arguments based on emotional or moral merits are tough to win, arguments based on economics win by themselves. Developers wanting to work on high quality software development are more likely to find their needs satisfied with SAAS vendors.

PS : Lest it be misconstrued, let me clarify that I am often anguished and occasionally offended by the state of software quality. I do not intend to justify poor quality at all. However, I have learnt that before solving a problem, it is important to frame it correctly. This post is an attempt to frame the current situation appropriately in economic terms and not in either emotional or moral terms, since that has to be the first step in dealing with it.

Aug 12

Interesting post in the Agile Journal, “Software Testing in an Agile Environment”. Great article and a reflection on how the software testing function would be influenced in an agile environment. IMHO, the author does a nice job, but perhaps could have gone a little bit further in terms of exploring how the development and testing functions could get merged within the same set of people, and this post attempts to do the same. Some comments I have are as follows.

But, for the QA professional an Agile approach causes discomfort - In the ideal world they would have a ‘finished’ product to verify against a finished specification. To be asked to validate a moving target against a changing backdrop is counterintuitive

This is no different than developers adapting themselves from a situation where they would’ve expected a set of clean requirements and/or design specifications. To be asked to develop a moving target against changing backdrop is counter-intuitive as well.

For some, the role of QA is now questionable citing Test Driven Development (TDD) as the key to testing. But, what is most important is that QA is directly involved in the agile scrums all the way through, to be an integral part of the team designing the tests, at the same time as the requirements and solutions evolve.

Absolutely.

1. “You only need to unit test - TDD testing is sufficient”

For the vast majority of commercial developments this simply isn’t true. Even strong proponents of agile development recognize the need for their armory to include a range of testing techniques.

Here the assumption is that TDD is in some way meant to constrain itself to unit testing. While most examples of TDD do show unit testing, I haven’t actually come across an opinion which says TDD is limited to unit testing. In fact this is one of the most important aspects to understand if one needs to place QA in the right perspective in TDD environments.

TDD stands for test driven development. These tests could be white box or black box, unit or integration. Once it sinks in that TDD encompasses all forms of testing, it is easier to work out the workflow for the delivery teams. The most obvious implication is that the QA function needs to work very closely with development in deciding the tests of all types up front (prior to writing code). Given the agile proclivity for executable code and executable tests as the specification for code and unit tests, the same would apply to functional and acceptance tests as well. One issue here is usage of Record and Playback tools. I suspect these can no longer be used in a TDD environment simply because these assume the application under test is completely ready when writing the tests. (Record and Playback tools can be used as supporting tools additionally .. but they cannot meet the expectations of a Test Driven Development workflow). Coming back to executable tests whether these be coded in jUnit or HttpUnit or whatever one’s choice of tools, these can still be coded upfront and checked in before the code gets written to ensure that the code that eventually comes out meets the QA expectations.
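To make the "tests checked in before the code" workflow concrete, here is a minimal sketch in Python's standard unittest module (used here purely for illustration in place of the jUnit / HttpUnit tools mentioned above). The function `apply_discount` and its rules are hypothetical, invented for this example. The idea is that the test class is agreed and checked in first, and the function body is written afterwards with the sole aim of making those tests pass.

```python
import unittest


# Hypothetical function under test. In a test-first workflow this body
# would be written only after the tests below had been checked in.
def apply_discount(price, percent):
    """Return price reduced by percent, rounded to 2 decimal places."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (100 - percent) / 100, 2)


# Written up front, jointly by whoever plays the QA and development roles.
# These tests act as the executable specification for apply_discount.
class ApplyDiscountTest(unittest.TestCase):
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(200.0, 10), 180.0)

    def test_zero_percent_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(99.99, 0), 99.99)

    def test_rejects_percent_above_hundred(self):
        with self.assertRaises(ValueError):
            apply_discount(200.0, 150)
```

Saved as a module, this can be run with `python -m unittest`. Before the function body exists the tests fail, which is the expected starting state; the same pattern applies to functional and acceptance tests, only the tooling differs.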

Therein to me lies the biggest adaptation that testing functions shall need to make. For most small to medium projects, in a TDD environment, the members (who perform both development and testing roles) write the tests and the code, thus obviating the need for a separate testing department. Agile environments will drive development and testing skills to be merged within the same set of people. While this steepens the learning curve, it also substantially increases productivity, since the entire communication stream between the development team and the testing team is largely eliminated, and the occasional blame game between the two now just needs to get resolved within one person’s mind.

However, there are situations where I think classification of members into developers and testers might be called for, e.g. where a particular product needs to be tested independently on a very wide range of hardware / software platforms, or where writing the executable tests calls for a high level of capability and skill of a specialised nature. For such teams, the rhythm will now need to adapt. Instead of developers writing code and delivering it to testers, testers shall write tests and deliver these to developers, who shall then need to write code to pass them. I believe this is the most important leap that the author of the article wasn’t able to make.

3. “Developers write the tests using open-source tools, so we no longer need testers, or automation tools”

Professional testers fulfill a different and equally valid role from their development colleagues.

Here’s the tradeoff. Separating developers and testers has the benefit of increased specialisation but levies high costs in terms of communication overheads, resolution of mismatched understandings, and crossing departmental boundaries in some cases. Merging the responsibility into one team member (development + testing) helps reduce all these costs, even though the same person now needs to be trained in both skillsets. Importantly, it clearly identifies where the buck stops (and eliminates the occasional ping pong between development and testing), resulting in developers verifying their own code with a much higher level of vigour and hence in higher quality. I would submit there is a large number of projects out there (but certainly not all) where merging these functions and removing the separation between developers and testers makes sense.

Often, TDD projects have at least as much test code as application code and, therefore, are themselves software applications. This test code needs to be maintained for the life of the target application.

All the more reason why development and testing responsibilities could get merged.

7. “Developers have adequate testing skills”

If testing was easy everybody would do it and we’d deliver perfect code every time. Sadly, many organizations that invest in increasing a developer’s coding skills and provide them with the latest integrated development toolsets, fail to see the need to develop the equivalent testing skills for their QA team, or provide them with the tools to do the job either effectively or efficiently.

Even better option - train team members to do both development and testing and equip them with the appropriate tools.

An independent testing team serves as an objective third-party, able to see “the big picture”, to validate the functionality and quality of the deliverable. While developers tend towards proving the system functions as required, a good tester will be detached enough to ask “what happens if…?” When you include business user testing as well, you are more likely to have a system that is fit for purpose.

Why would you want to have developers who cannot see the big picture ? This is a training issue, not a separation of function issue. Moreover, I have always found it a bit strange that one can trust a person to write the code but not trust the same person to think through various combinations. I would argue that it is more cost effective to have the members simultaneously trained in “how to write code …” and “what happens if …” thinking.

10. “Developers and Testers - are like oil and water”

Since the dawn of time there has often been a “them and us” tension between developers and testers. This is usually a healthy symbiotic relationship which, when working correctly, provides a mutually beneficial relationship between the two groups resulting in a higher quality deliverable for the customer.

I would submit that this has been a high cost separation. Probably at least half (and perhaps many more) of the projects out there would benefit from these responsibilities getting merged, in terms of increased productivity and quality. A small project should only have customers (or their representatives or proxies if not available directly) for specifying what is needed and conducting acceptance testing, and members (who have the abilities to develop and test) who service these needs. Larger projects may choose to have a more classified team (domain experts, developers, tool builders, graphic designers, testers etc.), however such classification does and will make them less agile (as in the English word, not the methodology) to some extent.

Aug 06

A lot of posts talk about how caching can improve performance and how one can use tools such as memcached for the same. It is a great tool, and I am certain it does wonders, but I do wonder if it and its usage is getting too much press at the cost of other caching options.

This post talks about the various dimensions that should be carefully examined before deciding upon the caching strategy. I would like to argue that the most difficult part of caching is not necessarily learning or selecting the caching tool but the detailed analysis of the application, its architecture and its data flow, which actually precedes the caching tool selection. I also submit that once this analysis is conducted and the various dimensions described below are well understood, the actual cache design becomes a relatively simpler and less risky activity. Before getting into the dimensions of cache design, there is some important groundwork to be done.

Note : I am referring to data caching. I am not referring to optimisation of opcodes / byte codes / instruction sets etc. which is a different class of caching.

Know your application well :

Let me emphasise this. If you want to decide on the most appropriate caching strategy, you need to have a sufficiently good understanding of your application. This is relatively easy if you are enhancing the caching capabilities of an existing application or are rebuilding an existing application. But if you are building a completely new application, some of the data listed below may not be easily available and you may need to project a little after understanding the requirements and the high level data / logic flow. One of the things you really need either a clear understanding or at least a reasonable projection of is the set of data elements in the application: how frequently they will be accessed and modified and, given the processing flow, which elements make sense to cache. You will also need a good understanding of how critical it is to have the most current data (e.g. will data that is 1 minute stale be acceptable ?) and of the transactional semantics around the data processing (are strict ACID semantics a must ?).

Note that for the rest of this post when I mention data, I shall be referring to cachable data.
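To illustrate why the acceptable staleness matters so early, here is a minimal sketch of an in-process cache where the tolerated staleness is an explicit parameter. It is not memcached or any particular tool's API; the class `TTLCache`, the `loader` callback and the injectable `clock` are all assumptions made up for this example. Answering "will 1 minute stale data do ?" amounts to choosing `max_age_seconds`.

```python
import time


class TTLCache:
    """Illustrative read-through cache with a fixed staleness budget."""

    def __init__(self, max_age_seconds, loader, clock=time.monotonic):
        self._max_age = max_age_seconds   # how stale an entry may get
        self._loader = loader             # fetches fresh data on a miss
        self._clock = clock               # injectable, to ease testing
        self._entries = {}                # key -> (value, stored_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._max_age:
            return entry[0]               # fresh enough: serve from cache
        value = self._loader(key)         # missing or too stale: reload
        self._entries[key] = (value, now)
        return value

    def invalidate(self, key):
        """Drop an entry, e.g. right after the underlying data is modified."""
        self._entries.pop(key, None)
```

A hypothetical usage would be `cache = TTLCache(60, load_profile_from_db)` followed by `cache.get(user_id)`, where `load_profile_from_db` stands in for whatever fetches the data. If strict ACID semantics are a must, a time based expiry like this is usually not enough on its own, which is exactly the kind of thing the analysis above is meant to surface before a tool is picked.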

Know your performance targets :

Unless you are clear about your performance targets, it will be quite difficult to work out the appropriate strategy. I would suggest you have clear targets defined (for specified hardware) before proceeding down this path. Sometimes the targets are specified for very ambitious hardware which you may not have access to in the early stages of development. Try to convert these into appropriate targets for the hardware you shall be working with (with some reasonable margin of safety).

Know your architectural guidelines / preferences :

Will you be using a single server or multiple ? When handling future growth would you prefer to scale up or scale out ? What languages will you be using ? Would you prefer to use a single database instance or would you want to create vertically partitioned shards ? As we shall see, these are extremely important considerations before getting into cache design. Sometimes the decision making is not sequential, so you may end up revisiting these topics once you get into your cache design.

Dimensions

We shall now explore some of the various dimensions that influence cache design.

Jul 23
Just realised I have been blogging for more than 6 months now (actually I had started another blog ages ago, but that tapered off soon after). Over this period, I believe I learnt or adopted a few practices. Just sharing them here. Feel free to comment. YMMV.


  1. Treat your readers like a jury not as customers : By jury, I mean a jury as in an academic thesis, not as in a court. What’s the difference ?
    • With customers you sell, with a jury you defend your perspective. You may think you are selling your views, but a jury doesn’t shell out any money to buy them. A typical sales process is a much harder and more onerous task than just defending. Most readers aren’t out to buy, they are out to learn more and interact more.
    • With customers you assume they may not know all about your product, so you focus on educating them in general towards making a pitch. With a jury you assume they already know far more than you do in general, but you attempt to educate them and draw them into a discussion into something specific that you have spent your time on, on something specific that you are presenting.
    • In a defense, the onus is on you to provide credible backing evidence. In a sales pitch the onus is on the customer to verify your pitch. Most readers would prefer not to carry the additional overhead of having to verify your statements. If you have provided the rationale for your statements clearly and supported it with available evidence where relevant, you have made the reader’s job much easier. You have increased the chances of the reader wanting to come back to your blog.


  2. Make a strong statement. Avoid taking strong positions : Allow me to define this. By position I mean making absolutist statements without providing sufficient context or a frame of reference, or assuming one’s own frame of reference to be the only valid one. There is a wide diversity of readers out there. Some are into client side, some into server side. Some are into high usability, some into high speed processing. Some are doing graphics algorithms, some others are into CRUD and business validations. A large majority of your readers are likely to have a different frame of reference than yours. If they can’t understand where you are coming from, they will assume you are coming from the same context that they do. And they are likely to feel confused when what you say doesn’t end up matching their world view. A statement like “I found X more suitable than Y under a context Z” rather than a position like “X is better than Y” is more helpful since :
    • You get to describe your context. Your statement is a statement within a context. It is not treated as a blanket position. Readers with different contexts and divergent views can sometimes trace the differences to the context. Such readers can still suggest alternative views within other contexts easily without appearing to contradict you. Readers with similar contexts and divergent views can still choose to take you on.
    • You have less chance of being misinterpreted. You don’t want to get caught out in an interview a year down the road, when you are changing jobs from writing a forms based application to one where you might be required to build, say, a graphics processing engine, your interviewer has just read your blog, and your posts do not make sense in the newer context.
    • When you make a strong statement without taking a strong position, readers record their agreement / disagreement with the post rather than you or your blog in general. I personally find that a much more comforting thought than readers choosing to agree / disagree with the blog in general.


  3. Be prepared to update your blog soon : There is a large number of smart people out there, often a lot smarter than us, or having a different experience set than us. As the comments start coming in, you start learning things you wish you knew before you wrote the post. If the comments indicate something useful and relevant to the post that you would’ve wanted to include in the post had you known about it earlier - go ahead, add it into the post. A convention I have seen is that all non trivial changes after the initial posting should be prefixed with the word “Update:” or “Updates:” so that readers can make out you’ve changed something after your initial post. A comment or two may be especially relevant. It helps to be able to review the comments regularly and update the post promptly if relevant. If you are going to be traveling soon, either submit your post a little earlier or post it a little later - but post it when you know you will be able to review the comments and will have the flexibility to take 5 to 10 minutes off your regular work to update the blog if necessary.


  4. Be prepared for surprises : Even if you write carefully you will end up making a small set of readers either happy or disappointed with you in a manner that will leave you puzzled. However hard you try there is a good likelihood someone is going to misquote you or take you on strongly in an unanticipated way. Some of this may be unavoidable and needs to be factored into your assumptions. However some of it will be avoidable, and do follow up such incidents to figure out if there are any learnings that you can apply the next time. A great way to do so is to write a mail back to the commenter or to the blogger who may have linked to your post and get a better understanding of his/her viewpoint.


  5. Don’t title spam your readers : Every so often I come across a post with a provocative title, but which does not live up to the title at all. I prefer to call this title spamming, since a lot of the spam I receive has a provocative title, but often irrelevant content. The title is important. It influences readership strongly. But if you title spam regularly, it might help you get higher readership for 2-3 posts, but it’s going to hurt in the longer run.


  6. Understand how blog aggregators and networks work : It is important to understand the demographics of different blog aggregators. If you would like your blog to be read by a larger number of people, be clear in your mind which demographics you are targeting when writing your post. Some aggregators like javablogs.com and artima.com will target specific programming languages and work off an RSS feed. Explore your blogging software and see if it offers category / tag based feeds. If it does, use the categories / tags to ensure that the RSS feed you register with these aggregators sends only relevant posts to them. I use wordpress and it supports tag / category based RSS feeds. Networks like dzone.com, news.ycombinator.com, reddit.com, slashdot.org, digg.com have very different demographics. Don’t blanket post to all networks. Register your post with those networks where the readers are likely to find your post helpful. I have occasionally come across people wondering whether one should register one’s own posts to a network. My opinion is that it is an acceptable activity.


  7. Ensure you have blog analytics enabled : Over a longer period of time you will start gleaning useful information about your readers. eg. what part of the world do they come from, which links do they come from (eg. you can get statistical information about the referrers such as google reader (RSS), blog aggregators, blog networks etc.). You can also get information about what searches led the search engines to your blog. I prefer wordpress.com stats plugin for wordpress and google analytics. The former is better at providing more immediate feedback, whereas the latter is more comprehensive.


  8. Pay attention to search engines as well : Most blog aggregators and networks will drive substantial traffic to your blog for the first 24-48 hours. Search engines will send a small trickle initially. However there is a big difference. Traffic from aggregators and networks will dry up after a few days for any post. But traffic from search engines will keep on coming. Over a sustained period of time, search engines can start driving substantial traffic to your blog. Read up about Search Engine Optimisation and see if you can help your blog. I would recommend however that you use such optimisation fairly and only to the extent that it is not misleading.
Jul 08

I presented this last week at the Session @ Java Meet organised by IndicThreads. Have attempted to look at the contrast from the points of view of developers, architects and managers.

Jul 08

Update (READ CAREFULLY) : I am starting to see hyperlinks to this post with only some of the findings being treated as the link title (eg. X is 100 times faster than Y, X faster than Z). I emphasise once again that I have carefully indicated in the original post that this is but one of many possible microbenchmarks and that you should treat the results as one of many data points. Given the comments I’ve received and some of the links I’ve seen to this post, if I were to write this post anew, I would title it “Implementing an identical object oriented solution to the Josephus Problem in Java / C++ / Ruby / JRuby / Python / Jython / Groovy and measuring the performance results thereof.”

This post compares performance across various languages for a specific micro benchmark (actually it isn’t really a microbenchmark - it is simply a benchmark for a specific piece of logic - but that’s the closest word I could think of).

Last week, while preparing for a presentation - Contrasting Java and Dynamic Languages, I came across this interesting Perl/Python/Ruby Comparison which focused on comparing the code style of different languages. I thought it would be interesting to use the same problem to get some actual benchmarks. Note that you could also use the code segments below to get a feel for different syntactic flavours. However since I have strived to keep the code as similar as possible across languages, some of the advanced syntactic sugar of the dynamic languages is not on display here.

Problem Statement

Quoting from the post linked to above :

Flavius Josephus was a roman historian of Jewish origin. During the Jewish-Roman wars of the first century AD, he was in a cave with fellow soldiers, 40 men in all, surrounded by enemy Roman troops. They decided to commit suicide by standing in a ring and counting off each third man. Each man so designated was to commit suicide…Josephus, not wanting to die, managed to place himself in the position of the last survivor.
In the general version of the problem, there are n soldiers numbered from 1 to n and each k-th soldier will be eliminated. The count starts from the first soldier. What is the number of the last survivor?


Design

I actually changed the design of the solution as compared to the original post. Instead of using the deeply recursive calls as used in the earlier post, I decided to split the logic into two classes, and use loop iteration instead of recursion. It is my belief that we tend to do loop iterations far more frequently than recursions, and the resultant class design having two classes - one to indicate a Chain and one reflecting a Person seemed more appropriate to me.

Logic
The Chain object contains a reference to one person (first) who is but one member in a circular linked list. Each person object has a reference to its previous (prev) and next (next) person in the circle. When the kill loop starts, it sets a threshold (nth). The count starts with 1 from the first person. Each person, when asked to shout, checks if the shout count (shout) is less than the threshold (nth). If it is less, the person just returns an incremented count. If the two are the same, the person in effect commits suicide. In doing so, the person updates the next reference of its prev and the prev reference of its next to take himself off the circle and keep the circle consistent, finally returning a shout of 1 (which is what the next person in the list will shout).
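The logic above can be sketched in Python roughly as follows. This is a minimal illustration written from the description, not the benchmarked code itself; the field name count and the 0-based numbering of persons are assumptions made for the sketch.

```python
class Person:
    """One member of the circular doubly linked list."""

    def __init__(self, count):
        self.count = count  # assumed 0-based position in the original circle
        self.prev = None
        self.next = None

    def shout(self, shout, nth):
        # Below the threshold: pass on an incremented count.
        if shout < nth:
            return shout + 1
        # Threshold reached: unlink self so the circle stays consistent.
        self.prev.next = self.next
        self.next.prev = self.prev
        return 1


class Chain:
    """Holds a reference to one person (first) in the circle."""

    def __init__(self, size):
        self.first = None
        last = None
        for i in range(size):
            current = Person(i)
            if self.first is None:
                self.first = current
            if last is not None:
                last.next = current
                current.prev = last
            last = current
        # Close the circle.
        self.first.prev = last
        last.next = self.first

    def kill(self, nth):
        current = self.first
        shout = 1
        # The lone survivor is its own neighbour.
        while current.next is not current:
            shout = current.shout(shout, nth)
            current = current.next
        self.first = current
        return current


survivor = Chain(40).kill(3)
print(survivor.count)
```

A removed person keeps its own next reference, which is what lets the loop step on to the following person after a removal.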

The code does not have any comments (sorry!) and all the console outputs have been removed so that the benchmarking activity is not interfered with by the IO overheads.
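For readers who want to reproduce a "time per iteration" figure, a timing harness along these lines is one option. It is only a sketch: the function names, the list based stand-in workload and the iteration count are assumptions for illustration and are not taken from the benchmark code.

```python
import time


def josephus_by_list(n, k):
    """Hypothetical stand-in workload: simulate the circle with a list."""
    people = list(range(n))
    index = 0
    while len(people) > 1:
        index = (index + k - 1) % len(people)
        people.pop(index)
    return people[0]


def microseconds_per_iteration(func, iterations):
    """Average wall-clock time of one call to func, in microseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        func()
    elapsed = time.perf_counter() - start
    return elapsed * 1_000_000 / iterations


if __name__ == "__main__":
    # No printing inside the timed loop, so IO does not distort the result.
    per_call = microseconds_per_iteration(lambda: josephus_by_list(40, 3), 10_000)
    print(f"{per_call:.1f} microseconds per iteration")
```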

The results

All the results are as observed on my notebook with the following config
OS : Ubuntu Gutsy Gibbon 7.10
Kernel : 2.6.22-15-generic
CPU : Intel® Core™ Duo CPU T2600 @ 2.16GHz
RAM : 2GB

Language | Version | Lines of Code | Time per iteration (microseconds)
Java | Sun JDK 1.6.0.03 | 86 (was 101) | 1.6
C++ | 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2), compiled with optimisation -O3 | 86 | 3
Ruby | ruby 1.9.0 (2008-04-14 revision 16006) [i686-linux] | 63 | 89 (was 114)
 | ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux] | | 380 (was 372)
 | jruby : ruby 1.8.6 (2008-05-28 rev 6586) [i386-jruby1.1.2] | | 80 (was 84)
Python | 2.5.1 | 41 | 192 (was 225)
 | 2.5.1 with psyco | | 33
 | Jython 2.2.1 on JRE 1.6.0.03 | | 632 (was 884)
Groovy | Groovy Version: 1.5.6 JVM: 1.6.0_03-b05, uncompiled | 81 | 363
 | Compiled to bytecode and run using java | | 360
 | Update : Groovy Version: 1.6-beta-1 JVM: 1.6.0_03 | | 104
PHP | PHP 5.2.3-1ubuntu6.3 (cli) | 85 | 593


Updates :
  • Ken suggested syntactic improvements (see comments below) which led to even faster ruby execution times : jruby : 80 microseconds, ruby 1.9 : 89 microseconds, ruby 1.8.6 : 380 microseconds. The table above has been updated.
  • Cato requested a run using Groovy 1.6 beta 1 - have updated the same. Big improvement
  • Nicholas Riley suggested introducing slots and using “is not” and “is” in the if conditions for the python code. Updated the results to reflect the figures of 192 and 632 microseconds for CPython and Jython. The figure was 182 microseconds for CPython and 131 microseconds for Jython if I did not use the new style classes, however I did not reflect the same, since most new code is likely to be using new style classes. However this does indicate one possible performance optimisation if your code does not depend upon new style classes. Moreover, it makes me really interested in waiting for the Jython performance optimisations for new style classes that Nicholas suggests are on their shortlist.
  • Tim Fountain in a comment below indicates that on his hardware (Core 2 Quad) with Ubuntu Hardy Heron, Ruby 1.8.6 (same version as above) performs somewhat faster (15%) whereas upgraded versions of Python and PHP run much faster (63% for Python and 83% for PHP). Another difference in config - he is running 64 bit.
    • python - version 2.5.2: - 138 microseconds
    • ruby - 1.8.6 (2007-09-24 patchlevel 111) [x86_64-linux] :321 microseconds
    • PHP - 5.2.4-2ubuntu5.1 with Suhosin-Patch 0.9.6.2 (cli): 323 microseconds.
  • Peter Lupo requested a reduction in line count for Java since the conventional way is to have the opening braces not on a separate independent line. Given the fact that it is a fair comment, I have reduced the line count of java to 86 (didn’t physically change the code - 86 = 101 - 15 opening curly braces).
  • Added another finding of using python with psyco
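As an illustration of the slots and identity comparison suggestion mentioned in the updates above, here is a small sketch. It assumes Python 3, where every class is a new style class, and the attribute names are chosen for the example only; it makes no claim about how much time either change saves.

```python
class Person:
    # __slots__ replaces the per-instance __dict__ with fixed attribute storage.
    __slots__ = ("count", "prev", "next")

    def __init__(self, count):
        self.count = count
        self.prev = None
        self.next = None


a = Person(0)
b = Person(1)
a.next = b

# "is" / "is not" compare object identity and never call a user-defined __eq__.
print(a.next is b)             # True
print(a.prev is not None)      # False
print(hasattr(a, "__dict__"))  # False
```

One side effect worth knowing about: with slots in place, assigning an attribute that is not listed raises AttributeError.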


Summarisation

The following are the results. Given the long code blocks, I am presenting the summarisation first followed by the code.

Note: This can only be treated as one particular benchmark. The results are a little atypical with respect to my general understanding. Advise caution against drawing broad conclusions based on this benchmark alone but would suggest that you could treat this as one data point amongst many. People better versed than me in the details of language runtimes might be able to suggest why some of the results seem surprising or atypical.

  • Java / C++ Rock : The performance of Java and C++ was head and shoulders above the other languages (nearly 100 times faster). My thought is that while a difference of 10x was only to be expected - this difference was just way too massive
  • Java is faster than C++ : Though I had read about other microbenchmarks reaching the same conclusions, it is the first time I actually ran one where Java was faster. There are many others I have run where C++ beats Java quite handsomely. More importantly - the performance of C++ worsened by almost 40% once I added code which started freeing memory that was being allocated (there’s still a small memory leak in the code - there is no Chain destructor which will clean up first). I would later definitely want to look at the impact of garbage collection in this context, and whether the Java garbage collector simply was much faster than the hand crafted new - delete calls in C++.
  • Ruby 1.9 is twice as fast as Python : While it has been known for a while that Ruby 1.9 is much faster than Ruby 1.8.6, here’s one more supporting data point. I was expecting ruby 1.9 to give python a run for its performance money. But at least in this particular context it seems to be much much faster.
  • JRuby is faster than Ruby : Even ruby 1.9. Very interesting indeed.
  • Jython still has some catching up to do : Though in the same ballpark as the other languages, it was the slowest in the pack.
  • Overhead of dynamism is dominant : I have no idea if JRuby ran much faster because of the java bytecode or because of its implementation (though its performance was not even remotely close to that of Java). However even after I compiled groovy code, to java bytecode, it still ran much slower than python and ruby. It seems the overhead of supporting dynamic constructs is much more dominant than any benefits that one gets out of compilation (whether to java byte code or to intermediate compiled files). I think the argument that because something compiles to java bytecode it is likely to be fast should be looked at a little carefully.
  • PHP stays at the rear end : Though I benchmarked PHP for the first time, I wasn’t completely surprised by the fact that PHP could only manage to be faster than Jython.

Update : There are many comments to this post including those from cwilbur who benchmarks perl using an idiomatic method, Paddy3118 who offers an optimised algorithm for python, and peter lawrey who offers an optimised algorithm for Java. I would like to state that each of their solutions offers superior performance to what has been described here. However I believe any benchmark comparison should compare apples to apples. Should these contributions be taken into account and be reflected in the table above ? I certainly believe there is a case to do so as an exercise using a different algorithm. However to ensure that it is a fair comparison, one has to modify the code in all the other languages to reflect the same algorithm. Only then can we get an apples to apples comparison. That is probably an exercise for another post. Is the algorithm I have chosen the fastest - No. However I believe it is a very readable algorithm and if one ignores the IO with networks and databases and files, it is probably close to the kind of code many programmers write (and maintain) on a day to day basis. It has been consistently implemented in all the languages. Readers should be aware that there are algorithms which will deliver much superior performance - but they will also make the performance superior in all the languages (perhaps to slightly differing extents and thus possibly somewhat different results).
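To give a feel for how different a faster algorithm can look, here is the well known recurrence for the Josephus problem as a short Python sketch. It is not necessarily what any of the commenters posted, and the function name is made up for the example. It never builds the circle at all, which is exactly why it would not be an apples to apples comparison with the linked list code.

```python
def josephus_survivor(n, k):
    """Return the 1-based number of the last survivor among n soldiers
    when every k-th soldier is eliminated."""
    survivor = 0  # 0-based position of the survivor in a circle of one
    for size in range(2, n + 1):
        # Growing the circle by one shifts the survivor by k places.
        survivor = (survivor + k) % size
    return survivor + 1


print(josephus_survivor(41, 3))  # 31
print(josephus_survivor(40, 3))  # 28
```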

The code

For all of you who are either interested in running it for yourself or would like to perhaps explore this in more detail, here’s the code. Note that I am not equally competent across all languages. So if you believe there is a more appropriate way to code the same, do post a comment. One of the things I have tried to do is to ensure that the code remains more or less similar across all languages. Also I have used getters / setters or skipped them based on my understanding of the generally accepted convention for users of the language.
Continue reading »

Jun 10

Seems like a nice term doesn’t it - Polyglot programming. Some recent posts related to it are Kill Java, Vol. 2 and Fractal Programming. It may seem hot, it may seem in and it may make sense for you. However please allow me to lay down the fine print.

Polyglot programming requires building bridges across computer languages. These bridges act like translators at the UN. If you are attempting to run the UN you need the translators, and you need to accept the cost. But make sure you understand that there is a cost, and justify to yourself that you really need to incur it.

Let me give you a case in point - JSP Tag Libraries. These were meant to help the presentation / UI guys write the cool UIs while allowing the java programmers to focus on the other interesting stuff which was much better done in Java. The Tag Library construct was then a bridge between HTML and Java. Did you ever try to measure the runtime performance cost of tag libraries ? I did and it turned out that for all but the completely non trivial tag libraries - the cost was too high. I was able to run the controller logic, lookup the data from the cache, update the data in the cache (basically the entire lifecycle of a transaction except for the final updates) and sometimes finish the database updates as well, all in much less time than it took to move the data across the tag library when presenting the data. I had a simple rule - if you need screaming concurrency and throughput from a small box - do not touch tag libraries. I however did use tag libraries when the performance demands weren’t so high.

Another example of such bridges being sold without proper warnings - EJB v 1.0. The fine print did talk about the remote API costs, but the general paradigm encouraged using the EJB APIs to communicate between the session beans and the entity beans. History does tell us this performance sucked too!

If you work with JNI you will soon realise that a lot of data transformation needs to happen when moving complex data structures between C and Java. Again while much faster than exchanging XML documents over inter process pipelines, this can be expensive too.

Suggestion : focus on the granularity and the triviality of what you are doing. If the communication across the different languages is fine grained or too frequent, sit up, take notice and be careful. If the code in the other language (the language that is providing the service and thus is being called) is too small or too trivial, again ask yourself whether this is really required.
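The granularity point can be made concrete with a toy model. The Bridge class below is purely hypothetical: it uses a JSON round trip as a stand-in for whatever marshalling a real language bridge performs (it is not JNI, tag libraries or EJB), and simply counts how many times the boundary is crossed for the same amount of work.

```python
import json


class Bridge:
    """Hypothetical stand-in for a language boundary: every crossing
    serialises the arguments and the result, as a real bridge might."""

    def __init__(self):
        self.crossings = 0

    def call(self, func, payload):
        self.crossings += 1
        wire_in = json.dumps(payload)          # marshal the arguments
        result = func(json.loads(wire_in))     # run the "foreign" code
        return json.loads(json.dumps(result))  # marshal the result back


def double(x):
    return x * 2


def double_all(xs):
    return [x * 2 for x in xs]


values = list(range(1000))

# Fine grained: one crossing per value, each doing trivial work.
fine = Bridge()
fine_result = [fine.call(double, v) for v in values]

# Coarse grained: one crossing for the whole batch.
coarse = Bridge()
coarse_result = coarse.call(double_all, values)

print(fine.crossings, coarse.crossings)  # 1000 1
print(fine_result == coarse_result)      # True
```

Both variants produce the same answer; the difference is that the fine grained one pays the fixed per-crossing cost a thousand times for work that is trivial on the other side.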

Like it or not we are already polyglots. We do DHTML + CSS + Javascript + (Java / Python / Name your Language). It makes sense to be polyglottish. Treat polyglottism as a tool in your toolbox and not as a fashion statement. Keep the kids in mind and especially when showing them how to move between the static and dynamic programming languages - run your benchmarks and lay down the caveats. There is indeed a possibility that reckless application of the paradigm might lead you (more likely your unsuspecting readers) to a crawling system - something the users will describe as a PolyClot. :)

Jun 09

I have 8+ years of experience each on C++ and Java, and at least consider myself an expert on the latter and used to consider myself one on the former a long time back. For any programmer it is a difficult choice to move away from a platform and environment in which one has a substantial investment and into another one where you essentially throw away years of experience and start as a novice (at least in pure programming terms).

This is not a post which is either to be interpreted as pro Python or anti Java or a pro/anti “name any language you wish”. I will gladly and gleefully go back to C++ / Java when using them makes sense under a context. I am sharing my thoughts and not promoting or denigrating any languages or frameworks here.

Context :

Here’s the shift of context (not all aspects are necessarily relevant to the shift in language). There are many details I have deliberately not got into for obvious protection required for any commercial activity.

  • From a relatively large programming team that I used to manage into a small one (yours truly only to begin with)
  • From building a commercially owned closed software into open source software development (what is at this stage intended to be though it will take a few months to get there)
  • From customers always being large corporates who could often afford substantial hardware into customers who can range from individuals to large corporates
  • From mostly internal (intranet) facing to internet facing
  • From performance requirements of up to a few hundred thousand transactions per hour into a completely wide range of requirements based on each customer
  • From a very high percentage of writes to a much smaller percentage (ie. read percentages are now much higher)

Initial Choice Set

Given the fact that this application is intended to be hosted on the internet and is primarily a browser based application my choices quickly narrowed down to Java / JEE (something I had a long experience on), PHP (I had developed one web based application with it), Ruby and Python (I only had academic exposure to these). C/C++ did not figure on the choice set for their obvious development overheads in the context of web applications.

I went through a fair degree of thought, creation of dummy applications and mental to and fro, and the process was not nearly as linear as I will describe below. It is simplified here so that the reader can get at least some insight into my mind, which as a result is made to seem a lot less confused than it actually is.

1st Elimination :

Java was the first one to get knocked off. One reason was that Java based applications typically require either a dedicated host or have to work under memory constraints in case of shared hosts. Simply put, Java scales exceptionally well but it has a minimum hardware / investment requirement which was not acceptable within this context. The application should be able to work on shared hosting environments. Another equally important reason was that the productivity of initial development and that of making changes to Java applications is much lower than with the other languages. I was only too acutely aware of the performance implications of this choice, but I believe the appropriate choice here is to scale out, especially where the read activity is large compared to the writes. Scaling out does require more complex architectures (compared to simply scaling up) but that’s the approach that is appropriate in this context.

Subsequent thought process

This one was much tougher. Let me delve for a moment into what I believed the key strengths of the various languages were.

Ruby : I really really loved the syntax. Compact, cute ‘n’ thoroughly OO. Strong metaprogramming capabilities
PHP : Massive developer base (especially important when the intention is to eventually open source the application). It is amongst the easiest languages to use. Another advantage in its favour is the ‘C’ness of the syntax which makes it easier for anyone coming in from the C/C++/Java world.
Python : The metaprogramming and OO were almost but not quite as good as Ruby. I really love the indentation driven syntax. I know many might differ but I really like it and the neat paragraphs and lack of block braces make the code a lot more readable. Current production interpreters seem to be the best performing compared to Ruby and PHP. I know Ruby 1.9 is getting much faster but I suspect it is unlikely to be enough to make it much faster than Python already is.

As I considered the languages, it was important to look at the frameworks. I looked at CodeIgniter, CakePHP and Zend for PHP, Rails for Ruby and Pylons and Django for Python.

I believe one of the important aspects of architecture decision making today is that you bring the available toolsets / frameworks into the decision making process even when you are attempting to select a language. You are in effect evaluating a package involving both the language and the framework. A specific issue to be noted in my context was that regardless of which framework I chose, I was completely sure I would need to change or extend it, because no well known existing framework supports some of the basic requirements of the application I am about to build. I am convinced this is not a “Not Built Here” syndrome that I am suffering from but simply a necessity of the domain I am attempting to work with.

In terms of ease of use for simpler applications I would rate Rails very highly. Not only does it make the actual programming simple, it gives you a nice set of tools around it to make a lot of typical activities really easy.

2nd elimination

Ruby and Rails went out. The reasons were as follows.

  • Ruby is just so well designed from a syntax and OO perspective, that coming from so many years of C++/Java background with a substantial grounding in Object Orientation as implemented by these languages, I really did not get the confidence that I could do it sufficient justice. My fear was that I would keep on finding ways of doing things in a better way in this language (and given my inherent compulsions would feel inclined to refactor code rather than focus on newer development).
  • I perceived that Rails was really focused on typical much simpler use cases. What I intend to do requires getting into an immense amount of complexity and I felt fairly certain Rails wasn’t designed for that and that I would spend a very very substantial time reinventing or hacking through Rails.
  • One of the strong features of Rails, “Convention over Configuration”, was something I couldn’t really completely come to terms with. I still can’t get over some of the implicitness in the environment.

3rd Elimination

Clearly PHP is such a widely used language with so many developers who are already trained on using it, that using it for an as yet intended to be open source application should be a no brainer - Right ? Not in this case. Two reasons why PHP went out.
  • I intend to pull off some really complex programming. Given the better OO and metaprogramming capabilities of python - I thought I would be able to keep my code much more concise, structured and readable if I was to use Python (this would’ve been true with Ruby as well!).
  • Django - This framework simply came closest to being the framework I would really like to end up with. Thus the gap between what it has vs. what I would like to have was the smallest, and the general design principles were those that I was extremely comfortable with.

In summary : Why these got eliminated

  • JEE : The difficulty of using JEE in shared hosting environments and the long development times required made it tough to use it.
  • Ruby : I like the language. I was a little intimidated by it. Rails however seemed limited to simpler use cases
  • PHP : Wonderful developer base. However it didn’t give me the same comfort in terms of its OO capabilities and metaprogramming, and I did not believe the resultant code would be the most compact, concise, and readable.

In summary : Python and Django : What do I hope to get from them

Now that I have made the choice - what do I expect from these choices at this stage :

  • Excellent framework to start off with - provide the maximum initial boost
  • Ability to write concise, structured and readable code
  • Ability to make changes rapidly
  • Reasonably performant application

Other choices that I did not spend too much time on

During this process I also evaluated other languages such as C# and Groovy/Grails. I actually was quite impressed with the recent architecture trends coming from Microsoft and was actually tempted to spend more time on them. However the same reasons that eliminated JEE quickly made these choices unviable as well. I wish I had the luxury to consider other languages as well, but these were what were at the top of my mind, and I did not consider any other languages in the process given the time constraints.

I am missing JEE

I really really will miss JEE. I would love to have used it for the simple fact that I know it so well and this one will require me to move from an expert to a novice category. I will also miss it for the fact that with JEE I knew what it took to deliver exceptionally high performance. Even though I am feeling the JEE separation pangs, I believe Python is the right choice since it will allow for the fastest development, will allow for some really rapid changes (agility in coding), and will allow me to get the developers who shall eventually be joining us come up the curve much faster and be able to deliver more features much faster.

Final thoughts

One thing I realised through the whole process was that there are two strong influencing factors in any architecture choice. The first one obviously is the context. There is no way to compare or contrast any architecture or design choices without putting a context around them. Secondly, and this was a little bit more of a surprise to me, you simply cannot remove personal proclivities from such a choice making process. Since I as an individual am more comfortable with some styles of design than others, and since I am likely to be substantially involved with the initial development, it only seems to make sense that the choice set gets evaluated in this subjective context of individual comfort and therefore the projected individual productivity and effectiveness as well. Note that while in this case I was evaluating it strictly from my own perspective, one could evaluate the choices in the context of team styles, culture and comfort as well.