Category: php

Improve your web based software development and maintenance ROI with dynamic programming languages

Posted by – June 15, 2009

This is a cross post of my article which appeared in PuneTech in March 2009 here. The article is reproduced verbatim including the editor’s notes (in italics). I had already posted the slides referred to in the talk that was proposed in this article in the blog postTalk Slides : Programming Language Selection.


After we carried a few quick articles on why you should learn more about Ruby and Ruby on Rails (take 1, take 2) last month, we decided that we wanted to give people a much deeper article on why these new languages (Ruby, Python, PHP) and frameworks (Rails, Django) are setting the web world on fire. We invited Dhananjay Nene to write an article with an in depth discussion of the technical reasons how these new languages differ from the older ones and when to choose one over the other. He responded with this article which, as an added bonus, also includes the business reasons for your decisions. At the request of the community, Dhananjay is also giving a talk on the relative strengths and weaknesses of different programming languages on Saturday, 28th March, 4pm, at SICSR. All those who found this article interesting should definitely attend.

Introduction

Programing language selection is often a topic that elicits a lot of excitement, debate and often a bit of acrimony as well. There is no universally superior programming language that one can recommend, so I tend to generally disregard most language opinions which say ‘X language is the best’, without specifying the context under which it is superior. Finally most language debates often deal with the technical issues and not the ROI issues. Hopefully I shall be able to address this topic without being guilty of any of these problems.

So what languages are we referring to here ?

Official Ruby logo
Image via Wikipedia

The range of languages that fall under Dynamic Programming Languages category is rather extensive. My experience is primarily limited to Python and to a lesser extent PHP, Ruby, Javascript, and Groovy. For the rest of this article, I shall be primarily referring to Python or Ruby when I use the word dynamic languages, though many of the references may continue to be applicable and relevant for a number of other dynamic programming languages.

As I describe the technical characteristics, I shall also continue to attempt to address the business aspects as well, so you might find this article at a little techno-business level. Assuming I am able to excite their interest, the tech guys would not find sufficient technical details and would be hungry to hunt for more, and while the business guys would get a little teased with the possibilities, they will not quite get the ROI served in the traditionally formatted excel spreadsheets. Being aware of that, I continue down this path with a feeling that this perhaps will be the most appropriate level for me to abstract this article to.

Characteristics of Dynamic Programming Languages.

Let us quickly review some of the characteristics :

CPython
Image via Wikipedia

Object Oriented : Many dynamic languages support full object orientation. There are many who don’t necessarily buy the benefits of Object Orientation, but it is my strong belief, that once a piece of software grows beyond a certain threshold of complexity and / or size, Object Orientation starts delivering very strong dividends. There are a few areas such as highly complex, algorithmic processing which might be better suited for functional programming. However a majority of the medium-to-large sized web applications are better served by OO. The empirical evidence at least bears out the fact that most of the most popular languages today (except C) are Object Oriented. However this still is a very very large class of languages which in them include C++, Java, PHP, Python, Ruby etc. The one area where some dynamic languages separate themselves from the others is in the notion of “everything is an object”, ie. primitives such as numbers, functions are all objects by themselves.

Business implications: OO code well designed and implemented allows for a substantial reduction in maintenance costs. When working with a team which is up the curve on OO, it is likely to lead to lower costs and time on inital coding as well. On the other hand, both training costs and skill requirements are higher for fully OO languages. If you are already using partialy OO / hybrid languages such as PHP, C++ or Java, and are convinced about OO, using fully OO languages such as Python or Ruby will help you leverage the OO capabilities even further.

Duck Typing : In very loose terms, duck typed languages do not require you to declare an explicit interface. You send an object a message (ie. invoke a function or access an attribute) and if it can respond to it, it will, and if it can’t it will result in an error. Duck typing is a specific typing system which is a subset of a broader system called Dynamic Typing, which often makes for an interesting debate with its counterpart – Static typing : Static and Dynamic Type checking in practice. For people well grounded in static typing alone, this can sometimes seem to be sacrilegious. I am convinced that duck typing makes writing code much much faster for two reasons – a) You now require to write fewer lines of code and b) You often don’t have to keep on regularly waiting for the compiler to do its work. There is also a substantial capability enhancement that dynamic typing makes to the language type system, which allow the frameworks to build dynamic types on the fly. This in turn offers the framework users many more capabilities than frameworks written in other languages. That is why it is nearly impossible to write frameworks like Rails or Django in Java (You can modify the class loaders and use byte code generation to generate the new types, but the compiler can’t see them so you cant use them). That is also why there is a lot of anticipation of using JRuby, Jython and Grails on the JVM since the languages underlying them (Ruby, Python and Groovy respectively) bring the dynamic typing capabilities to the JVM platform.

Business Implications :Writing code is much much faster. Maintenance depending upon the situation can sometimes be more or less difficult in case of dynamic typed languages. Refactoring is usually a lot more difficult in case of dynamically typed languages since the underlying type system is not able to infer sufficiently about the code to help the refactoring tools, as is possible in case of statically typed languages. It is my opinion that a skilled and trained development team using dynamic languages can generally substantially outperform another equally capable team using static languages. Insufficiently or poorly skilled development teams however can lead to very very different kind of pitfalls in these class of languages. In both cases the code becomes difficult to change or maintain due to a) cryptic code in case of dynamically typed languages and b) extremely large code bases in case of statically typed languages. Both are undesirable situations to be in but if I had to choose between one of the two, I would go for being in the cryptic mess since it is at least manageable by bringing in external skilled help.

Metaprogramming : Metaprogramming is in loose terms the ability of programs to write programs. A large proportion of developers may not use this capability too frequently. Specifically in web application development it gets used as a mechanism to transform one set of datastructures which a programmer specifies into code at runtime. As I point out later in this article, it in fact is a very important element in designing common frameworks and libraries which in turn offer substantial capabilities including small code and easier maintenance. A quick note to state that metaprogramming is not code generation. In case of code generation, one uses the generator to generate code which is then compiled. A big limitation with this is the fact that often people modify the generated code leading to really tough maintenance nightmares and the fact that it is a two stage process which is prone to more errors. Metaprogramming results in new code “coming to life” so to speak while your program is running.

Business Implications : Read on, they will get covered in the final roundup. They are large and they are positive.

Function blocks/objects, iterators, closures, continuations, generators: I will not go into any substantial details of this issue except to say that small pieces of code logic can be handled in a much much more concise way than if these weren’t supported. While many situations may not need closures support, you will be glad to have them on your side when needed.

Business Implications : Helps having shorter, cleaner code leading to lesser development and maintenance costs. Another significant positive is that your developers are just likely to be so much happier since they get some truly nice building blocks for concise and elegant expression of their logic. Can’t think of any significant negatives.

There are a full range of other capabilities, but none come to mind immediately as something that have strong business implications as well.

The role of frameworks

Ruby on Rails
Image via Wikipedia

When did these languages say Ruby and Python originate ? Most people are likely to be a little surprised if the answer is in the last millenium. Yet Guido von Rossum started working on Python in 1986 and Ruby was released in 1992. Python has been rather well known within the scientific community and perhaps a bit within the systems / OS utility programming communities for quite some time. However both languages grabbed a large mindshare only post 2005. A big reason for their popularity (especially in case of Ruby’s case) came from the popularity the frameworks which used them. Ruby on Rails for ruby and Django (to the best of my knowledge) for python. These frameworks combined the language capabilities with the learnings of good design practices for internet applications (eg MVC, declarative validations, simple ORM etc) into a simple usable package, which developers could take and build web applications quickly. There are examples of people having built simple web apps within a day and medium complexity apps in 1-3 weeks using these frameworks. The languages are the ingredients, the frameworks are the cooks – a great combination for serving great meals. Now you will find many such frameworks in these languages, including some which have better capabilities for building more sophisticated / complex applications eg. Merb and Pylons.

I am not too sure of how many people are exactly aware of the role of metaprogramming in the frameworks’ successes. I am willing to believe that but for metaprogramming, these frameworks simply would not have achieved anywhere close to the success they achieved. It is metaprogramming which takes the datastructures as defined by a developer and converts it into runtime code implicitly, saving the developer lots of time and effort. So even if most developers don’t actively write metaprograms, their lives are so much easier. Metaprogramming capabilities are also the reason why it is virtually impossible to write similar frameworks in Java. However if you are on the .NET or JVM environments, things are definitely looking encouraging with the possibilities to use IronPython or IronRuby on .NET or JRuby or Jython or Groovy+Grails on the JVM.

Business implications : If you are focused on scientific or desktop or highly algorithmic applications, where python especially is used extensively, you are likely to get benefits from these languages on their own merit alone. For web applications you will see the maximum benefits by using the web MVC frameworks along with the languages. I submit that on the whole you are likely to see very substantial reduction in development, enhancement and maintenance times – sweet music for any end user, investor or project manager.

Increased Business Agility

There is one more reason why I believe these languages are especially helpful. They help by increasing development agility to an extent where it now allows for the business to be more agile. You can get a first prototype version up in weeks, take it around to potential users, and gather feedback on the same. Incorporate elements of this feedback into the next release of working code quickly. The business benefits of such a scenario are tremendous. You might wonder that this is a process issue, so what does it have to do with a language selection. I would submit, that languages which allow changes to be made faster, help support this process in a far superior way. Another equally important facet is the superior risk management. Since you are able to build features with lower investments, you are able to get a series of customer feedbacks into your decision making process much faster. This helps being able to come up with a product that really meets the customer expectations much earlier. This happens by allowing the better features to come in earlier and also by allowing the lesser important or lesser relevant features to be decided to be deferred earlier. That’s precisely the reason why the dynamic languages have found a strong acceptance in the startup world. I believe the increasing agility which is often required in the startup world, is and will continue to be increasingly required of established enterprises. Precisely the reason why I believe these languages will continue to do better in the enterprise space as well. Finally, these languages make it relatively easier to tell your business sponsor – We will work with you on imprecise requirements rather than spending months on nailing down requirements which anyways are likely to change later. This has both a pro and a con especially for outsourcing situations. It is likely to allow for tremendous customer delight in terms of a vendor that works with him in such a flexible manner, yet it does introduce challenges in terms of how the commercials and management of the project are handled.

The reason I would like to especially point out increased business agility is because programmers don’t often visualise or evangelise it much, but when I wear a manager’s hat, it is perhaps the most compelling benefit of these languages.

Concluding

As I said earlier, there is no single universal language which is the best for all scenarios. There are some scenarios where using dynamic languages will not be helpful

Programming language book sales 4Q2008

A Treemap view of sales of programming language books by O’Reilly Media in 4Q2008. The size of a box represents the total sales of a book. The color represents the increase or decrease in sales compared to same quarter in 2007. Green = increase, bright green = big increase, red = decrease, bright red = large decrease. See full article at O’Reilly Radar for lots of interesting details.

When not to use these languages

  • You are building a simple / small application and don’t have the available skill sets. One exception to this is where you decide to use it in a simple application to allow yourself a non risky mechanism of building these skillsets.
  • Extremely High performance requirements. However please make sure that you really need the high performance capabilities of say a C, C++ or Java. In my experience 80% of developers like to believe that they are building highly performant applications where the maximum speed is a must have. Yet the top 10% of them are facing far far more critical performance requirements than the remainder. Unless you are convinced you are in the top 10%, you should certainly consider dynamic languages as an option. Moreover in case of most high performance requirements, these can sometimes be boiled down to a few inner loops / algorithms. Consider implementing the same in C, / Java or other .NET languages (depending upon the choice of your dynamic language interpreter implementation)
  • You have an architecture standard in place which does not allow using these languages. If you are convinced your applications are better served by using dynamic languages both from your individual application and an overall enterprise perspective, consider taking the feedback to your standards setting body to see if you can pilot a different approach. Also evaluate if the .NET or JVM versions can help you comply with the architecture guidelines.
  • You are unable to commit to the retraining requirements. While these languages are easy and powerful to use, leveraging that power can require some amount of retraining. If that does not fit your business plans, since the retraining effort could impact immediate and urgent requirements, that could be a reason to not use these languages. However in such situations do consider investing in building this skill sets before you get to another similar decision point.
  • You need a very high levels of multithreadinging as opposed to multi processing support. While this is not a typical situation for web applications, you should be aware that most dynamic languages have some limitations in terms of multi threading support. This actually is not necessarily an issue with the language as with the implementation eg. the C implementation of python has the notorious Global Interpreter Lock which constrains you from being able to use more than a handful of threads per processes efficiently. However the same restriction is not present in Jython (the jvm implementation of python). This is likely to be an issue for a miniscule percentage of the web applications market for the primary reason that multi process / shared nothing architecture styles often work quite well for many web applications and they don’t really need multi threading.

So where’s my return on investment ?

First of all lets talk of the investment part. If you get into it in a paced approach, the investment may not be that great. Start with a team size of anywhere between 2-6 people (depending upon your organisation and project size). Think of 15 days of intensive training followed by a 2-6 months coming up the curve effort (more likely 2 than 6). Make sure your first project is not a critical one under tremendous business pressure. This can be subsequently followed by more people getting retrained as necessary. In the longer term it might actually help reduce your incremental investment, since it might be much easier to ramp up new programmers in Ruby or Python than say Java or C#.

Secondly lets look at the incrementally higher costs. You are likely to need people who are a little bit more capable in terms of understanding and debugging the same logic expressed in fewer lines of code (that sometimes can be a challenge) and then be able to modify and enhance the same. This may increase your testing and fixing costs in the earlier days. Finally while the fewer lines of code can make refactoring easier, you could find that your total refactoring costs are a little higher.

Now the returns part. I am convinced that the increased business agility is the strongest return in business terms. Immediately after that is the substantial reduction in development, enhancement and maintenance times. If neither of these benefits are appealing, when contrasted with some other issues that you might perceive, maybe considering dynamic languages in your context is not such a great idea.

One more factor that I would of course encourage you to evaluate from a business perspective are the implications for you if your competition (assuming it is not already using them) started using these languages. The implications would vary from case to case, but it could also help you decide how important this issue is for you.

About the author – Dhananjay Nene

Dhananjay is a Software Engineer with around 17 years of experience in the field. He is passionate about software engineering, programming, design and architecture. He did his post graduation from Indian Institute of Management, Ahmedabad, and has been involved in Senior Management positions and has managed team sizes in excess of 120 persons. His tech blog, and twitter stream are a must read for anybody interested in programming languages or development methodologies. Those interested in the person behind the tech can check out his general blog, and personal twitter stream. For more details, check out Dhananjay’s PuneTech wiki profile.

Talk Slides : Programming Language Selection

Posted by – March 29, 2009

Slides of my talk yesterday on programming language selection (from a technical and a business perspective)

Background article (not exactly same but on a similar topic) : Improve your web based software development and maintenance ROI with dynamic programming languages

Talk Announcement : Seminar: Strengths and weaknesses of various programming languages – 28th March

Poll : Usage of web services (SOAP / REST / HTTP)

Posted by – November 19, 2008

While the web service meme and implementations has been out there for many years, statistics about how many developers publish or consume them are a little hard to come by. Many statistics talk about the % organisations which have adopted or intend to adopt SOA / Web Services etc., but these often are less than sufficiently useful  IMHO since it does not indicate how widely these get used.

This poll attempts to understand how many developers “actually” either publish or consume web services in the current projects that they are involved with, and if so, what is the nature of the web service APIs (SOAP / REST / HTTP-POX).

Please note, that the poll is in the top right widget of this blog. Multiple selections are acceptable. If this is an answer you seek as well, please invite other developers you know to participate as well through email and blog posts Just indicate to them that the poll can be found in the top right widget at http://blog.dhananjaynene.com. While you can follow the current results online as entries are added, the poll will close on November 30th when the final results will be published.

Update :

Due to apparent lack of interest (no votes for the last 3 days), I am closing this poll early. The final summary of the votes is as follows :

  • Publishing with REST (57%, 12 Votes)
  • Consuming / Using REST services (57%, 12 Votes)
  • Consuming / Using SOAP services (29%, 6 Votes)
  • Publishing with SOAP (24%, 5 Votes)
  • Publishing non SOAP / non REST HTTP services (24%, 5 Votes)
  • Consuming non SOAP / non REST HTTP services (14%, 3 Votes)
  • No web services being published or consumed (5%, 1 Votes)

Total Voters: 21

Commentary on Python from a Java programming perspective

Posted by – September 17, 2008

After having worked with Java (and earlier C++) for a number of years, I have been working with Python for the last few months. Since I came to Python from Java, I thought it might be useful to share my experiences, which might be of interest to many programmers. This is not intended to be a language or feature comparison or something that contrasts the various advantages or disadvantages, but is intended to reflect on the softer aspects of how it feels to write Python programs. Hence I have refrained from including code snippets and other programming constructs in this post, but if you believe I am unclear in some of the comments I am making, do let me know and I shall be glad to update the post or write a new one as required.

Programming is Easier and Enjoyable

I think the most dominant impression from the last few months is that python does make programming feel a lot more easier and often more enjoyable. The feeling is not very different between riding a bicycle without gears then riding one with gears. In the latter case one just feels one can cover a lot more distance much more easily though any physicist will tell you the actual effort is not particularly different. It just feels like one has a much bigger toolbox (ie a wider assortment of tools) to work with and therefore the task seems simpler. Why do I think that way ? I believe the following features of python do help (in no particular order) :

  • Concise Coding style : The code typically is much more concise, with much lesser verbosity
  • Dynamic typing : You really do not need to worry about declaring data types and making sure the inheritance hierarchies especially for all the interfaces and implementations well laid out. The various objects do not even need to be in the same inheritance hierarchy – so long as they can respond to the method, you can call it. This is a double edge sword, but that doesn’t take away the fact that programming under dynamic types environment does seem a lot easier.
  • Easier runtime reflection : Java seems to have all the reflection capabilities but I think these are just way too painful to use as compared to python. In python the entire set of constructs (classes, sequences etc.) are available for easy reflection. In case you need to use metaprogramming constructs, python really rocks.
  • More built in language capabilities : Items such a list comprehensions, ability to deal with functions as first class objects etc. give you a broader vocabulary to work with.
  • Clean indentation requirement : It took me about 2-3 days to get over it but, it seems that python code is much easier to read since if you do not indent it correctly it will be rejected.

Need to spend time on understanding how to write code in Python

One of the statements I have heard in different contexts is that Java programmers when they start using python write it as if they are writing a java program using python syntax (unpythonic in python parlance). This actually took me a fair amount of time to understand. I did look at a lot of other python code to attempt to understand this in greater detail. I realised that python offers many more capabilities than java and that as a person who had written java code earlier, I was able to be immediately productive using the constructs in python which mapped onto the equivalent constructs in java easily. However it did take a long time to be able to start using the other constructs. One of the exercises that did help was to get away from the problem at hand and try to work on something entirely different (especially a program which had a strong algorithms element to it with complex data structures), and slowly review each line with how it might be better done in python.I realised it is easy to start coding in python but it takes some effort and time to start using the entire python toolbox effectively. While I reviewed other programs written in python, I did think there was one area where the prior exposure to java helped. Class design. I am not sure if this was an issue with the programs I read and thus there was an issue with the sample references I used. However while these helped me understand how to write code in python differently, I have a strong feeling that the programs I wrote were much stronger in terms of class design. There are many situations especially given the fact that python supports both function oriented programming and object oriented programming, where one wonders what is a better way to design the logic in a particular context. I thought it generally made sense to use proper class based design and use the functional constructs in situations which started getting a little loopy or algorithmic.

No Compile Cycle

Another thing I really enjoy about python is – no compile cycle (its implicit). So while in the middle of my editor, I could exit to the shell prompt, run the source immediately without trying to deal with an ant script in between which compiles java code and then recycles the web application. This has boosted my net productivity quite a bit.

Productivity

So are the statements that one can be 5 to 10 times more productive in python supported by my experience. While I haven’t gotten through the full life-cycle yet, prima facie I do believe 5 times does seem like a good number. However I would introduce the following caveats. It will not happen for your first project .. more like your second project onwards, primarily since it does require a lot of effort to start understand how to write pythonic code and that does take away a lot of those benefits initially. Secondly it does not factor in refactoring. Point refactoring is much much easier in python, but bulk refactoring is much much more difficult since automatic refactoring is tough. Thus when I am refactoring, I sometimes feel I was able to get it done faster in python, but in many other cases I thought I ended up taking a lot more time.

Switched to python completely ?

Not really. But I feel happy I have more options available to me. One thing that really sucks is the poor refactoring capabilities. This is unlikely to be an issue with python alone and is likely to be an issue with all dynamically typed languages. However if you want to use dynamic languages, be prepared to spend some more effort during refactoring since many of the automatic refactoring capabilities you may have gotten used to, may not be available. The other issue is performance. Java wins hands down on performance by a big margin. If I have to ever get back to another project where performance was particularly important and the primary constraint in performance was not network or disk IO but was likely to be the CPU, and it would be acceptable to substantially reduce development productivity in order to get the performance gains, I shall be found to be writing Java code for certain. The next project I work on, I am likely to “first” evaluate whether python fits the bill and switch to other languages if I do not find it appropriate enough.

Performance Comparison – C++ / Java / Python / Ruby/ Jython / JRuby / Groovy

Posted by – July 8, 2008

Update (README CAREFULLY) : I am starting to see hyperlinks to his post with only some of the findings being treated as the link title (eg. X is 100 times faster than Y, X faster than Z). I emphasise once again that I have carefully indicated in the original post that this is but one of many possible microbenchmarks and that you should treat the results as one of many data points. Given the comments I’ve received and some of the links I’ve seen to this post, if I was to make this posting anew, I would choose to assign the title of this post as “Implementing an identical object oriented solution to the Josephus Problem in Java / C++ / Ruby / JRuby / Python / Jython / Groovy and measuring the performance results thereof.”

This post compares performance across various languages for a specific micro benchmark (actually it isn’t really a microbenchmark – it is simply a benchmark for a specific piece of logic – but thats the closest word I could think of).

Last week, while preparing for a presentation – Contrasting Java and Dynamic Languages, I came across this interesting Perl/Python/Ruby Comparison which focused on comparing the code style of different languages. I thought it would be interesting to use the same to get some actual benchmarks based on the same. Note that you could also use the code segments below to get a feel for different syntactic flavours. However since I have strived to keep the code as similar as possible to each other, some of the advanced syntactic sugar of the dynamic languages is not on display here.

Problem Statement

Quoting from the post linked to above :

Flavius Josephus was a roman historian of Jewish origin. During the Jewish-Roman wars of the first century AD, he was in a cave with fellow soldiers, 40 men in all, surrounded by enemy Roman troops. They decided to commit suicide by standing in a ring and counting off each third man. Each man so designated was to commit suicide…Josephus, not wanting to die, managed to place himself in the position of the last survivor.
In the general version of the problem, there are n soldiers numbered from 1 to n and each k-th soldier will be eliminated. The count starts from the first soldier. What is the number of the last survivor

Design

I actually changed the design of the solution as compared to the original post. Instead of using the deeply recursive calls as used in the earlier post, I decided to split the logic into two classes, and use loop iteration instead of recursion. It is my belief that we tend to do loop iterations far more frequently than recursions, and the resultant class design having two classes – one to indicate a Chain and one reflecting a Person seemed more appropriate to me.

Logic
The Chain object contains a reference to one person (first) who is but one member in a circular linked list. Each person object has a reference to its previous (prev) and next (next) person in the circle. When the kill loop starts, it sets a threshold (nth). The count starts with 1 from the first person. Each person when asked to shout, checks if the shout count (shout) is less than the threshold (nth). If less, the person just returns an incremented count. If the two are same, the person in effect commits suicide. In doing so the person, updates the next reference of its prev, and prev reference of its next to take himself off the circle and keep the circle consistent, finally returning a shout of 1 (which is what the next person in the list will shout).

The code does not have any comments (sorry!) and all the console outputs have been removed so that the benchmarking activity is not interfered with by the IO overheads.

The results

All the results are as observed on my notebook with the following config
OS : Ubuntu Gutsy Gibbon 7.10
Kernel : 2.6.22-15-generic
CPU : Intel(R) Core(TM) Duo CPU T2600 @ 2.16GHz
RAM : 2GB

Language Version Lines of Code Time per iteration (microseconds)
Java Sun JDK 1.6.0.03 10186 1.6
C++ 4.1.3 20070929 (prerelease)
(Ubuntu 4.1.2-16ubuntu2)
Compiled with optimisation -O3
86 3
gcc version 4.2.3
(Ubuntu 4.2.3-2ubuntu7)
Compiled with optimisation -O3
Alberto Bignotti’s modified code with customised memory reuse and management
124 approx 0
Ruby ruby 1.9.0 (2008-04-14 revision 16006) [i686-linux] 63 114 89
ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux] 372 380
jruby : ruby 1.8.6 (2008-05-28 rev 6586) [i386-jruby1.1.2] 84 80
Python 2.5.1 41 225 192
2.5.1 with psyco 33
Jython 2.2.1 on JRE 1.6.0.03 884 632
Groovy Groovy Version: 1.5.6 JVM: 1.6.0_03-b05 uncompiled 81 363
Compiled to bytecode and run using java 360
UpdateGroovy Version: 1.6-beta-1 JVM: 1.6.0_03 104
PHP PHP 5.2.3-1ubuntu6.3 (cli) 85 593

Updates :

  • Ken suggested syntactic improvements (see comments below) which lead to even faster ruby execution times : jruby : 80 microseconds, ruby 1.9 : 89 microseconds, ruby 1.8.6 : 380 microseconds. The table above has been updated
  • Cato requested a run using Groovy 1.6 beta 1 – have updated the same. Big improvement
  • Nicholas Riley suggested introducing slots and using “is not” and “is” in the if conditions for the python code. Updated the results to reflect the figure of 192 and 632 micro seconds for CPython and Jython. The figure was 182 microseconds for CPython and 131 microseconds for Jython if I did not use the new style classes, however I did not reflect the same, since most new code is likely to be using new style classes. However this does indicate one possible performance optimisation if your code does not depend upon new style classes. Moreover makes me really interested in waiting for the Jython performance optimisations for new style classes that Nicholas suggests are on their shortlist.
  • Tim Fountain in a comment below indicates that on his hardware (core 2 Quad) with Ubuntu Hardy Heron, Ruby 1.8.6 (same version as above) performs somewhat faster (15%) whereas upgraded version of Python and PHP run much faster(63% for python and 83% for ruby). Another difference in config- he is running 64 bit.
    • python – version 2.5.2: – 138 microseconds
    • ruby – 1.8.6 (2007-09-24 patchlevel 111) [x86_64-linux] :321 microseconds
    • PHP – 5.2.4-2ubuntu5.1 with Suhosin-Patch 0.9.6.2 (cli): 323 microseconds.
  • Peter Lupo requested a reduction in line count for Java since the conventional way is to have the opening braces not on a separate independent line. Given the fact that it is a fair comment, I have reduced the line count of java to 86 (didn’t physically change the code – 86 = 101 – 15 opening curly braces).
  • Added another finding of using python with psyco
  • C++ – Added results using Alberto Bignotti’s alternative code with customised memory reuse management

Summarisation

The following are the results. Given the long code blocks, I am presenting the summarisation first followed by the code.

Note: This can only be treated as one particular benchmark. The results are a little atypical with respect to my general understanding. Advise caution against drawing broad conclusions based on this benchmark alone but would suggest that you could treat this as one data point amongst many. People better versed than me in the details of language runtimes might be able to suggest why some of the results seem surprising or atypical.

  • Java / C++ Rock : The performance of Java and C++ was head and shoulders beyond other languages (nearly 100 times faster). My thought is that while a difference of 10x was only to be expected – this difference was just way too massive
  • Java is faster than C++ : Though I had read about other microbenchmarks reaching the same conclusions, it is the first time I actually ran one where Java was faster. There are many others I have run where C++ beats Java quite handsomely. More importantly – the performance of C++ worsened by almost 40% once I added code which started freeing memory that was being allocated (there’s still a small memory leak in the code – there is no Chain destructor which will clean up first). I would later definitely want to look at the impact of garbage collection in this context, and whether the Java garbage collector simply was much faster than the hand crafted new – delete calls in C++. Update:Using the customised memory management (which is not used in any of the other examples) but the same algorithm as in the code written by Alberto, C++ is much faster than Java
  • Ruby 1.9 is twice as fast as Python : While it has been known for a while Ruby 1.9 is much faster than Ruby 1.8.6, heres one more supporting data point. I was expecting ruby 1.9 to give python a run for its performance money. But at least in this particular context it seems to be much much faster.
  • JRuby is faster than Ruby : Even ruby 1.9. Very interesting indeed.
  • Jython still has some catching up to do : Though in the ballpark as the other languages, it was the slowest in the pack.
  • Overhead of dynamism is dominant : I have no idea if JRuby ran much faster because of the java bytecode or because of its implementation (though its performance was not even remotely close to that of Java). However even after I compiled groovy code, to java bytecode, it still ran much slower than python and ruby. It seems the overhead of supporting dynamic constructs is much more dominant than any benefits that one gets out of compilation (whether to java byte code or to intermediate compiled files). I think the argument that because something compiles to java bytecode it is likely to be fast should be looked at a little carefully.
  • PHP stays at the rear end : Though I benchmarked PHP for the first time, I wasn’t completely surprised by the fact that PHP could only manage to be faster than Jython.

Update : There are many comments to this post including those from cwilbur who benchmarks perl using a idiomatic method, Paddy3118 who offers an optimised algorithm for python, and peter lawrey who offers an optimised algorithm for Java. I would like to state that each of their solutions offer superior performance than that what has been described here. However I believe any benchmark comparison should compare apples to apples. Should these contributions be taken into account and be reflected in the table above ? I certainly believe there is a case to do so as an exercise using a different algorithm. However to ensure that it is a fair comparison, one has to modify all the code in all other languages also to reflect the same algorithm. Only then can we get an apple to apples comparison. That is probably an exercise for another post. Is the algorithm I have chosen the fastest – No. However I believe it is a very readable algorithm and if one ignores the IO with networks and databases and files, it is probably close to the kind of code many programmers write (and maintain) on a day to day basis. It has been consistently implemented in all the languages. Readers should be aware that there are algorithms which will deliver much superior performance – but they will also make the performance superior in all the languages (perhaps to slightly differing extents and thus possibly somewhat different results).

The code

For all you who are either interested in running it for yourself or would like to perhaps explore this in more detail, .. here’s the code. Note that I am not equally competent across all languages. So if you believe there is something that could be more appropriate way to code the same, do post a comment. One of the things I have tried to do is to ensure that the code remains more or less similar across all languages. Also I have used getter – setters or skipped them based on my understanding of the generally accepted convention for users of the language.
More…

How I ended up selecting Python for my latest project

Posted by – June 9, 2008

I have 8+ years experience on C++ and Java each and at least consider myself an expert on the latter and used to consider myself one on the former a long time back. For any programmer it is a difficult choice to move away from the platform and the environment in which one has both a substantial investment into and into another one where you essentially throw away years of experience and start as a novice (at least in pure programming terms).

This is not a post which is either to be interpreted as pro Python or anti Java or a pro/anti “name any language you wish”. I will gladly and gleefully go back to C++ / Java when using them makes sense under a context. I am sharing my thoughts and not promoting or denigrating any languages or frameworks here.

Context :

Here’s the shift of context (not all aspects are necessarily relevant to the shift in language). There are many details I have deliberately not got into for obvious protection required for any commercial activity.

  • From a relatively large programming team that I used to manage into a small one (yours truly only to begin with)
  • From building a commercially owned closed software into open source software development (what is at this stage intended to be though it will take a few months to get there)
  • From customers always being large corporates who could often afford substantial hardware into customers who can range from individuals to large corporates
  • From mostly internal (intranet) facing to internet facing
  • From performance requirements of upto a few hundred thousands of transactions per hour into a completely wide range of requirements based on each customer
  • From a very high percentage of writes to a much smaller percentage (ie. read percentages are now much higher)

Initial Choice Set

Given the fact that this application is intended to be hosted on the internet and is primarily a browser based application my choices quickly narrowed down to Java / JEE (something I had a long experience on), PHP (I had developed one web based application with it), Ruby and Python (I only had academic exposure to these). C/C++ did not figure on the choice set for their obvious development overheads in the context of web applications.

I went through a fair degree of thought and creation of dummy applications and the mental to and fro and the the process was not nearly as linear as I will describe below. The process is simplified below simply so that the reader can get at least some insight into my mind and my mind is made to seem a lot less confused than it actually is.

1st Elimination :

Java was the first one to get knocked off. One reason was that Java based applications typically require either a dedicated host or have to work under memory constraints in case of shared hosts. Simply put Java scales exceptionally well but it has a minimum hardware / investment requirement which was not acceptable within this context. The application should be able to worked on shared hosting environments. Another equally important reason was that the productivity of initial development and that of making changes to Java applications is much lesser than the other languages. I was only too acutely aware of the performance implications of this choice, but I believe I the appropriate choice here shall be to scale out especially where the read activity is especially large compared to the writes. Scaling out does require more complex architectures (compared to simply scaling up) but thats the way that is appropriate in this context.

Subsequent thought process

This one was much tougher. Let me delve for a moment into what I believed the key strengths of the various languages were.

Ruby : I really really loved the syntax. Compact, cute ‘n’ thoroughly OO. Strong metaprogramming capabilities
PHP : Massive developer base (especially important when the intention is to eventually open source the application). It is amongst the easiest languages to use. Another advantage in its favour is the ‘C’ness of the syntax which makes it easier for anyone coming in from the C/C++/Java world.
Python : The metaprogramming and OO were almost but not as good as Ruby. I really love the indentation driven syntax. I know many might differ but I really like it and the neat paragraphs and lack of block braces make the code a lot more readable. Current production interpreters seem to be the best performant compared to Ruby and PHP. I know Ruby 1.9 is getting much faster but I suspect it is unlikely to be enough to make it much faster than Python already is.

As I considered the languages, it was important to look at the frameworks. I looked at CodeIgniter, CakePHP and Zend for PHP, Rails for Ruby and Pylons and Django for Python.

I believe one of the important aspects of architecture decision making today is that you bring in the available toolsets / frameworks into the decision making process even when you are attempting to select a language. You are in effect evaluating a package involving both the language and the framework. A specific issue to be noted in my context was that regardless of what framework I chose I was completely sure I would need to change it / extend it due to the fact that some of the basic requirements of the application I am about to build – no well known existing framework supports them. I am certain and convinced this is not a “Not Built Here” syndrome that I am suffering from but simply a necessity of the domain I am attempting to work with.

In terms of ease of use for simpler applications I would rate Rails very highly. Not only does it make the actual programming simple, it gives you a nice set of tools around it to make a lot of typical activities really easy.

2nd elimination

Ruby and Rails went out. The reasons were as follows.

  • Ruby is just so well designed from a syntax and OO perspective, that coming from so many years of C++/Java background with a substantial grounding in Object Orientation as implemented by these languages, I really did not get the confidence that I could do it sufficient justice. My fear was that I would keep on finding ways of doing things in a better way in this language (and given my inherent compulsions would feel inclined to refactor code rather than focus on newer development).
  • I perceived that Rails was really focused on typical much simpler use cases. What I intend to do requires getting into an immense amount of complexity and I felt fairly certain Rails wasn’t designed for that and that I would spend a very very substantial time reinventing or hacking through Rails.
  • One of the strong features of Rails was something I couldn’t really completely come to terms with was “Convention over Configuration”. I still can’t get over some of the implicitness in the environment.

3rd Elimination

Clearly PHP is such a widely used language with so many developers who are already trained on using it, that using it for an as yet intended to be open source application should be a no brainer – Right ? Not in this case. Two reasons why PHP went out.

  • I intend to pull off some really complex programming. Given the better OO and metaprogramming capabilities of python – I thought I would be able to keep my code much better concise, structured and readable if I was to use Python (this would’ve been true with Ruby as well!).
  • Django – This framework simply came closest to being the framework I would really like to end up with. Thus the gap between what it has vs. what I would like to have was the smallest, and the general design principles were those that I was extremely comfortable with.

In summary : Why these got eliminated

  • JEE : The difficulty of using JEE in shared hosting environments and the long development times required made it tough to use it.
  • Ruby : I like the language. I was a little intimidated by it. Rails however seemed limited to simpler use cases
  • PHP : Wonderful developer base. However didn’t give me the same comfort in terms of its OO capabilities and metaprogramming and did not believe the resultant code would be the most compact, concise, and readable.

In summary : Python and Django : What do I hope to get from them

Now that I have made the choice – what do I expect from these choices at this stage :

  • Excellent framework to start off with – provide the maximum initial boost
  • Ability to write concise, structured and readable code
  • Ability to make changes rapidly
  • Reasonably performant application

Other choices that I did not spend too much time on

During this process I also evaluated other languages such as C# and Groovy/Grails. I actually was quite impressed with the recent architecture trends coming from Microsoft and was actually tempted to spend more time on them. However the same reasons that eliminated JEE quickly made these choices unviable as well. I wish I had the luxury to consider other languages as well, but these were what were at the top of my mind, and I did not consider any other languages in the process given the time constraints.

I am missing JEE

I really really will miss JEE. I would love to have used it for the simple fact that I know it so well and this one will require me to move from an expert to a novice category. I will also miss it for the fact that with JEE I knew what it took to deliver exceptionally high performance. Even though I am feeling the JEE separation pangs, I believe Python is the right choice since it will allow for the fastest development, will allow for some really rapid changes (agility in coding), and will allow me to get the developers who shall eventually be joining us come up the curve much faster and be able to deliver more features much faster.

Final thoughts

One thing I realised through the whole process was that there are two strong influencing factors to any architecture choices. The first one obviously is the context. There is no way to compare or contrast any architecture or design choices without putting a context around it. Secondly and this was a little bit more of a surprise to me was that you simply cannot remove personal proclivities from such a choice making process. Since I as an individual am more comfortable with some styles of design than others, and since I am likely to be substantially involved with the initial development, it only seems to make sense that the choice set gets evaluated in this subjective context of individual comfort and therefore the projected individual productivity and effectiveness as well. Note that while in this case I was evaluating it strictly from my own perspective, one could evaluate the choices in the context of Team styles, culture and comfort as well.

Java : if (compete with PHP / Ruby / Python) { stop fixing the syntax and start fixing the runtime }

Posted by – April 14, 2008

A lot of people are wondering whether Java is under threat from a set of nimble languages – PHP / Ruby / Python / Perl. There is a flurry of activity relating to Whether Java is losing the battle for the modern web lot of which are being driven from the earlier post by Andi Gutman Java is losing the battle for the modern Web. Can the JVM save the vendors?.

My recent activities had me visit the same question and the following is how I would summarise the situation and the mechanisms for Java to better compete with these nimble languages.

Current scenario

Yes, I think there is some threat to the scope of use of Java as a server side web application development language. Given the high acceptance of Java in the corporate environments this threat will take some time to play itself out. The sales pitch of Java has often been supported heavily by the vendors, and this has led to lesser focus on making java compete at the non-corporate-enterprise end. The nimble languages compete quite well in this space and combined with the increasing power of hardware which makes them more and more relevant each day for high performance requirements, there is a clear threat of these languages starting from a lower end pushing java out of even the middle end.

Opinion: Dynamism of syntax is not the real issue

What I am not yet fully on board with is the characterisation that the real issue is that Java is lesser dynamic than say PHP / Ruby / Python. Java has a pretty strong set of capabilities and while some may desire more from a syntax perspective I don’t know that thats the real issue. Lack of Closures, Dynamic Types, Duck Typing may make it difficult for java to compete in some contexts only to be at least partially offset in others.

Issue : Shared hosting of java applications

The first real issue is the overhead to develop and then start hosting java applications. It is difficult for host providers to support allowing each user the ability to run their own JVM processes in a sustained fashion. I remember the days when using PostgresSQL required you to have a dedicated server whereas you could use mysql easily by just setting up yourself for a small multi-tenant plan on a web server out there. Today Java is like PostgreSQL of those days. There is no easy way to simply set yourself up to run a small Java application in a shared tenant environment. Even if you did set yourself up chances are that you would be far less than satisfied with the performance in a multi tenant situation even though Java is actually a really really fast language. Java has endeared itself to the corporate environments with their funding for creating large infrastructure stacks and simply hasn’t offered enough opportunities for small enterprises / individuals to create and host their web applications on a shared infrastructure. This takes away an entire community of supporters who when they grew up would’ve carried java along into the larger of their applications.

Issue: Compile cycle

The second issue is the nastily long compile-build cycle. This leads to a scenario where developers need to twiddle their thumbs for half their development times waiting for ant / maven / javac to do their work. It destroys their rhythm and hurts their productivity. The traditional argument of the speed / scale and enterprise-ness of Java over the nimble languages just seems lesser and lesser relevant as these languages start having a nice stack of their own enterprise frameworks and hardware developments whittle away the performance disadvantage. When someone very recently asked me why I was less likely to choose Java for my next project I just said Java is not agile enough. The agility of the nimble languages (no compile-build cycle) coupled with their having adequate performance profiles do lead Java to become lesser capable of competing in the space. By the way one of the real strengths of eclipse is its incremental compilation, but I cannot unfortunately use it with making code changes remotely the way I would be able to change say PHP.

So how can java compete ?

Assuming that java needs to compete with these languages on a better footing, some changes will be required (and some of them quite painful). These will be :

Make java easy to host on multi-tenant application servers

This would definitely require some changes to JVM to reduce the startup / shutdown overheads to make each processes really lightweight. It would also require some attention to how much memory gets utilised. Scenarios such as those I have used in the past to really make applications run faster by caching big time in the RAM are a no-no for multi-tenant hosting. We will thus need a version of JEE which loses the ApplicationContext. We will also need to be able to creatively work with other pools such as connection pools. Finally JEE application servers may need to be restructured to support rapidly dying processes (ie. process per web request). I am no JVM writer so wouldn’t know if some changes might be required to the language but cant think of any major changes that might be required.

Lose the compile cycle

Why can’t “javac” be conditionally executed implicitly at the beginning of “java” process. JSPs generate java on the fly. Similarly python creates the .pyc compiled files on the fly. If the deployment can be done using .java files instead of .class, it would eliminate the compile cycle and allow a developer to change the code in one window, save it, do an alt-tab and go press the reload button on the browser. Compared to all the attention that is being paid to language features like closures and duck typing, I think this is a really really big deal.

How will this help ?

These steps are not targeted towards attracting the developers working in building the corporate enterprise applications (though I am certainly they will break out into three cheers if the compile cycle was done away with – especially in an optional way – ie. the production deployment would still require a war consisting of .class and not .java files). More and more development going forward is going to be characterised by a larger number of smaller applications rather than the old 60s-70s days of small number of large applications. Average applications will get smaller and these will communicate with each other more frequently using web service (REST?) calls. The trend will definitely be towards more in-place remote application reuse by remote invocation rather than through using class libraries. The nimble languages are well positioned to compete in this space. Java isn’t. This will help Java attract and retain developers in a space where currently it is only likely to lose them increasingly. Moreover as application hosting infrastructure starts getting more outsourced and cloud computing gets more prevalent (eg. Amazon Web Services / Google App Engine) Java can at least compete in that space which is otherwise likely to be locked out for it.