Contrasting Performance : Languages, styles and VMs - Java, Scala, Python, Erlang, Clojure, Ruby, Groovy, Javascript

Mon 15 August 2011

Retrieving comment count

Major update

This blog post is now formally retracted. A part of the original post remains. As does a record of the contributors and a log of the annotations of the updates. The code also remains on github. What are deleted are the actual published results, and some other sections that are now less relevant after this retraction. Added is the narrative about the reason for retraction.

The way this post started was a casual exercise in measuring performance of my earlier python code in pypy. I found the results interesting, so soon tried the same in Erlang. Having found the same to be interesting as well, very soon I found myself adding more languages and different coding styles resulting into a set of exercises which caused me to think (naively in hindsight) - hmmm, these results do tell me something. And probably the learnings are useful enough to share. It disappoints me to not be able to continue to share the same. Yet thats what I am just doing.

Why ?

I find myself a little wiser and a lot humbled. The learnings above are unlikely to be forgotten. I also find myself much more aware of the performance implications of code thats consistent with my subjective understanding of idiomatic. The original reason I started this exercise has satisfied its objective in terms of an improved personal understanding, perhaps even more so because of a number of contributions I received. Thats a useful learning as well.

The results that were published earlier on this page need to go. The code continues to remain on github for your perusal and further tweaking based on your definition of letter and spirit.

Sincere thanks to all who contributed. Particularly to Isaac. I learnt a lot from him. Further activity on this post shall cease.

Parts of the Original Post

There's a better place to specifically look at performance comparisons across languages than this post - The computer languages benchmarks game. But this post attempts look at performance comparisons a little differently. Based on coding idioms as well. And for a much narrower range of problems (namely one).

There are languages which are tightly opinionated on a particular way of doing things. And there are languages which allow you to implement a given logic in multiple ways. Yet, depending upon the language (and as we shall see, the runtime), the performance could vary quite substantially based on the nature of the code we write. This post attempts to take a small piece of logic, and implements in upto 3 different styles in 8 languages (10 if you count the runtime variations as well).



Quoting from The Josephus Problem,

Flavius Josephus was a roman historian of Jewish origin. During the Jewish-Roman wars of the first century AD, he was in a cave with fellow soldiers, 40 men in all, surrounded by enemy Roman troops. They decided to commit suicide by standing in a ring and counting off each third man. Each man so designated was to commit suicide…Josephus, not wanting to die, managed to place himself in the position of the last survivor.

In the general version of the problem, there are n soldiers numbered from 1 to n and each k-th soldier will be eliminated. The count starts from the first soldier. What is the number of the last survivor. In the code I benchmarked, n = 40 and k = 3.

Update. Note: Some are getting confused that I start by striking out the very first soldier in the chain and then starting to count up to the k value. This is one of the variations of the Josephus problem I had introduced this time. All the versions implement this logic consistently.


I have considered three idioms :

I've attempted to implement code in all languages using the styles above as long as reasonably feasible and appropriate. Since (barring C/C++), Java continues to be the language to beat from a performance perspective, I've attempted to implement roughly equivalent logic in all styles using Java as well. All programs typically run the code once to print the results (to verify correctness), and then 100000 or a million iterations to warmup, and then again repeat the iterations and measuring the elapsed time. There is a slight inconsistency between the various code snippets. The counter either varies between 0 to 39 or between 1 to 40.


I can't write the fastest possible code across all these languages. This is the best I could do. However if you can find a better way to implement the code, do let me know in the comments (or send me a pull request on github). I shall certainly include better solutions here if and as they are identified. At the point in time of publishing this, at least two authors had contributed to the code. I imagine (based on my experience with the prior post), more might be interested in suggesting tweaks to further improve performance. These are all listed here.

Hardware / Software


Metrics :


Observations : (Updated)


Full Source code is available on github at

Finally, thanks to a number of folks I had a chance to preview the post with and especially to Saager Mhatre to suggest moving the code from a attached zip file to github.


comments powered by Disqus