Nov 19

While the web service meme and its implementations have been around for many years, statistics about how many developers publish or consume web services are a little hard to come by. Many statistics report the percentage of organisations which have adopted or intend to adopt SOA / Web Services etc., but these are often less than sufficiently useful, IMHO, since they do not indicate how widely web services actually get used.

This poll attempts to understand how many developers “actually” either publish or consume web services in the current projects that they are involved with, and if so, what is the nature of the web service APIs (SOAP / REST / HTTP-POX).

Please note that the poll is in the top right widget of this blog. Multiple selections are acceptable. If this is an answer you seek as well, please invite other developers you know to participate through email and blog posts. Just indicate to them that the poll can be found in the top right widget at http://blog.dhananjaynene.com. While you can follow the current results online as entries are added, the poll will close on November 30th, when the final results will be published.

Jul 08

I presented this last week at the Session @ Java Meet organised by IndicThreads. Have attempted to look at the contrast from the points of view of developers, architects and managers.

Jul 08

Update (READ ME CAREFULLY) : I am starting to see hyperlinks to this post with only some of the findings being treated as the link title (eg. X is 100 times faster than Y, X faster than Z). I emphasise once again that I have carefully indicated in the original post that this is but one of many possible microbenchmarks and that you should treat the results as one of many data points. Given the comments I’ve received and some of the links I’ve seen to this post, if I were to make this posting anew, I would title it “Implementing an identical object oriented solution to the Josephus Problem in Java / C++ / Ruby / JRuby / Python / Jython / Groovy and measuring the performance results thereof.”

This post compares performance across various languages for a specific microbenchmark (actually it isn’t really a microbenchmark - it is simply a benchmark for a specific piece of logic - but that’s the closest word I could think of).

Last week, while preparing for a presentation - Contrasting Java and Dynamic Languages, I came across this interesting Perl/Python/Ruby Comparison which focused on comparing the code style of different languages. I thought it would be interesting to use the same problem to get some actual benchmarks. Note that you could also use the code segments below to get a feel for different syntactic flavours. However, since I have strived to keep the code in each language as similar as possible to the others, some of the advanced syntactic sugar of the dynamic languages is not on display here.

Problem Statement

Quoting from the post linked to above :

Flavius Josephus was a Roman historian of Jewish origin. During the Jewish-Roman wars of the first century AD, he was in a cave with fellow soldiers, 40 men in all, surrounded by enemy Roman troops. They decided to commit suicide by standing in a ring and counting off each third man. Each man so designated was to commit suicide…Josephus, not wanting to die, managed to place himself in the position of the last survivor.
In the general version of the problem, there are n soldiers numbered from 1 to n and each k-th soldier will be eliminated. The count starts from the first soldier. What is the number of the last survivor?
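As a quick cross-check on the general version of the problem, the survivor's number can also be computed with the well-known Josephus recurrence. This is not the approach benchmarked in this post, and the function name is made up for illustration; it is only a small sketch for sanity-checking answers.

```python
def josephus_survivor(n, k):
    """Return the 1-based number of the last survivor among n soldiers
    when every k-th soldier is eliminated, counting from soldier 1."""
    survivor = 0  # 0-based answer for a circle containing a single soldier
    for size in range(2, n + 1):
        # Growing the circle by one soldier shifts the survivor by k places.
        survivor = (survivor + k) % size
    return survivor + 1  # convert back to 1-based numbering


if __name__ == "__main__":
    # The ring described in the quote: 40 men, every third man counted off.
    print(josephus_survivor(40, 3))
```

Because it never builds the ring, this gives the answer in a single pass over the sizes 2..n, which makes it handy for checking a simulation but useless for the kind of object-heavy benchmark this post is about.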


Design

I actually changed the design of the solution as compared to the original post. Instead of using the deeply recursive calls as used in the earlier post, I decided to split the logic into two classes, and use loop iteration instead of recursion. It is my belief that we tend to do loop iterations far more frequently than recursions, and the resultant class design having two classes - one to indicate a Chain and one reflecting a Person seemed more appropriate to me.

Logic
The Chain object contains a reference to one person (first) who is but one member of a circular linked list. Each person object has a reference to its previous (prev) and next (next) person in the circle. When the kill loop starts, it sets a threshold (nth). The count starts with 1 from the first person. Each person, when asked to shout, checks if the shout count (shout) is less than the threshold (nth). If it is less, the person just returns an incremented count. If the two are the same, the person in effect commits suicide. In doing so, the person updates the next reference of its prev and the prev reference of its next, taking himself off the circle while keeping the circle consistent, and finally returns a shout of 1 (which is what the next person in the list will shout).
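The logic above can be sketched roughly as follows in Python. This is a minimal reconstruction from the description, not the code that was actually benchmarked; the method names (shout, kill) and the count attribute are assumptions chosen to mirror the prose.

```python
class Person:
    """One member of the circular doubly linked list."""

    __slots__ = ("count", "prev", "next")

    def __init__(self, count):
        self.count = count  # the person's number in the ring (assumed name)
        self.prev = None
        self.next = None

    def shout(self, shout, nth):
        """Return the count that the next person in the ring should shout."""
        if shout < nth:
            return shout + 1
        # shout has reached nth: unlink this person and keep the ring consistent.
        self.prev.next = self.next
        self.next.prev = self.prev
        return 1


class Chain:
    """Holds a reference to one person (first) in the ring."""

    def __init__(self, size):
        self.first = None
        last = None
        for number in range(1, size + 1):
            current = Person(number)
            if self.first is None:
                self.first = current
            if last is not None:
                last.next = current
                current.prev = last
            last = current
        # Close the circle.
        self.first.prev = last
        last.next = self.first

    def kill(self, nth):
        """Run the counting loop until one person remains and return them."""
        current = self.first
        shout = 1
        while current.next is not current:
            shout = current.shout(shout, nth)
            # A removed person still points at their old neighbour,
            # so stepping to next works whether or not they were removed.
            current = current.next
        self.first = current
        return current
```

For example, `Chain(40).kill(3).count` gives the number of the last survivor for the ring of 40 described in the problem statement.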

The code does not have any comments (sorry!) and all the console outputs have been removed so that the benchmarking activity is not interfered with by the IO overheads.
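A per-iteration figure of the kind reported below can be obtained with a harness along these lines. This is only a sketch: the function names, the stand-in workload and the iteration count are assumptions, and the harness actually used for the measurements is not shown in this part of the post.

```python
import time


def time_per_iteration(work, iterations):
    """Return the mean wall-clock time of one call to work(), in microseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        work()
    elapsed = time.perf_counter() - start
    return elapsed * 1_000_000 / iterations


def sample_work():
    # Stand-in workload that produces no console output, so that IO
    # does not distort the timing.
    total = 0
    for i in range(1000):
        total += i
    return total


if __name__ == "__main__":
    # Output happens once, outside the timed region.
    print(time_per_iteration(sample_work, 10000))
```

Averaging over many iterations smooths out timer resolution and one-off start-up costs, which matters when a single iteration takes only a few microseconds.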

The results

All the results are as observed on my notebook with the following config
OS : Ubuntu Gutsy Gibbon 7.10
Kernel : 2.6.22-15-generic
CPU : Intel® Core™ Duo CPU T2600 @ 2.16GHz
RAM : 2GB

| Language | Version | Lines of Code | Time per iteration (microseconds) |
|---|---|---|---|
| Java | Sun JDK 1.6.0.03 | 86 (was 101) | 1.6 |
| C++ | 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2), compiled with optimisation -O3 | 86 | 3 |
| | gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7), compiled with optimisation -O3; Alberto Bignotti’s modified code with customised memory reuse and management | 124 | approx 0 |
| Ruby | ruby 1.9.0 (2008-04-14 revision 16006) [i686-linux] | 63 | 89 (was 114) |
| | ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux] | | 380 (was 372) |
| | jruby : ruby 1.8.6 (2008-05-28 rev 6586) [i386-jruby1.1.2] | | 80 (was 84) |
| Python | 2.5.1 | 41 | 192 (was 225) |
| | 2.5.1 with psyco | | 33 |
| | Jython 2.2.1 on JRE 1.6.0.03 | | 632 (was 884) |
| Groovy | Groovy Version: 1.5.6 JVM: 1.6.0_03-b05, uncompiled | 81 | 363 |
| | Compiled to bytecode and run using java | | 360 |
| | Update: Groovy Version: 1.6-beta-1 JVM: 1.6.0_03 | | 104 |
| PHP | PHP 5.2.3-1ubuntu6.3 (cli) | 85 | 593 |


Updates :
  • Ken suggested syntactic improvements (see comments below) which led to the following ruby execution times : jruby : 80 microseconds, ruby 1.9 : 89 microseconds, ruby 1.8.6 : 380 microseconds. The table above has been updated
  • Cato requested a run using Groovy 1.6 beta 1 - have updated the same. Big improvement
  • Nicholas Riley suggested introducing slots and using “is not” and “is” in the if conditions for the python code. Updated the results to reflect the figures of 192 and 632 microseconds for CPython and Jython. The figure was 182 microseconds for CPython and 131 microseconds for Jython if I did not use new style classes; however, I did not reflect these, since most new code is likely to be using new style classes. This does indicate one possible performance optimisation if your code does not depend upon new style classes. Moreover, it makes me really interested in the Jython performance optimisations for new style classes that Nicholas suggests are on their shortlist.
  • Tim Fountain in a comment below indicates that on his hardware (Core 2 Quad) with Ubuntu Hardy Heron, Ruby 1.8.6 (same version as above) performs somewhat faster (15%) whereas the upgraded versions of Python and PHP run much faster (63% for python and 83% for PHP). Another difference in config - he is running 64 bit.
    • python - version 2.5.2: - 138 microseconds
    • ruby - 1.8.6 (2007-09-24 patchlevel 111) [x86_64-linux] :321 microseconds
    • PHP - 5.2.4-2ubuntu5.1 with Suhosin-Patch 0.9.6.2 (cli): 323 microseconds.
  • Peter Lupo requested a reduction in the line count for Java, since the conventional way is not to put opening braces on a separate line of their own. Given that it is a fair comment, I have reduced the line count of Java to 86 (didn’t physically change the code - 86 = 101 - 15 opening curly braces).
  • Added another finding of using python with psyco
  • C++ - Added results using Alberto Bignotti’s alternative code with customised memory reuse management


Summarisation

The following are the results. Given the long code blocks, I am presenting the summarisation first followed by the code.

Note: This can only be treated as one particular benchmark. The results are a little atypical with respect to my general understanding. Advise caution against drawing broad conclusions based on this benchmark alone but would suggest that you could treat this as one data point amongst many. People better versed than me in the details of language runtimes might be able to suggest why some of the results seem surprising or atypical.

  • Java / C++ Rock : The performance of Java and C++ was head and shoulders above the other languages (nearly 100 times faster). My thought is that while a difference of 10x was only to be expected - this difference was just way too massive
  • Java is faster than C++ : Though I had read about other microbenchmarks reaching the same conclusion, it is the first time I actually ran one where Java was faster. There are many others I have run where C++ beats Java quite handsomely. More importantly - the performance of C++ worsened by almost 40% once I added code which started freeing the memory that was being allocated (there’s still a small memory leak in the code - there is no Chain destructor which will clean up first). I would later definitely want to look at the impact of garbage collection in this context, and whether the Java garbage collector simply was much faster than the hand crafted new - delete calls in C++. Update : Using the code written by Alberto, which has customised memory management (not used in any of the other examples) but the same algorithm, C++ is much faster than Java
  • Ruby 1.9 is twice as fast as Python : While it has been known for a while that Ruby 1.9 is much faster than Ruby 1.8.6, here’s one more supporting data point. I was expecting ruby 1.9 to give python a run for its performance money. But at least in this particular context it seems to be much much faster.
  • JRuby is faster than Ruby : Even ruby 1.9. Very interesting indeed.
  • Jython still has some catching up to do : Though in the same ballpark as the other languages, it was the slowest in the pack.
  • Overhead of dynamism is dominant : I have no idea if JRuby ran much faster because of the java bytecode or because of its implementation (though its performance was not even remotely close to that of Java). However, even after I compiled the groovy code to java bytecode, it still ran much slower than python and ruby. It seems the overhead of supporting dynamic constructs is much more dominant than any benefits that one gets out of compilation (whether to java bytecode or to intermediate compiled files). I think the argument that because something compiles to java bytecode it is likely to be fast should be looked at a little carefully.
  • PHP stays at the rear end : Though I benchmarked PHP for the first time, I wasn’t completely surprised by the fact that PHP could only manage to be faster than Jython.

Update : There are many comments to this post, including those from cwilbur who benchmarks perl using an idiomatic method, Paddy3118 who offers an optimised algorithm for python, and peter lawrey who offers an optimised algorithm for Java. I would like to state that each of their solutions offers better performance than what has been described here. However, I believe any benchmark comparison should compare apples to apples. Should these contributions be taken into account and be reflected in the table above? I certainly believe there is a case to do so as an exercise using a different algorithm. However, to ensure that it is a fair comparison, one has to modify the code in all the other languages to reflect the same algorithm. Only then can we get an apples to apples comparison. That is probably an exercise for another post. Is the algorithm I have chosen the fastest? No. However, I believe it is a very readable algorithm, and if one ignores the IO with networks, databases and files, it is probably close to the kind of code many programmers write (and maintain) on a day to day basis. It has been consistently implemented in all the languages. Readers should be aware that there are algorithms which will deliver much better performance - but they will improve the performance in all the languages (perhaps to slightly differing extents, and thus with possibly somewhat different results).

The code

For all of you who are interested in running it for yourself, or would like to explore this in more detail, here’s the code. Note that I am not equally competent across all languages. So if you believe there is a more appropriate way to code the same, do post a comment. One of the things I have tried to do is to ensure that the code remains more or less similar across all languages. Also, I have used getters and setters or skipped them based on my understanding of the generally accepted convention for users of each language.
Continue reading »

Jun 09

I have 8+ years of experience in each of C++ and Java, and at least consider myself an expert on the latter and used to consider myself one on the former a long time back. For any programmer it is a difficult choice to move away from a platform and environment in which one has a substantial investment and into another where one essentially throws away years of experience and starts as a novice (at least in pure programming terms).

This is not a post which is either to be interpreted as pro Python or anti Java or a pro/anti “name any language you wish”. I will gladly and gleefully go back to C++ / Java when using them makes sense under a context. I am sharing my thoughts and not promoting or denigrating any languages or frameworks here.

Context :

Here’s the shift of context (not all aspects are necessarily relevant to the shift in language). There are many details I have deliberately not gone into, given the confidentiality that any commercial activity obviously requires.

  • From a relatively large programming team that I used to manage into a small one (yours truly only to begin with)
  • From building commercially owned, closed software into open source software development (that is the intention at this stage, though it will take a few months to get there)
  • From customers always being large corporates who could often afford substantial hardware into customers who can range from individuals to large corporates
  • From mostly internal (intranet) facing to internet facing
  • From performance requirements of up to a few hundred thousand transactions per hour into a completely wide range of requirements based on each customer
  • From a very high percentage of writes to a much smaller percentage (ie. read percentages are now much higher)

Initial Choice Set

Given the fact that this application is intended to be hosted on the internet and is primarily a browser based application my choices quickly narrowed down to Java / JEE (something I had a long experience on), PHP (I had developed one web based application with it), Ruby and Python (I only had academic exposure to these). C/C++ did not figure on the choice set for their obvious development overheads in the context of web applications.

I went through a fair degree of thought, the creation of dummy applications and a lot of mental to and fro, and the process was not nearly as linear as I will describe below. The process is simplified below so that the reader can get at least some insight into my mind, which makes my mind seem a lot less confused than it actually is.

1st Elimination :

Java was the first one to get knocked off. One reason was that Java based applications typically require either a dedicated host or have to work under memory constraints in the case of shared hosts. Simply put, Java scales exceptionally well but it has a minimum hardware / investment requirement which was not acceptable within this context. The application should be able to work on shared hosting environments. Another equally important reason was that the productivity of initial development and of making changes to Java applications is much lower than with the other languages. I was only too acutely aware of the performance implications of this choice, but I believe the appropriate choice here is to scale out, especially where the read activity is especially large compared to the writes. Scaling out does require more complex architectures (compared to simply scaling up) but that’s the way that is appropriate in this context.

Subsequent thought process

This one was much tougher. Let me delve for a moment into what I believed the key strengths of the various languages were.

Ruby : I really really loved the syntax. Compact, cute ‘n’ thoroughly OO. Strong metaprogramming capabilities
PHP : Massive developer base (especially important when the intention is to eventually open source the application). It is amongst the easiest languages to use. Another advantage in its favour is the ‘C’ness of the syntax which makes it easier for anyone coming in from the C/C++/Java world.
Python : The metaprogramming and OO were almost, but not quite, as good as Ruby’s. I really love the indentation driven syntax. I know many might differ, but I really like it, and the neat paragraphs and lack of block braces make the code a lot more readable. The current production interpreter seems to perform the best compared to those of Ruby and PHP. I know Ruby 1.9 is getting much faster, but I suspect it is unlikely to be enough to make it much faster than Python already is.

As I considered the languages, it was important to look at the frameworks. I looked at CodeIgniter, CakePHP and Zend for PHP, Rails for Ruby and Pylons and Django for Python.

I believe one of the important aspects of architecture decision making today is that you bring the available toolsets / frameworks into the decision making process even when you are attempting to select a language. You are in effect evaluating a package involving both the language and the framework. A specific issue to be noted in my context was that, regardless of what framework I chose, I was completely sure I would need to change / extend it, because no well known existing framework supports some of the basic requirements of the application I am about to build. I am convinced this is not a “Not Built Here” syndrome that I am suffering from but simply a necessity of the domain I am attempting to work with.

In terms of ease of use for simpler applications I would rate Rails very highly. Not only does it make the actual programming simple, it gives you a nice set of tools around it to make a lot of typical activities really easy.

2nd elimination

Ruby and Rails went out. The reasons were as follows.

  • Ruby is just so well designed from a syntax and OO perspective, that coming from so many years of C++/Java background with a substantial grounding in Object Orientation as implemented by these languages, I really did not get the confidence that I could do it sufficient justice. My fear was that I would keep on finding ways of doing things in a better way in this language (and given my inherent compulsions would feel inclined to refactor code rather than focus on newer development).
  • I perceived that Rails was really focused on typical much simpler use cases. What I intend to do requires getting into an immense amount of complexity and I felt fairly certain Rails wasn’t designed for that and that I would spend a very very substantial time reinventing or hacking through Rails.
  • One of the strong features of Rails, “Convention over Configuration”, was something I couldn’t really come to terms with. I still can’t get over some of the implicitness in the environment.

3rd Elimination

Clearly PHP is such a widely used language with so many developers who are already trained on using it, that using it for an as yet intended to be open source application should be a no brainer - Right ? Not in this case. Two reasons why PHP went out.
  • I intend to pull off some really complex programming. Given the better OO and metaprogramming capabilities of python - I thought I would be able to keep my code much more concise, structured and readable if I was to use Python (this would’ve been true with Ruby as well!).
  • Django - This framework simply came closest to being the framework I would really like to end up with. Thus the gap between what it has vs. what I would like to have was the smallest, and the general design principles were those that I was extremely comfortable with.

In summary : Why these got eliminated

  • JEE : The difficulty of using JEE in shared hosting environments and the long development times required made it tough to use it.
  • Ruby : I like the language. I was a little intimidated by it. Rails however seemed limited to simpler use cases
  • PHP : Wonderful developer base. However, it didn’t give me the same comfort in terms of its OO and metaprogramming capabilities, and I did not believe the resultant code would be the most compact, concise, and readable.

In summary : Python and Django : What do I hope to get from them

Now that I have made the choice - what do I expect from these choices at this stage :

  • Excellent framework to start off with - provide the maximum initial boost
  • Ability to write concise, structured and readable code
  • Ability to make changes rapidly
  • Reasonably performant application

Other choices that I did not spend too much time on

During this process I also evaluated other languages such as C# and Groovy/Grails. I actually was quite impressed with the recent architecture trends coming from Microsoft and was actually tempted to spend more time on them. However the same reasons that eliminated JEE quickly made these choices unviable as well. I wish I had the luxury to consider other languages as well, but these were what were at the top of my mind, and I did not consider any other languages in the process given the time constraints.

I am missing JEE

I really really will miss JEE. I would love to have used it for the simple fact that I know it so well and this one will require me to move from an expert to a novice category. I will also miss it for the fact that with JEE I knew what it took to deliver exceptionally high performance. Even though I am feeling the JEE separation pangs, I believe Python is the right choice since it will allow for the fastest development, will allow for some really rapid changes (agility in coding), and will allow me to get the developers who shall eventually be joining us come up the curve much faster and be able to deliver more features much faster.

Final thoughts

One thing I realised through the whole process was that there are two strong influencing factors in any architecture choice. The first one obviously is the context. There is no way to compare or contrast any architecture or design choices without putting a context around them. The second, and this was a little more of a surprise to me, is that you simply cannot remove personal proclivities from such a choice making process. Since I as an individual am more comfortable with some styles of design than others, and since I am likely to be substantially involved with the initial development, it only seems to make sense that the choice set gets evaluated in this subjective context of individual comfort and therefore of projected individual productivity and effectiveness as well. Note that while in this case I was evaluating it strictly from my own perspective, one could evaluate the choices in the context of team styles, culture and comfort as well.

Apr 17

This post gives you a small tip which just might make a world of difference to your java hashmap’s performance. This trick has been inspired by the “symbol” construct in Ruby language.

I have often considered hash maps with Strings as keys to be quite expensive. And in many ways they often are. However, if the keys used in your hashmap are a well known set, either at the time of writing the code or at least when the program starts up, the following is likely to help you make your map performance much, much zippier.

In case you are not familiar with Ruby, it has a special construct called a symbol, which is somewhat similar to a constant string. You can refer to a symbol in as many places as you like, but the ruby runtime will ensure that symbols having the same character data refer to the same runtime instance.

The design of any key will influence the performance of the hashmap primarily through the performance of its hashCode() and equals() methods. The java.util.HashMap implementation uses the result of hashCode() to narrow down the potential number of keys to be compared, and then compares the keys based on whether they are the same instance (ie. occupy the same address in memory) or, in case they aren’t, by invoking the equals() method.

Thus if one wants to use Strings as keys, then there are at least two optimizations that could be potentially targeted :
(a) The hashcode could be cached rather than having to be computed each time (Turns out this makes a positive but a rather small difference)
(b) Ensure that the same instance of strings get used for the same string data. (Turns out this does make a substantial difference).
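To make the two optimisations concrete, here is a small Python analogue. It is not the Java code this post refers to: the class names SymbolKey and SymbolTable are invented for illustration, and the demonstration relies on CPython's dict, like the HashMap described above, checking whether two keys are the same instance before falling back to an equality call.

```python
class SymbolKey:
    """Key with a cached hash that counts how often equality is really evaluated."""

    eq_calls = 0  # class-wide counter, present only for this demonstration

    def __init__(self, text):
        self.text = text
        self._hash = hash(text)  # (a) compute the hash once and cache it

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        SymbolKey.eq_calls += 1
        return isinstance(other, SymbolKey) and self.text == other.text


class SymbolTable:
    """(b) Hands out exactly one SymbolKey instance per distinct text."""

    def __init__(self):
        self._pool = {}

    def symbol(self, text):
        key = self._pool.get(text)
        if key is None:
            key = self._pool[text] = SymbolKey(text)
        return key


symbols = SymbolTable()
prices = {symbols.symbol("apple"): 3}

SymbolKey.eq_calls = 0
shared_hit = prices[symbols.symbol("apple")]  # same instance: identity check suffices
calls_with_shared_key = SymbolKey.eq_calls

fresh_hit = prices[SymbolKey("apple")]  # equal but distinct instance: equality must run
calls_with_fresh_key = SymbolKey.eq_calls - calls_with_shared_key
```

Both lookups find the value, but only the one using a fresh key instance has to pay for a full equality comparison, which is the saving that option (b) aims for.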

The following two pieces of code indicate the difference.

Continue reading »