Month: July 2008

My first impressions of Google Knol

Posted by – July 24, 2008

Google just launched “Knol” today (announced here on their blog). Back in January, I had written up a blog post – Will the knol be a knowall ?, so I was immediately curious to get some first-hand experience. Here are my fresh-out-of-the-oven first impressions of Google Knol, in no particular order.

  • Current topics focused on healthcare : Most existing content, or at least the featured content, seems to be focused on health care topics.
  • Anyone can author a knol : If you have a Google ID, you can start writing a knol. The blurb suggests that knols are authoritative, but there doesn’t seem to be any verification required for authoritativeness. If you can write a blog post or a wiki page, you can write your own knol.
  • Multiple knols can refer to the same topic : Since any person can author a knol, multiple authors can create different knols all referring to the same topic.
  • Collaborative editing is supported (not the default) : It is possible to create a knol and then invite additional authors / reviewers.
  • Name verification is supported : Google has provided for name verification for the author. However, for now it seems to be enabled only for US residents.
  • Nice user interface : The UI actually looks much nicer than most other Google properties. It of course continues to offer the good usability that all Google properties do.

I went ahead and wrote up my first knol Comparing and Contrasting Knols, Wikis and Blogs, just to get the feel of writing a knol. Here are my observations on a knol after writing one.

  • If you can write a blog or a wiki you can write a knol : It doesn’t take any more editing skill to write a knol than it takes to write a blog post or a wiki page. (Hardly surprising).
  • The focus is similar to writing an article : When you are writing a knol, you somehow feel pushed to write a relatively comprehensive article. It’s possible to write a knol of a few lines or one paragraph, but I suspect we won’t see too many of them.
  • Revisioning is supported : Revisioning is automatic. Readers can go back and read your earlier revisions or even run a diff across them. (Hmmm .. can’t casually write something stupid and then just go back and update it with something less stupid).
  • Separate field for adding references : There’s a separate field for adding references / citations. In wikis / blogs we just do it inline, I guess. Listing these at the end brings a more formal flavour to the writing.
  • It’s like a time independent blog, or a strongly individualized wiki : If you are writing a blog, but seem to be essentially writing long articles which are of an enduring nature, a knol might be a more useful platform. Similarly, if a number of authors are collectively contributing many articles on your wiki but each is maintaining a set of articles independently, again a knol might be a useful thing to look at.

Implications on corporate environments
Finally, to come to my favourite topic: visualising how knols could get used in corporate intranets. I really couldn’t think of knols adding much more to corporate intranets than wikis could. The existing format for wikis and blogs is, in my opinion, quite adequate for the internal and possibly collaborative publishing requirements of corporate intranets. However, I suspect that knols might be a nice way to publish external facing content from subject matter experts. I think corporates could look at implementing knols on their external facing web sites, which could contain content authored by their CXOs or domain experts. The focus on the author and his or her capabilities is likely to be better suited to this environment, and may allow the content to be projected with a sense of strength that a casual wiki or perhaps even a blog is unlikely to be able to project. But for that, someone needs to start writing a knol engine first.

Tips for Software / Programming blogging

Posted by – July 23, 2008

Just realised I have been blogging for more than 6 months now (actually I had started another blog ages ago .. but that tapered off soon after). Over this period, I believe I learnt or adopted a few practices. Just sharing them here. Feel free to comment. YMMV.

  1. Treat your readers like a jury, not as customers : By jury, I mean a jury as in an academic thesis, not as in a court. What’s the difference ?
    • With customers you sell, with a jury you defend your perspective. You may think you are selling your views, but a jury doesn’t shell out any money to buy them. This makes a typical sales process a much harder and more onerous task than just defending. Most readers aren’t out to buy, they are out to learn more and interact more.
    • With customers you assume they may not know all about your product, so you focus on educating them in general towards making a pitch. With a jury you assume they already know far more than you do in general, but you attempt to educate them about, and draw them into a discussion of, the specific thing that you have spent your time on and are presenting.
    • In a defense, the onus is on you to provide credible backing evidence. In a sales pitch the onus is on the customer to verify your pitch. Most readers would prefer to not carry the additional overhead of having to verify your statements. If you have provided the rationale for your statements clearly and supported it with available evidence where relevant, you have made the reader’s job much easier. You have increased the chances of the reader wanting to come back to your blog.


  2. Make a strong statement. Avoid taking strong positions : Allow me to define this. By position I mean making absolutist statements without providing sufficient context or a frame of reference, or assuming one’s own frame of reference is the only valid one. There is a wide diversity of readers out there. Some are into client side, some into server side. Some are into high usability, some into high speed processing. Some are doing graphics algorithms, some others are into CRUD and business validations. A large majority of your readers are likely to have a different frame of reference than yours. If they can’t understand where you are coming from, they will assume you are coming from the same context as they are. And they are likely to feel confused when what you say doesn’t end up matching their world view. A statement like “I found X more suitable than Y under a context Z” rather than a position like “X is better than Y” is more helpful since :
    • You get to describe your context. Your statement is a statement within a context. It is not treated as a blanket position. Readers with different contexts and divergent views can sometimes trace the differences to the context. Such readers can still suggest alternative views within other contexts easily without appearing to contradict you. Readers with similar contexts and divergent views can still choose to take you on.
    • You have less chance of being misinterpreted. You don’t want to get caught out in an interview a year down the road, when you are changing your job from writing a forms based application to one where you might be required to build, say, a graphics processing engine, where your interviewer might have just read your blog, and your posts do not make sense in the newer context.
    • When you make a strong statement without taking a strong position, readers record their agreement / disagreement with the post rather than you or your blog in general. I personally find that a much more comforting thought than readers choosing to agree / disagree with the blog in general.

  3. Be prepared to update your blog soon : There are a large number of smart people out there, often a lot smarter than us, or with a different set of experiences than ours. As the comments start coming in, you start learning things you wish you knew before you wrote the post. If the comments indicate something useful and relevant to the post that you would’ve wanted to include had you known about it earlier – go ahead, add it to the post. A convention I have seen is that all non trivial changes after the initial posting should be prefixed with the word “Update:” or “Updates:” so that readers can make out you’ve changed something after your initial post. A comment or two may be especially relevant. It helps to be able to review the comments regularly and update the post soon if relevant. If you are going to be traveling soon, either submit your post a little earlier or post it a little later – but post it when you know you will be able to review the comments and will have the flexibility to take 5 to 10 minutes off your regular work to update the blog if necessary.

  4. Be prepared for surprises : Even if you write carefully, you will end up making a small set of readers either happy or disappointed with you in a manner that will leave you puzzled. However hard you try, there is a good likelihood someone is going to misquote you or take you on strongly in an unanticipated way. Some of this may be unavoidable and needs to be factored into your assumptions. However, some of it will be avoidable, so do follow up on such incidents to figure out if there are any lessons that you can apply the next time. A great way to do so is to write a mail back to the commenter, or to the blogger who may have linked to your post, and get a better understanding of his/her viewpoint.

  5. Don’t title spam your readers : Every so often I come across a post with a provocative title which does not live up to that title at all. I prefer to call this title spamming, since a lot of the spam I receive has a provocative title but often irrelevant content. The title is important. It influences readership strongly. But if you title spam regularly, it might get 2-3 posts a higher readership, but it’s going to hurt in the longer run.

  6. Understand how blog aggregators and networks work : It is important to understand the demographics of different blog aggregators. If you would like your blog to be read by a larger number of people, be clear in your mind which demographics you are targeting when writing your post. Some aggregators like javablogs.com and artima.com will target specific programming languages and work off an RSS feed. Explore your blogging software and see if it offers category / tag based feeds. If it does, use the categories / tags to ensure that the RSS feed you register with these aggregators sends only relevant posts to them. I use WordPress and it supports tag / category based RSS feeds. Networks like dzone.com, news.ycombinator.com, reddit.com, slashdot.org, digg.com have very different demographics. Don’t blanket post to all networks. Register your post with those networks where the readers are likely to find your post helpful. I have occasionally come across people wondering whether one should register one’s own posts to a network. My opinion is that it is an acceptable activity.

  7. Ensure you have blog analytics enabled : Over a longer period of time you will start gleaning useful information about your readers, e.g. what part of the world they come from and which links they come from (you can get statistical information about referrers such as Google Reader (RSS), blog aggregators, blog networks etc.). You can also get information about what searches led the search engines to your blog. I prefer the wordpress.com stats plugin for WordPress and Google Analytics. The former is better at providing more immediate feedback, whereas the latter is more comprehensive.

  8. Pay attention to search engines as well : Most blog aggregators and networks will drive substantial traffic to your blog for the first 24-48 hours. Search engines will send a small trickle initially. However, there is a big difference. Traffic from aggregators and networks will dry up after a few days for any post. But traffic from search engines will keep on coming. Over a sustained period of time, search engines can start driving substantial traffic to your blog. Read up about Search Engine Optimisation and see if you can help your blog. I would recommend, however, that you use such optimisation fairly and only to the extent that it is not misleading.

Presentation : Contrasting Java and dynamic languages.

Posted by – July 8, 2008

I presented this last week at the Session @ Java Meet organised by IndicThreads. Have attempted to look at the contrast from the points of view of developers, architects and managers.

Performance Comparison – C++ / Java / Python / Ruby / Jython / JRuby / Groovy

Posted by – July 8, 2008

Update (READ CAREFULLY) : I am starting to see hyperlinks to this post with only some of the findings being used as the link title (e.g. X is 100 times faster than Y, X faster than Z). I emphasise once again that I have carefully indicated in the original post that this is but one of many possible microbenchmarks and that you should treat the results as one of many data points. Given the comments I’ve received and some of the links I’ve seen to this post, if I were to make this posting anew, I would choose to title it “Implementing an identical object oriented solution to the Josephus Problem in Java / C++ / Ruby / JRuby / Python / Jython / Groovy and measuring the performance results thereof.”

This post compares performance across various languages for a specific micro benchmark (actually it isn’t really a microbenchmark – it is simply a benchmark for a specific piece of logic – but that’s the closest word I could think of).

Last week, while preparing for a presentation – Contrasting Java and Dynamic Languages, I came across this interesting Perl/Python/Ruby Comparison which focused on comparing the code style of different languages. I thought it would be interesting to use the same problem to get some actual benchmarks. Note that you could also use the code segments below to get a feel for different syntactic flavours. However, since I have strived to keep the code in each language as similar as possible to the others, some of the advanced syntactic sugar of the dynamic languages is not on display here.

Problem Statement

Quoting from the post linked to above :

Flavius Josephus was a roman historian of Jewish origin. During the Jewish-Roman wars of the first century AD, he was in a cave with fellow soldiers, 40 men in all, surrounded by enemy Roman troops. They decided to commit suicide by standing in a ring and counting off each third man. Each man so designated was to commit suicide…Josephus, not wanting to die, managed to place himself in the position of the last survivor.
In the general version of the problem, there are n soldiers numbered from 1 to n and each k-th soldier will be eliminated. The count starts from the first soldier. What is the number of the last survivor?

Design

I actually changed the design of the solution as compared to the original post. Instead of using the deeply recursive calls as used in the earlier post, I decided to split the logic into two classes, and use loop iteration instead of recursion. It is my belief that we tend to do loop iterations far more frequently than recursions, and the resultant class design having two classes – one to indicate a Chain and one reflecting a Person seemed more appropriate to me.

Logic
The Chain object contains a reference to one person (first) who is but one member in a circular linked list. Each person object has a reference to its previous (prev) and next (next) person in the circle. When the kill loop starts, it sets a threshold (nth). The count starts with 1 from the first person. Each person, when asked to shout, checks if the shout count (shout) is less than the threshold (nth). If it is less, the person just returns an incremented count. If the two are the same, the person in effect commits suicide. In doing so the person updates the next reference of its prev, and the prev reference of its next, to take himself off the circle and keep the circle consistent, finally returning a shout of 1 (which is what the next person in the list will shout).

The code does not have any comments (sorry!) and all the console outputs have been removed so that the benchmarking activity is not interfered with by the IO overheads.
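To make the logic described above concrete, here is a minimal Python sketch of it. It is an illustration written from the description, not the code that was benchmarked; the method names `shout` and `kill`, the `count` attribute and the 1-based numbering are assumptions made for the example.

```python
class Person:
    """One member of the circle, linked to both neighbours."""

    def __init__(self, count):
        self.count = count  # assumed: the person's number, starting at 1
        self.prev = None
        self.next = None

    def shout(self, shout, nth):
        # Below the threshold: hand an incremented count to the next person.
        if shout < nth:
            return shout + 1
        # Threshold reached: unlink this person and keep the circle consistent.
        self.prev.next = self.next
        self.next.prev = self.prev
        return 1  # the next person in the circle starts again from 1


class Chain:
    """Holds a reference to one person (first) in a circular linked list."""

    def __init__(self, size):
        # Assumes size >= 1.
        self.first = None
        last = None
        for number in range(1, size + 1):
            current = Person(number)
            if self.first is None:
                self.first = current
            if last is not None:
                last.next = current
                current.prev = last
            last = current
        # Close the circle.
        self.first.prev = last
        last.next = self.first

    def kill(self, nth):
        current = self.first
        shout = 1
        # Loop (not recursion) until a person is their own neighbour.
        while current.next is not current:
            shout = current.shout(shout, nth)
            current = current.next
        self.first = current
        return current
```

With this sketch, `Chain(40).kill(3).count` evaluates to 28, i.e. the survivor for 40 men counting off every third man is the 28th.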

The results

All the results are as observed on my notebook with the following config :
OS : Ubuntu Gutsy Gibbon 7.10
Kernel : 2.6.22-15-generic
CPU : Intel(R) Core(TM) Duo CPU T2600 @ 2.16GHz
RAM : 2GB

| Language | Version | Lines of Code | Time per iteration (microseconds) |
|---|---|---|---|
| Java | Sun JDK 1.6.0.03 | 86 (earlier: 101) | 1.6 |
| C++ | 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2), compiled with optimisation -O3 | 86 | 3 |
| | gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7), compiled with optimisation -O3, Alberto Bignotti’s modified code with customised memory reuse and management | 124 | approx 0 |
| Ruby | ruby 1.9.0 (2008-04-14 revision 16006) [i686-linux] | 63 | 89 (earlier: 114) |
| | ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux] | | 380 (earlier: 372) |
| | jruby : ruby 1.8.6 (2008-05-28 rev 6586) [i386-jruby1.1.2] | | 80 (earlier: 84) |
| Python | 2.5.1 | 41 | 192 (earlier: 225) |
| | 2.5.1 with psyco | | 33 |
| | Jython 2.2.1 on JRE 1.6.0.03 | | 632 (earlier: 884) |
| Groovy | Groovy Version: 1.5.6 JVM: 1.6.0_03-b05, uncompiled | 81 | 363 |
| | Compiled to bytecode and run using java | | 360 |
| | Update : Groovy Version: 1.6-beta-1 JVM: 1.6.0_03 | | 104 |
| PHP | PHP 5.2.3-1ubuntu6.3 (cli) | 85 | 593 |
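The post does not show how the time per iteration was measured. One plausible way to get such a figure in Python is to run the workload many times and divide the elapsed wall-clock time by the number of runs. The sketch below does this; the helper name `time_per_iteration`, the iteration count and the stand-in workload are all assumptions for illustration.

```python
import time


def time_per_iteration(func, iterations=10_000):
    """Return the average wall-clock time of one func() call, in microseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        func()
    elapsed_seconds = time.perf_counter() - start
    return elapsed_seconds * 1_000_000 / iterations


def stand_in_workload():
    # Hypothetical placeholder; in the benchmark this would build a chain
    # and run the kill loop, with no console output.
    return sum(range(100))


if __name__ == "__main__":
    average = time_per_iteration(stand_in_workload)
    print(f"{average:.2f} microseconds per iteration")
```

Averaging over many runs smooths out timer resolution, and keeping console output out of the timed loop matches the point made earlier about IO overheads.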

Updates :

  • Ken suggested syntactic improvements (see comments below) which led to the following ruby execution times : jruby : 80 microseconds, ruby 1.9 : 89 microseconds, ruby 1.8.6 : 380 microseconds. The table above has been updated.
  • Cato requested a run using Groovy 1.6 beta 1 – have updated the same. Big improvement
  • Nicholas Riley suggested introducing slots and using “is not” and “is” in the if conditions for the python code. Updated the results to reflect the figure of 192 and 632 micro seconds for CPython and Jython. The figure was 182 microseconds for CPython and 131 microseconds for Jython if I did not use the new style classes, however I did not reflect the same, since most new code is likely to be using new style classes. However this does indicate one possible performance optimisation if your code does not depend upon new style classes. Moreover makes me really interested in waiting for the Jython performance optimisations for new style classes that Nicholas suggests are on their shortlist.
  • Tim Fountain in a comment below indicates that on his hardware (Core 2 Quad) with Ubuntu Hardy Heron, Ruby 1.8.6 (same version as above) performs somewhat faster (15%), whereas the upgraded versions of Python and PHP run much faster (63% for Python and 83% for PHP). Another difference in config : he is running 64 bit.
    • python – version 2.5.2: – 138 microseconds
    • ruby – 1.8.6 (2007-09-24 patchlevel 111) [x86_64-linux] :321 microseconds
    • PHP – 5.2.4-2ubuntu5.1 with Suhosin-Patch 0.9.6.2 (cli): 323 microseconds.
  • Peter Lupo requested a reduction in line count for Java since the conventional way is to have the opening braces not on a separate independent line. Given the fact that it is a fair comment, I have reduced the line count of java to 86 (didn’t physically change the code – 86 = 101 – 15 opening curly braces).
  • Added another finding of using python with psyco
  • C++ – Added results using Alberto Bignotti’s alternative code with customised memory reuse management
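The slots and `is` / `is not` suggestion from Nicholas Riley, mentioned in the updates above, can be sketched roughly as follows. This is an illustration only: the attribute names and the helper `is_last_survivor` are assumptions, and the exact conditions changed in the benchmarked Python code are not shown in this post.

```python
class Person(object):  # new style class (the only kind in Python 3)
    # __slots__ gives each instance fixed storage for these attributes
    # instead of a per-instance __dict__.
    __slots__ = ("count", "prev", "next")

    def __init__(self, count):
        self.count = count
        self.prev = None
        self.next = None


def is_last_survivor(person):
    # "is" compares object identity directly rather than calling __eq__.
    return person.next is person
```

A class defined with `__slots__` rejects attributes that are not listed, so a typo such as `person.nxt = other` raises an `AttributeError` instead of silently creating a new attribute.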

Summarisation

The following are the results. Given the long code blocks, I am presenting the summarisation first followed by the code.

Note: This can only be treated as one particular benchmark. The results are a little atypical with respect to my general understanding. I would advise caution against drawing broad conclusions based on this benchmark alone, but would suggest that you treat it as one data point amongst many. People better versed than me in the details of language runtimes might be able to suggest why some of the results seem surprising or atypical.

  • Java / C++ Rock : The performance of Java and C++ was head and shoulders above the other languages (nearly 100 times faster). My thought is that while a difference of 10x was only to be expected, this difference was just way too massive.
  • Java is faster than C++ : Though I had read about other microbenchmarks reaching the same conclusion, it is the first time I actually ran one where Java was faster. There are many others I have run where C++ beats Java quite handsomely. More importantly, the performance of C++ worsened by almost 40% once I added code which started freeing the memory that was being allocated (there’s still a small memory leak in the code – there is no Chain destructor which will clean up first). I would definitely want to look later at the impact of garbage collection in this context, and whether the Java garbage collector was simply much faster than the hand crafted new – delete calls in C++. Update : Using the code written by Alberto, which has customised memory management (not used in any of the other examples) but the same algorithm, C++ is much faster than Java.
  • Ruby 1.9 is twice as fast as Python : While it has been known for a while that Ruby 1.9 is much faster than Ruby 1.8.6, here’s one more supporting data point. I was expecting ruby 1.9 to give python a run for its performance money. But at least in this particular context it seems to be much, much faster.
  • JRuby is faster than Ruby : Even ruby 1.9. Very interesting indeed.
  • Jython still has some catching up to do : Though in the same ballpark as the other languages, it was the slowest in the pack.
  • Overhead of dynamism is dominant : I have no idea if JRuby ran much faster because of the java bytecode or because of its implementation (though its performance was not even remotely close to that of Java). However, even after I compiled the groovy code to java bytecode, it still ran much slower than python and ruby. It seems the overhead of supporting dynamic constructs is much more dominant than any benefits that one gets out of compilation (whether to java byte code or to intermediate compiled files). I think the argument that because something compiles to java bytecode it is likely to be fast should be looked at a little carefully.
  • PHP stays at the rear end : Though I benchmarked PHP for the first time, I wasn’t completely surprised by the fact that PHP could only manage to be faster than Jython.

Update : There are many comments on this post, including those from cwilbur who benchmarks perl using an idiomatic method, Paddy3118 who offers an optimised algorithm for python, and peter lawrey who offers an optimised algorithm for Java. I would like to state that each of their solutions offers performance superior to what has been described here. However, I believe any benchmark comparison should compare apples to apples. Should these contributions be taken into account and be reflected in the table above ? I certainly believe there is a case to do so as an exercise using a different algorithm. However, to ensure that it is a fair comparison, one has to modify the code in all the other languages to reflect the same algorithm. Only then can we get an apples to apples comparison. That is probably an exercise for another post. Is the algorithm I have chosen the fastest ? No. However, I believe it is a very readable algorithm and, if one ignores the IO with networks and databases and files, it is probably close to the kind of code many programmers write (and maintain) on a day to day basis. It has been consistently implemented in all the languages. Readers should be aware that there are algorithms which will deliver much superior performance – but they will also make the performance superior in all the languages (perhaps to slightly differing extents and thus with possibly somewhat different results).

The code

For all of you who are either interested in running it for yourself or would like to explore this in more detail, here’s the code. Note that I am not equally competent across all languages. So if you believe there is a more appropriate way to code the same, do post a comment. One of the things I have tried to do is to ensure that the code remains more or less similar across all languages. Also, I have used getters / setters or skipped them based on my understanding of the generally accepted convention for users of the language.
More…