“It worked once” - The ‘selfish gene’ for architects and managers?

Posted on June 21st, 2008 in architecture

I read two completely unrelated posts at roughly the same time, one about software architects and another about socio-biology, viz. “The five types of poor architects” and “New discovery proves ‘Selfish Gene’ exists”, and I wondered whether architects and managers suffer from a Selfish Gene syndrome. I am obviously stretching the meaning of the term selfish gene, since architects’ kids aren’t architects by default, and any knowledge of architecture passed down is unlikely to travel through genetic transfer.


The post about the architects states :

The arguments these architects often use to turn down new ideas include “it will never work”, “this is not practical”, “we already has proven methods, why bother such risky solution?”, or even the discussion-stopping message “I have solved this problem before, I know what works and what do not.”


The post about the selfish gene states :

In studying genomes, the word ‘selfish’ does not refer to the human-describing adjective of self-centered behavior but rather to the blind tendency of genes wanting to continue their existence into the next generation. Ironically, this ‘selfish’ tendency can appear anything but selfish when the gene does move ahead for selfless and even self-sacrificing reasons.


Architects and managers pass on their empirically gathered knowledge and heuristics, sometimes as snippets of wisdom and sometimes as precepts, to the people they mentor. Some of it rests on the fact that “it worked once”, therefore it must be right. I wonder whether “it-worked-once” learning acts like a selfish gene, propagating itself into other people and into situations where it may have long since lost its relevance or validity, and may even result in self-inflicted damage. My takeaway is that learnings and heuristics have to be continuously re-evaluated as contexts change, and that we should stay on the lookout for this gene.

BTW, I have seen the same thing at work with developers. There, however, it usually concerns code (sometimes the gene actually is a piece of code), has far less damaging consequences, and is often caught more easily. That’s what tools like PMD’s Copy Paste Detector are for ;)

Beware of polyglot programming

Posted on June 10th, 2008 in java, software

Seems like a nice term, doesn’t it: polyglot programming. Some recent posts related to it are Kill Java, Vol. 2 and Fractal Programming. It may seem hot, it may seem in, and it may make sense for you. However, please allow me to lay down the fine print.

Polyglot programming requires building bridges across computer languages. These bridges act like translators at the UN. If you are running the UN, you need the translators and you have to accept the cost. But make sure you understand that there is a cost, and justify to yourself that you really need to incur it.

Let me give you a case in point: JSP tag libraries. These were meant to help the presentation / UI folks write cool UIs while letting Java programmers focus on the other interesting work that was much better done in Java. The tag library construct was thus a bridge between HTML and Java. Did you ever try to measure the runtime performance cost of tag libraries? I did, and it turned out that for all but the truly non-trivial tag libraries the cost was too high. I was able to do all of the following in much less time than it took to move the data through the tag library when presenting it:

  • run the controller logic
  • look up the data in the cache
  • update the data in the cache (basically the entire lifecycle of a transaction except for the final updates)
  • sometimes finish the database updates as well

I had a simple rule: if you need screaming concurrency and throughput from a small box, do not touch tag libraries. I did, however, use tag libraries when the performance demands weren’t so high.

Another example of such bridges being sold without proper warnings - EJB v 1.0. The fine print did talk about the remote API costs, but the general paradigm encouraged using the EJB APIs to communicate between the session beans and the entity beans. History does tell us this performance sucked too!

If you work with JNI you will soon realise that a lot of data transformation needs to happen when moving complex data structures between C and Java. Again while much faster than exchanging XML documents over inter process pipelines, this can be expensive too.

Suggestion : focus on the granularity and the triviality of what you are doing. If the communication across the different languages is fine-grained or very frequent, sit up, take notice and be careful. If the code in the other language (the language providing the service, and thus being called) is very small or trivial, again ask yourself whether the bridge is really required.
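To make the granularity point concrete, here is a toy model in Java. The class name CountingBridge and its methods are hypothetical. The cost of crossing the boundary is only counted, not measured. It stands in for whatever per-call translation a real bridge (a tag library, a remote EJB call, JNI marshalling) performs. The same work done chattily crosses the boundary once per item, while a batched call crosses it once in total.

```java
// Hypothetical stand-in for a cross-language bridge. Instead of measuring real
// overhead, it counts boundary crossings and how many values had to be translated.
final class CountingBridge {
    int crossings = 0;        // number of calls across the boundary
    int marshalledValues = 0; // number of values translated in either direction

    // Fine-grained ("chatty") API: one crossing per value.
    int square(int x) {
        crossings++;
        marshalledValues += 2; // argument in, result out
        return x * x;
    }

    // Coarse-grained ("chunky") API: one crossing for the whole batch.
    int[] squareAll(int[] xs) {
        crossings++;
        marshalledValues += 2 * xs.length;
        int[] out = new int[xs.length];
        for (int i = 0; i < xs.length; i++) {
            out[i] = xs[i] * xs[i];
        }
        return out;
    }
}
```

Both styles translate the same number of values. Only the fixed per-call cost differs: with 1,000 items the chatty style pays it 1,000 times and the batched style once. That difference is worth benchmarking before committing to a fine-grained bridge.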

Like it or not, we are already polyglots. We do DHTML + CSS + Javascript + (Java / Python / name your language). It makes sense to be polyglottish. Treat polyglottism as a tool in your toolbox and not as a fashion statement. Keep the kids in mind, especially when showing them how to move between static and dynamic programming languages: run your benchmarks and lay down the caveats. There is indeed a possibility that reckless application of the paradigm might lead you (or, more likely, your unsuspecting readers) to a crawling system, something the users will describe as a PolyClot. :)

How I ended up selecting Python for my latest project

Posted on June 9th, 2008 in Uncategorized, java, php, python, ruby, software

I have 8+ years of experience each in C++ and Java. I consider myself an expert in the latter, and used to consider myself one in the former a long time back. For any programmer it is a difficult choice to leave a platform and environment in which one has a substantial investment for another, where one essentially throws away years of experience and starts as a novice (at least in pure programming terms).

This post is not to be interpreted as pro Python, anti Java, or pro/anti any other language you care to name. I will gladly and gleefully go back to C++ / Java when using them makes sense in a given context. I am sharing my thoughts, not promoting or denigrating any languages or frameworks.

Context :

Here’s the shift in context (not every aspect is necessarily relevant to the change of language). There are many details I have deliberately left out, given the obvious confidentiality required for any commercial activity.

  • From a relatively large programming team that I used to manage to a small one (only yours truly, to begin with)
  • From building commercially owned, closed software to open source development (that is the intention at this stage, though it will take a few months to get there)
  • From customers always being large corporates who could often afford substantial hardware to customers ranging from individuals to large corporates
  • From mostly internal (intranet) facing to internet facing
  • From performance requirements of up to a few hundred thousand transactions per hour to a wide range of requirements depending on each customer
  • From a very high percentage of writes to a much smaller one (i.e. reads now make up a much higher percentage)

Initial Choice Set

Given the fact that this application is intended to be hosted on the internet and is primarily a browser based application my choices quickly narrowed down to Java / JEE (something I had a long experience on), PHP (I had developed one web based application with it), Ruby and Python (I only had academic exposure to these). C/C++ did not figure on the choice set for their obvious development overheads in the context of web applications.

I went through a fair amount of thinking, built dummy applications and went mentally back and forth. The process was not nearly as linear as I describe below. It is simplified here so that the reader gets at least some insight into my thinking, which is made to seem a lot less confused than it actually was.

1st Elimination :

Java was the first to be knocked off. One reason was that Java-based applications typically either require a dedicated host or have to work under memory constraints on shared hosts. Simply put, Java scales exceptionally well, but it has a minimum hardware / investment requirement that was not acceptable in this context: the application should be able to work in shared hosting environments. Another equally important reason was that productivity is much lower with Java than with the other languages, both in initial development and when making changes. I was only too acutely aware of the performance implications of this choice. However, I believe the appropriate response here is to scale out, especially since read activity is large compared to writes. Scaling out does require more complex architectures (compared to simply scaling up), but that's the approach appropriate to this context.

Subsequent thought process

This one was much tougher. Let me delve for a moment into what I believed the key strengths of the various languages were.

Ruby : I really, really loved the syntax. Compact, cute ‘n’ thoroughly OO. Strong metaprogramming capabilities.
PHP : Massive developer base (especially important when the intention is to eventually open source the application). It is amongst the easiest languages to use. Another advantage in its favour is the ‘C’-ness of its syntax, which makes it easier for anyone coming from the C/C++/Java world.
Python : The metaprogramming and OO were almost, but not quite, as good as Ruby's. I really love the indentation-driven syntax. I know many might differ, but to me the neat paragraphs and lack of block braces make the code a lot more readable. Current production interpreters seem to perform better than those of Ruby and PHP. I know Ruby 1.9 is getting much faster, but I suspect that is unlikely to make it much faster than Python already is.

As I considered the languages, it was important to look at the frameworks. I looked at CodeIgniter, CakePHP and Zend for PHP, Rails for Ruby and Pylons and Django for Python.

I believe one of the important aspects of architecture decision making today is that you bring the available toolsets / frameworks into the decision making process, even when you are attempting to select a language. You are in effect evaluating a package of both the language and the framework. A specific issue in my context was that, regardless of which framework I chose, I was completely sure I would need to change or extend it. No well-known existing framework supports some of the basic requirements of the application I am about to build. I am convinced this is not a “Not Invented Here” syndrome I am suffering from, but simply a necessity of the domain I am working in.

In terms of ease of use for simpler applications I would rate Rails very highly. Not only does it make the actual programming simple, it gives you a nice set of tools around it to make a lot of typical activities really easy.

2nd elimination

Ruby and Rails went out. The reasons were as follows.

  • Ruby is just so well designed from a syntax and OO perspective, that coming from so many years of C++/Java background with a substantial grounding in Object Orientation as implemented by these languages, I really did not get the confidence that I could do it sufficient justice. My fear was that I would keep on finding ways of doing things in a better way in this language (and given my inherent compulsions would feel inclined to refactor code rather than focus on newer development).
  • I perceived that Rails was really focused on typical, much simpler use cases. What I intend to do involves an immense amount of complexity. I felt fairly certain Rails wasn’t designed for that, and that I would spend a very substantial amount of time reinventing or hacking my way through Rails.
  • “Convention over Configuration”, one of Rails’ strong features, was something I couldn’t completely come to terms with. I still can’t get over some of the implicitness in the environment.

3rd Elimination

Clearly PHP is such a widely used language, with so many developers already trained in it, that using it for an application intended to eventually be open source should be a no-brainer. Right? Not in this case. Two reasons why PHP went out:
  • I intend to pull off some really complex programming. Given Python's better OO and metaprogramming capabilities, I thought I would be able to keep my code much more concise, structured and readable in Python (this would’ve been true with Ruby as well!).
  • Django - This framework simply came closest to being the framework I would really like to end up with. Thus the gap between what it has vs. what I would like to have was the smallest, and the general design principles were those that I was extremely comfortable with.

In summary : Why these got eliminated

  • JEE : The difficulty of using JEE in shared hosting environments and the long development times required made it tough to use it.
  • Ruby : I like the language. I was a little intimidated by it. Rails however seemed limited to simpler use cases
  • PHP : Wonderful developer base. However, it didn’t give me the same comfort in terms of OO capabilities and metaprogramming, and I did not believe the resulting code would be the most compact, concise and readable.

In summary : Python and Django : What do I hope to get from them

Now that I have made the choice - what do I expect from these choices at this stage :

  • Excellent framework to start off with - provide the maximum initial boost
  • Ability to write concise, structured and readable code
  • Ability to make changes rapidly
  • Reasonably performant application

Other choices that I did not spend too much time on

During this process I also evaluated other languages such as C# and Groovy/Grails. I actually was quite impressed with the recent architecture trends coming from Microsoft and was actually tempted to spend more time on them. However the same reasons that eliminated JEE quickly made these choices unviable as well. I wish I had the luxury to consider other languages as well, but these were what were at the top of my mind, and I did not consider any other languages in the process given the time constraints.

I am missing JEE

I really, really will miss JEE. I would love to have used it simply because I know it so well, and this choice requires me to move from the expert to the novice category. I will also miss it because with JEE I knew what it took to deliver exceptionally high performance. Even though I am feeling the JEE separation pangs, I believe Python is the right choice. It will allow for the fastest development and some really rapid changes (agility in coding). It will also help the developers who eventually join us come up the curve much faster and deliver more features sooner.

Final thoughts

One thing I realised through the whole process was that there are two strong influencing factors in any architecture choice. The first, obviously, is the context: there is no way to compare or contrast architecture or design choices without putting a context around them. The second, which was a bit more of a surprise to me, is that you simply cannot remove personal proclivities from such a decision. I as an individual am more comfortable with some styles of design than others, and I am likely to be substantially involved in the initial development. So it only makes sense to evaluate the choices in this subjective context of individual comfort, and therefore of projected individual productivity and effectiveness as well. Note that while in this case I evaluated the choices strictly from my own perspective, one could also evaluate them in the context of team styles, culture and comfort.

Turbocharge your string keyed hashmaps

Posted on April 17th, 2008 in java, ruby, software

This post gives you a small tip that just might make a world of difference to your Java hashmap’s performance. The trick is inspired by the “symbol” construct in the Ruby language.

I have always considered hash maps with String keys to be quite expensive, and in many ways they often are. Suppose, however, that the keys used in your hashmap form a well-known set, either when you write the code or at least when the program starts up. In that case, the following is likely to make your map performance much zippier.

In case you are not familiar with Ruby, it has a special construct called a symbol, which is somewhat similar to a constant string. You can refer to a symbol as often as you like, but the Ruby runtime ensures that all occurrences with the same character data refer to the same runtime instance.

The design of a key influences the performance of the hashmap primarily through the performance of its hashCode() and equals() methods. The java.util.HashMap implementation first uses the result of hashCode() to narrow down the candidate keys to compare. It then checks whether the keys are the same instance (i.e. the same object in memory) and, only if they aren't, invokes equals().

Thus if one wants to use Strings as keys, there are at least two optimizations that could potentially be targeted:
(a) The hash code could be cached rather than computed each time. (Turns out this makes a positive but rather small difference. java.lang.String already caches its own hash code after the first call, which probably limits the gain.)
(b) Ensure that the same String instance gets used for the same string data. (Turns out this does make a substantial difference.)
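Optimization (b) hinges on the difference between equal and identical strings. The small sketch below makes that difference visible; the class name StringIdentityDemo is just for illustration. Two `new String(...)` copies are equal but not identical. `String.intern()` maps both back to the single canonical instance, which is the same object the literal already refers to.

```java
// Illustrates equality vs. identity for String keys.
final class StringIdentityDemo {
    // Returns {copies identical?, copies equal?, interned copies identical?, interned == literal?}
    static boolean[] check() {
        String a = new String("mykey");   // a fresh String object
        String b = new String("mykey");   // another fresh object with the same characters
        boolean identical = (a == b);     // false: two distinct objects
        boolean equal = a.equals(b);      // true: same character data
        // intern() returns the canonical instance from the JVM's string pool.
        // String literals are always interned, so this is the same object as "mykey".
        boolean internedIdentical = (a.intern() == b.intern());
        boolean sameAsLiteral = (a.intern() == "mykey");
        return new boolean[] { identical, equal, internedIdentical, sameAsLiteral };
    }
}
```

Keys obtained this way can hit the cheap `==` branch in HashMap's key comparison. Keys built fresh from input will not, unless they are interned first.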

The following two pieces of code indicate the difference.

Slower Code

    map.put(new String("mykey"),/* .. some value .. */);
    Object o = map.get(new String("mykey"));

Faster Code

    String key = "mykey";
    map.put(key,/* .. some value .. */);
    // Note : In this case the same instance of the key
    //        is used in both the get and the put
    Object o = map.get(key);

The big reason why this makes a difference is the following line of code in java.util.HashMap

    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) ...

Thus when comparing keys, the code first tests whether they are identical, and only then whether they are equal. The identity test is substantially cheaper than the equality test, so the faster code shown above is faster because its keys are identical.
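One way to see this short-circuit in action is to use a key type that counts its own `equals()` calls. The CountingKey and IdentityShortCircuitDemo classes below are hypothetical. The exact counts reflect the OpenJDK HashMap implementation quoted above, not anything the Map contract guarantees.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type that counts how often equals() is invoked.
final class CountingKey {
    static int equalsCalls = 0;
    private final String name;

    CountingKey(String name) {
        this.name = name;
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        equalsCalls++;
        return o instanceof CountingKey && ((CountingKey) o).name.equals(name);
    }
}

final class IdentityShortCircuitDemo {
    // Returns {equals() calls for an identical-key get, equals() calls for an equal-copy get}.
    static int[] run() {
        Map<CountingKey, String> map = new HashMap<>();
        CountingKey key = new CountingKey("mykey");
        map.put(key, "value");

        CountingKey.equalsCalls = 0;
        map.get(key);                        // same instance: the == test succeeds first
        int identical = CountingKey.equalsCalls;

        CountingKey.equalsCalls = 0;
        map.get(new CountingKey("mykey"));   // equal content, different instance
        int copy = CountingKey.equalsCalls;

        return new int[] { identical, copy };
    }
}
```

On OpenJDK the first lookup never calls `equals()`, while the second has to. For expensive `equals()` implementations, that is the whole gain.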

In order to be able to provide the same capability of ensuring that only one instance of a string key with a particular string data is constructed, while taking away the onus from the programmer of having to track the instance creation, a wrapper class called Symbol is used as shown below :

Update : I have updated the code to address Khalil’s and Dave’s concerns and suggestions. The important part of the modification is that the internal hashmap has been done away with and the getSymbol() method now uses String.intern(). The code that was replaced has been left in place, commented out.

package com.dnene.utils.symbolmap;

/**
 * License : Based on BSD Template
 * 
 * Copyright (c) 2008, Dhananjay Nene
 * All rights reserved.
 * 
 * Redistribution and use in source and binary forms, 
 * with or without modification, are permitted provided 
 * that the following conditions are met:
 *
 *    * Redistributions of source code must retain the 
 *      above copyright notice, this list of conditions 
 *      and the following disclaimer.
 *    * Redistributions in binary form must reproduce 
 *      the above copyright notice, this list of 
 *      conditions and the following disclaimer in the 
 *      documentation and/or other materials provided 
 *      with the distribution.
 *    * The name of the Dhananjay Nene  may not be used 
 *      to endorse or promote products derived from this 
 *      software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND 
 * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, 
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 
 * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 
 * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR 
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; 
 * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY 
 * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT 
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 
 * POSSIBILITY OF SUCH DAMAGE.
 */

import java.util.HashMap;
import java.util.Map;

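/**
 * An immutable, canonicalised string key. Two Symbols created from equal
 * character data wrap the same interned String, so equals() is a single
 * reference comparison and the hash code is computed only once.
 */
public final class Symbol
{
	private final String t;        // interned string data
	private final int hashcode;    // hash code of t, cached at construction
	// not required after update
	// private static Map<String,Symbol> map = new HashMap<String,Symbol>();

	// method simplified upon update
	// public static Symbol getSymbol(String t)
	// {
	// 	Symbol symbol = map.get(t);
	// 	if (symbol == null)
	// 	{
	// 		symbol = new Symbol(t);
	// 		map.put(t, symbol);
	// 	}
	// 	return symbol;
	// }

	/** Returns a Symbol wrapping the canonical (interned) copy of t. */
	public static Symbol getSymbol(String t)
	{
		return new Symbol(t.intern());
	}

	private Symbol(String t)
	{
		this.t = t;
		this.hashcode = t.hashCode();
	}

	public final String get()
	{
		return t;
	}

	@Override
	public int hashCode()
	{
		return this.hashcode;
	}

	@Override
	public final boolean equals(Object that)
	{
		// Both strings are interned, so reference equality is
		// equivalent to character-by-character equality here.
		return (that instanceof Symbol) && (this.t == ((Symbol) that).t);
	}
}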

The code snippets shown earlier would be modified as follows to use Symbol

Symbol usage :

    Symbol key = Symbol.getSymbol("mykey");
    map.put(key,/* .. some value .. */);
    // ... elsewhere in code: another Symbol instance is created here, but it
    // wraps the same interned String, so equals() is a cheap == check.
    // Note : You can also use key.get() as the map key so long as you use it consistently
    Symbol key2 = Symbol.getSymbol("mykey");
    Object o = map.get(key2);

Update : As per Dave’s suggestion of using String.intern(), the following could be used instead. Note that the performance benefits are to be had only if the Symbol construction or intern() calls are made far less frequently than the get calls on the map.

    String key = new String("mykey").intern();
    map.put(key,/* .. some value .. */);
    // ... elsewhere in code
    String key2 = new String("mykey").intern();
    Object o = map.get(key2);


Performance Difference

In my benchmarks involving a million keys, the get performance of maps using identical String keys and Symbols was almost the same. The Symbol-based implementation ranged from about 3% slower to up to 10% faster, and was on average about 4% faster. I did not benchmark the puts, since I did not expect them to change much. However, the Symbol-based get was consistently at least 30% faster than the String map when the keys used during the put were equal to, but not identical to, those used during the get. Creating a Symbol instance has an overhead compared to a String, but under most circumstances I believe the gains should more than offset it.

What surprised me was that the Symbol-based String keys beat the get performance of Long keys quite handsomely and consistently. I am not sure why, and I cannot rule out a mistake on my part. However, since Long keys were not something I was particularly focused on, I did not hunt down the reason for the performance difference between Symbol and Long keys.


Update : adding sample output from a benchmark run. Each run uses hashmaps of a million entries, each key 16 characters long, and performs a million lookups. Times are in nanoseconds per run of a million lookups.

    === Comparison of Symbol to Long ===
    Long lookup time : 246225003
    Symbol lookup time : 148041217
    Performance Ratio : 1.6632193
    Reduction in time : 39%
    === Comparison of Symbol to Identical Strings ===
    Usual lookup time : 151010200
    Symbol lookup time : 148041217
    Performance Ratio : 1.0200552
    Reduction in time : 1%
    === Comparison of Symbol to String Copies ===
    Copy lookup time : 231829056
    Symbol lookup time : 148041217
    Performance Ratio : 1.5659764
    Reduction in time : 36%
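For readers who want to reproduce the identical-vs-copied comparison, here is a minimal timing sketch. It is not the author's driver program; the LookupBench name, the key format and the warm-up count are assumptions. Single System.nanoTime() measurements like these are noisy, so numbers will vary by JVM and machine. A dedicated harness such as JMH gives more trustworthy results.

```java
import java.util.HashMap;
import java.util.Map;

// Rough sketch of a lookup benchmark: identical keys vs. equal-but-distinct copies.
final class LookupBench {
    static long sink; // results accumulate here so the JIT cannot discard the loop

    // Times one get() per key against the map, returning elapsed nanoseconds.
    static long timeLookups(Map<String, Integer> map, String[] keys) {
        long start = System.nanoTime();
        long sum = 0;
        for (String k : keys) {
            Integer v = map.get(k);
            if (v != null) {
                sum += v;
            }
        }
        long elapsed = System.nanoTime() - start;
        sink += sum;
        return elapsed;
    }

    // Returns {identicalKeyNanos, copiedKeyNanos} for n entries and n lookups each.
    static long[] compare(int n) {
        Map<String, Integer> map = new HashMap<>();
        String[] identical = new String[n];
        String[] copies = new String[n];
        for (int i = 0; i < n; i++) {
            String key = String.format("key-%012d", i); // 16 characters, as in the runs above
            map.put(key, i);
            identical[i] = key;          // the very instance used in put()
            copies[i] = new String(key); // equal content, different instance
        }
        // Warm-up rounds so the JIT has compiled timeLookups before measuring.
        for (int round = 0; round < 5; round++) {
            timeLookups(map, identical);
            timeLookups(map, copies);
        }
        return new long[] { timeLookups(map, identical), timeLookups(map, copies) };
    }
}
```

For example, `long[] t = LookupBench.compare(1_000_000);` followed by `t[1] / (double) t[0]` gives a ratio comparable in spirit to the "String Copies" figure above.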

I am adding a link to the source and driver files so you can run the tests independently:
Symbol source and driver program

Field of use

This should be useful in a fairly large proportion of typical applications. In most situations the possible universe of keys in the hashmap is known upfront, either when writing the code or when starting the application. So instead of using hard-coded strings, or string keys read from, say, an XML file, just use a Symbol. Construction is a little more expensive, but map lookups run much faster.


This solution is not limited to a String. It could actually be used for any data structure which has a high cost of either hash code computation and / or equality check. In fact the version I wrote for myself was a Symbol<T>. The code shown above is only a specialised version where T is a String.
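The author's Symbol<T> isn't shown, so what follows is only a sketch of how a generic version might look. It assumes the wrapped values are immutable, non-null, and have sound equals()/hashCode(). String.intern() only works for strings, so this version canonicalises through a ConcurrentHashMap instead. That is essentially the map-based approach from the original version of the code, made thread-safe. The name GenericSymbol is hypothetical. Because every distinct value maps to exactly one instance, equals() can simply be the inherited identity comparison.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of a generic canonicalising wrapper: one GenericSymbol instance per distinct value.
// Assumes values are immutable and non-null with consistent equals()/hashCode(). Entries are
// never evicted, so the universe of values should be bounded (e.g. keys known at startup).
final class GenericSymbol<T> {
    private static final ConcurrentMap<Object, GenericSymbol<?>> CANONICAL =
            new ConcurrentHashMap<>();

    private final T value;
    private final int hash; // cached once; the value's own hashCode() may be expensive

    private GenericSymbol(T value) {
        this.value = value;
        this.hash = value.hashCode();
    }

    // Returns the single shared symbol for any value equal to `value`.
    @SuppressWarnings("unchecked")
    static <V> GenericSymbol<V> of(V value) {
        return (GenericSymbol<V>) CANONICAL.computeIfAbsent(value, v -> new GenericSymbol<V>(value));
    }

    T get() {
        return value;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    // equals() is deliberately not overridden: canonicalisation makes identity the
    // correct test, so map lookups never call the (possibly expensive) equals() of T.
}
```

The trade-off mirrors the String case: canonicalisation costs one concurrent-map lookup at construction time, and every subsequent map operation on the symbol pays only for a cached hash and a reference comparison.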

Java : if (compete with PHP / Ruby / Python) { stop fixing the syntax and start fixing the runtime }

Posted on April 14th, 2008 in java, php, python, ruby, software, web

A lot of people are wondering whether Java is under threat from a set of nimble languages: PHP / Ruby / Python / Perl. There is a flurry of posts on whether Java is losing the battle for the modern web. Many of them were prompted by Andi Gutmans' earlier post, Java is losing the battle for the modern Web. Can the JVM save the vendors?.

My recent activities had me revisit the same question. The following is how I would summarise the situation, and the mechanisms by which Java could better compete with these nimble languages.

Current scenario

Yes, I think there is some threat to the scope of Java's use as a server-side web application development language. Given Java's high acceptance in corporate environments, this threat will take some time to play out. Java's sales pitch has often been heavily supported by the vendors, which has led to less focus on making Java compete at the non-corporate, non-enterprise end. The nimble languages compete quite well in this space. Increasingly powerful hardware also makes them more relevant each day, even for high performance requirements. Together, these create a clear threat of such languages starting at the lower end and pushing Java out of even the middle.

Opinion: Dynamism of syntax is not the real issue

What I am not yet fully on board with is the characterisation that the real issue is that Java is less dynamic than, say, PHP / Ruby / Python. Java has a pretty strong set of capabilities, and while some may desire more from a syntax perspective, I don’t think that's the real issue. The lack of closures, dynamic types and duck typing may make it harder for Java to compete in some contexts, but that is at least partially offset by its strengths in others.

Issue : Shared hosting of java applications

The first real issue is the overhead of developing and then hosting Java applications. It is difficult for hosting providers to let each user run their own JVM processes in a sustained fashion. I remember the days when using PostgreSQL required a dedicated server, whereas you could easily use MySQL by signing up for a small multi-tenant plan on a web server out there. Today Java is like the PostgreSQL of those days. There is no easy way to set yourself up to run a small Java application in a shared tenant environment. Even if you did, chances are you would be far from satisfied with the performance in a multi-tenant situation, even though Java is actually a really, really fast language. Java has endeared itself to corporate environments, which fund large infrastructure stacks. It simply hasn’t offered enough opportunities for small enterprises / individuals to create and host their web applications on shared infrastructure. This takes away an entire community of supporters who, as they grew, would’ve carried Java along into their larger applications.

Issue: Compile cycle

The second issue is the nastily long compile-build cycle. Developers end up twiddling their thumbs for half their development time, waiting for ant / maven / javac to do their work. It destroys their rhythm and hurts their productivity. The traditional argument about Java's speed, scale and enterprise-ness seems less and less relevant, for two reasons. The nimble languages are acquiring a nice stack of enterprise frameworks of their own, and hardware developments are whittling away their performance disadvantage. When someone very recently asked me why I was less likely to choose Java for my next project, I just said Java is not agile enough. The agility of the nimble languages (no compile-build cycle), coupled with adequate performance profiles, makes Java less capable of competing in this space. By the way, one of the real strengths of Eclipse is its incremental compilation. Unfortunately, I cannot use it when making code changes remotely, the way I can change, say, PHP.

So how can java compete ?

Assuming that java needs to compete with these languages on a better footing, some changes will be required (and some of them quite painful). These will be :

Make java easy to host on multi-tenant application servers

This would definitely require some changes to the JVM to reduce startup / shutdown overheads and make each process really lightweight. It would also require some attention to how much memory gets used. In the past I have made applications run much faster by caching big time in RAM; such approaches are a no-no for multi-tenant hosting. We would thus need a version of JEE that loses the ApplicationContext. We would also need to work creatively with other pools, such as connection pools. Finally, JEE application servers may need to be restructured to support rapidly dying processes (i.e. a process per web request). I am no JVM writer, so I wouldn’t know whether some changes might be required to the language, but I can’t think of any major ones.

Lose the compile cycle

Why can’t “javac” be conditionally executed implicitly at the beginning of the “java” process? JSPs generate Java on the fly. Similarly, Python creates the .pyc compiled files on the fly. Deployment could then be done using .java files instead of .class files. That would eliminate the compile cycle and allow a developer to change the code in one window, save it, alt-tab and press the reload button in the browser. Compared to all the attention being paid to language features like closures and duck typing, I think this is a really, really big deal.
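The "compile if stale, then load" idea can be sketched with the standard javax.tools API. The CompileOnLoad class below is a hypothetical illustration, not a proposal for how the java launcher itself should work. It assumes a class in the default package and uses file modification times to detect edits. It also requires a JDK, since ToolProvider.getSystemJavaCompiler() returns null on a bare JRE.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper: compile a .java file only when it is newer than its .class file,
// then load the class through a fresh class loader so edits are picked up without a restart.
final class CompileOnLoad {
    static Class<?> load(Path srcDir, Path outDir, String className) throws Exception {
        Path source = srcDir.resolve(className + ".java");
        Path compiled = outDir.resolve(className + ".class");
        boolean stale = !Files.exists(compiled)
                || Files.getLastModifiedTime(compiled).compareTo(Files.getLastModifiedTime(source)) < 0;
        if (stale) {
            JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null without a JDK
            if (javac == null) {
                throw new IllegalStateException("No system Java compiler; run on a JDK");
            }
            // Equivalent to: javac -d <outDir> <source>; diagnostics go to System.err.
            int rc = javac.run(null, null, null, "-d", outDir.toString(), source.toString());
            if (rc != 0) {
                throw new IllegalStateException("Compilation failed: " + source);
            }
        }
        // A new loader each time, so a recompiled class replaces the old definition.
        // (Assumes the class is not also on the application class path, which would
        // win under parent-first delegation.)
        URLClassLoader loader = new URLClassLoader(new URL[] { outDir.toUri().toURL() });
        return loader.loadClass(className);
    }
}
```

Calling load() again after editing the source recompiles it and loads the new version, roughly the save-and-reload workflow described above. A production version would also need to handle dependencies between classes and release old class loaders.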

How will this help ?

These steps are not targeted at attracting the developers who build corporate enterprise applications. Still, I am certain they would break out into three cheers if the compile cycle were done away with, especially if it were optional (i.e. production deployment would still require a war consisting of .class and not .java files). More and more development going forward will consist of a larger number of smaller applications, rather than the small number of large applications typical of the 60s-70s. Average applications will get smaller, and they will communicate with each other more frequently using web service (REST?) calls. The trend will definitely be towards in-place reuse of remote applications via remote invocation, rather than reuse through class libraries. The nimble languages are well positioned to compete in this space. Java isn’t. These changes will help Java attract and retain developers in a space where it is currently only likely to lose them. Moreover, application hosting infrastructure is increasingly outsourced and cloud computing is becoming more prevalent (e.g. Amazon Web Services / Google App Engine). With these changes Java could at least compete in a space from which it is otherwise likely to be locked out.

In defense of passionate programming

Posted on April 4th, 2008 in software | 3 Comments »

Mark Dennehy blogs about passion in the context of programming in The case against passion. It took me a while to figure out what he was getting at, but I think he has completely mixed up two orthogonal statements:

Programming is not all about passion.
Passion is the antithesis of good programming.

A yes to the first statement. A complete and strong no to the second.

The post goes on to explain why “process” and “professionalism” (and not so much “passion”) are important for programming. I believe process and professionalism, as described in the post, are the limbs of the runner, whereas passion is the energy. While it would be exceedingly tough to run a race with weak limbs, in an otherwise evenly matched field the runner likely to come first is the one who badly wants to get there first (which gives him the burst of energy when he needs it most). Where I think the post gets it wrong is in framing limbs and energy (to use my analogy) as rivals, so that one has to decide which is more important than the other.

As software engineers we need to understand and implement the practices of good engineering, and exercise our mental talents in how we do so and how we continuously improve at it. In conducting these activities we need drivers. The three most common drivers I have seen are fear, greed and passion. While fear may not be a widely prevalent driver in this context, greed and passion (either one or, in rare circumstances, both) can provide a tremendous boost to the results.

I am not recommending that one go out and start hiring passionate but otherwise less competent people. Under typical circumstances competence, engineering and professionalism are extremely important. However, what I have seen is this: when the chips are down and the decks are stacked, you need energy, enormous energy, and that's when it helps to have a few passionate people on the team (and if you yourself can be one of them, that's even better).

So here’s what I would offer: Passion is not the antithesis of good programming. It's the fuel for good programming.

There is one point made in the post that I liked. That at times it is important to stay dispassionate even while staying motivated.

Software Engineering Code of Ethics and Professional Practice

Posted on March 20th, 2008 in software | No Comments »

It has not been rare to come across situations where loyalty to one’s profession and loyalty to the organisation one works for end up “seeming” at odds. On rare occasions, I have run into advice or opinions that were questionable at best and unethical at worst. While I used my common sense, what was missing for me was a clear reference to help evaluate a particular situation from a professional ethics perspective. I came across a document that can act as such a reference for the software engineering community and thought it was just too good not to blog about.

Here’s the document : Software Engineering Code of Ethics and Professional Practice.

In the sometimes cynical and less than innocent world we find ourselves in, this is refreshing content. It is perhaps good material for people to read when they enter the workforce, and to review every time they take on a new assignment, or maybe once every six months. I wonder how much of it is subject to moral relativism.

A rejoinder to the alternative SafeHashMap

Posted on February 20th, 2008 in java | 1 Comment »

Eugene responded to my earlier post An even more capable SafeHashMap with It is safer not to invent safe hash map / Java. While he makes some valid points, I took issue with others, namely Signature Instability, Compilation Troubles, Principle of Least Surprise and Style.


An even more capable SafeHashMap

Posted on February 15th, 2008 in java, software | 1 Comment »

Eric Redmond, in his post A Safe HashMap for Java, describes a SafeHashMap which caches a default instance and uses the clone method to hand out copies of it.

Here’s a suggestion to extend its capabilities. Let's look at the code.

Define an interface to create an instance.

First, I create a new interface (We’ll get to know why very soon).

public interface InstanceProvider<K,V>
{
	public V createNew(K k);
}

The SafeMap class itself

Here’s the proposed class. The main difference is that instead of caching an instance and cloning it, this implementation stores a reference to the instance provider and invokes it when required.

import java.util.HashMap;
import java.util.Map;

public class SafeMap<K, V> extends HashMap<K, V>
{
	// Here's where we cache away the provider
	private final InstanceProvider<K,V> provider;

	// Note that the provider is now
	// passed to all the constructors

	public SafeMap(
			int initialCapacity,
			float loadFactor,
			InstanceProvider<K,V> provider)
	{
		super(initialCapacity,loadFactor);
		this.provider = provider;
	}

	public SafeMap(
			int initialCapacity,
			InstanceProvider<K,V> provider)
	{
		// hand the requested capacity on to HashMap
		super(initialCapacity);
		this.provider = provider;
	}

	public SafeMap(
			InstanceProvider<K,V> provider)
	{
		this.provider = provider;
	}

	public SafeMap(
			Map<? extends K, ? extends V> m,
			InstanceProvider<K,V> provider)
	{
		super(m);
		this.provider = provider;
	}

	@Override
	@SuppressWarnings("unchecked")
	public V get(Object key)
	{
		V value = super.get(key);
		if (value == null)
		{
			// Absent key (or a key mapped to null): ask the provider
			// for a default instance. The new instance is returned
			// to the caller but not stored in the map.
			value =
				provider.createNew(
						(K) key);
		}
		return value;
	}
}

Test Case demonstrating usage.

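import static org.junit.Assert.assertEquals;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.junit.Test;

public class TestSafeMap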
{
	@Test
	public void testGetObject()
	{
		// Am using an anonymous class here. 
		// If additional parameters are required
		// to be passed to constructor, one could 
		// create an abstract class with the constructor
		// and pass the necessary arguments 
		Map<String, List<String>> myMap = 
			new SafeMap<String, List<String>>(
					new InstanceProvider<String, List<String>>()
					{
						public List<String> createNew(String string)
						{
							List<String> list = new ArrayList<String>();
							list.add(string);
							return list;
						}
					}
		);
		
		String key = "hello world";
		assertEquals(
				"List size should've been one",
				1,
				myMap.get(key).size());
		assertEquals(
				"The only element in the list should've been : " + key,
				key,
				myMap.get(key).get(0));
	}
	
	@Test
	public void testArrayInstantiation()
	{
		Map<String, String[]> myMap = 
			new SafeMap<String, String[]>(
					new InstanceProvider<String, String[]>()
					{
						public String[] createNew(
								String string)
						{
							return new String[] {string};
						}
					}
			);
			
		String key = "hello world";
		assertEquals(
				"Array length should've been one",
				1,
				myMap.get(key).length);
		assertEquals(
				"The only element in the array should've been : " + key,
				key,
				myMap.get(key)[0]);
	}
}
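The comment in the first test suggests an abstract class with a constructor for the case where the provider needs extra parameters. Here is a minimal sketch of that idea, assuming the parameter is a prefix string: PrefixingProvider, prefixedMap and ParameterizedProviderDemo are hypothetical names, and InstanceProvider plus a cut-down SafeMap are re-declared as nested types only so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ParameterizedProviderDemo {
    // Same shape as the InstanceProvider interface in this post
    interface InstanceProvider<K, V> {
        V createNew(K k);
    }

    // Cut-down stand-in for SafeMap: returns a provider-made default for absent keys
    static class SafeMap<K, V> extends HashMap<K, V> {
        private final InstanceProvider<K, V> provider;

        SafeMap(InstanceProvider<K, V> provider) {
            this.provider = provider;
        }

        @Override
        @SuppressWarnings("unchecked")
        public V get(Object key) {
            V value = super.get(key);
            return value != null ? value : provider.createNew((K) key);
        }
    }

    // Hypothetical abstract provider: the constructor captures the extra
    // argument, and concrete (here anonymous) subclasses use it in createNew
    static abstract class PrefixingProvider<V> implements InstanceProvider<String, V> {
        protected final String prefix;

        PrefixingProvider(String prefix) {
            this.prefix = prefix;
        }
    }

    // Builds a map whose default value is a one-element list holding prefix + key
    static Map<String, List<String>> prefixedMap(String prefix) {
        return new SafeMap<>(new PrefixingProvider<List<String>>(prefix) {
            @Override
            public List<String> createNew(String key) {
                List<String> list = new ArrayList<>();
                list.add(this.prefix + key);
                return list;
            }
        });
    }

    public static void main(String[] args) {
        Map<String, List<String>> myMap = prefixedMap("default-");
        System.out.println(myMap.get("hello")); // prints [default-hello]
    }
}
```

The abstract class only carries state; each anonymous subclass decides how that state shapes the default instance, which keeps the InstanceProvider interface itself unchanged.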

This gives me finer control over the instantiation of the default instance. There are several reasons why one might want that, such as:

  • The default instance needs to be configured in some way depending upon the key value (shown in the example above)
  • The default instance needs to be configured based on some constructor parameters passed to it. In this case create an abstract class which implements the interface, declare a constructor with the necessary arguments, use the abstract class during instantiation, and allow the createNew method to behave appropriately based on the values of the arguments.
  • There are situations, such as when the value type is String[] (as in the second test case above), where the clone method does not work. Perhaps the cloning code in the original SafeHashMap could be modified to create an array, but I am not too sure (since I’ve often faced difficulties instantiating array types when using generics).
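On whether the cloning code could be modified to handle arrays: Redmond's code is not reproduced here, so this is only a guess at where arrays go wrong. If the cached default is cloned through reflection, arrays are a known trap, because Class.getMethod("clone") deliberately does not find clone() on array types. A minimal sketch suggests that copying an array held in a generic V is still possible with java.lang.reflect.Array; ArrayDefaults and copyArray are hypothetical names.

```java
import java.lang.reflect.Array;

public class ArrayDefaults {
    // Shallow-copies an array held in a generic V. Reflection is needed because
    // V's static type gives no access to clone() (Object.clone() is protected),
    // and Class.getMethod("clone") does not find clone() on array classes.
    @SuppressWarnings("unchecked")
    static <V> V copyArray(V template) {
        Class<?> type = template.getClass();
        if (!type.isArray()) {
            throw new IllegalArgumentException("Not an array: " + type.getName());
        }
        int length = Array.getLength(template);
        // Array.newInstance works for both object and primitive component types
        Object copy = Array.newInstance(type.getComponentType(), length);
        System.arraycopy(template, 0, copy, 0, length);
        return (V) copy;
    }

    public static void main(String[] args) {
        String[] template = {"hello world"};
        String[] copy = copyArray(template);
        System.out.println(copy != template); // prints true
        System.out.println(copy[0]);          // prints hello world

        try {
            String[].class.getMethod("clone");
        } catch (NoSuchMethodException e) {
            System.out.println("getMethod(\"clone\") does not see an array's clone()");
        }
    }
}
```

That said, the provider approach above sidesteps the question entirely, since createNew can simply build a new array.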

A Developer’s Comparison of Open Social and Facebook platforms

Posted on February 15th, 2008 in software, web | No Comments »

Acknowledgement

There are a couple of helpful posts that set me off on the path to blog this so here’s the acknowledgement :

Introduction

I have been spending some time on the Facebook and OpenSocial platforms, and while both attempt to meet the same need, they go about it quite differently. For anyone interested in web based software design, the differences are curious and provide an interesting contrast in how different means can be used to reach similar ends.

The Facebook platform

The Facebook platform primarily allows hosted applications to access Facebook data through a variety of APIs. A hosted application draws itself mainly onto a “Canvas”, which is the primary channel of interaction; it can also render its own part of the profile and plug into the messaging infrastructure (news feeds, mails etc.). When a user is working with the canvas, it occupies a large share of the screen and thus dominates the interaction with the user.

The OpenSocial platform

The OpenSocial platform does not really require applications to be hosted (they can be purely client side applications using HTML, XML and Javascript). An application presents itself as a gadget and can have information about the network graphs fed into it at runtime. When I refer to a client side application I mean a gadget with type=html, whereas a server side application means one hosted with type=url.

Now for some of the important differences.

1. Application Structure

OpenSocial applications can be either client side or server side. Most of the specification and almost all the examples are of client side applications. These typically require a high level of competence in HTML, XML and especially Javascript. Applications can also be server side, but this does not yet seem to be a predominant way of serving them.

Facebook applications are all hosted on a web site. These typically require a reasonable amount of server side development using a variety of languages. PHP and Java client libraries are provided and supported by Facebook. However client libraries in other languages are also available.

2. Application predominance

In Facebook, one application dominates at a time, since the canvas it is given occupies most of the rendered screen space. It is not yet clear how most OpenSocial applications will be presented, though it is quite likely that many of them will coexist on one screen, each in its own independent space.

3. Control

In Facebook, the Facebook site itself acts as a reverse proxy between the browser and the application, with substantial mod_rewrite-like functionality. It thus intercepts the traffic and actually changes the data stream on the fly (eg. by renaming all the javascript variables). The Facebook site is always firmly in control.

In the case of OpenSocial, the application is given a space to host its gadget, after which the application is firmly in control, ie. the browser communicates directly with the application.

4. Development Skills

Facebook application development skill requirements are likely to be closer to those of conventional server side application development. However, developers will need to learn the Facebook platform APIs and technologies (eg. REST APIs, FBML, FQL etc.).

With OpenSocial if the application is a server side application, the development skill sets are likely to be similar to those of typical server side application development with a lesser requirement to learn newer APIs. However if the application is a client side application and you don’t have serious javascript capabilities, be ready to invest in that quite a bit if you want to create a slick app.

With a client side OpenSocial app, it seems as if the entire controller moves over to the client, and it takes substantial skill and effort to work through all the interactions in Javascript. I like HTML+Javascript, but don't particularly look forward to writing any more of it than necessary. I generally found it faster to create Facebook apps, especially where a reasonable amount of server side processing was required (FBML is quite helpful).

5. Application or applet

While one can potentially build either applications or applets on both platforms, it seems that Facebook focuses on applications while OpenSocial focuses on applets (small, focused functionality). This seems to stem primarily from OpenSocial borrowing heavily from the Google Gadget infrastructure, which itself was not aimed at heavy duty application development.

6. Usage of IFRAMES

OpenSocial is likely to depend a lot more on IFRAMEs. While older Facebook applications used IFRAMEs, many contemporary ones do not. Moreover, because Facebook acts as a reverse proxy, the data stream comes from a single domain, so some of the difficulties associated with IFRAMEs (especially passing information across domains, and using SSL with IE, where I have faced difficulties in the past) are easily avoided in Facebook.

7. Security

I am going to go out on a limb here and make a conjecture. Given Facebook’s reverse proxy structure, and given that its platform has had reasonable time to develop (unlike OpenSocial, which seems to follow a “let's make Google Gadgets into a platform API as fast as we can” approach), it's “probably” going to be tougher to build secure applications under OpenSocial.

8. Platform data is being pulled or pushed ?

In the case of server side OpenSocial applications, the relationship graph and user preferences are pushed into the application as part of the URL itself (which reminds me a little of dependency injection, except that it is being done for data). Additionally, the application can communicate with the OpenSocial platform through REST based APIs, but these specifications are still in progress. In the case of client side applications, this data is pulled from the OpenSocial platform and mashed up with the application data on the client side.

Facebook also supports both pull and push models, but a little differently. It has a well defined REST interface, along with corresponding language bindings for PHP and Java, which allows the application to pull data from Facebook as well as push data to it (eg. to write to feeds). However, given the reverse proxy structure, Facebook is in a way pulling the entire application content from the application and mashing it up on its site (eg. processing FBML tags or renaming javascript variables) before sending it to the browser.

9. Standards compliance

Clearly OpenSocial could be “considered” more standards compliant. However, that is a rather mixed blessing: proprietary Facebook additions such as FBML tags actually reduce development effort considerably, and OpenSocial has no equivalent capability yet. There are still many things the Facebook platform supports that OpenSocial has yet to address, so I am not sure the issue matters much yet. But as OpenSocial starts offering more capabilities in future versions, this could start becoming an issue.

Summary

Clearly there are quite a few differences between the OpenSocial and Facebook platforms. They are large enough to tempt me to call them apples and oranges (though both aim to feed me). It would be wise to avoid the question “which one is better?” However, I can state that, as a developer who has spent a lot of time building server side enterprise applications, I find Facebook a lot more enjoyable to work with, and it is likely to be more stable and more secure. But remember YMMV (some people like apples more and others oranges).