Author Archives


21
Jan 12

Conference Report – Hasgeek jsFoo Pune 2012

Just finished attending jsFoo Pune 2012 organised by HasGeek. It was an interesting and well-spent day. And I’ve learnt that if one wants to blog about a conference, it is best done immediately after the event, else much gets lost with a recollection as weak as mine.

The conference was spread over 3 tracks. So the best I can do is talk about the talks I attended.

The first talk I attended was Node.js, HTML5 and Phonegap for high performant content site app by Prasoon Kumar. I learnt some stuff about phonegap and espresso. A good starting point, but I hardly got to see any actual code.

Next one was Synchronized models using Backbone, Sockets and Node by Ruben Stolk. He started up a demo and suggested that people in the room connect to it. It basically had a canvas with an image and, amongst many things, one could turn the light on the light-stand on and off, and any such action propagated to the backend and then to all the connected clients. The same held for the post-it style notes that one added to the canvas. It was a good example of using backbone to define models on both the client and the server, and of using socket.io to stream events between the two. He went through a fair amount of code too, and it was very understandable. I came back with a distinct impression that the model maintenance, and the ability to synchronise model changes between a client and the server and then between the server and all connected clients (thus in effect across all connected clients), was pretty cool. However I still felt a fair degree of wiring up of the event related code was needed, though I am not sure about it and need to experiment with it more.
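To make the synchronisation idea concrete for myself, here is a minimal in-process sketch in Python. It does not use Backbone or socket.io at all; the `Model`, `Server` and `Client` classes and the `light` attribute are hypothetical names of my own, and plain method calls stand in for the socket events. It only illustrates the flow described above: a change on one client goes to the server, which then pushes it to every other connected client.

```python
class Model:
    """A tiny attribute bag that notifies listeners on change (hypothetical)."""

    def __init__(self, **attrs):
        self.attrs = dict(attrs)
        self.listeners = []

    def on_change(self, callback):
        self.listeners.append(callback)

    def set(self, key, value, silent=False):
        if self.attrs.get(key) == value:
            return  # nothing changed, so nothing to propagate
        self.attrs[key] = value
        if not silent:
            for callback in self.listeners:
                callback(key, value)


class Server:
    """Holds the authoritative model and relays changes to clients."""

    def __init__(self):
        self.model = Model(light="off")
        self.clients = []

    def connect(self, client):
        self.clients.append(client)
        client.server = self
        # Bring the new client up to date without echoing back to the server.
        for key, value in self.model.attrs.items():
            client.model.set(key, value, silent=True)

    def receive(self, sender, key, value):
        self.model.set(key, value)
        for client in self.clients:
            if client is not sender:
                client.model.set(key, value, silent=True)


class Client:
    """Keeps a local copy of the model and pushes local changes upstream."""

    def __init__(self, name):
        self.name = name
        self.server = None
        self.model = Model()
        self.model.on_change(self._push)

    def _push(self, key, value):
        if self.server is not None:
            self.server.receive(self, key, value)


server = Server()
alice, bob = Client("alice"), Client("bob")
server.connect(alice)
server.connect(bob)
alice.model.set("light", "on")  # bob's copy and the server's copy follow
```

The `silent` flag is the bit of wiring that stops a relayed change from bouncing straight back to the server, which is roughly the kind of event plumbing I suspect one still has to think about.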

The next one was How to apply BDD and TDD practices, using Jasmine library? by Anil Tarte. Anil and I have been colleagues in the past and I have high respect for his skills. In the brief period he was able to convey how to use jasmine to write BDD test cases for client side code. His server was over web sockets, and he was essentially intercepting calls into the specific methods that eventually communicated with the backend. He also mentioned it would be possible to intercept the over the wire HTTP traffic instead of javascript functions, though that would create more difficulties if one was not able to precisely control the server’s outputs. I quite frankly had not imagined that BDD could be used so effectively, especially when localised strictly to the client. So it was an insightful talk that will definitely have me thinking about javascript BDD the next time. However he did seem rushed and under pressure, and perhaps the talk lengths could be extended from 45 mins to 55 mins or so.
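The interception technique is not specific to jasmine. As a rough analogue (not Anil's code, and not jasmine), here is a sketch using Python's standard `unittest.mock`. The `ChatClient` class and its methods are hypothetical; the point is only that the method which would talk to the backend is replaced by a spy, so the client side behaviour can be specified without a server.

```python
from unittest import mock


class ChatClient:
    """Hypothetical client-side object with one backend-facing method."""

    def send_to_server(self, payload):
        # In a real client this would write to a socket.
        raise RuntimeError("no backend available in tests")

    def post_message(self, text):
        text = text.strip()
        if not text:
            return False  # blank messages never reach the backend
        self.send_to_server({"type": "message", "text": text})
        return True


def check_post_message():
    client = ChatClient()
    # Replace the backend-facing method with a spy for the duration of the check.
    with mock.patch.object(client, "send_to_server") as spy:
        assert client.post_message("  hello ") is True
        spy.assert_called_once_with({"type": "message", "text": "hello"})
        assert client.post_message("   ") is False
        assert spy.call_count == 1  # the blank message was not sent
    return True
```

Calling `check_post_message()` exercises both behaviours. Intercepting at the method boundary like this keeps the expected backend inputs fully under the test's control, which is the advantage over intercepting HTTP traffic that was mentioned in the talk.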

Next was Building real-time web applications … (Introduction to Websockets / Socket.IO) by Aditya Y. He talked a bit about the evolution of websockets, the various libraries that were developed along the way and the current state of websockets support. There was an interesting demo at the end where changes in one browser session were reflected almost in real time in another browser session via websocket connections back to the server. One of the important points noted was that Websockets continues to ride over port 80 so it can work across many firewalls, and while its initial handshake is HTTP like, the subsequent traffic is essentially TCP like over that same connection.
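To illustrate the "HTTP like" handshake, here is a small Python sketch of the server side of the opening handshake as I understand it from RFC 6455. Whether the talk covered this exact version of the protocol is my assumption; the function names are my own. The server proves it understood the upgrade request by hashing the client's `Sec-WebSocket-Key` together with a fixed GUID.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's key."""
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(client_key: str) -> str:
    """Build the HTTP response that switches the connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(client_key)}\r\n"
        "\r\n"
    )
```

Once this response is sent, the same connection stops carrying HTTP and carries framed WebSocket messages in both directions, which is why it looks like ordinary web traffic to many firewalls.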

A thoroughly enjoyable talk was Advanced JavaScript Techniques by Rajasekharan Vengalil. He looked like he had enough stuff to talk about for the whole day, and delved into some of the OO related aspects of Javascript and the additions in ECMAScript 5. He spent a good amount of time explaining how prototype based languages work. There are just so many intricacies to the language and so many really strange behaviours (like silently ignoring assignments etc.) that if I hated the language I would’ve just continuously muttered WTFs, and if I happened to passionately like the language, I just might’ve facepalm’ed my way through the talk. Thankfully my outlook towards javascript is quite neutral, so I just enjoyed it. A lot.

Node.js Patterns and How we build ActiveNode was the next talk by Sreekanth Vadagiri. He talked about some of his experiences with node.js, his preference for using CoffeeScript rather than JS (despite it being harder to debug), many of the patterns he liked, some of the libraries he used and deployment matters. He was just as candid about some of the gotchas as well. A useful insight into the node.js system.

The last one of the day was Amplify your stack by Sunil Pai. He talked about a lot of libraries that could be used for client side development and deployment, about using javascript for templating, and about ensuring rigorous unit test coverage at all stages. It gave me the feeling there’s a lot I simply do not know about what’s happening with client side JS libraries and frameworks. A couple of remarks I recollect – “You (the JS developer) own the browser” and “When you have a strong test case coverage, you can CODE BOLDLY”.

Summary: There’s a lot I need to learn about what’s happening on the client side. That’s true about the server side as well. And the day helped me understand just a little bit better what I don’t know. Yet I got a distinct feeling of discomfort with node.js. Not with the tool, but with the assumptions that seem to go with its usage. There was poor articulation of what kind of use cases it is good for, except that it is really good for high throughput / a large number of client connections. Either there was a misplaced understanding of it being the only good way to get this kind of throughput, or there was an inability to clearly articulate what other benefits developers could expect out of using node.js in situations where throughput / concurrent connections are not particularly important for them. Perhaps I was just at the wrong places when someone was offering a more insightful articulation of this. But I really did not hear it.

On the whole, I enjoyed the sessions, and this was a day very well spent. Hats off to Kiran and his team for organising a good event. Makes me look forward to the next one they organise.


11
Jan 12

Scala needs terraces

Terrace (noun) : each of a series of flat areas made on a slope, used for cultivation, or a flight of wide, shallow steps providing standing room for spectators in a stadium

Recently there was a useful discussion triggered off by the post True Scala complexity by Yang Zhang. Much has been debated about it in the comments on the post and other forums, most of it both articulate and constructive.

Martin Odersky proposed an idea on the ycombinator news thread as follows (next paragraph)

So, I believe here is what we need do: Truly advanced, and dangerously powerful, features such as implicit conversions and higher-kinded types will in the future be enabled only under a special compiler flag. The flag will come with documentation that if you enable it, you take the responsibility. Even with the flag disabled, Scala will be a more powerful language than any of the alternatives I can think of. And enabling the flag to do advanced stuff is in any case much easier than hacking your own compiler.

The basis of the idea wasn’t new. Martin himself had earlier suggested Scala Levels, though the suggested enforcement of two levels via a compiler flag was. While I thought it was a useful idea, strangely enough I found many (especially in the twitter streams I follow) less than enthusiastic. I think there exists a perspective where one can look at this idea with more enthusiasm, and this post details that.

Learning scala is non trivial. For a person who is well versed with Java programming and with little else, this will require learning at least the following (I am listing only a few items)

  • Learning traits and objects (and unlearning statics). There’s a lot of stuff here including multiple inheritance, member overriding, whether the members should be defs or vals etc. which takes some time to get used to.
  • Learning to deal with immutability and occasionally lazy evaluation
  • Learning collections constructs such as for comprehensions or map, flatMap, filter etc.
  • Understanding the capabilities that scala offers in terms of generics including being able to define co/contravariant collections, and then learning to use these capabilities effectively
  • Being able to understand and leverage many aspects of type theory

Typically when a person makes a sustained effort driven by passion, he can cover a lot of ground very quickly. Thus some might be able to deal with the scala learning curve quickly. While this pace of learning could be achievable for many, especially smaller companies, it is likely to be quite difficult for many others. Especially when the team sizes are large and/or developer capabilities are modest, this is a real issue. One could form an analogy with a Cheetah and an Elephant. Some animals will be able to run fast individually, and others will plod slowly as a herd. One could wish elephants could run as fast as cheetahs but that’s unhelpful, since the jungle is what it is, and one had best deal with it the way one finds it.

An ideal way to introduce scala would be to start with small groups of 3-5. But that’s not practical in all cases. Besides, for large teams, that means most will need to wait for a much longer time before being able to use scala and profit from its substantial benefits. So let’s say an organisation attempts to introduce scala in a modest team of say 20, of which 2 are passionate, self driven souls who tend to work much harder than the rest and so progress much faster. Because learning to use scala the way scala wishes to be used is not a sprint but a marathon (or at least a long distance jog), this creates issues. Soon enough the 2 are likely to run far ahead of the rest of the team and will start writing code which the others can’t grok. Meanwhile some of the remaining may actually be struggling hard, but are likely to be losing enthusiasm simply because they find themselves unable to keep pace. An even bigger risk is the 2 continuing to make rapid progress even as they realise so much more remains to be learnt, while the remainder are unable to start leveraging scala, since they still think there’s a lot of distance yet to cover and become far more focused on catching up than on leveraging what they have learnt. So while learning proceeds, actual development with direct economic benefit stalls.

Let’s say hypothetically this team decided that they would only focus on reducing java boiler plate (which is a big deal), continue to use scala in a primarily OO paradigm (since that is what they are used to) and defer learning the remainder of scala by at least six months. There are likely tons of benefits to be had by shifting to scala even with this limited scope. Yet all it takes to destroy harmony is one very enthusiastic developer who starts inserting (as yet) alien notions such as writing highly functional code, implementing advanced type theory concepts (eg. scalaz) or using a largely un-understood symbol soup (scalaz or the dispatch periodic table). This would make a mockery of the phased learning and force everyone else to catch up (even if just to be able to read and maintain somebody else’s code).

We’ve long known hills are not conducive to agriculture. If these hills are our learning curves, fields could be considered to be the directly impactful deployment of our skills. Agriculture generally happens on flatlands and plateaus but not on hillslopes. And yet we’ve learnt to tame the tall hills using terraces. Instead of climbing for a long time, climb a little & grow a little. If teams can learn for a month, and deploy these skills for the next 6-9 months with the cycle repeating itself, it could help in the following ways :

  • Organisations do not need to invest in long training times where the opportunity cost of lost development time is one large fixed cost.
  • Intra team disparity is likely to be contained since the deployment period will allow many to catch up with those in the lead
  • Instead of scala being viewed as one big leap, it could be viewed as a series of incremental leaps with managed effort. I saw a remark which said people deal with complexity anyway – it’s called OO. And Java programmers deal with OO. So they should be able to come to terms with scala quickly. But there is a difference. Java wasn’t as big a learning curve leap over C++ as Scala is over Java. And as importantly, a learning curve one has triumphed over already somehow seems far less onerous than an identically sized one that one still needs to deal with.

In short learning scala needs terraces. Terraces where some developers will not run far ahead of the pack during the learning stages. Terraces where developers won’t have to feel too scared of alien’ish code suddenly appearing on their monitors. Terraces where developers will learn a few things and then deploy these skills over the next few months. These terraces could be made to work in different ways. It could just be an honour system using a set of levels identical or similar to the levels Martin Odersky proposed (though it would be extremely hard to ensure the same without tooling support). Or they could be implemented using an (as yet unwritten) tool like a scala-lint which will flag advanced usages as warnings. Or as compiler switches the way Martin proposed.

There could be other ways to look at it too. Scala could be considered as a sharp weapon. Lethal in the hands of the trained. Self damaging in the hands of the untrained. So compiler switches could simply be a way to graduate from using sticks, to wooden swords, to blunt steel swords to the really sharp ones.

I saw a remark which said a compiler switch would divide the programmers. I seem to think such steps which help create terraces will divide the learning challenges. I also saw another remark which said, implementing compiler switches would play into the hands of languages like Kotlin or Ceylon or Fantom. I believe exactly to the contrary. The lower terraces can compete with Kotlin or Ceylon, and the upper terraces can provide a growth path that many other languages will not have. Not being able to plan a phased and an economically practical learning curve for large teams will make the case for other languages stronger.


12
Oct 11

Which risk would you manage? What would you want to prove? Programming Languages and Type Systems

Debates across programming languages and type systems are not new. And this post does not attempt to shed new light on these (though it is hardly an un-opinionated view)

Yet one point keeps on bothering me time and again: the lenses used to view the many issues around these help clarify and magnify, but lose the bigger picture. This post suggests a broader view to be applied.

Every argument refers to either improved benefits, lower costs or better contained risks. And in fact many of the arguments focusing on proving the correctness of the code, or on compiler ensured verification and the resultant guarantees, or on the extent of unit testing required to cover a wide array of differently typed inputs, all boil down to risk.

The risk of code not behaving precisely like the way the author intended it to behave.

Yet there are other risks.

* The risk of spending too much time building something you eventually realise is not required.
* The risk of not being able to provide rapid feedback to the product team, OR the risk of the customer not being able to partake and participate in your newest features to provide you critical inputs
* The risk of building fantastic software, but software that in hindsight needed to be very different.

When rewriting a legacy system, you often know exactly how you would like things to be rebuilt. When having adequate cash flows from alternate revenue streams, which can absorb product development costs, the life of your company may not depend upon the particular development in progress. When a part of a very large enterprise where anyways tons of internecine political empires rule the roost, a particular product / project development may hardly influence overall corporate risk.

In each of these situations – the risk of the software being developed is largely the same as the risk of the software getting delivered. That’s when risk is measured in terms of the software not meeting its feature, performance or delivery schedule goals. That’s when risk management becomes very important. That’s when provability, robustness etc. become important metrics for predictively containing risk.

Yet, not all environments have that luxury. Startups especially. Those that are dependent strongly on a piece of software – bet the farm on that software. Many actually have only a blur of an idea – but require tremendous market feedback to refine the vision. Yet others are operating on fixed constraints in terms of budgets before they decide whether the investments they made are worthy of further investments or whether to cut the losses. The risks here are very different.

In such situations – the risk to manage is not the risk of software correctly meeting its specifications. It’s the risk of the software specifications not correctly meeting the market expectations. The real risk here is a lack of agility. Can you create the minimal functionality needed for feedback in the quickest manner? Can you act on the received feedback to adapt in the most competitive manner? In short can you iterate? Again. And again. And again. Fast enough before the market throws you out, or your competitor eats your lunch or your budget runs out. Work backwards from these goals. And you’ll find the right type system. And programming language. For you.

Programming languages and type systems are but vehicles to goals. Work backwards from your customer satisfaction perspective, your particular organisational constraints (or lack of them), your imperatives to iterate, and your time to market pressures. Work backwards from these to understand what exactly your company’s life depends upon. And of course factor in your current strengths (languages known / experience levels etc.). The right type systems and programming language should be relatively easy to decide.

And yes, provability is important. So long as you know exactly what you wish to prove first. The software or your business model.

Oh, lest you feel confused about which type systems I prefer, I am ambivalent. They exist because they all have a role to play. But I do believe each context has one right choice. Just make the right choice for your context. And yes, the programming languages I program in currently are Python and Scala. And yes, I have found in my experience that dynamically typed languages help very substantially when fast TTM or rapid iteration is needed (a view not everyone shares).


15
Aug 11

Contrasting Performance : Languages, styles and VMs – Java, Scala, Python, Erlang, Clojure, Ruby, Groovy, Javascript

Major update

This blog post is now formally retracted. A part of the original post remains. As does a record of the contributors and a log of the annotations of the updates. The code also remains on github. What are deleted are the actual published results, and some other sections that are now less relevant after this retraction. Added is the narrative about the reason for retraction.

The way this post started was a casual exercise in measuring the performance of my earlier python code in pypy. I found the results interesting, so soon tried the same in Erlang. Having found the same to be interesting as well, very soon I found myself adding more languages and different coding styles, resulting in a set of exercises which caused me to think (naively in hindsight) – hmmm, these results do tell me something. And probably the learnings are useful enough to share. It disappoints me to not be able to continue to share the same. Yet that’s just what I am doing.

Why ?

  • Insufficient detailing of constraints : As Isaac points out in the comments here, I had defined the set of constraints around the coding styles a little loosely. Once I had opened up the results and opened up the entire topic for a number of submissions – these started becoming an issue. As an example the early versions of the benchmarks used lists – yet some of the languages have no exclusive lists, so one uses array like structures as well. This led to a set of suggestions that one should use arrays instead of lists in other languages as well. That was just a starting point, cited as an example of how a lack of clear constraints started influencing the code. It also points to the difference in rigour required between conducting exercises for self understanding and publishing the same.
  • Inability to keep code in other languages updated : It was rather painfully obvious to me that as some of the contributions were being implemented, the same were also candidates for leveraging in other languages. Yet the lack of time prevented me from being able to do so. Thus even as I was adding contributions to remain fair to each implementation, I was being unfair by not applying the underlying efficiencies leveraged by these contributions to other languages simultaneously.
  • Running out of time : I am likely to be extremely busy over the coming month. Yet there still were a number of contributions still coming in. Given the priorities I need to work upon, this blog post was not amongst the more important ones. Yet I imagined, if new submissions came in, I simply would no longer have time to deal with the same. Again resulting in a degree of unfairness that would not be acceptable to me.
  • Letter or Spirit : Performance Benchmarks need to be written to reflect the letter and the spirit of the language. I would’ve preferred to measure the performance implications using the features as advertised. There’s no point advertising a luxury car packed with a ton of features and a cost, and then simultaneously publishing speed results using a rally car version – which has a lot of the weight thrown out (and would actually cost a lot more than the commercial car to maintain). (The analogy fails partially, since in programming we can combine the two based on varying contexts). The judgement of the right spirit is extremely subjective. A decision I did not find myself (and perhaps most others given the subjectivity) capable of making fairly. And yet keeping benchmarks focused on the letter without the spirit was rather uninteresting for me (YMMV).

I find myself a little wiser and a lot humbled. The learnings above are unlikely to be forgotten. I also find myself much more aware of the performance implications of code that’s consistent with my subjective understanding of idiomatic. The original reason I started this exercise has satisfied its objective in terms of an improved personal understanding, perhaps even more so because of a number of contributions I received. That’s a useful learning as well.

The results that were published earlier on this page need to go. The code continues to remain on github for your perusal and further tweaking based on your definition of letter and spirit.

Sincere thanks to all who contributed. Particularly to Isaac. I learnt a lot from him. Further activity on this post shall cease.

Parts of the Original Post

There’s a better place to specifically look at performance comparisons across languages than this post – The computer languages benchmarks game. But this post attempts to look at performance comparisons a little differently. Based on coding idioms as well. And for a much narrower range of problems (namely one).

There are languages which are tightly opinionated on a particular way of doing things. And there are languages which allow you to implement a given logic in multiple ways. Yet, depending upon the language (and as we shall see, the runtime), the performance could vary quite substantially based on the nature of the code we write. This post attempts to take a small piece of logic, and implements it in up to 3 different styles in 8 languages (10 if you count the runtime variations as well).

[..]

Problem

Quoting from The Josephus Problem,

Flavius Josephus was a roman historian of Jewish origin. During the Jewish-Roman wars of the first century AD, he was in a cave with fellow soldiers, 40 men in all, surrounded by enemy Roman troops. They decided to commit suicide by standing in a ring and counting off each third man. Each man so designated was to commit suicide…Josephus, not wanting to die, managed to place himself in the position of the last survivor.

In the general version of the problem, there are n soldiers numbered from 1 to n and each k-th soldier will be eliminated. The count starts from the first soldier. What is the number of the last survivor? In the code I benchmarked, n = 40 and k = 3.

Update. Note: Some are getting confused that I start by striking out the very first soldier in the chain and then starting to count up to the k value. This is one of the variations of the Josephus problem I had introduced this time. All the versions implement this logic consistently.
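For readers who want to see the variation spelled out, here is a minimal Python sketch of it. This is not the benchmarked code from the repository; the function name `last_survivor` and the use of `collections.deque` are my own choices for illustration. It strikes soldier 1 first, and thereafter counts k soldiers onwards and strikes the k-th.

```python
from collections import deque


def last_survivor(n, k):
    """Return the number (1..n) of the last soldier standing.

    Variation sketched here: the very first soldier is struck out
    immediately, after which every k-th surviving soldier is struck.
    """
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive")
    circle = deque(range(1, n + 1))
    if len(circle) > 1:
        circle.popleft()  # strike the very first soldier
    while len(circle) > 1:
        circle.rotate(-(k - 1))  # let k-1 survivors count off and pass
        circle.popleft()  # the k-th soldier is struck
    return circle[0]
```

With n = 5 and k = 3 the soldiers are struck in the order 1, 4, 3, 5, which leaves soldier 2. Note that this numbering is 1-based; as mentioned further below, some of the benchmarked snippets count from 0 instead.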

Idioms

I have considered three idioms :

  • Object Oriented : This code has classes reflecting a person (or a soldier) and the chain. The person objects maintain references to the previous and next people in the circle (a doubly linked list), and as the counting progresses, whenever they need to eliminate themselves, they do so by updating the next / prev references in the prev / next objects. This style perhaps results in the fewest operations involving mutation or memory allocation / deallocation. One would’ve imagined it to be the fastest, but as you will see that is not necessarily true.
  • List reduction : This code starts with a list of integers, each element representing a soldier. It performs an operation which effectively creates a subset of the list by removing every third soldier. The result of one such pass is a smaller list. Rinse and repeat if the smaller list is more than 1 element long. It emphasises looping over lists (using comprehensions or other constructs) and focuses on reducing the list by conducting an operation on the entire list, every pass.
  • Element recursion : This is a more fine grained logic which emphasises recursion (and often accumulation) for every element in the list. This is a particularly apt scenario for pattern matching (both the erlang and scala code use pattern matching). One would imagine this to be always slower than list reduction since it is much more fine grained and involves many more function calls.
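The three idioms above can be sketched in Python roughly as follows. These are illustrative sketches written for this note, not the benchmarked code from the repository; all names are my own, they assume k >= 2, number soldiers from 1, and strike the first soldier immediately.

```python
class Soldier:
    """Node in a circular doubly linked list (object oriented idiom)."""

    def __init__(self, number):
        self.number = number
        self.prev = self
        self.next = self


def survivor_oo(n, k):
    first = last = Soldier(1)
    for number in range(2, n + 1):
        soldier = Soldier(number)
        soldier.prev, soldier.next = last, first
        last.next = first.prev = soldier
        last = soldier
    current, count = first, k  # count == k so the first soldier is struck
    while current.next is not current:
        if count == k:
            # The soldier removes himself by relinking his neighbours.
            current.prev.next = current.next
            current.next.prev = current.prev
            count = 1
        else:
            count += 1
        current = current.next
    return current.number


def survivor_list_reduction(n, k):
    soldiers = list(range(1, n + 1))
    skew = 0  # how far the count had advanced when the previous pass ended
    while len(soldiers) > 1:
        length = len(soldiers)
        # One pass over the whole list drops every k-th soldier.
        soldiers = [s for i, s in enumerate(soldiers) if (i + skew) % k != 0]
        skew = (skew + length) % k
    return soldiers[0]


def survivor_element_recursion(n, k):
    def step(remaining, spared, count):
        # remaining: soldiers not yet visited on this trip round the ring
        # spared: soldiers already visited and spared on this trip
        if not remaining:
            if len(spared) == 1:
                return spared[0]
            return step(spared, [], count)  # start the next trip
        head, tail = remaining[0], remaining[1:]
        if not tail and not spared:
            return head  # only one soldier is left
        if count == k:
            return step(tail, spared, 1)  # head is struck
        return step(tail, spared + [head], count + 1)

    return step(list(range(1, n + 1)), [], k)
```

All three should return the same survivor for the same n and k. The recursive version makes one call per soldier visited, so in Python (which has no tail call elimination) it is only suitable for small n such as the 40 used here.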

I’ve attempted to implement code in all languages using the styles above as long as reasonably feasible and appropriate. Since (barring C/C++) Java continues to be the language to beat from a performance perspective, I’ve attempted to implement roughly equivalent logic in all styles using Java as well. All programs typically run the code once to print the results (to verify correctness), then run 100000 or a million iterations to warm up, and then repeat the iterations while measuring the elapsed time. There is a slight inconsistency between the various code snippets: the counter varies either from 0 to 39 or from 1 to 40.
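The measurement procedure described above looks roughly like the following Python sketch. The `benchmark` helper and the `workload` placeholder are hypothetical names for illustration, not the harness from the repository, and the iteration count is just a parameter.

```python
import time


def workload():
    # Placeholder for one run of the logic being measured (hypothetical).
    return sum(range(40))


def benchmark(func, iterations=100_000):
    """Return the mean time per call in microseconds."""
    print(func())  # run once and print, to verify correctness by eye
    for _ in range(iterations):  # warm-up pass, not timed
        func()
    start = time.perf_counter()
    for _ in range(iterations):  # timed pass
        func()
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1_000_000
```

A call such as `benchmark(workload, iterations=1_000_000)` returns microseconds per iteration. The warm-up pass matters most on runtimes with a JIT compiler (the JVM, PyPy), where the first iterations are not representative of steady state speed.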

Contributions

I can’t write the fastest possible code across all these languages. This is the best I could do. However if you can find a better way to implement the code, do let me know in the comments (or send me a pull request on github). I shall certainly include better solutions here if and as they are identified. At the point in time of publishing this, at least two authors had contributed to the code. I imagine (based on my experience with the prior post), more might be interested in suggesting tweaks to further improve performance. These are all listed here.

  • Paddy3118 had suggested some python code in the comments in last blog post, which I have substantially reused for the python list-reduction logic
  • Rahul Göma Phuloré (missingfaktor) contributed substantial improvements to the scala code
  • Viktor Klang contributed an improved version for the scala element recursion code
  • David Nolen (swannodette) contributed a substantially improved version for clojure element recursion, and the java like versions for clojure
  • Fred Hebert suggested native compilation by adding “compile(native).” and a couple of other minor improvements over github
  • Isaac Gouy offered improved Java, Python and Javascript versions and code for an alternative oo + element-recursive style. The alternate style code is to be found in the contrib directory.
  • Alex Tkachman offered code for use with Groovy++
  • Stuart Halloway submitted clojure element-recursion implementation

Hardware / Software

Removed

Metrics :

Removed

Observations : (Updated)

Removed.

Full Source code is available on github at https://github.com/dnene/josephus

Finally, thanks to a number of folks I had a chance to preview the post with, and especially to Saager Mhatre for suggesting moving the code from an attached zip file to github.

Updates

  • Updated metrics for groovy 1.8.1 (instead of earlier groovy 1.7)
  • Updated code to reflect suggestions by Eric Rozendaal and another almost similar one by Viktor Klang – Viktor’s code was very marginally faster. Leads to a reduction in Scala Element Recursive benchmark
  • Updated clojure element recursion code as per suggestion by David Nolen.
  • Thanks to the persistent questioning by Isaac, upgraded the metrics to jRuby 1.6.3. That turned out to be a very good step. There is a substantial improvement in the performance metrics which are now updated in the numbers above.
  • Fred Hebert submitted a pull request to turn on native compilation, which in turn required HiPE, which Isaac had suggested earlier. After verifying that Erlang-HiPE is a valid Synaptic target (thus a different, readily available VM), I built the same and updated the readings
  • Isaac Gouy offered some helpful suggestions in terms of converting the main block also into a function. Also he demonstrated some potential issues in terms of whether the resulting performance was stable. I have made across the board changes now to run all the benchmarks ten times each for a million iterations and used the last 5 readings after visually ensuring that the readings did not vary much
  • Isaac further suggested improvements to the Java List Reduction and Element Recursion techniques which have now been incorporated. He has also contributed a perhaps faster version of OO code, which is less consistent with the other OO code being benchmarked. Need to identify how best to factor that in. Perhaps yet another contrib section in the source?
  • Added newer versions of Java and Javascript contributions from Isaac, and Clojure contributions from David Nolen. I have only recently seen some more alternative implementations for clojure, and have received a Groovy++ contribution .. both I’ll explore over the weekend
  • Updated Pypy results to Pypy 1.6 now that it has been released.
  • Added contribution by Stuart Halloway for clojure using LinkedList for element recursion
  • Added code contributed by Alex Tkachman for Groovy++. This I would remark has exceedingly good performance. You can find it in the contrib section.
  • Added further code contributions by Isaac for Python.
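The update above about running each benchmark ten times and using the last five readings can be sketched as below. I checked the spread of the readings visually; the numeric `tolerance` here, like the function name, is an assumption added purely for illustration.

```python
from statistics import mean


def stable_reading(readings, keep=5, tolerance=0.05):
    """Average the last `keep` readings and report whether they look stable.

    `tolerance` is the allowed (max - min) / mean among the kept readings;
    it is a hypothetical stand-in for a visual check.
    """
    if len(readings) < keep:
        raise ValueError("not enough readings")
    tail = readings[-keep:]
    average = mean(tail)
    spread = (max(tail) - min(tail)) / average
    return average, spread <= tolerance
```

For example, given ten timings where the early ones are slower because the runtime is still warming up, only the last five contribute to the reported figure, and the boolean flags whether those five agree closely enough to be trusted.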

26
Jul 11

Why you should register to attend Python Conference Pune (Sept 2011) right now

This is a guest / cross post of the original that appeared on PuneTech, written by Navin Kabra. Thank you Navin, for the permission to reproduce the same.

Disclaimer : Both Navin and I are on the organising team of PyCon India 2011. However we act in a volunteer capacity to help further encourage python and software development activities. We do not gain financially from this association. PuneTech is a blog specifically focused on encouraging technology developments in Pune.

PyCon India, the International Python Conference that happens in India every year, will be in Pune this year on September 16-18, 2011. Early Bird Registration – Rs. 300 (includes lunch, 3 days) is open until the end of the week. Register now! If you need convincing as to why you should attend Pycon, here are some reasons:

  • Raymond Hettinger, one of the top pythonistas in the world is the keynote speaker. Raymond (@raymondh on twitter) is a Python core developer. He is the author of the itertools and set modules and most of the collections modules in the standard library, the peephole optimizer for Python, and dozens of ASPN cookbook recipes. It will literally be many years before you get a chance to hear a technologist of this calibre.
  • Learn Python: This is your chance to learn Python. Start learning Python right now, and by September, you’ll be ready to get maximum value out of the tutorials in the conference (including Twitter/Facebook/Linked-in/Google Data hacking, web scraping, image processing, and functional programming using Python). If you need arguments on why everybody must learn python check here, here and here.
  • Excellent Talks: There are 24 high quality talks, on all kinds of interesting topics including Data Analysis and Business Intelligence, Python-to-Javascript cross-compilation, Telephony apps, Robotics, Web Apps, Python in Biology and Life-Sciences, Cloud Computing, Android, testing, GIS, and much more. There is also one talk on using Python to do your homework.
  • Meet Smart People: Even if you don’t agree that people who choose to work with Python are smarter than most others, you will have to agree that this will be one pretty darn interesting bunch of 500+ developers from all over India and outside. Rs. 300 to get a chance for that kind of networking is nothing.
  • Hire Smart People: If you are having trouble hiring top quality technology talent for your company, you definitely need to be at PyCon, handing out your card, and telling everybody what a cool company you work for. Far better use of your time than going through resumes sent to you by your recruiter.
  • Just Rs. 300: Early Bird Registration closes on 1st August, so act now. That’s only Rs. 300 for a high quality conference and it includes lunch and snacks for the 3 days of the conference. That’s right, you’ll be paying less than the cost of the food! And, unlike the other, regular tech events that happen in Pune, this is not a cheapo event – there will be swag – T-shirts and other stuff being given away. Did you realize that PyCon sponsors are paying for the privilege of giving you free stuff?
  • Make PyCon Pune the biggest PyCon: Pune now has a reputation to keep up – whenever any tech event happens in different cities, invariably, the biggest turn-out is for the Pune instance. PHPCamp with 1000+ registrations and 700+ actual attendance is probably the biggest ever tech unconference/barcamp style event in the country. DocType HTML5 in Pune had far more registrations than other places and the organizers had to close registrations. Recently GizmoMeet had their biggest turnout in Pune. The Python community in Pune is far younger than the Python community in Bangalore, so it will be tough for Pune PyCon to beat the Bangalore PyCon, but we definitely need to give them at least a tough fight.

What are you waiting for? Register now

(We’d like to mention here that amongst the various sponsors of PyCon (including Google and GitHub), are these cool Pune companies/institutions: Venue sponsor: Symbiosis, Gold: Vayana, Silver: Druva and GSLab)


10
Jul 11

Google Plus : Getting close to the sweet spot by getting the basics right

My first reaction to google plus was: nice. But facebook has a lock-in on friends, and people will not shift until their friends shift, which pretty much means most will play around with google plus and then go back to facebook, since that’s where their friends are.

As I played with it a little more, I realised the Circles were a little confusing. It took me a while to realise who exactly got to see my posts when I posted them to my extended circle. While they allowed you a more natural way to constrain publishing to a private subgroup of your network, their organisation wasn’t exactly simple or natural (though I’m sure Google worked very hard even to get circle management to its current shape). Despite its bold move on circles, which has been well received, I think circles will need to be better managed. It will perhaps take some time for the feedback from the community to find its way back into superior circle management.

So while I was willing to give it more credit than google wave, it didn’t quite blow me off my feet. And I imagined facebook to continue to be the unchallenged king of social networking.

As I used it over the week, I felt my perceptions changing. Not in a manner that I could feel the change strongly, but slowly and quietly. And then it struck me. The gauntlet had indeed been thrown quite credibly. And the king is now being challenged.

Here’s why. At a very fundamental level, google is changing the game. And it is not wanting to be the king of the social networks as understood today. It wants to be the king of social networks that will be.

The Public / Private Asymmetric Network :

Facebook is a private network. Sure you can make your facebook status update completely visible for everyone to see, but that’s really an atypical use case. Facebook allows me to bring online the friends I made in the past. Friends I now therefore trust. Friends I can make remarks to that I wouldn’t want to make publicly. And barring a few exceptions (eg. when someone ranted about their job only to find their boss reading the rant), the model held. Privacy is important, critical and assumed. Any real or perceived violations of privacy are met with an uproar. What facebook does not allow me to do (at least based on the easily understood controls available to me) is create further subgroups within my network (it actually does that – but I rarely find myself using these features and doubt if many others use them). It takes my “offline photosharing” and makes it “online”. It takes my “offline gossip” and makes it “online”. Facebook “takes my offline network” and “makes it online”.

Twitter on the other hand is a public network. There should be a tagline which says “Everything you say and do can and will be used against you in a court of law, in your classroom, in your job, in your sports team and even within your friends and family”. But it’s not really required, and no one misses it, since most Twitter users very quickly understand it. With twitter, I can eavesdrop on conversations between two other twitter users, and neither of them is likely to object nor am I likely to feel guilty – since that is consistent with the conventional protocol. Twitter is a public network. It is perfectly normal for strangers to start engaging in a conversation. It therefore has this end result called “serendipitous discovery” of both information and relationships. Most of the people I interact with on twitter are people I met online. And later on I feel nice when I end up meeting them in real life. Twitter is all about “discovering an online network” and then when feasible “bringing part of the online network offline”.

Twitter is also an asymmetric network. While facebook carries friendships online, twitter enabled followings in an asymmetric fashion. Which allows me to fine tune the messages that I choose to listen to. Thus I am no longer required to read the status updates of those who wish to read mine. (Facebook allows this too via hiding, but again that’s probably not a very typical behaviour). Offline conversations are constrained by space and time – the people who are conversing need to be at the same place at the same time. Thus they are often necessarily symmetrical (An example where that is not true is when a charismatic leader is say addressing a large crowd – the leader hardly knows each person in the crowd, though they all know him a lot better). Online networking removes the constraints of space and time. Thus asymmetric relationships are much easier to support.

While most understood facebook very quickly, for a lot of people the response to Twitter is “I don’t get it”. And part of the reason is that twitter does not convert an offline network to online network, but is really pushing at the boundaries of what an online network can be. While I value the privacy of my data on facebook, I actually value the serendipitous discovery and the astonishing learning I can benefit from with twitter so much more, because it has enabled communication and network patterns that were not earlier feasible in offline networks given constraints of space and time. And while the 140 character magic holds, it does become painful when tracking long and convoluted conversations.

Google Plus groks this. It has smartly enabled the capabilities and features of a public and private network. While you can push your posts on the public channel and eavesdrop and participate on conversations others are holding on the public channel, it simultaneously affords you the choice to constrain your messages to the specific group of people that you would choose to constrain them to. It also has the most rudimentary capabilities to start building an interest graph. And it is asymmetric which helps each person maximise the value he gets from the network. Moreover it has features to allow me to implement the appropriate communication patterns with people that I share my childhood photographs with, with those I work with, and those that inspire me. It has simultaneously enabled private gossipy conversation, listening into the people I respect (not befriend), and serendipitous discovery.

Ka Ching!

The unified consolidated channel

Once upon a time, I had a watch, a camera, a phone, an alarm clock, a map, a hi-fi audio system, a radio, a TV / DVD player to play my DVDs. Now I have a smart-phone. And while the really keen will continue to own each or some of the individual components, it is neither a surprise nor a secret, that my smartphone is ever encroaching into their territories. We want these together, and we want them to be easy to carry.

While online services are inherently portable and thus the physical form factor of having to carry them around is not applicable, the integration of various services is an important factor. Google Plus is but one element of a broader channel. Incoming notifications on google plus show up in a small red box at the top right corner of my gmail tab. Now imagine gmail, google docs, google plus, picasa, youtube, google maps, google music, google latitude, google storage all getting (social / interest) network enabled. And imagine them being available in google apps also. And imagine them simultaneously available on desktops, netbooks, notebooks, tablets and mobile phones. Google already has most of these pieces – what it lacked is that these were not “network friendly”. Imagine each of these assets becoming network friendly – I can publish google docs to a specific circle I had defined on google plus. It has been suggested that the google plus we see today is but an early stage view of what is likely to be a continuous rollout of features over a year. Whether you look at facebook, linkedin, slideshare, foursquare, skype, or quora – they are the watch makers, camera makers, radio makers. Google is building the online smartphone. And that will probably be the important advantage google will have to offset facebook’s entrenched user base. That will cause many of your friends on Facebook to also start using Google Plus. Google Plus is google’s one ring to bind them all.

Openness :

Facebook did a phenomenal job in opening up its screen real estate to a number of other applications through its platform API. However it has also attempted to lock in the users through licensing constraints (eg. Robert Scoble getting his facebook account disabled). On the other hand google has made a public commitment to keep its data under the users’ control all the time including being able to move it out. As we start utilising the graphs we build across many more applications, and especially the more serious ones than the FarmVilles at Facebook, data openness will continue to become more critical. Unless others play catch up with google, in terms of its commitment to open data – this will start becoming an increasingly stronger factor in google’s favour.

In summary

Google plus is a credible execution to fill an important gap in google’s earlier offerings. However it goes well beyond just private social networking. It is really building public / private, asymmetric networking built using social graphs based on friendships, work relationships, online discoveries and probably soon enough interest graphs as well. It is building the network that will be. While google wants to own the experience, it is liberal enough to publicly commit that the data is owned by the user. Combined with the awesome google portfolio and its evergrowing warchest built out of search advertising revenues – This is the network to beat.


31
May 11

Why Java folks should look forward to Scala

There’s an interesting series of blog posts in progress: Why Java folks should stop looking down on C# : Part 1 and Part 2 (at the point of time of writing this post). It offers an interesting and detailed set of contrasts between Java and C#. It is a detailed analysis and makes for very worthwhile reading. What really intrigued me was this comment :

“We notice that Java developers generally tend to look scornfully at C#, as a copycat created by Microsoft and used by dummies. In theses blog series, I am going to try to sweep this nonsense and show some of the C# goodness.”

At least in my experience, I do not recollect Java developers looking scornfully at C# (notice that I did not mention .NET or Windows or Microsoft – I said C#). But perhaps there are some out there and the blog series does attempt to provide them sufficient evidence to reconsider their opinions. But reading through the posts made me think, there’s at least one good reason for Java programmers to celebrate and look forward rather than feel worried about Java: Scala. All these capabilities are available for the asking on their preferred platform, the JVM, in a way that is interoperable with, and incrementally migratable from, their current code bases.

Now, believe me you, scala capabilities extend far beyond those I describe below. But that further discovery is an adventure which readers are encouraged to conduct later. Also – I am only a little along that path of Scala discovery and still have distances to cover. So I won’t feel surprised if people are able to offer healthier and superior solutions to the ones I describe below. In fact, I would encourage them to do the same. But here’s something for all ye java citizens to feel good about.

Please note that it will be helpful if you review these two blog posts referred to above – since almost every example I refer to below, that I write using Scala, is based on the examples in these posts which are written in Java and C#. Thus, even after reading those posts, it might be useful to keep them open in other tabs. So that should you prefer to do so, you can refer to those Java and C# examples simultaneously as you read the Scala examples.

Unified Type System

Scala has types for the java primitives eg. Byte, Short, Int. Thus you can continue to deal with them as objects instead of having to specifically deal with them as primitives. All value classes inherit from AnyVal, whereas all others inherit from AnyRef, both in turn inheriting from Any. In addition there are a number of additional methods that are available on these types (eg. RichInt). At the same time these types have exactly the same ranges as java primitives. Thus the scala compiler can choose to transform instances of such value types into corresponding Java primitive types.
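A minimal sketch of what this unified view looks like in code. The object and value names below are made up purely for illustration; only the library methods (toString, max, to) come from Scala itself.

```scala
object UnifiedTypes {
  // An Int literal behaves like an object: methods can be called on it directly.
  val asText: String = 42.toString

  // max and to are not available on a java int; Scala adds them through RichInt.
  val bigger: Int = 3.max(7)
  val oneToFive: List[Int] = (1 to 5).toList

  // Value types and reference types can share one collection typed as Any.
  val mixed: List[Any] = List(1, 2.5, "three", 'c')

  // Any is the common root, so a single method can accept all of them.
  def describe(x: Any): String = x match {
    case i: Int    => "Int " + i
    case d: Double => "Double " + d
    case s: String => "String " + s
    case other     => "something else: " + other
  }
}
```

Calling UnifiedTypes.mixed.map(UnifiedTypes.describe) walks the mixed list without any explicit boxing code on my part; the compiler decides when a primitive representation can be used.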

Farewell Checked Exceptions

Goodbye. Adios. Au revoir, Vale. Checked Exceptions have been bid farewell. And I don’t think anyone’s less happy for it. But if you need to call Scala code from Java, you can choose to use the @throws annotation to mark your methods so that java code may treat these as thrown exceptions.
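Here is a small sketch of the @throws annotation in use. ConfigLoader and load are hypothetical names of my own, and the method body is a stand-in rather than real file handling.

```scala
import java.io.IOException

object ConfigLoader {
  // Scala callers are never forced to catch this exception.
  // The annotation only records a throws clause in the generated bytecode,
  // so that Java callers see IOException as a checked exception.
  @throws(classOf[IOException])
  def load(path: String): String = {
    if (path.isEmpty) throw new IOException("no path given")
    "contents of " + path
  }
}
```

From Scala, ConfigLoader.load("app.conf") compiles without a try block. A Java caller would have to catch or declare IOException as usual.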

Double Rainbow Accessors

Another heavy weight boilerplate that’s been waved goodbye is java bean style getters and setters. Every non-private member declared as a var automatically gets a getter and setter free by default (a val gets only the getter). In situations where you would want to override the default behaviour, you can choose to do so as well. Besides, a particular construct called a case class further simplifies this and helps you create a class trivially with reasonable implementation of equals, toString etc. already rolled in. It is likely that new keyboards continue to retain their gloss and continue to function far longer when coding in scala than in Java. A simple example of double rainbow accessors :

case class Meme (var catchPhrase : String,  var url: String)

// Notice: class members are declared as default constructor parameters. No further
//         declaration required again.
class MemeAdvanced (private [this] var cp: String, private [this] var u: String){
  // Since these are private fields, they can be wrapped using
  // getters / setters as follows which can be modified to suit any non-typical
  // expectations eg. validations
 
  // if the above parameter declarations did not have private qualifier,
  // the methods below would be automatically provided, whereas if declaration contained
  // val instead of var, only the getter would get provided.
  def catchPhrase = cp
  def catchPhrase_=(s: String) {cp = s}
 
  def url = u
  def url_=(s: String) { u = s}
}

object DoubleRainbow {
 
  def main(args : Array[String]) : Unit = {
    // using case classes
    var meme = new Meme("foo","bar")
    meme.catchPhrase = "Rick roll'd"
    meme.url = "http://www.youtube.com/watch?v=EK2tWVj6lXw"
    println(meme.catchPhrase)

    // using normal classes
    var meme2 = new MemeAdvanced("foo","bar")
    meme2.catchPhrase = "Rick roll'd"
    meme2.url = "http://www.youtube.com/watch?v=EK2tWVj6lXw"
    println(meme2.catchPhrase)
  }
}

Initialisers

There’s a bunch of new initialisation capabilities eg. initialising using field names

    var meme = new Meme(catchPhrase="blub", url="blub2")
    println(meme.catchPhrase)

or you can initialise a collection by passing a series of arguments to the constructor

    val digits = List[Int](0,1,2,3,4,5,6,7,8,9)

or you could choose to load up a map with a bunch of associations at instantiation time

    val keywordsMapping = Map[String,String] ( "super" -> "base", "boolean" -> "bool", "import" -> "using" )
    println(digits.head + " " + keywordsMapping("boolean"))

Verbatim (Multiline) Strings

That isn’t too hard with scala either. Just use the triple consecutive double quotes delimiter (""") to make the string span multiple lines.

    val input = """Multiline
                            325-532-4521"""
    println(input)
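One wrinkle with a triple quoted string is that the indentation of the continuation lines ends up inside the string. A minimal sketch, assuming that indentation is not wanted, uses stripMargin from the standard library (MultilineDemo is a made-up name):

```scala
object MultilineDemo {
  // Each continuation line starts with '|'. stripMargin drops the leading
  // whitespace up to and including the '|', so the source indentation
  // is not carried into the resulting string.
  val input: String =
    """Multiline
      |325-532-4521""".stripMargin
}
```

Printing MultilineDemo.input gives the two lines flush left, whatever the indentation in the source file.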

Methods as first class citizens

Missing function pointers after you moved away from C / C++? That’s available too .. in a typesafe manner.

    // declare a function value which takes a string as an input
    // and outputs a string
    var normalizeOp = (input: String) => input.reverse
    println(normalizeOp("abcd"))

    // compiler error : normalizeOp = (input: String) => 0;
    // thus the type of normalizeOp can no longer be changed

    // but it can be used to point to a different function instead
    normalizeOp = (input: String) => input.trim
    println(normalizeOp("  foo  bar  "))

Event

Now, scala doesn’t have a built in Event class with automatic hardwired publish subscribe capabilities. But one of the themes you will find in this post is that Scala lives up to its name, which really refers to it being a scalable language. So a whole set of language constructs can actually be built into the language by writing other code. That’s why sometimes it feels somewhat like a meta-language, a language to write your own language structures in. The construct below is rather simple and straight-forward, but we shall see creating your own language control structure like constructs later as well.

Here’s a simple event class

class Event {
  var l = List[() => Unit]()

  def +=(f: () => Unit): Unit = l = f :: l  
 
  def apply() = l map (_())
}

It declares a list of listeners l and allows listening functions to be registered against the event getting triggered (which in this case would be the += method). Finally since the event is triggered using the event() construct in C#, we here define the apply function, which will get triggered whenever e() is invoked, e being the instance of any event. Note: I’ve deliberately used single character names for the fields to allow you to focus on the other constructs – not a good coding practice.

class Button {
  val action = new Event();
 
  def performClick() = onClick()
 
  private def onClick() = {
    // this should call the apply() method
    action()
  }
}

object EventDriver {
  def performAction() {
    println("Work!")
  }
 
  def main(args : Array[String]) : Unit = {
    val button = new Button()
    // add a method as a listener
    button.action += performAction
    // trigger the event
    button.performClick()    
  }
}

So does one really need to have an Event baked into the language ? Apparently not really when it is a scalable language.

Lambda expressions

Indeed one of the strong points of this language. A simple lambda function is given below

    def isUniverseAnswer = (x: Int) => x == 42
    println(isUniverseAnswer(42))
    println(isUniverseAnswer(43))

There, it isn’t so hard. But what when you want to define a lambda on the fly ? Turns out that’s quite straightforward too.

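    // User was not defined in the original snippet; the field name "name" is assumed
    case class User(name: String, age: Int)
    val users = List[User](User("Five",5), User("Fifteen",15), User("TwentyFive", 25),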
                            User("Ten",10), User("Twenty", 20))
    // Note : the argument passed to the filter() method is the lambda, and the _ refers
    //           to each User object passed in
    println (users filter (_.age < 18))

And if we did want to explore closures, there’s good support there too.

    var counter = 0
    val action = () => {counter += 1; println("counter = " + counter)}
    action()
    action()

As Jack Sparrow would’ve said, “savvy?”

Extension methods

The exact details of how some of these capabilities are made to work are beyond the scope of this post, but scala allows for a lot of new extension methods on basic built in types through a variety of additional classes, and the same capabilities can be used by you to define any additional extension methods. As a teaser example, we’ll use the reverse method, which a plain java String does not have; scala supplies it through an implicit conversion to a wrapper type (StringOps in the standard library), which makes the method usable on strings.

println("Example".reverse)
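To give a flavour of defining your own extension method, here is a minimal sketch using an implicit class, which needs Scala 2.10 or later. StringExtensions, Shouter and shout are names I made up for illustration; they are not part of any library.

```scala
object StringExtensions {
  // Wraps a String and adds a shout method to it. Because the class is
  // implicit, the compiler applies the wrapper wherever shout is called
  // on a String and this object's members are in scope.
  implicit class Shouter(s: String) {
    def shout: String = s.toUpperCase + "!"
  }
}

object ExtensionDemo {
  import StringExtensions._

  // Reads as if String had always had a shout method.
  val loud: String = "example".shout
}
```

The import is what switches the extension on, so the new method stays confined to the code that asks for it.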

Return Multiple Values

That’s very intuitive and simple too.

object ReturnMultiple {
  // this method returns two values
  def addOneToAll(a: Int, b: Int) = (a + 1, b + 1)

  def main(args : Array[String]) : Unit = {
    val (c, d) = addOneToAll(3,4)
    println(c + "," + d)
  }
}

Null-coalescing operator

To the best of my knowledge scala doesn’t have this operator. The C# code looks as follows, and sets the result to the first non-null value amongst a, b, and c.

string result = a ?? b ?? c

Well, we’ll build that operator (I’m sure someone will suggest a better option than the one below)

class Nullable(val t: AnyRef) {
    def ??(b: AnyRef) = if (this.t != null) this.t else b
}

object NullCoalescing {
  implicit def objToNullable(a: AnyRef) : Nullable = { new Nullable(a) }

  def main(args : Array[String]) : Unit = {
    println("hello" ?? null ?? null)
    println((null: AnyRef) ?? "world" ?? null)
    println((null: AnyRef) ?? null ?? "foobar")
  }
}

If you see the three println statements, they show how the newly defined ?? operator can deliver the same semantics. Since null cannot help the type inferencing engine, in the latter two println statements – it has been explicitly cast to AnyRef.
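An alternative that avoids defining a new operator is to lean on Option from the standard library. This is only a sketch of that idea, not a replacement for the code above; NullCoalescingWithOption and firstNonNull are hypothetical names.

```scala
object NullCoalescingWithOption {
  // Option(x) is None when x is null and Some(x) otherwise, so orElse
  // moves on to the next candidate only when the previous one was null.
  // orNull converts the final result back to a plain reference, which is
  // null only if all three candidates were null.
  def firstNonNull(a: String, b: String, c: String): String =
    Option(a).orElse(Option(b)).orElse(Option(c)).orNull
}
```

In code that does not have to hand a raw reference back to Java, it is probably nicer to drop the orNull and keep working with the Option itself.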

Automatic Resource Management

Apparently C# has a built in capability of automatic resource management by using the keyword using. Alas Scala doesn’t. But building it is hardly much effort. So here goes :

import java.io.FileInputStream
import java.io.DataInputStream
import java.io.InputStreamReader
import java.io.BufferedReader
object AutomaticResourceManagement {
  // This is a structural type. Any type which has methods which match the two
  // desired signatures specified below can be used wherever this type is expected
  // ** no inheritance required **
  type ReadableAndCloseable = { def close(): Unit ; def readLine() : String}
 
  // And to be able to define our own control structure, the second parameter to
  // using takes a function block which takes a ReadableAndCloseable as input
  // and returns nothing.
  def using(resource: ReadableAndCloseable)(f: ReadableAndCloseable => Unit) {
    try {
        f(resource)
    } finally {
        println("Closing resource")
        resource.close()
    }
  }
 
  def main(args : Array[String]) : Unit = {
    val reader = new BufferedReader(new InputStreamReader(new DataInputStream(new FileInputStream("foo.txt"))))
    // Now you can be sure, the reader will get closed.
    using(reader){
      (resource) => println("First line is: " + resource.readLine())
    }
  }
}
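The structural type above ties using to things that can both close and readLine. A slightly more general sketch, assuming the resource implements java.lang.AutoCloseable (Java 7 or later), lets the block return a value as well. Loan, LoanDemo and firstLine are made-up names.

```scala
import java.io.{BufferedReader, StringReader}

object Loan {
  // R is any AutoCloseable resource and A is whatever the block returns.
  // The finally clause closes the resource whether f succeeds or throws.
  def using[R <: AutoCloseable, A](resource: R)(f: R => A): A =
    try f(resource)
    finally resource.close()
}

object LoanDemo {
  // A StringReader stands in for a file so that the sketch runs anywhere.
  def firstLine(text: String): String =
    Loan.using(new BufferedReader(new StringReader(text))) { reader =>
      reader.readLine()
    }
}
```

Scala 2.13 and later also ship scala.util.Using in the standard library, which covers the same ground.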

Summary

Well there’s more in Scala. Lots more. And it takes some patient effort to learn it. We’ve hardly started to talk about the functional programming capabilities. Or its parallel collections. Or for that matter its pattern matching. Or even its ability to deal with Generics with specified Co and Contravariance. We haven’t gone that far. But all the seemingly distant capabilities – that seemed to be miles away on a different planet called .NET and a country called C#, are actually just a step away – using Scala. As a java programmer, I don’t think you should look down at C# .. just look forward to Scala :)


22
Apr 11

The cloud just got stronger, even as AWS went down

So some parts of AWS EC2, specifically those related to EBS, were non responsive or down yesterday. This was of course also supposed to be judgement day after skynet became self aware. Links between facts and fiction aside, the number of sites that got impacted was quite large and at least some (not most) started wondering if the cloud indeed is the way to go.

I think the cloud just became stronger. Even as many services got impacted, some didn’t. As an example Netflix was able to continue with its services. See Slide 33 in a recently conducted presentation of its architecture. I am certain there were many others who were also able to ride the storm, since they had not just planned for redundancy of equipment but their architecture had accounted for disaster recovery (redundancy of locations). And many others who got impacted and put their customers through undeserved anguish still perhaps learnt a lot. A lot that they could’ve done differently.

What is interesting about “a lot that they could’ve done differently” is that cloud infrastructure as a service opens up a few more options to ensure high availability. Amazon AWS apparently has 5 regions and the US East region which got affected has at least four availability zones. It was one of those availability zones which got substantially impacted. Though in fairness the impact seems to have spread into other availability zones, something that shouldn’t ideally have happened. I am sure there were a lot of things AWS learnt and perhaps would do differently. But the same applies now to AWS customers as well. Some may choose to lose faith in the cloud. But many others might choose to realise that cloud infrastructures have the potential to offer so much more redundancy and options for high availability. It’s just that one needs to realise that cloud data centers are not infallible, and after being aware of all the redundancy options, one just needs to design the right way to leverage them.

And how does one leverage them ? AWS has multiple availability zones. An application should ideally leverage at least two. If you read the Netflix presentation I referred to, Netflix apparently uses three. Do not assume the servers will not go down. Assume it is possible that at least one availability zone could go down. Make sure you have the systems to quickly activate systems in the alternative availability zone. For that you will need to find ways to keep data current across availability zones. Also find ways to ensure you have the ability to quickly switch to and fro between availability zones. More advanced options could include concurrently active systems across availability zones or those spread across AWS regions or even between AWS and other vendors.
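To make the “assume a zone could go down” idea concrete, here is a very small sketch of client side failover. It is only an illustration under my own assumptions: ZoneFailover and callWithFailover are made-up names, the zone names are hypothetical, and call stands in for whatever request your application actually makes. It does not use any AWS API.

```scala
import scala.util.{Failure, Try}

object ZoneFailover {
  // Tries each zone in the given order and returns the first successful
  // result. The iterator is lazy, so later zones are contacted only when
  // the earlier ones have failed.
  def callWithFailover[A](zones: List[String])(call: String => A): Try[A] = {
    val attempts = zones.iterator.map(zone => Try(call(zone)))
    attempts
      .find(_.isSuccess)
      .getOrElse(Failure(new RuntimeException("all zones failed")))
  }
}
```

A real system would of course also need the data in the second zone to be current, which is the harder part of the advice above.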

Few if any of the learnings here are specific to AWS. They are indeed applicable in general to cloud based infrastructure as a service providers. But unless you are a part of a Fortune 500 or equivalent it is very unlikely that your internal infrastructure will offer as many options for redundancy as at least some of the larger infrastructure as a service providers could. Which is why I believe even if some may choose to switch back to the seeming comfort of private infrastructure, many currently using private infrastructure are likely to look at today’s events and realise that the public cloud actually offers so many better options for building highly available systems. The issues are not yet fully resolved. And many customers still perhaps are not being served. And while one hopes not, it could so happen that there could be further disruptions. But I find them really aspects of dealing with a learning curve as one transitions across class and scale of infrastructures. Ignoring the short term pain, and looking at it a little bit in the longer term, whichever way one really looks at it, I think the cloud just became stronger.