Month: November 2008

A salute and a tribute – Mumbai, November 26, 2008

Posted by – November 29, 2008

This post is a one-off exception to this blog’s usual focus on issues relevant to Software Engineering.

A salute and a tribute to the Victims of Terror and Defenders of Freedom – Mumbai 26 November 2008. While horribly tragic and irreconcilable, may your sacrifices not go in vain and inspire everyone to value and defend freedom with a greater resolve and commitment than ever before. Farewell !

http://en.wikipedia.org/wiki/26_November_2008_Mumbai_attacks#Casualties

Some information on LinkedIn and LinkedIn Apps (InApps)

Posted by – November 26, 2008

DJ Patil, Chief Scientist & Sr. Director, Product Analytics at LinkedIn, gave a short presentation during the session on OpenSocial at the Indicthreads.com Java Technology Conference today. Below are some of the noteworthy points he made regarding LinkedIn.

  • LinkedIn’s objective is to enable you to control your own brand
  • A strong focus on professional networking
  • Revenue model consists of Subscriptions, Advertising, Job Posting, Enterprise Recruiting and Surveys
  • Quote : “LinkedIn is a great way to find experts”
  • Hardware : 600 machines, cloud graph has 50+ servers
  • Software : Majority Java, also Ruby (did he mention polls ?), Python and R used for analytics, (did he mention PHP as well ?). C++ is used for caching primarily for the relationships
  • Internal Release Frequency : Once every week, every Thursday.

In the context of LinkedIn applications (InApps) he mentioned -

  • Privacy is a very big thing at LinkedIn. They do not want to play around with it. 32 million users (1.57 million from India). Average user in LinkedIn makes > USD 90,000 pa. About 100,000 Indian members are at CXO or similar levels.
  • LinkedIn looks towards a clear focus on enhancing the professional value for applications. All applications are professional, preapproved, no noise applications with a clear value model.
  • LinkedIn works with partners to craft each application.
  • Integrating InApps with groups likely to follow in due course.
  • 3 weeks into the launch, about 2.3% of users visit the canvas pages
  • Currently LinkedIn receives requests for integration with many applications. An application developer would need to convince LinkedIn about the professional nature and the value proposition before getting approval. I think he also mentioned that most requests for such integration do not pan out successfully.

Footnote : Was intending to blog about Day 2 at the Indicthreads Java Technology Conference, but realised much of today’s proceedings didn’t make for good blog content (mostly operational details about various software). Hence decided to write a little bit about one of the more interesting segments of the day’s proceedings.

Multicore for Project Managers and Junior Developers

Posted by – November 26, 2008

What is the challenge posed by increasingly multicore CPUs for developers ? Why is it important ? And most importantly, will it affect you or your development teams ? If you are a non/lesser-technical manager or a developer who isn’t too sure what the fuss is all about, read on.

Background :

Starting a few years ago, Intel (and other chip manufacturers) realised that they were pushing the limits of the laws of physics as they kept making their CPUs faster and faster. For decades Moore’s law (the number of transistors on an IC grows exponentially, roughly doubling every two years) went hand in hand with CPUs getting faster and faster, and it seemed like the progress would never end. Many assumed the law implied that chip speeds would keep doubling every two years. That was inaccurate, since the law only says that the number of transistors on the chip will increase. Soon we got a party pooper – the laws of physics. As the electron pathways became narrower and narrower, it was no longer feasible to keep increasing the clock speed without substantially increasing the power consumption and therefore the need for heat dissipation (my old 3.06 GHz notebook feels really hot when you put a palm near the fan vents). So this particular road had come to an end, and Intel, AMD and others decided that the road ahead was not about increasing the clock speed but about increasing the number of CPUs in the same space. Thus started the movement towards putting multiple CPU cores onto a single die. Now that 2 and 4 core CPUs are commonplace and people talk about upcoming CPUs with 80 or more cores, it’s time to reflect on what this will mean for software development : is it the end of the free lunch ?

The project management analogy :

To help describe and analyse the situation a little differently, I shall use a perfectly inconceivable analogy which might help a non technical person understand the situation a little more clearly. Let us assume you are a project manager who one fine morning was fortunate enough to hire a developer called Kal-El (referred to as Kal) from the Krypton school of Computer Science. Kal had one phenomenal talent – every year he would double his productivity (even though his salary got a little smaller with each passing year). Over a period of time you realised that even though Kal couldn’t keep pace with your dreams and wishes, he was turning out much more software. You started depending upon Kal’s incredible growth and started taking on bigger and/or more projects. Given Kal’s super efficiency, even as your management started getting sloppy, Kal more than compensated for it and your project deliveries kept on getting faster and faster.

In case it is not already obvious – Kal is the CPU, you the PM are actually the developer, and the projects you take on are the business functionality that the CPU is expected to service.

One fine morning Kal’s dad Jor-El knocked on your door and announced that Kal had a built in limitation that he was approaching, and that instead of doubling his productivity every year, he would start cloning himself once each year (even though the clones would collectively draw the same salary). Having been used to too much of the good life you immediately exclaimed – “But that’s preposterous – one person with twice the standard skill set is far superior to 2 persons with a standard skill set, and many years down the line one person with 64 times the standard skill set is far far far superior to 64 persons with a standard skill set”. Even as you said this you realised the reason for your disappointment and consternation – the collective Kal family was not going to be doing any less work than expected, but the responsibility of ensuring effective coordination across 64, 128 and 256 Kals now lay upon you the manager, and that, you realised, was a burden extremely onerous to imagine and even more so to carry. However productive the Kal family was, the weakest link in the productivity was now going to be you, the project manager. That in a nutshell is the multicore challenge, and that in a nutshell is the burden that some of your developers shall need to carry in the years to come.

Dimensions of the Multicore challenge

In the remainder of this post I shall talk about the various dimensions which influence the size of the multicore challenge and I shall perhaps switch between the real situation and the analogy I used above with the hope that the implications will still be adequately clear.

While a number of people have talked about the necessity for programmers to quickly upgrade themselves to meet the multicore challenge, the reality is that the size of the challenge will be influenced substantially by the nature of the programs, and that for a large fraction of the community the challenge will in fact be negligible. As an increased number of cores becomes a reality (the madness of ‘king cores), a number of diverse opinions are being expressed. These range from those who offer useful background material to help one gear up, to those who believe this to be a non-problem for a certain class of problems. The nature of the impact is, as expected, context specific. And it is important to understand your own context if you are to decide on an appropriate response.

Some of the dimensions of this context and its environment are explored below.

1. Number of concurrent, largely independent tasks :

Does your program attempt to work out all the possible chess moves from a given chess board to a depth of 3 moves further, or does it do a stock price look up for 10000 users connected to your application concurrently ? The former is a “single massively compute intensive task”, the latter is a “large number of concurrent simple tasks”. From a multicore perspective the former situation is much more problematic, since it is harder to divide one cohesive task across multiple cores than to distribute a large number of independent tasks across a large number of cores. Going back to our analogy, if Kal had been working on one single super complex project, the task of dividing up the activities across his multiple siblings would be very onerous, but if Kal was working on a large number of small projects, it would be very easy to simply distribute the projects across the various Kals and the coordination and management effort would be unlikely to increase much.

This by far is the single most determining factor of whether you need to come to terms with multicore. If you are developing simple web applications which perform a large number of short lived activities concurrently, the movement to multicore shouldn’t impact you much. On the other hand, even if you are working on web based applications which service a relatively small number of concurrent requests (say <40 for an 80 core scenario) but servicing each such request is a sufficiently complex process, you need to think about multicore.

If your program is of the kind which computes all the possible chess moves over the next 3 moves to find the best chess move to perform, you will now need to figure out a way to conduct your search across all the possible moves not sequentially but concurrently. Why ? In case you work out each possible move sequentially, you shall be using only one core of your CPU and all or most of the remaining cores shall stay idle, and you shall forthwith be indicted for the misdemeanour called CPU underutilisation. So as an example you could split the task into multiple concurrent “threads”, each evaluating all possible moves by one of your 16 pieces (assuming you still have all 16 on board), and allocate each thread to one core – thus you should be fine so long as you have a 16 core CPU, and if the cores grow further, you might need to further subdivide your tasks. Now you need to plan not just for the 16 threads to compute possible moves, but in addition for one or more coordinator threads which will keep track of the results of their computation and the board positions, and share them with the other threads as necessary without overwriting each other’s data. Note that while multi threading will be used in both situations (web and chess), in the case of the chess program you will need to spend far more time either designing or rethinking your design so that the program can function in a multi threaded environment. If you write chess kind of programs, be ready to hunker down, take on the multi core challenge and write multi threaded programs (in all likelihood, given the extreme example here, you have already been doing so for ages).
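The worker-plus-coordinator split described above can be sketched in a few lines of Python. This is only an illustration of the structure: the piece names, moves and scores are made up, and `best_move_for_piece` is a hypothetical stand-in for a real search. Note also that in CPython, threads will not speed up a compute bound search, so a real engine would likely use processes or another runtime; the shape of the code stays the same.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for "evaluate every move this piece can make".
# A real engine would search the board; here each piece simply has a few
# candidate moves with made-up scores.
CANDIDATE_MOVES = {
    "knight": [("Nf3", 0.3), ("Nc3", 0.2)],
    "pawn_e": [("e4", 0.5), ("e3", 0.1)],
    "pawn_d": [("d4", 0.4)],
}


def best_move_for_piece(piece):
    """Worker: return the best (move, score) pair for one piece."""
    return max(CANDIDATE_MOVES[piece], key=lambda move: move[1])


def best_move_overall(pieces, workers=4):
    """Coordinator: fan the pieces out to workers, then merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_piece = list(pool.map(best_move_for_piece, pieces))
    return max(per_piece, key=lambda move: move[1])


result = best_move_overall(list(CANDIDATE_MOVES))
print(result)
```

The workers never write to shared data; each returns its own result and only the coordinator combines them, which is one simple way of avoiding the overwriting problem mentioned above.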

2. Multi processing or Multi threading ?

Did I mention threads ? Well, as it turns out, not everyone likes threads. While Intel is pushing for more multi thread programming (blog post, book), many others consider multi threading to be bordering on the evil (The Problem with Threads, Why you should avoid threads with a passion, and many other similar opinions). My opinion is that multi threading is probably the most efficient mechanism for leveraging multiple cores, even though writing good thread safe code can be quite difficult at times. There is another option which, though not as efficient, can still leverage the various cores – multi processing, ie. instead of splitting your task across multiple threads in a single process, split it across multiple processes, each process running one thread. Either way you look at it, there does not seem to be any other easily available option on the horizon that will leverage multiple cores without using multi threading and/or multi processing. While multi processing is inherently easier, moving and synchronising data across processes is much less efficient than across threads. Multi threading leads to a lot of cooperative sharing which requires a high level of discipline, whereas multi processing can sometimes result in shared nothing architectures which, while wasteful, turn out to be quite robust. To restate : Multi threading – efficient but complex, Multi processing – less efficient, but safer.
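To make the two options a little more concrete, here is a small Python sketch in which the same independent tasks can be handed either to a pool of threads or to a pool of processes. The task (`sum_of_squares`) and the helper `run_all` are invented for illustration; only the executor classes come from the standard library.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(limit):
    """An independent, shared nothing task: it touches no global state."""
    return sum(n * n for n in range(limit))


def run_all(executor_cls, limits, workers=4):
    """Run the same tasks on whichever kind of pool the caller picks."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(sum_of_squares, limits))


# Threads: cheap to start and share memory with the caller.
thread_results = run_all(ThreadPoolExecutor, [10, 100, 1000])
print(thread_results)
```

Passing `concurrent.futures.ProcessPoolExecutor` instead of `ThreadPoolExecutor` runs the same tasks in separate processes. In that case arguments and results are pickled and copied between processes, which is the extra data movement cost mentioned above, and on platforms that spawn new interpreters the call should sit under an `if __name__ == "__main__":` guard.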

3. Interdependence across multiple concurrent tasks or on shared resources :

Now maybe you have a large number of concurrent tasks which can be easily multi threaded or multi processed. But you still may not be in the clear yet. There might be some interdependencies, or such tasks might be dependent upon a shared resource which can be accessed only serially (ie. one at a time). You might be serving 10,000 stock quotes, but if you are receiving these stock quotes from an upstream system which will give them to you only so long as you make one request at a time (each batching all the scrips you want the quotes for at that point in time), many of your tasks will wait for the window to make such a request, and in the time that they wait for such a resource the corresponding cores shall remain unutilised. In real life this typically relates to blocking I/O, eg. database or network calls, or shared data which may need locking and synchronisation (eg. a counter indicating the total requests being serviced at the moment). In our analogy it would be a situation where a large number of Kals need to meet frequently but are constrained by having only one conference room. So even if you don’t need to worry about multi threading your code, you still need to worry about synchronisation/locking of your resources. And since resource contention can worsen disproportionately with the number of contending participants, you have to imagine that a program you write today may face 10 times as many concurrent requests three years down the road, and any resource contention in your program will lead to substantially reduced performance and/or a much higher number of aborted transactions due to resource unavailability or deadlocks.
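The request counter mentioned above is perhaps the smallest example of such a shared resource. A minimal Python sketch, with the class and function names being my own invention, looks like this:

```python
import threading


class SharedCounter:
    """A counter that many threads may update, one at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # Only one thread can hold the lock, so updates are serialised.
        # This is the "single conference room" of the analogy.
        with self._lock:
            self.value += 1


def hammer(counter, times):
    """Simulate one worker servicing `times` requests."""
    for _ in range(times):
        counter.increment()


counter = SharedCounter()
threads = [
    threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(counter.value)
```

The lock keeps the count correct, but every increment now queues behind every other one. The more threads (or cores) you add, the more of them spend their time waiting at that one lock.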

4. Language and / or Operating platform :

Some environments lend themselves to easier multi threading / processing and some make it tough. Some may not support multi threading at all. So this will constrain some of your choices and the decisions you make. While Java, C and C++ all support multi threading, it is much easier to build multi threaded programs in Java than in C or C++. While Python supports multi threading, in CPython the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time, so adding more than a handful of threads to a compute bound process brings little further improvement. Almost all languages will work with multi processing scenarios. However multi processing typically requires all shared resources to be externalised from the process, leading to resource contention becoming likely at lower concurrency levels (eg. a counter of concurrent requests currently in process can be accessed and released very rapidly in a single process multi threaded scenario, but becomes much more expensive when working with multiple processes, since it needs to be accessed either using inter process communication or through some other kind of shared resource such as a flat file or a cell in a database table). In some cases you would’ve had to deal with such problems anyway, but in others they got introduced or aggravated only because you decided to increase the number of concurrent threads/processes of execution in order to leverage multi core.

5. Horizontally scaled architectures :

Do you have a horizontally scaled shared nothing architecture which spreads the tasks across a cluster of machines ? If yes, you could celebrate a bit, since you’ve already taken on most of the challenges that multicore would’ve introduced. You already work with shared resources across a network and already work with multi processing at the minimum. It is very unlikely that you will have much work to do unless your design/architecture somehow constrains you to run only one process per machine. There is a small probability that increasing the number of processes or threads may somehow exacerbate some resource contentions quite severely, but if you’ve been able to build a scalable system so far, there’s a good likelihood you’ll be able to work through some of these issues without many hiccups.

So if you are a Facebook or Google developer working on massively horizontally scaled applications, chances are that you will not notice the multicore challenge and will in fact leverage the multicore capabilities in your stride. If on the other hand you are writing a desktop based technical charting or statistical analysis program with some demanding performance requirements, chances are that you will need to beef up your multi threading / multi processing skills in order to leverage the multiple cores that will certainly find their way somewhere on or around your desk.

Day 1 : IndicThreads.com Java Technology conference

Posted by – November 25, 2008

Here are my notes from Day 1 of the IndicThreads conference on Java Technology. The first session focused on the state of the IT industry in India, the second on some of the important developments in the IT landscape, and the remainder started to get into specific technology issues.

Session 1 : Inaugural Address : Ganesh Natarajan, Chairman NASSCOM

The day started off with Mr. Ganesh Natarajan, Chairman NASSCOM, delivering the inaugural address. With regard to the current state of the Indian IT-BPO industry, some of the points he noted were (the words are mine)

  • Fantastic track record for last 4 years – however now under cloud of uncertainty
  • 41 billion USD exports (contrasted to 2 billion for China)
  • 62% of BPO in India is not voice (call centers) as per popular perception but is in other areas such as transaction processing.
  • The Indian industry now offers a comprehensive set of services covering the entire continuum of the requirements from IT – BPO
  • Cost pressures forcing Indian IT to start focusing on Tier III cities. 43 tier III cities identified for growth in IT-BPO industries.
  • Training and skills is a huge issue. While education in Tier I colleges is very good, education in Tier III colleges still leaves a lot to be desired. While 200,000 new jobs are expected to be added this year, and 6 million over the next 10 years, India will still have a manpower surplus in 2020. The important challenge facing the industry is training and skills development.

Session 2 : Keynote Address : Anand Deshpande, CoFounder and MD Persistent Systems

The next session was by Mr. Anand Deshpande of Persistent Systems. He focused on the contextual changes within which the software development community shall be expected to operate. The 10 contextual changes he elaborated on are :

  1. Multicore
  2. Mobile Telephony
  3. Cloud and SaaS
  4. Web 2.0 and Social Networking
  5. Rich Internet Applications
  6. Large Volumes of Diverse Data (including BI and analytics)
  7. Open Source
  8. Gaming and Entertainment boom
  9. Green IT
  10. Community Software Development (contribute back to the community please!)

One point he made which was not as optimistic as Mr. Natarajan’s opinion was that there could be significant job cuts in the IT industry in India over the next 6-9 months. The rationale was that if customers simply told the vendors to cut 20-25% of the costs, vendors would have no choice but to make some job cuts.

Session 3 : The Future of the Web: HTML 5, WebSocket, Java and Comet : Sidda Eraiah, Director of Management Services, Kaazing

The session focused on HTML 5 and the WebSockets capability. Sidda discussed how WebSockets rides on HTTP/HTTPS for the initial handshake but then upgrades itself into a full duplex (bidirectional) channel between the browser and the server, supporting server initiated communication as well (once the channel is established). While he mentioned that HTML 5 won’t be ready till 2022 (see related link here), browsers were already starting to support WebSockets. Opera currently supports it and Firefox has a patch which might get into the mainstream sometime soon. He demoed an application with a server side infrastructure using the open source Kaazing server and JMS which eventually fed stock quotes to the browser. WebSocket lends itself to a small number of applications, but where it is relevant it can substantially reduce client update intervals and network bandwidth requirements.

Session 4 : Getting Events and Web2.0 into SOA based Solutions : Ramesh Loganathan, MD, Progress Software India

Ramesh talked a lot about how getting an event based design was going to be important in SOA based deployments. Many points he made were rather pertinent about the overall SOA landscape. Some of these were :

  • SOA infrastructure has become unduly complex. He later on mentioned that in most cases where it is used today probably a simple workflow engine would suffice.
  • REST is a good 80/20 example of SOA – the most important functionality at minimal cost
  • SOA is past the hype curve and into serious adoptions
  • In many cases SOA is being used on the boundary for integration purposes. As an example he mentioned it being used only for batch file transfer (breaking a batch into individual transactions) at one large financial institution. SOA needs to move beyond integration and into a bit of a paradigm shift
  • While early SOA deployments were on the border, we need to now start designing applications using SOA ground up to leverage SOA capabilities.
  • Various mediation elements in the SOA landscape (transformation, security, audit etc.) were now going to be an important challenge area.
  • Three important things to think about going forward for SOA architectures – Event Handling, Virtualisation and Web 2.0
  • In the context of Virtualisation and Cloud based computing (where the SOA elements could be massively distributed), performance would be key and SOA implementations would need to be fast, reliable, scalable and secure
  • Again in the context of Virtualisation and Cloud based computing, these would not really impact the SOA implementations themselves since not much really changed except that one would need to account for the fact that these services could be running on a much wider (global) scale network.
  • In the context of events, we should build events into the SOA processes. While events are raised by services or processes, services and processes should respond to these events. These events would eventually be consumed by BAM (Business Activity Monitoring) dashboards. These would take SOA into a scenario where rather than it acting in the classic request response model, the events would trigger the requests which other services or humans (in case of BAM dashboards) would respond to. Thus we need to have events onto the ESB.
  • In this context one needs a good event engine with good pattern recognition capability to work on the events. Thus services could raise events, which the event engine could analyse and in turn feed new events back into other services. As an example he mentioned a case where events from a weather monitoring system were analysed in turn leading to events / requests being triggered into an airline scheduling system.
  • In the context of Web 2.0, he mentioned that we need to put users in the middle. I am not sure if I understood it correctly but I guess what he was opining was that we need to keep the user in mind much more when designing SOA based system sets, in a manner which would help increase the capabilities and choices at the disposal of the user.
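The event engine idea from the list above (services raise events, a rule spots a pattern, and new events are fed to other services) can be sketched in Python as below. Everything here is hypothetical and not taken from the talk: the `EventBus` class, the event names, the 80 km/h threshold and the airport codes are only for illustration, and a real ESB or event engine would be far richer.

```python
from collections import defaultdict


class EventBus:
    """A toy in-process stand-in for events flowing over an ESB."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Copy the list so handlers may subscribe others while we iterate.
        for handler in list(self._handlers[event_type]):
            handler(payload)


bus = EventBus()
delayed_airports = []


def storm_rule(reading):
    """Pattern rule: a high wind reading raises a new scheduling event."""
    if reading["wind_kmph"] >= 80:
        bus.publish("flight.delay_requested", {"airport": reading["airport"]})


def airline_scheduler(request):
    """Downstream service: reacts to the derived event, not to a request."""
    delayed_airports.append(request["airport"])


bus.subscribe("weather.reading", storm_rule)
bus.subscribe("flight.delay_requested", airline_scheduler)

bus.publish("weather.reading", {"airport": "BOM", "wind_kmph": 95})
bus.publish("weather.reading", {"airport": "PNQ", "wind_kmph": 20})
print(delayed_airports)
```

Neither the weather source nor the scheduler knows about the other; the rule in the middle is the only place where the two are connected, which is roughly the role the event engine plays in the description above.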

Session 5 : Spring 2.5: Enhanced productivity and production power : Nik Jones, Consultant, SpringSource Australasia

This session was about various Spring capabilities with a special focus on some of the newer things added in Spring 2.5. Since much of it was a description of Spring capabilities which is probably adequately covered by the Spring documentation, I will only note that I was particularly impressed by the Spring OSGi capabilities in Spring 2.5 and the fact that one can build Spring based POJOs and have them hosted in a J2EE server (Tomcat), with an EJB container, or in an OSGi server provided by SpringSource (which would actually host one or more web tier application servers such as Tomcat).

Session 6 : Mock Objects in Action : Paulo Caroli, Agile Coach, Thoughtworks and Snehal Jha, Thoughtworks

The session focused on three mock object testing frameworks – jMock, easymock and mockit and demonstrated an otherwise identical set of test cases implemented using each of the three frameworks. A very nice attempt I thought at bringing out the nuanced differences across the mock testing frameworks. Interestingly Paulo refused to take the bait on what his favourite framework was, but instead referred the audience to how things were different in some of the code snippets being displayed and to a comparison on Mockit home page. The final word was – it depends – just try them out and figure out for yourself what works best for you. However I suspect it could have had a slightly better impact if the audience was prepared with a better understanding of junit (some of the questions and diversions were about junit and unit testing, and were not about mock testing), and if perhaps some more time had been spent upfront on what exactly a mock testing usecase entails in terms of writing code.
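For readers who have not written a mock based test before, here is roughly what one entails. The session used Java frameworks; the sketch below uses Python’s standard `unittest.mock` instead, purely to keep the example short, and the `OrderService` class and its payment gateway are hypothetical.

```python
from unittest.mock import Mock


class OrderService:
    """Code under test: depends on a payment gateway we do not want to call."""

    def __init__(self, gateway):
        self._gateway = gateway

    def place(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        return self._gateway.charge(amount)


# Arrange: replace the real gateway with a mock and script its reply.
gateway = Mock()
gateway.charge.return_value = "receipt-1"
service = OrderService(gateway)

# Act: exercise the code under test.
receipt = service.place(250)

# Assert: check the interaction with the collaborator, not just the result.
gateway.charge.assert_called_once_with(250)
print(receipt)
```

The frameworks differ mainly in how the arrange and assert steps are expressed, for example whether expectations are declared before the call or verified after it.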

IndicThreads.com Java conference : LiveBlogging on Twitter and Summarising here

Posted by – November 24, 2008

Shall be live blogging from the IndicThreads.com Java Conference on twitter, and shall be posting a daily summary here. After much consideration, decided twitter was a much better option for liveblogging (wordpress just doesn’t seem like the right tool to liveblog :( ). I shall be twittering on dnene_liveblog (That’s not my usual twitter account, which is : dnene). Hope to be able to post a summary of the day each evening on this blog as well. Here’s the conference schedule to whet your interest.

Poll : Usage of web services (SOAP / REST / HTTP)

Posted by – November 19, 2008

While the web service meme and its implementations have been out there for many years, statistics about how many developers publish or consume them are a little hard to come by. Many statistics talk about the % of organisations which have adopted or intend to adopt SOA / Web Services etc., but these are often less than sufficiently useful IMHO since they do not indicate how widely these get used.

This poll attempts to understand how many developers “actually” either publish or consume web services in the current projects that they are involved with, and if so, what is the nature of the web service APIs (SOAP / REST / HTTP-POX).

Please note that the poll is in the top right widget of this blog. Multiple selections are acceptable. If this is an answer you seek as well, please invite other developers you know to participate through email and blog posts. Just indicate to them that the poll can be found in the top right widget at http://blog.dhananjaynene.com. While you can follow the current results online as entries are added, the poll will close on November 30th when the final results will be published.

Update :

Due to apparent lack of interest (no votes for the last 3 days), I am closing this poll early. The final summary of the votes is as follows :

  • Publishing with REST (57%, 12 Votes)
  • Consuming / Using REST services (57%, 12 Votes)
  • Consuming / Using SOAP services (29%, 6 Votes)
  • Publishing with SOAP (24%, 5 Votes)
  • Publishing non SOAP / non REST HTTP services (24%, 5 Votes)
  • Consuming non SOAP / non REST HTTP services (14%, 3 Votes)
  • No web services being published or consumed (5%, 1 Votes)

Total Voters: 21
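Since multiple selections were allowed, each percentage above is the option’s votes divided by the 21 voters, not by the total number of selections, which is why the figures add up to well over 100%. A quick Python check of that arithmetic (the dictionary keys are shortened labels of my own):

```python
TOTAL_VOTERS = 21

votes = {
    "publish REST": 12,
    "consume REST": 12,
    "consume SOAP": 6,
    "publish SOAP": 5,
    "publish other HTTP": 5,
    "consume other HTTP": 3,
    "no web services": 1,
}

# Each option is measured against the number of voters, so the
# percentages are independent of one another.
percentages = {
    option: round(100 * count / TOTAL_VOTERS) for option, count in votes.items()
}
print(percentages)
```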

Fomenting unREST : Is RESTfulness a semantics game ? Why does REST require statelessness ?

Posted by – November 13, 2008

Background :

Roy Fielding recently wrote : REST APIs must be hypertext-driven. It is an excellent writeup which actually focuses on What REST is NOT and is written in the context of SocialSite Web APIs which are an implementation of the OpenSocial Restful Protocol. If you are interested in understanding what REST is in any substantial way I highly recommend Fielding’s post, and if you can spare some time do read up the comments and also Section 5 of his dissertation or perhaps the whole dissertation.

The essence of the post is found in the statement :

What needs to be done to make the REST architectural style clear on the notion that hypertext is a constraint? In other words, if the engine of application state (and hence the API) is not being driven by hypertext, then it cannot be RESTful and cannot be a REST API. Period. Is there some broken manual somewhere that needs to be fixed?

I agree with it entirely, and that is not what makes me uncomfortable. It’s just that the rules for figuring out what is compliant with REST are a little porous, and I find the whole requirement of statelessness a little orthogonal to the driving principles of REST design.

Do semantics decide what is RESTful ?

Roy essentially states he has no difficulties with people following non REST approaches, but suggests that if something is called or classified as RESTful, then it must adhere to what he has described as the characteristics of REST architecture, and if it doesn’t, that’s all right, just don’t call it RESTful. I think this is a perfectly fair stance. But it does raise a question : what do I call an architecture style that is strongly inspired by REST, fairly close to REST, but does not actually meet all the expectations that Roy laid out ? But an even more important question – apparently there is some amount of semantic jugglery that one can conduct to make an architecture seem RESTful. Does it in such a situation become RESTful ?

I think REST as an architecture style is great, it really simplifies things in many ways, but (oh! the horror of it) I cannot seem to agree with it entirely. However that is my individual perception and opinion and not the topic of this post. The topic is the fact that I feel I am modeling something that does not meet the REST requirements in spirit and yet can argue my way through it in letter. If any of you gets the same feeling or has figured out a way out of it, do post a comment below.

Setting the context. Quotes from Roy.

A few quotes from Roy before we get into the details. As per the post, clearly REST wasn’t intended to be an RPC substitute.

I am getting frustrated by the number of people calling any HTTP-based interface a REST API. Today’s example is the SocialSite REST API. That is RPC. It screams RPC. There is so much coupling on display that it should be given an X rating.

and from section 5.1.3 of the thesis on the matter of statelessness,

We next add a constraint to the client-server interaction: communication must be stateless in nature, as in the client-stateless-server (CSS) style of Section 3.4.3 (Figure 5-3), such that each request from client to server must contain all of the information necessary to understand the request, and cannot take advantage of any stored context on the server. Session state is therefore kept entirely on the client.

and from section 5.2.1.1 of the thesis, with regard to what a resource is :

The key abstraction of information in REST is a resource. Any information that can be named can be a resource: a document or image, a temporal service (e.g. “today’s weather in Los Angeles”), a collection of other resources, a non-virtual object (e.g. a person), and so on. In other words, any concept that might be the target of an author’s hypertext reference must fit within the definition of a resource. A resource is a conceptual mapping to a set of entities, not the entity that corresponds to the mapping at any particular point in time.

A sample application and REST like API :

Let us take a simple example: an ATM. A simple use case: the user inserts a bank ATM card, the machine prompts for the PIN, the user enters the PIN, the machine shows a menu, the user selects “make card payment”, the machine asks for a card number and an amount, the user enters the card number (destination account) and the amount, and the machine transfers the funds. Simple enough ?

Now let’s build this using a RESTful (or not so RESTful) API.
Note: Purely for writing and reading simplicity I shall represent all HTTP requests using GET semantics.

1. User enters card : Client level activity. No API required here.
2. Machine prompts for pin : Client level activity. No API required here.
3. User enters pin (and presses an OK / Submit button) :
Here’s where we need to send the following data to the server: card number and pin. As per REST, we should always access a resource on a server. So here’s how I am going to design the API for that


http://atm.bank.com/session?cardnumber=123456&pin=1234

This will initialise a new session, and the response shall return a representation of that session. The response XML shall be :
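
A session response of this kind can be sketched in Python. This is only an illustration under assumptions: the element names (`session`, `link`), the `rel` attribute, and the in-memory `SESSIONS` store are hypothetical and not taken from the original API; only the host name and the identifier 999 come from the example URIs in this post.

```python
import itertools
import xml.etree.ElementTree as ET

BASE = "http://atm.bank.com"          # host taken from the example URIs
_session_ids = itertools.count(999)   # 999 matches the id used in the later steps
SESSIONS = {}                         # hypothetical server-side session store


def create_session(cardnumber: str, pin: str) -> str:
    """Create a session resource and return a hypertext (XML) representation."""
    session_id = str(next(_session_ids))
    # PIN verification is omitted here; a real service would check it first.
    SESSIONS[session_id] = {"cardnumber": cardnumber}

    root = ET.Element("session", id=session_id)
    # The link tells the client which URI to invoke next (the menu in step 4).
    ET.SubElement(root, "link", rel="menu", href=f"{BASE}/menu/{session_id}")
    return ET.tostring(root, encoding="unicode")


print(create_session("123456", "1234"))
```

The point of the link element is that the client does not construct the menu URI itself; it follows what the response hands back.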


4. Client requests menu :
The response was hypertext and had the URI for the menu which the client shall now invoke. Note that in the example below, 999 is the session identifier, and it would be possible to pass 999 as a parameter rather than as a segment in the URI itself (take your pick).


http://atm.bank.com/menu/999

Here’s a sample menu that might be expected
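
One way such a menu could be produced is sketched below in Python. The element names and the first two menu entries are assumptions made purely for illustration; only the card payment entry and its URI shape appear in this post.

```python
import xml.etree.ElementTree as ET

BASE = "http://atm.bank.com"

# Hypothetical menu entries as (label, path segment).
# Only "cardpayment" is taken from the post; the others are placeholders.
MENU_ITEMS = [
    ("Balance enquiry", "balance"),
    ("Cash withdrawal", "withdrawal"),
    ("Card payment", "cardpayment"),
]


def render_menu(session_id: str) -> str:
    """Return the menu as XML, one hyperlink per available action."""
    root = ET.Element("menu")
    for label, segment in MENU_ITEMS:
        item = ET.SubElement(root, "item", href=f"{BASE}/{segment}/{session_id}")
        item.text = label
    return ET.tostring(root, encoding="unicode")


print(render_menu("999"))
```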


  
  
  

5. The user selects card payment.


http://atm.bank.com/cardpayment/999

The system detects that the required fields have not been supplied, and responds with the information that the client needs to supply
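
The check for missing fields can be sketched as follows. This is a minimal Python sketch, not the actual service: the field names come from the example URI further down, while the `required`, `field` and `accepted` element names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs

# Field names taken from the example card payment URI in this post.
REQUIRED_FIELDS = ("destinationaccount", "amount")


def missing_fields(uri: str) -> list:
    """Return the required query parameters that the URI does not supply."""
    query = parse_qs(urlparse(uri).query)
    return [name for name in REQUIRED_FIELDS if name not in query]


def cardpayment_response(uri: str) -> str:
    """Describe what is still needed, or acknowledge a complete request."""
    missing = missing_fields(uri)
    if missing:
        root = ET.Element("required")
        for name in missing:
            ET.SubElement(root, "field", name=name)
    else:
        root = ET.Element("accepted")
    return ET.tostring(root, encoding="unicode")


print(cardpayment_response("http://atm.bank.com/cardpayment/999"))
```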


  
  

So the user now enters the two fields and submits the data


http://atm.bank.com/cardpayment/999?destinationaccount=98765&amount=111.22

The system verifies that all is OK (ACL, valid destination account, adequate balance exists etc.) and performs the transfer successfully


Analysis :

Either you found the above API all right, or you have one or more of the following issues :
1. The session id is a state-tracking identifier. REST is stateless. Clearly this is not RESTful.
2. The card payment was clearly a call to a remote procedure, which could otherwise be represented as : transferMoney(sessionId, toAccount, amount)

One of the early things I learnt was that, when designing transaction processing systems, many user requests result in activities that need to be triggered on the server side (as in the transfer of funds to a credit card above). Intuitively it is a procedure, and because one is in the client-server world it is a remote procedure. So how can it be RESTful ? Again, from what I have learnt, it’s easy: just convert the verb into a noun, and the remote procedure is now represented as a resource. How so ? Simple: instead of a transferMoney procedure, I now have a cardpayment resource which I activate. What else changed ? The return format of the procedure (oops, resource) is now a hypertext document which contains the necessary state information about the transfer and hyperlinks to the other associated resources. Voila ! We are RESTful. But is it still RPC ? The essence of RPC has not gone away.
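
The verb-to-noun conversion can be made concrete with a small Python sketch. Everything here is assumed for illustration: the balances, the source account number, and the shape of the returned document are hypothetical. It is only meant to show that the "resource" is a thin wrapper around the same procedure.

```python
from decimal import Decimal

# Hypothetical data: two accounts and one open session.
BALANCES = {"11111": Decimal("500.00"), "98765": Decimal("0.00")}
SESSIONS = {"999": {"account": "11111"}}


def transfer_money(session_id, to_account, amount):
    """RPC form: transferMoney(sessionId, toAccount, amount)."""
    source = SESSIONS[session_id]["account"]
    if BALANCES[source] < amount:
        raise ValueError("inadequate balance")
    BALANCES[source] -= amount
    BALANCES[to_account] += amount
    return source


def get_cardpayment(session_id, destinationaccount, amount):
    """'Resource' form: the same work, but the result is a document with links."""
    source = transfer_money(session_id, destinationaccount, Decimal(amount))
    return {
        "cardpayment": {"from": source, "to": destinationaccount, "amount": amount},
        "links": {"menu": f"http://atm.bank.com/menu/{session_id}"},
    }
```

Calling `get_cardpayment("999", "98765", "111.22")` moves the money exactly as `transfer_money` would; only the name and the return format differ.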

But isn’t this a stateful architecture ? By all means, yes. Most certainly it is. But here’s how I am going to argue my way out of it: I actually created a resource called session, and subsequently was simply using it (I could use it by referring to it as a segment in the URI as I did above, or by passing in the id as a parameter, or by passing in an addressable URI for the session itself as a parameter, e.g. http://atm.mybank.com/session/999). This session has associated with it, on the server side, the card number and account number, which I can use readily. Stateful ? Yes. But it is still consistent with the overall resource-oriented approach of REST, where the session is but one instance of that generic thing called a resource. That’s semantic skullduggery, you might argue. Quite right. But I couldn’t find a way to blow a hole in that argument.

However, there is a small hitch in the above argument. The REST dissertation explicitly excludes the session from being treated as a resource (see the last line of Roy’s quote on statelessness above). I still haven’t quite gotten over the asymmetry of treatment between session data and other resources. I cannot quite understand why stateless = RESTful and stateful = RESTless. It is not as if one ends up specifying all the resource data in all cases. For all the associated objects (e.g. the Account object corresponding to the ATM card number in the case above), one simply passes in the resource URI / identifier, and the server side then goes and fetches whatever additional data is required to complete the transaction. So if one passes an invalid card number, the service’s validations will reject the transaction. So why can one not treat the session as a resource and have it treated symmetrically with all the other resources ? Why single it out ?
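
The two request shapes being compared can be sketched side by side in Python. The data and function names below are hypothetical; the sketch only illustrates that in both cases the server dereferences an identifier, and that the difference lies in what the request must carry.

```python
# Hypothetical card records, keyed by card number.
CARDS = {"123456": {"pin": "1234", "account": "11111"}}
SESSIONS = {}


def open_session(cardnumber, pin):
    """Authenticate once and store the card number on the server."""
    card = CARDS.get(cardnumber)
    if card is None or card["pin"] != pin:
        raise PermissionError("invalid card or pin")
    session_id = str(999 + len(SESSIONS))
    SESSIONS[session_id] = {"cardnumber": cardnumber}
    return session_id


def source_account_stateless(request):
    """Stateless: every request carries the card number and pin."""
    card = CARDS.get(request["cardnumber"])
    if card is None or card["pin"] != request["pin"]:
        raise PermissionError("invalid card or pin")
    return card["account"]


def source_account_stateful(request):
    """Stateful: the request names a session, which the server looks up."""
    session = SESSIONS[request["session"]]
    return CARDS[session["cardnumber"]]["account"]
```

Both functions resolve to the same account; the stateless one needs the credentials on every call, while the stateful one needs only the session identifier after `open_session` has run.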

The need for statefulness :

Here’s one reason why I would really like to have statefulness – security. I do not want to keep on transferring again and again with each API call the ATM card number and the PIN, since these do not change frequently (if ever). I would like to transfer the same as few times as possible (notwithstanding any additional encryption or obfuscation I build on top of them). There have been cases where sniffing data over the wire has been suspected, and I would like to minimise the likelihood of a breach by minimising the number of times critical data is shipped over the wire.

Here’s another reason why I like statefulness: efficiency. In the above example, the (source) account number was never mentioned or passed, yet I am well aware that it is required for every single non-trivial activity. So why not just load it when the session is initialised, stuff it into the session, and not worry about having to retrieve it from the database on each call ?

Clearly it is very easy to overuse and abuse statefulness, and I try to keep state data to the absolute minimum, but complete statelessness is something I have yet to feel comfortable with.

Finally, and this is an argument more in principle than in practice, the need for statelessness is context specific. If I have a hefty server and expect only a handful of active sessions at any point in time, it is possible to argue that a stateless architecture may simply not be required in that particular context. But to then conclude that the otherwise RESTful architecture is no longer so makes me restless.

Final words :

In this particular post I am going to assume that those with more knowledge of REST and RESTfulness than the limited amount I carry are going to blow holes in my semantic jugglery, and maybe prove that no amount of semantic jugglery will make the architecture RESTful. That brings back the first question I asked: if an architectural style is largely inspired by REST but makes a few deviations, what should one call it ? Maybe RESTlike or RESTspired, but you might come up with a better word. I am certain I am going to be far more concerned about what’s required in the given context, and what is most appropriate for the users and the sponsors, than about whether the APIs are pure REST APIs. How about you ?