This was a presentation I recently conducted to an audience of programmers / architects primarily in the financial services domain.
web
13
Jul 10
Presentation : Recent trends in technology
9
Jun 09
REST is the DBMS of the Internet
After my fortunately rather successful post “Why REST ?“, I had planned to write another longish followup roughly titled “Implications of REST on software design and frameworks”. However I had an interesting exchange of Twitter DMs (direct messages) after the post which gave me the right words I was looking for to summarise this impact on software design. This simple example was so compelling, that I decided to make that into an independent post and delay the “Implications ..” post by another couple of days. So at the risk of giving away the very essence of my subsequent post, here’s the summary.
The Twitter DM’s exchanged were as follows :
@sbidwai : is REST like object oriented implementation for services, where as SOA is procedural ? thinking loud..
@dnene : a slightly better analogy would be SOA is like invoking stored procedures, whereas REST is like invoking SQL on the table
@sbidwai : agreed in parts.. but most impl will hv much more than CRUD.. eg, twitter rest apis..*
@dnene : CRUD is the interface. To extend the analogy, logic is implemented as triggers not SPs. (My Opinion)
* CRUD in our shared vocabulary stands for Create, Read Update, Delete.
As I subconsciously wrote the last DM, it suddenly dawned on me that this was the one concise way to express how REST architectures would impact software designs.
To summarise the exchange differently
“If WS-* is the RPC of the Internet, REST is the DBMS of the internet”
To expand on it a bit more :
Traditional SOA based integration visualises different software artifacts being able to interact with each other through procedures or methods. REST effectively allows each software artifact to behave as a set of tables, and these artifacts talk to each other using SELECT, INSERT, UPDATE and DELETE. (or if you wish GET, PUT, POST, DELETE). And where exactly is is the business logic ? Is it in the stored procedures ? Not Quite. Its in the triggers.
8
Jun 09
Why REST ?
Introduction
It is becoming apparent that even as it becomes popular, REST (Representational State Transfer) is not yet as well understood. This might seem a surprising statement, but a lot of us use REST thanks to many frameworks supporting REST like interfaces, have a sense of what REST like interfaces are like (even if such an understanding is not sufficiently accurate), and exercise our common sense in using such interfaces. Having said that, let me clarify that while the internet is full of documentation about the semantics of REST, its actually quite light on the rationale for REST (including Roy Fielding’s dissertation which is the reference document for REST). Thus I have had to intersperse REST semantics and historical narrative with some personal opinions. So treat this as a part opinion and feel free to question my thought where you think it does not make sense.
Audience
If you are a REST expert, you are likely to have figured out much of this any ways by now. If you would like to understand specific technical semantics about REST, again this may not be the best article to read. However if you are curious about REST and would like to read a perspective on why and how it makes sense read on.
Flow
I shall be meandering through a historical narrative in the first half before starting to make the points I wish to make in the second. Lot of the points I make in the first half are likely to be those you already are aware of. However these are being made to allow an immediate recall when you read the second half. It is quite important to have read the first half to understand the perspective I put together in the second.
Historical Narrative
FTP
I am sure you have used FTP (File Transfer Protocol) a few times even though nowhere as frequently as HTTP (HyperText Transfer Protocol). Let me quickly present some characteristics of FTP
- Given a FTP client, you can connect to any FTP server so long as you have a valid userid/password pair for the server (or anonymously if the server so supports).
- The home directory on connecting to a FTP server is typically your starting point. At this point typically you can execute the ‘ls’ or ‘List Directory’ command to list all the files and directories within the home directory.
- If a file interests you, you can get it by issuing a get command
- If a subdirectory interests you, you can further navigate into that directory by issuing a ‘cd’or ‘Change Directory’ (or often by double clicking on the directory in case of a graphical client).
- If you would like to add a file to the current directory you can issue a ‘put’ or ‘Upload’ command.
- While you have the flexibility to navigate from one directory to another, you soon realise that every file and directory is uniquely addressable by its fully qualified path (either absolute or relative) and you can refer to each file and directory by its path. You are also aware that a valid path will uniquely resolve to only one directory or file.
- At each stage as you navigate into a separate directory, the server allows you to retrieve the list of subdirectories and files within your current directory. It always shows you the current state of that directory. Thus even if you were to list the same directory twice, and someone else uploaded a new file or created a new subdirectory successfully between the two requests, you will see the reference to the new file / directory when you request for the listing the second time.
- At every stage you issue a command, the FTP client+server work together to service the request (or issue the appropriate error message as necessary), and then pretty much forget about what you did. In other word the server keeps no track of and shows no awareness of what you have done earlier in your session (though it does remember who you are primarily from a security perspective.
- To work with a FTP server using a command line client, you primarily need to understand the usage of four commands (verbs) post a successful connection. These are ‘ls’ to list the contents of a directory, ‘cd’ to navigate to a different directory, ‘get’ to download a file, and ‘put’ to upload a file.
So FTP allows us to upload and download files. But does it allow us to ‘do things’ ? Sure, so long as you combine it with a few more pieces in the puzzle. Let us say, we are back in the late 80s (prior to the invention of HTTP) and I want to send and a list of purchase orders collected by a local office to a central office for further processing. This requires the following elements to be added to the mix.
- A shared understanding of where the files will be uploaded, how they will be uniquely named, their specific file extensions (optionally) and the specific format of the file eg. Comma Separated or Lotus 1-2-3 or WordStar or WordPerfect (the popular application software of the day) including the positioning of the various fields in the file, and
- Preferably a daemon process on the central office computer (the FTP server) which regularly scans the directory, parses each file as it comes it, does the relevant processing on it, and generates the appropriate result files and places them in the appropriate directories using the shared understanding of the directory structure and the file naming convention to communicate back the results of the processing.
Ho Hum. This was all stuff you knew. But the reason I brought it up is that FTP and how it was leveraged then has a lot to do with the principles that govern REST as we shall later see.
RPC
Back then in the 80′s FTP wasn’t the only mechanism to transfer data between machines. One more of the many other options was RPC (Remote Procedure Call). It not only allowed you to transfer data across machines, it actually had built into itself, a contract to remotely execute software. Unlike FTP which merely transferred data (in well understood units called files), RPC allowed you to invoke remote procedures by supporting an ability to pass messages which included the message name and the values for all the parameters necessary to be supplied to the message. Unlike FTP which was meant to do data transfer across a network, RPC was geared to do things remotely.
A contrast between FTP and RPC
If the objective of network computing is to use the computers and networks to ‘do things’ one would assume that many more people would use RPC than FTP for the same. While RPC did get used as a technical substrate, at a business processing level FTP got used far more (eg. to send and remotely process the list of purchase orders). There are some important reasons that we need to understand here.
FTP required understanding of very few basic verbs (ls, cd, get, put). Thus the training required to understand FTP semantics was far less than that for RPC. This was partially due to the fact that RPC had a programmatic interface. To the best of my knowledge there are no widely used command line clients for human interaction with RPC services. In addition, each procedure required a set of semantic data (parameter) associated with it. This was no different than FTP which also required similar data to be shipped over the network. Turns out there were a few distinctions. First, the nature of design of RPC services often required combining application data with control data, and there was also often a sequential expectation due to the RPC business transaction being broken up into multiple RPC calls perhaps for the sake of efficiency. Moreover each time, new procedures were added or parameters added, these required programmatic changes. FTP on the other hand was simpler. In most cases the entire data (including some redundant data perhaps) was sent in one block (or file in FTP parlance). By dealing with a file as the least common denominator, the FTP stack decoupled itself from any application specific semantics. Moreover, depending upon the agreed upon format, the files could be edited at either end by by human actors using specific software such as plain text editors or word processors or spreadsheets. Moreover if the formats changed, wherever such files were being manually edited, no programmatic changes were required. Irrespective of the changes you made to the file formats, file processing software, the FTP stack itself did not change – it remained stable.
Less is more
A theme I shall come back to again is many a times less is more. FTP had far fewer training requirements (few basic verbs). FTP did not deal with parameter value formatting (though other pieces of software subsequently might have to). FTP was just so much easier to start working with. FTP did not actually preclude any of the capabilities of RPC from being introduced, it merely allowed this to be added subsequently as additional optional layers (or subsequent elements in the processing pipeline). Finally FTP allowed users to deal with the data in the units and the formats and the tools they understood the best – their day to day application software components and simply focused on only transferring files, while imposing only one requirement – each end should work with a file as a unit, and both ends should understand the file formats. By focusing on file as a unit, each business user could focus on the data he/she wanted to deal with in the format that was most appropriate (an analogy in REST would be a resource .. but I’m getting ahead of myself). And at the end of the day, by doing less, FTP ended up being much more popular and thus doing more.
Other protocols widely used on the net were SMTP / POP which were used for email. Email eventually was considered the killer app for the internet. Similar to FTP email focused on the users getting to learn only a few basic verbs and exchanging the basic unit of data transfer (messages) using these verbs. Again, even though email itself didn’t get things done, it contributed far more heavily than RPC to getting things done, by having other manual or software actors at either end of the messages who did the necessary processing required.
WWW (World Wide Web)
While email was the killer app for the internet, the one that really brought it to masses was the world wide web which was based on the HTTP protocol. While HTTP could be used to ship documents of a variety of types (often classified by their mime-types), the defacto type of document used the HyperText Markup Language (HTML). Unlike FTP and email, this required the authors to understand a new language, but used a simple markup syntax to keep the learning curve to the minimum. It however introduced a very powerful element – the embedded hyperlink. While the earlier technologies supported a uniform identifier for each document / message, the hyperlink allowed references to other documents / messages to be embedded thus converting the document pool into a document web. We now had the ability to navigate from one document to another and such navigation retained the contextual relevance by embedding the hyperlinks. There were other enhancements as well such as introduction of more verbs (eg. POST and DELETE, the latter not really being supported by any of the browsers). Allow me to state the salient points of WWW despite the obvious duplications with some of the points I listed under FTP (for the sake of emphasis). Note that the scenario I describe below is primarily describing static web serving (except to the extent of file uploads) and does not address the presence of a dynamic web application.
- Given a web browser you can connect to any web server optionally using the appropriate authentication credentials.
- Typically the home page of the web server is your starting point. At this point you are shown the document which usually include embedded hyperlinks to other associated documents on the web server.
- You can get/view/download/save a document by clicking on a hyperlink pointing to the document.
- Some web servers may be configured to allow you to browse a directory. Clicking a hyperlink pointing to the directory allows you to see a directory listing which shows all the subdirectories and documents within the current directory. Each such subdirectory or document is also shown as a hyperlink to allow you to navigate to it.
- Some documents have an form including an embedded file field and a button which allow you to upload a new document onto the web server.
- Each document (and directory if directory browsing is turned on) has at least one identifier which uniquely identifies the document – the URL. It is feasible to directly navigate to the document if you are aware of the URL.
- Navigating to a different document often provides you with a different list of embedded hyperlinks which are contextually relevant to the document being viewed.
- At each stage the web server is not aware of any other information about you apart from your authentication credentials, and is not generally aware of your browsing history (except what may be stored for audit purposes on the web server logs).
- As a user the primary skills you need to grasp is the ability to enter a starting URL, and then being able to navigate from document to document by clicking on the hyperlinks. If you are uploading documents, you may in addition need to know how to specify a local file path and press the Submit button to upload the file.
- While HTML documents are the defacto default, the same capabilities can be used to serve any types of documents. The server usually identifies the document types by the registered mime types, and the browser may either render the document itself or call upon the necessary add-on plugin application to render the document based on the appropriate type or may in some cases simply save the document locally in case no such application is available for further processing.
- Usually but not necessarily the document name have the characteristics of a noun
Again the reason I listed these characteristics is that these have a tremendous commonality with those of REST (except that what I refer to as a document above may get referred to as a resource in REST parlance).
SOA : DCE, Tuxedo, CORBA, RMI
Even as the web was evolving other technologies which allowed for more sophisticated remote service invocations were being developed. Along with RPC, these were essentially different technical manifestations of Service Oriented Architecture (SOA) principles. While these are substantial developments in their own right, the relevant points to be made in the context of this article are :
- Each SOA service supported the ability to define a set of service semantics which included the service name, the parameters to the service, an ability to expose the metadata of such semantics, an ability to leverage such metadata and invoking such services either statically or dynamically from a remote client.
- Many services were usually expected to “do something” though quite often some services would simply return the requested data. Usually but not necessarily the services were identified by using ‘verbs’.
- Some of the SOA services allowed maintenance of a client state on the server, and allowed the server to do processing conditional on the client state.
- These technologies almost invariably required some kind of programmatic effort at both the client and the server end. Manual specification of the service parameters and manual invocation of the service was simply not a typical use case. Neither was a default rendering of the results easily available to be manually viewed by an end user.
- Unlike retrieving or storing a document, these services often were expected to have a far more complex functionality.
CGI, dynamic web applications and Web Services
Clearly as WWW started getting used far more, people were only too keen to use it for much more than storing or retrieving documents. This led to the development of CGI and subsequently other dynamic web application technologies (eg. LAMP, J2EE etc.) which would allow us to use the web to ‘do something’. Since these were clearly offshoots of the SOA world, being mapped onto the WWW infrastructure, the characteristics of such dynamic applications often had a lot in common with SOA, and they started dropping many characteristics of the traditional static WWW. Thus was born the child of the world wide web and distributed service oriented architectures – web services. This led to newer SOA technologies such as WS-* and SOAP.
Like the typical scenarios after the discovery of any highly profitable opportunity, the early rush was to leverage the opportunity and it was only a little later when the dust died down, that people started wondering if they had sacrificed something in the heat and dust of the moment. That stock taking resulted in the realisation, that some of the very basic characteristics of the extraordinarily successful internet technologies (FTP / SMTP / WWW) had been diluted, and even if such dilution still allowed immediate progress to have occurred, some of them would need to be corrected to be able to continue the explosive growth that had been seen so far. One such exercise in my opinion is the laying down of the REST architecture style.
REST Semantics
While REST brings back many of the characteristics that made internet so successful back to application design, it should be noted that many of these are not precluded by Web Services or SOA. However what are mandatory characteristics in REST are in some cases missing from but in most cases quite feasible to implement in traditional (non REST) web services by using additional best practices. Also note that each characteristic is not necessarily universally superior. So do evaluate it in your context to see if it makes sense. However before we get to the benefits of REST, a quick synopsis of REST technical characteristics might be in order.
While a full description of the REST technical aspects is completely beyond the scope of this post, I summarise these below. You might notice the strong parallels between the characteristics of FTP and WWW and those of REST even as REST adds a few more capabilities. The reason I portray them in the form below in a manner quite similar to the way I portrayed the characteristics of FTP and WWW is to emphasise that REST actually continues to leverage the same characteristics that made these technologies so popular and globally scalable, even as it just adds those few minimally necessary capabilities to achieve the same scalability for not just transferring documents or rendering pages but to ‘do something’. In other words it brings together the characteristics which made the internet technologies so popular and applies them to the inter application integration, component and service orientation, and application mashup scenarios to allow them to achieve similarly large adoption and to perform the tasks necessary in the given context (or ‘do someting’ as I have continuously referred to).
REST characteristics
- Resource and media types as the basic units : REST treats a resource as the basic unit of data transfer. Such resources could refer to anything in the particular context eg. a flight reservation, an invoice, a video etc.
- Unique resource identifiers :REST requires that each resource have at least one identifier which uniquely identifies that resource. This makes it easy to be able to bookmark resources or make them searchable.
- Each resource has at least one representation :Each resource can be expressed using a variety of representations. This could include HTML, XML, CSV, JSON etc.
- Each resource has often one default manually readable representation : In most cases but not all, each resource has a representation that can be manually consumed using a web browser. Such manual representations are often either XHTML representation with associated CSS, but could theoretically be some custom representation rendered by a delivered on demand custom javascript or Java applet. Note that this is not a requrement of REST but is a practice often followed.
- Each resource has a type. REST supports self describing media types : Each resource has a type (referred to as media type since REST refers to the resource web itself as hypermedia). The type influences the data semantics of the resource, and the type itself can be self documenting using a variety of technologies (eg. one possible way is to specify XML schema descriptors).
- Each resource representation optionally includes contextually relevant hyperlinks to other resources :This not only allows the clients to auto discover associated resources, but also allows the server to clearly communicate the contextually relevant links based on an application state.
- REST resources are indexable and search engine friendly : A consistent resource naming and representation allows for easy indexation and search engine integration.
- REST requires minimal starting point intelligence : Typically one only needs the initial URL for being able to integrate with a REST implementation. All newer resources are often dynamically discovered. Since these media types also document their own metadata, client agents can automatically discover more information about them. Thus media type metadata rather than being compiled into the REST client can be dealt with dynamically or by using code on demand agents for dealing with the appropriate media type (similar to browser plugins)
- REST encourages a uniform interface. :Typically this manifests itself by the minimal verbs being used to describe REST operations.When used with HTTP these are GET, PUT, POST and DELETE. This reduces the intelligence requirements on the client. Additionally clients may be capable of parsing metadata for the resources based on standard formats such as ATOM or XML schemas. The context specific intelligence required on the part of the client is no longer in the verbs it has to understand (method names) but is now in the resource types that it may need to manipulate. Thus if a client can deal with resource identification, resource representation, self descriptive messages and hypermedia, it can start dealing with REST.
- REST supports value addition by intermediate processors : REST supports the scenarios where intermediate processor units can provide additional value addition. These could include processors which provide caching support or those that provide resource enrichment capabilities.
- REST encourages usage of scalable practices :By precluding usage of conversational state and sequential assumptions, REST implementations tend to be easier to scale even as they compromise on efficiency at times (due to redundant data transfer or additional processing requirements)
REST benefits
Having described many of the REST characteristics the following could be interpreted as the benefits of adopting a REST style architecture.
- Default RenderingIn case of most REST implementations, you can quickly provide a default HTML rendering capability. Thus even as you provide a REST interface to allow inter application integration, customers of such an interface do not have to wait for building the programmatic capabilities for leveraging it, they can get started immediately by being able to manually view all the resources and their states manually and by navigating around the interface by using a plain web browser. This substantially reduces the entry barriers for your customers, and allows them to get more conversant with your media types even as they are still figuring out how to programatically leverage the capabilities.
- Self describing / auto discovery of media types and capabilitiesThe traditional web service semantics rely upon clear upfront documentation of media types, their schema and the API semantics. Thus the metadata about the service is often communicated ‘out of band’ from the actual service itself. This is required so that the clients can understand all the valid end points and service semantics up front before they can leverage the services. Not so with REST. Given an initial starting point, REST greatly encourages a contextual provision of the relevant additional interfaces (hyperlinks) as a part of the the document / resource data itself. Thus clients do not upfront need to be aware of all the end points (resource URIs) to be able to leverage the services. Moreover REST supports self describing media types as well. Thus the schema information for the resources can be shipped ‘in band’ with the resource representation itself. This allows for clients to discover new media types or changes to their schema and even allows the default rendering of the same without having to upgrade the programmatic components to leverage the newly discovered or modified media types / schemas. Finally the code on demand capabilities (these are optional) of REST can allow code to be downloaded to automatically parse or render such newly discovered or modified media types.
- Encourages scalability even at the cost of efficiencyAspects such as non maintenance of conversational state, greatly increase the scalability of REST applications even if they do incur a minor cost in efficiency (which can be due to repeated redundant communication of data elements, or additional processing requirements due to preclusion of conversation state). This makes it relatively easier to set up multiple servers as the demand for the REST capabilities increases. Having said that, let me quickly add a caveat that designing clustered applications even if with REST interfaces is not always trivial, and while REST makes it “easier to scale” that should not be confused with “easy to scale”.
- Resource / Data semantics are much easier to understand than Service semanticsTo put it differently an invoice structure is much easier to understand from a data perspective than an invoice processor API. This makes it easy for the clients. This often also makes it easier for the server side implementations. Service semantics often bring in issues of sequence, client state and other control information, most of which can be avoided using REST. Generally speaking expectations are simpler to lay down and meet when specifying resources rather than services.
- Clear naming and accessibility of each resource in your universeWeb services don’t mandate clear unique identifier for all your resources. Thus sometimes it is not possible to reach a particular resource except through a convoluted series of steps. In some cases some resources are inaccessible for ever. As an example, many online shopping experiences end with an invoice being shown, but I have often found it impossible to later on pull up that invoice that was earlier shown to me at the end of a transaction.
- Extensible resource types which are optionally dealt with by clientsNot only are resource types self describing, REST makes it easier to convey additional extensions to such resource types by using additional URIs within the resource representation as well. Thus even as a representation lays down a variety of field values (say for an invoice), there might be other associated resources which might either be optional or variable media types based on the context (eg. purchase order / quality report etc.) which can be easily referenced by simply including their URIs. Such additional information does not require the basic media type to be enhanced or by introducing attachments to the media types. These can be implemented as additional navigable out of band media types. Thus clients don’t have to deal with them, but they can do so easily when they choose to do so. Thus the client has a choice to not deal with the additional media types when they do not make sense in the client’s context.
- Search engine friendlinessWhile resource directories help for smaller scale integration (eg. Yahoo when it started off, attempted to categorise the web), such directories or registries are often found to be tough to scale beyond a particular threshold (thats why Yahoo or Google now provide entry points by allowing us to search through all the web resources). Consistent resource naming and representation make REST resources search engine friendly and allow additional entry points into a REST service based on search criteria. This makes location of newer resources far easier than what might be feasible through a resource registry especially on a large scale.
- Easer layeringWhile it is possible to add intermediate proxy services for enriching the capabilities of a REST implementation, it just makes it seem a lot easier to implement these as and when required when the underlying architecture is REST based. Thus while mashups can be readily implemented using both REST and traditional SOA implementations, I would submit that these are much easier to implement on REST based architectures.
I have used the word scalability above in the context of the ability to service the runtime demands of a larger number of clients. However REST helps makes your software artifacts become scalable in one more way. By providing a basic and minimal uniform interface requirements, REST allows your applications / services / components a low entry barrier path into being a participant in a broad web of similar others who all agree on the basic REST semantics. This substantially increases the potential number of clients to your services since they can leverage these services easily and with low entry barriers. While traditional SOA technologies attempt to provide the universal access to all possible consumers, REST with its emphasis on minimalism, simplicity and low entry barriers actually makes it practical. Similarly REST makes it easy for you to start consuming other services and mashing them up with others to service your clients (pun intended) quickly. Finally REST takes the the very characteristics that made document and message sharing so easy to use and popular (characteristics which are not necessarily found in all conventional SOA implementations) and combines them with the necessary elements to achieve transaction processing, application integration and mashups (use the web to ‘do something’) on a truly global scale even as it makes these capabilities easily available and cost effective to leverage.
Some concluding comments
While not directly relevant as REST rationale or REST benefits, I thought it might be useful to add a few more associated comments within the context of REST usage and adoption.
Simplicity and bottom up adoption
I must confess, my biases show up quite strongly in this paragraph (so feel free to treat this as a partially prejudiced statement). Simplicity is not per se a characteristic of REST. However it does stem from the nature of genesis of the competing options. While most internet technologies using an incremental, evolutionary approach, most SOA technologies have been designed by a committee. This is why the consulting and development budgets required to implement FTP / email / Web especially on a per utility basis are far different than those likely for implementing DCE, Tuxedo, CORBA and SOAP. Part of the reason is also due to the fact that most internet technology adoption is bottom up, while that of SOA often is top down. While top down may seem attractive, it may seem sobering to realise that most top down processes break down beyond a particular scale. Thats why free markets on the whole have trounced centrally planned economies (though some recent happenings do point to limitations of the same as well). Thats why internet scale simple inter application API integration and mashups took off even as intranet scale application integration was mired in budgeting, territorial, enterprise modeling and governance issues. Thats why the LAMP stack (eg. PHP) which hasn’t been particularly strong in the non web arena, is deeply entrenched in the web based application space. Sometimes it just is more productive to quickly implement a simpler technology and incrementally enhance it rather than attempting to cover all possibilities, options, and border conditions by putting a committee in place. At its very core, REST requires only incremental understanding of newer technologies, is easier to incrementally adopt and is less likely to get mired in organisational issues. Precisely the characteristics that FTP, EMail and WWW had.
Simpler abstractions win
I have generally found that simpler abstractions even though harder to deal with initially, often win in the long run. Notice the fact that the bare bones rendering functionality of HTML/WWW completely trounced the rich UI and application integration capabilities then available (eg. Windows/Java and DCOM/CORBA/RMI). This is not to suggest that the extra capabilities are not required. That is why Rich User Interfaces on WWW continue to be a dominant part of the internet technology wishlist. However the simpler, cleaner and minimalistic abstractions often are far more important than feature richness. A point I would want to make in favour of REST even as I admit that conventional SOA technologies are far more feature rich than REST.
REST is not SOA
I must confess, for a long time I believed REST was merely a specific usecase of SOA. However recent thoughts lead me to believe otherwise. There is indeed a reason for such potential confusion. REST based architectures and SOA may often attempt to service similar goals. To the extent of servicing such goals, REST may look like a substitutable component for other SOA technologies such as SOAP. However even as they attempt to meet similar goals, REST attempts to view at your architecture artifacts differently. REST encourages you to view and model your architecture as a set of resources rather than services. There are important implications of this not just in terms of the many benefits I describe above available under REST but also in terms of the design and architecture characteristics of the implementation. Treating REST as just another way to implement SOA sometimes encourages one to miss out on the subtleties. These however are beyond the scope of this post, and I intend to cover the same apart from the implications of REST on software design in my next post.
22
Apr 09
Is a large corporate making money off open source or open standards an oxymoron ? In a Sun / Java Context
The recent acquisition of Sun by Oracle reignited a thought that had been going through my mind for a long time. It simply boils down to whether a large corporate can make money off open source or open platforms. Now quite obviously it in itself is not a truism. But the point remains that large corporates which have become large in the conventional economy do find the going a little difficult when trying to make money off open software.
The way I perceived it, Sun was going through similar difficulties. The hardware business was delivering dwindling margins post the dot com bust, and the software business was under threat from upstarts such as Linux which offered a substantially similar software stack at near zero licensing costs. One of the crown jewels asset Sun had in its stable was Java. And while it was (and continues to be) a wonderful asset, it just was incredibly difficult to make money off it. Now java hasn’t been open source exactly for most of its life, but it was a sufficiently open stack nevertheless to cause the same difficulties that open source software would in terms of monetisation.
There’s an excellent blog post written a year and half ago by Michel Bauwens, Can the experience economy be capitalist? which does refer to some of the underlying issues. I suggest you do read it, but would like to quote a few points from it below.
First of all, in the field of the immaterial, we are no longer dealing with scarce goods, but with marginal reproduction costs and non-rival goods. With such goods, sharing does not diminish the enjoyment of the good, since all parties retain their ability to use them. The emergence of peer production shows a new form of creating value, that is in fundamental aspects “outside the market”. Typically, in commons-based production we have a common pool, accessible to everyone (Linux, Wikipedia), around which an ecology of business can form to create and sell scarcities (usually services and experiences). In sharing-oriented production (YouTube, Google documents), we have proprietary platforms that enable and empower the sharing, but at the same time, sell the aggregated attention (a scarcity), to the advertising market. Finally, in the third crowdsourcing mode, companies try to integrate participation in their own value chain and framework.
So the good news is that indeed business is possible. But I would like the readers to entertain the following proposition, nl. That:
1) The creation of non-monetary value is exponential
2) The monetization of such value is linear
In other words, we have a growing discrepancy between the direct creation of use value through social relationships and collective intelligence (open platforms create near infinite value through the operations of the laws of Metcalfe and Reed), but only a fraction of that value can actually be captured by business and money. Innovation is becoming social and diffuse, an emergent property of the networks rather than an internal R & D affair within corporations; capital is becoming an a posteriori intervention in the realization of innovation, rather than a condition for its occurrence; more and more positive externalizations are created from the social field.
Quite eloquently argued here that open platforms create near infinite value which is difficult to be captured by business and money. I still remember companies such as Borland, HP, Sun, IBM making a fair amount of money through selling C/C++ compilers and IDEs (I remember it used to be almost $5000 per seat). However this was in the days when the community capability of creating similar competing software stacks was only in its infancy. No longer so today. Now communities, open standards, open processes and open source are fairly well established sources of delivering asymmetrically significant capabilities at a fraction of the cost, a fraction which is made even smaller when the same is incurred by a small motivated group of individuals or small highly agile companies. There in lies the difficulty. Companies are trained to and built to deliver linear and symmetric capabilities in the context of costs they incur. However they have a relatively poor handle on monetising in scenarios where small teams can deliver exponentially asymmetric capabilities.
The blog post I referred to above, identifies the right area to look at in this context – Identify what is scarce and Monetise the scarcity. So when any software has a sufficiently large utility and can be managed through open processes to be able to satisfy a globally substantial demand at a fraction of the cost (eg. apache httpd), that particular capability (eg. static and dynamic file serving over http) is no longer scarce enough for a commercially focused company to make money out off. Thats why the money shifts where the scarcity is – how to leverage such software and put it to good use. This is where the existing commercial successes around open source are based on eg. RedHat, JBoss, IBM, Oracle etc. They monetise the support, training, services and consulting around such software. In many ways Sun could be argued to be a victim of its own success. It not only went much further than any other large corporate in terms of creating open specifications, it also contributed substantially to sufficient knowledge dissemination and tooling to allow many other individual, SME and large corporates to be able to compete with Sun in these very areas that Sun could’ve monetised. Sun thus created its own competition for monetising the scarcity that got created around Java even as many others cried out hoarsely that Sun simply wasn’t open enough. Will there be an incentive for Oracle to reverse that somewhat ? I suspect that could be a logical option if it focuses on ensuring a good ROI for Sun’s Java investments and assets and more so to be able to make itself more capable in the space around Java than the competition (ie. IBM).
So is a large corporate making money off open source or open standards an oxymoron ? Viewed narrowly yes. Because while it is possible for a small team to make some money with adequate ROI off open source or open standards, it is unlikely to be feasible for large corporates. They really need to figure out what is the new scarcity that gets created off such initiatives, come to a judgment that the opportunity arising from such scarcity is large enough and worthy of their time, attention and investment, ensure that there is no current available capability of anyone else disruptively delivering exponentially asymmetric value in that space, and finally occupy that space before other large corporates do.
It does boil down to the fact that while individuals and small companies may find it easier to monetise open source and open standards, the same is going to be generally tough for large corporates. Or as Michel Bauwens said in the blog post I referred to (in my opinion perhaps a little exaggerated but not entirely off the mark)
For all of this, we will need new policies, major reforms and restructurations in our economy and society.
But one thing is sure: we will have markets, but the core logic of the emerging experience economy, operating as it does in the world of non-rival exchange, is unlikely to have capitalism as its core logic.
16
Feb 09
Why I deleted my Facebook data. Commentary on Internet data privacy rules.
Update: Facebook has since reverted the change in terms of service. Cool. On Feb 18th, a message on the home page said :
Terms of Use Update
A couple of weeks ago, we posted an update to our Terms of Use that we hoped would clarify some parts of it for our users. Over the past couple of days, we have received a lot of questions and comments about these updated terms and what they mean for people and their information. Because of the feedback we received, we have decided to return to our previous Terms of Use while we resolve the issues that people have raised. For more information, visit the Facebook Blog.
Mark Zuckerberg also blogged about the same issue in Update on Terms.
Original post begins here.
Facebook published a new Terms of Service on February 4th 2009 which has a strong implication for how internet / cloud based data privacy is likely to be viewed. This was very well publicised here – Facebook’s New Terms Of Service: “We Can Do Anything We Want With Your Content. Forever.”. There was some consternation on the net especially on twitter about this change in facebook rules. While I did not use facebook much, I was sufficiently appalled at the change in rules to go and delete pretty much most of my data one line at a time. It is unclear to me if the old data is still available to facebook for sublicensing from a legal perspective (I know all the data will be there in their archives), but I decided it probably wouldn’t hurt to nevertheless to go delete most of it. I didn’t actually delete the account since Facebook still helps me keep in touch with my friends. But it is quite safe to assume that any interactions with them with an assumption or requirement of any data privacy will no longer be done on facebook.
Whats wrong with the new terms of service ?
Some people in forums argued that most of the data on internet is likely to be there forever. So one just needs to be careful and not worry about it. I don’t quite agree with that line of thinking. When I blog, tweet, post to usenet or forums, I am upfront aware of the fact that that data is going to be cached by google and other search engines and that once I press the publish button, there’s often no way to revoke it. However in case of Facebook, there is a general expectation that the data will be shared only within a network of friends, a network that I have control over. There is an expectation that that data will not get cached by search engines and short of an accidental data breach or some intentional malafide activities that data will not become public. What is unnerving with the new terms of service is that Facebook changed these rules at will without even sending me an email about the same.
Asymmetry of Privacy Expectations :
It is interesting to note how asymmetric some of the terms are. For example in the section User Content, the following is to be found.
By using or accessing the Facebook Service, you represent, warrant and agree that you will not Post:
* User Content that violates the law or anyone’s rights, including intellectual property (“IP”) rights or other proprietary rights (such as rights of publicity and privacy);
* any Contact Information or private information of any third party;
Further down in the section Licensing, it states,
You hereby grant Facebook an irrevocable, perpetual, non-exclusive, transferable, fully paid, worldwide license (with the right to sublicense) to (a) use, copy, publish, stream, store, retain, publicly perform or display, transmit, scan, reformat, modify, edit, frame, translate, excerpt, adapt, create derivative works and distribute (through multiple tiers), any User Content you (i) Post on or in connection with the Facebook Service or the promotion thereof subject only to your privacy settings or (ii) enable a user to Post, including by offering a Share Link on your website and (b) to use your name, likeness and image for any purpose, including commercial or advertising, each of (a) and (b) on or in connection with the Facebook Service or the promotion thereof.
As you can see, you undertake to not violate anyone else’s IP or other proprietary rights, but information about you will not be treated with the same level of respect by Facebook, though its done quite legally by documenting the same in the Terms of Service.
Moreover anything you post or any information on your stream is now sub-licensable by Facebook. Now why would I exactly want to sign away all rights on status updates, photographs etc. on content which I posted assuming that it was secure and private ?
But the earlier terms were also quite onerous. So how come you did not complain ?
Apparently under the earlier terms, facebook also had the rights on the content, so whats the big deal ? Two main issues.
- The earlier TOS did not grant Facebook the right to sublicense the content : The possibility of sublicensing means you have no control or idea on who the eventual user of that data could be. I still get angry at so many commercial parties at having leaked my email and phone number data. The likelihood of a similar scenario where facebook sells that data for commercial purposes now cannot be ruled out, purposes on which I will have no control on that data.
- The earlier TOS had an escape clause of deleting the account Basically Facebook did not have the right on the data once you deleted the account. This is important as can be seen by another case on Twitter Privacy Disaster At Twitter: Direct Messages Exposed (Update: GroupTweet Is Likely Culprit). In this case private messages were apparently accidentally made public due to confusing software usability. The person immediately responded by deleting the account. This is a useful kill switch to have in case one makes a terrible terrible mistake of putting out something accidentally. This kill switch is also no longer available.
Bait and Switch : By not informing users of the change in terms of service especially since these were so important, I think this creates an impression that the user is a victim of bait and switch (even though the real underlying causes of the change which I am unaware of could be different). Facebook should’ve informed the users about the change in rules, offered a button to delete all prior data / photographs / content or at least made clear that the earlier content will continue to be governed by the earlier TOS – something thats a little unclear in this situation.
Implications for Internet Web sites and users : I think sites should very clearly document how they will control and use the data that they gather. Many of them do by explicitly document the same. Moreover any substantial changes to the same should be communicated to the users. Finally users need to be now aware of potentially changes of Terms of Services on a number of web sites that they interact with. Data that they assume to be private may no longer stay so and the user may not be any wiser about the same if the Terms of Services are changed without him being explicitly informed.
Updates :
Why did I delete the data ? Seems some readers are thinking I deleted the data believing that that will get rid of it. Thats not why I deleted the data. I am fully aware the data is likely to live perpetually in facebook archives and be accessible to facebook. I deleted it because that data had been submitted and generated under the old Terms of Service. Letting it be around to me seemed like an implicit acceptance of the new Terms of Service around old data, which I was uncomfortable with. So I deleted the data at the first available opportunity on realising that the Terms of Service around that data had changed. Any new interactions I do with facebook will be under an awareness of and therefore an acceptance of new Terms of Service.
Response from Facebook : Mark zuckerberg attempts to address the issue on facebook blog : On Facebook, People Own and Control Their Information. I could not find any rationale to why Facebook needs the privilege to sublicense the content. I also thought the way the blog post was written and the way the Terms of Service are structured are very very different. In my opinion its the Terms of Service that count.
This topic has been also getting a lot of traction on other blogs. Am quoting some other interesting opinions on the topic on the internet along with link backs to the posts below
- cnet News.com : The Open Road : Facebook changes terms of service to control more user data :
Google has had its own problems with user privacy, but this Facebook move calls into question the wisdom of clouds or, rather, storing one’s data in others’ Web services like Facebook. We need to come up with new licenses or new mandates for open data in the cloud. Facebook shouldn’t own our data.
- Mashable : Facebook: All Your Stuff is Ours, Even if You Quit :
The possible implications of this TOS change go beyond these concerns. Sure, you can choose not to use Facebook at all, but that doesn’t mean a thing. Someone can still take your photo, slap it on Facebook, and now neither you nor the author of the photo can stop Facebook from using the photo in whichever way they please. Looking at it globally, millions of people are uploading bits of information on everyone and everything, to a huge online database, and by doing so they’re automatically giving away the rights to use or modify this information to a private corporation. And not only that; they now also waiver the right to ever take it back from it.
Facebook should take a long, deep look into how it treats its users. Until now, users had options with regards to how the data they generated on Facebook was used. Now, they have no options whatsoever, rather than quit the service altogether. It’s a major difference; I’m not going to take it lightly, and neither should you.
- Wet Asphalt : The Facebook Freakout :
You are only granting those rights “on or in connection with the Facebook Service or in the promotion thereof.” What does that mean? Well, it means that you are licensing the use on Facebook branded websites or any other media and the Facebook Platform, which is the legal name for the APIs that allow third parties to create Facebook applications. So if there was a Facebook TV show, they could use your stuff on that. Or if they launched a Facebook concert series or a Facebook magazine, they could use your stuff in that. Presumably, if there were a Facebook dogfood, they could use your content on that. Or if they wanted to make an advertisement FOR any of those things, they could use your stuff in that. Precisely WHY Facebook would want to do any of those things, I leave to the reader to speculate on. What they most emphatically CAN’T do is what Walters claims, that “We can do anything we want with your content forever.” They can do anything they want with your content ON Facebook or to Promote Facebook forever. But if they said that it probably wouldn’t cause the internet panic and generate hits for the consumerist and readers to stroke Walter’s ego with diggs and trackbacks and twitterposts either.
10
Jan 09
Stop making SOA complex
Ever since Anne Thomas Manes set off a flurry of blog posts and tweets I was completely amazed at the discussions going around what SOA is. So at the risk of (no, with the very intention to) muddying the water furthers I decided to add my 2 cents knowing fully well that many will choose to disagree. Update: I am primarily referring to the definitional issues of SOA not the implementation issues. However even from an implementation perspective SOA is not necessarily complex. The implementation of SOA can range from the simple to the very complex depending upon the choices made from the perspective of the scope, the expectations and the technology.
The definition of SOA is quite simple – the name is the definition – it is “Service Oriented Architecture”. Services are generally software things (I don’t want to call it modules or objects or components since that will set off another debate) which offer one small focused set of capabilities through a service API. As a generally accepted set of practices such services are often intended to be coarse grained, reusable in multiple contexts and loosely coupled from each other. When an architecture is oriented around building something that is built on services it is service oriented architecture. Period.
Now that we have looked at the broad outlines of what SOA is, lets consider what SOA “per se” is not (though these aspects are not antithetical to SOA itself – they are largely orthogonal)
- SOA is not about business alone : SOA can be used for business services. But it can just as easily be used for building technical services. I can fully imagine a network management system which is not a critical offering of a business, yet can be built using SOA. Those who focus on SOA being for business only miss out on the big time benefits that can also be had by applying it to the technical infrastructure.
- SOA does not require workflow management / BPM etc. :These are often relevant, but their existence is irrelevant to whether an architecture is SOA. What is critical is if it is built using and around services. There are a whole range of possibilities of building systems or even system sets without getting anywhere close to Workflow Management or BPM.
- SOA need not even be distributed :While 99% of SOA systems are likely to used remotable services (ie. distributed, over the network) there is no fundamental mandatory requirement that this be so. One could potentially build a desktop based personal finance application which is built using SOA (though I do not necessarily see the economic prudence in going down such a path)
- SOA does not prescribe messaging or event based processing :There is no such requirement, however many enterprises choose to have such services in their enterprise mix
- SOA is not about web services, REST etc.: DCE, Tuxedo, CORBA, REST, XML, json etc. are all implementation protocols and/or transports and/or data formats for SOA (I have designed and built systems using each one of them, so at least I am not using their names loosely). SOA doesn’t dictate that a particular technology / protocol / transport / data format be used. It is the enterprise architecture which makes the choice. In fact, the most successful SOA transport till date is HTTP. Twitter API which offers data formatted using json or XML or HTML over HTTP is a great example of a “simple service” and to the best of my knowledge its method to post a message gets invoked more than many many million times on a busy day.
SOA is a classic case of a simple idea being force fitted into a grand panacea. If one goes by the premise that a twitter API is indeed a SOA service, how can one justify the expenditure on training, consulting, outsourcing, software stack purchasing, hardware etc. etc ? Its not that the more advanced things like BPM or WS-* or “business services” are not important. It is just that these ride on SOA – they are not SOA.
So why is there a grand confusion on this issue. IMHO it shouldn’t have ideally been about SOA at all. It stems from the following :
- Service identification is hard : Identifying good, clean reusable services is a non trival task. Moreover when attempted at an enterprise level, there are too many stakeholders so agreement on what constitutes a service itself can be hard.
- Reuse is powerful. It creates territorial issues : Once identified, the services do not necessarily lend themselves to neat classification in an organisational structure. Moreover a piece of software that is reusable is worth far more than a silo’ed piece of software. This can lead to turf and clout issues whose resolution falls in the domain of Human Oriented Architecture
- The Hype :Thanks to the hype one now needs to get in an army of consultants, trainers, new software and hardware stacks, simply because it is supposed to be something really new and something big.
So you essentially have service identification and building which does not necessarily align itself with organisational structures and business functions by default within the context of territorial and ownership issues getting raised while absorbing a whole bunch of new consultants, trainers and vendors. And as a result we hear things like – SOA is difficult, SOA is complex, SOA sucks and most importantly we cannot agree on what SOA is. I am not saying that these problems are not important – its just that laying them at the doorstep of SOA is not being entirely honest.
So can I please put in a humble reminder – SOA defines itself quite nicely – it is Service Oriented Architecture, ie. an architecture built around services.
And finally, couldn’t help but include one of my tweets related to current debate on whether SOA is dead – “Knock, Knock, whos there ? SOA. SOA who ? So-out-of-fashion everyones calling me dead.”

