Posts Tagged: rest


2
Nov 09

Five important trends on the enterprise architect’s radar

It is no secret that the internet architectures are influencing enterprise architectures. This post attempts to summarise some of the recent trends in the internet space, which seem to be carrying some momentum sufficient enough to influence the enterprise. So without further ado, these trends are :

  1. REST : The Representational State Transfer architecture style builds on the essential elements of those constructs which made the internet so globally scalable. A detailed explanation of the rationale and strengths of REST are completely beyond the scope of this article. If your job requires you to be continuously aware of emergent trends and whether they fit your enterprise architecture needs – this is the one must explore trend.

    Impact : Web based architectures, Service Oriented Architectures, wide availability and immediate usability of data and processing requests (resources) through simple HTTP URIs and minimal integration effort

  2. Interoperable Cloud : The interoperable cloud is the ability to create a private cloud and also leverage a public cloud. This has been made possible by offerings such as the Ubuntu Enterprise Cloud which allows you to build a private cloud or use a public cloud such as Amazon EC2 while being able to access them using the same set of APIs thanks to open source efforts such as Eucalyptus. This allows you the flexibility of initially using either a private or public cloud and then subsequently shifting to the other, or being able to use both simultaneously.

    Impact : Large servers vs cluster of commodity servers, virtualisation, elastic deployments, flexible hardware procurement / provisioning, infrastructure management in organisational hierarchy.

  3. NoSQL : While I am unhappy with the name, it has stuck. This refers to a set of options now available to store your data unconstrained by many RDBMS requirements (eg. flexible schema, key value pairs etc.). Some of the databases also allow you to store data in a distributed manner over a number of servers with an intent to support high availability in write intense scenarios even as they may require you to move towards eventual consistency. These options increase your manouverability / flexibility as an architect even as they require you to meet a different set of challenges.

    Impact : Relational databases, data storage strategies, data distribution strategies, vertical vs. horizontal scalability, transactionalisation, consistency and availability

  4. Polyglotism : Developer costs now occupy an increasing percentage of total costs, development time is being an increasingly dominant factor for time to market, and ability of software to change and adapt quickly to newer demands is now a critical success metric. One of the solutions is to write different parts of the software in a different languages most appropriately suited for concise and rapid coding as well as supporting quick reaction changes to each part appropriately. Thus it is conceivable to have some of the business rules written in a dsl written using jruby and some of the algorithms written in clojure in a software built on the JEE platform.

    Impact :Development culture and processes, minimum developer skill and scalability, risk management for managing required vs. available skills.

  5. Decentralised processing : Thanks to many developments which are leading to increasingly distributed processing including REST and NoSQL, applications will need to be a set of collaborating network based components (we’ve heard this before with distributed objects as well). However especially given some of the lesser guarantees that such architectures can provide around immediate guaranteed processing, latency issues, distributed control and asynchronous processing, a particular piece of business logic may get satisfied in a staggered fashion across a number of collaborating components. This may increase challenges in terms of currency of available data even as it helps actually deliver on the vision of distributed objects and simplifies individual component development. While asynchronous capabilities such as those supported by MQ series and the like have been used in the enterprise for ages, I do anticipate increasing use of lighter messaging constructs such as PubSubHubbub within the enterprise.

    Impact : Application partitioning, network based components, difficulty in supporting fully synchronous workflows.


11
Jun 09

ReST : SOA, WOA or ROA ?

This is part 4 of a continuing series of posts on ReST. So you might want to read up the earlier ones as well (chronologically they are the three posts before this)

Nice alphabet soup indeed. But what style of architecture does ReST (Representational State Transfer) correspond to.

Before we get into any definitional issues which are hugely referred to in case of such debates – I shall be referring to the currently available definitions and descriptions as available on Wikipedia viz. Service Oriented Architecture, Web Oriented Architecture and Resource Oriented Architecture.

Service Oriented Architecture

As per the definition on Wikipedia (emphasis is mine):

In computing, service-oriented architecture (SOA) provides methods for systems development and integration where systems package functionality as interoperable services. A SOA infrastructure allows different applications to exchange data with one another.

Service-orientation aims at a loose coupling of services with operating systems, programming languages and other technologies that underlie applications. SOA separates functions into distinct units, or services, which developers make accessible over a network in order that users can combine and reuse them in the production of applications. These services communicate with each other by passing data from one service to another, or by coordinating an activity between two or more services.

On the topic of Service Orientation, it further goes on to state

Service-orientation is a design paradigm that specifies the creation of automation logic in the form of services. It is applied as a strategic goal in developing a service-oriented architecture (SOA). Like other design paradigms, service-orientation provides a means of achieving a separation of concerns.

While REST does attempt to solve many similar goals as SOA, I believe there lies an important distinction. The essential focus of SOA is to separate functions or automation services. An example here is to separate an authentication service from authorisation, monitoring, logging etc. A SOA architecture that consists of a number of SOA services assembles such “functionalities” into one feature consistent whole. But is that how REST works ? Only in a very vague sense even if arguably so. REST standardises the functions ie. GET, PUT, POST and DELETE in case of HTTP connectors. It is arguable that REST could be used with a different set of functions, but even in that case the function set is likely to remain consistent. This is further supported by Roy Fielding’s post “It is Okay to use POST” in which he argues

Some people think that REST suggests not to use POST for updates. Search my dissertation and you won’t find any mention of CRUD or POST. The only mention of PUT is in regard to HTTP’s lack of write-back caching. The main reason for my lack of specificity is because the methods defined by HTTP are part of the Web’s architecture definition, not the REST architectural style. Specific method definitions (aside from the retrieval:resource duality of GET) simply don’t matter to the REST architectural style, so it is difficult to have a style discussion about them. The only thing REST requires of methods is that they be uniformly defined for all resources (i.e., so that intermediaries don’t have to know the resource type in order to understand the meaning of the request). As long as the method is being used according to its own definition, REST doesn’t have much to say about it.

So across the board the functions remain the same and the data types (or media types) change. Sounds familiar ? Thats like supporting SELECT, INSERT, UPDATE, DELETE on a broad range of tables each having a different schema. An argument I make in an earlier post REST is the DBMS of the Internet. Moreover Roy further goes on to elaborate that in “REST intermediaries don’t have to know the resource type in order to understand the meaning of the request” and that “method is being used according to its own definition”. Each of this further introduces constraints into traditional service orientation. And these constraints make the field of use so narrow, that even though REST could be argued to be a teeny weeny specific use case of SOA, it could be argued to be Service Oriented to the same extent that a Database could be argued to be Procedure Oriented (since all tables support the procedures SELECT, INSERT, UPDATE, DELETE). In other words for all practical purposes REST is not Service Oriented.

Web Oriented Architecture

This is not yet a particularly widely used term yet, but I did come across a reference to it on a recent article in InfoQ “REST is a Style – WOA is the architecture“. Looking it up on Wikipedia leads us to the same sources as referred to by InfoQ – articles written by Dion Hinchcliffe. Wikipedia states it as follows

Web Oriented Architecture (WOA) is a style of software architecture that extends service-oriented architecture (SOA) to web based applications, and is sometimes considered to be a light-weight version of SOA. WOA is also aimed at maximizing the browser and server interactions by use of technologies such as REST and POX.

But I just argued in the earlier section that REST is only a very very specific use case of SOA, whereas the statement above says WOA extends SOA to web based applications. If we were dealing with objects and not architectural styles here, an argument that X is a specific (constrained) sub type of Y and X extends Y would instantaneously be flagged off as a violation of Liskov’s Substitution Principle (LSP). So something isn’t quite right here and the whole situation does not add up to a consistent whole for me. I will leave it to the reader to decide whether there exists a flaw in my reasoning here or elsewhere. My assessment is that if WOA is a collection of Web related architecture elements in addition to REST, then the only way to successfully and consistently resolve it is by saying if WOA builds on REST then it cannot be simultaneously extending SOA.

To be fair I couldn’t quite find the same worlds (WOA extends SOA) in Hinchcliffe’s writings. Referring to one of them “What is WOA ? Its the future of Service Oriented Architecture (SOA)“, he refers to another of his post “Beating a Dead Horse: What’s a SOA Again? All About Service-Orientation…” which in turn states

It’s here that John Reynolds’ well-known SOA Elevator pitch comes tantalizingly close to capturing the essence:

SOA is an architectural style that encourages the creation of loosely coupled business services. Loosely coupled services that are interoperable and technology-agnostic enable business flexibility. An SOA solution consists of a composite set of business services that realize an end-to-end business process. Each service provides an interface-based service description to support flexible and dynamically re-configurable processes.

This business view is right on, and doesn’t mean business in a traditional, white-collar way. In this context, “business” means the actual functionality of the system, apart from technical details.

There we go again on services. But Business Services cannot be GET, PUT, POST, DELETE. I would emphasise again that REST does not expose business services – it exposes some very basic CRUD services. So even if in this case there is no violation of LSP, the essential inconsistency still remains. WOA cannot be REST and SOA at the same time. This inconsistency is a bit worrying. But it is likely that Hinchcliffe meant that WOA is built on REST and is similar to SOA in terms of the goals when he says WOA is the future of SOA. But honestly I could not quite figure out how exactly he would want to describe the relationship between WOA and SOA.

Resource Oriented Architecture

The wikipedia page states :

Resource Oriented Architecture (or, ROA) is a specific set of guidelines of an implementation of the REST architecture.

REST, or Representational State Transfer (see Roy Thomas Fielding’s Doctoral Thesis “Architectural Styles and the Design of Network-based Software Architectures”), describes a series of architectural constraints that exemplify how the web’s design emerged. Various concrete implementations of these ideas have been created throughout time, but it has been difficult to discuss the REST architecture without blurring the lines between actual software, or the architectural principals behind them.

Since ROA is a set of guidelines of an implementation of a REST architecture, I think its a slam dunk conclusion that REST is consistent with ROA (for the silly reason that ROA seems to be defined using REST :) ).

Per Wikipedia, Leonard Richardson and Sam Ruby further provide the guidelines for ROA in “RESTful Web Services“, but again since the evolution of ROA stems from REST, it is unsurprising that REST is consistent with ROA.


10
Jun 09

Design Characteristics of REST / Resource Oriented Server Frameworks and Clients

This post is the third part of continuing series of articles on REST. The first one was Why REST ? and the next one was REST is the DBMS of the internet with hopefully some more to follow in the coming weeks.

Struts, Django, Ruby on Rails. We’ve worked with these and many other similar frameworks. Some time back I started thinking of what would a completely new ground up REST / Resource oriented framework would look like (ground up to ensure it had no legacy design to deal with). Would such frameworks be similar to the ones dominantly used today ? What about the ecosystem that surrounds and interacts with them (client libraries) ? And finally what about the implications on the fine grained object model (assuming there is one) and its relationship with the resource model ? This post deals with some of the thoughts.

There are some specifics the post does not address and is agnostic about :

  • Language : I shall be avoiding language issues as much as possible. Wherever I do bring in code constructs these may be assumed to be in Java (or pseudo-Java)
  • Convention or Configuration : I think both are valid choices in their appropriate contexts, and I don’t specifically emphasise one over the other in this post

The frameworks mentioned above are not the only ones out there. There are many, and some actually are very REST specific eg. Apache CXF JAX-RS or Restlet. It would certainly be interesting to contrast my thoughts with these, but for reasons of insufficiently detailed knowledge about them, I shall choose to skip it (better to not make any statements than making incorrect ones).

I shall be assuming a HTTP connector with GET, PUT, POST and DELETE as the constant set of operations. These four operations shall be collectively referred to as Resource Operations.

We shall first start with the server side characteristics, and the term ROF shall refer to a Resource Oriented (server side) Framework

A ROF will have a resource oriented interface : Certainly not a profound statement, but it was important to lay that down upfront. So what is a Resource Oriented Interface. Given a particular resource, a Resource Oriented Software will support or consume end points which allow you GET, PUT, POST or DELETE the resource. There is one reason why this particular constraint is relaxed just a little bit. Modern browsers do not support all the four methods easily eg DELETE and make it just slightly hard to use the PUT method. Hence these methods can also be invoked by using a URI segment containing the method name eg. delete.

A ROF will have an abstraction to represent a resource as an end point : Again, that seems to be pretty obvious. But there is a reason why I make it explicitly. In many situations we see controllers acting as end points. To the extent a controller acts as an abstraction for a resource end point which essentially only has the resource operations as public methods, it would fit this requirement. However if I was using an Order as a resource and if I introduced an approve method on the OrderController that would not be consistent with this requirement. That would need to be modelled as an OrderApproval resource which may on successful completion, effect a state change on the Order resource to the status ‘approved’.

This is where a potential differences with conventional frameworks arise. If I was to think of it from an EJB like perspective, I would model a OrderController as a Session bean and a Order as an entity bean. In case of lightweight POJO based model, I would have an OrderController as the endpoint exposed by say using Struts and model the Order as a entity POJO and map it to the database using Hibernate. In other non java frameworks, I would have a class to represent an OrderController and another one to represent the order along ActiveRecord pattern. But I would argue this separation is not entirely necessary, since what we want is something that implements a single abstraction mapping onto a Resource which also support the primarily lifecycle methods or resource operations of GET, PUT, POST and DELETE. But there is an issue to be worked through here. These resource operations are actually class level and not object level methods. Thus if we have an abstraction to represent the resource instance, the class level methods cannot be defined in the same class except as class level (static) methods. This is a tricky problem, and I would submit the designer may make one of two choices (a) Implement the resource operations as class level methods on the Resource abstraction (ie. they will get or return the resource references as method parameters and not rely on the ‘this’ or ‘self’ qualifier for getting access to the resource variables or (b) Implement the resource operations as methods on a separate one-to-one mapped class on the resource abstraction (eg. an OrderHome in case of an EJB like analogy)

Given consistent expectations of the Resource Operations these will actually be auto-magically implemented : Thats a bit of a turnaround from what I was just describing in the earlier paragraph. What I mean to suggest is that the class level methods I just referred to will be implemented within the framework. What the framework will allow are plugins to provide extended functionality at specific points. Thus a “public static Order Order.put(Order order)” method will be implicitly implemented by the framework. But before a put can be processed it needs to be validated. Thus the framework will allow the developer to plug in / override his own implementation for an Order.validate(Order order). There are multiple ways such plug-ins could be implemented. Depending upon the nature of abstraction, it could be an overridden method as I just described, or it could be a standalone method that is registered into the overall workflow (either through convention or configuration). The latter might be especially useful in case one wants to implement the functionality as stand alone methods or in case of functional programming languages. The plugin points provided could be framework specific. eg, One may want to validate a resource for consistency even at it is being read from the database. For the rest of the post I shall refer to these as plugins. In addition, framework will most certainly provide methods for for downstream handling of impact of PUT, POST or DELETE. This is covered in the next point. In case the framework chooses to not deal with persistence, it may choose to allow capabilities for integration with other persistence frameworks.

A ROF will provide capabilities to a developer to override or register methods to handle downstream impact of PUT, POST and DELETE : Before I get into the details of this, I encourage you to take a look at my earlier post REST is the DBMS of the internet in case you have not already done so. To summarise it quickly, I have drawn the analogy that a REST based system is like a DBMS where client applications can perform direct SQL such as SELECT, INSERT, UPDATE, DELETE (GET, PUT, POST, DELETE in case of HTTP/REST) on the Tables (Resources in case of REST), and the business logic is implemented as triggers. Thus the framework will need to allow the developer to define such triggers. Such methods will need to support ability to reject the request (in case of downstream validation failures), and update the resource state (to reflect the appropriate resource state after the completion of the downstream processing). It is also feasible to imagine scenarios where such methods are triggered asynchronously. Much of the logic of the traditional controllers which controlled interactions across multiple objects etc. is likely to now be shifted into these methods. I have no particularly good name for such methods. They could be referred to as triggers, event or message handlers, glue methods, extension points etc. For the rest of this post I shall refer to these methods specifically as ‘handlers’.

Note that the actual invocations to select, insert, update, delete the resource are *NOT* to be programmed by the developer. These are automatically handled by the framework. The developer essentially fills in the necessary logic to the plugin methods (eg. Order.validate) or handlers (eg. Order.onCreate)

A ROF will provide a mechanism to describe or map a resource abstraction to to the actual programming constructs : There are a number of ways this could be achieved. XML, YAML, DSL, Annotation – take your pick. Also the actual class could be defined (as in case of a POJO) and the resource characteristics mapped onto it, or the class may manifest itself at runtime based on metaprogramming around the metadata. Sample possibilities here are Hibernate like Resource-to-Object-to-Relation mapping (using either Annotations or XML) or a a completely metaprogrammed ActiveResource. One important aspect that the framework will need to cover is the situations where a Resource is a composite of many or partial underlying business objects. eg. an Order resource instance could theoretically span one Order instance and many OrderItem instances. Thus a one to one relationship between a resource and underlying business objects (or datastructures) is not assumed. What is assumed is that the framework will allow such relationships to be described or introspected.

A ROF will allow resources to be mapped onto URI or URI segments : This is too obvious an requirement to be explained and is mentioned here only for completeness.

A ROF will allow foreign keys across resources which manifest themselves as URIs to be mapped onto the underlying business object references : Resources refer to each other through URIs. The underlying business objects refer to each other through object references. Given the resource descriptions and URI mappings, the framework should allow for a transparent referencing/dereferencing between such URIs and the object references.

A ROF will allow factory methods for locating or allow injection of other resources / business objects : Within the handler functions, developers will need references to the associated resources or business objects. I say resources or business objects, since the developer may choose to interact with these at a coarse grained (resource) or fine grained (business object) level. The framework should allow the necessary support for such activities.

A ROF may provide additional support for typical aspects of lifecycle (eg. validation) : While I mentioned validate as a possible plugin function. However given the omnipresence of validations, the framework may provide additional support for such activities. Thus the framework may choose to automatically implement such capabilities using the resource descriptions.

A ROF may provide capabilities for domain specific extension of resource capabilities : Certain domains have standardised mechanisms of working with resources. As an example most banking systems based on the four eyes principle require approval activities. While this particular aspect is much tougher than it seems, a ROF may choose to allow extension of such capabilities using template like functions or mix ins. As an example in this situation, once an Order resource is defined, an OrderApproval resource will be automatically made available as will the GET and PUT methods on it (POST and DELETE in this particular case may not be relevant), as will the necessary and appropriate handler functions on OrderApproval.

A ROF will provide capabilities for automatically generating the resource representation from the resource and vice-versa : Resources manifest themselves in multiple possible formats eg. XML, JSON etc. An ROF will allow such transformations between the representation and the resource/business object instance automatically.

A ROF will provide capabilities for assembling more complex representations using templates : In many situations, especially when the representations are being composed for manual (browser based) consumption, additional resources may need to be pulled into a view. A ROF will allow for such assembly of resources to be composed into a final view using templates.

A ROF will allow for introduction of appropriate additional URIs in views using templates : Thanks to HATEOAS (I’ve really avoided it thus far :) ), the framework will need to allow some mechanism of describing what are the additional context specific URIs to be included in the final representation. The template logic should allow the developer to specify such URIs.

A ROF should allow for the resource / media-type descriptions to be shipped in band with the resource representation : Since REST allows media types to be auto discovered and auto described, the framework should allow for the metadata for such media types to be also presented to the client. While I think it is essential that such in band information should be conveyed on demand, the framework may also optionally support upfront interrogation for media types and their details, which will require such information to be shipped out of band as well. I am not aware of any specific standards around such interrogation APIs so the framework could implement a custom API for the same. The actual metadata could be represented using any of the typical appropriate standards such as RDFa, XML Schema snippets etc.

A ROF should optionally allow support for auto generation of bindings for clients : I really really cringe as I write this. I cringe because to me the great attraction of REST is the simplicity and the ease of introducing incremental integration. The client binding generation (especially if it is statically generated) flies in the face of many accepted lightweight design scenarios. However I think there are likely to be some situation where availability of such client side bindings would be helpful. When possible (eg. with dynamically typed, metaprogramming capable languages like Python or Ruby), such bindings should be dynamic. In such cases the client can automatically introspect the server side media types and make available the necessary client side objects on the fly. In cases where statically typed languages such as Java or Scala are used, the client side may choose to expose everything as generic datastructures (e.g trees of name value pairs) or may allow for generation and compilation of client side bindings. I have no specific thoughts around the API support needed on the client side, except that quite obviously this would include support for the resource construction, resource operations etc. and that they would allow the client to interact with the server using the underlying language constructs rather having to work at a raw HTTP level.

In addition to the characteristics described above, I suspect frameworks will have many other optional characteristics such as support for monitoring, auditing / logging, transaction management, object pooling etc. etc. However these are unlikely to be particularly interesting when focusing on the framework aspects especially from a resource oriented perspective, which is indeed the focus of this post.

Update : InfoQ covered this blog post here : Design Characteristics Of Resource Oriented Server Frameworks


9
Jun 09

REST is the DBMS of the Internet

After my fortunately rather successful post “Why REST ?“, I had planned to write another longish followup roughly titled “Implications of REST on software design and frameworks”. However I had an interesting exchange of Twitter DMs (direct messages) after the post which gave me the right words I was looking for to summarise this impact on software design. This simple example was so compelling, that I decided to make that into an independent post and delay the “Implications ..” post by another couple of days. So at the risk of giving away the very essence of my subsequent post, here’s the summary.

The Twitter DM’s exchanged were as follows :

@sbidwai : is REST like object oriented implementation for services, where as SOA is procedural ? thinking loud..

@dnene : a slightly better analogy would be SOA is like invoking stored procedures, whereas REST is like invoking SQL on the table

@sbidwai : agreed in parts.. but most impl will hv much more than CRUD.. eg, twitter rest apis..*

@dnene : CRUD is the interface. To extend the analogy, logic is implemented as triggers not SPs. (My Opinion)

* CRUD in our shared vocabulary stands for Create, Read Update, Delete.

As I subconsciously wrote the last DM, it suddenly dawned on me that this was the one concise way to express how REST architectures would impact software designs.

To summarise the exchange differently

If WS-* is the RPC of the Internet, REST is the DBMS of the internet

To expand on it a bit more :

Traditional SOA based integration visualises different software artifacts being able to interact with each other through procedures or methods. REST effectively allows each software artifact to behave as a set of tables, and these artifacts talk to each other using SELECT, INSERT, UPDATE and DELETE. (or if you wish GET, PUT, POST, DELETE). And where exactly is is the business logic ? Is it in the stored procedures ? Not Quite. Its in the triggers.


8
Jun 09

Why REST ?

Introduction

It is becoming apparent that even as it becomes popular, REST (Representational State Transfer) is not yet as well understood. This might seem a surprising statement, but a lot of us use REST thanks to many frameworks supporting REST like interfaces, have a sense of what REST like interfaces are like (even if such an understanding is not sufficiently accurate), and exercise our common sense in using such interfaces. Having said that, let me clarify that while the internet is full of documentation about the semantics of REST, its actually quite light on the rationale for REST (including Roy Fielding’s dissertation which is the reference document for REST). Thus I have had to intersperse REST semantics and historical narrative with some personal opinions. So treat this as a part opinion and feel free to question my thought where you think it does not make sense.

Audience

If you are a REST expert, you are likely to have figured out much of this any ways by now. If you would like to understand specific technical semantics about REST, again this may not be the best article to read. However if you are curious about REST and would like to read a perspective on why and how it makes sense read on.

Flow

I shall be meandering through a historical narrative in the first half before starting to make the points I wish to make in the second. Lot of the points I make in the first half are likely to be those you already are aware of. However these are being made to allow an immediate recall when you read the second half. It is quite important to have read the first half to understand the perspective I put together in the second.

Historical Narrative

FTP

I am sure you have used FTP (File Transfer Protocol) a few times even though nowhere as frequently as HTTP (HyperText Transfer Protocol). Let me quickly present some characteristics of FTP

  • Given a FTP client, you can connect to any FTP server so long as you have a valid userid/password pair for the server (or anonymously if the server so supports).
  • The home directory on connecting to a FTP server is typically your starting point. At this point typically you can execute the ‘ls’ or ‘List Directory’ command to list all the files and directories within the home directory.
  • If a file interests you, you can get it by issuing a get command
  • If a subdirectory interests you, you can further navigate into that directory by issuing a ‘cd’or ‘Change Directory’ (or often by double clicking on the directory in case of a graphical client).
  • If you would like to add a file to the current directory you can issue a ‘put’ or ‘Upload’ command.
  • While you have the flexibility to navigate from one directory to another, you soon realise that every file and directory is uniquely addressable by its fully qualified path (either absolute or relative) and you can refer to each file and directory by its path. You are also aware that a valid path will uniquely resolve to only one directory or file.
  • At each stage as you navigate into a separate directory, the server allows you to retrieve the list of subdirectories and files within your current directory. It always shows you the current state of that directory. Thus even if you were to list the same directory twice, and someone else uploaded a new file or created a new subdirectory successfully between the two requests, you will see the reference to the new file / directory when you request for the listing the second time.
  • At every stage you issue a command, the FTP client+server work together to service the request (or issue the appropriate error message as necessary), and then pretty much forget about what you did. In other word the server keeps no track of and shows no awareness of what you have done earlier in your session (though it does remember who you are primarily from a security perspective.
  • To work with a FTP server using a command line client, you primarily need to understand the usage of four commands (verbs) post a successful connection. These are ‘ls’ to list the contents of a directory, ‘cd’ to navigate to a different directory, ‘get’ to download a file, and ‘put’ to upload a file.

So FTP allows us to upload and download files. But does it allow us to ‘do things’ ? Sure, so long as you combine it with a few more pieces in the puzzle. Let us say, we are back in the late 80s (prior to the invention of HTTP) and I want to send and a list of purchase orders collected by a local office to a central office for further processing. This requires the following elements to be added to the mix.

  • A shared understanding of where the files will be uploaded, how they will be uniquely named, their specific file extensions (optionally) and the specific format of the file eg. Comma Separated or Lotus 1-2-3 or WordStar or WordPerfect (the popular application software of the day) including the positioning of the various fields in the file, and
  • Preferably a daemon process on the central office computer (the FTP server) which regularly scans the directory, parses each file as it comes it, does the relevant processing on it, and generates the appropriate result files and places them in the appropriate directories using the shared understanding of the directory structure and the file naming convention to communicate back the results of the processing.

Ho Hum. This was all stuff you knew. But the reason I brought it up is that FTP and how it was leveraged then has a lot to do with the principles that govern REST as we shall later see.

RPC

Back then in the 80′s FTP wasn’t the only mechanism to transfer data between machines. One more of the many other options was RPC (Remote Procedure Call). It not only allowed you to transfer data across machines, it actually had built into itself, a contract to remotely execute software. Unlike FTP which merely transferred data (in well understood units called files), RPC allowed you to invoke remote procedures by supporting an ability to pass messages which included the message name and the values for all the parameters necessary to be supplied to the message. Unlike FTP which was meant to do data transfer across a network, RPC was geared to do things remotely.

A contrast between FTP and RPC

If the objective of network computing is to use the computers and networks to ‘do things’ one would assume that many more people would use RPC than FTP for the same. While RPC did get used as a technical substrate, at a business processing level FTP got used far more (eg. to send and remotely process the list of purchase orders). There are some important reasons that we need to understand here.

FTP required understanding of very few basic verbs (ls, cd, get, put). Thus the training required to understand FTP semantics was far less than that for RPC. This was partially due to the fact that RPC had a programmatic interface. To the best of my knowledge there are no widely used command line clients for human interaction with RPC services. In addition, each procedure required a set of semantic data (parameter) associated with it. This was no different than FTP which also required similar data to be shipped over the network. Turns out there were a few distinctions. First, the nature of design of RPC services often required combining application data with control data, and there was also often a sequential expectation due to the RPC business transaction being broken up into multiple RPC calls perhaps for the sake of efficiency. Moreover each time, new procedures were added or parameters added, these required programmatic changes. FTP on the other hand was simpler. In most cases the entire data (including some redundant data perhaps) was sent in one block (or file in FTP parlance). By dealing with a file as the least common denominator, the FTP stack decoupled itself from any application specific semantics. Moreover, depending upon the agreed upon format, the files could be edited at either end by by human actors using specific software such as plain text editors or word processors or spreadsheets. Moreover if the formats changed, wherever such files were being manually edited, no programmatic changes were required. Irrespective of the changes you made to the file formats, file processing software, the FTP stack itself did not change – it remained stable.

Less is more

A theme I shall come back to again is many a times less is more. FTP had far fewer training requirements (few basic verbs). FTP did not deal with parameter value formatting (though other pieces of software subsequently might have to). FTP was just so much easier to start working with. FTP did not actually preclude any of the capabilities of RPC from being introduced, it merely allowed this to be added subsequently as additional optional layers (or subsequent elements in the processing pipeline). Finally FTP allowed users to deal with the data in the units and the formats and the tools they understood the best – their day to day application software components and simply focused on only transferring files, while imposing only one requirement – each end should work with a file as a unit, and both ends should understand the file formats. By focusing on file as a unit, each business user could focus on the data he/she wanted to deal with in the format that was most appropriate (an analogy in REST would be a resource .. but I’m getting ahead of myself). And at the end of the day, by doing less, FTP ended up being much more popular and thus doing more.

Email

Other protocols widely used on the net were SMTP / POP which were used for email. Email eventually was considered the killer app for the internet. Similar to FTP email focused on the users getting to learn only a few basic verbs and exchanging the basic unit of data transfer (messages) using these verbs. Again, even though email itself didn’t get things done, it contributed far more heavily than RPC to getting things done, by having other manual or software actors at either end of the messages who did the necessary processing required.

WWW (World Wide Web)

While email was the killer app for the internet, the one that really brought it to masses was the world wide web which was based on the HTTP protocol. While HTTP could be used to ship documents of a variety of types (often classified by their mime-types), the defacto type of document used the HyperText Markup Language (HTML). Unlike FTP and email, this required the authors to understand a new language, but used a simple markup syntax to keep the learning curve to the minimum. It however introduced a very powerful element – the embedded hyperlink. While the earlier technologies supported a uniform identifier for each document / message, the hyperlink allowed references to other documents / messages to be embedded thus converting the document pool into a document web. We now had the ability to navigate from one document to another and such navigation retained the contextual relevance by embedding the hyperlinks. There were other enhancements as well such as introduction of more verbs (eg. POST and DELETE, the latter not really being supported by any of the browsers). Allow me to state the salient points of WWW despite the obvious duplications with some of the points I listed under FTP (for the sake of emphasis). Note that the scenario I describe below is primarily describing static web serving (except to the extent of file uploads) and does not address the presence of a dynamic web application.

  • Given a web browser you can connect to any web server optionally using the appropriate authentication credentials.
  • Typically the home page of the web server is your starting point. At this point you are shown the document which usually include embedded hyperlinks to other associated documents on the web server.
  • You can get/view/download/save a document by clicking on a hyperlink pointing to the document.
  • Some web servers may be configured to allow you to browse a directory. Clicking a hyperlink pointing to the directory allows you to see a directory listing which shows all the subdirectories and documents within the current directory. Each such subdirectory or document is also shown as a hyperlink to allow you to navigate to it.
  • Some documents have an form including an embedded file field and a button which allow you to upload a new document onto the web server.
  • Each document (and directory if directory browsing is turned on) has at least one identifier which uniquely identifies the document – the URL. It is feasible to directly navigate to the document if you are aware of the URL.
  • Navigating to a different document often provides you with a different list of embedded hyperlinks which are contextually relevant to the document being viewed.
  • At each stage the web server is not aware of any other information about you apart from your authentication credentials, and is not generally aware of your browsing history (except what may be stored for audit purposes on the web server logs).
  • As a user the primary skills you need to grasp is the ability to enter a starting URL, and then being able to navigate from document to document by clicking on the hyperlinks. If you are uploading documents, you may in addition need to know how to specify a local file path and press the Submit button to upload the file.
  • While HTML documents are the defacto default, the same capabilities can be used to serve any types of documents. The server usually identifies the document types by the registered mime types, and the browser may either render the document itself or call upon the necessary add-on plugin application to render the document based on the appropriate type or may in some cases simply save the document locally in case no such application is available for further processing.
  • Usually but not necessarily the document name have the characteristics of a noun

Again the reason I listed these characteristics is that these have a tremendous commonality with those of REST (except that what I refer to as a document above may get referred to as a resource in REST parlance).

SOA : DCE, Tuxedo, CORBA, RMI

Even as the web was evolving other technologies which allowed for more sophisticated remote service invocations were being developed. Along with RPC, these were essentially different technical manifestations of Service Oriented Architecture (SOA) principles. While these are substantial developments in their own right, the relevant points to be made in the context of this article are :

  • Each SOA service supported the ability to define a set of service semantics which included the service name, the parameters to the service, an ability to expose the metadata of such semantics, an ability to leverage such metadata and invoking such services either statically or dynamically from a remote client.
  • Many services were usually expected to “do something” though quite often some services would simply return the requested data. Usually but not necessarily the services were identified by using ‘verbs’.
  • Some of the SOA services allowed maintenance of a client state on the server, and allowed the server to do processing conditional on the client state.
  • These technologies almost invariably required some kind of programmatic effort at both the client and the server end. Manual specification of the service parameters and manual invocation of the service was simply not a typical use case. Neither was a default rendering of the results easily available to be manually viewed by an end user.
  • Unlike retrieving or storing a document, these services often were expected to have a far more complex functionality.

CGI, dynamic web applications and Web Services

Clearly as WWW started getting used far more, people were only too keen to use it for much more than storing or retrieving documents. This led to the development of CGI and subsequently other dynamic web application technologies (eg. LAMP, J2EE etc.) which would allow us to use the web to ‘do something’. Since these were clearly offshoots of the SOA world, being mapped onto the WWW infrastructure, the characteristics of such dynamic applications often had a lot in common with SOA, and they started dropping many characteristics of the traditional static WWW. Thus was born the child of the world wide web and distributed service oriented architectures – web services. This led to newer SOA technologies such as WS-* and SOAP.

Like the typical scenarios after the discovery of any highly profitable opportunity, the early rush was to leverage the opportunity and it was only a little later when the dust died down, that people started wondering if they had sacrificed something in the heat and dust of the moment. That stock taking resulted in the realisation, that some of the very basic characteristics of the extraordinarily successful internet technologies (FTP / SMTP / WWW) had been diluted, and even if such dilution still allowed immediate progress to have occurred, some of them would need to be corrected to be able to continue the explosive growth that had been seen so far. One such exercise in my opinion is the laying down of the REST architecture style.

REST Semantics

While REST brings back many of the characteristics that made internet so successful back to application design, it should be noted that many of these are not precluded by Web Services or SOA. However what are mandatory characteristics in REST are in some cases missing from but in most cases quite feasible to implement in traditional (non REST) web services by using additional best practices. Also note that each characteristic is not necessarily universally superior. So do evaluate it in your context to see if it makes sense. However before we get to the benefits of REST, a quick synopsis of REST technical characteristics might be in order.

While a full description of the REST technical aspects is completely beyond the scope of this post, I summarise these below. You might notice the strong parallels between the characteristics of FTP and WWW and those of REST even as REST adds a few more capabilities. The reason I portray them in the form below in a manner quite similar to the way I portrayed the characteristics of FTP and WWW is to emphasise that REST actually continues to leverage the same characteristics that made these technologies so popular and globally scalable, even as it just adds those few minimally necessary capabilities to achieve the same scalability for not just transferring documents or rendering pages but to ‘do something’. In other words it brings together the characteristics which made the internet technologies so popular and applies them to the inter application integration, component and service orientation, and application mashup scenarios to allow them to achieve similarly large adoption and to perform the tasks necessary in the given context (or ‘do someting’ as I have continuously referred to).

REST characteristics

  • Resource and media types as the basic units : REST treats a resource as the basic unit of data transfer. Such resources could refer to anything in the particular context eg. a flight reservation, an invoice, a video etc.
  • Unique resource identifiers :REST requires that each resource have at least one identifier which uniquely identifies that resource. This makes it easy to be able to bookmark resources or make them searchable.
  • Each resource has at least one representation :Each resource can be expressed using a variety of representations. This could include HTML, XML, CSV, JSON etc.
  • Each resource has often one default manually readable representation : In most cases but not all, each resource has a representation that can be manually consumed using a web browser. Such manual representations are often either XHTML representation with associated CSS, but could theoretically be some custom representation rendered by a delivered on demand custom javascript or Java applet. Note that this is not a requrement of REST but is a practice often followed.
  • Each resource has a type. REST supports self describing media types : Each resource has a type (referred to as media type since REST refers to the resource web itself as hypermedia). The type influences the data semantics of the resource, and the type itself can be self documenting using a variety of technologies (eg. one possible way is to specify XML schema descriptors).
  • Each resource representation optionally includes contextually relevant hyperlinks to other resources :This not only allows the clients to auto discover associated resources, but also allows the server to clearly communicate the contextually relevant links based on an application state.
  • REST resources are indexable and search engine friendly : A consistent resource naming and representation allows for easy indexation and search engine integration.
  • REST requires minimal starting point intelligence : Typically one only needs the initial URL for being able to integrate with a REST implementation. All newer resources are often dynamically discovered. Since these media types also document their own metadata, client agents can automatically discover more information about them. Thus media type metadata rather than being compiled into the REST client can be dealt with dynamically or by using code on demand agents for dealing with the appropriate media type (similar to browser plugins)
  • REST encourages a uniform interface. :Typically this manifests itself by the minimal verbs being used to describe REST operations.When used with HTTP these are GET, PUT, POST and DELETE. This reduces the intelligence requirements on the client. Additionally clients may be capable of parsing metadata for the resources based on standard formats such as ATOM or XML schemas. The context specific intelligence required on the part of the client is no longer in the verbs it has to understand (method names) but is now in the resource types that it may need to manipulate. Thus if a client can deal with resource identification, resource representation, self descriptive messages and hypermedia, it can start dealing with REST.
  • REST supports value addition by intermediate processors : REST supports the scenarios where intermediate processor units can provide additional value addition. These could include processors which provide caching support or those that provide resource enrichment capabilities.
  • REST encourages usage of scalable practices :By precluding usage of conversational state and sequential assumptions, REST implementations tend to be easier to scale even as they compromise on efficiency at times (due to redundant data transfer or additional processing requirements)

REST benefits

Having described many of the REST characteristics the following could be interpreted as the benefits of adopting a REST style architecture.

  • Default RenderingIn case of most REST implementations, you can quickly provide a default HTML rendering capability. Thus even as you provide a REST interface to allow inter application integration, customers of such an interface do not have to wait for building the programmatic capabilities for leveraging it, they can get started immediately by being able to manually view all the resources and their states manually and by navigating around the interface by using a plain web browser. This substantially reduces the entry barriers for your customers, and allows them to get more conversant with your media types even as they are still figuring out how to programatically leverage the capabilities.
  • Self describing / auto discovery of media types and capabilitiesThe traditional web service semantics rely upon clear upfront documentation of media types, their schema and the API semantics. Thus the metadata about the service is often communicated ‘out of band’ from the actual service itself. This is required so that the clients can understand all the valid end points and service semantics up front before they can leverage the services. Not so with REST. Given an initial starting point, REST greatly encourages a contextual provision of the relevant additional interfaces (hyperlinks) as a part of the the document / resource data itself. Thus clients do not upfront need to be aware of all the end points (resource URIs) to be able to leverage the services. Moreover REST supports self describing media types as well. Thus the schema information for the resources can be shipped ‘in band’ with the resource representation itself. This allows for clients to discover new media types or changes to their schema and even allows the default rendering of the same without having to upgrade the programmatic components to leverage the newly discovered or modified media types / schemas. Finally the code on demand capabilities (these are optional) of REST can allow code to be downloaded to automatically parse or render such newly discovered or modified media types.
  • Encourages scalability even at the cost of efficiencyAspects such as non maintenance of conversational state, greatly increase the scalability of REST applications even if they do incur a minor cost in efficiency (which can be due to repeated redundant communication of data elements, or additional processing requirements due to preclusion of conversation state). This makes it relatively easier to set up multiple servers as the demand for the REST capabilities increases. Having said that, let me quickly add a caveat that designing clustered applications even if with REST interfaces is not always trivial, and while REST makes it “easier to scale” that should not be confused with “easy to scale”.
  • Resource / Data semantics are much easier to understand than Service semanticsTo put it differently an invoice structure is much easier to understand from a data perspective than an invoice processor API. This makes it easy for the clients. This often also makes it easier for the server side implementations. Service semantics often bring in issues of sequence, client state and other control information, most of which can be avoided using REST. Generally speaking expectations are simpler to lay down and meet when specifying resources rather than services.
  • Clear naming and accessibility of each resource in your universeWeb services don’t mandate clear unique identifier for all your resources. Thus sometimes it is not possible to reach a particular resource except through a convoluted series of steps. In some cases some resources are inaccessible for ever. As an example, many online shopping experiences end with an invoice being shown, but I have often found it impossible to later on pull up that invoice that was earlier shown to me at the end of a transaction.
  • Extensible resource types which are optionally dealt with by clientsNot only are resource types self describing, REST makes it easier to convey additional extensions to such resource types by using additional URIs within the resource representation as well. Thus even as a representation lays down a variety of field values (say for an invoice), there might be other associated resources which might either be optional or variable media types based on the context (eg. purchase order / quality report etc.) which can be easily referenced by simply including their URIs. Such additional information does not require the basic media type to be enhanced or by introducing attachments to the media types. These can be implemented as additional navigable out of band media types. Thus clients don’t have to deal with them, but they can do so easily when they choose to do so. Thus the client has a choice to not deal with the additional media types when they do not make sense in the client’s context.
  • Search engine friendlinessWhile resource directories help for smaller scale integration (eg. Yahoo when it started off, attempted to categorise the web), such directories or registries are often found to be tough to scale beyond a particular threshold (thats why Yahoo or Google now provide entry points by allowing us to search through all the web resources). Consistent resource naming and representation make REST resources search engine friendly and allow additional entry points into a REST service based on search criteria. This makes location of newer resources far easier than what might be feasible through a resource registry especially on a large scale.
  • Easer layeringWhile it is possible to add intermediate proxy services for enriching the capabilities of a REST implementation, it just makes it seem a lot easier to implement these as and when required when the underlying architecture is REST based. Thus while mashups can be readily implemented using both REST and traditional SOA implementations, I would submit that these are much easier to implement on REST based architectures.

I have used the word scalability above in the context of the ability to service the runtime demands of a larger number of clients. However REST helps makes your software artifacts become scalable in one more way. By providing a basic and minimal uniform interface requirements, REST allows your applications / services / components a low entry barrier path into being a participant in a broad web of similar others who all agree on the basic REST semantics. This substantially increases the potential number of clients to your services since they can leverage these services easily and with low entry barriers. While traditional SOA technologies attempt to provide the universal access to all possible consumers, REST with its emphasis on minimalism, simplicity and low entry barriers actually makes it practical. Similarly REST makes it easy for you to start consuming other services and mashing them up with others to service your clients (pun intended) quickly. Finally REST takes the the very characteristics that made document and message sharing so easy to use and popular (characteristics which are not necessarily found in all conventional SOA implementations) and combines them with the necessary elements to achieve transaction processing, application integration and mashups (use the web to ‘do something’) on a truly global scale even as it makes these capabilities easily available and cost effective to leverage.

Some concluding comments

While not directly relevant as REST rationale or REST benefits, I thought it might be useful to add a few more associated comments within the context of REST usage and adoption.

Simplicity and bottom up adoption

I must confess, my biases show up quite strongly in this paragraph (so feel free to treat this as a partially prejudiced statement). Simplicity is not per se a characteristic of REST. However it does stem from the nature of genesis of the competing options. While most internet technologies using an incremental, evolutionary approach, most SOA technologies have been designed by a committee. This is why the consulting and development budgets required to implement FTP / email / Web especially on a per utility basis are far different than those likely for implementing DCE, Tuxedo, CORBA and SOAP. Part of the reason is also due to the fact that most internet technology adoption is bottom up, while that of SOA often is top down. While top down may seem attractive, it may seem sobering to realise that most top down processes break down beyond a particular scale. Thats why free markets on the whole have trounced centrally planned economies (though some recent happenings do point to limitations of the same as well). Thats why internet scale simple inter application API integration and mashups took off even as intranet scale application integration was mired in budgeting, territorial, enterprise modeling and governance issues. Thats why the LAMP stack (eg. PHP) which hasn’t been particularly strong in the non web arena, is deeply entrenched in the web based application space. Sometimes it just is more productive to quickly implement a simpler technology and incrementally enhance it rather than attempting to cover all possibilities, options, and border conditions by putting a committee in place. At its very core, REST requires only incremental understanding of newer technologies, is easier to incrementally adopt and is less likely to get mired in organisational issues. Precisely the characteristics that FTP, EMail and WWW had.

Simpler abstractions win

I have generally found that simpler abstractions even though harder to deal with initially, often win in the long run. Notice the fact that the bare bones rendering functionality of HTML/WWW completely trounced the rich UI and application integration capabilities then available (eg. Windows/Java and DCOM/CORBA/RMI). This is not to suggest that the extra capabilities are not required. That is why Rich User Interfaces on WWW continue to be a dominant part of the internet technology wishlist. However the simpler, cleaner and minimalistic abstractions often are far more important than feature richness. A point I would want to make in favour of REST even as I admit that conventional SOA technologies are far more feature rich than REST.

REST is not SOA

I must confess, for a long time I believed REST was merely a specific usecase of SOA. However recent thoughts lead me to believe otherwise. There is indeed a reason for such potential confusion. REST based architectures and SOA may often attempt to service similar goals. To the extent of servicing such goals, REST may look like a substitutable component for other SOA technologies such as SOAP. However even as they attempt to meet similar goals, REST attempts to view at your architecture artifacts differently. REST encourages you to view and model your architecture as a set of resources rather than services. There are important implications of this not just in terms of the many benefits I describe above available under REST but also in terms of the design and architecture characteristics of the implementation. Treating REST as just another way to implement SOA sometimes encourages one to miss out on the subtleties. These however are beyond the scope of this post, and I intend to cover the same apart from the implications of REST on software design in my next post.


7
Apr 09

What is statelessness in REST ?

In my post Fomenting unREST : Is RESTfulness a semantics game ? Why does REST require statelessness ?, I had commented on some of my thoughts on dealing with some of the constraints prescribed by REST one of them being statelessness. Some of those thoughts continued to churn in my mind, and I had a few helpful interactions along the way which led me to what I believe to be the “Aha moment” on statelessness in REST. That doesn’t mean I’m necessarily right, feel free to comment on my thoughts in case you believe any differently or have a nuanced opinion.

Background

Per section 3.4.3 of Roy Fielding’s dissertation

The client-stateless-server style derives from client-server with the additional constraint that no session state is allowed on the server component. Each request from client to server must contain all of the information necessary to understand the request, and cannot take advantage of any stored context on the server. Session state is kept entirely on the client.

The same thought is further commented upon in section 5.1.3.

We next add a constraint to the client-server interaction: communication must be stateless in nature, as in the client-stateless-server (CSS) style of Section 3.4.3 , such that each request from client to server must contain all of the information necessary to understand the request, and cannot take advantage of any stored context on the server. Session state is therefore kept entirely on the client.

Like most architectural choices, the stateless constraint reflects a design trade-off. The disadvantage is that it may decrease network performance by increasing the repetitive data (per-interaction overhead) sent in a series of requests, since that data cannot be left on the server in a shared context. In addition, placing the application state on the client-side reduces the server’s control over consistent application behavior, since the application becomes dependent on the correct implementation of semantics across multiple client versions.

The potential confusion areas

Given the clear prohibition of storing client state on server, there are some typical idioms which do get challenged as REST unfriendly. eg.

  • Logging in to obtain a session token, which is subsequently a validation of an authentication status
  • Binding additional attributes to such a session token on the server side with variables such as :
    • Computed and cached access privileges of the user
    • Conversational state eg. fields entered on page 1 of a two page form

The question there is are all he above scenarios inconsistent with REST ? Many articles seem to suggest that that would indeed be the case. as an example one of them is Ajax and REST, Part 1 Section: Violating the “stateless server” constraint

My thoughts

Clearly in the above example computed data such as access privileges, or for that matter user’s time zone information are available to the server even if these are not in the session so long as the server has the user id. So long as a user id is being supplied (or a proxy such as the session token id), such data being stored in the session is simply a server optimisation and thus does not violate REST guidelines since this is application state and not conversational state.

To the extent the data is storing conversational state such as the fact that the user has authenticated himself and the fields that the user may have entered in page 1 of a two page form, such is incompatible with REST guidelines, and if such practices are adopted the resultant design may not be termed fully REST compliant.

So the existence of a user token does not make the design REST “uncompliant”. Storing conversational state in a user session usually associated with such a token does.

One way to ask if a data is conversational or application, may be to ask oneself, that can the request be satisfied properly if it was accidentally routed to a different server in a situation where the cluster was configured for session affinity. In a REST compliant design, the other server will have a capability to still be able to service the request. However this requires the presumption that the cluster is configured for session affinity and session state is not available to other servers. Should one configure the cluster with shared and distributed sessions, the cluster will be able to service the requests even when the APIs were not REST compliant.

The one thing that does leave me a little uneasy is that to be fully REST compliant one will need to always supply a userid / password with each API call instead of a server token. That does have implications on security, implications that I am not entirely comfortable with.

Update :Surya in a comment in earlier post refers to the fact that authentication need not be done through the URI but through the headers or through protocols such as OAuth. So what that leaves me with is the assessment that existence of a token or a session object is not a violation of REST but storage of conversational state in the same is.


17
Mar 09

Twitter / HTTP / REST API Invocation Infrastruture using data pipelines

This blog post is not about twitter API programming, though thats what it does deal with. It focuses on the intermediate level infrastructure (ie. higher than the HTTP/REST APIs exposed by other sites but lower than the class libraries that surround those) necessary to work with HTTP/REST based APIs being offered by various web sites.

I have been working on scouring and analysing twitter data for which I have been having to work with continuously accessing twitter on a sustained basis for a number of days. I started refactoring my code recently, and this blog post details the results of that exercise. More specifically in the context of invoking HTTP APIs it deals with the following aspects (many of them which are specifically introduced due to twitter.

  • Ability to make HTTP Calls and deal with HTTP error conditions
  • Ability to deal with Connection and other failures and allow for auto retries
  • Ability to make HTTP Calls in bulk in batches
  • Ability to spawn HTTP Calls across multiple threads
  • Ability to respect and deal with API Rate Limitations

I have specifically focused on creating a pipelined design which might be an interesting design aspect for many in this situation, and have primarily relied on python based generators for the same though similar functionality could be built using Iterators in other languages as well.
Continue reading →


27
Jan 09

Software / IT Terms in early stages of abuse or ripe for Abuse

Abuse : improper or excessive use or treatment (Merriam Websters)

Ahh! That is a term we haven’t found being used in IT context too often. Have we ?

Abuse in the past

Well let us start with some terms that have been used, and in my opinion abused in the past :

Web 2.0 : If there was indeed a term that actually set itself up for abuse this was it. There was nothing in it for people to refer to and hold on to as a basic and essential premise around which it gets applied. Eventually Tim O’Reilly the author of the term had to come out with a clarification What is Web 2.0, in which he said it refers to the Design Patterns and Business Models for the Next Generation of Software. Amongst the many trends he pointed out were

  • The Web as a Platform
  • Harnessing Collective Intelligence
  • Data is the next Intel Inside
  • End of the software release cycle
  • Lightweight programming models
  • Rich User Experience

He also defined the core competencies of the Web 2.0 companies

  • Services, not packaged software, with cost-effective scalability
  • Control over unique, hard-to-recreate data sources that get richer as more people use them
  • Trusting users as co-developers
  • Harnessing collective intelligence
  • Leveraging the long tail through customer self-service
  • Software above the level of a single device
  • Lightweight user interfaces, development models, AND business models

Did Web 2.0 get abused. You bet. In many ways it was too clearly defined. But inherent in the number of attributes that were associated with Web 2.0 was the risk that anything sharing but a fraction of the attributes would want to call itself Web 2.0 (given the popularity of the term). Moreover it was actually hard not to accidentally abuse it for the same reason. However people have got around that eventually and now treat Web 2.0 as simply a moniker enhancer, and then evaluate the application or company to figure out whether it indeed is Web 2.0. The overuse of the term is leading to it becoming less relevant today

Service Oriented Architecture (SOA) : While I’m pretty sure this term has been around since early 90′s and especially used in the CORBA and Tuxedo worlds, the earliest online web page I could find which described it is “Service Oriented” Architectures, Part 1” which describes it as A service-oriented architecture is a style of multitier computing that helps organizations share logic and data among multiple applications and usage modes.. A very ironic article was “BEA Fits Tuxedo With SOA Makeover” especially considering that CORBA and Tuxedo users were probably amongst the earliest proponents of the term. Instead of seeing SOAP-WS* as one more element in a continuum of possibilities, it became the “in thing” which got weighted down by so many promises, expectations, goals and so much scope creep to its perceived capabilities. Herein lay the seeds of the disappointment one sees today especially in situations when it is doing rather well.

Lifecycle of abuse :
The following are the stages through which a term that is subject to abuse goes through :

  • Initial formulation of seed thought and early implementations or practices.
  • Further refinement of thought into a notion or hypothesis that has been tested and refined through a few experiences.
  • Success visibility by neighbours leading to greater chatter along with active promotion of the term and its benefits.
  • A positive spiral of both field successes and good PR leading to a strong mindshare
  • High mindshare attracts a large and diverse community of “second stage thought movers”
  • Desire to brand or market oneself in terms of using that term to sound “in” or “leading edge”
  • Attempts to reevaluate the terms meanings in one’s own context to achieve the best fitment leading to a term meaning modification (either newer attributes being introduced or existing attributes associated with the term getting diluted)

A term’s essential value proposition is that it manages to convey one sometimes composite and sometimes complex notion succinctly. Yet successful people, successful companies, and successful terms act as great magnets. Once a term is successful, everyone attempts to use it in their own context. This is when the term starts becoming different things to different people. Thats when it goes down the Blind men and an elephant path. Thats when the term loses its core value proposition, and people using it get steadily more confused over a period of time. One way to avoid this disintegration is to be aware of the process and attempt to slow it down by drawing attention to the core notion and value proposition of the term.

Similar to the terms above, I believe other terms are now getting ripe for being abused. However sometimes abuse leads to improper expectations and sometimes even to “<Term> is Dead” which is actually an indication of the terms abuse dying rather than its basic set of offerings which continue to do well. A couple of times I have blogged about things *not* dying are SOA ain’t dead but it certainly is transforming and Java : the perpetually undead language.

I do see many terms in Software development and IT that are in the stage where they are being abused or are starting to be abused. So before we have to bury them eventually for not meeting the expectations such abuse creates, can we rein in the abuse. I doubt it, but this post is an effort in that direction.

REST : This one theoretically should be the least likely to be abused. Simply because it is so well defined in Roy Fielding’s Dissertation. But even he had to eventually write in his post REST APIs must be hypertext-driven.

I am getting frustrated by the number of people calling any HTTP-based interface a REST API. Today’s example is the SocialSite REST API. That is RPC. It screams RPC. There is so much coupling on display that it should be given an X rating.

What needs to be done to make the REST architectural style clear on the notion that hypertext is a constraint? In other words, if the engine of application state (and hence the API) is not being driven by hypertext, then it cannot be RESTful and cannot be a REST API. Period. Is there some broken manual somewhere that needs to be fixed?

Strong words indeed. And if you think if these were against some small application, the social site REST API is the OpenSocial REST API which is a rather prominent standard (and its kind of hard to imagine google and other participants wouldn’t realise the distinction). Yet the underlying shift is that for internet based applications people are increasingly relying on simple HTTP based API and since perhaps HTTP is too bland a name to term an API with, they go with the sexier sounding REST. Not really necessary. If I have an HTTP API, its all right to call it an HTTP API. Another term that did start getting used alternatively was POX/HTTP (Plain Old XML over HTTP) but the slight issue with this term is that JSON is starting to be another data payload vehicle so POX doesn’t necessarily cut it universally.

There is another issue as well with using REST as the moniker to describe HTTP based APIs. Roy Fielding eloquently points out that REST is so different from RPC. Yet in a logical sense, many requirements for HTTP APIs are very RPC like. It is possible to represent such an API using REST semantics (instead of user/delete/123 call it userdeletion/create/123 .. a name mangling trick). Moreover REST requires statelessness, a requirement that not all applications necessarily comply with. I am not suggesting they should, just that should they choose to not be completely stateless it would be difficult to term their APIs REST. These were both issues I had referred to in my earlier post Fomenting unREST : Is RESTfulness a semantics game ? Why does REST require statelessness ?.

Clearly REST is not going to cut it as a universal mechanism of describing simple HTTP based APIs. I would suggest call them HTTP APIs, or perhaps think of a better term which is not REST since it seems rather likely that many of the HTTP APIs we shall develop may simply not meet the requirements or expectations of REST. The more we use REST the way it should be, the less we are likely to abuse it. So may I request that we use REST only when the API adheres to all the expectations as specified by Roy Fielding ?

Cloud :

I am certain the term cloud began thanks to the abstraction of internet that got used in tons of network topology diagrams. So it perhaps referred to the segments of the network that formed the entire plumbing of the internet and all the external network segments which were not relevant to the topology under discussion, and to the applications and services that resided on these “external” network segments (primarily the internet). However further along the way additional attributes started getting implied. An example is the rapidly scalable infrastructure (the scale on demand attribute). This blog is hosted on a pre specified hardware by a web hosting company. Is it on the cloud ? Its a little unclear (for certainly it does not have the scale on demand attribute). Occasionally implied attributes are multi-tenancy. IMO thats stretching it too far, multi tenancy is an optional attribute of applications and services that can be hosted on the cloud, it isn’t the attribute of the cloud itself. It would be better to keep the scale on demand attribute away from the cloud term and use it as an additional qualifier (eg. elastic cloud). However one of the effects of that is that it is leading to a proliferation of cloud related terms which is if you pardon my pun, making the situation a little too cloudy. Some of the terms are for lack of a more civilised response quite amusing eg. “Internal Cloud”. I mean if cloud represents the external segments of the network and the application and services that are hosted on them, why would one call the internal network the internal cloud as if the external network would be the external cloud. But then whats the difference between the network and the cloud. Or does the network becomes the cloud or should it be that the cloud becomes the network. This highly avoidable tendency of clouding everything also has a cloudy term for itself cloudwashing – slapping the word “cloud” on products and services you already have. See the point ? Having said that, I still think it is preferred to have a proliferation of qualified terms rather than a proliferation of contextually implied attributes into a single term itself.

So can we keep cloud restricted to the “external network segments and the applications and services hosted on them” ? Thank You. And lets keep out an eye on all the additional terms that further qualify cloud. Many of them might not make sense, so just decloud them.

*****-as-a-service :

It all probably began with Software as a Service (Saas). Saas rocks and is a term which is catching on. But it is also now getting a ton of other “as a service” followers. Just google for this and the following words crop up as the first part of -as-a-service : platform, desktop, hardware, desktop, sql, data, ui, infrastructure, PC, research, os, malware, identity management, analytics, middleware … get the picture ? Now in this current global economy where more than a third of the economy is based on services, just imagine the sheer proliferation of terms that are likely to happen simply because someone doesn’t say I offer hosted middleware, no it is middleware as a service. This blog will recast itself as Gratis chronological opinion as a service. People are going nuts, and there’s no one to stop them. In the meanwhile whats happening to the one which started it all – SAAS ? Theres something called Level 1 SAAS where each customer has its own customized version of the hosted application and runs its own instance of the application on the host’s servers. I have read at least one blog post which refers to ISVs needing to customise Saas offerings for individual enterprise customers. So rather than newer implied attributes being added to the term Saas, existing ones are being removed in a manner where any outsourced software hosting will be Saas. Pretty soon, a business unit which outsources software development, maintenance and hosting to another intra corporate business unit will start calling it a Saas offering to keep up with the pressure to seem current. Pretty soon all software will tend to Saas.

Can we please keep Saas restricted to uncustomized offerings made by other service providers which essentially rent out the use of the software they’ve built or own on an as-is basis (with version upgrades from time to time of course) ?

Mashups :

Lets just for a moment look at how wikipedia describes Mashups.

a mashup is a web application that combines data from more than one source into a single integrated tool.

I see a number of enterprise vendors starting to offer Mashup products. That in itself is welcome. I however also anticipate a full wave of rebranding of Integration Services, SOA, Portal Servers into Mashups in addition to the few new mashup features being introduced. Also expect such buzzwords to drive newer budgets and customer excitement even though relatively little has changed under the covers. Eventually the term could become all encompassing where it covers all forms of system integration and service aggregation which will eventually make it useless as a term. In the meanwhile authors of existing Web 2.0 Mashups built on Google Maps, Flickr and Twitter will be left wondering, “Hmm.. was this was mashups was all about ?”


Agile : I don’t know that its terribly abused, but when I see presentations that combine elements of Agile with layered sequences (eg documented design followed by review followed by development followed by testing) and heavy management control procedures and call it the right way of doing agile – it smacks of abuse to me. Thats like taking a cheetah, cladding him with medieval knight armour, and saying ain’t it a very nice and strong fastest animal on earth ? I don’t think it is feasible to adapt a heavy process to agile or vice-versa to somehow take on the attribute of being agile. Agile requires a migration from processes which encourages repeatability through documentation, division of work and control into one which achieves high productivity and quality through capability enhancement. This requires a reduction in the control processes and an increase in the capabilities training in a manner where the quality and control is not maintained through processes built around a replaceable workforce, but instead through a set of (relatively) informal and frequent communication channels built around a increasingly capable team.

A lighter parting note.

Finally on a lighter note a friend suggested another term that is being abused. “Software engineering – a Term that should never have been applied to software and insults engineering.” I agree even though I know the comment probably stemmed from the usage of this word in this blog’s tagline. While Software Craftsmanship has been discussed for long, I find many software building efforts not worthy of being called craftsmanship. Therefore I contribute a new term to the lexicon – Software Construction And Maintenance. This is one term (or its acronym) I am not concerned of getting abused. In fact it is unlikely to get used.