Tag: nosql

Five important trends on the enterprise architect’s radar

Posted by – November 2, 2009

It is no secret that internet architectures are influencing enterprise architectures. This post attempts to summarise some of the recent trends in the internet space which seem to carry enough momentum to influence the enterprise. So without further ado, these trends are :

  1. REST : The Representational State Transfer architectural style builds on the essential elements of those constructs which made the internet so globally scalable. A detailed explanation of the rationale and strengths of REST is beyond the scope of this article. If your job requires you to be continuously aware of emergent trends and whether they fit your enterprise architecture needs – this is the one must-explore trend.

    Impact : Web based architectures, Service Oriented Architectures, wide availability and immediate usability of data and processing requests (resources) through simple HTTP URIs and minimal integration effort

  2. Interoperable Cloud : The interoperable cloud is the ability to create a private cloud and also leverage a public cloud. This has been made possible by offerings such as the Ubuntu Enterprise Cloud which allows you to build a private cloud or use a public cloud such as Amazon EC2 while being able to access them using the same set of APIs thanks to open source efforts such as Eucalyptus. This allows you the flexibility of initially using either a private or public cloud and then subsequently shifting to the other, or being able to use both simultaneously.

    Impact : Large servers vs cluster of commodity servers, virtualisation, elastic deployments, flexible hardware procurement / provisioning, infrastructure management in organisational hierarchy.

  3. NoSQL : While I am unhappy with the name, it has stuck. This refers to a set of options now available to store your data unconstrained by many RDBMS requirements (eg. with a flexible schema, as key value pairs etc.). Some of these databases also allow you to store data in a distributed manner over a number of servers with an intent to support high availability in write intensive scenarios, even as they may require you to move towards eventual consistency. These options increase your manoeuvrability / flexibility as an architect even as they require you to meet a different set of challenges.

    Impact : Relational databases, data storage strategies, data distribution strategies, vertical vs. horizontal scalability, transactionalisation, consistency and availability

  4. Polyglotism : Developer costs now occupy an increasing percentage of total costs, development time is becoming an increasingly dominant factor in time to market, and the ability of software to change and adapt quickly to newer demands is now a critical success metric. One of the solutions is to write different parts of the software in different languages, each chosen for concise and rapid coding and for quick reaction to changes in that part. Thus it is conceivable to have some of the business rules written in a DSL built using JRuby and some of the algorithms written in Clojure in a software built on the JEE platform.

    Impact : Development culture and processes, minimum developer skill and scalability, risk management for managing required vs. available skills.

  5. Decentralised processing : Thanks to many developments which are leading to increasingly distributed processing, including REST and NoSQL, applications will need to be a set of collaborating network based components (we’ve heard this before with distributed objects as well). However, given the lesser guarantees that such architectures can provide around immediate guaranteed processing, latency, distributed control and asynchronous processing, a particular piece of business logic may get satisfied in a staggered fashion across a number of collaborating components. This may increase challenges in terms of currency of available data even as it helps actually deliver on the vision of distributed objects and simplifies individual component development. While asynchronous capabilities such as those supported by MQSeries and the like have been used in the enterprise for ages, I do anticipate increasing use of lighter messaging constructs such as PubSubHubbub within the enterprise.

    Impact : Application partitioning, network based components, difficulty in supporting fully synchronous workflows.
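
To make the "staggered fashion" point concrete, here is a minimal sketch in Python of two components collaborating through publish/subscribe. Everything in it is an assumption made for illustration: the `Broker` class, the topic names and the order fields are hypothetical, and the in-process queue merely stands in for a real messaging system such as MQSeries or a PubSubHubbub hub.

```python
from collections import defaultdict, deque


class Broker:
    """Toy in-process publish/subscribe hub (hypothetical, for illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks
        self._pending = deque()                # messages accepted but not yet delivered

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Publishing only enqueues; the publisher never waits for subscribers.
        self._pending.append((topic, message))

    def deliver_one(self):
        """Deliver the oldest pending message; return False if none is pending."""
        if not self._pending:
            return False
        topic, message = self._pending.popleft()
        for callback in self._subscribers[topic]:
            callback(message)
        return True


broker = Broker()
stock = {"widget": 10}
invoices = []


def reserve_stock(order):
    # Inventory component: updates its own data, then announces the result.
    stock[order["item"]] -= order["qty"]
    broker.publish("stock.reserved", order)


def raise_invoice(order):
    # Billing component: reacts only after stock has been reserved.
    invoices.append({"item": order["item"], "qty": order["qty"]})


broker.subscribe("order.placed", reserve_stock)
broker.subscribe("stock.reserved", raise_invoice)

broker.publish("order.placed", {"item": "widget", "qty": 3})
snapshot_before = (stock["widget"], len(invoices))  # order accepted, nothing processed

broker.deliver_one()
snapshot_mid = (stock["widget"], len(invoices))     # stock reserved, invoice still pending

broker.deliver_one()
snapshot_after = (stock["widget"], len(invoices))   # both components have caught up
```

A reader who looks at the data between the two deliveries sees the stock already reduced but no invoice yet, which is exactly the data currency challenge mentioned above.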

Stop calling me NoSQL

Posted by – October 23, 2009

Dear Reader,

Apologies for sending this note to you completely unannounced and out of the blue. However, I find myself in the peculiar situation of having a very weird name dumped upon me. While I am indifferent to the name per se, I am greatly pained as I realise that it is a completely inappropriate name. What is even more confounding is that the very bunch of people who have happily assigned me the name and continue to popularise it belong to that class of people, some of whom are extremely particular about accurate nomenclature and have no hesitation in creating a 100 letter class or function name by concatenating 20 words just to make sure the name is unambiguous and conveys the intent clearly.

Ahh.. but I digress and impose upon you without introducing myself adequately first. I am a data storage style. I am not new, but lately a great many software engineers have started taking a liking to me. Ever since I have been around, I have with great amounts of jealousy watched my cousin the RDBMS being courted by the finest of engineers (in all honesty there were some fine engineers interested in me too, but far too few compared to my cousin). But lately multiple concurrent developments have made a fair amount of attention come my way too.

You see, unlike RDBMS, I don’t require that data be clearly split into tables, columns and rows. I can work with data the way it is most naturally represented: as a tree of individual data fields, lists, arrays, dictionaries etc. Also I do not require that you always clearly define each and every possible schema element before being able to store data corresponding to the schema. I can happily accept a schema dynamically or even work without a schema. Some of my early forms were based on key value pairs stored as B-Trees (eg. Berkeley DB). Over the years people have figured out ways to represent the data as a set of decomposed document elements, store data spread across a cluster, replicate it for better availability and fault tolerance, and even perform post storage processing tasks using map-reduce sequences. But really what separates me from my cousin and other storage systems is that I don’t make demands on the data – I take it in its naturally found form and then store it, replicate it, slice it, dice it and glean information out of it. And therein lies my true identity – I will work with data the way the data is best represented, with all its arbitrary inconsistencies and inabilities to always clearly specify a constraining schema. And the engineers who’ve spent time with me seem to have enjoyed it quite a bit.
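
If it helps, here is a minimal sketch in Python of what spending time with me can look like. The toy `store`, the sample records and the helper names are all invented for illustration and do not correspond to any particular engine. It only shows two differently shaped records being accepted as they are, followed by a small map-reduce pass over them.

```python
import json
from collections import defaultdict

# A toy "store": keys map to JSON text, so no schema is declared up front.
store = {}


def put(key, document):
    store[key] = json.dumps(document)


def get(key):
    return json.loads(store[key])


# Two records of the same kind with different shapes; both are accepted as-is.
put("person:1", {"name": "Asha",
                 "phones": ["555-0100", "555-0101"],
                 "address": {"city": "Pune", "country": "IN"}})
put("person:2", {"name": "Ben", "tags": ["vip"]})  # no phones, no address


def map_phone_status(doc):
    """Map step: emit (key, value) pairs for a single document."""
    yield ("has_phone" if doc.get("phones") else "no_phone", 1)


def reduce_sum(values):
    """Reduce step: combine all values emitted under one key."""
    return sum(values)


def map_reduce(map_fn, reduce_fn):
    grouped = defaultdict(list)
    for raw in store.values():
        for key, value in map_fn(json.loads(raw)):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


counts = map_reduce(map_phone_status, reduce_sum)
city = get("person:1")["address"]["city"]  # nested data read back as a tree
```

Note that the missing fields in the second record needed no schema change; the map step simply has to cope with their absence.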

But the horror of it – they gave me a completely inappropriate moniker – ‘NoSQL’. First and foremost I exist to promote a storage style and that is what identifies me. I work with data in its natural and arbitrary forms. Therefore to make it seem like I represent a lack of something else is utterly missing the point. The SQL in NoSQL stands for Structured Query Language, which depends upon Fixed Structure Relational Data. Since I change the very nature of the data being stored, the fact that SQL is not required or relevant follows automatically and is inconsequential.

It’s like calling an under-the-ocean mountain range NoIgloo. It’s dead obvious igloos will not be found there. But calling that mountain range NoIgloo is a big disservice to visitors. You use that as a marketing term, attract people, then tell them that NoIgloo actually has nothing to do with igloos – it’s got to do with mountains and oceans, and that they need to first unwind all the confusion NoIgloo created in their minds and then go through a phase of reunderstanding mountains and oceans. And while they came prepared for a possibly warmer place given the name NoIgloo – it actually is a wet place, so they need to again change their garments and equipment for the journey. A wholly avoidable situation.

Update: Brad Anderson pointed out this interesting post, NoSQL: A Modest Proposal, which traces the genesis of my name and leaves me very very disappointed. It almost seems to suggest that people are flocking together and naming me not based on something inherently powerful about me – but as a mechanism to demonise my cousin RDBMS. This is most unfortunate, since we actually end up being useful in very different situations and more often than not are likely to complement each other rather than compete with each other. I do hope a better moniker does prevail over time.

What I would like is to see a better / more appropriate name for me. Hmm .. call me free form storage, natural persistence or flexi schema storage or perhaps something else even more appropriate (this blog owner prefers “natural persistence”). Each of these conveys far more about me far more accurately than NoSQL does. Basically please please call me something better than NoSQL. So can I request you to carry forward my plea by further forwarding and retweeting this to your friends and ask them how they can so callously call me by such a silly name when they take the utmost precautions in properly naming their classes and methods. Plead with them to stop doing this and please work with others to give me a better name. I think it will cause less confusion over the coming months and years, and the field of software shall recover its glorious tradition of maintaining precision in communication by using accurate naming.

Sincerely,
The one who doesn’t want to be called NoSQL

PS : As a background to this there was an interesting conversation earlier today between this blog owner dnene and Kent Beck on twitter, where Kent so kindly and graciously helped carry forward the thought process of helping identify my essential characteristics, and it is in no small part, thanks to this conversation that I was able to articulate myself and my grief. I reproduce that conversation below. (Update: though in all likelihood Kent’s intent was to help clarify the thought rather than contest the names. In hindsight, it makes sense to ask for permission to reproduce conversations .. even when such are on the public twitter stream – something that wasn’t done in this case. :( )

Twitter ID : Tweet / Message
dnene : NoSQL is such an inappropriate name. NoTables at least makes a little more sense.
KentBeck : @dnene but what would nosql be called if you wanted to say something positive about it?
dnene : @KentBeck Thats a great question .. still thinking .. best thought so far – FlexiStore (though not good enough yet :( )
KentBeck : @dnene what can you do with a nosql store that you can’t do with an sql database? why would you be excited to use one?
dnene : @KentBeck I see where u r going with this (a) unconstrained & composite storage (b) store resources not records (c) shard/scale horizontally
dnene : .@KentBeck I think there is a merit in attempting to define nosql in terms of what it is rather than what it isn’t
KentBeck : @dnene there are many more people confused about what datastore to use than who hate sql. the positive approach appeals to the former.
dnene : @KentBeck Agreed .. and I’m aware of many more who wonder why we need a different datastore than the RDBMSs. NoSQL as a name doesn’t help.
KentBeck : @dnene well, why *do* we need a different data store?
dnene : @KentBeck Primary Need : We need support for flexible/arbitrary schemas with complex depths – RDBMSs don’t dance well in this space.
dnene : @KentBeck Secondary Need : Support for deferred processing required for analytics (eg. Map/Reduce). RDBMS don’t do too bad a job here
dnene : @KentBeck Tertiary Need (not one that I’ve felt strongly yet) : Distributed and horizontally scalable storage on commoditized h/w.
KentBeck : @dnene it seems like you’re looking for realistically structured data, not data twisted to fit a formula convenient for mathematicians.
dnene : @KentBeck Yes.. thats it! I’m looking for realistically or naturally structured data storage / persistence. Rocks compared to the term nosql
dnene : @KentBeck Wonder if the term arbitrarily structured makes sense as well. This has been one heck of a conversation/Q&A so far +1:)
KentBeck : @dnene glad you found it helpful. you get bonus points if the opposite of the name you pick is unattractive, a la “structured programming”

NoSQL – A fluid architecture in transition

Posted by – October 21, 2009

There is a lot of talk about NoSQL. Much of it well deserved. And while a lot of the excitement around it is well understood by those in the know, some of it may actually be confusing to those who are relatively new to the matter. This post is actually for the latter group – not to argue for or against NoSQL – just to put it in perspective.

What is NoSQL : Some of the characteristics shared by most if not all the NoSQL engines are as follows :

  1. Schemaless or Hierarchical Schema Storage : NoSQL assumes at its very basis a schemaless or a hierarchical schema storage system. In most cases this consists of a simple key value pair storage. While some storage engines excel at storing small values (LightCloud/Tokyo Cabinet), some are strong at storing large documents (CouchDB).
  2. Distributed storage : This is one of the driving forces of NoSQL growth, though not a distinguishing characteristic of NoSQL. One of the areas in which these storage systems separate themselves from RDBMS’s is their ability to allow better horizontal scalability. This varies from the simple master-master replication for MongoDB, to multi node sharding using consistent hashing with LightCloud (a la memcached) to a multiple master eventual consistency model of Riak. The basic premise in using some of the NoSQL engines is that storage will scale horizontally.
  3. Support for deferred processing : Many of these engines allow for some degree of deferred processing. Whether this be simple Lua scripting in case of LightCloud or map-reduce scripts in case of CouchDB, the general assumption is that some amount of latency in computation times is acceptable, and some of the computations (especially related to analytics based views) will be performed post storage.
  4. Eventual Consistency : This may seem like a necessary feature of all NoSQL storage systems but it isn’t. While clearly some such as CouchDB (in terms of its map-reduce views) and Riak are better placed for supporting and implementing eventual consistency, it is quite feasible to use others such as LightCloud or MongoDB to implement immediate consistency using a single master-master pair. Suffice to note that eventual consistency is not a necessary side effect of using a NoSQL storage system, though it wouldn’t be incompatible for the two to work together.
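
For readers new to the idea, the consistent hashing mentioned under distributed storage can be sketched in a few lines of Python. This is a generic illustration under my own assumptions, not the actual implementation used by LightCloud or memcached clients: the `HashRing` class, the node names and the choice of SHA-256 with 100 virtual points per node are all hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (SHA-256 used only for its spread)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent hash ring with virtual points per node (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        self._replicas = replicas
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed at several positions to even out the key spread.
        for i in range(self._replicas):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key):
        # First ring position clockwise from the key's hash, wrapping around.
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[index]]


keys = [f"user:{n}" for n in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {key: ring.node_for(key) for key in keys}

ring.add("node-d")  # scale out by one server
after = {key: ring.node_for(key) for key in keys}

# Keys whose owner changed when the new node joined.
moved = [key for key in keys if before[key] != after[key]]
```

The point of the technique is visible in `moved`: only a fraction of the keys change owner when a node is added, and every one of them moves to the new node, whereas a plain `hash(key) % number_of_nodes` scheme would reshuffle most keys.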

But the points I would really like to emphasize are :

  • NoSQL is not a direct competitor to RDBMS/SQL : It is actually a solution to many use cases where using RDBMS was perhaps a poor fit. Thus the decision for an architect is not which of the two competing options (RDBMS or NoSQL) should be the preferred standard storage strategy – it simply is which one is the more appropriate storage system for the application under consideration.
  • NoSQL is still at a fluid stage of its development : All the NoSQL storage systems (but for LightCloud/Tokyo Tyrant) are still being quite actively developed. These have not reached v 1.0 (Update: MongoDB is at v 1.1) and it is likely that some time will pass before any of these get beyond the beta and release candidate stage and get a 1.0 in-production stamp. While there is a lot of interest, there still is a substantial amount of experimentation in terms of the right feature sets, leading to differently focused developments in different storage systems. To an architect this represents an interesting challenge. I think the way to approach this right now is to not use these in mission critical (eg. life or health impacting) systems, and to focus on reasonable expectation management in terms of ensuring the right kind of SLAs around their availability (simply because many of these haven’t yet been put to intense use in production the way say an Oracle or MySQL have been). This is not an attempt to spread “FUD” about NoSQL. Far from it: it is an exercise in setting appropriate expectations. I would also recommend evaluating the available NoSQL choices only when reasonable SLAs can be worked out for their usage. It is certainly preferable to use a NoSQL store rather than to use an RDBMS in an inappropriate manner (large objects serialised into BLOBs or into name-value pair tables). However, I would suggest that you do not deeply bind yourself to a particular NoSQL engine. The future development of most of the storage systems is still unknown to a certain extent, as is the future landscape including any shakeouts. Should one recommend usage of a NoSQL engine, it is important to have a clear plan for switching over to an alternative engine should the need arise in the future. While this is easier said than done, deciding the appropriate level of abstraction to use (ie. code to the API directly vs. use a layer of abstraction for engine independence) is best left to the designer / architect to dwell upon.
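
As a rough illustration of the "layer of abstraction" option, here is a minimal Python sketch. The names `KeyValueStore`, `InMemoryStore` and `SessionRepository` are hypothetical and chosen only for this example; a real adapter would wrap the client library of whichever engine is selected, and the right width for the interface is very much a judgment call.

```python
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """The narrow interface the application codes against (hypothetical)."""

    def get(self, key: str) -> Optional[bytes]: ...

    def put(self, key: str, value: bytes) -> None: ...

    def delete(self, key: str) -> None: ...


class InMemoryStore:
    """Stand-in engine; a real adapter would wrap an engine's client library."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class SessionRepository:
    """Application code: depends only on the interface, never on an engine."""

    def __init__(self, store: KeyValueStore) -> None:
        self._store = store

    def save(self, session_id: str, user: str) -> None:
        self._store.put(f"session:{session_id}", user.encode("utf-8"))

    def user_for(self, session_id: str) -> Optional[str]:
        raw = self._store.get(f"session:{session_id}")
        return None if raw is None else raw.decode("utf-8")

    def end(self, session_id: str) -> None:
        self._store.delete(f"session:{session_id}")


repo = SessionRepository(InMemoryStore())
repo.save("abc123", "dnene")
```

Switching engines then means writing one new adapter class rather than touching `SessionRepository`. The trade-off is that engine specific features (map-reduce views, server side scripting and so on) stay out of reach unless the interface is widened.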