Stop calling me NoSQL

Dear Reader,

Apologies for sending this note to you completely unannounced and out of the blue. However, I find myself in the peculiar situation of having a very weird name dumped upon me. While I am indifferent to the name per se, I am greatly pained to realise that it is a completely inappropriate one. What is even more confounding is that the very people who have happily assigned me the name, and continue to popularise it, belong to a class of people some of whom are extremely particular about accurate nomenclature, and who have no hesitation in creating a 100-letter class or function name by concatenating 20 words just to make sure the name is unambiguous and conveys the intent clearly.

Ahh.. but I digress and impose upon you without first introducing myself adequately. I am a data storage style. I am not new, but lately a great many software engineers have taken a liking to me. For as long as I have been around, I have watched with great jealousy as my cousin, the RDBMS, was courted by the finest of engineers (in all honesty there were some fine engineers interested in me too, but far too few compared to my cousin). But lately several concurrent developments have brought a fair amount of attention my way too.

You see, unlike the RDBMS, I don’t require that data be clearly split into tables, columns and rows. I can work with data the way it is most naturally represented: as a tree of individual data fields, lists, arrays, dictionaries etc. Nor do I require that you define each and every possible schema element before you can store data corresponding to that schema. I can happily accept a schema dynamically or even work without one. Some of my early forms were based on key-value pairs stored in B-trees (e.g. Berkeley DB). Over the years people have figured out ways to represent the data as a set of decomposed document elements, store data spread across a cluster, replicate it for better availability and fault tolerance, and even perform post-storage processing tasks using map-reduce sequences. But really, what separates me from my cousin and other storage systems is that I don’t make demands on the data – I take it in its naturally found form and then store it, replicate it, slice it, dice it and glean information out of it. And therein lies my true identity – I will work with data the way the data is best represented, with all its arbitrary inconsistencies and its inability to always clearly specify a constraining schema. And the engineers who’ve spent time with me seem to have enjoyed it quite a bit.
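
To make that a little more concrete, here is a minimal sketch in Python of storing data "in its naturally found form". Everything in it is an assumption made for illustration: the `NaturalStore` class and its `put`, `get` and `find` methods are hypothetical names, not the API of any real product, and a plain in-memory dictionary stands in for the B-tree or cluster a real store would use.

```python
import json


class NaturalStore:
    """Hypothetical in-memory key-value store for tree-shaped documents."""

    def __init__(self):
        # Key -> JSON text. A real store would persist and replicate this.
        self._data = {}

    def put(self, key, document):
        # The whole tree is stored as-is; no schema is declared up front.
        self._data[key] = json.dumps(document)

    def get(self, key):
        return json.loads(self._data[key])

    def find(self, predicate):
        # Return the keys of all documents for which predicate(document) is true.
        return sorted(
            key for key, raw in self._data.items() if predicate(json.loads(raw))
        )


store = NaturalStore()

# Two documents with different shapes live side by side.
store.put("post:1", {
    "title": "Stop calling me NoSQL",
    "tags": ["nosql", "naming"],
    "author": {"name": "dnene"},
})
store.put("post:2", {
    "title": "Untitled draft",
    "comments": [{"by": "manas", "text": "hi"}],
})

# Query by a field that only some documents have.
tagged = store.find(lambda doc: "naming" in doc.get("tags", []))
```

The two documents never had to agree on a set of columns, and the query simply tolerates documents that lack a `tags` field. That tolerance is the design choice being illustrated; the cost is that any structure the application relies on has to be checked by the application itself.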

But the horror of it – they gave me a completely inappropriate moniker – ‘NoSQL’. First and foremost, I exist to promote a storage style, and that’s what identifies me. I work with data in its natural and arbitrary forms. Therefore, to make it seem like I represent a lack of something else is to miss the point entirely. The SQL in NoSQL stands for Structured Query Language, which depends upon fixed-structure relational data. Since I change the very nature of the data being stored, the fact that SQL is not required or relevant follows automatically and is inconsequential.

It’s like calling an under-the-ocean mountain range NoIgloo. It’s dead obvious that igloos will not be found there. But calling that mountain range NoIgloo does a big disservice to visitors. You use it as a marketing term, attract people, then tell them that NoIgloo actually has nothing to do with igloos – it’s got to do with mountains and oceans – and that they first need to unwind all the confusion NoIgloo created in their minds and then go through a phase of re-understanding mountains and oceans. And while they came prepared for a possibly warmer place given the name NoIgloo, it actually is a wet place, so they need to change their garments and equipment again for the journey. A wholly avoidable situation.

Update: Brad Anderson pointed out this interesting post, NoSQL: A Modest Proposal, which traces the genesis of my name and leaves me very, very disappointed. It almost seems to suggest that people are flocking together and naming me not for something inherently powerful about me, but as a mechanism to demonise my cousin, the RDBMS. This is most unfortunate, since we actually end up being useful in very different situations and, more often than not, are likely to complement each other rather than compete with each other. I do hope a better moniker prevails over time.

What I would like is to see a better, more appropriate name for me. Hmm .. call me free form storage, natural persistence or flexi schema storage, or perhaps something else even more appropriate (this blog owner prefers “natural persistence”). Each of these conveys far more about me, far more accurately, than NoSQL does. Basically, please, please call me something better than NoSQL. So may I request you to carry my plea forward by forwarding and retweeting this to your friends, and to ask them how they can so callously call me by such a silly name when they take the utmost precautions in properly naming their classes and methods? Plead with them to stop doing this and to work with others to give me a better name. I think it will cause less confusion over the coming months and years, and the field of software shall recover its glorious tradition of maintaining precision in communication by using accurate naming.

Sincerely,
The one who doesn’t want to be called NoSQL

PS: As background to this, there was an interesting conversation earlier today on Twitter between this blog owner, dnene, and Kent Beck, where Kent kindly and graciously helped carry forward the thought process of identifying my essential characteristics. It is in no small part thanks to this conversation that I was able to articulate myself and my grief. I reproduce that conversation below. (Update: though in all likelihood Kent’s intent was to help clarify the thought rather than contest the names. In hindsight, it makes sense to ask for permission to reproduce conversations .. even when such are on the public twitter stream – something that wasn’t done in this case. :( )

Twitter ID: Tweet / Message
dnene: NoSQL is such an inappropriate name. NoTables at least makes a little more sense.
KentBeck: @dnene but what would nosql be called if you wanted to say something positive about it?
dnene: @KentBeck Thats a great question .. still thinking .. best thought so far – FlexiStore (though not good enough yet :( )
KentBeck: @dnene what can you do with a nosql store that you can’t do with an sql database? why would you be excited to use one?
dnene: @KentBeck I see where u r going with this (a) unconstrained & composite storage (b) store resources not records (c) shard/scale horizontally
dnene: .@KentBeck I think there is a merit in attempting to define nosql in terms of what it is rather than what it isn’t
KentBeck: @dnene there are many more people confused about what datastore to use than who hate sql. the positive approach appeals to the former.
dnene: @KentBeck Agreed .. and I’m aware of many more who wonder why we need a different datastore than the RDBMSs. NoSQL as a name doesn’t help.
KentBeck: @dnene well, why *do* we need a different data store?
dnene: @KentBeck Primary Need : We need support for flexible/arbitrary schemas with complex depths – RDBMSs don’t dance well in this space.
dnene: @KentBeck Secondary Need : Support for deferred processing required for analytics (eg. Map/Reduce). RDBMS don’t do too bad a job here
dnene: @KentBeck Tertiary Need (not one that I’ve felt strongly yet) : Distributed and horizontally scalable storage on commoditized h/w.
KentBeck: @dnene it seems like you’re looking for realistically structured data, not data twisted to fit a formula convenient for mathematicians.
dnene: @KentBeck Yes.. thats it! I’m looking for realistically or naturally structured data storage / persistence. Rocks compared to the term nosql
dnene: @KentBeck Wonder if the term arbitrarily structured makes sense as well. This has been one heck of a conversation/Q&A so far +1:)
KentBeck: @dnene glad you found it helpful. you get bonus points if the opposite of the name you pick is unattractive, a la “structured programming”

18 comments

  1. I think nosql as a term is more like protestant. It describes a movement.

    • Manas,

      That would’ve been all right if the movement’s primary intent were to avoid SQL. But that’s hardly the situation – is it? While I am not sure that it is being used to describe the movement and not the data storage style, that really is beside the point – imo, the moniker simply doesn’t do justice to its intent. Not even remotely.

  2. Our industry has its own marketing terms; that’s why LiveScript became JavaScript even though it has nothing to do with Java.

    But 10-14 years later, JavaScript became a scripting platform for Java.

    Should we wait another 10 years? :)

  3. how about “FreeStyle Data Storage” instead of NoSQL?

  4. I couldn’t agree more. nosql and lessql are not entirely descriptive of a non-relational, schemaless, document-oriented data store. And deep in my mind nosql and lessql are closely connected to ORM. So, when I first heard about nosql I thought it was a new ORM or something.

  5. how about this instead of “NoSQL”?

    1. SQLess
    2. Hybrid Data Storage

  6. Good post. Since you make a distinction between “structuredness” versus “freedom”, I guess the best would be to deal with the “S” in “SQL”. After all, who wouldn’t want a query language on top of their favorite persistence mechanism?

    - FreeQL (“Freequel” has a nice ring to it :-) )
    - NaturalQL

    That still leaves the question why you would call a persistence mechanism a query language, but hey, who cares about details? ;-)

    Hmm. I guess it would make better sense to call it NoRDBMS!

  7. Hmm. Before this NoSQL term, people stored data in files. It seems storing data in files isn’t sexy even though more and more people are doing it.

    Why isn’t it called File Persistence?

  8. All,

    Thanks for the many suggestions. Rather than responding to every suggestion, I think it is best for other readers of this post to reflect on all of these suggestions to more comprehensively grok/discuss the characteristics of this database storage style. Keep the suggestions coming.

  9. NOSQL is perfect to remember and everyone knows what is meant after investing 10 seconds. All the other terms are far too clumsy for me. Sure the term is senseless but it is revolutionary. And that’s what we need to get people out of using the most popular RDBMS because everyone does (and so you can’t blame me), if non RDBMS is not the best fit.

    To me we should keep the name and make clear that #nosql is just a symbolic link to something very useful. Then we force attraction to concepts. I for myself will invest in this direction.

    Regards

    • Stefan,

      Let me draw a political analogy here. Let’s assume conservatives (tories) have ruled for ages and become so dominant that everyone else has been marginalised. Some folks get together, combine all the other parties and create a new party, not_a_tory. The said group now includes liberals, labour and communists.

      While it provides a platform called not_a_tory for everyone else to group around, the fact is that such a platform has an exceptionally poor identity, since it has chosen to define itself by what it is not. Everyone who now wishes to join this movement, thinking it only takes 10 seconds to grok what it is all about, soon realises that the entire ‘grokking cost’ has been deferred. There’s a much higher cost to be paid later in return for a temporary platform. Soon enough people start wondering whether the greens should be a part of not_a_tory – since they are, quite accurately, not a tory.

      Soon enough the party grows, maybe wins a few seats, and then all hell breaks loose as the internal inconsistencies play themselves out. The greens start wanting to join it while also imposing the green agenda of environmentalism, which others want to keep out, while the liberals and communists launch an internal power battle which eventually results in the disintegration of this platform. No amount of forcing attraction to concepts will be helpful in such a situation, since the concepts themselves will confuse each other.

      If instead such a platform were to identify itself based on what it stood for (a coalition), which essentially is a loose federation of specific parties, then people could easily identify with or away from the coalition and work with the coalition and the constituent parties appropriately. The coalition has an identity independent of the constituent parties, clearly defined by what it is for (not what it is against). That’s the way to have a healthy democracy and, more importantly, a non-confused electorate.

      On a separate note, investment is a subjective decision based on preferences and priorities. However, the fact that I may choose to invest in exercise ‘x’ doesn’t imply that an independent exercise ‘y’ is not worth investing in.

      On yet another note – here was another interesting (tongue in cheek) tweet today I saw – “Do filesystems qualify as NoSQL databases? key/value, scalable, distributable, schemaless, good query tools, hierarchical; got it all!” That shows the confusion we are heading towards.

      Cheers.

      • Dear Dhananjay Nene,

        > dominant tory example …
        Great counterexample you give. I missed this point. And I agree you might be right that nosql might have to pay back later (like the technical debt discussion in source code…).

        But I think it is already too late. The term nosql has been sinking deeper and deeper into every search engine and twitter database for far more than half a year. Even I am gathering all nosql material under http://nosql-databases.org to give something back to the community.

        So if the nosql community would find a clear name (which is also good for marketing) I would be happy about this new name. But I don’t see any way how this could ever happen. It’s always an evolutionary shift coming from within.

        Regards

  10. Hey Danny,

    This is a diversion from your serious blog but …

    Last week I organized a meet for DBMS/IM folks in Boston and someone in the audience asked about NoSQL, the future etc. The answer was this: “NoSQL movement is like the Ron Paul movement—they are unhappy, don’t like what they have, and have little to offer”.

    It is a point of view, not universally accepted (but personally I think it is fairly accurate). I’m still of the mind set that Dr. Codd was in fact correct and with reasonable data analysis, all data is in fact relational. I think that is the case not so much because of the nature of data itself but because it is a reflection of the way we think: through linkages and relationships.

    So, to me, NoSQL (call it what you will) is in some respects a 1 column table where the single column is a xLOB and you move the schema into the processing layer.

    Toodles!

    -amrith

    • Amrith,

      all data is in fact relational

      Looking at data in an abstract sense, relational is not an attribute of the data but of the way it is modeled. All data ‘could’ be modeled as relational. However, not everyone is comfortable doing so under all circumstances. While relational may continue to be a valid way to model the data, it is not necessarily the most convenient – e.g. compromise solutions such as the Entity Attribute Value Model.

      NoSQL (call it what you will) is in some respects a 1 column table where the single column is a xLOB

      Coming from a relational background, that’s what it will seem like – since relational has this catch-all exception-handling construct called a blob (it’s a place to dump all data that is too inconvenient to model in a relational form). But again I would submit that’s just looking at the solution through a very polarised (relational) set of lenses.
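
      For readers who want to see what the "one column holding an xLOB" view looks like in practice, here is a minimal sketch using Python's standard sqlite3 and json modules. It is only an illustration under stated assumptions: the `docs` table, the `save`, `load` and `visit_years` helpers and the sample record are all hypothetical, and a key column is added next to the blob so that a document can be fetched again.

```python
import json
import sqlite3

# An in-memory relational database with a key and a single catch-all column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT)")


def save(doc_id, document):
    # The database sees only opaque text; it knows nothing about the shape.
    conn.execute(
        "INSERT OR REPLACE INTO docs (id, body) VALUES (?, ?)",
        (doc_id, json.dumps(document)),
    )


def load(doc_id):
    row = conn.execute(
        "SELECT body FROM docs WHERE id = ?", (doc_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def visit_years(document):
    # The "schema" lives here, in the processing layer, not in the table.
    return [visit["year"] for visit in document.get("visits", [])]


save("patient:7", {
    "name": "A. Example",
    "visits": [{"year": 2009, "notes": ["x-ray"]}],
})
years = visit_years(load("patient:7"))
```

      Seen this way the relational engine is reduced to a key lookup, and every question about the inside of a document has to be answered in application code. Whether that counts as "just a blob" or as a different storage style is exactly the disagreement in this thread.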

      Cheers.

      • Hi Danny,

        It is important to distinguish between shortcomings of a language (or schema, grammar, tool, appliance, process, method) and the individual. Take the very example you mentioned, EAV. I submit to you that from the point of view of schema definition, the EAV is the perfect example of a simple star schema with an entity relation, an attribute relation and an entity-attribute-value relation. Again, strictly from a modelling perspective, the cardinality of the EAV relation will be large and the cardinality of the attribute relation is likely to be large as well. Either on an SMP database or on an MPP database, you end up with a smaller representation if you were to use a star schema.

        At a different level of looking at it, EAV in an un-normalized representation is still a relational representation.

        So, along comes the requirement/criticism that all values need not have the same general data “type”. Some are numbers, some are boolean and some are strings. Without loss of generality, these can all be represented as strings and again, without any loss of generality this can be represented with multiple columns of different datatypes, number (with suitable p, s), boolean or CHAR(1) and VARCHAR(N).

        The argument you propose for the EAV is the perfect example of a restricted case of my claim regarding NoSQL: with EAV you propose a table with two key columns (or a composite key with two columns) and a string field (VARCHAR(N)), because it so happens that the data is potentially a string. Recently I did encounter one person who stored medical data in an EAV-style schema, and one of the cases for the value was an image (xray, ecg, mri), and therefore they used a LOB.

        So, I’m not so sure that the view that all data (when modeled appropriately) is relational is wrong.

        But, that does not mean that all data must be modeled in a relational database. There may be limitations in relational technology as it stands today that make it infeasible.

        To date, I know of only one class of such problems that are not well solvable with current relational databases. I still have not heard a specific example of something that justifies NoSQL as a “requirement” or for that matter even a “simplification”.

        If you have some examples where NoSQL (or by extension MR/Hadoop) are the only way to solve a problem due to limitations in relational databases, I’d love to know more.

        Thx,

        -amrith

  11. If you haven’t already seen it then check out the “NOSQL = Not Only SQL? (was Re: An open letter to the NoSQL community)” thread on the nosql-discussion list. Definitely with you on this one… hoping that NOSQL (as in Not Only SQL) gets adopted as it’s far less abrasive.

    Sam

    • Sam,

      Not sure whether “Not Only SQL” clarifies or confuses. I would imagine that name to be appropriate in contexts where a combination of NoSQL databases and some RDBMS gets used. It may sound less abrasive, but I would’ve preferred a name which illuminates or enlightens.