NoSQL is a term doing us a disservice

Although we publish NoSQL software, I have always disliked calling our product a NoSQL database. I know the third paragraph of our web page says that wrpme is a NoSQL database, but look: right after that, it says we prefer to call it a postmodern database, as proposed by Dr. Richard Hipp.

The obvious reason we use this fancy word is that we’re French, which makes us pretty much the pinnacle of snobbery and pedantry. The other reason is that we weren’t able to come up with a better term.

There’s also the fact that we don’t like NoSQL because it’s a negative term that suggests our software is here to destroy Relational Database Management Systems (RDBMS), which are the one true evil.

I think I can safely say that no one is trying to destroy anything.

Where I say good things about RDBMS

I guess the NoSQL term originated from someone who was fed up with yet another perverted use of relational databases.

I’ve seen my share, I can relate to that feeling.

The first thing that comes to my mind is any software that uses an RDBMS to serialize data. That’s pretty overkill, isn’t it? Still, I think my favorite is the “one table to rule them all” scheme. Wait. One table is good, but I’ve heard primary keys are good too, so if we make everything a primary key it must be very good. Oh, and more indexes please, I can’t get enough of them. I know that I will somehow need to search those binary blobs, so index them please. And index the indexes to make the indexes faster.

So yes, it can get very silly.

Let’s not forget one thing though: an incredible amount of intelligence has been poured for more than forty years into RDBMS. People much more intelligent than you, maybe almost as intelligent as us (and that’s saying a lot, see paragraph two), have worked very hard to solve extremely complex problems. And succeeded.

Please, please, please, make sure this sinks in: RDBMS work, they work reliably, they can adapt to your business case very well, and they are fast when you account for everything they do. They have their limits - like everything in the universe - but every time you book a flight, order a book or send money, they’re proving to you and to the world how dependable they are.

NoSQL engines are for the most part crude, useless and unreliable and as for us, we know we still have a long way to go in terms of flexibility, features and proven reliability.

When you complain that your relational database is too slow, the problem is not the database. The problem is most likely how you use it.

Let’s talk performance

So, am I killing our business? Not really.

RDBMS are fast, but ~~NoSQL~~ postmodern databases can be damn fast. Although you may not need the speed, you may like the fact you need less computing power to handle the same load.

Additionally, to be fast, relational databases have to be used properly. Let’s be realistic for a second: it’s hard to be good at SQL. Non-relational databases are “more obvious” and closer to how the typical programmer thinks, and for simple use cases you are more likely to do the right thing with a NoSQL engine than with a relational one.

Ever tried hammering a reasonably sized RDBMS with one thousand distinct clients? A real one? With atomic, consistent, isolated and durable (ACID) transactions? With each client querying the database like there’s no freaking tomorrow? Did it also end up with a database administrator in the air vent with a crossbow aimed at you? I think I made my point.

RDBMS, in certain contexts, can be slow because one of their best features, ACID transactions, comes with a hefty price. This is not because RDBMS are poorly done; it’s because to truly ensure that your transaction is atomic, consistent, isolated and durable, the database needs to do a lot of work.
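
To make that guarantee concrete, here is a minimal sketch using SQLite from Python’s standard library as a stand-in for a full RDBMS. The accounts table, the CHECK constraint and the transfer function are hypothetical names invented for this illustration. It only shows atomicity and consistency on a single connection; isolation between many clients and durability on disk are where most of the real cost comes from, and they are not exercised here.

```python
import sqlite3

def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
        )
    return conn

def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Fails the CHECK constraint if src cannot afford the transfer.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a name -> balance dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = make_bank()
print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

The second transfer is refused, and the credit that had already been applied to bob inside the transaction is rolled back with it. Keeping enough bookkeeping around to undo partial work like this is part of the “lot of work” the database does on your behalf.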

And while we’re on the topic of ACID transactions, this important feature is also the reason why relational databases don’t scale very well. Distributing ACID transactions is difficult. You can’t just add commodity servers and expect a linear increase in performance.

Did I say difficult? I wanted to say near-impossible.

To scale a modern relational database, you partition data into buckets and spread the load over buckets (this is an over-simplification, but bear with me). This is called partitioning or sharding, depending on how you split the data. The limit with this approach is that it requires carefully planning the partitions as it’s much more difficult to adjust later. This is not unlike partitioning a hard drive when you install your operating system. This is generally not something you tweak later on production systems, even if you can.
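
Here is a rough sketch of why adjusting partitions later hurts, assuming the simplest possible scheme: hash the key and take it modulo the number of buckets. The function names and the customer keys are made up for the illustration, and real engines use more refined schemes (range partitioning, consistent hashing and so on).

```python
import hashlib

def bucket_for(key: str, bucket_count: int) -> int:
    """Map a key to a bucket with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % bucket_count

def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys that land in a different bucket after resizing."""
    moved = sum(
        1 for k in keys if bucket_for(k, old_count) != bucket_for(k, new_count)
    )
    return moved / len(keys)

# Hypothetical data set: ten thousand customer identifiers.
keys = [f"customer-{i}" for i in range(10_000)]

# Going from 4 buckets to 5 with naive modulo placement.
print(f"{moved_fraction(keys, 4, 5):.0%} of the keys change bucket")
```

With this naive placement, adding a fifth bucket to four existing ones relocates roughly four keys out of five, which on a production system means moving most of your data while it is being used.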

I know, I know. Some new engines are coming out, claiming they can offer the speed of NoSQL and the reliability of ACID transactions.

Are they lying?

Well, we haven’t benchmarked them (yet), but one thing is certain: if they offer truly ACID transactions, they have a performance tax to pay. They can be clever about it and there’s clearly room for achieving great things, but they will always be disadvantaged.

In other words, state of the art relational databases with ACID transactions will always be an order of magnitude slower than state of the art non-relational databases without ACID transactions.

Let’s talk money

The other big problem is that over time relational databases became bloated with “wtf” features, because you really need to be able to do Java inside SQL, right?

The dark shroud of enterprise software obfuscated the qualities of otherwise fine products. That means you will need someone to shepherd the weak through the valley of darkness: a database administrator.

Do you really need more weird people in your organization? I submit you do not.

Which brings us to a topic top management understands very well: storage is an order of magnitude more expensive on relational databases. You see that terabyte drive you can buy for 50 €? Want to add a terabyte to your RDBMS? That will be 5 000 €, thank you very much (I’m not making this up).

This last piece of information probably helps you understand why there is a strong interest in non-relational databases as we enter the yotta world.

When should you go non-relational?

RDBMS are great, but they’re not great at everything and sometimes they truly suck.

That’s why we have non-relational databases, because if you throw away relations and transactions, you can do interesting things in terms of processing, scalability and pure performance.

But do you really want to throw away relations and transactions? Do you need to?

How many gadgets does Batman have? We will agree on a number much greater than one. Since you’re probably an order of magnitude below Batman in terms of awesomeness, you will agree that you need to be at least as well prepared as he is. In other words: tool up.

That’s why, if you start a new project, you should definitely consider non-relational databases and include them in your architecture. I guarantee you that you have non-relational data that will be cheap and efficient to store in a postmodern database, and you might even be able to go fully non-relational! Ask us or our beloved competition if you need help.

For existing projects however, we’ve seen that many performance problems can be solved with database tuning and proper caching.

Nevertheless, once you have done that, you may still have performance issues. The thing to understand is that the transition from a relational to a partially (or fully) non-relational schema can be extremely disruptive. One approach is to locate the “hot” data and either duplicate or relocate it into a postmodern database.
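
The “duplicate the hot data” approach can be sketched as follows. This is an illustration under loud assumptions: a plain dictionary stands in for the postmodern database, a callable stands in for the SQL query, and the class name and the read-count threshold are invented for the example.

```python
from collections import Counter

class HotDataRouter:
    """Serve frequently read keys from a fast store, the rest from the RDBMS."""

    def __init__(self, slow_lookup, hot_threshold=3):
        self._slow_lookup = slow_lookup  # callable key -> value (stand-in for SQL)
        self._fast_store = {}            # stand-in for the non-relational database
        self._reads = Counter()          # how many times each key has been read
        self._hot_threshold = hot_threshold

    def get(self, key):
        self._reads[key] += 1
        if key in self._fast_store:
            return self._fast_store[key]
        value = self._slow_lookup(key)
        if self._reads[key] >= self._hot_threshold:
            # The key is "hot": keep a duplicate in the fast store.
            self._fast_store[key] = value
        return value

    def invalidate(self, key):
        """Drop the duplicate after a write on the relational side."""
        self._fast_store.pop(key, None)

    def hot_keys(self):
        """Keys currently duplicated in the fast store, sorted."""
        return sorted(self._fast_store)

# Hypothetical usage: the lambda plays the role of an expensive query.
router = HotDataRouter(lambda key: f"row for {key}", hot_threshold=3)
for _ in range(5):
    router.get("front-page")
router.get("rarely-read")
print(router.hot_keys())
```

The relational database stays the source of truth here, which is why writes have to call invalidate. Relocating the data instead of duplicating it removes that bookkeeping, but it is the more disruptive of the two options.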

I could write a lot more on this topic actually and there’s much to be said.

What I wanted to show is that NoSQL is more about shifting the balance a little away from the relational side than about killing RDBMS. Maybe the term AltSQL is a better one, as it is a reminder that we’re trying to find new approaches, not trying to demean existing ones.

As for us, we will stick to the term postmodern database for now and throw a party for overloading another customer’s network (true story that will be the topic of another post).