Have the tables turned on NoSQL?
NoSQL climbed up the charts as the next big thing in system architecture in 2011, but overall interest in it has plateaued recently. You may have heard of it and ignored it, safe in the knowledge that you can always have an SQL command line at your fingertips. But what is NoSQL, what does it have to do with modern development, and is it worth implementing in your project?
Let’s find out.
Sysadmins managing big projects know a few things about traditional SQL databases. First, they are notoriously hard to scale, making it difficult to spread data across services or geographic regions. A small mistake in a single file can tank an entire database. And while SQL statements are fun, it’s easy to drop all tables while futzing with a key or corrupting an entire repository with a malformed query.
The goal of a NoSQL database, on the other hand, is scalability: storing data in a format that can be split up, or sharded, across multiple servers. NoSQL databases tend to scale more linearly than relational databases, which depend on keys shared across tables. NoSQL databases come in a lot of flavors:
- Indexed document stores like MongoDB
- Graph databases like Neo4j
- Wide-column stores like Cassandra
- Time-series databases like InfluxDB, which index data by timestamp
- Hybrid forms that combine several of these paradigms
Some of them even store data in a table format. The commonality between all of them, though, is that no matter what format they store data in, these databases don’t support relations between data.
NoSQL is no joke
Understanding NoSQL databases takes a moment of thought. Traditional SQL uses related tables connected by IDs. A single domain entity might be fragmented, or normalized, across multiple tables, which means the overhead of assembling a record and keeping it accurate can be immense. Instead of, say, one table for users and another for addresses, NoSQL lets you create a generalized user object that holds everything important about that user. Because each record is self-contained, it can live on any machine without reference to other tables, which makes the database easier to scale and replicate across multiple servers.
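To make the contrast concrete, here is a minimal Python sketch that uses plain lists and dictionaries instead of a real database. The table layout, field names, and sample users are hypothetical; the point is that the normalized form needs a lookup across two tables to rebuild a user, while the document form stores the finished user under one key.

```python
# Hypothetical sample data: a normalized layout with two tables linked by
# user_id, which we then fold into one self-contained document per user.
users_table = [
    {"user_id": 1, "name": "Ada"},
    {"user_id": 2, "name": "Grace"},
]
addresses_table = [
    {"address_id": 10, "user_id": 1, "city": "Columbus"},
    {"address_id": 11, "user_id": 1, "city": "Madrid"},
]


def load_user_relational(user_id):
    """Reassemble a user by 'joining' the two tables on user_id."""
    user = next(u for u in users_table if u["user_id"] == user_id)
    addresses = [
        {"city": a["city"]} for a in addresses_table if a["user_id"] == user_id
    ]
    return {"name": user["name"], "addresses": addresses}


def to_documents():
    """Denormalize once: embed each user's addresses inside the user record."""
    return {u["user_id"]: load_user_relational(u["user_id"]) for u in users_table}


documents = to_documents()

# Reading a user from the document layout is a single lookup, no join.
print(documents[1])
print(documents[2])
```

Real document databases do this denormalization at write time; the trade-off is that anything copied into many documents has to be updated in many places.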
Further, a NoSQL database allows fast access to large amounts of data. A SQL, or relational, database excels at data processing: creating granular connections between pieces of data. A NoSQL database is great for finding one piece of data quickly and operating on it. There’s little to no searching; ask for a user and it just gives you that user’s data.
How it works
Many types of NoSQL databases are designed for fast data lookups. Instead of writing a complex query, many use a single value (a key, a timestamp, or a document ID) and pull the data stored under that value. That is, if you expect to need the details of a user account, all the user data can be retrieved by reading that user’s record. The relationships between records matter less, and the shape of records can vary: one record can hold multiple addresses while another holds none.
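A rough sketch of two of those lookup styles, in plain Python rather than any particular product: a record fetched by its key, and time-series readings located by timestamp with binary search. The key format (`user:<id>`), the sample values, and the helper names are illustrative assumptions.

```python
import bisect

# Hypothetical key-value layout: each record lives under one key, so a read
# is a single dictionary lookup instead of a query.
kv_store = {
    "user:1": {"name": "Ada", "addresses": ["Columbus", "Madrid"]},
    "user:2": {"name": "Grace"},  # no addresses at all: shapes may differ
}

# Hypothetical time-series layout: readings kept sorted by timestamp so a
# time range can be located with binary search.
timestamps = [1000, 1060, 1120, 1180]
readings = [21.5, 21.7, 22.0, 21.9]


def get_user(user_id):
    """Fetch a user record by key; returns None if the key is absent."""
    return kv_store.get(f"user:{user_id}")


def readings_between(start, end):
    """Return readings whose timestamp satisfies start <= t < end."""
    lo = bisect.bisect_left(timestamps, start)
    hi = bisect.bisect_left(timestamps, end)
    return readings[lo:hi]


print(get_user(2))
print(readings_between(1060, 1180))
```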
Because many of these databases were created by companies like Google and Amazon for their own massive data stores, the goal was to reduce the time needed to grab a piece of data. To get there, many NoSQL databases relax the traditional database guarantees of atomicity, consistency, isolation, and durability (ACID) in favor of looser ones, such as eventual consistency.
Using a NoSQL database doesn’t mean you can’t use SQL; SQL is just the query language. In fact, NoSQL and SQL can be complementary. Some NoSQL databases offer SQL or SQL-like query languages. Those that don’t can be analyzed with a SQL query engine like Presto or sent through a data pipeline to data warehouses that are easier to analyze. To be fair, a good data pipeline requires sophisticated ETL processing to get the end data into a usable state.
Because an SQL database uses a schema or structure, this means changes are difficult. Say you’re running a production database full of a million records. Adding a single field is a nightmare and could trash the entire database. Further, connecting those million records through joins is hugely expensive. Joins are easy when you’re searching for a few records across a few tables. Multiply that out, however, and you’ve got a headache.
NoSQL databases like MongoDB just take data and store it. Want to add a field? Add it to the next record stored. Want to ignore a field? Just don’t read it. You can add multiple addresses to a user record, for example, or none. You can add a last name or leave it out. And because you can shard the data, you can keep some data on servers in one jurisdiction and other data on servers in another, for example to satisfy data-residency rules. The database treats each chunk as part of the whole.
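The flip side of “just don’t read it” is that the reading code has to cope with whatever shape each record has. A minimal sketch with hypothetical user records, where the application, not the database, supplies defaults for missing fields:

```python
# Hypothetical user records written at different times. Nothing enforces a
# schema, so each record carries only the fields it happens to have.
users = [
    {"first": "Ada", "last": "Lovelace", "addresses": ["London"]},
    {"first": "Grace", "addresses": []},         # no last name
    {"first": "Linus", "nickname": "torvalds"},  # new field, no addresses
]


def display_name(user):
    # A missing last name is handled at read time by the application.
    last = user.get("last")
    return f"{user['first']} {last}" if last else user["first"]


def address_count(user):
    # A missing address list is treated the same as an empty one.
    return len(user.get("addresses", []))


for u in users:
    print(display_name(u), address_count(u))
```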
Querying data is a little harder. Apache Cassandra uses the Cassandra Query Language (CQL), which, interestingly, does not support joins. MongoDB returns JSON-like documents in response to queries. Need all users in Ohio? MongoDB sends back a big chunk of data: every matching document. Want to delete all users in Spain? MongoDB will run the search and perform the action.
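To show the idea behind those queries, here is a toy in-memory version of equality-filter matching. The `find` and `delete_many` functions below are hypothetical helpers written for this sketch, not the PyMongo driver; with PyMongo the comparable calls would be `collection.find({"state": "Ohio"})` and `collection.delete_many({"country": "Spain"})`, and the real query language supports far more than simple equality.

```python
# A toy in-memory "collection" using MongoDB-style equality filters.
# This is NOT the PyMongo API; it only illustrates the idea.
collection = [
    {"name": "Ada", "state": "Ohio", "country": "US"},
    {"name": "Grace", "state": "Texas", "country": "US"},
    {"name": "Lucia", "country": "Spain"},
    {"name": "Brian", "state": "Ohio", "country": "US"},
]


def matches(doc, query):
    """True if every field in the query equals the document's field."""
    return all(doc.get(field) == value for field, value in query.items())


def find(query):
    """Return every document that matches the filter."""
    return [doc for doc in collection if matches(doc, query)]


def delete_many(query):
    """Remove matching documents in place; return how many were deleted."""
    before = len(collection)
    collection[:] = [doc for doc in collection if not matches(doc, query)]
    return before - len(collection)


print([d["name"] for d in find({"state": "Ohio"})])
print(delete_many({"country": "Spain"}))
```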
Further, there is no need to contact every server to get a piece of data. In a shared-nothing design, the closest server doesn’t coordinate with the others; it simply returns what it has. Eventually all the data replicates, but until it does, each server works on its own copy. This means that a change to a record on one server may not yet be visible to a query served by another server.
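A toy simulation of that behavior, assuming two replicas and a manual `replicate()` step in place of the background replication a real database performs. The class, region names, and key are hypothetical:

```python
# Two hypothetical replicas of the same data. Writes land on one replica and
# reach the other only when replicate() runs (eventual consistency).
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.pending = []  # writes not yet shipped to peers

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def read(self, key):
        return self.data.get(key)


def replicate(source, target):
    """Ship the source's pending writes to the target, then clear them."""
    for key, value in source.pending:
        target.data[key] = value
    source.pending.clear()


us_east = Replica("us-east")
eu_west = Replica("eu-west")

us_east.write("order:42", "placed")
print(eu_west.read("order:42"))  # stale: the write has not arrived yet
replicate(us_east, eu_west)
print(eu_west.read("order:42"))  # after replication both replicas agree
```

Real systems let you tune this trade-off, for example by requiring a read to consult a quorum of replicas.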

Benefits of NoSQL
NoSQL databases (MongoDB being the most popular) are great for scaling. Because they use sharding to partition data across multiple machines, you can control which data lives where. Further, an outage on one machine won’t take down the entire database. As data grows, the database can expand onto additional machines and shrink again when demand drops. You can also store geo-specific data on geo-specific servers, so requests from a certain country are served faster when they concern data about that country.
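A minimal sketch of two ways to route data to shards, hash-based and geographic, with hypothetical server names. It uses `hashlib` rather than Python’s built-in `hash()` because `hash()` on strings is randomized per process and would send the same key to different shards after a restart.

```python
import hashlib

# Hypothetical shard servers. Real systems also rebalance data when nodes
# are added or removed, which this sketch leaves out.
SHARDS = ["shard-a", "shard-b", "shard-c"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard deterministically from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# Geographic alternative: route by a field such as the user's country.
GEO_SHARDS = {"DE": "eu-server", "FR": "eu-server", "US": "us-server"}


def geo_shard_for(country: str) -> str:
    return GEO_SHARDS.get(country, "default-server")


for user_id in ["user:1", "user:2", "user:3"]:
    print(user_id, "->", shard_for(user_id))
print(geo_shard_for("FR"))
```

One caveat: with plain modulo hashing, changing the number of shards remaps most keys, which is why systems that grow and shrink often use consistent hashing or range-based partitioning instead.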
Next, NoSQL databases offer high availability. Because each shard is typically replicated to other servers on the network, copies of the data are available elsewhere. If a server fails, another server holding a replica can take over that server’s shard. With data continuously replicated, losing a single machine doesn’t mean losing its data.
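Failover can be sketched the same way, assuming a hypothetical placement table that lists which servers hold a copy of each shard. Reads go to the first replica that is still up:

```python
# Hypothetical placement table: each shard is stored on several servers,
# listed in order of preference.
REPLICAS = {
    "shard-a": ["server-1", "server-2", "server-3"],
    "shard-b": ["server-2", "server-3", "server-1"],
}


def pick_server(shard, down):
    """Return the first replica of the shard that is not marked as down."""
    for server in REPLICAS[shard]:
        if server not in down:
            return server
    raise RuntimeError(f"no live replica for {shard}")


print(pick_server("shard-a", down=set()))
print(pick_server("shard-a", down={"server-1"}))
```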
The problems
NoSQL databases don’t offer much in the way of transaction management or server-side programming. They are great for storing data that doesn’t change much, or that changes only slightly with each transaction. NoSQL systems have also been daunting for new users to approach. While hosted solutions are available, running your own simple instance isn’t as easy as, say, spinning up a MySQL server.
Also, because a denormalized database can contain a lot of duplicated data, the database itself can grow quite large. There are several types of NoSQL databases, with document-based solutions being the most prevalent. However, you can also use key-value databases like Redis, as well as tabular ones like HBase and Accumulo.
A key-value solution like Redis is a bit more familiar to admins, and Redis in particular is fast because it keeps its data in memory. Tabular databases like HBase offer a slightly different system that focuses on, according to the documentation, “very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware.”
If NoSQL provides so much freedom and flexibility, why not abandon SQL entirely? The simple answer: many applications still call for the kinds of constraints, consistency, and safeguards that SQL databases provide. In those cases, some “advantages” of NoSQL may turn into disadvantages.
Traditional relational databases have long since caught up with much of the novelty that NoSQL databases promised. They’ve massively improved their sharding functionality, so you’re no longer limited to scaling vertically. They’ve also introduced more lenient data types: you can now store JSON in PostgreSQL, MySQL, and SQL Server, giving you a MongoDB-like experience.
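As an illustration of querying JSON with plain SQL, the sketch below uses SQLite through Python’s built-in `sqlite3` module as a stand-in for the databases named above. It assumes your SQLite build includes the JSON functions such as `json_extract()` (standard in recent SQLite versions); the table and documents are hypothetical. In PostgreSQL you would typically use a `jsonb` column and operators such as `->>` instead.

```python
import json
import sqlite3

# SQLite stands in for PostgreSQL/MySQL/SQL Server here. Assumes the SQLite
# library linked into Python provides json_extract().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, doc TEXT)")

docs = [
    {"name": "Ada", "address": {"state": "Ohio"}},
    {"name": "Grace", "address": {"state": "Texas"}, "nickname": "Amazing"},
    {"name": "Brian"},  # no address at all
]
conn.executemany(
    "INSERT INTO users (doc) VALUES (?)", [(json.dumps(d),) for d in docs]
)

# Filter on a field nested inside the JSON documents with ordinary SQL.
rows = conn.execute(
    "SELECT json_extract(doc, '$.name') FROM users "
    "WHERE json_extract(doc, '$.address.state') = ?",
    ("Ohio",),
).fetchall()
print(rows)
```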
There are a number of problems with NoSQL databases, the first one being a dearth of sysadmins who can maintain them. Implementing a NoSQL database is a real endeavor, and picking the right provider and manager is tough. If you need a massive database, you might be in the financial position to pay for that expertise, but smaller companies may have to wait.
Further, the NoSQL model is difficult to understand for developers used to coding for SQL systems. A developer might start a project expecting the database to enforce certain constraints or to throw errors on duplicate rows. Instead, because much of the structure lives in the application, that logic must be written and managed in the application itself. NoSQL solutions offer faster, more performant data storage, but that’s about it. You, the developer, have to step in to manage the various relationships.
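A small sketch of that difference, using SQLite via Python’s `sqlite3` module for the relational side and a plain list standing in for a schemaless store. The table, the email field, and `insert_user` are hypothetical:

```python
import sqlite3

# Relational side: the database itself rejects a duplicate email.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT UNIQUE NOT NULL)")
conn.execute("INSERT INTO users VALUES ('ada@example.com')")
try:
    conn.execute("INSERT INTO users VALUES ('ada@example.com')")
except sqlite3.IntegrityError as err:
    print("database refused:", err)

# Hypothetical schemaless store: a plain list accepts anything, so the
# application has to check for duplicates itself before every insert.
documents = []


def insert_user(doc):
    if any(d.get("email") == doc.get("email") for d in documents):
        raise ValueError(f"duplicate email: {doc['email']}")
    documents.append(doc)


insert_user({"email": "ada@example.com"})
try:
    insert_user({"email": "ada@example.com"})
except ValueError as err:
    print("application refused:", err)
```

Two caveats: many document databases, MongoDB included, can enforce uniqueness with a unique index, so check what yours offers before writing code like `insert_user`; and a check-then-insert in application code is racy when several writers run at once.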
Finally, because NoSQL is not consistent, roll-backs are impossible if something goes wrong. Further, some parts of the database may return inconsistent information. One example experts offer is that an SQL database will return the right bank balance all the time while a NoSQL solution might return a different balance based on the server. If that sounds scary, you might want to rethink your choice. You can see this in real life when you search for orders on ecommerce sites like Amazon: sometimes a new order takes a few seconds to appear because the data must propagate through the network.
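The bank-balance case is where transactions earn their keep. A minimal sketch, again using SQLite via Python’s `sqlite3` module as a stand-in relational database; the accounts, amounts, and `transfer` helper are hypothetical. The second transfer would overdraw Alice, so the `CHECK` constraint fails and the credit already applied to Bob is rolled back with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failed debit has something to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
    except sqlite3.IntegrityError:
        return False
    return True


print(transfer("alice", "bob", 30))   # succeeds
print(transfer("alice", "bob", 500))  # CHECK fails, credit to bob is undone
print(dict(conn.execute("SELECT name, balance FROM accounts")))
```

Transaction support varies among NoSQL databases rather than being absent everywhere; MongoDB, for example, has supported multi-document transactions since version 4.0.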
Believe the hype?
First, we have to remember that NoSQL databases are probably great for Amazon and Google but not so great for your side hustle. The performance benefits become more obvious the larger your database grows. Implementing one sounds like fun, and it’s a great way to become conversant in a new technology, but you could probably get there by reading a few FAQs and trying out a MongoDB install yourself. Using a NoSQL solution for a small ecommerce site or recommendation engine might not work out so well. A consensus has emerged at conferences and on blogs that SQL (with a lot of emphasis on PostgreSQL) is the gold standard and that you should use it by default, deviating only if you have good reasons to use NoSQL.
That said, big companies that need the kind of speed that NoSQL offers use these databases, and NoSQL skills are in demand. You can grab a nice salary if you can support someone else’s NoSQL database. By the time you’re ready to implement a NoSQL solution of your own (in a side project or over a massive data store), you’ll be fully versed in the pros and cons and, to paraphrase Kenny Rogers, you’ll know when to shard ’em, know when to JOIN them, know when to use a schema, and know when to use none.
Tags: databases, noSQL
46 Comments
“[I]t’s easy to drop all tables while futzing with a key.” Not only would that not be easy, I’ve never heard of it happening to anyone in my entire career, which has been long and full of SQL.
Yeah, this whole post is full of FUD, and sounds like it was written by someone that has never really spent any time developing database-backed systems.
Agreed.
Caught by the headline, thinking I was going to learn interesting things about NoSQL, instead I had to put up with stuff like, “A small mistake in a single file can tank an entire database,” “Adding a single field is a nightmare and could trash the entire database,” and “corrupting an entire repository with a malformed query.”
Since all those things aren’t true, how can one tell from this article that NoSQL solves these non-problems?
ION.. agree.. Haven’t heard of that happening either. Thank goodness!!!
Even after 10 years of commercial use (and I have worked with MongoDB and Cassandra for almost the same time), the evangelists of NoSQL are still fighting for their share with RDBMS. That itself speaks volumes.
It’s not lost on me that every time I come across MongoDB in the wild and I ask why it was chosen, I’ll first get the usual propaganda line of it being Web Scale or whatever, but then after further probing the dev admits he doesn’t really understand relational databases.
The idea that relational doesn’t scale has always been nonsensical. Some of the largest sites on the net have relational databases backing them as primary stores. Oracle doesn’t get away with absurdly huge prices for their ridiculous databases by crumbling at a million records. All it requires is that you actually pay attention in database classes, and be prepared to hit the manuals for things like sharding or partitioning, and perhaps the occasional Wikipedia refresher on normalization.
There ARE some use cases for NoSQL. It’s a model that’s traditionally been a poor choice for graph databases (relational query planners tend to perform badly at cyclic references), and KV stores make a lot of sense for fast and ephemeral data like session stores. I’m yet to be convinced on time series. Postgres’s Timescale store can keep up with the best, and there are no obvious advantages at all to dedicated GIS databases; PostGIS is a monster.
But for 99% of use cases, relational is still king for performance, scalability, and flexibility, and in 30 years of coding professionally I’m yet to be shown any evidence otherwise.
Consistency is a tricky thing. There are applications where it does not really matter (like seeing your new order on Amazon 30 seconds too late), but there are a lot of applications where hell breaks loose if they get inconsistent. Just think of all financial systems.
And for those there is an added complexity. They have working COBOL accessing a relational store and are certified for this. Nobody is touching this if it is not necessary 😉
Financial systems are always brought up to defend RDBMS because they supposedly all require ACID transactions, but those of us who actually work on them understand that most transfers between accounts in the financial world don’t happen on the same database instance so relying on ACID transactions for consistency is useless. Every single ledger application ever written uses two phase commits on items stored in a specific table to manage transfers of funds between accounts.
Consistency is also not unachievable in a NoSQL database. Most of them offer Primary or Consensus Reads to guarantee consistency.
That doesn’t include the half-dozen accounting systems I’ve worked on!
More FUD. I don’t like software that is based on FUD. I wanted to hear what was good about NoSQL, not about anecdotal boogeymen hidden inside conventional SQL.
The one thing I learned about NoSQL in this article was scary, “an SQL database will return the right bank balance all the time while a NoSQL solution might return a different balance based on the server.”
Okay. I understand the need for “soft data” and probabilistic queries. I’m willing to get back a faster Google search that might be missing a few hits. I’m fine with Shamazon showing me things I might be interested in, without showing me all the related items — it might even be an advantage to get back different results with the same query in such cases!
But it seems one must still use SQL when one wants hard data and absolute queries.
This is definitely not true nor do ledger applications constitute the bulk of financial records.
Having an account isolated to a single database is all I have seen… never what you describe… not in trading.
This looks to be extremely over-generalized and treats all NoSQL as if they were essentially the same.
NoSQL stands for “Not Only SQL” and it means nothing more than the idea that you should choose your data storage to fit your data, instead of trying to fit everything into rectangular tables. It encompasses literally *everything* that is not SQL. In fact, “not only” SQL arguably even *includes* SQL!
The article pretends that non-relational databases are the same as relational non-SQL databases (yes, they exist) or hyper-relational databases, or “post-SQL” relational databases like D / Tutorial D / Rel. (Yes I know, they are not technically databases but database query languages for relational databases.) The article pretends that a key-value store or a NF² relational database are the same thing.
Nothing could be further from the truth.
I had to particularly laugh at the claim that all NoSQL databases always store all data in a single file. I am imagining Google’s BigTable or Amazon’s Dynamo right now. Also, the most-widely used SQL database in the history of SQL databases with more installations than all other databases combined, SQLite, stores its databases as a single file.
“NoSQL is not consistent” – Some NoSQL databases offer *better* consistency guarantees than SQL. Some offer *different* consistency guarantees. Just because a database does offer different consistency tradeoffs doesn’t mean it “is not consistent”.
Some *do* have lower standards for consistency (for example *eventual consistency*). But I don’t need the view counter on my webpage to be consistent. All I need is an order of magnitude approximation. I *do* want my bank account to be consistent, but I simply would not use an eventually consistent DB for that.
Thank you. This is a great reply – choose the right tool for the job. People trying to pit one against the other are just stupid.
NoSQL stands for “Not Only SQL”
That’s marketing nonsense.
The term literally meant “Non SQL”. Someone at some point backronymed it to “Not Only SQL”, but that’s a very revisionist take. When Carlo Strozzi coined the term in 1998, he literally said it means “Non SQL”, and when the term re-emerged a decade later it was extended to mean “non-relational” (though Strozzi’s original NoSQL db was in fact relational).
I’m ambivalent on alternatives to SQL syntax, but I do believe non-relational should be chosen as a last resort, not a first choice. If 30 years of working with databases has taught me anything, it’s that you’ll see a lot more migrations away from NoSQL databases than migrations to them, because except for very specific use cases, non-relational databases are almost always the wrong choice.
“Because an SQL database uses a schema or structure, this means changes are difficult. Say you’re running a production database full of a million records. Adding a single field is a nightmare and could trash the entire database. Further, connecting those million records through joins is hugely expensive”
This is most of the time NOT true.
Yep, and it also ignores that the *very same thing* applies to NoSQL-type databases as well (and in fact is often harder there). The schema protects data integrity. If you add a field, you need to tell it what to do with pre-existing records: nulls, defaults, etc. With NoSQL, you add fields to a table, and suddenly you have a whole lot of records that don’t have anything to satisfy the code that requires that new field. It requires a whole lot of extra handling and therefore code fragility.
And yeah, Joins are not slow. In fact on a lot of databases a well designed schema and query will be *faster* with joins.
“First, they are notoriously hard to scale”
Scaling is the most over-hyped buzzword in existence. The simple truth is that the *vast* majority of applications never outgrow a two-server setup.
“A small mistake in a single file can tank an entire database.”
Tell me how.
” And while SQL statements are fun, it’s easy to drop all tables while futzing with a key”
Again, how? At which point do you try to change a key and then inadvertently type “DROP TABLE xxx”? And how do you have permission to drop tables?
” or corrupting an entire repository with a malformed query. ”
Malformed queries don’t run, and thus they don’t corrupt anything. You’re probably thinking of queries that don’t do what they are designed to do, and in that case, that is equally true for NoSQL databases.
I get the feeling that this article is written by a NoSQL fanboy with no real experience in SQL. Just another one of those junk articles I guess. Stack overflow isn’t what it used to be.
> ” And while SQL statements are fun, it’s easy to drop all tables while futzing with a key”
> Again, how? At which point do you try to change a key and then inadvertently type “DROP TABLE xxx”?
But that would only drop ONE table! To actually drop all tables is difficult enough that there are dozens of questions on SO about how to do so. Also, in what world does anyone “futz with a key”.. what does that even mean?!
With all due respect to the authors, I found this article quite ridiculous.
While I agree with most of the comments here, I disagree with that last sentiment. Reading through to the end of the article, you find that the author’s position essentially is “horses for courses” – it is not that NoSQL is the be all and end all, but rather it has its place and perhaps doesn’t actually suit the majority of organisations anyway.
I’ve been trying to figure that one out. The best I can work out is that maybe a cascade constraint policy might trigger record loss if something truly ignorant has happened somehow.
Thanks for the comparison. I often feel like I’ve been missing out by not using NoSQL professionally. I haven’t yet had a need that justifies the overhead aside from personal projects. I imagine that will change though as my career advances.
Perhaps “interest in NoSQL has plateaued” due to its high snake oil content.
☝️ This!
The article is full of FUD and common misconceptions around NoSQL. The authors do not really understand NoSQL data modeling and have limited experience with the technology. In a NoSQL database, all objects are stored in a single table/collection and then indexed across common attributes to produce groupings of objects that are interesting to the application. This technique eliminates the need to “join” tables and dramatically reduces time complexity by converting all queries to a simple index scan. The impact of this is improved cost efficiency at any scale. The authors of this article clearly do not understand this concept.
Yes, the schema is designed around how one wants to retrieve the data.
E.g., in C*, the WHERE clause will just work the way you have designed the schema.
NoSQL has its use cases. I’m actually much more familiar with working with NoSQL databases than SQL databases because of the job I happened to get as my first job out of school. It’s analytics and recommendations for e-commerce. Immediately, I was working in an environment with TBs of data coming in every day where the use cases were quick aggregation results over this data no matter how big it grew. A PostgreSQL database would really struggle with that.
That said, when I’m starting up a side project, my go-to is a single PostgreSQL server. It always tends to do the trick. It’s like having an umbrella. Having the ability to do relational queries and not needing to is better than needing to do relational queries and not being able to. I only choose NoSQL from the beginning if I understand the use case well enough to know that NoSQL is best suited to solving it.
Good choice. Postgres is my preferred weapon of choice too (and just on the sneaky, look up JSON fields and querying them, but don’t make a habit of it).
W/ regard to joins, though, I do recommend everyone spend some time with a textbook or online tutorial on normalization. My general response to “We don’t use joins” is “Why the hell not?”.
Blah-blah, perhaps written by someone who owns stock in MongoDB Inc. “Technology X is great! You gotsta gitcha sum!”. (*yawn*) Been there, done that, got the T-shirt…
Did you read the article? I did not get the impression at all that the author is trying to promote MongoDB in any particular fashion – just merely pointing out that it is one of the more popular NoSQL technologies at present, which as far as I am aware is the truth.
love the title 😂
What’s interesting is your own research doesn’t back up the premise of the article.
Here is a summary of the most recent annual Stack Overflow developer survey that I wrote (disclosure, I work for MongoDB):
https://www.mongodb.com/blog/post/stack-overflow-research-most-wanted-database
NoSQL is an alternative. Long, long ago I wrote a paper about this, and my conclusion is the same after all these years. These modern databases are quite good for horizontal scalability, flexibility, and performance. But they also bring some disadvantages. So, in any case, software architects and data architects need to decide which ones to use based on many criteria, including performance, consistency, partition tolerance, data formats, the business’s data structure, complexity of queries, ACID, latency, frequency of reads/writes, etc.
Remember the original object-oriented DBs? ObjectStore, GemStone, etc.? Worked with both; SQL DBs were supposedly soon to be gone. Then worked with Mongo and Couchbase, and again they predicted the end of SQL DBs. COBOL, mainframes, and SQL DBs: rumors of their demise are greatly exaggerated. Just pick the right tool for the job; they all have their place.
Bravo on the Kenny Rogers callout!
I thought MongoDB and other object-oriented DBs were meant to have fast writes (and compared to SQL this is true; it is about 5 times faster) and simple scalability (but I have met some SQL DBs that were well scaled), but in terms of read speed, my experience is that on the same _single_ machine SQL reads were faster (2x on a simple list/read, even including joins in SQL to simulate a complex document, and much faster when a complex query was applied).
The author writes like a sales guy “thinking” he’s technical. Yeah, he may be good at talking to end-users with FUD content to get the organization to change and buy. However, as soon as he schedules a meeting with pure techies, he’s marking internal emails and messages as “URGENT: need technical pre-sales to support!!!”. An emphasis will be on technical and a lot less on pre-sales.
Betteridge’s law of headlines says no, so let’s just move on.
I have worked with both SQL and NoSQL databases. Each have their own strengths and weaknesses. Each is designed for a particular set of use cases, so one is not always better than the other.
I have also invented my own general-purpose data management system that handles unstructured, semi-structured, and highly-structured data; so it is a file system, RDBMS, and NoSQL hybrid. While there is still a lot of work to go on it, it does some amazing things with traditional relational table data as well as Json documents.
It can create a table (e.g. hundreds of columns x millions of rows) and query the data about 4x faster than Postgres, MySql, or SQL Server (all without needing to generate and maintain column indexes). You can add (or delete) columns to an existing table without needing to reload the data. It can create three-dimensional tables (e.g. a row can have 0 to N values in the ‘address’ column) so it can easily ingest and query Json documents.
Anyone interested can download the software and check it out on their own data sets. http://www.Didgets.com
“First, we have to remember that NoSQL databases are probably great for Amazon and Google but not so great for your side hustle.”
Interesting choice to put this in the newsletter next to a sponsored post from MongoDB: “Scaling Your Startup with MongoDB Atlas: A Series”
“running your own simple instance isn’t as easy as, say, spinning up a MySQL server.” – HAHaahaha
Say this one more time to MongoDB
This article appears to be a vehicle on which to deliver a dad joke.
Nice 🙂
Having been burned porting an application to Mongo mainly because of the ‘next big thing’ tag, I think there are some important, and really pretty obvious things not mentioned here.
NoSQL means NO SQL. SQL is surely the most prevalent programming language in business settings and applications. It’s a uniform and highly understood way to get at data, which is ingrained into workflows. Deciding you’re not going to do that anymore is a massive decision. And yes, I know about SQL drivers for Mongo, but that is a) extremely weird and b) only on the enterprise edition. There can’t be a business anywhere that thinks training everyone to write those impossible JSON statements is a practical idea. Certainly none of our clients or BAs were interested.
The APIs aren’t much fun, quite apart from the queries themselves. After having had to make a C++ and C# app play nice together with a secure Mongo connection, all I wanted to do was go back to good old ODBC.
We ended up ditching our entire ‘new’ architecture and going back to SQL Server, after seeing zero benefits in terms of performance and all the above problems. There are of course cases where one of the NoSQL DBs makes perfect sense (Redis especially, as it does such a specific thing), but my advice would be to think hard.
This is an extremely disappointing article. I expected to learn something about how to use NoSQL databases or to migrate to NoSQL; instead I found too many misunderstandings about SQL and NoSQL and practically nothing of interest for a working software engineer.
I have used SQL databases in serious projects for about 20 years, and I think I know the advantages and disadvantages very well. The authors obviously do not and just repeat common misconceptions.
Why is such an article featured in the newsletter? If this happens more often, I will surely drop the newsletter soon.
PS: The misunderstandings I am talking about are pointed out in previous comments, so I think I do not have to repeat all of them again.
I think the shine has worn off everyone going that way because it was the new thing. But, as many others have said, “the right tool for the job” is the guiding principle. I have an application that runs on RavenDB, and its most common task is appending to an array in one of the documents (or storing a new one, if it’s the first access). The structure works for that one. For another one, though, it started with MySQL, moved to PostgreSQL, then MongoDB, then RethinkDB (sigh – the Betamax of document DBs), then back to PostgreSQL. (Yes, this is one of my applications that I use to test technology.)
Re: the FUD claims – given ON DELETE CASCADE foreign keys and a regretfully-omitted WHERE clause on your DELETE statement, you can truly trash a database. The article isn’t saying that you can’t work around these issues with a relational database, but unless you know they are issues, they can bite you. NoSQL was supposed to fix all that – and, in some cases, it did, but it brought its own gotchas. You will always be bitten by not understanding the technology you’re using – which is good advice for those of us jumping on the web assembly train these days.
“The commonality between all of them, though, is that no matter what format they store data in, these databases don’t support relations between data.”
Take a look at FaunaDB.
It’s a game-changing NoSQL database that, among many other perks, can join data across collections.
This article appears to be written with very little regard for how SQL works – nearly all the arguments against SQL describe situations that do not happen in a SQL database. For example: “A small mistake in a single file can tank an entire database. And while SQL statements are fun, it’s easy to drop all tables while futzing with a key or corrupting an entire repository with a malformed query.”.
1. By “file” is the article talking about SQL scripts? Or corruption in data files? The authors should be aware of page checksums and security levels.
2. “futzing with a key” – how does modifying relationships between tables (keys) make it easy to drop all tables?
3. What is meant by “repository” in this context?
To second on the other comments, and with all due respect to the authors who I am sure have many other skills, I found this article ignorant with regard to the workings of SQL.