A NoSQL database is any kind of database that breaks away from the traditional design of SQL. NoSQL databases like the document-based MongoDB have become more popular in recent years. What’s all the hype about?
The Limitation of SQL: Scalability
SQL has been around for a long time: roughly 45 years. It holds up surprisingly well, and modern implementations of SQL are very fast. But as the web has grown, so has the need for powerful databases that scale up to meet demand.
The easiest way to scale a SQL database is to run it on a more powerful computer. SQL databases can be replicated to reduce regional load on an individual instance, but splitting a table up across servers (often called sharding) is much harder with SQL, because rows in different tables are tied together by joins and constraints.
Document-based NoSQL databases address this problem by design. Each document is independent of the other documents in the collection, so collections can be split over multiple servers much more easily. Many document databases include built-in tools for sharding the data across different servers.
But the scalability problem isn’t really an issue until you have a lot of data. You can easily run a SQL database with hundreds of thousands of users and have no issues, assuming your structure is sound and your queries are fast.
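To make the idea concrete, here is a minimal sketch of hash-based sharding in Python, assuming each document is routed to a server by a hash of its key. The `shard_for` function, the shard count, the sample emails, and the in-memory `shards` list are all illustrative stand-ins; real databases use their own (often more sophisticated) schemes.

```python
import hashlib

NUM_SHARDS = 3  # hypothetical number of servers

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a document key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Each "server" is just a dict here; a real cluster would be separate machines.
shards = [dict() for _ in range(NUM_SHARDS)]

def insert(doc: dict) -> int:
    """Store the document on the shard chosen by its username."""
    index = shard_for(doc["username"])
    shards[index][doc["username"]] = doc
    return index

def find(username: str):
    """Look only on the one shard that could hold this username."""
    return shards[shard_for(username)].get(username)

# Sample data is made up for illustration.
insert({"username": "Jon1996", "email": "jon@example.com"})
insert({"username": "Jane", "email": "jane@example.com"})
print(find("Jon1996"))
```

Because each document carries everything needed to place it, a lookup touches a single shard. Doing the same with related SQL tables would mean keeping joined rows together or querying across servers.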
Both MySQL and MongoDB will likely get the job done for your application, so the choice between the two comes down to which structure and syntax you prefer. Ease of development is important, and you may find that the document model and syntax of the much newer MongoDB are easier to work with than SQL.
NoSQL vs. SQL Structure
Traditional SQL databases are often called relational databases because of the way they are structured. In a SQL database, you will have multiple tables, each containing multiple rows (called records), which themselves have multiple columns, or attributes. Tables are linked to one another through keys: a primary key uniquely identifies each row in a table, and a foreign key in another table refers to it, which forms the relationship between the two.
For example, imagine you have a table with each record representing a post made by a user. Each post stores the author’s username, which is the primary key of the users table and so links the posts table to the users table. If you wanted to find the email of whoever made the post, you’d look up the post’s username (say, “Jon1996”) in the users table and select the “Email” field.
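The lookup described above can be sketched with Python's built-in `sqlite3` module. The table layout, column names, and sample rows are assumptions made for illustration; only the username “Jon1996” comes from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

# Hypothetical schema: posts.username refers to users.username.
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, email TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY,"
    " username TEXT NOT NULL REFERENCES users(username),"
    " body TEXT NOT NULL)"
)
conn.execute("INSERT INTO users VALUES (?, ?)", ("Jon1996", "jon@example.com"))
conn.execute("INSERT INTO posts VALUES (?, ?, ?)", (1, "Jon1996", "Hello, world"))

def author_email(post_id: int):
    """Follow the username from a post to the matching user and return the email."""
    row = conn.execute(
        "SELECT users.email FROM posts"
        " JOIN users ON users.username = posts.username"
        " WHERE posts.id = ?",
        (post_id,),
    ).fetchone()
    return row[0] if row else None

print(author_email(1))  # jon@example.com
```

The `JOIN` is what "following the relation" looks like in practice: the database matches the username stored on the post against the users table.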
But this data structure might not work for everyone. SQL databases have a rigidly defined schema, which can get in the way if you need to make changes or would just prefer to have a different layout. With complex datasets, the relations between everything can grow more complicated than the data itself.
The main kind of NoSQL database is a JSON document database, like MongoDB. Instead of storing rows and columns, all the data is stored in individual documents. These documents are stored in collections (e.g., a “user” document would be stored in an “all users” collection) and don’t have to have the same structure as other documents in the collection.
For example, a “user” document for Jon might hold a username field, an email field, and a “posts” field.
The username and email fields are just key-value pairs, similar to columns in SQL, but the “posts” field contains an array, which is not something you will usually find in a traditional SQL table. Here the array holds the IDs of Jon’s posts (1, 2, and 3), and each ID refers to a document in a separate posts collection.
Now, when someone visits Jon’s page, your application can fetch the three posts with the IDs 1, 2, and 3, which is usually a fast query. In SQL, by comparison, you may have to fetch all posts that match Jon’s user ID. That is still quite fast, but the MongoDB query is more direct.
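As a rough illustration of the document model, the sketch below uses plain Python dictionaries in place of a real document database. Only the field names (username, email, posts) and the post IDs come from the text above; the email address, post titles, and the `posts_for` helper are made up for the example.

```python
# Hypothetical "users" collection, keyed by username.
users = {
    "Jon1996": {
        "username": "Jon1996",
        "email": "jon@example.com",
        "posts": [1, 2, 3],  # array of post IDs stored inside the document
    },
}

# Hypothetical "posts" collection, keyed by post ID.
posts = {
    1: {"id": 1, "title": "First post"},
    2: {"id": 2, "title": "Second post"},
    3: {"id": 3, "title": "Third post"},
}

def posts_for(username: str) -> list:
    """Fetch a user's posts directly by the IDs listed in the user document."""
    user = users.get(username)
    if user is None:
        return []
    return [posts[post_id] for post_id in user["posts"] if post_id in posts]

for post in posts_for("Jon1996"):
    print(post["id"], post["title"])
```

The user document already says which posts to load, so the application asks for those IDs directly instead of searching the posts collection for matches.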
What Are NoSQL Databases Good For?
NoSQL is a broad category, and includes many different kinds of databases built with different goals. Each database is a tool, and your job may require a specific kind of tool, or even multiple different tools.
SQL itself predates the web, and databases like Oracle, MySQL, and PostgreSQL have decades of production use behind them. They’re very stable, have lots of support, and can generally do the job for most people. If your data is valuable to you, and you want an established, consistent solution, stick with SQL.
JSON document databases, like MongoDB and Couchbase, are popular for web applications with changing data models, and for storing complex documents. For example, a site like Amazon may often have to change the data model for storing products on the site, so a document-based database may work well for them.
Document databases are intended to be the generic replacement for SQL, and are probably what you think of when you hear “NoSQL.” They can also be more intuitive to learn than SQL, since you won’t have to manage relations between tables or write complex queries.
RethinkDB is a JSON document database built for realtime applications. In a database like MongoDB, you’d traditionally need to poll for updates every few seconds, or implement some API on top of that to track realtime updates, which gets heavy quickly (recent MongoDB versions offer change streams for this purpose). RethinkDB addresses this issue with changefeeds, which push updates automatically to clients that subscribe to a query.
Redis is an extremely performant key-value database that stores small keys and strings entirely in RAM, which is much faster to read from and write to than even the quickest SSDs. It’s often used alongside other databases as an in-memory cache for small data that gets written and read often. For example, a messaging app might want to use Redis to store users’ messages (and even push updates in realtime with its Pub/Sub methods). Storing many small messages this way might cause performance issues with other types of databases.
Graph databases are built for storing connections between data. A common use case is social networks, where users are connected to each other and interacting with other data, such as posts they’ve made.
For example, suppose George is friends with two people, Jon and Jane, and Sarah is a friend of one of them. If another kind of database wanted to understand George’s connection to Sarah, it would have to query all of Jon’s friends and all of Jane’s friends. Graph databases are built to follow these connections directly; for the friends-of-friends query, the popular graph database Neo4j has been reported to be 60% faster than MySQL. For friends of friends of friends (3 levels deep), the reported figure is 180 times faster.
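A friends-of-friends query is essentially a short graph traversal. The sketch below runs a breadth-first search over a plain Python dictionary. It is not how Neo4j works internally, and the assumption that Sarah is Jon's friend (rather than Jane's) is made up for the example.

```python
from collections import deque

# Hypothetical friendship graph; each person maps to their direct friends.
friends = {
    "George": {"Jon", "Jane"},
    "Jon": {"George", "Sarah"},
    "Jane": {"George"},
    "Sarah": {"Jon"},
}

def within_hops(graph, start, max_hops):
    """Return {person: hops} for everyone reachable within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distances[person] == max_hops:
            continue  # do not expand beyond the requested depth
        for friend in graph.get(person, ()):
            if friend not in distances:
                distances[friend] = distances[person] + 1
                queue.append(friend)
    del distances[start]  # the starting person is not their own friend
    return distances

# Friends (1 hop) and friends of friends (2 hops) of George.
print(within_hops(friends, "George", 2))
```

Each extra level of depth only means following more edges from the people already found, which is the access pattern graph databases are optimized for.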
Wide-column databases, like Cassandra and HBase, are used for storing massive amounts of data. They’re built for datasets that are so large you need multiple computers to store it all, and they’re designed to stay fast when spread over multiple nodes, a workload where SQL and many other NoSQL databases tend to struggle.