Do NoSQL Databases Need Schemas?
If you’ve been working with databases at all, you know that NoSQL is the new hot topic, if by ‘new’ you mean something that’s been around since the 70s. Jokes aside, NoSQL has largely filled a gap that SQL has had a lot of trouble filling. Traditionally, SQL databases tend to be costly, from their reliance on vertical scaling to the large amount of schema design that has to be done before the database is even created. NoSQL was developed to counteract this, being horizontally scalable and not requiring a schema at all.
NoSQL and Schemas
Many people just getting into NoSQL are attracted by buzz phrases like ‘no need for SQL’ and ‘schema-less’, yet fail to see the forest for the trees. While it’s true that NoSQL was made as a response to SQL, it was not made as a replacement, but rather as a way to enhance and complement it. More specifically, the lack of a required schema means that NoSQL is incredibly flexible and can store data in many different data models, such as key-value, document, column-family, and graph.
That doesn’t mean that NoSQL can’t use a schema though, and that’s where a lot of people get tripped up. After all, NoSQL data can be ugly, random, chaotic, and repeated ad infinitum (relational design uses normalization specifically to root out duplicate data, which NoSQL does not). As such, unless the whole pipeline is handled only by a computer, which it won’t be because data science isn’t perfect, having a schema can certainly be useful.
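To make the idea of an optional schema concrete, here is a minimal sketch of an application-level check that runs before a document is written. The schema format, the field names, and the `validate` helper are all hypothetical and not tied to any particular NoSQL product; many databases ship their own validation features that would be preferable in practice.

```python
# Hypothetical schema: field name -> (expected type, required?)
USER_SCHEMA = {
    "user_id": (str, True),
    "email": (str, True),
    "age": (int, False),
}


def validate(doc, schema):
    """Return a list of problems; an empty list means the document conforms."""
    problems = []
    for field, (expected_type, required) in schema.items():
        if field not in doc:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        if not isinstance(doc[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    return problems


# Extra fields such as "nickname" are allowed, which keeps the flexibility.
good = {"user_id": "u1", "email": "a@example.com", "nickname": "al"}
bad = {"user_id": 42, "age": "thirty"}

good_problems = validate(good, USER_SCHEMA)
bad_problems = validate(bad, USER_SCHEMA)
```

The check only constrains the fields it knows about, so documents can still carry additional attributes without being rejected.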
Designing a Schema for NoSQL
Since NoSQL is well suited to expansion, the main schema design considerations are the scalability and performance of the data model. Emphasis is placed on optimizing data access, which ultimately depends on how the data is queried. Therefore, schema design in NoSQL focuses on planning keys and indexes that complement the workflow so that it stays fast and efficient.
Of course, there are several ways to go about choosing a primary key or deciding which fields should be indexed. For this, you’ll want to consider how the user deals with, or will want to deal with, the data. Looking back at previous queries can give you a good hint of how users use the database day to day, and works well as a launching-off point.
This sort of query-driven design generally requires, at a minimum, the business data entities, the user requirements and specifications, and the query patterns of those users, if that data exists.
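As a rough illustration of mining past queries for hints, the sketch below counts how often each field appears as a filter in a query log. The log format and the field names are invented for the example; a real log would need to be parsed from whatever your database records.

```python
from collections import Counter

# Hypothetical query log: each entry lists the fields a past query filtered on.
query_log = [
    ("customer_id",),
    ("customer_id", "status"),
    ("status",),
    ("customer_id", "created_at"),
    ("customer_id",),
]


def rank_predicate_fields(log):
    """Count how often each field is used as a predicate, most frequent first."""
    counts = Counter(field for predicates in log for field in predicates)
    return counts.most_common()


ranking = rank_predicate_fields(query_log)
# The most frequently filtered fields are the first candidates for keys or indexes.
```

A ranking like this is only a starting point; it says which fields are queried often, not how selective they are.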
Writing the Perfect Code for Your Database
Once you have those basic ingredients you can start designing the schema, and a good starting point is the custom, table-like structures of NoSQL databases. For this step, it’s important to find a balance between structures that serve a single function and structures that can satisfy several. After all, reducing overhead is still an important goal, even with NoSQL.
That last bit will require denormalizing the data, which is essential to any NoSQL schema design. While it’s not an exact science, the two main ways to approach it are referencing and embedding. These in turn support core design patterns like 1:1, 1:N, and M:N relationships.
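The difference between embedding and referencing is easier to see side by side. The sketch below models the same 1:N relationship (one order, many items) both ways using plain Python dictionaries as stand-ins for collections; the collection names, ids, and the `load_order_items` helper are hypothetical.

```python
# Embedding: the child records live inside the parent document,
# so a single read returns the order together with its items.
order_embedded = {
    "_id": "order-1",
    "customer": "cust-7",
    "items": [
        {"sku": "A100", "qty": 2},
        {"sku": "B200", "qty": 1},
    ],
}

# Referencing: the parent stores only ids, and the children live
# in their own collection (modelled here as a second dictionary).
orders = {"order-1": {"customer": "cust-7", "item_ids": ["item-1", "item-2"]}}
items = {
    "item-1": {"sku": "A100", "qty": 2},
    "item-2": {"sku": "B200", "qty": 1},
}


def load_order_items(order_id):
    """Resolve the references with one extra lookup per item."""
    return [items[item_id] for item_id in orders[order_id]["item_ids"]]


resolved = load_order_items("order-1")
```

Embedding tends to favor read speed when the children are always fetched with the parent, while referencing avoids duplicating children that are shared or that grow without bound.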
Specific Primary Keys
Once that is established, the next step is to design the primary keys. Unfortunately, I can’t offer much help here, because each NoSQL database architecture is different, and knowing how your database implements its primary keys is fundamental to this step.
Finally, you will need to design the indexes and, as with the step above, this varies a lot depending on which NoSQL database you are using. Nonetheless, there are a few design concepts you should consider:
- Creating a consolidated list of the attributes used as predicates in queries can help you design more efficient indexes. Of course, you should avoid creating overly granular indexes, as that just decreases efficiency.
- Following on from the point above, array indexes should only be designed if all the attributes in the array are required. Keeping the array size minimal is crucial if you plan to index it.
- Special indexes should be avoided on fields with complex data types.
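The first point above can be sketched as a small consolidation step. This example assumes that a compound index can also serve queries on its leading fields, which holds for some databases but not all, so check your own database's documentation. The candidate lists and the `consolidate_indexes` helper are hypothetical.

```python
def consolidate_indexes(candidates):
    """Drop any candidate that is a leading prefix of a longer kept candidate."""
    # Longest candidates first, with a tie-break on the tuple for a stable order.
    ordered = sorted(set(candidates), key=lambda c: (-len(c), c))
    kept = []
    for cand in ordered:
        covered = any(other[: len(cand)] == cand for other in kept)
        if not covered:
            kept.append(cand)
    return kept


# Hypothetical predicate combinations collected from the application's queries.
candidates = [
    ("customer_id",),
    ("customer_id", "status"),
    ("customer_id", "status", "created_at"),
    ("status",),
]

indexes = consolidate_indexes(candidates)
```

Here the two shorter `customer_id` candidates are folded into the three-field index, while `status` on its own is kept because it is not a leading prefix of any other candidate.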
Editing a NoSQL Schema
Given NoSQL’s flexibility, making changes to the schema is easy, which essentially turns schema design into a lifelong process of designing and implementing changes. This might sound like a chore when starting out, but it is ultimately a great help when you’re a few years down the line and realize that you need to make a very important change. NoSQL left on its own without a schema can often lead to anarchy, and therefore creating some form of schema can be useful. You don’t have to, especially for smaller applications, but don’t think that going the NoSQL route is going to save you from having to create a schema.
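One common way to handle this ongoing evolution is to store a version number in each document and upgrade old documents when they are read. The sketch below is one possible approach under that assumption; the `schema_version` field, the `name` split, and the helper names are all made up for illustration.

```python
CURRENT_VERSION = 2


def upgrade_v1_to_v2(doc):
    """Hypothetical change: split a single 'name' field into first and last."""
    first, _, last = doc.pop("name", "").partition(" ")
    doc["first_name"] = first
    doc["last_name"] = last
    doc["schema_version"] = 2
    return doc


# Maps a version to the function that upgrades it to the next version.
UPGRADES = {1: upgrade_v1_to_v2}


def read_document(doc):
    """Upgrade an old document step by step at read time."""
    doc = dict(doc)  # work on a copy so the stored original is untouched
    version = doc.get("schema_version", 1)  # unversioned documents count as v1
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["schema_version"]
    return doc


old = {"_id": "u1", "name": "Ada Lovelace"}
new = read_document(old)
```

Because each upgrade only moves a document one version forward, later schema changes can be added as new entries in `UPGRADES` without rewriting the earlier ones.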
About the Author
Alex Williams, Writer/Researcher at Hosting Data UK, is a seasoned full-stack developer and an expert on all things NoSQL.