Do NoSQL Databases Need Schemas?

If you’ve worked with databases at all, you know that NoSQL is the new hot topic, if by ‘new’ you mean something that’s been around since the 70s. Jokes aside, NoSQL has largely filled a gap that SQL has had quite a lot of trouble filling. Traditionally, SQL databases tend to be costly, both because they mostly scale vertically and because a large amount of schema design has to be done before the database is even created. NoSQL was developed in response, being horizontally scalable and not requiring a schema at all. But does that mean a NoSQL database never needs one?

NoSQL and Schemas

People just entering NoSQL often get attracted by buzz phrases like ‘no need for SQL’ and ‘schema-less’, yet fail to see the forest for the trees. While it’s true NoSQL was made as a response to SQL, it was not made as a replacement, but rather as a way to enhance and complement it. More specifically, the lack of a required schema makes NoSQL incredibly flexible, able to store data in many different data models, such as key-value, document, wide-column, and graph.

That doesn’t mean that NoSQL can’t use a schema, though, and that’s where a lot of people get tripped up. After all, NoSQL data can be ugly, random, chaotic, and repeated ad infinitum (relational design is specifically meant to root out duplicate data through normalization, which NoSQL does not do). As such, unless the whole pipeline is handled only by machines, which it won’t be because data science isn’t perfect, having a schema can certainly be useful.
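To make that concrete, here is a minimal sketch of an application-side schema check for documents, written in plain Python. The schema, field names, and the validate helper are all hypothetical and not tied to any particular NoSQL product; many databases offer their own validation features, which would look different.

```python
# Hypothetical schema: field name -> (expected type, required?)
USER_SCHEMA = {
    "user_id": (str, True),
    "email": (str, True),
    "age": (int, False),
}


def validate(doc, schema):
    """Return a list of problems; an empty list means the document conforms."""
    errors = []
    for field, (expected_type, required) in schema.items():
        if field not in doc:
            if required:
                errors.append(f"missing required field: {field}")
        elif not isinstance(doc[field], expected_type):
            errors.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    return errors


# Extra fields such as "nickname" are allowed, which keeps the flexibility.
good = {"user_id": "u1", "email": "a@example.com", "nickname": "al"}
bad = {"user_id": 42, "age": "thirty"}

print(validate(good, USER_SCHEMA))
print(validate(bad, USER_SCHEMA))
```

Note that the check only constrains the fields you care about and ignores everything else, so the data stays flexible while the important parts stay predictable.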

Designing a Schema for NoSQL

Since NoSQL is very much suited to expandability, the main schema design considerations are probably the scalability and performance of the data model. Emphasis is especially placed on optimizing data access, which ultimately depends on how the data is queried. Therefore, schema design in NoSQL focuses on planning keys and indexes that complement the workflow so that it is fast and efficient.

Of course, there are several ways to go about choosing a primary key or deciding which fields should be indexed. For this, you’ll definitely want to consider how users deal with, or will want to deal with, the data. Looking back at previous queries can give you a good hint of how users use the database on a day-to-day basis, and works well as a launching-off point.

This sort of query-driven design generally requires, at a minimum, the inclusion of business data entities, user requirements & specifications, and finally the query patterns of said users if that sort of data exists.
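As a rough illustration of mining query patterns, the sketch below counts how often each field appears as a filter predicate in a query log. The log format, the collection name, and the rank_predicates helper are assumptions made up for this example; a real database would expose its query history in its own way.

```python
from collections import Counter

# Hypothetical query log: each entry lists the fields used as filter predicates.
query_log = [
    {"collection": "orders", "predicates": ["customer_id", "status"]},
    {"collection": "orders", "predicates": ["customer_id"]},
    {"collection": "orders", "predicates": ["customer_id", "created_at"]},
    {"collection": "orders", "predicates": ["status"]},
]


def rank_predicates(log, collection):
    """Count predicate usage for one collection, most frequent first."""
    counts = Counter()
    for query in log:
        if query["collection"] == collection:
            counts.update(query["predicates"])
    return counts.most_common()


ranked = rank_predicates(query_log, "orders")
for field, hits in ranked:
    print(field, hits)
```

Fields near the top of the ranking are natural candidates for keys or indexes, while fields that rarely appear are probably not worth the overhead.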

Writing the Perfect Code for Your Database

Once you have those basic ingredients you can start designing the schema, and a good starting point is the custom, table-like structures of NoSQL databases. For this step, it’s important to find a balance between structures that serve a single function and structures that can satisfy several. After all, decreasing overhead is still an important goal, even with NoSQL.

That last bit will require denormalizing the data, which is essential to any NoSQL schema design. While it’s not an exact science, the two main ways to approach denormalization are referencing and embedding. Either can then be used to model core relationship patterns like 1:1, 1:N, or M:N.
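The difference between embedding and referencing is easier to see side by side. The sketch below models the same order both ways using plain Python dictionaries as stand-ins for documents; the field names and the customer_for_order helper are hypothetical.

```python
# Embedding: related data lives inside the parent document,
# so a single read returns everything (here a 1:1 customer and 1:N items).
order_embedded = {
    "_id": "order-1",
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}

# Referencing: documents point at each other by id,
# so shared data is stored once but needs a second lookup.
customers = {"cust-1": {"name": "Ada", "email": "ada@example.com"}}
orders = {"order-1": {"customer_id": "cust-1", "item_skus": ["A1", "B7"]}}


def customer_for_order(order_id):
    """Follow the reference from an order to its customer document."""
    order = orders[order_id]
    return customers[order["customer_id"]]


print(order_embedded["customer"]["name"])
print(customer_for_order("order-1")["name"])
```

Embedding tends to suit data that is always read together, while referencing tends to suit data that is shared by many documents or changes often.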

Specific Primary Keys

After that is established, the next step is to design the primary keys. Unfortunately, I can’t provide you much help here because each NoSQL database architecture is different, and knowing how each implements its primary keys is fundamental to this step.
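Still, one pattern shows up in several key-value and wide-column stores: a primary key split into a part that groups related records and a part that orders them within the group. The sketch below builds such a composite key in plain Python. The key layout, the prefixes, and the make_key helper are purely illustrative assumptions; check how your own database defines its keys before copying anything like this.

```python
from datetime import datetime, timezone


def make_key(customer_id, created_at, order_id):
    """Build a hypothetical (partition key, sort key) pair for an order."""
    # All orders for one customer share a partition key.
    partition_key = f"CUSTOMER#{customer_id}"
    # A fixed-width UTC timestamp sorts lexicographically in time order.
    stamp = created_at.strftime("%Y-%m-%dT%H:%M:%SZ")
    sort_key = f"ORDER#{stamp}#{order_id}"
    return partition_key, sort_key


first = make_key("c1", datetime(2021, 1, 24, 8, 35, tzinfo=timezone.utc), "o1")
second = make_key("c1", datetime(2021, 4, 7, 13, 22, tzinfo=timezone.utc), "o2")

print(first)
print(second)
```

With keys shaped like this, "all orders for a customer, newest first" becomes a range read over one group rather than a scan of everything.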

Finally, you will need to design the indexes, and similarly to the step above, it varies a lot depending on what NoSQL database you are using. Nonetheless, there are a few design concepts you should consider:

Creating a consolidated list of the attributes used as predicates in queries can help you design more efficient indexes. Of course, you should avoid creating overly granular indexes, as that just decreases efficiency.
On a related note, array indexes should only be created if all the attributes in the array are required. Keeping the array size minimal is crucial if you plan to index it.
Special indexes should be avoided on fields with complex data types.
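To see why array size matters, here is a toy secondary index built with plain Python dictionaries. It is a simplified model, not how any specific database implements indexes, and the build_index and entry_count helpers are hypothetical. The point is that a list-valued field produces one index entry per element.

```python
from collections import defaultdict


def build_index(docs, field):
    """Map each value of `field` to the ids of the documents containing it.

    A list-valued field yields one entry per element (an array-style index).
    """
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        value = doc.get(field)
        values = value if isinstance(value, list) else [value]
        for item in values:
            if item is not None:
                index[item].add(doc_id)
    return index


def entry_count(index):
    """Total number of (value, document) entries the index has to maintain."""
    return sum(len(ids) for ids in index.values())


docs = {
    "p1": {"category": "books", "tags": ["sale", "new", "gift"]},
    "p2": {"category": "books", "tags": ["sale"]},
    "p3": {"category": "games", "tags": ["new", "gift", "kids", "sale"]},
}

by_category = build_index(docs, "category")
by_tag = build_index(docs, "tags")

print(entry_count(by_category), entry_count(by_tag))
```

The scalar index holds one entry per document, while the array index holds one per array element, so every extra element adds to what must be stored and updated on each write.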

Editing a NoSQL Schema

Given NoSQL’s flexibility, making changes to the schema is easy, which essentially turns schema design into a lifelong process of designing and implementing changes. This might sound like a chore when starting out, but it pays off when you’re a few years down the line and realize that you need to make a very important change. NoSQL left on its own without a schema can often lead to anarchy, and therefore creating some form of schema can be useful. You don’t have to, especially for smaller applications, but don’t think that going the NoSQL route is going to save you from having to create a schema.
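One common way to handle ongoing schema changes is to store a version number in each document and upgrade old documents when they are read. The sketch below shows the idea in plain Python; the schema_version field, the v1-to-v2 change, and the load helper are assumptions invented for this example.

```python
CURRENT_VERSION = 2


def upgrade_v1_to_v2(doc):
    """Hypothetical change: v1 stored a single "name", v2 splits it in two."""
    first, _, last = doc["name"].partition(" ")
    upgraded = {key: value for key, value in doc.items() if key != "name"}
    upgraded.update(
        {"first_name": first, "last_name": last, "schema_version": 2}
    )
    return upgraded


# One upgrade function per old version, applied in sequence.
UPGRADES = {1: upgrade_v1_to_v2}


def load(doc):
    """Return a copy of the document upgraded to the current schema version."""
    doc = dict(doc)
    # Documents written before versioning existed are treated as version 1.
    version = doc.get("schema_version", 1)
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["schema_version"]
    return doc


old = {"_id": "u1", "name": "Ada Lovelace"}
new = load(old)
print(new)
```

Because old and new documents can sit side by side, a change like this can be rolled out gradually instead of rewriting the whole dataset at once.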

About the Author

Alex Williams, Writer/Researcher at Hosting Data UK, is a seasoned full-stack developer and an expert on all things NoSQL.

Sign up for the free insideBIGDATA newsletter.

Comments

Pascal says

January 24, 2021 at 8:35 am

Hi Alex, Great article! As a matter of fact, schema design is even more important with NoSQL, given the power and flexibility without guardrails, than for RDBMS. If your schema grows to a certain level of complexity, architects and developers will want to use a tool such as Hackolade, which supports all of the popular families of NoSQL and leading vendors, while facilitating an agile approach to development. Take a look at https://hackolade.com

Akshay Raut says

April 7, 2021 at 1:22 pm

Very well articulated points. I was looking for such articles.
