Understanding NoSQL Databases: Document Stores


Document-oriented databases (also called aggregate databases, document databases, or document stores) place each record, together with its associated data, inside a single document. This database type falls under the NoSQL umbrella, which covers the growing list of popular database management systems that use non-relational models, i.e. databases that don’t rely primarily on Structured Query Language (SQL).

NoSQL emerged to fill gaps left by SQL databases, which are often called RDBMSs (relational database management systems), since most of them use SQL. Relational databases still have the edge in popularity over non-relational ones, partly because NoSQL boomed only relatively recently: although non-relational databases have existed since the late 1960s, the term ‘NoSQL’ only caught on in the 21st century, and its popularity grew in recent decades on the back of rapid big data and cloud innovation, the internet’s explosive growth, and demand for user-generated content (UGC).

That said, non-relational databases such as MongoDB and Redis now play in a similar league to long-established relational giants like Oracle and MySQL, and their popularity is still rising.

Uses of Document Stores

So, why are document-oriented databases relevant to big data? 

The answer is tied to the general rise in demand for modern NoSQL databases over the last few decades. NoSQL offers a way to organize massive volumes of unstructured data, which in many cases makes it more scalable and friendlier to agile development than SQL. There are notable exceptions, however: hybrids like PostgreSQL, as well as NoSQL graph databases (GDBs), can work with both structured and unstructured data to varying degrees without losing integrity, combining high scalability and flexibility with reliable querying.

I expect this melding of the two worlds to continue. In any case, document stores have a wide range of uses today:

  • Web applications: blogging platforms, detailed web insights, e-commerce apps, user preferences, content management systems.
  • Gaming: achievements (e.g. completed challenges, live stats), in-game chat/messaging, leaderboard score tracking, subscriptions.
  • User-generated content: chat logs, user ratings, user comments, tweets, blogs.
  • General storage/logging: user accounts, cataloguing items, storing logs, real-time insights.

These are only some of the possible uses. Widely used document store systems include MongoDB, CouchDB, OrientDB, and DocumentDB.

Structure of Document Stores

Let’s do a thought experiment: you’re using a relational database, where data is, as usual, spread across several tables. The goal is to catalog book authors. One table holds each author’s name, release date, and genre, along with an associated author id. Some of these authors have written series, so you have a separate table for series names and ids. Finally, there is a table for the different genres and genre ids.

You relate these three tables in a diagram, creating primary and foreign key fields that link them together in a structured design for querying. That completes your basic structure.
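The three-table design can be sketched with Python’s built-in sqlite3 module. This is only an illustration: the table and column names (genres, authors, series, genre_id, and so on) and the sample rows are hypothetical, since the thought experiment does not specify a schema. The final query follows the foreign keys to reassemble one logical record.

```python
import sqlite3

def build_library() -> sqlite3.Connection:
    """Create the three related tables (hypothetical names) in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(
        """
        CREATE TABLE genres (
            genre_id INTEGER PRIMARY KEY,
            name     TEXT NOT NULL
        );
        CREATE TABLE authors (
            author_id    INTEGER PRIMARY KEY,
            name         TEXT NOT NULL,
            release_date TEXT,
            genre_id     INTEGER REFERENCES genres(genre_id)
        );
        CREATE TABLE series (
            series_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            author_id INTEGER NOT NULL REFERENCES authors(author_id)
        );
        """
    )
    # Made-up sample rows, inserted parent-first so every foreign key resolves
    conn.execute("INSERT INTO genres VALUES (1, 'Fantasy')")
    conn.execute("INSERT INTO authors VALUES (10, 'A. Writer', '2001-05-01', 1)")
    conn.execute("INSERT INTO series VALUES (100, 'The Long Saga', 10)")
    return conn

def series_with_author_and_genre(conn: sqlite3.Connection) -> list:
    # Joins walk the primary/foreign key links across all three tables
    return conn.execute(
        """
        SELECT s.name, a.name, g.name
        FROM series s
        JOIN authors a ON s.author_id = a.author_id
        JOIN genres  g ON a.genre_id  = g.genre_id
        """
    ).fetchall()

conn = build_library()
print(series_with_author_and_genre(conn))  # [('The Long Saga', 'A. Writer', 'Fantasy')]
```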

In comparison, document databases do not depend on a table schema. Each entity is placed inside a single document, and its associated data lives inside that same document. In fact, you can begin loading data without a schema in place: the database doesn’t need to be organized and structured up front, and you don’t necessarily need columns, tables, primary/foreign keys, relationships, stored procedures, and so on.
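For contrast, the same author information can live in one self-contained document. The sketch below uses plain Python dicts serialized as JSON. The field names, including the `_id` key, are hypothetical choices for illustration. Real document stores such as MongoDB or CouchDB each have their own storage formats and APIs, which this does not reproduce.

```python
import json

# One author entity with its associated data embedded in a single document
author_doc = {
    "_id": "author-10",
    "name": "A. Writer",
    "release_date": "2001-05-01",
    "genre": "Fantasy",                     # embedded value instead of a genres table
    "series": [{"name": "The Long Saga"}],  # nested list instead of a series table
}

# No schema is declared up front, so another document may carry different fields
other_doc = {
    "_id": "author-11",
    "name": "B. Novelist",
    "pen_names": ["B. N."],  # a field that author-10 does not have
}

for doc in (author_doc, other_doc):
    print(json.dumps(doc))
```

The design choice here is embedding: reading one author needs one document fetch and no joins, at the cost of possibly duplicating values (such as genre names) across documents.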

On one hand, this allows for greater variation in data, integration, and modelling; on the other, there is less ability to enforce strict relationships between entities. For this reason, SQL remains the dominant force in OLTP (online transaction processing) workloads such as ATM transactions, where rigid, foolproof systems are needed to guarantee business logic and trustworthiness at the database level.
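To make the enforcement trade-off concrete, the sketch below contrasts SQLite, with foreign-key enforcement switched on, and a plain dictionary standing in for a schemaless document collection. The dictionary is a deliberately simplified, hypothetical stand-in, not a model of any particular product. Some document databases do offer optional validation rules, which this sketch ignores.

```python
import sqlite3

def relational_rejects_orphan() -> bool:
    """Return True if SQLite refuses a row that points at a missing author."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE authors (author_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE series (series_id INTEGER PRIMARY KEY, "
        "author_id INTEGER NOT NULL REFERENCES authors(author_id))"
    )
    try:
        conn.execute("INSERT INTO series VALUES (1, 999)")  # author 999 does not exist
    except sqlite3.IntegrityError:
        return True  # the database itself blocked the dangling reference
    return False

def document_store_accepts_orphan() -> bool:
    """Return True if the schemaless stand-in stores the same dangling reference."""
    collection = {}  # hypothetical stand-in for a document collection
    collection["series-1"] = {"name": "Orphaned Series", "author_id": 999}
    return "series-1" in collection  # nothing checked whether author 999 exists

print(relational_rejects_orphan(), document_store_accepts_orphan())  # True True
```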

At their core, document stores build on a key-value model, in which each document is stored and retrieved under a unique key, and that model is less suited to creating these enforcement rules. SQL performs better when targeting narrowly defined parts of your database and works well for uniform data, e.g. personal accounting. But for gaining meta-insights and making fast connections across vast, ever-evolving, and rather unpredictable big data stores, where the simple key-value model lets you quickly retrieve unstructured data, document-oriented databases are a natural fit.
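The key-value foundation can be illustrated with a toy in-memory store: each document is a value stored under a unique key, so fetching by key is a single dictionary lookup, while querying by a field inside the documents requires a scan (real systems add secondary indexes to avoid this). The class and method names are hypothetical and do not mirror any product’s API.

```python
from typing import Any, Dict, Iterator, Optional

Document = Dict[str, Any]

class TinyDocumentStore:
    """Toy document store (hypothetical): a key-value map whose values are documents."""

    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}

    def put(self, key: str, doc: Document) -> None:
        # Documents are stored as given; no schema check is performed
        self._docs[key] = dict(doc)

    def get(self, key: str) -> Optional[Document]:
        # Primary access path: direct lookup by key
        return self._docs.get(key)

    def find(self, **criteria: Any) -> Iterator[Document]:
        # Field-based query: a full scan yielding documents whose fields all match
        for doc in self._docs.values():
            if all(doc.get(field) == value for field, value in criteria.items()):
                yield doc

store = TinyDocumentStore()
store.put("author-10", {"name": "A. Writer", "genre": "Fantasy"})
store.put("author-11", {"name": "B. Novelist", "genre": "Mystery", "awards": 2})
print(store.get("author-11")["name"])                    # B. Novelist
print([d["name"] for d in store.find(genre="Fantasy")])  # ['A. Writer']
```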

Relationships & Scaling 

Let’s glance at a few final key benefits, and trade-offs, of this schemaless (or schema-free) model.

First, as is generally true for NoSQL databases, document stores scale horizontally very well, unlike traditional SQL databases. Sharding, which partitions data across many machines (potentially thousands), performs well in this model. Relational databases, by comparison, are more typically scaled vertically (e.g. by adding CPU, memory, or storage to a single server). For this reason, NoSQL is seen as a more natural fit for agile development and hybrid/multi-cloud deployments.
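Sharding can be sketched as routing each document key to one of N machines. The sketch below uses simple hash-modulo partitioning, which is only one strategy. The shard count and key format are hypothetical. It uses hashlib rather than Python’s built-in `hash()`, because `hash()` on strings is randomized per process and so would not route keys consistently across machines.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List

def shard_for(key: str, num_shards: int) -> int:
    """Map a document key to a shard index using a stable hash (hash-modulo)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def distribute(keys: Iterable[str], num_shards: int) -> Dict[int, List[str]]:
    # Group keys by the shard (i.e. machine) that would store them
    shards: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return dict(shards)

keys = [f"user-{i}" for i in range(1000)]
placement = distribute(keys, num_shards=4)
for shard_id in sorted(placement):
    print(shard_id, len(placement[shard_id]))
```

One caveat of plain hash-modulo is that changing `num_shards` remaps most keys to different shards. That is one reason production systems tend to prefer range- or chunk-based placement, or consistent hashing, which let machines be added with far less data movement.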

Finally, because document stores don’t use foreign keys, which relational databases use to relate tables to one another, relationships have to be established at the application layer. That said, relationships tend to matter less to teams that choose document stores, since related data is often kept together in the same document.
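Without foreign keys, the application itself resolves references. A minimal sketch, assuming a hypothetical layout in which each series document stores its author’s key: the code performs a manual “join” by looking up the referenced author document, and it must decide for itself what to do when a reference is dangling, since the database will not check it.

```python
from typing import Any, Dict, Optional

# Hypothetical collections, modelled as key -> document
authors: Dict[str, Dict[str, Any]] = {
    "author-10": {"name": "A. Writer"},
}
series: Dict[str, Dict[str, Any]] = {
    "series-1": {"name": "The Long Saga", "author_id": "author-10"},
    "series-2": {"name": "Orphaned Series", "author_id": "author-99"},  # dangling reference
}

def series_with_author(series_key: str) -> Dict[str, Any]:
    """Application-level join: follow the stored author_id to the author document."""
    doc = series[series_key]
    author: Optional[Dict[str, Any]] = authors.get(doc["author_id"])
    # The application, not the database, decides how to handle a missing referent
    author_name = author["name"] if author is not None else "<unknown author>"
    return {"series": doc["name"], "author": author_name}

print(series_with_author("series-1"))  # {'series': 'The Long Saga', 'author': 'A. Writer'}
print(series_with_author("series-2"))  # {'series': 'Orphaned Series', 'author': '<unknown author>'}
```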

The Takeaway

Regardless of industry, it’s becoming harder to deny that the world is moving toward multi-cloud architectures. With that comes the horizontal spread of big data, which makes document-oriented databases, and corporate workforce training in data design, a growing need.

On the whole, document stores have less difficulty handling the ins and outs of caching and indexing data, keeping pace with the creation of new web applications, and scaling horizontally. However, more documentation is needed to take them beyond small niche communities and forums, and more guidance on the different available document stores needs to be disseminated. For those willing to keep track of this fast-moving field, document stores are a flexible tool for easily adding datasets or scaling up quickly with less hassle.

About the Author

Alex Williams, Writer/Researcher at Hosting Data UK, is a seasoned full-stack developer and an expert on all things NoSQL. 
