Photo by Samuel Scrimshaw on Unsplash

MongoDB: Godzilla of NoSQL

Frank Stepanski

--

King Kong ain’t got sh*t on me

This article assumes you have a basic knowledge of Node.

Databases let us work with large amounts of data efficiently. They make updating data easy and reliable, and they help to ensure accuracy. They offer security features to control access to information, and they help us avoid redundancy.

NoSQL

NoSQL databases were developed to support three things: change without radical reengineering of the underlying data model, high volumes of data, and an architecture that is easy to scale.

There are 4 classifications of NoSQL databases:

  • key-value (Redis)
  • graph based (Neo4j)
  • column-family based (Cassandra)
  • document based (MongoDB)

Some strengths of NoSQL databases are:

  • Flexible
  • Scalable: vertically (add capacity to a single server) or horizontally (distribute data over multiple servers), with fault tolerance, caching, etc.
  • Fast
  • Ease of deployment

NoSQL’s strengths lend it to a variety of modern applications, including big data analysis, social networking, website personalization, user profiles and more. It’s super scalable, so it can keep operating as applications and websites grow to accommodate millions of users.

Its flexible nature lets programmers use agile methodology easily, without needing to go through the paces of modeling a relational database up front.

SQL Databases (relational)

In comparison, a SQL database’s strengths are usually:

  • Long history in enterprise organizations
  • Structured
  • Reliable and consistent
  • Well-suited for most business transactions
  • Strict schemas for purposes of data integrity

The two types of databases store data differently, can be used for different purposes, and can also be used together. Neither is replacing the other.

CAP Theorem

The diagram above refers to the CAP theorem, a result from theoretical computer science which states that a distributed data store can provide only two of the following three guarantees: Consistency, Availability, and Partition Tolerance.

Consistency means that data throughout the system is the same. It matters most when accuracy is of the highest importance: think banking, high-stakes transactions, and online deposits to your checking account.

Availability relates to users being able to write and read the data regardless of failures in the network.

Partition tolerance means that the system is up and running, working as expected even if some of the network is down.

But why should we care?

The CAP theorem helps us make informed decisions when choosing the right database for our needs.

Relational databases generally don’t offer partition tolerance, nor do they offer high availability, because they lock records, which blocks other users during updates. They do offer reliable, consistent data, though.

NoSQL databases offer partition tolerance and then favor either availability or consistency, depending on the specific NoSQL database chosen.

Relational Databases (SQL)

So relational databases offer strong consistency, but since the data generally lives on only one machine, partition tolerance suffers. Also, because such systems lock records during updates, availability isn’t very high.

  • Consistent
  • Not highly available
  • Not partition tolerant

NoSQL Databases

NoSQL systems that focus on partition tolerance and availability, or AP systems, let users write to one machine but read from machines two, three, four, etc.

The consistent, most accurate data is on machine one, but it only reaches the other machines in the system after replication. So a read may briefly return stale data, but once the machines have replicated, they will all agree. In other words, they have eventual consistency.
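
The write-to-one, read-from-the-others flow can be sketched with a toy simulation. This is a minimal illustration in plain JavaScript, not how any particular database implements replication; the Machine class, the replicate function and the monster key are names made up for the example.

```javascript
// Toy model of an AP system: writes go to one machine, reads come from replicas.
// All names are hypothetical; no real database is involved.
class Machine {
  constructor(name) {
    this.name = name;
    this.data = new Map();
  }
  write(key, value) {
    this.data.set(key, value);
  }
  read(key) {
    return this.data.get(key);
  }
}

const primary = new Machine('machine one');
const replicas = [new Machine('machine two'), new Machine('machine three')];

// Replication step: copy everything from the primary to each replica.
function replicate() {
  for (const replica of replicas) {
    for (const [key, value] of primary.data) {
      replica.write(key, value);
    }
  }
}

primary.write('monster', 'Godzilla');
replicate();

// A new write lands on the primary only.
primary.write('monster', 'King Kong');
const staleRead = replicas[0].read('monster'); // replica still has the old value

replicate();
const freshRead = replicas[0].read('monster'); // replica now matches the primary

console.log({ staleRead, freshRead });
```

Between the second write and the second replicate() call, a reader on machine two sees the old value. That window is the "imperfect consistency" that eventual consistency accepts.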

Examples of such systems include CouchDB, Cassandra, and DynamoDB.

Now, let’s look at NoSQL systems that focus on partition tolerance and consistency, or CP systems.

In this case, users read and write to machine one, and then replication occurs. The consistent, most accurate data here is on machine one. This data only makes it to the other machines in the system after replication.

Examples of such systems include MongoDB, HBase, and Redis, along with relational databases with failover.

  • Partition tolerant
  • Consistent
  • Not highly available

MongoDB: Godzilla of NoSQL

MongoDB originated in 2007; the ‘mongo’ is short for humongous.

Among NoSQL databases MongoDB stands alone as the most popular by a large margin, so yes, it is Godzilla.

So MongoDB is a database server that can run multiple MongoDB databases.

  • databases will contain collections
  • collections will contain documents
  • documents will contain fields

Collections are an organized store of documents, usually with common fields between documents. Documents are a way to organize and store data as a set of field-value pairs:

var mydoc = {
  _id: ObjectId("5099803df3f4948bd2f98391"),
  name: {
    first: "Alan",
    last: "Turing"
  },
  birth: new Date('Jun 23, 1912'),
  death: new Date('Jun 07, 1954'),
  contribs: ["Turing machine", "Turing test", "Turingery"],
  views: NumberLong(1250000)
}

Behind the scenes on the server MongoDB converts your JSON data to a binary version of it called BSON (binary JSON), which can be stored and queried more efficiently.

BSON extends the JSON model to provide additional data types and ordered fields, and to be efficient to encode and decode in different languages.
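
One way to see why the additional types matter: plain JSON has no date type, so a Date does not survive a round trip through JSON text. The sketch below uses only built-in JavaScript and does not involve BSON itself; doc is a made-up example document.

```javascript
// Plain JSON has no Date type: a Date comes back as a string after a round trip.
const doc = { name: 'Alan Turing', birth: new Date('1912-06-23T00:00:00Z') };

const roundTripped = JSON.parse(JSON.stringify(doc));

console.log(doc.birth instanceof Date);  // true
console.log(typeof roundTripped.birth);  // 'string'
console.log(roundTripped.birth);         // '1912-06-23T00:00:00.000Z'
```

BSON stores such a value as a native date, which is why a driver can hand it back to your code as a Date rather than as text.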

Every document must have a unique _id field that acts as its primary key. If you don’t supply one, MongoDB will generate it as an ObjectId. ObjectId is a data type provided by the BSON format; it combines a timestamp, a random value, and an incrementing counter to create a unique ID every time. Other additional types provided by BSON include Date, Regular Expression, and Binary Data.
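
Because the first four bytes of an ObjectId hold a Unix timestamp in seconds, you can recover the creation time from the ID alone. Here is a minimal sketch in plain JavaScript, using the _id from the sample document above; objectIdTimestamp is a helper name invented for this example (the driver’s own ObjectId type offers a getTimestamp() method for the same purpose).

```javascript
// The first 4 bytes (8 hex characters) of an ObjectId are a Unix timestamp in seconds.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error('expected a 24-character hex string');
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp('5099803df3f4948bd2f98391');
console.log(created.toISOString()); // 2012-11-06T21:25:17.000Z
```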

Ecosystem

The company behind the MongoDB database solution is also called MongoDB. So what does the MongoDB Ecosystem look like?

MongoDB database — self-managed, cloud, mobile and enterprise solution

  • Compass: lets you connect to your database and look into it with a nice user interface.
  • BI Connectors or MongoDB Charts: let you connect different analytics tools.
  • Stitch: a serverless backend solution, a tool set you can use to efficiently query your database directly from inside your client-side apps.
  • Database triggers: a service that lets you listen to events in a database, such as a document being inserted, and then execute a function in response (e.g. send an email to a user).

High-level

Before we start, let’s review how MongoDB fits in a full-stack application.

In your application, you will have a front-end that can be a single page application (SPA). You’ll have a backend with your server side logic and then the data layer. The data layer will have the database and the MongoDB server.

On the backend server where you write your code you have drivers for the different languages (Node.js, Java, Python, etc). These drivers will interact with the MongoDB server.

The MongoDB server will actually not directly write the data into files but instead talks to a storage engine (the default is WiredTiger) which does that. The MongoDB server gets the query from your driver and then forwards that information to the storage engine and the storage engine retrieves or stores that data in files.

Taking a closer look at the data layer, which contains the server, the storage engine and the file system, we actually have to differentiate between writing and reading from files (slower) and writing and reading from memory (faster).

The storage engine actually does both. It loads a chunk of data into memory and makes sure the data you use often stays in memory if possible. It also writes data to memory at first, so that this is really fast, but then it always goes ahead and stores the data in the database files.
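
The memory-first, files-afterwards idea can be pictured with a small toy store. This is only an illustration of the general pattern under simplified assumptions; it is not how WiredTiger works internally, and ToyStore, flush and the kaiju key are names invented for the sketch.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Toy illustration only: NOT how WiredTiger is implemented.
class ToyStore {
  constructor(file) {
    this.file = file;
    this.memory = new Map(); // fast path: data kept in memory
  }

  // Read everything currently persisted in the file (empty if no file yet).
  loadFile() {
    if (!fs.existsSync(this.file)) return {};
    return JSON.parse(fs.readFileSync(this.file, 'utf8'));
  }

  // Writes land in memory first, which is fast.
  write(key, value) {
    this.memory.set(key, value);
  }

  // Reads are served from memory when possible, otherwise from the file.
  read(key) {
    if (this.memory.has(key)) return this.memory.get(key);
    const value = this.loadFile()[key];
    if (value !== undefined) this.memory.set(key, value); // keep it for next time
    return value;
  }

  // Later, the in-memory data is stored in the file.
  flush() {
    const onDisk = this.loadFile();
    for (const [key, value] of this.memory) onDisk[key] = value;
    fs.writeFileSync(this.file, JSON.stringify(onDisk));
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'toy-store-'));
const store = new ToyStore(path.join(dir, 'data.json'));
store.write('kaiju', 'Godzilla'); // in memory only at this point
store.flush();                    // now also in the file

const reopened = new ToyStore(path.join(dir, 'data.json')); // empty memory, as after a restart
const fromDisk = reopened.read('kaiju');
console.log(fromDisk);
```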

So this is how the MongoDB server works behind the scenes:

Installation

MongoDB runs on all major operating systems: Mac, Linux, and Windows.

You have a few options for installing and running MongoDB.

  • Cloud — A cloud-based interface called Atlas requiring nothing to install.
  • Software — Download, install and run a MongoDB server on your local computer that runs the server via the command-line. The Community Server is the free tool most start with. On a Mac, you will be using Homebrew to install MongoDB in the terminal.
  • Compass — You can use the Compass GUI tool with the software, to give you a visual tool of your running MongoDB Server to enable you to administer your databases, collections, documents, and fields.

Official documentation

Specific instructions to install and run the MongoDB server can be found in the official docs. I highly recommend reading them first.

Software: run server and create data directory

In running the MongoDB server for the first time, you have the option of creating a directory to which the mongod process will write data. By default, the mongod process uses the /data/db directory.

If you create a directory (instead of just using the default), you must specify that directory in the dbpath option when starting the mongod process.

For my macOS installation, I created a mongodb/data folder and then ran my server with the following command:

mongod --dbpath=data 

Now you can see the folders and files that have been created:

And what you will see in the command line:

And a mongod process will be running:

Watch out for errors

You may encounter errors when you try to run the MongoDB service. If you’ve already run the service before, a good first check is to make sure the existing service has been killed (i.e. stopped).

The default MongoDB port is 27017, so you can search for any process running on that port, kill it, and then re-run the service.

Windows:

netstat -ano | findstr :<port number>
taskkill /PID <PID> /F

Mac:

lsof -i :<port number>
kill -9 <PID>
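
If you would rather check from Node before starting mongod, a small script can test whether anything is already accepting connections on the port. This is a minimal sketch using only Node’s built-in net module; isPortInUse is a helper name made up for the example, and it only tells you that the port is taken, not which process holds it.

```javascript
const net = require('net');

// Resolves to true if something accepts TCP connections on the given port.
function isPortInUse(port, host = '127.0.0.1') {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    socket.setTimeout(1000);
    socket.once('connect', () => {
      socket.destroy();
      resolve(true);
    });
    socket.once('timeout', () => {
      socket.destroy();
      resolve(false);
    });
    // A refused connection means nothing is listening on the port.
    socket.once('error', () => resolve(false));
  });
}

isPortInUse(27017).then((inUse) => {
  console.log(inUse ? 'Port 27017 is in use' : 'Port 27017 is free');
});
```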

Software: connecting to a MongoDB Server

You can connect to your server running on your own machine in two ways:

  • MongoDB Shell: mostly used for initial database creation and testing
  • MongoDB driver (e.g. for Node.js): the most common way; it provides a high-level API for interacting with the server from a running Node.js app. Driver versions up to 4.x support both callback-based and promise-based interactions with the MongoDB server; later versions are promise-only.

MongoDB Shell

The shell is normally used to create your initial database and perform testing and/or seeding of your data.

To start the mongo shell, you use the mongo command (newer MongoDB releases ship the mongosh shell instead, started with the mongosh command):

mongo

When no parameters are provided with the mongo command, the shell tries by default to connect to the MongoDB server running at localhost on port 27017.

There is no “create” command in the MongoDB shell. In order to create a database, you will first need to switch the context to a non-existing database using the use command:

> use godzilla_db

Note that for now, only the context has been changed. If you enter the show dbs command, you will not see your newly created database.

MongoDB only creates the database when you first store data in that database. This data could be a collection or even a document.

To add a document to your database, use the db.collection.insert() command.

> db.user.insert({name: "Frank Stepanski", age: 205})
WriteResult({ "nInserted" : 1 })

But how did the insert command know to put the data into godzilla_db?

When you entered the use command, godzilla_db became the current database on which commands operate.

To find out which database is the current one, enter the db command:

> db

The db command displays the name of the current database. To switch to a different database, type the use command and specify that database.

Now if you run the show dbs command, you will see the new database.

> show dbs
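
The implicit creation walked through in this section can be modeled in a few lines of plain JavaScript. This is a toy in-memory model, not MongoDB code; ToyServer and its use, insert and showDbs methods are names invented to mirror the shell commands.

```javascript
// Toy in-memory model (hypothetical names, not MongoDB code) of the
// database > collection > document hierarchy and of implicit creation.
class ToyServer {
  constructor() {
    this.databases = new Map(); // database name -> Map(collection name -> documents[])
    this.current = 'test';      // the mongo shell also starts in a database named test
  }

  // Like "use": only switches context, nothing is created yet.
  use(name) {
    this.current = name;
  }

  // Like "db.<collection>.insert": database and collection appear on first insert.
  insert(collection, doc) {
    if (!this.databases.has(this.current)) {
      this.databases.set(this.current, new Map());
    }
    const db = this.databases.get(this.current);
    if (!db.has(collection)) db.set(collection, []);
    db.get(collection).push(doc);
  }

  // Like "show dbs": lists only databases that hold data.
  showDbs() {
    return [...this.databases.keys()];
  }
}

const server = new ToyServer();
server.use('godzilla_db');
const before = server.showDbs(); // godzilla_db is not listed yet
server.insert('user', { name: 'Frank Stepanski', age: 205 });
const after = server.showDbs();  // now it is
console.log(before, after);
```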

Node.js driver

Create a Node app that connects to a MongoDB database:

npm init

Install driver:

npm install mongodb

In MongoDB, creating a database is an implicit process (the same as in the shell). Once the client has finished trying to connect, the callback function receives an error and a client object as arguments. If the connection is successful, client.db() returns an object that points to the godzilla_db database, which is created once data is first stored in it.

const MongoClient = require('mongodb').MongoClient;
const assert = require('assert').strict;

const url = 'mongodb://localhost:27017/godzilla_db';

// Using callback interaction with the server.
// Note: callbacks work with version 4.x and earlier of the mongodb driver;
// version 5 and later are promise-only.
MongoClient.connect(url, { useUnifiedTopology: true }, (err, client) => {
  // assert lets us stop right away if the connection failed
  assert.strictEqual(err, null);
  console.log('Connected correctly to server');

  // No database is created yet; that happens when data is first stored
  const db = client.db('godzilla_db');
  console.log('client object points to the database: ' + db.databaseName);
  client.close();
});
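
For the promise-based interaction, the same connection can be written with async/await. This is a sketch under a few assumptions: the mongodb package is installed, mongod is listening on localhost:27017, and main and dbName are names chosen for this example. The driver is loaded inside main so that a missing package or an unreachable server is reported by the final catch instead of crashing the script.

```javascript
// Promise-based variant of the connection example (async/await).
// Assumes the mongodb package is installed and mongod runs on localhost:27017.
const url = 'mongodb://localhost:27017';
const dbName = 'godzilla_db';

async function main() {
  // Required here so a missing driver is reported by the catch below.
  const { MongoClient } = require('mongodb');

  // serverSelectionTimeoutMS makes the attempt fail quickly if no server is reachable.
  const client = new MongoClient(url, { serverSelectionTimeoutMS: 2000 });
  try {
    await client.connect();
    console.log('Connected correctly to server');

    const db = client.db(dbName);
    console.log('client object points to the database: ' + db.databaseName);
  } finally {
    // Always release the connection, even if something above failed.
    await client.close();
  }
}

main().catch((err) => console.error('Could not connect:', err.message));
```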

Compass

When you want to use a GUI tool to administer your databases, remember that Compass needs to connect to the MongoDB server, so the mongod process must be running. Otherwise it will not be able to connect.

If you don’t have your MongoDB server running via the command line first, you will get this not so helpful screen:

So first make sure you have your mongod process running, then start Compass and you should have these processes running:

And then there is Atlas

If you use Compass with Atlas, then you don’t have to worry about running a MongoDB server on your local machine. The server is in the cloud and your connection string will be given to you within your Cluster screen.

Everyone loves the cloud, right? Where all the pretty birds fly.

That’s it.

This should get you started in your learning journey of MongoDB.

To continue learning, MongoDB University has lots of free courses you can take, and they can also help you prepare for a MongoDB certification (ooh-aah).

In case you didn’t get the movie quote from the article title, it is from one of the best Denzel Washington performances, Training Day.

Frank Stepanski

Engineer, instructor, mentor, amateur photographer, curious traveler, timid runner, and occasional race car driver.