NoSQL Data Processing

With MongoDB

NoSQL databases

NoSQL, Not only SQL

is a term used for several technologies where the nature of the data does not require a relational model. Typical motivations:

  • Huge quantities of data
  • Higher availability
  • Scalability
  • Performance

NoSQL databases

Types of NoSQL stores:

  • Document store: data are organized as a collection of documents, e.g. MongoDB, CouchDB;
  • Key-value store: data are stored as key-value pairs without a predefined schema, e.g. Dynamo, Amazon SimpleDB (Apache Cassandra and HBase, often listed here, are more precisely wide-column stores);
  • Graph-based store: data are stored in graph structures with nodes, edges, and properties, e.g. Neo4j, InfoGrid, Horton.

MongoDB

  • NoSQL database
  • Document-oriented database
  • High performance for storage and retrieval
  • BSON (binary JSON) storage
  • Supports ad hoc queries, replication, load balancing, aggregation, map-reduce, etc.

MongoDB compared to relational databases:

RDBMS ←→ MongoDB
Database ←→ Database
Table ←→ Collection
Row ←→ Document
Column ←→ Field
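
To make the mapping concrete, the sketch below turns one relational-style row (column names plus values) into a document-style object in plain JavaScript. The rowToDocument helper and the column layout are hypothetical, chosen only for illustration; nothing here talks to a MongoDB server.

```javascript
// Hypothetical table layout: column names and one row of values.
const columns = ["name", "program", "class_year"];
const row = ["Sue Anderson", "Data Science", 2020];

// Each column becomes a field; the whole row becomes one document.
function rowToDocument(columnNames, values) {
  const doc = {};
  columnNames.forEach((column, i) => {
    doc[column] = values[i];
  });
  return doc;
}

const studentDoc = rowToDocument(columns, row);
console.log(studentDoc);
```

A table of such rows would correspond to a collection of such documents.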

MongoDB installation

MongoDB Atlas on the cloud https://www.mongodb.com/cloud/atlas

MongoDB Community Server https://www.mongodb.com/download-center/community

  • On Windows, download and install with the MSI installer
  • On macOS (recommended method with Homebrew):
brew tap mongodb/brew
brew install mongodb-community@4.2

MongoDB setup

Once MongoDB is installed:

  1. Create a /data/db directory:
    • Command on Mac: mkdir -p /data/db
    • Command on Windows: md \data\db
  2. Run mongod or mongod.exe to start the MongoDB server
  3. Run mongo or mongo.exe to start the mongo shell

Command line:

mongo
>

At the mongo prompt, use MongoDB commands:

> show dbs

to show a list of databases

A MongoDB database

is a physical container for document collections

You can open a database by:

> use DatabaseName

Show the list of collections in the database:

> show collections

A MongoDB collection is

a group of documents, similar to a table in a relational database

  • Can be split across multiple shards
  • Sharding for horizontal scaling
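
As a rough illustration of horizontal scaling, the sketch below routes documents to shards by hashing a shard key. It is a toy model under stated assumptions: toyHash, pickShard, and the choice of name as the shard key are all hypothetical, and MongoDB's own sharding (which supports hashed and ranged shard keys) is considerably more involved.

```javascript
// Toy hash: sum of character codes. Real systems use stronger hash functions.
function toyHash(key) {
  let sum = 0;
  for (const ch of String(key)) sum += ch.charCodeAt(0);
  return sum;
}

// Route a document to one of `shardCount` shards by hashing its shard key.
function pickShard(doc, shardKey, shardCount) {
  return toyHash(doc[shardKey]) % shardCount;
}

const shardCount = 3;
const shards = Array.from({ length: shardCount }, () => []);
const docs = [
  { name: "John Smith" },
  { name: "Sue Anderson" },
  { name: "Dylan Johnson" },
  { name: "Paul Sanderson" },
];

// Every document lands on exactly one shard.
for (const doc of docs) {
  shards[pickShard(doc, "name", shardCount)].push(doc);
}
console.log(shards.map((shard) => shard.length));
```

Because the same key always hashes to the same shard, a lookup by shard key only needs to contact one shard.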

A MongoDB document

is a record and implements a schema-less model

  • Documents in a collection do not need to have the same fields (unlike rows in a relational table)
  • Format similar to JSON (JavaScript Object Notation)
  • Binary representation with BSON
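
The schema-less model can be sketched with plain JavaScript objects. Here an array stands in for a collection (an assumption for illustration only), and the two documents deliberately carry different fields. The sketch uses JSON text for the round trip; BSON itself is a binary encoding with additional types such as ObjectId and is not reproduced here.

```javascript
// A toy "collection": a plain array standing in for a MongoDB collection.
const students = [
  { name: "John Smith", program: "Data Science", class_year: 2024 },
  { name: "Mary Brown", program: "Information Science", age: 20 },
];

// Collect the field names used by each document; they need not agree.
const fieldSets = students.map((doc) => Object.keys(doc));
console.log(fieldSets);

// Documents look like JSON, so they round-trip through JSON text.
const asText = JSON.stringify(students[1]);
const parsedBack = JSON.parse(asText);
console.log(asText);
```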

There is a MongoDB web shell, similar to the command line mongo shell, which supports mongo operations in the browser.

Mongo Commands

Now let's test some of the features in MongoDB. Make sure that:

  1. First command line window: mongod has been started and is running.
  2. Second command line window: mongo has been run to start the mongo shell.

Note: You can connect to a different server such as:

mongo --host mongodb0.example.com --port 28015

[Screenshot: two command line windows on Mac]

Switch to a Database

List all databases:

show dbs
admin      0.000GB
college    0.000GB
config     0.000GB
education  0.000GB
local      0.000GB
students   0.000GB
test       0.000GB

Now you can switch to an existing database or a new database:

use college
switched to db college

Insert Data

Now let's use the insertOne() operation to create a new document in the database:

db.students.insertOne(
    {
        name: "John Smith", 
        program: "Data Science", 
        class_year: 2024
    }
)
{
	"acknowledged" : true,
	"insertedId" : ObjectId("5eb9947b43c1c3c88a509515")
}
db.students.insertOne(
    {
        name: "Sue Anderson", 
        program: "Data Science", 
        class_year: 2020
    }
)
{
	"acknowledged" : true,
	"insertedId" : ObjectId("5eb9932c43c1c3c88a509512")
}
db.students.insertOne(
    {
        name: "Dylan Johnson", 
        program: "Computer Science", 
        class_year: 2019
    }
)
{
	"acknowledged" : true,
	"insertedId" : ObjectId("5eb9932e43c1c3c88a509513")
}
db.students.insertOne(
    {
        name: "Paul Sanderson", 
        program: "Information Science", 
        class_year: 2021
    }
)
{
	"acknowledged" : true,
	"insertedId" : ObjectId("5eb9932f43c1c3c88a509514")
}

If the database is new (for example, if the college database did not exist when we switched to it), MongoDB creates it automatically when the first piece of data is stored in it. In that case, the first insertOne() operation creates the college database, adds the students collection, and inserts a new student document with the provided data.
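
The create-on-first-write behaviour can be pictured with a small in-memory model. This is only a sketch: ToyServer and its methods are hypothetical names that loosely mimic the shell commands use, show dbs, and insertOne(), and they say nothing about how MongoDB implements this internally.

```javascript
class ToyServer {
  constructor() {
    this.databases = new Map(); // database name -> Map of collections
    this.current = "test";      // the mongo shell also starts in "test"
  }

  // Switching databases only records the name; nothing is stored yet.
  use(dbName) {
    this.current = dbName;
  }

  showDbs() {
    return [...this.databases.keys()];
  }

  // The first insert creates the database and the collection on demand.
  insertOne(collectionName, doc) {
    if (!this.databases.has(this.current)) {
      this.databases.set(this.current, new Map());
    }
    const database = this.databases.get(this.current);
    if (!database.has(collectionName)) database.set(collectionName, []);
    database.get(collectionName).push(doc);
    return { acknowledged: true };
  }
}

const server = new ToyServer();
server.use("college");
const dbsBeforeInsert = server.showDbs();
server.insertOne("students", { name: "John Smith", program: "Data Science", class_year: 2024 });
const dbsAfterInsert = server.showDbs();
console.log(dbsBeforeInsert, dbsAfterInsert);
```

In this model the college database appears in the listing only after the insert, not after use.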

db.students.insertOne(
    {
        name: "Mary Brown", 
        program: "Information Science", 
        age: 20
    }
)
{
	"acknowledged" : true,
	"insertedId" : ObjectId("5eb9948b43c1c3c88a509516")
}

Although the students collection is like a table, there is no predefined structure or schema. You can insert a document with a different set of fields, as the insert above does with age in place of class_year.

Update

db.students.updateMany(
    { age: { $gt: 150} },  
    { $set: { status: "reject"} }
)
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }

Delete

db.students.deleteMany(
    { status: "reject" }
)
{ "acknowledged" : true, "deletedCount" : 1 }

Find (Select)

db.students.find(
    { class_year: { $lt: 2021} }, 
    { name: 1, program: 1}
)
{ "_id" : ObjectId("5eb9932c43c1c3c88a509512"), "name" : "Sue Anderson", "program" : "Data Science" }
{ "_id" : ObjectId("5eb9932e43c1c3c88a509513"), "name" : "Dylan Johnson", "program" : "Computer Science" }

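The first argument of find() defines the filter condition, i.e. class_year < 2021, and the second argument (the projection) lists the fields to be shown in the result. The _id field is included by default, which is why it appears in the output even though it was not requested.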
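
The filter-then-project behaviour can be imitated on an in-memory array. The sketch below is a toy model, not MongoDB's query engine: toyFind is a hypothetical helper that handles only a single { field: { $lt: n } } filter, and the _id values are simplified to integers.

```javascript
const students = [
  { _id: 1, name: "Sue Anderson", program: "Data Science", class_year: 2020 },
  { _id: 2, name: "Dylan Johnson", program: "Computer Science", class_year: 2019 },
  { _id: 3, name: "Paul Sanderson", program: "Information Science", class_year: 2021 },
  { _id: 4, name: "Mary Brown", program: "Information Science", age: 20 },
];

// Toy find(): `filter` supports only { field: { $lt: n } };
// `projection` marks the fields to keep with 1. _id is always kept,
// mirroring the default behaviour of find().
function toyFind(collection, filter, projection) {
  const [[field, condition]] = Object.entries(filter);
  return collection
    .filter((doc) => typeof doc[field] === "number" && doc[field] < condition.$lt)
    .map((doc) => {
      const out = { _id: doc._id };
      for (const key of Object.keys(projection)) {
        if (projection[key] === 1 && key in doc) out[key] = doc[key];
      }
      return out;
    });
}

const found = toyFind(students, { class_year: { $lt: 2021 } }, { name: 1, program: 1 });
console.log(found);
```

Mary Brown's document has no class_year field, so the comparison filter skips it rather than treating the missing value as small.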

  • Selecting all documents from the collection:
db.students.find()
{ "_id" : ObjectId("5eb9932c43c1c3c88a509512"), "name" : "Sue Anderson", "program" : "Data Science", "class_year" : 2020 }
{ "_id" : ObjectId("5eb9932e43c1c3c88a509513"), "name" : "Dylan Johnson", "program" : "Computer Science", "class_year" : 2019 }
{ "_id" : ObjectId("5eb9932f43c1c3c88a509514"), "name" : "Paul Sanderson", "program" : "Information Science", "class_year" : 2021 }
{ "_id" : ObjectId("5eb9947b43c1c3c88a509515"), "name" : "John Smith", "program" : "Data Science", "class_year" : 2024 }
{ "_id" : ObjectId("5eb9948b43c1c3c88a509516"), "name" : "Mary Brown", "program" : "Information Science", "age" : 20 }
  • Getting the number of found documents:
db.students.find().count()
5
  • Finding documents based on a field value:
db.students.find({"program":"Data Science"})
{ "_id" : ObjectId("5eb9932c43c1c3c88a509512"), "name" : "Sue Anderson", "program" : "Data Science", "class_year" : 2020 }
{ "_id" : ObjectId("5eb9947b43c1c3c88a509515"), "name" : "John Smith", "program" : "Data Science", "class_year" : 2024 }
  • Finding one document based on a field value:
db.students.findOne({"program":"Data Science"})
{
	"_id" : ObjectId("5eb9932c43c1c3c88a509512"),
	"name" : "Sue Anderson",
	"program" : "Data Science",
	"class_year" : 2020
}

Explain

db.students.find({"program":"Data Science"}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "college.students",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"program" : {
				"$eq" : "Data Science"
			}
		},
		"queryHash" : "1ACF0062",
		"planCacheKey" : "1ACF0062",
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"program" : {
					"$eq" : "Data Science"
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "CCI-WK77-D01",
		"port" : 27017,
		"version" : "4.2.6",
		"gitVersion" : "20364840b8f1af16917e4c23c1b5f5efd8b352f8"
	},
	"ok" : 1
}
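
The explain() method returns information about how MongoDB plans to execute the query. Here the winning plan's stage is COLLSCAN, meaning a full collection scan, which is what MongoDB falls back to when no index supports the filter.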
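
To see why a COLLSCAN stage matters, the sketch below compares a full scan with a lookup through a toy index and counts how many documents each approach examines. All helper names are hypothetical, and the Map is only a stand-in: MongoDB's real indexes are B-tree structures.

```javascript
const students = [
  { name: "Sue Anderson", program: "Data Science" },
  { name: "Dylan Johnson", program: "Computer Science" },
  { name: "Paul Sanderson", program: "Information Science" },
  { name: "John Smith", program: "Data Science" },
  { name: "Mary Brown", program: "Information Science" },
];

// Collection scan: examine every document and keep the matches.
function collectionScan(collection, field, value) {
  let examined = 0;
  const matches = [];
  for (const doc of collection) {
    examined += 1;
    if (doc[field] === value) matches.push(doc);
  }
  return { matches, examined };
}

// Toy index: a Map from field value to the documents holding that value.
function buildIndex(collection, field) {
  const index = new Map();
  for (const doc of collection) {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(doc);
  }
  return index;
}

// Index lookup: only the matching documents are examined.
function indexScan(index, value) {
  const matches = index.get(value) || [];
  return { matches, examined: matches.length };
}

const scan = collectionScan(students, "program", "Data Science");
const programIndex = buildIndex(students, "program");
const lookup = indexScan(programIndex, "Data Science");
console.log(scan.examined, lookup.examined);
```

In MongoDB itself, an index on this field could be created with db.students.createIndex({ program: 1 }), after which explain() would typically report an IXSCAN stage instead of COLLSCAN.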

References