Table of Contents

  1. MongoDB vs Couchbase, an Architectural Overview
  2. Scaling a NoSQL Cluster
  3. Migrating MongoDB Collections into Couchbase
  4. NoSQL for the Node.js Application Developer
  5. About the Authors
  6. Resources

MongoDB vs Couchbase, an Architectural Overview

Distributed computing is often what's needed to meet the demanding profile of modern applications. Distributed computing architectures require new thinking about how to handle real-time pipelines of semi-structured data. NoSQL database technologies address these fast-data challenges. Two prominent NoSQL offerings come from MongoDB and Couchbase.

Cluster

A distributed system is defined by its topology and provides the capability to manage data access at scale. Both MongoDB and Couchbase provide capabilities to create clusters, but they take very different approaches to deploying and managing them. At the most basic level, MongoDB's topology is hierarchical in nature (master/slave), while Couchbase uses a single common node type (peer-to-peer).

NoSQL data platforms exist to handle dynamic data sets and demanding interactive applications that exceed the capacity of a single machine. Platforms can scale up by adding more resources to a machine or scale out horizontally by adding more nodes. Horizontal scaling divides data and distributes it over multiple nodes, which together make up the database. Partitioning therefore emerges as a key issue in distributed architectures. Couchbase and MongoDB both rely on data partitioning to meet scale demands but use different approaches to data distribution.

MongoDB

MongoDB uses a flexible, highly configurable hierarchical topology. Deployment decisions are made based on the performance and availability demands placed on the cluster. The components below, described in the MongoDB documentation on scaling out, make up a cluster.

A MongoDB cluster has multiple components: shards, query routers, and configuration servers. Typically, each component is installed on a separate node in the environment.

Shards store the data. For MongoDB to provide high availability and consistency, each shard has replicas located on different nodes in the cluster.

Query Routers (mongos) interface with client applications and direct operations and queries to the appropriate shard(s) within a cluster. As operations are processed by the appropriate shards, the router returns results to the client. A MongoDB cluster typically contains more than one router to distribute workload.

Config Servers hold cluster metadata that maps the cluster data to shards. The query router uses the Config Server metadata to route operations to specific shards for processing. MongoDB clusters typically have three config servers.

MongoDB distributes data into chunks. Each chunk stores documents whose shard key falls within a specific range, and the cluster assigns chunks to shards. By default a chunk holds up to 64 megabytes of data; when a chunk fills, the cluster splits it into two smaller chunks. MongoDB automatically migrates chunks between shards to keep the distribution balanced. MongoDB clusters require routers and config servers. The config servers store cluster metadata, including the mapping between chunks and shards. The routers coordinate chunk migration and update the config servers. All read and write requests are sent to routers, which determine where the data resides.
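As a toy illustration of the splitting behavior described above (this is not the real balancer logic, just a sketch under simplified assumptions), data accumulates in a chunk until it exceeds the configured maximum, at which point the chunk splits in two:

```javascript
// Toy model of MongoDB chunk splitting (illustrative only). A chunk that
// grows past the maximum size splits into two half-size chunks covering
// the lower and upper halves of its key range.
const MAX_CHUNK_MB = 64;

let chunks = [{ range: "[MinKey, MaxKey)", sizeMb: 0 }];

function addData(mb) {
  // Assume inserts land in the last chunk (e.g. monotonically rising keys).
  const chunk = chunks[chunks.length - 1];
  chunk.sizeMb += mb;
  if (chunk.sizeMb > MAX_CHUNK_MB) {
    const half = chunk.sizeMb / 2;
    chunks.splice(chunks.length - 1, 1,
      { range: chunk.range + " (lower half)", sizeMb: half },
      { range: chunk.range + " (upper half)", sizeMb: half });
  }
}

for (let i = 0; i < 10; i++) addData(16);

console.log(chunks.length + " chunks, all within the size limit:",
  chunks.every(c => c.sizeMb <= MAX_CHUNK_MB));
```

After ten 16 MB batches, the single initial chunk has split several times, and no chunk exceeds the 64 MB limit.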

Data availability in MongoDB is achieved through the creation of replicas. Replicas exist on separate nodes to provide redundancy for the cluster. Replica sets are groups of mongod instances that host the same data set. The primary mongod receives all data mutation operations; it is the only member that can accept writes. The secondaries asynchronously replicate the primary's mutations and apply the operations to their own data sets, so they are eventually consistent. Client applications can balance load across the primary and its replicas, which improves throughput and decreases latency. However, replicas may not return the most recent data. As a result, eventual consistency needs to be managed throughout the life of the application.

Couchbase

Couchbase uses a flat topology with a single node type. Each node can provide Query, Index, and Data services to the peer-to-peer cluster. All nodes are equal and communicate with each other. As additional nodes join the cluster, configuration is inherited from the existing cluster. The cluster can be managed from any node using the built-in console, REST API, or command-line utilities. A Couchbase node runs several processes logically grouped into Data and Cluster management.

Data Manager serves client requests and stores data.

Cluster Manager is responsible for node and cluster-wide services and communication. The cluster manager monitors each node in the cluster, looks for node failures, performs failovers, and manages rebalance operations across the cluster.

Couchbase partitions data into 1,024 vBuckets (partitions) and distributes these partitions evenly across the nodes running the Data service in the cluster. Each document key is hashed to determine which vBucket it belongs to. This distribution is handled by the cluster and does not require intervention. The assignment happens when the cluster is started and will not change unless nodes are added to or removed from the cluster. The Couchbase client SDK maintains a cluster map that maps partitions to nodes. As a result, there is no need for routers or config servers; clients communicate directly with nodes for data access.
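A rough sketch of the client-side lookup this enables follows. The hash function and node addresses here are illustrative assumptions; production Couchbase SDKs use a CRC32-based hash of the key, but the shape of the lookup is the same:

```javascript
// Sketch: mapping a document key to one of 1,024 vBuckets and then to a
// node via the cluster map. Hash and node addresses are illustrative.
const NUM_VBUCKETS = 1024;

// Hypothetical cluster map: each vBucket index is owned by one node.
const nodes = ["node1:11210", "node2:11210", "node3:11210"];
const vbucketToNode = Array.from(
  { length: NUM_VBUCKETS },
  (_, vb) => nodes[vb % nodes.length]
);

// Simple string hash (stand-in for the real CRC32).
function hashKey(key) {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0;
  }
  return h;
}

function nodeForKey(key) {
  const vbucket = hashKey(key) % NUM_VBUCKETS;
  return vbucketToNode[vbucket]; // the client talks directly to this node
}

console.log(nodeForKey("service-1"));
```

Because every client holds the full map, a lookup is purely local: no router hop is needed before the request reaches the owning node.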

Couchbase nodes store active partitions and also serve as replicas for other partitions in the cluster. A node executes operations on the active data for the partitions it owns; replicas are maintained for availability only. Replication within a single Couchbase cluster is configured per keyspace (bucket). A bucket is divided into vBuckets that are evenly distributed across the cluster, and one or more replicas can be defined, each representing a copy of the data. The cluster automatically handles the distribution of data.

Scaling a NoSQL Cluster

As a database receives more load, the need to scale becomes more pressing. Load can come in many forms, such as the need for additional data capacity or processing resources. However, load isn't the only reason to scale a database; resiliency and disaster planning also fit the bill.

Because of the MongoDB architecture, scaling MongoDB can easily become more complicated than accomplishing the same thing in Couchbase Server.

Scaling a Multiple Node MongoDB Cluster

MongoDB is a NoSQL document database with a master-slave architecture. When creating and scaling a cluster, both a sharded cluster and a replica set (another name for a replication cluster) must be created and managed. These are pieces of a potentially large puzzle that can easily become complicated.

Assume that there are three MongoDB instances available. With the instances available, start them with the following:

mongod --configsvr --replSet rs0
mongod --shardsvr --replSet rs1
mongod --shardsvr --replSet rs1

The above would be one command per instance, specifying one configuration instance and two shard instances.

The commands will create a configuration node that uses an rs0 replica set as well as two shard nodes that use an rs1 replica set. Just starting the MongoDB instances already yields two different node types and two replica sets to worry about.

Now each of the instances needs to be connected to get replication and sharding into a functional state.

Start by initializing replication on the two shard nodes that use the rs1 replica set. Connect to either of the shard instances and execute the following command to connect to the MongoDB shell:

mongo --port 27018

Once connected, the replica set needs to be initialized:

rs.initiate({
    _id : "rs1",
    members: [
        { _id : 1, host : "shard_server_1_ip:27018" },
        { _id : 2, host : "shard_server_2_ip:27018" }
    ]
});

The above command should include all members of the rs1 replica set. Once executed, the status of the nodes can be checked via the rs.status() command.

Next, the configuration node's replica set needs to be initialized. Once connected to the instance, establish a connection to MongoDB using the Mongo shell, this time via a different port since it is a configuration node:

mongo --port 27019

Once connected to the configuration node via the Mongo shell, execute the following command to initialize the replica set:

rs.initiate({
    _id : "rs0",
    members: [
        { _id : 0, host : "cfg_server_1_ip:27019" },
    ]
});

The shard nodes are now ready to be added to the cluster. Remember, previously only replication was configured on the shard nodes, not sharding. Exit the Mongo shell on the configuration node, but don't disconnect from the server. A separate binary called mongos is used for this configuration. Execute the following:

mongos --configdb rs0/cfg_server_1_ip:27019

This will allow shards to be added via the Mongo shell. Unless running mongos in the background, a new Terminal window will need to be opened to use the Mongo shell.

Connect to the configuration server from a new Terminal and then connect using the Mongo shell. This time, instead of using port 27019, use port 27017.

Once connected, execute the following commands to add the shards:

sh.addShard("rs1/shard_server_1_ip:27018");
sh.addShard("rs1/shard_server_2_ip:27018");

With sharding configured, it needs to be enabled for a particular database. This can be any MongoDB database you’d like. Enabling sharding can be done with the following:

sh.enableSharding("example");

While sharding is enabled on the database, how each collection is sharded still needs to be determined. MongoDB offers ranged sharding and hashed sharding, and the goal is to spread the data in your database as evenly as possible around the cluster.
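As a toy illustration of the difference (the hash below is not MongoDB's actual hash, just an odd multiplier chosen for the sketch), consider monotonically increasing _id values: with ranged sharding, new inserts pile onto the shard that owns the highest range, while hashed sharding spreads them out.

```javascript
// Toy comparison of ranged vs hashed sharding for monotonically
// increasing keys (illustrative only).
const NUM_SHARDS = 2;
const ids = Array.from({ length: 1000 }, (_, i) => i);

// Ranged: contiguous key ranges map to shards (0-499 -> shard 0, rest -> 1).
const rangedShard = id => (id < 500 ? 0 : 1);

// Hashed: a hash of the key picks the shard.
const hashedShard = id => (id * 2654435761 % NUM_SHARDS);

// Look at where the 100 most recent inserts landed.
const recent = ids.slice(-100);
const rangedHits = recent.filter(id => rangedShard(id) === 1).length;
const hashedHits = recent.filter(id => hashedShard(id) === 1).length;

console.log("ranged: " + rangedHits + "/100 recent inserts on one shard");
console.log("hashed: " + hashedHits + "/100 recent inserts on that shard");
```

With ranged sharding, all 100 recent inserts hit a single shard (a classic hot-spot); with hashing, they split roughly evenly.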

Take the following command for example:

sh.shardCollection("example.services", { "_id": "hashed" });

The above command will create a hashed shard key on _id for the services collection. Whether that is the best choice depends on your data; it is up to you to decide how to shard your data in MongoDB.

To check the shard distribution, run:

db.services.getShardDistribution();

Getting replication and sharding working in a cluster of MongoDB nodes was no small task. The task becomes even more cumbersome as the cluster scales up and down, and it is something that frustrates companies.

Scaling a Multiple Node Couchbase Server Cluster

While Couchbase is a document database just like MongoDB, things are a little different in its architecture. Couchbase uses a peer-to-peer (P2P) design, which eliminates the master-slave arrangement. In addition, every node in a cluster is the same node type, meaning it can have data, indexing, and query services available. This eliminates the need to create specialized replication or sharded clusters.

Let’s assume there are three servers available, each with Couchbase Server installed.

There are multiple ways to establish a cluster in Couchbase. Of the three servers, configure two of them as new single-node clusters and leave the third unconfigured for now.

The unconfigured node, let's say couchbase1, will be the first one added to the cluster. To add it to the same cluster as, say, the couchbase2 node, navigate to http://localhost:8091 in a browser and choose to join a cluster.

Join Couchbase Cluster

What matters here is the IP address, the administrative username and password of the other node, and the services that should be made available.

After the couchbase1 node is added to the couchbase2 cluster, the cluster needs to be rebalanced.

Join Couchbase Cluster

Rebalancing the cluster will distribute the Bucket data evenly across the cluster. A rebalance needs to happen every time nodes are added or removed from the cluster.

At this point, a two-node cluster with replication and automatic sharding of data is available. Now let's look at adding that third, already configured node to the cluster.

Since Couchbase is peer-to-peer, connect to either couchbase1 or couchbase2 and choose to add a new server to the cluster.

Join Couchbase Cluster

Enter the node information for the third, yet to be touched server, similar to how it was done previously. Remember to use the appropriate container IP address.

That's a third node added to the cluster! Don't forget to rebalance to finish the process.

There is also a CLI way of adding nodes to a cluster or joining a cluster, but it isn't any more difficult than what was already seen. Scaling Couchbase is a two-step process regardless of whether you have three nodes or fifty.
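For reference, the CLI path looks roughly like the following. The host names, credentials, and service list are placeholders, and flag spellings vary between Couchbase Server versions, so consult the couchbase-cli documentation for your release:

```
# Sketch only: add a node to an existing cluster, then rebalance.
couchbase-cli server-add -c couchbase2:8091 \
  -u Administrator -p password \
  --server-add couchbase3:8091 \
  --services data,index,query

couchbase-cli rebalance -c couchbase2:8091 -u Administrator -p password
```

The same two steps as the web console: add the node, then rebalance.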

Migrating MongoDB Collections into Couchbase

MongoDB and Couchbase are both NoSQL document databases; one uses BSON and the other JSON. From a data perspective, the differences between BSON and JSON are minimal, which makes migration straightforward.

The MongoDB Collection Model

Assume there is a Collection called services that holds information about services offered by a company. The document model for any one of these documents might look something like the following:

{
    "name": "Amazon Simple Email Service (SES)",
    "tenants": [
        "tenant-1",
        "tenant-2",
        "tenant-3"
    ]
}

Each document would represent a single service with a list of subscribed tenants. Each document has an id value and the subscribed tenants reference documents from another Collection with matching id values.

With MongoDB installed, you have access to its mongoexport utility, which allows the documents in any Collection to be exported to a JSON file.

For example, we could run the following command against our MongoDB database:

mongoexport --db example --collection services --out services.json

The database in question would be example and we’re exporting the services Collection to a file called services.json. If we try to open this JSON file, we’d see data that looks similar to the following:

{"_id":{"$oid":"service-1"},"name":"Amazon Simple Email Service (SES)","tenants":[{"$oid":"tenant-1"},{"$oid":"tenant-2"}]}
{"_id":{"$oid":"service-2"},"name":"Amazon S3","tenants":[{"$oid":"tenant-3"},{"$oid":"tenant-1"}]}

Each document appears as a new line in the file; however, it won't look exactly how the schema was modeled. MongoDB wraps all document references in an $oid property, which represents an object id.

The Couchbase Bucket Model

Since Couchbase uses Buckets rather than Collections, the data needs to be organized a little differently within Couchbase.

While the MongoDB export file could be loaded without issues into Couchbase with cbimport, the data may not make sense from a maintenance and development perspective.

In Couchbase it is normal to have a property in every document that represents the type of document it is. The former MongoDB Collection name can easily be used to represent the document type. As a result, our Couchbase documents should look something like this:

{
    "_id": "service-1",
    "_type": "service",
    "name": "Amazon Simple Email Service (SES)",
    "tenants": [
        "tenant-1",
        "tenant-2",
        "tenant-3"
    ]
}

In the above example the $oid values have been compressed and the _id and _type properties have been added.

Importing MongoDB Collections into Couchbase with Golang

Instead of importing the MongoDB Collection data directly into Couchbase with cbimport, it should be slightly manipulated into a cleaner format as mentioned above. This can be done easily with Golang and the Couchbase Go SDK, or any other supported Couchbase SDK.

For every line read of the MongoDB JSON export, a manipulation needs to happen followed by a save. The core logic for this would be as follows:

func cbimport(document string, collection string) error {
	var mapDocument map[string]interface{}
	if err := json.Unmarshal([]byte(document), &mapDocument); err != nil {
		return err
	}
	mapDocument["_type"] = collection
	// gocb's Insert returns a CAS value along with any error
	_, err := bucket.Insert(mapDocument["_id"].(string), mapDocument, 0)
	return err
}

Assuming each line is read from the export, it would be passed into the above function along with the Collection name. The Collection name would be added to the document as a _type property and then saved into Couchbase. However, there is a problem with this.

The _id property in the export is an $oid which probably should not be used as a document id. To solve this problem of $oid values, it might make sense to have a function like this:

func compressObjectIds(mapDocument map[string]interface{}) string {
	var objectIdValue string
	for key, value := range mapDocument {
		switch value.(type) {
		case string:
			if key == "$oid" && len(mapDocument) == 1 {
				return value.(string)
			}
		case map[string]interface{}:
			objectIdValue = compressObjectIds(value.(map[string]interface{}))
			if objectIdValue != "" {
				mapDocument[key] = objectIdValue
			}
		case []interface{}:
			for index, element := range value.([]interface{}) {
				// Guard against arrays of plain values, which would
				// otherwise make the type assertion panic.
				if elementMap, ok := element.(map[string]interface{}); ok {
					objectIdValue = compressObjectIds(elementMap)
					if objectIdValue != "" {
						value.([]interface{})[index] = objectIdValue
					}
				}
			}
		}
	}
	return ""
}

Passing the document into the compressObjectIds function flattens every occurrence of an $oid, regardless of whether it appears in an array.

The manipulation and insert can happen with any supported Couchbase SDK. In the case of Golang, each insert is a blocking operation, so it makes sense to use goroutines and workers to add the data as quickly as possible.
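A worker-pool arrangement for those blocking inserts can be sketched as follows. The insertLine function is a stand-in for the cbimport function shown earlier (which would unmarshal the line and call bucket.Insert); everything else here is self-contained:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// insertLine is a stand-in for cbimport(line, "services"); a real worker
// would unmarshal the line and call bucket.Insert here.
func insertLine(line string) {
	_ = line
}

// importWithWorkers fans export lines out to a fixed pool of workers so
// the blocking inserts run concurrently. It returns how many documents
// were processed.
func importWithWorkers(lines []string, numWorkers int) int64 {
	var inserted int64
	jobs := make(chan string)
	var wg sync.WaitGroup

	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for line := range jobs {
				insertLine(line)
				atomic.AddInt64(&inserted, 1)
			}
		}()
	}

	for _, line := range lines {
		jobs <- line // blocks until a worker is free
	}
	close(jobs)
	wg.Wait()
	return inserted
}

func main() {
	lines := []string{
		`{"_id":"service-1"}`,
		`{"_id":"service-2"}`,
		`{"_id":"service-3"}`,
	}
	fmt.Println("inserted", importWithWorkers(lines, 4), "documents")
}
```

Tuning numWorkers trades memory and connection pressure against throughput; a handful of workers per node is usually plenty for a bulk load.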

NoSQL for the Node.js Application Developer

Node.js is a very popular technology amongst developers who use a NoSQL database. When it comes to working with data, there are a few options, including various object document models (ODM) and the query languages defined by each database.

We’re going to take a look at some of the developer differences between MongoDB and Couchbase while using Node.js, in an effort to make transitioning as simple as possible.

Moving from Mongoose to Ottoman

When developing against a database, it is common to use an object document model (ODM) with NoSQL or an object relational mapper (ORM) with an RDBMS. Couchbase and MongoDB are no exception. After all, why would you want to be burdened with designing validators, type casting, and extensive amounts of application boilerplate logic?

With MongoDB, a popular ODM for Node.js is Mongoose. Likewise, with Couchbase, a popular ODM for Node.js is Ottoman.

So how do you take your Node.js application built with MongoDB and Mongoose and transition it to Couchbase and Ottoman? Let's examine this through a few examples.

Take the following fictional application scenario. Let’s say we are Amazon or some other company that offers services to developers. Any service can have any number of tenants, and any tenant can use any number of services. A possible document model for services, in no particular technology, might look like the following:

{
    "id": "service-1",
    "type": "service",
    "name": "Amazon Simple Email Service (SES)",
    "tenants": [
        "tenant-1",
        "tenant-2",
        "tenant-3"
    ]
}

When it comes to tenants of a service, the data model might look like this:

{
    "id": "tenant-1",
    "type": "tenant",
    "firstname": "Nic",
    "lastname": "Raboy",
    "address": {
        "city": "San Francisco",
        "state": "CA"
    },
    "services": [
        "service-1"
    ]
}

The two data models above are just two of many possible ways to model this data in the database. In the scenario we’re using, each document has at least an id and a type property. A relationship between the two documents is established through each of their array values. Any particular service will keep track of its tenants and any particular tenant will keep track of which services it is a part of. This relationship will be maintained in the application layer.

Creating an Application with MongoDB and Mongoose

So what kind of MongoDB with Mongoose application can be constructed with such a data model in mind?

Assuming the latest version of Node.js and the Node Package Manager (NPM) is installed, create a new project directory and execute the following:

npm init --y

The above command will initialize a package.json file that will maintain all the project dependencies. With the package.json file in place, execute the following to gather the necessary dependencies:

npm install express mongoose mongodb body-parser --save

The project goal is to create a fully functional RESTful API with several endpoints that can be accessed from any given client front-end application. The express dependency, for the Express framework, makes API generation easy, while the body-parser dependency allows complex requests such as POST, PUT, and DELETE to be made against the server. Finally, the mongodb and mongoose dependencies allow for development against a MongoDB database using an ODM.

The project structure for this example isn't really relevant as long as the dependencies are available. Instead, it makes sense to stick to the concepts.

While the JSON model is still fresh in our minds, let’s create a Mongoose schema to reflect what we’re after. Thinking back to the services model, something like this can be created:

var ServiceSchema = new Mongoose.Schema({
    name: String,
    tenants: [
        {
            type: Mongoose.Schema.Types.ObjectId,
            ref: "Tenant"
        }
    ]
});

Remember, the JSON models seen previously had an id and a type property. The ODM creates and maintains this data, removing the need to include them in the Mongoose schema. Instead of storing plain id values in the array of tenants, Mongoose stores a reference to the other document. This is useful for querying because, instead of constructing multiple queries and layers of application logic, the ODM can load the nested data.

Now what about the schema for tenant data? In Mongoose, it might look something like this:

var TenantSchema = new Mongoose.Schema({
    firstname: String,
    lastname: String,
    address: {
        city: String,
        state: String
    },
    services: [
        {
            type: Mongoose.Schema.Types.ObjectId,
            ref: "Service"
        }
    ]
});

Notice that the same rules are applied, this time referencing service documents in the array. Each of the two Mongoose schemas can be assigned to a Mongoose model through the following:

var ServiceModel = Mongoose.model("Service", ServiceSchema);
var TenantModel = Mongoose.model("Tenant", TenantSchema);

So the models are taken care of, but how about establishing a connection to MongoDB and preparing them for use? Somewhere in the application you might have the following boilerplate code:

Mongoose.Promise = Promise;
Mongoose.connect("mongodb://<host>:27017/<database>", function(error, database) {
    var server = app.listen(3000, function() {
        console.log("Connected on port 3000...");
    });
});

The above code states that asynchronous interactions with MongoDB and Mongoose will use native ES6 JavaScript promises. After a connection to the database has been established, the application will begin serving on port 3000.

So how are interactions between the application and MongoDB accomplished?

Let’s say you’re interested in creating a new JSON document to be saved to MongoDB. You might do something like the following:

app.post("/services", function(request, response) {
    var service = new ServiceModel({
        "name": request.body.name
    });
    service.save(function(error, service) {
        if(error) {
            return response.status(400).send(error);
        }
        response.send(service);
    });
});

The above is an Express API endpoint that can be accessed via POST request. Remember, the model pretty much only has a service name and a list of tenants. From the name property a new ServiceModel is created. When calling save on that model, it becomes persisted to the database and a response is returned to the client that hit this endpoint.

When retrieving data that had already been saved, a simple Mongoose query can be created:

app.get("/service/:id", function(request, response) {
    ServiceModel.findOne({"_id": request.params.id})
        .populate("tenants")
        .then(function(result) {
            response.send(result);
        }, function(error) {
            response.status(400).send(error);
        });
});

The above API endpoint can be accessed via a GET request. The id passed in the parameters should match that of an already saved document. Using the id, a findOne function can be called, which returns a promise. A single result is found based on the search criteria; passing an empty object would match all documents, but only the first would be returned.

The populate function loads the data for tenants rather than returning an array of id values.

The endpoints for each of the other service or tenant functions wouldn’t be any different. However, what happens when the relationships need to be established? Take the following endpoint for example:

app.post("/tenant/service", function(request, response) {
    ServiceModel.findOne({"_id": request.body.service_id})
        .then(function(service) {
            TenantModel.findOne({"_id": request.body.tenant_id})
                .then(function(tenant) {
                    if(service != null && tenant != null) {
                        if(!tenant.services) {
                            tenant.services = [];
                        }
                        if(!service.tenants) {
                            service.tenants = [];
                        }
                        tenant.services.push(service._id);
                        service.tenants.push(tenant._id);
                        tenant.save();
                        service.save();
                        response.send(tenant);
                    } else {
                        response.status(400).send("ids invalid");
                        return
                    }
                }, function(error) {
                    return response.status(400).send(error);
                });
        }, function(error) {
            return response.status(400).send(error);
        });
});

In the above endpoint, a lot is happening. First a query is made for service documents matching the provided service_id value. If a document was found, a second query is made for tenant documents matching the provided tenant_id value. If a document was found, the service_id is pushed into the services array of the tenant document and the tenant_id is pushed into the tenants array of the service document. Both documents are then saved which will update them in the database.

Creating an Application with Couchbase and Ottoman

A simple Node.js with Mongoose application was just created with a fairly complex data model. So how do we accomplish exactly the same thing in a Node.js application with Ottoman?

Just like with the previous example, create a new project directory and execute the following:

npm init --y

The above command will initialize a package.json file that will maintain all the project dependencies. With the package.json file in place, execute the following to gather the necessary dependencies:

npm install express ottoman couchbase body-parser --save

The project goal is still an Express based RESTful API, as previously seen in the Mongoose example. This time, instead of installing mongoose and mongodb we’re installing ottoman and couchbase. The project structure specifics aren’t too important for us in this example.

Let’s start by creating the Ottoman models for each of the document types that we wish to save. Starting with the service model, we might have something that looks like this:

var ServiceModel = Ottoman.model("Service", {
    name: { type: "string" },
    tenants: [
        {
            ref: "Tenant"
        }
    ]
});

Creating the above Ottoman model was no more difficult than creating the Mongoose model. In fact, we were able to skip the step of defining a separate schema. In the model for services, the name property will be a string value and the tenants property an array of references to Tenant documents.

Just like with Mongoose, the ODM will take care of the id and type information for each of the documents. Now what might the model for tenant data look like?

var TenantModel = Ottoman.model("Tenant", {
    firstname: { type: "string" },
    lastname: { type: "string" },
    address: {
        city: { type: "string" },
        state: { type: "string" }
    },
    services: [
        {
            ref: "Service"
        }
    ]
});

The same rules are applied in the tenant model as with the service model. Again, these rules are very similar to what was found in Mongoose for MongoDB. Remember, this time we’re using Couchbase and Ottoman.

Now that the Ottoman models have been defined, let’s worry about establishing a connection to the Couchbase cluster and opening a Bucket. Somewhere in the project, you might have the following code:

var cluster = new Couchbase.Cluster("couchbase://<host>");
var bucket = cluster.openBucket("<bucket>");
Ottoman.store = new Ottoman.CbStoreAdapter(bucket, Couchbase);
var server = app.listen(3000, function() {
    console.log("Connected on port 3000...");
});

Once a bucket is opened and Ottoman is configured to use it, the Node.js server is started, at which point documents can be created and querying can happen.

So how do these interactions between the Node.js application and Couchbase happen?

Let’s say we wanted to create a new service document. A client will try to POST to a particular API endpoint and the application will save. An example of this can be seen below:

app.post("/services", function(request, response) {
    var service = new ServiceModel({
        "name": request.body.name
    });
    service.save(function(error, result) {
        if(error) {
            return response.status(400).send(error);
        }
        response.send(service);
    });
});

In the above example, a new model is created from the definition that we created. The name property is extracted from the POST body, and then the model is saved to the database. This code is nearly identical to what was seen in Mongoose.

What happens if we want to query for documents that were saved?

Take the example of finding one particular document that was previously saved, by its document id:

app.get("/service/:id", function(request, response) {
    ServiceModel.getById(
        request.params.id, {
            load: ["tenants"]
        }, function(error, result) {
            if(error) {
                return response.status(400).send(error);
            }
            response.send(result);
        }
    );
});

So there are a few differences in the above versus the Mongoose counterpart. In the above example we’re making use of a getById method rather than findOne. If we really wanted to we could use a find method like Mongoose has, but since this is a query based on id, there isn’t a need.

Previously, in Mongoose, we used the populate function to load reference data. The same can be accomplished by using the load option and an array of properties that you wish to load.

Most applications don’t consist of strictly simple endpoints. With Ottoman, we still need to cover the scenario where we wish to manage relationships between two documents.

Take the following endpoint that we’ve already seen in a similar variation:

app.post("/tenant/service", function(request, response) {
    ServiceModel.getById(
        request.body.service_id,
        function(error, service) {
            if(error) {
                return response.status(400).send(error);
            }
            TenantModel.getById(
                request.body.tenant_id,
                function(error, tenant) {
                    if(error) {
                        return response.status(400).send(error);
                    }
                    if(!tenant.services) {
                        tenant.services = [];
                    }
                    if(!service.tenants) {
                        service.tenants = [];
                    }
                    tenant.services.push(
                        ServiceModel.ref(service._id)
                    );
                    service.tenants.push(
                        TenantModel.ref(tenant._id)
                    );
                    tenant.save(function(error, result) {
                        if(error) {
                            return response.status(400).send(error);
                        }
                        service.save(function(error, result) {
                            if(error) {
                                return response.status(400).send(error);
                            }
                            response.send(tenant);
                        });
                    });
                }
            );
        }
    );
});

When a POST request is made to the above endpoint, the service_id is used to get a particular service document. If there were no errors, meaning the document exists, the tenant_id is used to get a particular tenant document. The tenant document is added as a reference to the service document's array, and the service document is added as a reference to the tenant document's array. Both documents are then saved and the tenant is returned in the response.

Nothing has changed between Mongoose and Ottoman beyond very minor API differences.

Alternative Query Options in a MongoDB or Couchbase Node.js Application

Using an ODM like Mongoose or Ottoman isn't the only option for Node.js development. Each NoSQL database has its own SDK and its own query language.

For example, MongoDB has the MongoDB Query Language and Couchbase has N1QL. So how do you use each of these technologies in a Node.js application?

Let’s take the previous example, but this time replacing Mongoose with the MongoDB Query Language and Ottoman with Couchbase’s N1QL technology.

As a refresher, there were two different document types being used, services and tenants. The document that represents a service might be modeled like the following:

{
    "id": "service-1",
    "type": "service",
    "name": "Amazon Simple Email Service (SES)",
    "tenants": [
        "tenant-1",
        "tenant-2",
        "tenant-3"
    ]
}

Whereas the document that represents a tenant might be modeled like the following:

{
    "id": "tenant-1",
    "type": "tenant",
    "firstname": "Nic",
    "lastname": "Raboy",
    "address": {
        "city": "San Francisco",
        "state": "CA"
    },
    "services": [
        "service-1"
    ]
}

In both models, there is an array containing reference data to other documents. The reference data consists of the id values of those other documents.
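To make that concrete: loading the references just means looking up each id in the other document set. Below is a minimal plain-JavaScript sketch using hypothetical in-memory stand-ins for the sample documents above; this is what populate (Mongoose) or load (Ottoman) performs against the database.

```javascript
// In-memory stand-ins for the tenant and service documents shown above.
var tenantsById = {
    "tenant-1": { "id": "tenant-1", "firstname": "Nic", "lastname": "Raboy" }
};

var serviceDoc = {
    "id": "service-1",
    "name": "Amazon Simple Email Service (SES)",
    "tenants": ["tenant-1"]
};

// Resolving the references swaps each id for the document it points at.
function resolveReferences(doc, lookupTable) {
    return Object.assign({}, doc, {
        "tenants": doc.tenants.map(function(id) { return lookupTable[id]; })
    });
}

console.log(resolveReferences(serviceDoc, tenantsById).tenants[0].firstname); // Nic
```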

So what do these applications look like without an ODM?

Creating an Application with the MongoDB Query Language

The first thing that needs to be done is to create a new Node.js project somewhere on the computer. Within that directory, execute the following from the command line:

npm init --y
npm install express mongodb body-parser --save

The above commands will initialize a new package.json file and download the various Express and MongoDB dependencies. Nothing here is really different from the Mongoose example, except that Mongoose itself is not included.

Since an ODM isn’t in the picture, neither are schema or model definitions. Instead, the structure of every document will be maintained by the application and queries being executed. That said, take for example creating a new document for services:

ServiceModel.save = function(data, callback) {
    Database.collection("services").insertOne(data, function(error, result) {
        if(error) {
            return callback(error, null);
        }
        callback(null, result);
    });
}

In the above snippet, we have a function called save on a ServiceModel class that we've created. What it is named isn't really important; what is inside of it is.

Calling insertOne on a particular collection allows an object to be inserted into MongoDB. The object being inserted is data and it can really contain anything, but in this case it will look like one of the document models shown previously.
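Since the data object can contain anything, the application has to enforce any shape rules itself. A hypothetical guard like the following could be called before insertOne; the field checks are assumptions based on the service document model shown earlier, not part of the MongoDB driver:

```javascript
// A hand-rolled validator standing in for what a schema would normally do.
// The required fields here are assumptions based on the service model.
function validateService(data) {
    var errors = [];
    if (typeof data.name !== "string" || data.name.length === 0) {
        errors.push("name is required and must be a non-empty string");
    }
    if (data.tenants !== undefined && !Array.isArray(data.tenants)) {
        errors.push("tenants, when present, must be an array");
    }
    return errors;
}

console.log(validateService({ "name": "Amazon SES" }).length);      // 0
console.log(validateService({ "tenants": "not-an-array" }).length); // 2
```

A save method could return early with these errors instead of handing malformed data to insertOne.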

This is where things can get a little wild. Under normal circumstances, the documents in MongoDB can be queried using a find function. However, since our documents have an array of references that we’d like to load, an aggregate function must be used, making things a bit tricky.

ServiceModel.getAll = function(callback) {
    var cursor = Database.collection("services").aggregate([
        {
            "$unwind": {
                "path": "$tenants",
                "preserveNullAndEmptyArrays": true
            }
        },
        {
            "$lookup": {
                "from": "tenants",
                "localField": "tenants",
                "foreignField": "_id",
                "as": "tenantObjects"
            }
        },
        {
            "$unwind": {
                "path": "$tenantObjects",
                "preserveNullAndEmptyArrays": true
            }
        },
        { "$group": {
            "_id": {
                "_id": "$_id",
                "name": "$name"
            },
            "tenants": { "$push": "$tenantObjects" }
        }},
        {
            "$project": {
                "_id": "$_id._id",
                "name": "$_id.name",
                "tenants": "$tenants"
            }
        }
    ]);
    cursor.toArray(callback);
};

To load reference data a join needs to happen and joins in MongoDB happen through the $lookup operator. However, the $lookup operator was never meant to join items in an array. Instead the data needs to be manipulated by “unwinding” or “unnesting” the array, performing the join, then “winding” or “nesting” the data into its original form.

To break it down further:

{
    "$unwind": {
        "path": "$tenants",
        "preserveNullAndEmptyArrays": true
    }
},

The first $unwind splits the array into one document per element, preserving documents whose array is null or empty rather than dropping them. This matters because those documents still need to flow through the pipeline and appear in the results.
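The behavior of $unwind can be mimicked in a few lines of plain JavaScript, which makes the pipeline easier to reason about. This is only an illustrative sketch of the stage's behavior, not driver code:

```javascript
// A rough plain-JavaScript equivalent of $unwind on an array field,
// including the preserveNullAndEmptyArrays behavior.
function unwind(docs, field) {
    var out = [];
    docs.forEach(function(doc) {
        var values = doc[field];
        if (!Array.isArray(values) || values.length === 0) {
            // Preserve the document, dropping the null or empty field.
            var copy = Object.assign({}, doc);
            delete copy[field];
            out.push(copy);
        } else {
            values.forEach(function(value) {
                var split = Object.assign({}, doc);
                split[field] = value;
                out.push(split);
            });
        }
    });
    return out;
}

var unwound = unwind([{ "_id": "service-1", "tenants": ["tenant-1", "tenant-2"] }], "tenants");
console.log(unwound.length); // 2
```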

After the array is split into several documents, the $lookup join operation needs to happen:

{
    "$lookup": {
        "from": "tenants",
        "localField": "tenants",
        "foreignField": "_id",
        "as": "tenantObjects"
    }
},

The above says that the tenants property, which is an id, will be joined against the _id property from the tenants collection. This new data will be stored in a new property called tenantObjects.
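In the same spirit, $lookup behaves like an equality join between the local field and the foreign collection. A plain-JavaScript sketch of that behavior, again for illustration only:

```javascript
// A rough equivalent of $lookup: for each document, collect every foreign
// document whose foreignField value matches the document's localField value.
function lookup(docs, foreignDocs, localField, foreignField, as) {
    return docs.map(function(doc) {
        var joined = Object.assign({}, doc);
        joined[as] = foreignDocs.filter(function(foreign) {
            return foreign[foreignField] === doc[localField];
        });
        return joined;
    });
}

var services = [{ "_id": "service-1", "tenants": "tenant-1" }]; // already unwound
var tenantDocs = [{ "_id": "tenant-1", "firstname": "Nic" }];
console.log(lookup(services, tenantDocs, "tenants", "_id", "tenantObjects")[0].tenantObjects.length); // 1
```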

So there is now an array of tenantObjects that needs to be flattened. After flattening the array with another $unwind, the data can be collected through a $group operation:

{
    "$group": {
        "_id": {
            "_id": "$_id",
            "name": "$name"
        },
        "tenants": { "$push": "$tenantObjects" }
    }
},

The original document only had three properties, _id, name, and tenants, which need to be reconstructed. Anything inside the _id of the $group is a constant grouping attribute, while everything outside of it, in this case tenants, is the result of an accumulator.
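The folding step can also be pictured in plain JavaScript: documents sharing the same (_id, name) pair are merged back together, with each tenantObjects value pushed into an array. Again, a behavioral sketch only:

```javascript
// A rough equivalent of the $group stage above: merge the unwound documents
// back into one document per (_id, name) pair, accumulating with $push.
function groupByIdAndName(docs) {
    var buckets = {};
    var order = [];
    docs.forEach(function(doc) {
        var key = JSON.stringify([doc._id, doc.name]);
        if (!buckets[key]) {
            buckets[key] = { "_id": { "_id": doc._id, "name": doc.name }, "tenants": [] };
            order.push(key);
        }
        if (doc.tenantObjects !== undefined) {
            buckets[key].tenants.push(doc.tenantObjects);
        }
    });
    return order.map(function(key) { return buckets[key]; });
}

var grouped = groupByIdAndName([
    { "_id": "service-1", "name": "SES", "tenantObjects": { "_id": "tenant-1" } },
    { "_id": "service-1", "name": "SES", "tenantObjects": { "_id": "tenant-2" } }
]);
console.log(grouped.length, grouped[0].tenants.length); // 1 2
```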

We’re not in the clear yet though. The data we get back doesn’t really match what we started with or what was seen in the Mongoose example. Instead, it needs to be reformatted:

{
    "$project": {
        "_id": "$_id._id",
        "name": "$_id.name",
        "tenants": "$tenants"
    }
}

That was a lot just to load the reference data found in the array. Imagine what would happen if there were ten arrays: the aggregation query would become far more complex.

Getting a single document would be similar, with the exception of an extra step:

ServiceModel.getById = function(documentId, callback) {
    var cursor = Database.collection("services").aggregate([
        { "$match": { "_id": new ObjectId(documentId) } },
        {
            "$unwind": {
                "path": "$tenants",
                "preserveNullAndEmptyArrays": true
            }
        },
        {
            "$lookup": {
                "from": "tenants",
                "localField": "tenants",
                "foreignField": "_id",
                "as": "tenantObjects"
            }
        },
        {
            "$unwind": {
                "path": "$tenantObjects",
                "preserveNullAndEmptyArrays": true
            }
        },
        { "$group": {
            "_id": {
                "_id": "$_id",
                "name": "$name"
            },
            "tenants": { "$push": "$tenantObjects" }
        }},
        {
            "$project": {
                "_id": "$_id._id",
                "name": "$_id.name",
                "tenants": "$tenants"
            }
        },
        { "$limit": 1 }
    ]);
    cursor.toArray(callback);
};

Notice the use of the $match operation and the $limit operation in the above. The $match allows us to find a particular document based on some property, in this case an id value. We only want one result, so we $limit the result set to a single document.

Since there isn’t an ODM like Mongoose to handle the relationships, another method must be created to handle updating documents with new relationships:

ServiceModel.updateTenants = function(id, tenants, callback) {
    Database.collection("services").updateOne({ "_id": new ObjectId(id) },
        {
            $set: { "tenants": tenants }
        }, function(error, result) {
            if(error) {
                return callback(error, null);
            }
            callback(null, result);
        }
    );
}

In the above snippet, a single document would be updated based on the _id value. On a match, the tenants property will be created or replaced with whatever is passed. In this case tenants is an array of reference values.

All these same rules apply when creating a set of database methods for tenant data.

With Mongoose out of the picture, the connection to the database is established a little differently. Mongoose queues every operation even if the database connection hasn't been established yet. That luxury doesn't exist with the vanilla MongoDB driver.

Take the following for example:

MongoClient.connect("mongodb://localhost:27017/example", function(error, database) {
    if(error) {
        return console.log("Could not establish a connection to MongoDB");
    }
    module.exports.database = database;
    var tenantRoutes = require("./mongodb/routes/tenants")(app);
    var serviceRoutes = require("./mongodb/routes/services")(app);
    var server = app.listen(3000, function() {
        console.log("Connected on port 3000...");
    });
});

After the connection is established, the resulting database object must be kept around by the application; in the Mongoose scenario, you didn't have to track it yourself. Once connected, the routes can be created and the application can be served, in this case on port 3000.

Each route endpoint created is roughly the same as the Mongoose example. For example, getting all services from the collection would look like this:

app.get("/services", function(request, response) {
    ServiceModel.getAll(function(error, result) {
        if(error) {
            return response.status(500).send({ "success": false, "message": error });
        }
        response.send(result);
    });
});

In the above snippet, the database model is used instead of Mongoose.

So where does this leave us in terms of Couchbase and N1QL?

Creating an Application with N1QL for Couchbase

Like with the previous example, it is a good idea to create a fresh project directory and execute the following commands:

npm init --y
npm install express couchbase body-parser uuid --save

This time Ottoman is not being installed, but we are installing uuid, which will generate unique id values. Ottoman generated these for us, but now we're on our own, which isn't a bad thing.

Like with the MongoDB Query Language example, we’re on our own when it comes to maintaining a document model in Couchbase and N1QL.

Take the following as an example for creating documents:

ServiceModel.save = function(data, callback) {
    data.id = Uuid.v4();
    data.type = "service";
    data.tenants = [];
    var statement = "INSERT INTO `" + Bucket._name + "` (KEY, VALUE) VALUES ($1, $2) RETURNING `" + Bucket._name + "`.*";
    var query = N1qlQuery.fromString(statement);
    Bucket.query(query, [data.id, data], function(error, result) {
        if(error) {
            callback(error, null);
            return;
        }
        callback(null, result);
    });
}

The above method is passed a JavaScript object. A unique id is generated for it and a type property is defined. Remember, all Couchbase documents go into the same bucket; they are not separated the way MongoDB separates documents into collections.

Using N1QL, a parameterized INSERT statement can be constructed where the parameters are the document id and the data to be inserted. Not too bad at this point.
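Because the statement is assembled from the bucket name at runtime, it can be easy to lose track of the final string. A small helper, hypothetical and pure string work, shows exactly what gets sent for a bucket assumed to be named example:

```javascript
// Builds the same parameterized N1QL INSERT statement used in
// ServiceModel.save above; "example" as the bucket name is an assumption.
function buildInsertStatement(bucketName) {
    return "INSERT INTO `" + bucketName + "` (KEY, VALUE) VALUES ($1, $2) " +
        "RETURNING `" + bucketName + "`.*";
}

console.log(buildInsertStatement("example"));
// INSERT INTO `example` (KEY, VALUE) VALUES ($1, $2) RETURNING `example`.*
```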

Now take the example where the service data needs to be queried:

ServiceModel.getAll = function(callback) {
    var statement = "SELECT s.id, s.type, s.name, " +
                    "(SELECT t.* FROM `" + Bucket._name + "` AS t USE KEYS s.tenants) AS tenants " +
                    "FROM `" + Bucket._name + "` AS s " +
                    "WHERE s.type = 'service'";
    var query = N1qlQuery.fromString(statement).consistency(N1qlQuery.Consistency.REQUEST_PLUS);
    Bucket.query(query, function(error, result) {
        if(error) {
            return callback(error, null);
        }
        callback(null, result);
    });
};

In the above snippet, instead of writing a long and nasty aggregation query, a N1QL query with a subquery can be constructed. The query says to return all documents that match the specified document type, but include an array that is the result of a subquery. The subquery runs against the id values within the original array. If there were ten more arrays, the effort wouldn't be any more strenuous.

Could a JOIN have been used instead of a subquery? Absolutely, but it would have required some re-nesting in the end. Not too complicated compared to what was done in MongoDB.

If a single document should be returned, it might look like this:

ServiceModel.getById = function(documentId, callback) {
    var statement = "SELECT s.id, s.type, s.name, " +
                    "(SELECT t.* FROM `" + Bucket._name + "` AS t USE KEYS s.tenants) AS tenants " +
                    "FROM `" + Bucket._name + "` AS s " +
                    "WHERE s.type = 'service' AND s.id = $1";
    var query = N1qlQuery.fromString(statement);
    Bucket.query(query, [documentId], function(error, result) {
        if(error) {
            return callback(error, null);
        }
        callback(null, result);
    });
};

Notice that there is now another parameter in the WHERE clause. This is all familiar territory from relational database technologies like MySQL or Oracle.

When it comes to updating the arrays of each document the concept is similar:

ServiceModel.updateTenants = function(id, services, callback) {
    var statement = "UPDATE `" + Bucket._name + "` USE KEYS $1 SET tenants = $2 RETURNING `" + Bucket._name + "`.*";
    var query = N1qlQuery.fromString(statement);
    Bucket.query(query, [id, services], function(error, result) {
        if(error) {
            callback(error, null);
            return;
        }
        callback(null, result);
    });
}

In the above, any document whose key matches the first parameter is updated, and its tenants property is set to the value of the second parameter.

Connecting to Couchbase and starting to use it actually requires less code than with Ottoman. Connecting, configuring the routes, and starting the Node.js server would look something like this:

module.exports.bucket = (new Couchbase.Cluster("couchbase://localhost")).openBucket("example");
var tenantRoutes = require("./couchbase/routes/tenants")(app);
var serviceRoutes = require("./couchbase/routes/services")(app);
var server = app.listen(3000, function() {
    console.log("Connected on port 3000...");
});

Like with MongoDB, each route endpoint won’t change much from the Ottoman alternative. Instead of using Ottoman to drive the endpoints, the database model methods that were previously created will manage this. For example:

app.get("/services", function(request, response) {
    ServiceModel.getAll(function(error, result) {
        if(error) {
            return response.status(500).send({ "success": false, "message": error });
        }
        response.send(result);
    });
});

The same rules apply when creating each of the other endpoint methods for this Couchbase with N1QL API.

About the Authors

Nic Raboy is an advocate of modern web and mobile development technologies. He has experience in Java, JavaScript, Golang and a variety of frameworks such as Angular, NativeScript, and Apache Cordova. Nic writes about his development experiences related to making web and mobile development easier to understand.

Nic Raboy - May 2017

Justin Michaels is Director of Solution Engineering at Couchbase. He has experience in planning, deploying, and supporting mission critical systems. Justin enjoys understanding application patterns and regularly works with Java, Python, and Golang. He regularly presents at community events and conferences.

Justin Michaels - May 2017

Resources

These are extra resources that can be used to complement your adventure away from MongoDB and into Couchbase.

Tools

MongoDB Collection Importer for Couchbase

Cookbook Example Projects

Mongoose to Ottoman Example

MongoDB Query Language to N1QL Example

Blog Articles

Data Modeling NoSQL Documents in MongoDB vs Couchbase Server

Import Your MongoDB Collection Data into Couchbase Server with Golang

Moving Your Node.js with MongoDB Application to Couchbase with N1QL

Migrating Your MongoDB with Mongoose RESTful API to Couchbase with Ottoman

Simplifying Your NoSQL Cluster Design by Moving From MongoDB to Couchbase

Documentation and Support

Couchbase Developer Portal

Couchbase Forums