Using MongoDB to Query BigchainDB Data

Published in

The BigchainDB Blog

10 min readJun 14, 2018

This is a joint blog post by BigchainDB and MongoDB.

BigchainDB gives each node operator the full power of MongoDB to query the stored data. What does that mean? We thought we’d write a blog post to show some examples.

Suppose Sergio Tillenham designs stylish custom cars.

Photo by Severin Dm on Unsplash. See note at end of article.

He wanted to create a trustworthy record of Sergio Tillenham car owners, so that potential buyers could look up the previous owners of each car. (The ownership history, or provenance, affects the value of a car.) Moreover, if someone claims a car was designed by Sergio Tillenham, but it’s not in the database, then it must be a fake. Also, if the car is in the database but the initial record wasn’t made and signed (cryptographically) by Sergio, then it’s also fake.

Sergio decided to use BigchainDB. He made a deal with the high-end dealers who sell his cars: if a dealer wants to sell his cars, then they must run a node in his car-ownership database (powered by BigchainDB). This gives each dealer the added benefit of having the full power of MongoDB to query the database.

We wrote a Python script to generate some fake data about 25 Sergio Tillenham cars, complete with ownership histories, and we wrote all the generated data to a BigchainDB network. You can find that script in the bigchaindb/car-example repository on GitHub. For each car (asset), the script generates a CREATE transaction and some TRANSFER transactions. Below we show some example data.

Transactions vs. Transactions

Before we delve into the example data, we should clarify that “BigchainDB transactions” shouldn’t be confused with MongoDB’s support for ACID transactions (including support for multi-document transactions in MongoDB 4.0).

A BigchainDB transaction is a serialized object: a JSON string. One sends a BigchainDB transaction to a BigchainDB network in the body of an HTTP POST request, and if it’s valid, it gets stored by the network.

ACID transactions are something else: a database feature. The classic example is a bank that needs to debit $50 from one account and credit $50 to another account (i.e. an internal transfer between two accounts). If a computer has a hiccup while processing that transfer transaction, the bank does not want it to finish halfway through, i.e. with $50 debited from one account but no credit to the other account. The bank wants the transaction to be all-or-nothing, i.e. “atomic”: the A in ACID. You can look up what the C, I and D mean. If you’d like to get started using MongoDB ACID Transactions, sign up for the Beta program at https://www.mongodb.com/transactions.

In this post, when we write about transactions, we mean BigchainDB transactions. (BigchainDB will probably use MongoDB’s support for multi-document transactions once that’s generally available, but that’s a topic for a separate post.)

The CREATE Transaction

A CREATE transaction begins the history of each car in Sergio’s BigchainDB network. The "asset" part of the CREATE transaction has some information about the car itself. A name, color and creation date & time are randomly generated. Here’s an example:

{"data": {"type": "car",
          "name": "Restless Bonus",
          "color": "cream",
          "datetime_created": 1075128200,
          "designer": "Sergio Tillenham"}}

Photo by Tyler B on Unsplash. See note at end of article.

The "metadata" part of a CREATE transaction is set to:

{"notes": "The CREATE transaction for one particular car (an asset)."}

You may have noticed the strange format of the "datetime_created" in the above example, i.e. 1075128200. That’s a POSIX time stamp, also known as a Unix time stamp. We stored times that way to make certain queries easier. (MongoDB also has Date objects, but BigchainDB doesn’t support those yet.) As it happens, 1075128200 is the POSIX time stamp of 14:43:20 on 26 January 2004 (UTC). (We told the script to generate a random time between 30 years ago and 3 years ago.)

Each CREATE transaction is signed by Sergio Tillenham’s private key and he is the first owner, i.e. the condition on the transaction’s sole output says that it can only be transferred if Sergio Tillenham signs the TRANSFER transaction.

The First TRANSFER Transaction

The first TRANSFER transaction (for each car) is special because it always transfers the car from Sergio to its first owner. We assumed that the first transfer happens within two years of the time the car was created. The "asset" part of the first TRANSFER transaction looks like:

{"id":
 "816c4dd7ae10b59a10f7016aacd685a5a41b402b3394a4264583d25851af1629"}

where the long hex string is the asset id of the car (and also the transaction ID of the CREATE transaction where it was created). The "metadata" part of the first TRANSFER Transaction looks like:

{"notes": "The first transfer, from Sergio to the first owner.",
 "new_owner": "Matthew Wheeler",
 "transfer_time": 1101566600}

The name of the new owner is randomly generated. The transfer time is between zero and two years after the creation time, also randomly generated. In this case, 1101566600 is 14:43:20 on 27 November 2004 (UTC).

All Other TRANSFERS

The transfer to the next owner, if it happens at all, is assumed to happen within 3650 days (about ten years). The time interval is randomly generated. More transfers are generated until the generated transfer time is in the future and therefore hasn’t happened yet. The "asset" part of the first TRANSFER transaction looks the same as in the first TRANSFER transaction. The "metadata" part looks like:

{"notes": null,
 "new_owner": "Laurie Jones",
 "transfer_time": 1228920200}

Exploring the Stored Data with MongoDB

Now that we’ve got a bunch of data stored in the MongoDB database, we can use MongoDB queries to answer some questions, such as:

How many cream-colored cars did Sergio design?
How many transfers happened in 2010?
What was the longest car name?
What’s the average time that an owner owns a car?
What’s the maximum number of times a car has been transferred?

Before we can answer those questions, we need to get a tool that can connect to, and explore, a MongoDB database. There are many options, including:

Graphical MongoDB apps such as MongoDB Compass, Studio 3T, Mongo Management Studio, NoSQLBooster for MongoDB, and Dr. Mongo.
The Mongo Shell
One of the MongoDB drivers. There are official drivers for C, C++, C#, Java, Node.js, Perl, PHP, Python, Motor (Python async), Ruby and Scala, and community drivers for many other languages, including Go and Smalltalk.

The Mongo Shell (mongo) is free and we can use it to start exploring:

$ mongo
MongoDB shell version v3.6.5
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.6.4
...
> show dbs
admin     0.000GB
bigchain  0.000GB
config    0.000GB
local     0.000GB
> use bigchain
switched to db bigchain
> show collections
assets
blocks
metadata
pre_commit
transactions
utxos
validators

We see that there are several collections in the bigchain database. The main ones are transactions, assets and metadata. We can get a look at an example document in the transactions collection using:

> db.transactions.findOne()

Here’s one such document:

{  
   "_id":ObjectId("5b17b9fa6ce88300067b6804"),
   "inputs":[…],
   "outputs":[…],
   "operation":"CREATE",
   "version":"2.0",
   "id":"816c4dd7…851af1629"
}

(I replaced some of the contents with “…” to make the overall structure more clear.)

The above transaction is one of the CREATE transactions. Note that it has an extra “_id” key added by MongoDB. Where are the “asset” and “metadata” keys and values? Did MongoDB forget them? No… when BigchainDB stores a CREATE transaction in MongoDB, it removes those and stores them in separate collections named assets and metadata.

TRANSFER transactions are stored slightly differently. We can find one using:

> db.transactions.findOne({"operation":"TRANSFER"})

Can you spot the difference in this TRANSFER transaction?

{  
   "_id":ObjectId("5b17b9fa6ce88300067b6807"),
   "inputs":[…],
   "outputs":[…],
   "operation":"TRANSFER",
   "asset":{  
      "id":"816c4dd7ae…51af1629"
   },
   "version":"2.0",
   "id":"985ee697d…a3296b9"
}

Answer: the “asset” key and value were not removed. Why?

A TRANSFER transaction doesn’t define a new asset, it just points to an already existing asset. Its "asset" field doesn’t contain any new information about assets, so it’s not removed and stored in the assets collection.

Let’s look at a random example document in the assets collection:

{  
   "_id":ObjectId("5b17b9fe6ce88300067b6823"),
   "data":{  
      "type":"car",
      "name":"Long Wave",
      "color":"lilac",
      "datetime_created":1093255577,
      "designer":"Sergio Tillenham"
   },
   "id":"96002ef8740…45869959d8"
}

We see that it contains two extra keys in besides the usual “data” key: the “_id” added by MongoDB and the “id” of the transaction it came from. Below is an example document in the metadata collection; the story there is the same.

{  
   "_id":ObjectId("5b17ba006ce88300067b683d"),
   "metadata":{  
      "notes":"The first transfer, from Sergio to the first owner.",
      "new_owner":"Meagan Bowers",
      "transfer_time":1058568256
   },
   "id":"53cba620e…ae9fdee0"
}

We could do some queries using the Mongo Shell, but let’s try one of the MongoDB apps with a nice graphical user interface: MongoDB Compass. When you connect, just use the default values (localhost:27017).

We can now answer some of the questions asked earlier. For example, “How many cream-colored cars did Sergio design?”

We see that there are four results, i.e. Sergio designed four cream-colored cars. Note how we projected the results to suppress the _id, and to show the data.name and data.datetime_created.

We could also answer that question using the Mongo Shell:

> db.assets.find({"data.color":"cream"}).count()
4

We can also answer, “How many transfers happened in 2010?” The first step is to get the start time and end time as POSIX time stamps. There are many websites that can do time conversions; we used unixtimestamp.com:

00:00:00 on 1 January 2010: 1262304000
23:59:59 on 31 December 2010: 1293839999

Now we can do the query:

$gte means “greater than or equal” and $lte means “less than or equal”. They are two of the MongoDB’s comparison query operators. The query means “find all metadata documents where metadata.transfer_time is greater than or equal to 1262304000 and less than or equal to 1293839999.”

We see that there were four car transfers in 2010.

What was the longest car name? We can use a MongoDB aggregation pipeline to answer that in the Mongo Shell:

> db.assets.aggregate([
{$project: {"_id":0,
            "data.name":1,
            "name_length": {$strLenCP: "$data.name"}}},
{$sort: {"name_length":-1}},
{$limit: 5}
])
{ "data" : { "name" : "Gentle Morning" }, "name_length" : 14 }
{ "data" : { "name" : "Throbbing Tree" }, "name_length" : 14 }
{ "data" : { "name" : "Throbbing Pine" }, "name_length" : 14 }
{ "data" : { "name" : "Restless Bonus" }, "name_length" : 14 }
{ "data" : { "name" : "Little Shadow" }, "name_length" : 13 }

This one takes some explaining. The pipeline has three stages:

$project creates a new aggregation with “_id” suppressed (0), and with two fields included: 1) data.name and 2) a computed field called name_length, which equals the length of the data.name field.
$sort then sorts that by name_length, in descending order (-1).
$limit then limits the number of results to five.

We could have limited the number of results to one in the last step and that would have been enough: we would have seen that the max name length was 14. However, we also decided that we also wanted to see the name of the car, or cars, with that name length. As it happens, four cars had a name length of 14: Gentle Morning, Throbbing Tree, Throbbing Pine, and our old friend, Restless Bonus.

The next version of MongoDB Compass (version 1.14, currently in Beta) has a new feature to help you build aggregation pipelines one stage at time, with live feedback showing the results you’ve got so far. It’s called the Aggregation Pipeline Builder. Below is a screenshot showing how it can be used to build the aggregation pipeline we saw earlier.

To see all the final results, you have to use the horizontal scrollbar or have a very wide monitor. The results are the same as before.

That Was Just a Taste

This is a blog post, not a textbook, so we’ll end with that example. We hope we’ve given you a sense of the power of the MongoDB Query Engine. A BigchainDB node operator can use it to query any of the data stored by BigchainDB, and they can expose as much of that power as they like to end users (maybe via an intermediate app that uses one of the MongoDB drivers). See the BigchainDB docs for more about that.

Here are some things that we didn’t explore, but are certainly possible:

Text search (also provided by the BigchainDB HTTP API, but it isn’t as full-featured as the MongoDB Query Engine)
Recursive search using $graphLookup (e.g. a search backwards through the transfer history of a car)
Geospatial queries
Adding a new index to speed up certain queries

The data in the MongoDB database was saved (using mongodump) and uploaded to the bigchaindb/car-example repo on GitHub, so you can explore it yourself if you like.

Note: Sergio Tillenham is a fictional character. We made him up. Any resemblance to actual persons, living or dead, is purely coincidental. The cars in the photos weren’t designed by Sergio Tillenham because Sergio Tillenham doesn’t exist.

About MongoDB

MongoDB is the leading modern, general purpose database platform, designed to unleash the power of software and data for developers and the applications they build. The unique MongoDB architecture combines the best of both relational and non-relational databases, addressing the needs of organizations for performance, scalability, flexibility and reliability while maintaining the strengths of legacy databases. Download MongoDB Enterprise and Ops Manager and check out MongoDB Atlas for the best ways to run MongoDB.

Be sure to follow the BigchainDB Blog to stay up to date on future releases and to get a first-hand look at new additions to the system, what has changed and why. Finally, if you’re a developer using BigchainDB, we want to hear from you. Send us an email at contact@bigchaindb.com and tell us your story.