MapReduce views

Introduction

Frank van der Linden posted that he has updated his Cloudant connector plugin, so from now on it should be possible to manage design documents with it.

To become more familiar with CouchDB’s design documents, I can recommend reading ‘Writing and Querying MapReduce Views in CouchDB’.

Design documents

The design documents you mostly manage in CouchDB / Cloudant are documents that describe views of documents. It is great that you can manage them via HTTP and not, as in IBM Notes, only via the Domino Designer client 😉

You could also store other documents in your CouchDB database that are part of the design of your application (e.g. HTML, CSS and JS files), but that is another story.

Design documents are nothing more than JSON documents stored under a naming convention: the document id must start with _design/ followed by an identifier, e.g. _design/default.

MapReduce

Views in CouchDB are based on the MapReduce principle:

A MapReduce program is composed of a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies).

This video explains it by using playing cards.
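To make the quoted description concrete, here is a minimal sketch in plain JavaScript, without CouchDB involved. It follows the students example from the quote: map every student to a pair keyed by first name, collect the pairs into sorted queues, and reduce each queue to a count. The sample data and the helper names (mapStudent, groupByKey, countQueue) are made up for this illustration.

```javascript
// Hypothetical sample data; not taken from any real database.
const students = [
  { first: "Anna", last: "Berg" },
  { first: "Tom", last: "Smit" },
  { first: "Anna", last: "Visser" },
];

// Map step: turn every student into a (key, value) pair keyed by first name.
function mapStudent(student) {
  return [student.first, 1];
}

// Shuffle step: collect the values per key and keep the keys sorted.
function groupByKey(pairs) {
  const queues = new Map();
  for (const [key, value] of pairs) {
    if (!queues.has(key)) queues.set(key, []);
    queues.get(key).push(value);
  }
  const sorted = [...queues.entries()].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return new Map(sorted);
}

// Reduce step: summarise one queue, here by counting its entries.
function countQueue(values) {
  return values.length;
}

const queues = groupByKey(students.map(mapStudent));
const nameFrequencies = {};
for (const [name, values] of queues) {
  nameFrequencies[name] = countQueue(values);
}
console.log(nameFrequencies); // first name -> number of students
```

CouchDB does the collecting and sorting for you; in a view you only supply the map function and, optionally, the reduce function.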

Mapping

The filtering and sorting are defined by a JavaScript function. You can state conditions in the function to select the documents you are interested in, e.g.:

function(doc) {
  if (doc.labels) {
    for (var idx in doc.labels) {
      if (doc.labels[idx] === "organic") {
        emit(doc._id, doc.name);
      }
    }
  }
}

// Only docs that have the field doc.labels are considered, and only the id and name of those carrying the label "organic" are returned.
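If you want to try a map function without a running database, you can sketch what CouchDB does with it in plain JavaScript. In the example below emit() is a stand-in I wrote myself that collects rows in memory, and the four documents are invented; a real view would also keep its index up to date, which this sketch ignores.

```javascript
// The map function from the post, with a name so it can be called directly.
function map(doc) {
  if (doc.labels) {
    for (var idx in doc.labels) {
      if (doc.labels[idx] === "organic") {
        emit(doc._id, doc.name);
      }
    }
  }
}

// Stand-in for CouchDB's emit(): collect the rows in memory.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key, value });
}

// Hypothetical documents, invented for this example.
const docs = [
  { _id: "p1", name: "Tofu", labels: ["organic", "soy"] },
  { _id: "p2", name: "Seitan", labels: ["wheat"] },
  { _id: "p3", name: "Oat milk" }, // no labels field, so it is skipped
  { _id: "p4", name: "Tempeh", labels: ["soy", "organic"] },
];

for (const doc of docs) {
  currentId = doc._id;
  map(doc);
}

// CouchDB returns view rows ordered by key; mimic that for string keys.
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
console.log(rows);
```

Only p1 and p4 end up in the result, because they are the only documents with an "organic" label.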

Reducing

Reducing is optional and performs a summary operation. The built-in reduce functions in CouchDB include:

  • _sum
  • _count
  • _stats

Count is probably the most common function to use. Notice that the returned key is null when you do not group the results: the reduce then collapses all rows into a single row, so there is no longer one unique key to report.

{
  "rows": [{
    "key": null,
    "value": 7
  }]
}

If you add the group parameter (group=true) to the request URL, the values are grouped by key, so the response could look like:

{
  "rows": [{
    "key": "appel",
    "value": 3
  }, {
    "key": "banaan",
    "value": 2
  }, {
    "key": "peer",
    "value": 2
  }]
}
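The difference between the two responses above can be sketched in a few lines of plain JavaScript. This is a simplified model of _count, not CouchDB's own implementation; the function name countReduce and the emitted rows are assumptions chosen so that the totals match the responses shown.

```javascript
// Hypothetical rows as a map function could emit them (key = fruit name).
const mapRows = [
  { key: "appel", value: null },
  { key: "banaan", value: null },
  { key: "appel", value: null },
  { key: "peer", value: null },
  { key: "appel", value: null },
  { key: "banaan", value: null },
  { key: "peer", value: null },
];

// Simplified _count: without grouping everything collapses into one row
// whose key is null; with grouping there is one row per distinct key.
function countReduce(rows, group) {
  if (!group) {
    return { rows: [{ key: null, value: rows.length }] };
  }
  const counts = new Map();
  for (const { key } of rows) {
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const sortedKeys = [...counts.keys()].sort();
  return { rows: sortedKeys.map((key) => ({ key, value: counts.get(key) })) };
}

console.log(JSON.stringify(countReduce(mapRows, false)));
console.log(JSON.stringify(countReduce(mapRows, true)));
```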

Sum works only on numeric values (of course), so if your map function looks like:

function(doc) {
  if (doc.fruits) {
    for (var i in doc.fruits) {
      emit(doc.fruits[i], doc.calories);
    }
  }
}

with the _sum reduce option the response could look like:

{
  "rows": [{
    "key": "appel",
    "value": 720
  }, {
    "key": "banaan",
    "value": 1368
  }, {
    "key": "peer",
    "value": 320
  }]
}

The _stats option returns a JSON object containing the sum, count, minimum, maximum, and sum of squares (sumsqr) of the mapped values. So for the map function above the response could be:

{
  "rows": [{
    "key": "appel",
    "value": {
      "sum": 720,
      "count": 2,
      "min": 272,
      "max": 448,
      "sumsqr": 274688
    }
  }, {
    "key": "banaan",
    "value": {
      "sum": 1368,
      "count": 3,
      "min": 272,
      "max": 648,
      "sumsqr": 694592
    }
  }, {
    "key": "peer",
    "value": {
      "sum": 720,
      "count": 2,
      "min": 272,
      "max": 448,
      "sumsqr": 274688
    }
  }]
}
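To see where these numbers come from, here is a small sketch of the arithmetic behind _stats for the values of one key. It is a simplified model in plain JavaScript, not CouchDB's implementation, and the input values 272 and 448 are my assumption, picked because they reproduce the "appel" row above. Note that sumsqr is the sum of the squared values.

```javascript
// Simplified model of what _stats computes for the values of one key.
function stats(values) {
  return {
    sum: values.reduce((acc, v) => acc + v, 0),
    count: values.length,
    min: Math.min(...values),
    max: Math.max(...values),
    sumsqr: values.reduce((acc, v) => acc + v * v, 0), // sum of squares
  };
}

// Assumed inputs that reproduce the "appel" row: 272 * 272 + 448 * 448 = 274688.
console.log(stats([272, 448]));
```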

With this knowledge fresh in mind you can use these functions in your view setup. Remember that design documents can be managed via CRUD operations, e.g. at the address http://localhost:5984/veganstore/_design/default

{
  "_id": "_design/default",
  "language": "javascript",
  "views": {
    "store": {
      "map": "function(doc) { if (doc.label) { emit(doc.label, doc.origin); }}",
      "reduce": "_stats"
    }
  }
}
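Creating such a design document boils down to an HTTP PUT with the JSON as the body. The sketch below only builds the URL and the fetch() options so you can inspect them; the helper name buildDesignDocRequest is hypothetical, and the host and database name are taken from the address above. I left out authentication, which a real Cloudant account would need.

```javascript
// Build the URL and fetch() options for storing a design document.
// The helper name and its parameters are assumptions for this sketch.
function buildDesignDocRequest(baseUrl, dbName, designName, views) {
  const body = {
    _id: `_design/${designName}`,
    language: "javascript",
    views,
  };
  return {
    url: `${baseUrl}/${encodeURIComponent(dbName)}/_design/${encodeURIComponent(designName)}`,
    options: {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

const request = buildDesignDocRequest("http://localhost:5984", "veganstore", "default", {
  store: {
    // Map functions are stored as strings inside the design document.
    map: "function(doc) { if (doc.label) { emit(doc.label, doc.origin); }}",
    reduce: "_stats",
  },
});
console.log(request.url);

// To actually send it (Node 18+ or a browser, with the server running):
// const response = await fetch(request.url, request.options);
```

When you update an existing design document, the body also has to carry the current _rev of that document, just like for any other document update.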

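In one design document you can define multiple views. Keep in mind that the built-in _sum and _stats functions expect numeric values, so the value emitted by the map function (doc.origin in the example above) has to be a number.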

You can query your views by a set of parameters:

  • conflicts (boolean) – Includes conflicts information in response. Ignored if include_docs isn’t true. Default is false
  • descending (boolean) – Return the documents in descending key order. Default is false
  • endkey (json) – Stop returning records when the specified key is reached. Optional
  • end_key (json) – Alias for endkey param
  • endkey_docid (string) – Stop returning records when the specified document ID is reached. Requires endkey to be specified for this to have any effect. Optional
  • end_key_doc_id (string) – Alias for endkey_docid param
  • group (boolean) – Group the results using the reduce function to a group or single row. Default is false
  • group_level (number) – Specify the group level to be used. Optional
  • include_docs (boolean) – Include the associated document with each row. Default is false.
  • attachments (boolean) – Include the Base64-encoded content of attachments in the documents that are included if include_docs is true. Ignored if include_docs isn’t true. Default is false.
  • att_encoding_info (boolean) – Include encoding information in attachment stubs if include_docs is true and the particular attachment is compressed. Ignored if include_docs isn’t true. Default is false.
  • inclusive_end (boolean) – Specifies whether the specified end key should be included in the result. Default is true
  • key (json) – Return only documents that match the specified key. Optional
  • keys (json-array) – Return only documents where the key matches one of the keys specified in the array. Optional
  • limit (number) – Limit the number of the returned documents to the specified number. Optional
  • reduce (boolean) – Use the reduction function. Default is true
  • skip (number) – Skip this number of records before starting to return the results. Default is 0
  • sorted (boolean) – Sort returned rows (see Sorting Returned Rows). Setting this to false offers a performance boost. The total_rows and offset fields are not available when this is set to false. Default is true
  • stale (string) – Allow the results from a stale view to be used. Supported values: ok and update_after. Optional
  • startkey (json) – Return records starting with the specified key. Optional
  • start_key (json) – Alias for startkey param
  • startkey_docid (string) – Return records starting with the specified document ID. Requires startkey to be specified for this to have any effect. Optional
  • start_key_doc_id (string) – Alias for startkey_docid param
  • update_seq (boolean) – Response includes an update_seq value indicating which sequence id of the database the view reflects. Default is false
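The parameters are passed in the query string of the view URL. As a sketch of how that could look, the helper below (buildViewUrl, a name I made up) puts a URL together; the one thing to watch is that parameters typed json in the list above must be JSON-encoded, so a string key is sent including its double quotes. The database, design document and view names are the ones from the design document shown earlier.

```javascript
// Parameters typed "json" in the list above must be JSON-encoded
// before they are URL-encoded.
const JSON_PARAMS = new Set(["key", "keys", "startkey", "endkey", "start_key", "end_key"]);

// Hypothetical helper: build the URL for querying a view.
function buildViewUrl(baseUrl, dbName, designName, viewName, params = {}) {
  const query = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    query.set(name, JSON_PARAMS.has(name) ? JSON.stringify(value) : String(value));
  }
  const path = `${baseUrl}/${dbName}/_design/${designName}/_view/${viewName}`;
  const qs = query.toString();
  return qs ? `${path}?${qs}` : path;
}

// Grouped reduce result, one row per key.
const groupedUrl = buildViewUrl("http://localhost:5984", "veganstore", "default", "store", {
  group: true,
});

// Plain map rows for a single key, without the reduce step.
const singleKeyUrl = buildViewUrl("http://localhost:5984", "veganstore", "default", "store", {
  key: "appel",
  reduce: false,
  limit: 10,
});

console.log(groupedUrl);
console.log(singleKeyUrl);
```

In the second URL the key shows up as %22appel%22, which is the JSON string "appel" after URL-encoding.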

With this summary of views in CouchDB / Cloudant fresh in mind, let’s try to create some views programmatically in Cloudant with Frank’s updated plugin =)

2 thoughts on “MapReduce views”

  1. Sean Cull (@seancull) 2016-October-16 / 9:03 pm

    Hello Patrick, great post. Are the map functions executed at run time (on the fly and totally dynamic) or are they pre-stored like an index?

    • Patrick Kwinten 2016-October-16 / 9:37 pm

      Hi Sean,

      From the documentation: “CouchDB is designed to avoid any extra costs: it only runs through all documents once, when you first query your view. If a document is changed, the map function is only run once, to recompute the keys and values for that single document.”
