Exploring Riak, One of the Popular Open-Source NoSQL Databases

Riak is an open-source, distributed NoSQL database management system. Its design was inspired by Amazon's Dynamo paper, which describes a highly available, decentralized key-value store. Like Dynamo, Riak supports the standard Put, Get, and Delete operations. Because the values it stores are often documents such as JSON, Riak is sometimes described as document-oriented.

Riak is also highly scalable, distributed, and fault tolerant. It offers an HTTP/REST interface, JSON support, and MapReduce queries, which makes it well suited to web applications. Above all, Riak is a NoSQL database and belongs to the new generation of non-relational databases.

To understand why Riak is so powerful and popular, some background helps, so let us look at Amazon Dynamo. The Dynamo paper describes three critical parameters of the store: N, R, and W. N is the number of replicas kept for each value in the store. R is the number of replicas that must respond for a read operation to succeed. W is the number of replicas that must acknowledge a write operation for it to succeed. Riak's key contribution is exposing the N, R, and W settings to the applications built on it, per bucket or even per request, so each application can tune them to its own needs. This adaptability to application requirements is a major reason for Riak's popularity.
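The quorum arithmetic behind N, R, and W can be sketched in a few lines of Python. This is a minimal illustration, not Riak code: the `QuorumConfig` class and `can_succeed` helper are hypothetical names. It checks the classic Dynamo rule that when R + W > N, every read quorum overlaps every write quorum in at least one replica, and it reports how many replica failures reads and writes can each tolerate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    """Hypothetical container for Dynamo-style N, R, and W settings."""

    n: int  # replicas stored for each value
    r: int  # replicas that must respond before a read succeeds
    w: int  # replicas that must acknowledge before a write succeeds

    def __post_init__(self) -> None:
        if not (1 <= self.r <= self.n and 1 <= self.w <= self.n):
            raise ValueError("R and W must each be between 1 and N")

    def quorums_overlap(self) -> bool:
        # With strict quorums, R + W > N means any R replicas read and any
        # W replicas written share at least one replica. (Riak's "sloppy"
        # quorums during failures can weaken this guarantee.)
        return self.r + self.w > self.n

    def tolerated_failures(self) -> tuple[int, int]:
        # Replicas that may be down while reads / writes still succeed.
        return self.n - self.r, self.n - self.w


def can_succeed(live_replicas: int, required: int) -> bool:
    """A request succeeds only if enough live replicas can respond."""
    return live_replicas >= required


if __name__ == "__main__":
    balanced = QuorumConfig(n=3, r=2, w=2)
    print(balanced.quorums_overlap())     # True
    print(balanced.tolerated_failures())  # (1, 1)

    fast = QuorumConfig(n=3, r=1, w=1)
    print(fast.quorums_overlap())         # False
    print(can_succeed(live_replicas=1, required=balanced.r))  # False
```

With N = 3, choosing R = 1 and W = 1 makes requests faster and more available at the cost of possibly reading stale data, while R = 2 and W = 2 accepts a little more latency in exchange for overlapping quorums.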

The Riak ring

The Riak ring, that is, the cluster, consists of identical nodes, and data is replicated automatically across them. You can add or remove a node at any time without migrating data manually. Because every node in the ring is identical, there is no single point of failure and no node that acts as a bottleneck for the whole system.
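To make the ring concrete, here is a simplified consistent-hashing sketch in Python. It assumes a SHA-1 keyspace split into a power-of-two number of equal partitions, which matches the general shape of Riak's design, but the hashing of a `bucket/key` string, the round-robin `assign_partitions` rule, and the node names are illustrative assumptions rather than Riak's exact algorithm. A key's replicas go to the owner of its partition and the owners of the next N - 1 partitions around the ring, so no node plays a special role.

```python
import hashlib

RING_SIZE = 2 ** 160  # SHA-1 yields 160-bit values, so the ring has 2**160 points


def partition_for(bucket: str, key: str, num_partitions: int) -> int:
    """Map a bucket/key pair to one of num_partitions equal slices of the ring."""
    if num_partitions <= 0 or num_partitions & (num_partitions - 1):
        raise ValueError("num_partitions must be a power of two")
    # Simplification: hash a "bucket/key" string (Riak hashes its own encoding).
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (RING_SIZE // num_partitions)


def assign_partitions(nodes: list[str], num_partitions: int) -> list[str]:
    """Hypothetical claim rule: partition i belongs to nodes[i % len(nodes)]."""
    return [nodes[i % len(nodes)] for i in range(num_partitions)]


def preference_list(bucket: str, key: str, owners: list[str], n: int) -> list[str]:
    """Nodes holding the n replicas: the key's partition plus the next n - 1."""
    first = partition_for(bucket, key, len(owners))
    return [owners[(first + i) % len(owners)] for i in range(n)]


if __name__ == "__main__":
    owners = assign_partitions(["node1", "node2", "node3", "node4"], 8)
    print(preference_list("users", "alice", owners, n=3))
```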

Riak clusters can grow to hundreds or even thousands of nodes over time, so occasional machine failures are to be expected. Riak detects machine failures and recovers automatically when a machine is added or brought back into the cluster. Both existing and newly written data are redistributed among the nodes automatically.
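Adding a node does not require manual migration because only some partitions change hands. The Python sketch below is a hypothetical, much-simplified stand-in for Riak's real claim algorithm: the joining node takes over its fair share of partitions from nodes that own more than that share, visiting partitions spread around the ring first. Unlike Riak, it does not guarantee that a key's replicas end up on distinct nodes.

```python
from collections import Counter


def add_node(owners: list[str], new_node: str) -> list[str]:
    """Return partition ownership after new_node joins (hypothetical claim rule)."""
    num_partitions = len(owners)
    target = num_partitions // (len(set(owners)) + 1)  # fair share per node
    counts = Counter(owners)
    result = list(owners)

    # Visit partitions spaced around the ring first, then the remaining ones.
    stride = max(1, num_partitions // max(target, 1))
    order = list(range(0, num_partitions, stride))
    order += [i for i in range(num_partitions) if i % stride]

    taken = 0
    for index in order:
        if taken == target:
            break
        owner = result[index]
        if counts[owner] > target:  # only take from nodes above their fair share
            result[index] = new_node
            counts[owner] -= 1
            taken += 1
    return result


if __name__ == "__main__":
    before = ["node1", "node2", "node3", "node4"] * 4  # 16 partitions
    after = add_node(before, "node5")
    moved = sum(old != new for old, new in zip(before, after))
    print(moved, Counter(after)["node5"])  # 3 3
```

Only 3 of the 16 partitions move in this example; the data on the other 13 stays where it is.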

If some nodes are unavailable, you can still perform read and write operations as long as enough nodes are live to satisfy the required replica count. If the part of the cluster holding the latest version of a document shuts down or becomes disconnected, Riak may return an older or conflicting version of the value, and the application can decide on the next best course of action. Alternatively, you can configure Riak to resolve such conflicts by keeping the most recent version.
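Detecting that two versions conflict requires some notion of causality. Classic Riak releases track object versions with vector clocks, and the Python sketch below shows the basic vector-clock comparison. It is a simplified illustration with hypothetical function and actor names, not Riak's implementation: a clock maps each actor to a count of its updates, and two versions conflict when neither clock has seen all of the other's updates.

```python
VectorClock = dict[str, int]  # actor id -> number of updates made by that actor


def increment(clock: VectorClock, actor: str) -> VectorClock:
    """Return a copy of clock that records one more update by actor."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify two versions as equal, one newer than the other, or in conflict."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    # Concurrent updates: neither saw the other, so the application must
    # decide (for example by merging or by keeping the most recent write).
    return "conflict"


if __name__ == "__main__":
    base = increment({}, "client-a")   # {'client-a': 1}
    v1 = increment(base, "client-a")   # client A updates again
    v2 = increment(base, "client-b")   # client B updates the same base version
    print(compare(v1, base))  # a is newer
    print(compare(v1, v2))    # conflict
```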

Some quick facts about Riak

  • Created by: Basho
  • Database orientation: Key-value, often used as a document store
  • Implementation language: Erlang and C, with some JavaScript
  • Distributed: Yes; it can span multiple database servers, racks, and data centers.
  • Storage: Pluggable backends, including InnoStore, Bitcask, and Google's LevelDB.
  • Engine type: Asynchronous; heavy requests can be processed asynchronously without being queued behind other requests.
  • Open source: Yes, under the Apache License.
  • Map/Reduce: JavaScript-based Map/Reduce
  • Best use case: When you need a fast, scalable database with a simple REST interface and a flexible schema. Mozilla's bug store is one example.
  • In production: Riak is used successfully by SwingVine, Mozilla, AOL, AskSponsoredListings, GitHub, and Formspring, among many other businesses.
  • Available packages: Basho offers 32- and 64-bit DEB and RPM packages with minimal dependencies.
  • Installation: Download the appropriate package and install it with your system's default package manager.

For installation and maintenance support for Riak and other NoSQL databases, you can get help from RemoteDBA. Next, let us explore Riak links.

Links are metadata that establish one-way relationships between objects in Riak. They can be used to loosely model a graph of relationships between items.

Link Header

The primary way to read and modify links through the HTTP API is the HTTP Link header. It mirrors the purpose of <link> tags in HTML, i.e., establishing relationships with other HTTP resources.

The part inside the angle brackets (< >) is a relative URL pointing to another Riak object. The "tag" portion, in double quotes, is a string identifier whose meaning is specific to your application. An object can carry multiple links, separated by commas. For example, an object that participates in a doubly-linked list of objects would carry two links: one tagged as pointing to the previous item and one tagged as pointing to the next.
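To make the header format concrete, here is a small Python sketch that builds and parses Link header values. It assumes the legacy `/riak/<bucket>/<key>` URL layout, the `riaktag` parameter that Riak's HTTP API uses for the tag, and keys that are already URL-safe; the helper names are hypothetical. The doubly-linked-list case becomes one link tagged "previous" and one tagged "next".

```python
import re

Link = tuple[str, str, str]  # (bucket, key, tag)

# One link looks like: </riak/list/3>; riaktag="next"
LINK_PATTERN = re.compile(r'</riak/([^/>]+)/([^>]+)>;\s*riaktag="([^"]*)"')


def build_link_header(links: list[Link]) -> str:
    """Format links as a comma-separated Link header value."""
    return ", ".join(
        f'</riak/{bucket}/{key}>; riaktag="{tag}"' for bucket, key, tag in links
    )


def parse_link_header(value: str) -> list[Link]:
    """Extract (bucket, key, tag) triples, skipping links without a riaktag."""
    return [match.groups() for match in LINK_PATTERN.finditer(value)]


if __name__ == "__main__":
    header = build_link_header([("list", "1", "previous"), ("list", "3", "next")])
    print(header)
    # </riak/list/1>; riaktag="previous", </riak/list/3>; riaktag="next"
    print(parse_link_header(header))
    # [('list', '1', 'previous'), ('list', '3', 'next')]
```

The parser deliberately ignores links that carry no `riaktag`, such as a link back to the bucket itself.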

Link-walking

Link-walking is a special case of a MapReduce query, and it can be accessed through the HTTP link-walking interface. It starts with a single input object and follows links on that object to find other objects that match the given specifications. More than one traversal can be specified in a single request, and intermediate results can be returned as well. The final traversal in a link-walking request always returns results.
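The traversal logic can be sketched against an in-memory store in Python. This is a hypothetical model, not Riak's API: each `Step` filters links by bucket and tag, with `None` playing the role of the `_` wildcard that Riak's link-walking URLs use, and `keep` marks a step whose intermediate results should also be returned. Results from the final step are always returned. For simplicity, the sketch does not remove duplicate objects reached through different links.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StoredObject:
    """Hypothetical stand-in for a Riak object and its outgoing links."""
    bucket: str
    key: str
    links: list[tuple[str, str, str]] = field(default_factory=list)  # (bucket, key, tag)


@dataclass(frozen=True)
class Step:
    bucket: Optional[str] = None  # None matches any bucket (like "_")
    tag: Optional[str] = None     # None matches any tag (like "_")
    keep: bool = False            # also return this step's results


Store = dict[tuple[str, str], StoredObject]


def link_walk(store: Store, start: tuple[str, str], steps: list[Step]) -> list[list[StoredObject]]:
    """Follow links from store[start] through each step in turn."""
    current = [store[start]]
    results = []
    for position, step in enumerate(steps):
        matched = []
        for obj in current:
            for bucket, key, tag in obj.links:
                if step.bucket is not None and bucket != step.bucket:
                    continue
                if step.tag is not None and tag != step.tag:
                    continue
                target = store.get((bucket, key))
                if target is not None:  # skip links to missing objects
                    matched.append(target)
        current = matched
        if step.keep or position == len(steps) - 1:  # last step always returns
            results.append(current)
    return results


if __name__ == "__main__":
    people = [
        StoredObject("people", "alice", [("people", "bob", "friend"),
                                         ("people", "carol", "friend"),
                                         ("people", "dave", "coworker")]),
        StoredObject("people", "bob", [("people", "erin", "friend")]),
        StoredObject("people", "carol"),
        StoredObject("people", "dave"),
        StoredObject("people", "erin"),
    ]
    store = {(o.bucket, o.key): o for o in people}
    walk = link_walk(store, ("people", "alice"),
                     [Step("people", "friend", keep=True), Step(tag="friend")])
    print([[o.key for o in level] for level in walk])  # [['bob', 'carol'], ['erin']]
```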

Riak's MapReduce also aims to maximize data locality. When processing a large dataset, it is much more efficient to take the computation to the data than to bring the data to the computation. In practice, a MapReduce job's code may be less than 10 KB, so sending that code to gigabytes of data, where it is processed in place, is far cheaper than streaming gigabytes of data to 10 KB of code.
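A quick back-of-the-envelope calculation in Python shows the size of the gap. The figures are illustrative assumptions, not measurements: a 10 KB job sent to each of 5 nodes versus 4 GB of input data streamed to one place. Real jobs also send map results back to the coordinator, which this simple count leaves out.

```python
KB = 1024
GB = 1024 ** 3


def bytes_moved(code_size: int, data_size: int, num_nodes: int) -> tuple[int, int]:
    """Compare shipping code to every node against shipping all data to the code."""
    ship_code = code_size * num_nodes  # send the job's code to each node once
    ship_data = data_size              # stream every input byte to the code
    return ship_code, ship_data


if __name__ == "__main__":
    code_cost, data_cost = bytes_moved(10 * KB, 4 * GB, num_nodes=5)
    print(code_cost)               # 51200
    print(data_cost)               # 4294967296
    print(data_cost // code_cost)  # 83886
```

Under these assumptions, moving the data costs roughly 84,000 times more network traffic than moving the code.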

Riak's approach to data locality determines how it spreads query processing across the cluster. Any Riak node in the cluster can coordinate a read or write by sending requests directly to the nodes responsible for maintaining that data. Likewise, any Riak node can coordinate a MapReduce query by sending each map-step evaluation request directly to the node responsible for the corresponding input data. The map-step results are sent back to the coordinating node, where the reduce step combines them into a unified result.

Put simply, Riak runs map-step functions on the nodes that hold their input data and runs reduce-step functions on the node that coordinates the MapReduce query.
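This division of labor between map and reduce steps can be simulated in a single Python process. In this hypothetical sketch, `cluster` maps node names to the key/value data each node stores and `owner_of` says which node holds each input key. Riak itself runs map and reduce functions written in JavaScript or Erlang, and its reduce phase may run more than once over partial results, which this sketch does not model. Map functions run against each node's local data, and only their results travel to the coordinator, where the reduce step runs once.

```python
from collections import defaultdict
from typing import Callable, Iterable

Cluster = dict[str, dict[str, str]]  # node name -> {key: value} stored on that node
Pair = tuple[str, int]


def run_mapreduce(
    cluster: Cluster,
    owner_of: dict[str, str],
    inputs: Iterable[str],
    map_fn: Callable[[str, str], list[Pair]],
    reduce_fn: Callable[[list[Pair]], dict[str, int]],
) -> dict[str, int]:
    # Group the input keys by the node that stores them.
    keys_by_node: dict[str, list[str]] = defaultdict(list)
    for key in inputs:
        keys_by_node[owner_of[key]].append(key)

    # Map step: each node processes only its local data; only the
    # results travel back to the coordinator.
    map_results: list[Pair] = []
    for node, keys in keys_by_node.items():
        local_data = cluster[node]
        for key in keys:
            map_results.extend(map_fn(key, local_data[key]))

    # Reduce step: runs once, on the coordinating node.
    return reduce_fn(map_results)


def word_pairs(key: str, value: str) -> list[Pair]:
    """Map function: emit (word, 1) for every word in a document."""
    return [(word, 1) for word in value.split()]


def sum_counts(pairs: list[Pair]) -> dict[str, int]:
    """Reduce function: add up the counts for each word."""
    totals: dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


if __name__ == "__main__":
    cluster = {
        "node1": {"doc1": "riak is distributed"},
        "node2": {"doc2": "riak is fault tolerant"},
    }
    owner_of = {"doc1": "node1", "doc2": "node2"}
    print(run_mapreduce(cluster, owner_of, ["doc1", "doc2"], word_pairs, sum_counts))
    # {'riak': 2, 'is': 2, 'distributed': 1, 'fault': 1, 'tolerant': 1}
```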
