As I have mentioned in earlier blog entries, Kolab Enteprise has gained data loss prevention (DLP) functionality this year that goes above and beyond what one tends to find in other groupware products. Kolab's DLP is not just a back-up system that copies mails and other objects to disk for later restore, it actually creates a history of every groupware object in real-time that can later be examined and restored from. This will eventually lead to some very interesting business intelligent features.
The storage system for the Kolab DLP system is Basho's industry-leading distributed NoSQL database, Riak KV. (The "KV" stands for key/value.) We chose Riak KV because it scales naturally (it is designed to be run as a cluster of nodes by default), is robust by design (CAP Theorem ftw), and is dead simple to deploy on development and production machines alike. A further key factor for us is that Basho provides proven enterprise-grade support for its line of Riak products. This was a requirement for us as we need to provide enterprise-grade support for the entire Kolab Enterprise stack.
(It was a nice coincidence that both Riak and core parts of Kolab's DLP system are written in Erlang. ;)
I sat down with Manu Marchel, Managing Director for EMEA at Basho Technologies Inc., recently for a mutual interview. You can read my interview on the Basho blog (I'll update this entry with a link to their blog when it is published lived); here is a transcript of my conversation with Manu:
NoSQL is a quite a new technology in the Big Data space. Many people might have heard about things like Hadoop but how does NoSQL fit in? Could you give everyone the quick cheatsheet on what NoSQL databases are and specifically what is RiakKV your key value NoSQL database?
NoSQL databases are the new generation of databases that were designed to address the needs of enterprises to store, manage, analyse and serve the ever increasing amounts of unstructured data that make over 80% of all data being created nowadays in public clouds or private infrastructures. Apache Hadoop has done a great job of handling batch analytics use cases at massive scale for unstructured data, what I would call exploratory or discovery analytics. What NoSQL databases like riak do in comparison is help organisations manage their active data workloads as well and providing near real time operational analytics at scale. Most importantly, most businesses need scalability, availability and fault tolerance attributes as core requirements to their current and future applications architecture and these are deciding factors for NoSQL against traditional relational databases. NoSQL databases started as one of 4 types: Key-Value, Column store, Document and Graph but nowadays they are becoming multi model whereby for example riak can efficiently handle key value, but also documents as well as log/time series as demonstrated by our wide range of customers including Kolab Systems.
Riak KV is the most widely adopted NoSQL Key Value database with scalability, high availability, fault tolerance and operational simplicity as its key properties. It is used most often for mission critical use cases and works great for handling User Data, Session Data, Profile Data, Social Data, Real-time Data and Logging Data use cases. It provides near realtime analytics with its secondary indexes, search through Solr, in-situ Map/Reduce and soon to come Apache Spark support. Finally its multi data center replication capability makes it easy to ensure business continuity, geo location of data for low latency access across continents or by segregating workloads to ensure very reliable low latency.
RiakKV is known for it’s durability, it’s part of the reason we chose it for Kolab's DLP system. Could you give us some insight into how RiakKV achieves this?
Hardware does fail, and when it does your IT Infrastructure needs to be able to cope and your systems must continue to operate while getting the resources back online as soon as possible. Riak KV was designed to eliminate the impact of the unexpected. Even if network partitions or hardware failures cause unanticipated outages, Riak KV can still read and write your data. This means you never lose data even when the part of the system goes down. For Kolab customers, it means that they have the security of knowing that the data loss and auditing that they are paying for is backed up by the best system available to deliver on this promise.
Availability seems to be a very important thing for databases in today’s digital age. How is Riak providing this key feature to Kolab and how does this enhance the Kolab offering?
Simply, Riak KV is designed to intelligently replicate and retrieve data making sure that applications based on the database are always available. Scalability also comes into play here as well. Unlike traditional databases, Riak is designed to respond to the billions of data points and terabytes of data that are being produced -- often in real-time -- as it is able to scale in a near linear fashion to give Kolab the best possible performance. Ultimately this means that Kolab’s application is always available so as an end-user you don’t experience any system outages no matter how busy or active your users are.
We integrate RiakKV with Kolab’s data loss prevention system to store groupware object histories in realtime for auditing and roll-back if needed. Is this unique?
Yes! This is a great example of two great technologies working together to provide an excellent customer experience. Combining the power of Riak KV’s high availability, fault tolerance, and scalability with Kolab’s data loss prevention system means that you have an incredibly strong and powerful system.
Basho is a really unique name for a technology company - is there any history or background to it?
Thank you, we really like our name too. Basho’s name was inspired by the real life Matsuo Basho (1644 – 1694) who is considered by many to be Japan's most renowned and revered writer of haiku. Haiku are known for their balance of lines and syllables where the simplicity of the structure is important. This is a founding, guiding principle that Riak KV is based on, as our operational simplicity is core to our architecture eliminating the need for mindless manual operations as data can be automatically and uniformly distributed.
To see the partnership of Basho's Riak and Kolab Enterprise in action together, come see us in Munich at the TDWI European Conference 22-24th June. We'll be in a booth showing both Riak KV and Kolab Enterprise, and will be happy to answer your questions!