Apache Cassandra is an open source, NoSQL distributed database that allows for the quick storage and retrieval of large amounts of data. It has a great reputation in companies large and small for its performance, ability to scale, high availability, and more. As such, there are many reasons to implement it or migrate from a commercial or open core solution. Unfortunately, many companies do not have the financial resources, time, or in-house expertise to do so.
RTInsights recently sat down with Bassam Chahine, a Principal Consultant at Instaclustr. We talked about the benefits of Apache Cassandra; the challenges in implementing, migrating, and managing it; and how Instaclustr and its Instaclustr Managed Platform for Apache Cassandra can help.
Here is a summary of our conversation.
RTInsights: What is Apache Cassandra, and what are the main reasons businesses are using it?
Bassam: Cassandra is used when businesses are looking for solutions where existing technology does not meet their requirements or when their current database is very capable but has its limits.
So, its use cases are where businesses need to have the data available in multiple regions and where they are working with petabytes of data. In such cases, traditional databases are limited and cannot meet the needs of the business. Cassandra is definitely one to consider when looking for an alternative.
With Cassandra, it’s not uncommon to have loads where you have millions of transactions per second as well as petabytes worth of data. That data is replicated so that the retrieval of the data is local to whoever is trying to use it.
Another reason businesses are using Cassandra is that it is open source. Open source gives businesses full control and the ability to see what Cassandra is doing in the background. There are no licensing fees, so businesses avoid the costs that come with commercial or open core software.
At the same time, businesses have the benefit of having a very large, active Cassandra community, which continually adds features and addresses security issues. Updates and patches are made available very quickly. That may not be the case with commercial software.
Additionally, Cassandra is very scalable. Users of Cassandra vary in size, from the smallest of businesses using a 3-node cluster to Apple and Netflix, which use Cassandra to handle large amounts of data.
RTInsights: What challenges do companies encounter with implementation, management, and scaling since it’s open source?
Bassam: There are a couple of things from an implementation point of view. One needs to understand how to best utilize Cassandra. The current demand for NoSQL databases makes finding contractors and employees harder.
Many companies do not have the in-house staff to determine how Cassandra can best be used. This is when you have to either contract an external consulting company to help fill the gap or train someone internally.
Having someone with experience working with and understanding Cassandra makes the process simpler. It is important to note that when transferring data from a relational database to a NoSQL database like Cassandra, the data model differs drastically.
The data model controls what data is stored and how it is accessed. A data model that may work today may not work a year from now based on the data size. So, you need a proper understanding of the use case in terms of what data is available and what you are trying to look for coming out of this database. Your projections of how you’ll be using it in the future will make Cassandra’s use much more viable than if you don’t take those steps initially.
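To make the point about data size concrete, here is a minimal sketch of how a partition can outgrow a data model over time. Everything in it is an assumption for illustration: the sensor workload, the row size, the write rate, and the 100 MB figure, which is a commonly cited rule of thumb rather than a hard Cassandra limit. It compares a model that keeps all of a sensor's readings in one partition with a model that starts a new partition each day.

```python
# Hypothetical workload numbers, chosen only for illustration.
ROW_BYTES = 200                # assumed average size of one stored reading
READINGS_PER_DAY = 86_400      # assumed: one reading per second per sensor
PARTITION_GUIDELINE_MB = 100   # rule-of-thumb ceiling, not a hard Cassandra limit


def partition_size_mb(days_in_partition: int) -> float:
    """Estimate the size of one partition that holds this many days of readings."""
    return days_in_partition * READINGS_PER_DAY * ROW_BYTES / 1_000_000


def report(label: str, days_in_partition: int) -> str:
    """Describe whether a partition of this age stays under the guideline."""
    size = partition_size_mb(days_in_partition)
    verdict = "ok" if size <= PARTITION_GUIDELINE_MB else "too large"
    return f"{label}: {size:.1f} MB per partition ({verdict})"


# Model A: PRIMARY KEY ((sensor_id), reading_time)
#          one partition per sensor, which keeps growing.
# Model B: PRIMARY KEY ((sensor_id, day), reading_time)
#          one partition per sensor per day, which stays bounded.
if __name__ == "__main__":
    for days_elapsed in (1, 30, 365):
        print(report(f"model A after {days_elapsed} days", days_elapsed))
    print(report("model B, any day", 1))
```

Under these assumed numbers, model A looks fine on day one and is well past the guideline within a month, which is the kind of projection worth doing before the first table is created.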
RTInsights: How does Instaclustr help?
Bassam: Instaclustr was established in 2012 when 2 individuals didn’t want to run and install Cassandra themselves—but they couldn’t find a managed service. They saw that there was a gap in the market for that.
Initially, we built a way to deploy Cassandra on AWS and let other users start using it. We were trying to build the company in such a way that we became more knowledgeable about the product, and through that we ended up adding staff who contribute to the Cassandra project and the Cassandra model.
We have over 300 million node-hours of experience managing Cassandra clusters. We get to see a lot of things that are adjustable, tunable, and fixable. Through that, we gain knowledge about managing and deploying Cassandra and making sure that it’s available, resilient, and performant.
Our diverse staff has developed extensive and deep knowledge of Cassandra. That lets us provide consulting services for Cassandra to customers who are either having issues with Cassandra and want to improve it or are just starting to use the Cassandra database and need some guidance.
With that experience and expertise, Instaclustr can help businesses in 3 main ways.
One, we often provide a managed service, so customers get to use Cassandra to store their data and query it without worrying about managing it. The second option is that we provide support for customers who already have Cassandra and want to talk to somebody 24×7 to answer any questions or help them with any issues. And then the third way we help is with on-demand consulting for a project implementation that they may want to embark on.
In addition, we provide migration services if you are using either a commercial or open core solution. By open core, I mean products where the center of the product is open source, but it is wrapped with commercial software around it, like DataStax, with a charged licensing fee for its use. So, we offer services where we can migrate customers from DataStax to open source Cassandra, which is key to saving a significant amount of money in the long run.
RTInsights: How is what you offer different from opting for a DataStax solution?
Bassam: DataStax has been in the market for quite some time. We were very good partners with them, and we were the preferred vendor for managed service for DataStax databases.
Where Instaclustr differs is that we wanted to focus on pure open source. We find there’s a benefit to having all the components of the product open source. That way, you know exactly what’s happening. If you want to change it, you can change it. You are not tied down to one vendor for help or support, and you don’t pay licensing fees. Those, I think, are the main differences.
See also: When and How to Migrate from DataStax to Open Source Apache Cassandra
RTInsights: What are the benefits of the Instaclustr Managed Platform?
Bassam: Our managed service allows anybody to log into our console from the web and deploy a Cassandra cluster, whether on AWS, Azure, or GCP. They can do this within minutes. They do not have to purchase hardware or set up the software. If you are using a cloud vendor, you do not have to set up the EC2s or install Cassandra. With our service, it’s all there.
Our managed service allows you to deploy a single cluster across multiple cloud vendors, where, for example, one data center is in AWS, another in GCP, and the third in Azure.
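In Cassandra itself, spreading data across data centers is expressed per keyspace through the `NetworkTopologyStrategy` replication class. The sketch below only builds the CQL text for such a keyspace. The keyspace name, the data center names, and the replica counts are hypothetical; in a real cluster the data center names must match what the cluster reports, and this is not a description of how the Instaclustr console does it.

```python
def create_keyspace_cql(keyspace: str, replicas_per_dc: dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement that replicates to several data centers."""
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    # The replication map names the strategy first, then one entry per data center.
    entries = ["'class': 'NetworkTopologyStrategy'"]
    entries += [f"'{dc}': {count}" for dc, count in replicas_per_dc.items()]
    replication = ", ".join(entries)
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{{replication}}};"
    )


if __name__ == "__main__":
    # Hypothetical data center names, one per cloud vendor.
    print(create_keyspace_cql("payments", {"aws_dc": 3, "gcp_dc": 3, "azure_dc": 3}))
```

With three replicas in each data center, an application in any of the three clouds can read from replicas that are local to it.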
We also have the capability to deploy our managed service infrastructure on-prem. That is for customers who do not want to have their data in the cloud. In such a case, our TechOps team would have limited access to the cluster just to manage it, but the customers have control of their data. This helps customers from a speed of deployment standpoint, and also, they do not have to worry about backups, patches, upgrades, and performance.
We have a team that’s continually monitoring clusters, so if utilization increases to the point where we think performance may degrade, we contact the customer. The customer can also reach out to us proactively at any time with questions on how to improve performance, including the performance of a particular query as seen from the customer’s side.
To use the solution, they just log in, create the schema and tables, and then insert the data and query it. They don’t have to deal with anything else beyond that.
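The create, insert, and query steps might look roughly like the following sketch. It assumes a keyspace named `demo` already exists and that `session` behaves like the session object of the open source Python driver for Cassandra, whose `execute` method takes a statement and optional parameters. The table, column names, and values are hypothetical, and connection details such as authentication and TLS are left out.

```python
# CQL statements for a hypothetical table of sensor readings.
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS demo.readings ("
    "sensor_id text, reading_time timestamp, value double, "
    "PRIMARY KEY ((sensor_id), reading_time))"
)
INSERT_ROW = (
    "INSERT INTO demo.readings (sensor_id, reading_time, value) "
    "VALUES (%s, %s, %s)"
)
SELECT_ROWS = "SELECT reading_time, value FROM demo.readings WHERE sensor_id = %s"


def run_demo(session, sensor_id, reading_time, value):
    """Create the table, insert one row, and read that sensor's rows back."""
    session.execute(CREATE_TABLE)
    session.execute(INSERT_ROW, (sensor_id, reading_time, value))
    return list(session.execute(SELECT_ROWS, (sensor_id,)))


# Example wiring, assuming the cassandra-driver package and a reachable node:
#   from datetime import datetime, timezone
#   from cassandra.cluster import Cluster
#   session = Cluster(["127.0.0.1"]).connect()
#   rows = run_demo(session, "sensor-1", datetime.now(timezone.utc), 21.5)
```

Passing the session in as a parameter keeps the sketch independent of how the connection is made, which differs between a local test node and a managed cluster.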
RTInsights: Can you talk about some case studies and give some examples of success stories?
Bassam: There are several that will give you a sense of the benefits we provide.
We helped migrate a customer from DataStax DSE to Apache Cassandra. They are running on a cloud provider. So, it was a matter of taking their 200-plus nodes and converting them to open source Cassandra with zero downtime. Avoiding downtime was critical to them. They process a vast amount of money every day, and any second of an outage would have a significant impact.
We moved them from DataStax to Apache Cassandra, and going forward, Cassandra support is being provided by Instaclustr. They were able to save $500,000 a year on DSE costs. That was a significant win for them.
Not only did they save money, but in the process of migrating them to open source Cassandra, we actually saw a 5% improvement in performance. They were very happy with us.
So what did we do for them? Our team worked with the client to first develop a migration plan and then validate that their performance requirements were met on the new platform.
We provided scripts for monitoring clusters during migration to make sure that everything was okay. Also, we provided scripts and tested rollback options so that if any step caused an issue, we could minimize the impact.
Also, with that, we were able to replace any of the DataStax DSE components that were provided for monitoring, backup, and repairs. We replaced them with open source equivalents, alternatives that were best of breed.
So, for example, instead of using OpsCenter to do repairs, we implemented Cassandra Reaper, which is a tool that gives you flexibility in doing repairs with Cassandra. For backups, we used Instaclustr Esop, a backup tool that we also use internally.
We implemented that so incremental backups and full backups are taken of the Cassandra cluster and uploaded to AWS S3 so they can be restored in the future. And right now, we’re working with them on different aspects of improving the performance of some of their queries and rewriting the queries.
Another client we work with was already on Apache Cassandra. They were on a much older version—version 2.0. One of the things they wanted to do was to be able to have Cassandra managed by somebody else, and thus, their team could focus on other projects that they’re working on.
They did not have the resources to upgrade to a newer version of Cassandra, so we helped them migrate their clusters from their accounts into our managed service. Once they were in our managed service, then our TechOps team was able to do everything else that was needed. We upgraded Cassandra from 2.0 to 3.0 and then 3.0 to 4.0 so they could have better performance and utilize more features that were available in the newer version.
The second thing we did was review how their workloads were using Cassandra. We were able to reduce the number of nodes in a cluster and change the EC2 type to both improve performance and reduce cost.
So, we did 3 things for that one customer. One, we upgraded them. Two, we moved them to a smaller set of nodes. And three, we changed the EC2 type for better performance. All of that saved them a lot of money versus having to manage their own clusters and over-allocating or over-provisioning what they needed.
Salvatore Salamone is a physicist by training who has been writing about science and information technology for more than 30 years. During that time, he has been a senior or executive editor at many industry-leading publications including High Technology, Network World, Byte Magazine, Data Communications, LAN Times, InternetWeek, Bio-IT World, and Lightwave, The Journal of Fiber Optics. He also is the author of three business technology books.