Cloud conversations are evolving. Where there was once optimism that the cloud offered “the solution” to digital transformation challenges, more companies now consider it to be merely one tool in the toolbox. As such, we’re seeing more willingness to dive into hard conversations about cloud costs and the types of cloud architectures best for enterprise needs.
In this interview, Adit Madan, Director of Product Management at Alluxio, discusses with Elisabeth Strenger of CDInsights the changing dynamics of cloud conversations at the enterprise level, with a focus on the increased attention given to cloud costs and spending predictability. He highlights the shift from an “all-in” cloud approach to a more nuanced strategy that incorporates multiple environments, such as on-premises and cloud. Let’s find out how cloud conversations are changing.
CDI: What has changed in cloud conversations at the enterprise level? Are people paying more attention to cloud costs?
Adit Madan: What we’ve seen at the enterprise scale is that it’s not just cloud egress cost. The combination of cloud spend and being able to predict that spend has been a constant topic of conversation. With the economic downturn, we’re definitely seeing more control over where money is being spent.
I wouldn’t say it’s specifically about egress costs. Certainly we’ve seen that cloud vendors have also significantly slashed their prices. We’ve also seen the reward, in that cloud spend means I don’t have to spend money on procuring hardware on premises. The point that I’m trying to make is that it kind of goes both ways.
Something that extends beyond the effect of the economic downturn, looking at the trend over a longer period of time and not just the last one or two years, is that the more sophisticated an organization is in its capability of operating multiple environments, like on-prem and the cloud or two clouds, the more likely it is to not buy into the “all-in” cloud.
CDI: That’s great insight.
Adit: A lot of times what we heard from our clients was “I want to be on a cloud. On-prem data centers are done.” But I think about two or three years back is when we saw a wave of conversations in between. [They said] “Okay, I realize that all-in on cloud is not going to be my future.”
They’ve changed a little bit. Let’s say there are things that you’re doing for the first time, or you’re using certain tools which are not available on premises and take a lot of expertise to set up. But the things you already know have predictable capacity utilization. The combination of these two factors can help people lower their cost.
CDI: So you’ve explained very well how egress costs are really just part of a cloud strategy. It’s the last thing perhaps that one has to worry about. “Okay, I’m not going to go all-in on cloud. I’m not going to move everything.” But are people actually concerned about how to get out or is that just a cost of doing business? Where are we at with that?
Adit: People are concerned about it. The choices that they are making are meant to avoid exorbitant egress costs. The difficult choice that we see is that once they need access to data across, let’s say, a hybrid cloud, they will make a copy of it. Once they make a copy of it, they’re avoiding egress fees, but they’re paying in the complexity of building a pipeline and in extra storage costs. So egress is definitely a factor.
It really depends on what kind of a deal they have with cloud providers. The choices that they’re making really depend on that specific situation to some extent, but everyone is aware of [egress]. I don’t think it’s a factor that can be neglected, but the mechanisms that I see most are a kind of trade-off between the egress cost and paying in some other form.
CDI: So some businesses are maintaining the duplicate version as a live copy. It’s always kept up to date with all the transactions and changes, etc. So it’s not a copy that you put into deep storage; it’s expected at any moment to have to step in and be the data of record, if you will.
Adit: So just to give you an example. We were talking to someone recently who was evaluating a solution, but before they used [Alluxio] last year, they were replicating about 100 petabytes across their on-premises environment and the cloud. They were paying a lot on egress but also paying a lot on 100 petabytes of storage and duplication. And it’s not archival storage, which is much cheaper, because the data needs to be ready for access across these two locations, their on-premises data center and their cloud provider.
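The trade-off described here, paying for a live copy versus paying egress on remote reads, can be put into rough arithmetic. The following is only a minimal sketch under stated assumptions: the cost model, the function names, and every price in the usage lines are hypothetical placeholders, not figures from the interview or from any cloud provider’s price list.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """Hypothetical unit prices; replace with the rates in your own contract."""
    storage_per_tb_month: float  # cost of keeping 1 TB of non-archival storage for a month
    egress_per_tb: float         # cost of moving 1 TB out of the source environment


def replica_cost(dataset_tb, changed_tb_per_month, pipeline_cost_per_month, prices):
    """Monthly cost of keeping a live second copy of the dataset.

    Assumed model: storage for the full copy, egress for the changed data
    that must be synced, plus a flat cost for running the copy pipeline.
    """
    storage = dataset_tb * prices.storage_per_tb_month
    sync_egress = changed_tb_per_month * prices.egress_per_tb
    return storage + sync_egress + pipeline_cost_per_month


def remote_read_cost(read_tb_per_month, prices):
    """Monthly cost of reading across environments with no copy (egress only)."""
    return read_tb_per_month * prices.egress_per_tb


def break_even_read_tb(dataset_tb, changed_tb_per_month, pipeline_cost_per_month, prices):
    """Monthly remote-read volume at which both options cost the same."""
    total = replica_cost(dataset_tb, changed_tb_per_month, pipeline_cost_per_month, prices)
    return total / prices.egress_per_tb


if __name__ == "__main__":
    # Placeholder numbers chosen only to make the arithmetic easy to follow.
    prices = Prices(storage_per_tb_month=20.0, egress_per_tb=50.0)
    print(replica_cost(100, 10, 500.0, prices))
    print(remote_read_cost(40, prices))
    print(break_even_read_tb(100, 10, 500.0, prices))
```

Under this simplified model, remote reads are cheaper below the break-even volume and the replica is cheaper above it, which is why the answer depends on the deal each organization has with its cloud provider.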
CDI: I’ve heard of companies doing multi cloud and having the copy be in a different cloud, but that doesn’t help with egress cost. It’s a risk mitigation more than a cost optimization.
It must be very difficult to be the data platform provider for companies that have probably designed even more complex platforms than you are providing. What are some of the enhancement asks? From a data management platform perspective, what are customers really knocking at your door to get done?
Adit: I think one of the things that people are knocking down our doors for is scaling the platform. We are working with the largest companies in the world. We’re also working with a lot of the big tech leaders and similar household names on the enterprise side, so scale is one factor.
The other thing is consistency of data across these multiple environments. Once they are in the situation where the data is live across environments, what we are able to provide is automatic updates of the metadata of what is being accessed. As people are getting more and more mature with their use cases, they’re finding that keeping multiple environments consistent is actually another thing.
A third category I would say is just more insight into what is happening to that data. How is data being accessed? What is the lineage of it?
CDI: Data observability is a huge hype word. I saw on your website that you have this position of being the storage between the storage and compute that can play a huge role in governance. So are you seeing the pickup for the usage of your platform’s capabilities in governance scenarios?
Adit: Yes, there are a couple of ways.
I’d like to dissect what we mean by governance because sometimes we might have different understandings of it. One common connotation that we have with respect to governance is just using a solution which is not necessarily a copy of data. You have one place for maintaining access control. That’s the first real win for customers. “I don’t need to have a source and a destination access-control model.” It’s all in one place, regardless of where the data is being accessed.
The second conversation that we’re having with respect to governance is by having this layer which sits between compute and storage. The governance model is independent of the storage type. So whether you’re running in AWS, whether you’re running in GCP, or whether you’re running on premises, you have the same model of accessing data.
These more sophisticated organizations that we’re working with like to have the customization that a tool like Alluxio offers to them. They come up with a data stack which they would like to port across their on-prem and different cloud environments, and they try to make them look as similar as possible.
CDI: Is there another point that you’d like to bring up in this interview?
Adit: One other thing I really would like to share is that, as part of our conversations with potential customers, egress fees do always come up, because when we say that you can access your cloud storage from one data center, customers realize they don’t have to move data.
Another point customers should consider is that our analytics apps show that data is reused a lot. We have to pay attention to the fact that 1% or 2% of the data or maybe somewhere between 5% and 10% of data is serving all of the workloads.
The amount of data that gets accessed repeatedly is pretty high. I want our users to see that we provide a massively distributed cache as well. So if you’re using a petabyte data lake, you could have 50 terabytes of cache storage allocated to something and that is actually going to give you an 80% cache hit ratio.
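The cache-sizing figure above can be illustrated with a small calculation. This is a rough sketch, not Alluxio’s sizing method: it assumes a two-tier access pattern in which a “hot” fraction of the lake receives a fixed share of all reads, with reads spread evenly within each tier. The function names and the 200 TB monthly read volume are hypothetical; the 1 PB lake, 50 TB cache, and 80% figure are taken from the example in the interview.

```python
def estimated_hit_ratio(lake_tb, cache_tb, hot_fraction, hot_read_share):
    """Estimate the cache hit ratio under an assumed two-tier access pattern.

    hot_fraction:   share of the lake that is "hot" (for example 0.05 for 5%)
    hot_read_share: share of all reads that land on the hot data (for example 0.8)
    The cache is assumed to fill with hot data first, then with cold data.
    """
    hot_tb = lake_tb * hot_fraction
    if cache_tb <= hot_tb:
        # The cache holds only part (or exactly all) of the hot set.
        return hot_read_share * cache_tb / hot_tb
    cold_tb = lake_tb - hot_tb
    cached_cold_tb = min(cache_tb - hot_tb, cold_tb)
    return hot_read_share + (1 - hot_read_share) * cached_cold_tb / cold_tb


def remote_read_tb(total_read_tb, hit_ratio):
    """Volume that still has to be fetched from remote storage on cache misses."""
    return total_read_tb * (1 - hit_ratio)


if __name__ == "__main__":
    # 1 PB lake (1000 TB) with a 50 TB cache, assuming 5% of the data serves 80% of reads.
    ratio = estimated_hit_ratio(lake_tb=1000, cache_tb=50, hot_fraction=0.05, hot_read_share=0.8)
    print(ratio)
    # Hypothetical workload reading 200 TB per month in total.
    print(remote_read_tb(200, ratio))
```

Real access patterns are rarely this clean, so the result is best read as a first estimate of how much remote traffic, and therefore egress, a cache of a given size might absorb.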
Another observation that I’d like to share with you is that data used in model training and other kinds of workloads can be very similar. You’re reading the training data again and again and again, so you aren’t going to repeat it.
CDI: It will be interesting to see what kind of data is left behind or ignored for a while as companies rationalize what data they are using the most.