Extending Data Warehousing to the Cloud

Data warehousing is evolving
Yellowbrick's CTO explains why companies are thinking differently about data and why data warehousing is evolving.

Cloud Data Insights (CDI) had the opportunity to talk with Mark Cusack, the CTO of Yellowbrick Data, at the Gartner Data & Analytics Summit. We covered shifts in how businesses think about their data, how Yellowbrick is responding to (and even anticipating) emerging data warehousing challenges, and how to use data effectively.

(The interview has been lightly revised for clarity and readability.)


CDI: The Gartner Data & Analytics Summit is the ideal forum for data leaders and software and services providers to take the market’s pulse and get a sense of where the technology is heading. The pandemic caused many organizations to use data differently. Where have you seen your customers’ attention shift? Are you having different conversations than you had a couple of years ago?


Mark Cusack: We really are. The data warehousing segment, where we are very much focused, has evolved massively over the last five to 10 years. New players are coming in with a new focus on cloud data warehousing. What I think we are seeing now, particularly post-pandemic and with the current economic environment, is that people are scrutinizing their cloud spending. That has become top of mind for a lot of CFOs.

The idea of infinite capacity in the cloud has also been driving infinite spending patterns. People are getting a lot more cost-conscious. They’re trying to understand how they can get the agility out of the cloud but do it cost-effectively. Data teams are definitely generating use cases that create business value, but they want to do it more efficiently.

CDI: What does a software provider do to respond to that challenge?


Mark Cusack: Take, for example, the world of data warehousing as it existed within a fixed footprint in a data center. We learned how to squeeze as much capacity as possible out of the underlying platform, sometimes with devious tricks. Workload management was also very important, to make sure your users had the resources they needed. Then came the cloud, with its elasticity and separate storage that seemed limitless.

We’ve spent the last year bringing the best of both worlds together. We’re offering our cloud customers all of the table stakes they expect from a cloud data warehouse, but cost-effectively, because of how we leverage hardware (even in the cloud) and because of our capacity-based licensing and pricing model. That model suits well-characterized workloads, but there is also a “pay by the drink” on-demand model for workloads that are exploratory or that scale up and down.

CDI: What driving idea led to creating the technology that makes Yellowbrick different?


Mark Cusack: Our founders came from the flash storage world and were interested in seeing whether flash storage could change the database hardware stack. It did, by removing the memory bottleneck. This enabled streaming analytic data straight from long-term SSD storage into the CPUs, bypassing the intermediate step of main memory. Running a SQL query in Yellowbrick provides in-memory database levels of performance, but with access to a vast data set. That yielded a huge efficiency gain, which still matters in the cloud.


CDI: Since data warehousing technology is capable of handling more workloads, are more people invited into the user circle?

Mark Cusack: They are. And I think there’s a demand from the lines of business to get more direct access. Everyone talks about self-service access. You want business analysts and data scientists to get their hands on the data they need for their particular business problem as quickly as possible. I think the days of a centralized data warehouse, where you have to raise a ticket with central IT to get access, are long gone.

A modern data warehouse should provide an easy user experience to non-DBAs. And to do that, it’s not just about having a nice, pretty UI; it’s about having a huge amount of automation in the data warehouse software itself, removing a lot of the complexity that you used to need an army of DBAs to manage. One focus for us is making the data accessible so that users can focus on what the business needs.

CDI: What is on your technology roadmap to address the emerging data needs post-pandemic?


Mark Cusack: We have focused on moving all the efficiencies we gained in data centers into running our software in public clouds. But companies don’t want to concentrate all of their data in the cloud, or in any one cloud. They want the freedom to deploy anywhere. The CFO or Chief Risk Officer wants to know that when they choose a particular cloud or data warehouse platform, they can de-risk that buying decision as they move data around and possibly repatriate it. Our strategy is to provide the same data warehouse experience wherever it’s deployed.

CDI: A hot topic at this summit is the data lake and the data lakehouse. What’s your strategy for addressing these architectures?

Mark Cusack: I’m a believer that there’s no such thing as one tool to solve every single problem. What you get is a lowest-common-denominator tool that is only somewhat good at everything.

I see data lakes as having two components. You have an object store as an interchange layer, where you might exchange data products with different business lines or create new products. But you also have consumption engines around that store. Organizations have to choose the technology that best fits their business problem. We’ll stay focused on what we do best: extremely fast SQL performance.

CDI: What is a common use case among your customers?


Mark Cusack: If you make any online purchase, whether you buy airline tickets or electronics from Best Buy online, chances are your transaction is being checked for credit card fraud by Yellowbrick. One of our customers validates about 20 billion online retail transactions a year. We typically sell to financial services, insurance, and other highly regulated industries, or to telcos: organizations with loads of data and business-critical problems, usually involving money.

These customers use their Yellowbrick systems 24×7 with very high levels of concurrency. We are deployed at one of the largest credit companies in the world, which has standardized its enterprise data warehouse on Yellowbrick. We now have multiple use cases at that company, including one with 12 petabytes of structured data.

Customers like these have very demanding SLAs, and beyond meeting them, it’s incredibly important to be a partner. I think that’s reflected in our Net Promoter Score. We help customers get the best out of the system. And we are brutally honest as a company: if there are things we don’t do well, we say so up front, even if that means we qualify ourselves out of a deal.

CDI: Without giving away any secrets, can you tell us the next challenge you’re looking to help customers address? What is the next big thing you want to help them accomplish?


Mark Cusack: We’re really emphasizing an idea we call “distributed data cloud.” We know that data is siloed across different business units and on different clouds and technology stacks. Having one technology stack for managing this data, and one team of experts instead of one for each cloud, will solve a lot of the hybrid and multi-cloud complexity and promote the re-use of data, models, and schemas. A key to that is enabling data cataloging as a tool for minimizing the data footprint and governing cloud costs. We’re also strengthening the AI side of things: not just supporting AI workloads, but leveraging AI within the software itself.

CDI: Thank you very much for your time, Mark. We look forward to seeing what you bring to next year’s Gartner Data & Analytics Summit.
