Cloud Data Insights (CDI) met with Ryan Fattini, who runs data engineering and data science at CITY Furniture, and Ravi Shankar, SVP and CMO for Denodo, at the Gartner Data and Analytics Summit in August. Ryan told the story of how CITY Furniture took the success of a real-time data system for sales and extended it across multiple departments. The journey begins with a software engineer and an IBM mainframe and ends with a data democratization initiative. There are many interesting stops along the way–a streaming layer, an IBM cloud data warehouse, a miscellany of data stores, a data fabric, and data virtualization.
CDI: Four years ago, you were a software development engineer, and now you are an expert data professional with considerable influence. What was that transition like?
Ryan Fattini: It started back at the previous company, where I worked as a full-stack engineer. I built out their e-commerce platform and the application layer behind it. What introduced me to data, or at least to solving problems with data, was a marketing reporting problem. There were questions that needed to be answered. One of our major vendors wanted to know activation patterns around the sales of smartphones. Vendors wanted to know activation rates, what fueled them, and basically what was behind the trends that were coming up. Nobody really had an answer to these questions, so I looked into how you would solve this kind of problem. It turns out the answer was data science. <laugh> We built logistic regression models by regressing demographic data against our activation rates by city–basically modeling. That was the start of my transition from being an engineer to someone who solves problems with data.
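The kind of model Ryan describes can be sketched in a few lines. Everything below is invented for illustration–the demographic features, the data, and the effect sizes are assumptions, not the actual data from his previous employer.

```python
# Hypothetical sketch: logistic regression relating city-level demographics
# to smartphone activation outcomes. Features and data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Invented demographic features per city: median age, median income ($k)
X = rng.normal(loc=[35, 55], scale=[8, 15], size=(200, 2))

# Synthetic ground truth: younger, higher-income cities activate more
logits = -0.08 * (X[:, 0] - 35) + 0.05 * (X[:, 1] - 55)
y = (rng.random(200) < 1 / (1 + np.exp(-logits))).astype(int)

model = LogisticRegression().fit(X, y)
print(model.coef_)  # which demographics move activation rates, and in which direction
```

The fitted coefficients are what answer the vendor's question: they show which demographic factors are associated with higher activation rates.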
After building more models, I realized that the problem with data science isn’t building the models, it’s the engineering components. Six or seven years ago I was hearing about a lot of failures in the industry; data science wasn’t working. Companies were hiring academics who could build models but had no idea how to move them into production. It’s an engineering gap. I realized that most of what we were doing was engineering, not just model building. You can’t do one without the other.
Now we have the machine learning engineer, a kind of hybrid role–the same thing that happened in software development when the back-end and front-end developer roles merged. When I joined CITY Furniture as a software developer, there was no data team, no data warehouse, and no analysts, so I brought the same approach of solving problems with data to CITY. I found a couple of other engineers who were also interested in this kind of thing, and we started picking at problems using a data science approach with our engineering teams. We were going rogue at first but were able to show the company that this was the future and that we’d eventually need to do predictive and prescriptive analysis. When we presented our damage classification model to the CEO, he said, “This is great…but what I really need is to predict retail foot traffic.” So we pivoted to forecasting retail traffic by day by store.
CDI: As happened to many other businesses, COVID-19 made forecasting almost impossible. What happened to your retail traffic forecasting model?
Ryan Fattini: The model ended up being critical. For brick-and-mortar retailers, keeping stores staffed to accommodate traffic was extremely difficult. There was no more historical context to forecast on, but some underlying patterns didn’t change–above all, weekday seasonality. Saturday was still always the busiest day, then Wednesday. We plugged the traffic forecasting model into the scheduling system, which helped stabilize the forecasts as we moved through phases of operating by appointment only, then at 25% capacity, then 50%, until we were fully open.
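A weekday-seasonality fallback like the one described can be sketched very simply: average recent traffic by day of week and use that profile when longer history stops being predictive. The traffic counts and dates below are invented for illustration.

```python
# Minimal sketch of a weekday-seasonality forecast. All numbers are invented.
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical daily store-traffic counts for two weeks starting Sunday, March 1, 2020
counts = [160, 120, 130, 180, 140, 150, 320,
          165, 125, 135, 185, 145, 155, 330]
history = {date(2020, 3, 1) + timedelta(days=i): c for i, c in enumerate(counts)}

# Group observed traffic by day of week (Monday=0 ... Sunday=6)
by_weekday = defaultdict(list)
for day, count in history.items():
    by_weekday[day.weekday()].append(count)

# The forecast for any future date is simply the mean for that weekday
profile = {wd: sum(v) / len(v) for wd, v in by_weekday.items()}
forecast = profile[date(2020, 3, 21).weekday()]  # a Saturday, the busiest day
print(forecast)
```

A production model would blend this profile with appointment volume and capacity limits, but the weekday profile is the stable signal the interview describes.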
Data science had proven its value to the company, and we now have a dedicated team of data science engineers–another hybrid role.
CDI: There is typically a disconnect between data science and DataOps or data engineering–a continuum of workflows and skill sets with a breakdown in the middle. That hybrid role could be the key to bridging it.
Ryan Fattini: We do have two academic data scientists researching potential models, but we also need engineers who can build operational models that are more connected to the business and can be delivered in three months.
CDI: You’ve set the business and cultural context for us. What can you share about the technology challenges CITY Furniture faced in becoming more data-driven?
Ryan Fattini: The starting point was an IBM mainframe that pulled data from almost a hundred systems. It had been set up in the seventies, so its data structures were built under constraints to maximize space. Every column had short names, dates were all numeric, and there were some weird data slices. The data warehouse was built when advanced analytics wasn’t even considered. We decided to focus on providing real-time data to the stores: managers could see in real time what was and wasn’t selling, and salespeople could monitor their KPIs and change selling strategies that same day. We did that by adding a streaming layer on the mainframe system that fed into the IBM cloud data warehouse.
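The pattern described–transaction events streamed off the system of record and applied to an analytics store in near real time–can be illustrated with a toy sketch. An in-memory queue stands in for the actual streaming layer; the event shapes and SKUs are invented.

```python
# Toy illustration of the streaming pattern, not CITY Furniture's implementation.
import queue

events = queue.Queue()
warehouse = {}  # sku -> units sold today: the store managers' real-time view

# The mainframe side emits a change event as each sale happens
for sale in [{"sku": "SOFA-1", "qty": 1}, {"sku": "LAMP-9", "qty": 2},
             {"sku": "SOFA-1", "qty": 1}]:
    events.put(sale)

# The warehouse side consumes and applies events continuously
while not events.empty():
    e = events.get()
    warehouse[e["sku"]] = warehouse.get(e["sku"], 0) + e["qty"]

print(warehouse)  # {'SOFA-1': 2, 'LAMP-9': 2}
```

The point of the pattern is that the analytics view is updated per event rather than by an overnight batch, which is what lets salespeople react the same day.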
When other business units saw what real-time data did for the sales department, they wanted some too.
CDI: That meant adding more systems to the real-time data warehouse?
Ryan Fattini: Yes, lots of systems, all different from the IBM transactional system that we had enabled for streaming data. There were different databases and different data sources. We thought our software engineering approach would work in this case too. We started batch-loading other systems’ databases into our warehouse, but this was clumsy and slow. There had to be a better way. We worked with Gartner consulting, and they brought up virtualization and building out a data fabric to support it.
Connecting to the various data sources when data is virtualized means that you don’t need to move the data, and you’ve solved the data-gravity problem. Our proof of concept included relational and non-relational data sources, some of them on-prem systems that some teams were running out of little Excel files. All of it could be addressed by the virtualization data fabric.
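The idea of logical integration can be shown with a toy federation: one query interface joins two heterogeneous sources at query time, without copying data into a central store. This is a conceptual sketch, not how Denodo’s platform is implemented; the table, SKUs, and product names are invented.

```python
# Toy sketch of logical integration: join across sources at query time.
import sqlite3

# Source 1: a relational database (in-memory SQLite standing in for a system of record)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (sku TEXT, qty INTEGER)")
db.executemany("INSERT INTO orders VALUES (?, ?)", [("SOFA-1", 2), ("LAMP-9", 5)])

# Source 2: a non-relational source (a dict standing in for an Excel file or API feed)
product_names = {"SOFA-1": "Madrid Sofa", "LAMP-9": "Arc Lamp"}

def virtual_view():
    """Combine both sources when queried; the underlying data never moves."""
    rows = db.execute("SELECT sku, qty FROM orders").fetchall()
    return [{"sku": sku, "name": product_names.get(sku, "?"), "qty": qty}
            for sku, qty in rows]

for row in virtual_view():
    print(row)
```

Consumers see one unified view; whether a field came from the relational source or the spreadsheet is invisible to them, which is the data-gravity point Ryan makes.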
Ravi Shankar: Many people are not very aware of the logical form of integration. For 30 years, they have been doing physical integration. The analogy I would give is a patient who is having a cardiac event and needs medication from the drugstore. You could take a bicycle to the drugstore, pick up the medicine, and bring it back–the patient might not survive. Or you could drive to the drugstore and be back with the medication within minutes. Logical integration is a much faster way of getting to data than some of the physical ways. The modern equivalent is dumping all data into a physical data lake–it’s still not integrated. Physical integration will continue to exist, for example, when moving data into a data warehouse, but the right tool should be used for the right job.
Data is increasing, and the variety is increasing. There is a benefit in having the data in a single place where it’s easy to find for business users. But the rate at which data proliferates far exceeds the human ability to pull that into a central place. So we’ve moved from a centralized data warehouse to multiple data warehouses, then came the data lake and the data lakehouse. The cloud service providers would like to have all the data put into the cloud, but even then, we use multiple technologies and multiple clouds. All require some kind of integration.
CDI: CITY Furniture has an impressive collection of data sources, probably like many businesses. You’ve broken down that last data storage silo–Excel. Which aspect of the data fabric was key to an effective virtualization strategy?
Ryan Fattini: The data catalog. It lets the software teams be more dynamic with their queries. When they have data, need a data solution, or just need a data point, instead of having to use a driver and write some crazy query into their application, we give them access to the data fabric through the data catalog, and they write a simple query string.
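A hypothetical sketch of what that “simple query string” buys an application team: the catalog maps a business-friendly dataset name to a governed view, so the application never embeds source-specific SQL. The catalog structure, view names, and fields below are invented, not Denodo’s actual API.

```python
# Invented illustration of catalog-mediated data access.
catalog = {
    "daily_store_traffic": {
        "view": "fabric.sales.store_traffic_daily",  # hypothetical governed view
        "owner": "data-engineering",
        "certified": True,
    },
}

def query(dataset: str) -> str:
    """Resolve a catalog entry to the one-line query an application would issue."""
    entry = catalog[dataset]
    if not entry["certified"]:
        raise ValueError(f"{dataset} is not a certified source")
    return f"SELECT * FROM {entry['view']}"

print(query("daily_store_traffic"))
```

The application depends on a stable business name; the central team can repoint the underlying view–a logical change–without touching any consuming code.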
CDI: The data catalog lets them really know what the data is they’re accessing. It’s not just a mysterious set of fields. What happens next?
Ryan Fattini: The data catalog is still in beta testing with key stakeholders like the CFO and the COO. They can search and tag data and find reliable data sources themselves. First, we made data accessible in real time. Second, we used virtualization to connect all our data into one fabric. The third key shift is democratizing the data. The data catalog will allow anyone to find the right data. The data fabric and virtualization layer also support the work of distributed data teams while maintaining governance and consistency, since those teams cannot make changes to the core system–their changes are logical changes. We will still have a central team managing that. That’s how we scale data access, and the infrastructure it requires, across many teams. Hopefully, there will be no more bottlenecks, since these distributed teams will support their own groups of business users.
CDI: A data catalog can support data governance to a certain extent. Have you found that more is needed?
Ryan Fattini: You have to have strong governance in place. In addition to governance, you need standard operating procedures for how you do work. You can’t have six people building tables six different ways, right? And you need a PII (personally identifiable information) strategy before you consider democratizing data.
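One common tactic in a PII strategy before democratizing data is to mask direct identifiers so catalog users see the shape of the data without the identity behind it. The field names and masking rules below are invented examples, not CITY Furniture’s actual policy.

```python
# Illustrative PII masking: pseudonymize the key, drop direct identifiers.
import hashlib

def mask_customer(record: dict) -> dict:
    masked = dict(record)
    # Pseudonymize the ID so rows stay joinable without exposing identity
    masked["customer_id"] = hashlib.sha256(
        record["customer_id"].encode()).hexdigest()[:12]
    # Remove direct identifiers entirely
    masked.pop("email", None)
    masked.pop("phone", None)
    return masked

row = {"customer_id": "C-1001", "email": "a@b.com",
       "phone": "555-0100", "city": "Tamarac"}
print(mask_customer(row))
```

Because the hash is deterministic, analysts can still join and count customers across datasets; they just cannot recover who the customer is from the masked view alone.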
CDI: Thanks so much for sharing the story of CITY Furniture’s move to real-time data and the virtual integration of disparate data systems. You’ve laid a solid foundation for the data democratization phase and other transformations that might come after that.