Whatever else we can say about 2020, it’s been a transitional year in terms of enterprise data management and digital strategy. The unprecedented crises that we’ve all faced put a renewed focus on digital transformation and data management. As Microsoft’s CEO Satya Nadella famously said in the early summer, we’ve seen two years of digital transformation in two months. Perhaps the underlying enabler can be found in data fabrics.
Why did Nadella claim that and what reasons do we have to believe he’s right? It’s connected to the fact that the majority of knowledge workers got sent home due to the pandemic, many with less than a week’s notice. The result was that normal social interaction patterns, based on working together in the same locations, blew up overnight. All of that soft, human-based interaction was providing a crucial prop to traditional data management systems, none of which are very good at representing and preserving real-world context. Chatter around the venerable water cooler may not have been 100 percent foolproof in maintaining perspective, but it was better than what came next.
Beginning in March, big companies scrambled to compensate for the loss of the office as a unifying, compensating factor. A lot of the work that Nadella is referring to around digital transformation came about because companies had to go fully virtual almost overnight, which meant that data management (and other) strategies had to grow up fast.
Data fabrics are emerging as the future of data management
Data fabrics are heralded for their ability to weave together existing data management systems, enriching all connected apps. Defined by Gartner as “a design concept that informs and automates the design, deployment, and use of integrated and reusable data objects regardless of deployment platforms or architectural approaches,” they are considered the next step forward in the maturation of the data management space.
One result of this acceleration is that enterprise data fabric has gone from a marginal data management strategy to a very active, even trendy new tool in the arsenal. Data fabric, initially a kind of data management design pattern, is rapidly turning into an active area of development and focus with an emerging class of enterprise software vendors working hard to support it with real products. A data fabric is a way of addressing the enterprise data silo problem at the computational layer: data is connected based on its meaning to the business, rather than its location in some storage system. So, while a data fabric is related to data lakes and data warehouses, there are important differences. Data lakes focus on physical or storage-based colocation of data. Basically, they address the “where is the data stored?” question.
The value of a data fabric is different: rather than focusing on where data is stored, it focuses on what data means and how its meaning enables enterprises to connect data, no matter where it’s stored, to accomplish a data-driven transformation. In the increasingly virtualized world we’re living in, where there is doubt that everyone will return to the office next year as if the pandemic never happened, the shortcomings of traditional data management solutions are becoming clearer every day. What databases, data warehouses, and data lakes all have in common is that they’re based on a very specific, very fixed storage strategy where they can only manage data that they physically store. But this leads to endless copying and to confusion over where data lives and which copy is the best, the most recent, or the most trustworthy. Entire cottage industries have sprung up to address slices or parts of this problem.
The future is about context and meaning, not storage and location
As organizations have rushed to embrace the hybrid, multicloud future, most have found that focusing on where data lives in storage isn’t very useful. It’s not that location doesn’t matter, but it doesn’t give us very much leverage to accomplish digital transformation. After all, data location just isn’t essential, as there’s always data in some other place, app, cloud, or environment containing increasingly crucial information.
What gives leverage for insight is context, but context about business meaning, not technical details about location. If I need a phone number to do my job, I’m not concerned about what database or data lake or data warehouse it’s stored in. Why would that matter to me? Data meaning is more valuable to the business because it leads more directly to actionable knowledge and insight. The value of connected data is especially key right now as enterprises address crucial supply chain vulnerabilities, move from being reactive to proactive with respect to monetizing data, and empower and support workers working in the most distributed fashion in living memory.
Query answering is an essential capability of real-world data fabrics
While data fabrics are removing the storage challenge, powerful query answering is the key to data fabrics fulfilling their promise to succeed where earlier technologies have failed. Today, organizations are relying on enterprise knowledge graphs (EKGs) to manage enterprise data in full generality, at scale, in a world where connectedness is everything. When operating at the heart of the data fabric, EKGs seamlessly connect and relate data from different structures, unifying context to turn data into knowledge.
More than ever, CIOs, Chief Digital Officers, and Chief Data Officers need to pursue data management strategies that deliver timely access to connected data that is actionable, precise, and reliable. Data fabrics offer this promise and, during this crazy year and beyond, that is key to their rising popularity. However, when C-suite change agents start to move down this road, they need to be sure that when evaluating and implementing data fabric solutions, they focus on solutions that offer query-answering capabilities. The need for a data fabric solution to answer queries against the data fabric generally comes down to three reasons.
The first is alignment with business value. Most organizations rely on several data management functions: basic data storage, search, analytics like AI and ML, and query answering. Of these, query answering is the one most often aligned to everyday business value creation. Without the ability to answer business queries, organizations might as well say goodbye to proven systems and processes, such as executive dashboards, most applications, nearly all of SaaS, as well as report writing, compliance, soft real-time management of real-world systems like supply chain, smart building management, capital controls, drug discovery, and more.
An effective and transformative data management strategy needs to be aligned with the organization’s ability to answer queries against its enterprise data. That means that data fabrics must do more than just provide connected data against business meaning. They need to provide optimal solutions for answering queries posed to this connected data fabric.
The second is timeliness. Speed increasingly matters and not just in hard real-time systems. Timeliness also matters in the kind of soft real-time systems that are used in a variety of industries: supply chain management in manufacturing and pharma/life sciences; smart building management for occupant quality, health, and safety; integration between factory floor yields and procurement in manufacturing; sense-making in financial services news, and the list goes on. Answers to queries based on stale data are not just inadequate, but they may well be harmful. This means that for maximizing business value creation, query answering has to be deeply integrated with data fabric solutions.
Finally, query-answering systems are sound and complete. Given that the promise of data fabric is to connect and provide access to all enterprise data, that promise requires soundness and completeness. Search and AI/ML, by contrast, typically offer neither. Many business processes require access to all correct answers to some query with zero incorrect answers. The former property (all correct answers) is completeness; the latter (only correct answers) is soundness. Query-answering systems generally offer these guarantees, whereas search systems offer best-effort metrics around precision and recall. World-class search systems typically offer, say, 90 percent precision and recall. That means that 10 percent of the actually correct results are missing, and 10 percent of the results returned are incorrect. Search is also a fundamental technique of data management, but it should only be used when its best-effort guarantees are acceptable.
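To make the arithmetic behind that comparison concrete, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the function names, the 100 "correct" answers labelled 0 to 99, and the two result sets. It is not how any particular search or query engine measures itself; it only shows how soundness and completeness relate to precision and recall.

```python
def precision_recall(returned, correct):
    """Return (precision, recall) for a set of returned answers.

    precision = share of returned answers that are correct
    recall    = share of correct answers that were returned
    """
    returned, correct = set(returned), set(correct)
    hits = returned & correct
    precision = len(hits) / len(returned) if returned else 1.0
    recall = len(hits) / len(correct) if correct else 1.0
    return precision, recall


def is_sound(returned, correct):
    """Sound: every returned answer is correct (precision of 1.0)."""
    return set(returned) <= set(correct)


def is_complete(returned, correct):
    """Complete: every correct answer is returned (recall of 1.0)."""
    return set(correct) <= set(returned)


# Hypothetical ground truth: 100 correct answers, labelled 0..99.
correct = set(range(100))

# A sound and complete query answerer returns exactly the correct set.
query_result = set(range(100))

# A search system at 90 percent precision and recall: it returns 100
# results, of which 90 are correct (0..89) and 10 are not (100..109),
# so 10 correct answers (90..99) are missing.
search_result = set(range(90)) | set(range(100, 110))

for name, result in [("query", query_result), ("search", search_result)]:
    p, r = precision_recall(result, correct)
    print(name, "precision:", p, "recall:", r,
          "sound:", is_sound(result, correct),
          "complete:", is_complete(result, correct))
```

In this toy setup, the search result looks strong by best-effort standards yet fails both guarantees, which is the distinction that matters for processes such as compliance reporting.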
How enterprise knowledge graphs enable a more effective data fabric
Data fabrics are an emerging solution to the problems that confront traditional data management solutions based on data location and storage rather than on business meaning. The challenges of 2020 have shown us consistently that digital transformation cannot be delayed. So how and where to get started?
Increasingly it’s clear that enterprise knowledge graphs are essential to a data fabric because it’s the knowledge graph component that provides query-answering capabilities and the needed focus on data meaning in context. These capabilities represent and reinforce the real-world context that has been damaged by the transition to virtual. Successfully deploying a data fabric will not only support the new normal but also ensure organizations have the necessary foundation for continued digital transformation. Much like a beating heart, a knowledge graph with querying capabilities is the key to a successful data fabric solution.
After all, query answering underlies nearly everything the enterprise does to derive business value from data, and that will be true in the future, too. Data fabrics are the best way to ensure that digital transformations are successful and rooted in connected data that can be queried no matter its location, based on its business meaning and value to the enterprise.