Why Data in the Cloud Will Soon Morph into Autonomous Distributed Data

Let’s say you want to share information with another company. There are many valid reasons to do so, such as sending inventory information to place an order with a vendor or reporting production details to support an external production audit. Let’s also say you plan to send thousands of pieces of data in the exchange.

Here’s the problem: Once the data is out of your control, you can no longer assume that it’s used and secured according to your company’s data policies. The data is now decoupled from your data management control plane and no longer carries a reference to its structure, purpose, data governance, and, of course, its data security.

You run a high risk that someone will use your data in ways that you or your enterprise would not approve. The misuse could be accidental, such as a leak of data that should be private, or deliberate, such as selling the data to an investor who wants to use it for insider trading.

But what if your data could take all its attributes and policies along on this journey? Those attributes could include valid use cases, security and governance policies, and a log that records exactly how the data is used, as it’s used. In other words, the data would have the ability to protect and manage itself. You would be assured that the data will be used in approved ways and never fall outside of those approved parameters. Would that be of value to you?

Same data, new capabilities

If you Google “autonomous data,” you’ll end up with many different definitions. It’s a topic that’s been part of many Ph.D. dissertations over the years. It’s also a regular topic of conversation in the halls of database vendors, large and small. However, the autonomous data that’s coming soon will be a bit different from the structured and unstructured data of the present and past. 

Here are three common attributes that will set autonomous data apart:

1. The ability to self-manage when decoupled from a database. Autonomous data is surrounded by small, decoupled data management layers that can live on many different platforms and manage the data by using a distributed model. The autonomous data can still maintain a connection back to a database control plane, but that control plane can work across many different clouds, applications, and users, and can even exist inside other databases.

2. Support for structured and unstructured data using the same mechanisms. Because the control plane applies structure at the time of use, either to unstructured data or to structured data that you want to use differently, the same technology handles both without reconfiguration. You can manage the use of unstructured data such as PDF documents, video files, audio files, and even old text files with the same data management control plane by applying a structure that determines suggested use. You can also allow secured, monitored, and managed access no matter where the unstructured (or structured) data physically resides.

3. The ability to provide autonomous data security, no matter where the data physically exists or whether a control plane is reachable. Worried that your data will become vulnerable when it’s not within your direct control, when it’s no longer communicating with a data management control plane, or when you don’t even know where it is physically stored? That won’t be a concern. Data security stays with the data rather than living in a centralized database whose security can protect only what it stores centrally. Moreover, the data security is truly autonomous, with the ability to defend the data wherever it exists, using behaviors and policies that the owner of the data predefines.
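
To make the three attributes concrete, here is a minimal Python sketch of a record that carries its own policy and usage log. The names Policy, AutonomousRecord, and read are hypothetical and invented for illustration; they do not come from any product or database API. The check runs in-process only, so a real system would also need cryptographic enforcement, which this sketch does not attempt.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass(frozen=True)
class Policy:
    """Owner-defined rules that travel with the data (hypothetical)."""
    allowed_consumers: frozenset[str]
    allowed_purposes: frozenset[str]


@dataclass
class AutonomousRecord:
    """A payload bundled with its own policy and usage log (hypothetical)."""
    payload: bytes
    policy: Policy
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def read(self, consumer: str, purpose: str,
             structure: Callable[[bytes], Any] = bytes) -> Any:
        allowed = (consumer in self.policy.allowed_consumers
                   and purpose in self.policy.allowed_purposes)
        # Every attempt is logged, whether or not it is allowed.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "consumer": consumer,
            "purpose": purpose,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(
                f"{consumer!r} may not use this data for {purpose!r}")
        # Structure is applied at the time of use, not at storage time.
        return structure(self.payload)


def parse_pairs(raw: bytes) -> dict[str, str]:
    """One possible structure: comma-separated key=value pairs."""
    return dict(pair.split("=", 1) for pair in raw.decode().split(","))


record = AutonomousRecord(
    payload=b"sku=123,qty=40",
    policy=Policy(frozenset({"vendor-a"}), frozenset({"order-fulfillment"})),
)
order = record.read("vendor-a", "order-fulfillment", structure=parse_pairs)
```

The same read call works for a PDF or an audio file: only the structure callable changes, while the policy check and the log stay with the record.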

The road to autonomous data  

This autonomous data technology won’t appear as a new product or a new cloud service but as a series of changes to the ways that database management systems deal with data. The focus in the past was on control of the data, including what the data is, what it means, what it does, and who has access to it. With data automation, where the data resides is no longer a factor. Indeed, the data could be scattered over 100 different cloud and non-cloud servers, even mobile systems and IoT devices, with each location looking after a different part of the whole.

This distributed data paradigm is not new. We’ve been talking about it since the 1970s. That older iteration simply divides up the data, as single or multiple copies, among several decentralized databases connected over a network. What is new is that we now define the distribution of data not by the distribution of databases but by the data itself. No centralization is required.

A use case such as edge computing could have 1,000 different chunks of data running on 1,000 different devices, but each chunk does not require its own tiny database. Data automation just provides a platform for the autonomous data to exist. Autonomous data will carry out all the data management, data governance, and data security operations, whether or not the data is connected to a data management control plane. It also won’t matter if it’s just a single record or massive amounts of data, structured or unstructured.
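
One way to picture a chunk that governs itself with or without a control plane is the sketch below. It is an illustration under assumptions, not a description of any existing system: ControlPlane, latest_purposes, EdgeChunk, refresh, and use are all hypothetical names. The chunk enforces the policy embedded by its owner and only replaces that policy when a control plane happens to be reachable.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class ControlPlane(Protocol):
    """Hypothetical interface to a remote data management control plane."""

    def latest_purposes(self, chunk_id: str) -> frozenset[str]: ...


@dataclass
class EdgeChunk:
    """A chunk of data on an edge device, with no local database (hypothetical)."""
    chunk_id: str
    data: bytes
    purposes: frozenset[str]  # approved uses, embedded by the data owner

    def refresh(self, plane: Optional[ControlPlane]) -> bool:
        """Pull newer policy if a control plane is reachable.

        Returns False and keeps the embedded policy when it is not.
        """
        if plane is None:
            return False
        try:
            self.purposes = plane.latest_purposes(self.chunk_id)
        except ConnectionError:
            return False
        return True

    def use(self, purpose: str) -> bytes:
        """Release the data only for an approved purpose."""
        if purpose not in self.purposes:
            raise PermissionError(f"purpose {purpose!r} is not approved")
        return self.data


chunk = EdgeChunk("sensor-17", b"\x01\x02", frozenset({"telemetry"}))
chunk.refresh(None)               # offline: the embedded policy still applies
reading = chunk.use("telemetry")  # allowed by the embedded policy
```

The design choice here is that losing the connection never loosens or suspends enforcement; the chunk simply keeps applying the last policy it holds.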

The move to the autonomous data model will be gradual. We already know that the move to multicloud and other complex distributed architectures is driving many enterprises into a complexity wall. The tipping point of data complexity is in sight, where data will become useless due to the restrictions of data centralization.
