2021 Trends in Big Data: The Interoperability Challenge
The dictates of big data—its inner manipulations and trends—have defined the very form of the data ecosystem since the inception of these technologies nearly a decade ago. The normalization of big data engendered the current preoccupation with statistical Artificial Intelligence, the mounting reliance on cloud architecture, and the viability of edge deployments of bountiful, streaming data.
It’s become entrenched in the most meaningful dimensions of data management, implicit to all but its most mundane practices, and indistinguishable from almost any type of data leveraged for competitive advantage.
As such, current momentum in the big data space isn’t centered on devising new expressions of its capabilities, but rather on converging existing ones to actualize the long-sought, rarely realized IT ideal of what Cambridge Semantics CTO Sean Martin termed “interoperability.” “And, the more the data starts to support that,” Martin added, “the more interesting that gets, too.”
The grand vision of interoperability involves the capacity to readily interchange enterprise systems and resources as needed to maximize business productivity without technological restrictions. That might involve simply exchanging big data elicited from the Internet of Things with data at the cloud’s core (and vice versa), or dynamically positioning cognitive computing models in blockchain to make cryptocurrency micro-predictions.
Achieving this end, however, requires expedient, ad hoc data integrations, both the statistical and knowledge-based sides of AI, cloud automation, data integrity, and fluid blockchain deployments. Mastery of these data management elements is requisite “to interchange data between these systems, without all the heavy lifting of essentially exporting the data and then creating pipelines to transform it and then import it,” Martin noted. “[You] can actually go directly from one to another.”
True interoperability implies interchanging technologies, tools, and approaches—the rudiments of which are based in integrating their data to “help people tear down their silos,” commented StorCentric CTO Surya Varanasi. Gartner predicts that by 2023, organizations employing data fabrics can accelerate time to integrated delivery by 30 percent (Zaidi et al., 2020). This architecture “by definition mixes together the data from multiple places and then creates this one single unified view of the data, and then makes it available to the consumers,” specified Denodo CMO Ravi Shankar.
Integrating all data as an architectural overlay across enterprise systems results in a logical fabric supporting singular deployments. Martin cited an FDA use case in which it’s “tying together 10, 15 databases with the stuff to get a view of everything about this drug: every communication of the people that made it, safety things, all that stuff.” These integrations require redressing differences in data models and semantics. A “universal semantic layer is a place that brings together all the technical data, which are the underlying sources, and provides a business transformation on top of that which is now available to all these different tools,” Shankar explained. Consequently, consumers can use any tool to analyze or operationalize data, an important step towards interoperability.
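The semantic layer Shankar describes can be reduced to a simple idea: each source keeps its own technical field names, while one shared mapping exposes common business terms to every consuming tool. The sketch below illustrates that idea in plain Python; the source systems, field names, and mapping are hypothetical, not drawn from any vendor’s product.

```python
# Sketch of a universal semantic layer: technical field names per source,
# one shared mapping to business terms. All names here are illustrative.

# Technical data as it sits in two separate source systems
crm_rows = [{"cust_nm": "Acme", "rgn_cd": "NA"}]
billing_rows = [{"customer": "Acme", "amt_usd": 1200}]

# One semantic mapping per source: technical name -> business term
semantic_layer = {
    "crm": {"cust_nm": "customer_name", "rgn_cd": "region"},
    "billing": {"customer": "customer_name", "amt_usd": "invoice_total"},
}

def to_business_view(source, rows):
    """Rename a source's technical fields to the shared business terms."""
    mapping = semantic_layer[source]
    return [{mapping[k]: v for k, v in row.items()} for row in rows]

# Any consuming tool now sees one vocabulary, regardless of source
unified = to_business_view("crm", crm_rows) + to_business_view("billing", billing_rows)
print(unified)
```

Because the business transformation lives in the mapping rather than in any single tool, every analytics or operational consumer queries the same vocabulary, which is the point Shankar makes above.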
Gartner prognosticated that applications of graph processing and graph databases will increase 100 percent by 2022 for integration and data science deployments (Zaidi et al., 2020). Knowledge graphs are foundational to big data interoperability; they offer a universal semantic layer with standardized data models. They provision a triad of high-value use cases, including:
- Data Fabrics: With certain data fabric implementations, knowledge graphs are the output of integrations and link information from disparate sources so organizations are “using the knowledge graph to represent the data that’s been integrated,” Martin revealed. “Part of the integration process is creating models that describe all that information.”
- Total AI: Gartner includes knowledge graphs as part of AI, hearkening back to the discipline’s knowledge-based roots in its formative years in the 20th century. These graph models include concrete knowledge representations aside from (or universal to) data domains; they can synthesize this dimension with statistical AI “to merge the two approaches,” Martin acknowledged. “More than that, you can integrate the results of the statistical approach or machine learning approach with the knowledge representation to enrich [it] as you go.”
- Data Science: The previous Gartner prediction indicates knowledge graphs are sought to prepare data for machine learning feature generation. According to Martin, with these capabilities “the icing on the cherry is now I can do predictions about all these products, all these customer attributes, that I’ve integrated in graph queries.”
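At their simplest, the knowledge graphs underlying all three use cases are collections of subject–predicate–object triples, so facts integrated from different systems can be queried as one graph. The sketch below shows that mechanism with a tiny in-memory triple store; the entities and predicates are invented for illustration (loosely echoing the FDA drug example above), not taken from any real dataset or graph database API.

```python
# Minimal knowledge-graph sketch: facts from two hypothetical sources
# stored as (subject, predicate, object) triples, queried together.

graph = set()

def add(s, p, o):
    graph.add((s, p, o))

# Facts that might originate in separate "manufacturing" and "safety" systems
add("DrugX", "manufactured_by", "PlantA")
add("DrugX", "has_safety_report", "Report42")
add("PlantA", "located_in", "Ohio")

def query(s=None, p=None, o=None):
    """Pattern-match over the graph; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# One query now spans facts that originated in different systems
print(query(s="DrugX"))        # everything known about DrugX
print(query(p="located_in"))   # where any entity is located
```

Production systems would use a standard model such as RDF with a query language like SPARQL, but the integration pattern, merging triples from many sources into one queryable graph, is the same.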
The cloud provides one of the foremost challenges (and functions as one of the most prominent drivers) for interoperability—even basic on-premises, hybrid, or multi-cloud interoperability. Varanasi described this challenge as “data’s spread around heterogeneous systems around multiple datacenters, and some of your data’s in the cloud.”
Moreover, the contemporary public health crisis has created a situation in which “remote working is here to stay,” opined Todd Rychecky, Opengear VP of Sales, Americas. “Microsoft is now saying 100 percent of their workforce can work remotely indefinitely.” These factors place an undue burden on cloud computing that is alleviated by automating its layered complexity, resulting in “less errors,” Rychecky said. Approaches to cloud automation for interoperable transitions between environments include:
- Distributed Cloud: Identified by Gartner as the future of cloud, distributed clouds are another means of letting organizations couple their data with their compute resources. “Since you’re giving customers the power to pick where their loads are running [with] technologies like Kubernetes to make very dynamic decisions, day to day you might choose where you pay for the compute resources,” Martin stipulated.
- Data Mobility: According to Varanasi, data mobility enables the enterprise “to connect all these disparate data centers and these different [storage] protocols and you define policies that allow you to share, replicate, and migrate your data.” Data migration is a subset of data replication and involves moving data “from this source to this target,” Varanasi commented. He described replication as “you have two systems, you’re creating content on both, and you want the data to be shared. You go from A to B and B to A.” Synchronization typically involves selectively moving data between multiple sites.
- Network Resilience: Automating remote access for network resilience is a vital component of interoperability since “if the network’s down, there is no cloud; there is no big data; there is no AI,” Rychecky maintained. Network resilience is fortified by approaches that automatically configure organizations’ remote resources at scale “for their routers and switches and all their IP addresses that actually bring up the site,” Rychecky confirmed.
Ensuring data integrity is as multifaceted a task as it is pivotal, especially for interoperability, because “if somebody creates a corruption of one site and if it’s ransomware, you don’t want to spread it all over the place,” Varanasi cautioned. Data integrity includes facets of data mobility, cyber liability, and most prominently, data protection. Organizations can guard against ransomware attacks on backups by leveraging file identifiers, serial numbers, and redundancy methods for “a layer of protection atop the data to get security,” Varanasi remarked.
In addition to facilitating reliable backups, “we need to secure the network and provide end-to-end resilience,” Rychecky indicated. “Step two is eliminating human errors, so automation. Things like configuration errors we saw for updates or ransomware, maybe a DDoS attack, we can eliminate those.” Other best practices include implementing a separate network management plane from the production network while making use of cyber liability engineers that “test everything and make sure it works,” Rychecky mentioned. “They sit between NetOps and DevOps.”
When transferring data between locations, it’s essential to “go all the way from a secure site to another secure site with full encryption running on the network,” Varanasi disclosed. Organizations must also ensure all the data’s been moved, and that “what’s happened in the source is whole in the target, if you will,” Varanasi divulged. “In addition to encryption in flight to make your data secure, we do checks on the source and the target so that we’ve written the right data at the other side as well.”
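The source-and-target check Varanasi describes is, in essence, a digest comparison: hash the data before transfer and verify the same digest at the other side. The sketch below shows that check with Python’s standard hashlib; the transfer itself is simulated, whereas in practice the bytes would travel over an encrypted channel between the secure sites.

```python
# Source/target integrity check: compare cryptographic digests so that
# "what's happened in the source is whole in the target."
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 fingerprint of a blob of data."""
    return hashlib.sha256(data).hexdigest()

source_bytes = b"customer records batch 2021-01"   # illustrative payload
source_digest = digest(source_bytes)

# ... data moves between secure sites with encryption in flight ...
target_bytes = source_bytes  # what actually arrived at the other side

# Verify the right data was written at the other side
assert digest(target_bytes) == source_digest, "data corrupted in transit"
print("transfer verified:", source_digest[:12])
```

Any mismatch between the two digests signals that the target copy is incomplete or altered, which is exactly the failure mode the check exists to catch.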
The networking issues implicit to big data interoperability extend to popular blockchain deployments of cryptocurrencies. “The other day PayPal announced that you can transact now in BitCoin using Venmo, and that’s huge,” Rychecky posited. “So you’re going to have a lot more digital currency offerings.” For cryptocurrencies, blockchain must balance data privacy concerns (pertaining to its consensus approach while preserving the confidentiality of specific users) with regulatory and legal restrictions. However, its increasing traction simply provides another source of IT systems with which to interchange big data for interoperability—particularly as organizations account for consumer needs.
The Big (Data) Picture
Big data’s become part of data-driven processes everywhere, whether involving cognitive computing, multi-cloud, or vanguard streaming data use cases. Its next evolution is to first integrate, then freely exchange data among the myriad tools and technological approaches it inhabits across the enterprise. Cloud automation, knowledge graphs, data fabrics, and data integrity measures are the stepping stones by which organizations will realize this vision, and its tangible business value.
Zaidi, E., Thoo, E., Heudecker, N., Menon, S., & Thanaraj, R. (2020). Magic Quadrant for Data Integration Tools. Gartner. Retrieved from https://www.gartner.com/en/documents/3989223/magic-quadrant-for-data-integration-tools
About the Author
Jelani Harper is an editorial consultant serving the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance and analytics.