Data Platforms – A journey. The Yesteryears, Today, and What Lies Ahead
Like pipelines delivering fuel to generate energy for human advancement, data platforms have been used for decades to deliver data that optimizes business processes. These technologies are loosely categorized under the terms ESB (enterprise service bus), ETL (extract, transform, load), EDW (enterprise data warehouse), and BI (business intelligence). They have been reincarnated multiple times over the last three decades in response to changing application development paradigms, data agility needs, and deployment choices. Let’s look at the past three eras to envision what lies ahead.
The pre-Big Data (the 1990s to the early 2000s) – The slow-moving sludge era
This was the era of waterfall application development and client-server architectures. Data was structured and usually rigid. Deployments were on-premises, powered by expensive storage and compute. The RDBMS was king of this era, and SQL achieved mainstream adoption within the enterprise. A typical enterprise data management stack would look as follows –
- ESB – TIBCO EMS, IBM MQSeries
- ETL – Informatica, IBM DataStage
- EDW – Teradata, Vertica, Oracle Exadata
- BI – SAP BusinessObjects
In other words, this was the era of traditional software vendors selling proprietary software to enterprise IT, with SQL as the primary lingua franca. IT was a cost center, but business leaders knew and could measure the ROI on data.
The open-source Big Data (the late 2000s to the mid-2010s) – Adapting Yahoo, LinkedIn, and Netflix technology for the Enterprise era
This era built upon the Agile movement driven by web-scale pioneers. Data was the new oil, and fluidity mattered. Cloud deployments were in their infancy. A radically different data architecture emerged – based upon commodity hardware and cheap storage, and focused on scale. It was driven by hyper-scale applications such as Google’s Search and Amazon’s eCommerce. Engineers at application companies such as Yahoo, Facebook, and LinkedIn incubated and open-sourced counterparts to those proprietary systems. Silicon Valley VCs jumped on the bandwagon, as they got product-market fit for free.
A typical enterprise data management stack would look as follows –
- ESB – RabbitMQ / ZeroMQ
- ETL – Map-Reduce over HDFS
- EDW – HBase over HDFS
- BI – Tableau (using a SQL wrapper and caching engine)
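The ETL layer above names Map-Reduce over HDFS. As a hedge-worthy illustration of why that pattern scaled so well, here is a minimal sketch of the map-shuffle-reduce flow in plain Python – an invented toy, not Hadoop’s actual API, since the real framework distributes these phases across a cluster:

```python
from collections import defaultdict

# Toy map-reduce word count, sketching the pattern Hadoop popularized.
# In a real cluster, each phase runs in parallel across many machines.

def map_phase(lines):
    # Map: emit (key, 1) pairs for each word.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values independently.
    return {key: sum(values) for key, values in groups.items()}

lines = ["the quick brown fox", "the lazy dog", "The fox"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

Because the map and reduce steps are embarrassingly parallel and the shuffle is the only coordination point, this model ran well on the commodity hardware that defined the era.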
This era saw the rise of many NoSQL solutions under the open-source movement, but the crown belonged to Hadoop and the vendors that supported its ecosystem – Cloudera, Hortonworks, and MapR. Data sources proliferated across use cases such as Mobile, IoT, and SaaS. And vendors started touting AI/ML as the way to mine further value out of data.
The Cloud Native Stack (mid-2010s to 2020) – SQL strikes back era
Application development moved to the world of DevOps-powered microservices. Data monetization was real, and cloud deployments accelerated. Data platforms reacted to these movements swiftly. AWS led the way and proved that customers care about simplicity and agility of use, as opposed to solving each point problem with their favorite open-source toy. Public Cloud adoption went mainstream, and multi-cloud deployments became a real need for Enterprise IT.
A typical stack in this current era looks like –
- ESB – Kafka / Kinesis
- ETL – Databricks / EMR over S3
- EDW – Snowflake / BigQuery
- BI – ThoughtSpot / PowerBI
This era reined in open source gone wild, SQL was king (again), and delivering data-driven impact beyond the AI/ML hype was prioritized.
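SQL’s return to the throne needs nothing exotic to demonstrate – the GROUP BY aggregate below is the bread-and-butter EDW/BI workload that Snowflake and BigQuery run at scale. A minimal sketch using Python’s built-in sqlite3 module, with an invented schema and data purely for illustration:

```python
import sqlite3

# An in-memory SQLite database standing in for a cloud warehouse.
# The orders table and its rows are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 100.0), ("east", 250.0), ("west", 75.0)],
)

# A GROUP BY aggregate: the declarative query style that reclaimed
# the ground NoSQL had taken in the previous era.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders "
    "GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 350.0), ('west', 75.0)]
```

The point of the era was exactly this: the analyst writes a declarative query, and the platform worries about distribution, storage format, and scale.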
The Next Frontier – Simplicity and performance from convergence
The secular idea of data as the only path to transformation has been turbo-charged by the pandemic. Hence, the rigid boundaries between ESB, ETL, EDW, and BI have to collapse. Two forces will drive this outcome – the rethinking of a bygone constraint, and a new opportunity. The system constraints that reinforced the rigid boundaries between ESB, ETL, EDW, and BI are starting to wane. The opportunity lies in reimagining, from the ground up and end to end, a data platform that could drive AI/ML’s impact in production. In other words, how can a data analyst single-handedly put an ML model into production in weeks?
While data architectures across generations responded to the needs of applications, data agility, and deployment choices, they were siloed into the realms of ESB, ETL, EDW, and BI. This siloed mindset, reinforced by the vendors, is why data’s impact within the enterprise is perceived as a set of unfulfilled promises. What enterprises need isn’t yet another cloud-native, Kubernetes-compliant adaptation of the latest <Facebook, Uber, or LinkedIn> toy. They need those promises met by a simplified, (hyper)converged data management platform.
About the Author
Darshan Rawal is Founder and CEO of Isima, aiming to win a once-in-25-year disruption in data, systems, and Applied AI. Isima’s bi(OS) is the world’s first Hyperconverged Data Platform serving builders of API, AI, and BI use-cases in a unified manner. Before Isima, Darshan built and sold data products to Fortune 1000 enterprises for two decades. During this journey, he built perspective by building, scaling, and coaching high-performing teams across the US, EU, India, and APAC. He values the continuous pursuit of excellence in one’s craft while maintaining humility.