Portable Scalable Data Visualization Techniques for Apache Spark and Python Notebook-based Analytics


Python notebooks are great for communicating data analysis and research, but how do you port their visualizations between the many available platforms (Jupyter, Databricks, Zeppelin, Colab, …)? And how do you scale those visualizations up using Spark? This talk will address:

- 6-8 strategies for rendering Matplotlib that generalize well across platforms
- A review of the landscape of Python visualization packages, calling out gotchas
- Headless rendering, and how to scale from one visualization to 10,000
- How to create a cool animation
- Connecting your big data to these visualizations via Spark

Data visualization is the only way most analytics consumers understand data science and big data. Visualizing big data is challenging, and getting it to work across multiple open platforms is harder still. The difficulty doubles again when you need to render the 100,000 visualizations required for ML Operations automation and data-driven animations. Our presentation covers popular Python visualization packages (Matplotlib, D3.js-based packages, Bokeh, and high-density visualization packages) and the best ways to integrate them with massive data sets managed by Spark. We will demonstrate common strategies (image, SVG, HTML embed) and the gotchas that commonly arise when integrating Spark with Jupyter and non-Jupyter environments. Headless data visualization strategies are used to automate Machine Learning Operations and data-driven animations. A Python notebook will be the center of this demo. The strategies presented are accessible to anyone with passing experience of Python-based data visualization packages.
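As a rough illustration of the image, SVG, and HTML-embed strategies combined with headless rendering, here is a minimal sketch. It assumes Matplotlib with the non-interactive Agg backend; the helper name `render_figure` and the sample data are hypothetical and are not taken from the talk itself.

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # headless backend: no display or GUI toolkit required
import matplotlib.pyplot as plt


def render_figure(values, title):
    """Render one line chart; return PNG bytes, SVG text, and an HTML <img> tag.

    Hypothetical helper for illustration only.
    """
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot(values)
    ax.set_title(title)

    # Strategy 1: raster image (PNG) written to an in-memory buffer
    png_buf = io.BytesIO()
    fig.savefig(png_buf, format="png", dpi=100)

    # Strategy 2: vector image (SVG), also written in memory
    svg_buf = io.BytesIO()
    fig.savefig(svg_buf, format="svg")

    # Close the figure so memory is released when rendering many plots in a loop
    plt.close(fig)

    png_bytes = png_buf.getvalue()
    svg_text = svg_buf.getvalue().decode("utf-8")

    # Strategy 3: HTML embed, with the PNG inlined as a base64 data URI
    b64 = base64.b64encode(png_bytes).decode("ascii")
    html_tag = f'<img src="data:image/png;base64,{b64}" alt="{title}"/>'
    return png_bytes, svg_text, html_tag


png_bytes, svg_text, html_tag = render_figure([1, 4, 9, 16], "demo")
```

Because nothing here depends on a notebook front end, the same function could in principle be called in a loop or from a batch job. The resulting HTML string can then be handed to whatever display hook a given platform offers, for example `IPython.display.HTML` in Jupyter or `displayHTML` in Databricks.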


