Automatic generation of large PowerPoint decks from survey data with Quantipy Python package


by Geir Freysson


About Geir: Geir is the co-founder and CEO of Datasmoothie, a tech company that brings the joy back into statistical analysis. Geir is also a caffeine fanatic and Internet addict.

Introduction

How is the President doing in the latest polls? Are your employees happy? Is this medication working? Do people think machines will replace humans in the next 50 years?

Quantipy is an open source Python library developed in collaboration between the opinion polling company YouGov and us, Datasmoothie. Quantipy focuses on making life easier for researchers in the people data industry who work on gathering answers to questions like the ones above.

In this blog post we’re going to show you how to take raw survey data and automatically generate a PowerPoint slideshow, which can be themed according to your brand. You’ll end up with 28 automatically generated slides that show people’s opinions on various questions about machine automation, as surveyed by the Pew Research Center.

Quantipy creates slides that use native PowerPoint/Excel charts so they can be branded with different PowerPoint themes. On the left is a basic chart and on the right the user has chosen one of the standard PowerPoint themes.

Reading survey data into Python with Quantipy

We start by reading the data into Quantipy, directly from SPSS (a common file type in the people data industries).

import quantipy as qp
dataset = qp.DataSet('pew-dataset')
dataset.read_spss('./gaming-jobs-broadband.sav')

Next we explore which variables are in the dataset.

dataset.list_variables()
['sex','marital','ideo','q1','q2', ...]

We can also look at the meta data to see which answers were available for specific questions (variables).

dataset.describe('marital')
single                                      codes texts
marital: MARITAL. Are you currently married ...
1                                           1 Married
2                                           2 Living with a partner
3                                           3 Divorced
4                                           4 Separated
5                                           5 Widowed
6                                           6 Never been married
7                                           8 Don’t know
8                                           9 Refused None
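The codes and labels listed above are what make it possible to swap numeric answers for readable text later on. As a minimal sketch in plain Python (this is not Quantipy’s own API; `marital_labels` and `label_for` are hypothetical names, and the values are copied from the output above), the mapping can be pictured like this:

```python
# Code-to-label mapping copied from the describe() output above.
# Plain-Python illustration only; Quantipy keeps this in its own meta structure.
marital_labels = {
    1: "Married",
    2: "Living with a partner",
    3: "Divorced",
    4: "Separated",
    5: "Widowed",
    6: "Never been married",
    8: "Don't know",
    9: "Refused",
}


def label_for(code):
    """Return the answer text for a code, or the code itself if it is unknown."""
    return marital_labels.get(code, str(code))


print(label_for(2))
print(label_for(7))  # code 7 is not used by this question
```

Note that the codes skip from 6 to 8, so a lookup should not assume they are consecutive.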

Decide what questions and answers to include

Once we’ve decided which variables we want in our PowerPoint presentation, we add them to the appropriate arrays. We are going to build a slide deck that shows answers to questions about machine automation, and we’re going to analyse how people of different genders, with different levels of education, different marital statuses and in different regions responded.

# these are the questions
xvars = ['auto1a','auto1b','auto1c',
         'auto1d','auto1e','auto2','auto3']
# and these are the groups we're comparing across
yvars = ['sex','ideo','cregion','educ2','marital']

Data aggregation

We then create a so-called Quantipy Stack, which stores all of our aggregations and also gives us convenient ways of pulling them out.

stack = qp.Stack(add_data={'pew': {'data': dataset.data(),
                                   'meta': dataset.meta()}})

# this is where the aggregations happen
stack.add_link(x=xvars,y=yvars,views=['cbase','c%'])
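To make the two views concrete: going by their names, 'cbase' is read here as the column base (how many respondents fall in each group) and 'c%' as the column percentage (the share of each group giving each answer). The following is a minimal sketch of that arithmetic in plain Python, assuming that reading is right. It is not how Quantipy computes its views, and the `responses` data and `column_views` function are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (answer_code, group_code) pairs, one per respondent.
responses = [(1, 1), (1, 1), (2, 1), (1, 2), (2, 2), (2, 2), (2, 2), (3, 2)]


def column_views(pairs):
    """Return (bases, percentages) for a list of (x, y) answer pairs."""
    # Column base: number of respondents in each y group.
    bases = Counter(y for _, y in pairs)
    # Count how often each x answer occurs within each y group.
    counts = defaultdict(Counter)
    for x, y in pairs:
        counts[y][x] += 1
    # Column percentage: each count divided by its group's base.
    percentages = {
        y: {x: 100.0 * n / bases[y] for x, n in sorted(answers.items())}
        for y, answers in counts.items()
    }
    return dict(bases), percentages


bases, percentages = column_views(responses)
print(bases)        # respondents per group
print(percentages)  # within each group, the percentages sum to 100
```

Because every percentage is taken relative to its own group’s base, groups of very different sizes can be compared side by side.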

We can explore the stack directly, as a sanity check. We use Rodeo, a Python IDE designed specifically for data science and made by Yhat, and in its console we plot how people answered the variable “auto1a” according to their marital status. Don’t worry about how the chart looks for now.

stack['pew']['no_filter']['auto1a']['marital']['x|f|:|y||c%'].dataframe.transpose().plot.barh(stacked=True)
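The chain of square brackets in that lookup suggests the stack can be addressed like a nested dictionary: data key, then filter, then x variable, then y variable, then view key. Here is a toy stand-in in plain Python, with a placeholder string where the real view would be; `toy_stack` and `get_view` are hypothetical names and say nothing about how Quantipy stores things internally.

```python
# Toy stand-in for the nesting seen in the lookup above:
# data key -> filter -> x variable -> y variable -> view key.
# The stored value is a placeholder string, not a real Quantipy view.
toy_stack = {
    "pew": {
        "no_filter": {
            "auto1a": {
                "marital": {"x|f|:|y||c%": "column percentages for auto1a by marital"}
            }
        }
    }
}


def get_view(stack, *keys):
    """Walk a nested dict one key at a time and return what is stored there."""
    node = stack
    for key in keys:
        node = node[key]
    return node


view_key = "x|f|:|y||c%"
print(get_view(toy_stack, "pew", "no_filter", "auto1a", "marital", view_key))

# The view key itself is a pipe-separated string. Its last field matches
# the 'c%' view name that was passed to add_link.
fields = view_key.split("|")
print(fields)
```

A misspelled key at any level raises a `KeyError`, which is a quick way to find out which part of a long lookup is wrong.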

A screenshot from Rodeo, the data-science IDE for Python. We plot the results of a question according to how people answered, broken down by marital status. We view the codes rather than the labels for now, but we’ll add labels before we create the presentation. Answers 8 and 9 are “don’t know” and “refuse to answer”. Most people seem to know what their marital status is.

Export to PowerPoint

Now that we’ve aggregated our results we want to present them in PowerPoint. We use the python-pptx library to generate our PowerPoint files and some handy Quantipy helper methods to create slides from pandas dataframes. We also want the data to be consumer friendly, so we replace codes with actual labels using paint_dataframe.

from pptx import Presentation
import quantipy.core.builds.powerpoint.helpers as hp
from quantipy.core.helpers.functions import paint_dataframe
prs = Presentation()
# we want to group some background variables onto the same slide,
# e.g. gender and marital status
slide_vars = [['sex','marital'],['cregion'],['educ2'],['ideo']]
for question in xvars:
    for slide in slide_vars:
        # pull the column percentages for this question and slide group
        chains = stack.get_chain(x=question,y=slide,views=
                           ['x|f|:|y||c%'],orient_on='x',rules=True)
        # replace answer codes with their text labels
        df = paint_dataframe(dataset.meta(),
                             chains[0].concat().transpose())
        chartData = hp.ChartData_from_DataBody(df)
        question_label = df.columns.levels[0][0]
        hp.add_slide_with_chart(prs, chartData,
                                question_label,legend=True,
                                normalized=True)
prs.save('my-report.pptx')

That’s all there is to it. You now have a PowerPoint document with 28 slides: seven questions compared across five background variables, which are grouped onto four slides per question. Because the charts are exported as native PowerPoint/Excel charts they respond to changes in the theme, so the slides can be branded according to your theme with the click of a mouse.
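The slide count follows directly from the two lists used in the export loop. As a quick check in plain Python (no Quantipy needed; `slides` and `background_vars` are names introduced here for illustration), one slide is produced per pairing of a question with a group of background variables:

```python
from itertools import product

# Question and slide groupings as used in the walkthrough.
xvars = ["auto1a", "auto1b", "auto1c", "auto1d", "auto1e", "auto2", "auto3"]
slide_vars = [["sex", "marital"], ["cregion"], ["educ2"], ["ideo"]]

# One slide per (question, group of background variables) pair.
slides = list(product(xvars, slide_vars))
background_vars = [var for group in slide_vars for var in group]

print(len(slides))           # 7 questions x 4 groups
print(len(background_vars))  # background variables across the groups
```

Putting 'sex' and 'marital' on the same slide is why five background variables give four slides per question rather than five.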

You can also export directly to an online dashboard or report with Datasmoothie (like this one), which will be covered in another post.
