Python Tutorial: Visualize the Reach of Facebook Ads | by Arjun Tambe | Dec, 2020


Let’s dive right in.

Step 1: Access the Facebook API

Like many other APIs (Application Programming Interfaces), the Facebook Ads Library API requires you to sign up before you can use it. Follow the steps on Facebook’s Ads Library API page to sign up and create an app here. We’re not actually creating an app for others to use; we just need one to access the ads data. You don’t have to grant your app any permissions beyond the default.

Facebook Graph API Explorer: how to get an access token

When you’re done, you’ll be able to access the API here (or from Tools > Graph API Explorer). Click “Generate Access Token” and copy the token. Bookmark the page: the access token expires every 2 hours, so you’ll have to come back for a new one each time it expires.

Step 2: Start a Python script to use the API

If you’re already familiar with JSON, skip to the next section.

To access data from the Facebook API, we can use the requests module to connect to a URL, make sure the response is valid (status code 200), and return the content of the response. Make sure to paste in your access token from the API.
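Here is a minimal sketch of that helper, named get_json_response(url) to match the function Step 3 calls. ACCESS_TOKEN is a placeholder for the token from Step 1, and raising an error on any non-200 status is an assumed choice of error handling.

```python
import requests

# Placeholder: paste the token generated in the Graph API Explorer (it expires every 2 hours)
ACCESS_TOKEN = "PASTE_YOUR_ACCESS_TOKEN_HERE"


def get_json_response(url):
    """Request `url` and return the parsed JSON body as a Python dictionary."""
    response = requests.get(url)
    # Anything other than 200 OK (e.g. an expired token) is treated as an error
    if response.status_code != 200:
        raise RuntimeError(
            f"Request failed with status {response.status_code}: {response.text}"
        )
    return response.json()
```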

The URL is just a specially formatted string whose parameters select which ads we want and what information about them to return. Each parameter appears in the URL as its name, an equals sign, and its argument (name=argument), and consecutive parameters are separated by the & symbol.

Step 3: How to write a query for Facebook Ads API

Now, we’ll cover how to build the URL string by appending the relevant parameters to a starting URL. (Another option is to have the requests library build the query string for you.)

A) Base URL: The URL always starts with: “". We must include the parameter ad_reached_countries. To fill this field, we have to use country ISO codes, which you can download here. If you want to specify multiple countries, you have to create a list with square brackets and single quotes around each country code [Note 1].

What our URL should look like now: ['US', 'IN', 'BR']
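As a rough Python sketch of this first piece: the base URL below is an assumption (the Graph API’s ads_archive endpoint, with v9.0 standing in for whatever version the Ads Library API documentation currently lists), and to_list_param is a hypothetical helper that writes a Python list in the bracketed, single-quoted form described above.

```python
ACCESS_TOKEN = "PASTE_YOUR_ACCESS_TOKEN_HERE"  # placeholder

# Assumed base URL: the Graph API ads_archive endpoint; check the docs for the current version
BASE_URL = "https://graph.facebook.com/v9.0/ads_archive?access_token=" + ACCESS_TOKEN


def to_list_param(values):
    """Hypothetical helper: ['US', 'IN'] -> "['US','IN']" and [1, 2] -> "[1,2]"."""
    # repr() wraps strings in single quotes and leaves numbers bare
    return "[" + ",".join(repr(v) for v in values) + "]"


countries = ["US", "IN", "BR"]  # ISO country codes
url = BASE_URL + "&ad_reached_countries=" + to_list_param(countries)
```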

B) Search terms or Page IDs. The Ads API also requires you to specify either the search_terms parameter or the search_page_ids parameter. search_terms works like a normal search, with spaces acting as ANDs.

search_page_ids requires you to find the ID of each Page whose ads you want to see. To find it, search the Ads Library here for the name of a Page you’re interested in, click on the Page, and read the Page ID from the URL (where it says view_all_page_id=<ID>). To specify multiple IDs, we use a list with square brackets again. Make sure to add an & between parameters.

Now our URL should look like this: ['US', 'IN', 'BR']&search_page_ids=[338109312883186,300653326696613] (or using search_terms instead of search_page_ids).
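Appending the Page IDs might look like the sketch below. The base URL is an assumed ads_archive endpoint with an assumed API version, and the two IDs are the ones from the example URL.

```python
ACCESS_TOKEN = "PASTE_YOUR_ACCESS_TOKEN_HERE"  # placeholder
# Assumed base URL (Graph API ads_archive endpoint); check the docs for the current version
BASE_URL = "https://graph.facebook.com/v9.0/ads_archive?access_token=" + ACCESS_TOKEN

url = BASE_URL + "&ad_reached_countries=['US','IN','BR']"

# Page IDs read from Ads Library URLs (view_all_page_id=<ID>)
page_ids = [338109312883186, 300653326696613]
# Join with commas inside square brackets, then append with &
url += "&search_page_ids=[" + ",".join(str(page_id) for page_id in page_ids) + "]"
```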

C) Other parameters: In addition to the above, you can specify a few additional criteria, shown below. The parameter name is shown on the left with valid arguments (either enumerated or their type) below.

Source: Facebook Ads API Documentation
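Additional criteria are appended the same way, as &name=value pairs. The sketch below uses a hypothetical add_params helper. ad_active_status and ad_delivery_date_min are example parameter names assumed from the Ads Library API reference, so confirm names and allowed values against the documentation table before relying on them. Percent-encoding with urllib.parse.quote is an extra precaution for values that contain spaces, such as search terms.

```python
from urllib.parse import quote


def add_params(url, params):
    """Hypothetical helper: append each name=value pair to url, joined by &."""
    for name, value in params.items():
        # Percent-encode unsafe characters (e.g. spaces) but keep brackets, quotes and commas readable
        url += "&" + name + "=" + quote(str(value), safe="[]',")
    return url


ACCESS_TOKEN = "PASTE_YOUR_ACCESS_TOKEN_HERE"  # placeholder
# Assumed base URL; check the Ads Library API docs for the current version
url = "https://graph.facebook.com/v9.0/ads_archive?access_token=" + ACCESS_TOKEN
url = add_params(url, {
    "ad_reached_countries": "['US','IN','BR']",
    "ad_active_status": "ALL",  # assumed values: ACTIVE, INACTIVE, ALL
    "ad_delivery_date_min": "2020-01-01",  # assumed format: YYYY-MM-DD
})
```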

It’s also useful to try your criteria in the ads tool to check that they work as expected. You can add the same parameters to the ads tool’s URL and see whether you’re getting the kinds of ads you were hoping for.

D) Response fields

A lot of information is available on each ad, but only some of it is shown by default. Take a look at the information we can get on each ad below:

Source: Facebook Ads API Documentation

To get a set of response fields different from the default ones, use the fields parameter with a list in single quotes: fields=['impressions', 'id', 'region_distribution']. To make the map visualization we want, make sure to include 'region_distribution' and 'impressions' in addition to any other fields you want.

Now, the URL should look like this: ['US', 'IN', 'BR']&search_page_ids=[338109312883186,300653326696613]&fields=['id','impressions','region_distribution']
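Putting steps A, B and D together, a hypothetical build_ads_url function could assemble the whole query. The base URL and API version are assumptions; the list formatting follows the examples in this step.

```python
ACCESS_TOKEN = "PASTE_YOUR_ACCESS_TOKEN_HERE"  # placeholder
# Assumed base URL (Graph API ads_archive endpoint); check the docs for the current version
BASE_URL = "https://graph.facebook.com/v9.0/ads_archive?access_token=" + ACCESS_TOKEN


def to_list_param(values):
    """Format a Python list as "['a','b']" (strings quoted, numbers bare)."""
    return "[" + ",".join(repr(v) for v in values) + "]"


def build_ads_url(countries, page_ids, fields):
    """Hypothetical helper that joins the countries, Page IDs and fields parameters with &."""
    return (
        BASE_URL
        + "&ad_reached_countries=" + to_list_param(countries)
        + "&search_page_ids=" + to_list_param(page_ids)
        + "&fields=" + to_list_param(fields)
    )


url = build_ads_url(
    countries=["US", "IN", "BR"],
    page_ids=[338109312883186, 300653326696613],
    # impressions and region_distribution are required for the map
    fields=["id", "impressions", "region_distribution"],
)
```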

E) Final step: get the data!

Okay, we’re done making the URL. Pass it into the get_json_response(url) function we made earlier, then get the data by accessing the key ['data'] (the response is a dictionary).

Sometimes we’ll get more than 100 results, and the data will be “paginated” instead of arriving all at once. The 'next' key inside the 'paging' entry of the response holds the URL for the next page of data. When we’re out of data, the next page will be empty.
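A sketch of the pagination loop, assuming each response is a dictionary with a 'data' list and, when more results exist, a 'paging' entry containing 'next'. The fetching function is passed in (for example get_json_response) so the loop can be tried on fake pages without a network connection; fetch_all_ads is a hypothetical name.

```python
import pandas as pd


def fetch_all_ads(url, fetch_json):
    """Follow the 'paging' -> 'next' links and collect every ad into one DataFrame.

    fetch_json is any function that takes a URL and returns the parsed JSON
    dictionary, e.g. get_json_response.
    """
    all_ads = []
    while url:
        response = fetch_json(url)
        page = response.get("data", [])
        if not page:  # an empty page means there is no more data
            break
        all_ads.extend(page)
        # 'next' is only present when another page is available
        url = response.get("paging", {}).get("next")
    return pd.DataFrame(all_ads)
```

In practice, ads_df = fetch_all_ads(url, get_json_response) would produce the ads_df dataframe used in Step 4.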

Step 4: Parsing and cleaning the data

A) Unpacking the data

We currently have the variable ads_df storing a Pandas dataframe with the ads data. The dataframe looks like this:

Dataframe after Step 3

Each row corresponds to a single ad. Ultimately, we want to sum up each region’s impressions across ads. To do this, we first need to “unroll” the region_distribution column so each row has just one region. We also want to multiply each region’s percentage by the ad’s total impressions, so we have the number of impressions for each ad in each region [Note 2]. The resulting dataframe looks like this:

What the df should look like after this step

To do this, we’ll write a function that takes a row and returns the corresponding expanded set of rows. We can use apply (with axis=1 to iterate through rows, not columns) to iterate faster, but we can’t plug the result back into the original dataframe, since the number of rows has changed. Instead, we’ll use apply to put each set of expanded rows in a big list, and concatenate them all at the end.

The function expand_region_distribution(row) should create a new dataframe from that row in which each new row is one entry of region_distribution [Note 3: Additional options]. It should also add columns to the new dataframe: one is region_impressions, which we get by multiplying the percentage by the total impressions, and the rest are copies of the values in the row the function was passed.
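A sketch of both pieces, assuming each region_distribution entry is a dictionary with 'region' and 'percentage' keys (percentages arriving as strings) and that impressions has already been reduced to a single number per ad (see Note 2). expand_all_ads is a hypothetical wrapper; it uses apply only to append each expanded dataframe to a list, which is then concatenated.

```python
import pandas as pd


def expand_region_distribution(row):
    """Turn one ad (a row of ads_df) into one row per region it reached."""
    distribution = row["region_distribution"]
    # Ads with no region data contribute no rows
    if not isinstance(distribution, list) or not distribution:
        return pd.DataFrame()
    # Assumed entry shape: {"region": "California", "percentage": "0.25"}
    regions = pd.DataFrame(distribution)
    regions["percentage"] = regions["percentage"].astype(float)
    # Assumes row["impressions"] is already a single number (see Note 2)
    regions["region_impressions"] = regions["percentage"] * row["impressions"]
    # Copy the ad's other columns onto every expanded row
    for column, value in row.drop("region_distribution").items():
        regions[column] = value
    return regions


def expand_all_ads(ads_df):
    """Hypothetical wrapper: expand every ad, then concatenate the pieces."""
    expanded = []
    # apply(axis=1) hands each row to the lambda, which stores its expanded rows
    ads_df.apply(lambda row: expanded.append(expand_region_distribution(row)), axis=1)
    return pd.concat(expanded, ignore_index=True)
```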

C) Fixing non-matching regions

Unfortunately, region names are very non-standard, and country names aren’t even included in our dataframe. About 20% of the regions won’t get mapped if we don’t clean up the names.

Clone this tool from GitHub to fix this problem. It uses a geographical dataset, a different Facebook API, and some manually specified region names to take the name of a given region and return its correct name, along with the country and ISO code (a standard two-letter code that identifies a country). You can read the documentation for a deeper explanation.

To use it, supply the same Facebook access token, then use apply to convert all the region names in your data and assign the results to a new set of columns.
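The tool’s own function names aren’t reproduced here, so this sketch takes the lookup as a parameter: correct_region is a hypothetical stand-in that maps a raw region name to (corrected region, country, ISO code). If the real function also needs your access token, wrap it in a lambda that supplies it. The raw column name 'region' and the new columns 'country' and 'iso_a2' are assumptions; 'region_corrected' is the name used in Step 5.

```python
import pandas as pd


def add_corrected_columns(ads_df, correct_region):
    """Apply a region-name corrector to every row and store its outputs as new columns.

    correct_region (hypothetical) takes a raw region name and returns a
    (corrected_region, country, iso_code) tuple.
    """
    # Series.apply gives one tuple per row
    results = ads_df["region"].apply(correct_region)
    corrected = pd.DataFrame(
        results.tolist(),
        index=ads_df.index,
        columns=["region_corrected", "country", "iso_a2"],
    )
    return ads_df.join(corrected)
```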

Step 5: Preparing to map

A) Getting map data. Now that we’ve got our dataset ready, how do we put the results on a map? We’ll use the Natural Earth dataset here to draw the map. Download two datasets: regions data from “Admin 1 Regions” (for the regions) and country border data from “Admin 0 Countries” (to draw country borders on a world map).

We’ll use the fantastic tool GeoPandas to read in the data. Always keep all of the files we downloaded (.shx, .cpg, etc.) together, and keep their names the same. We’ll also drop any duplicate regions within a country (there are a few) and any rows missing a region name, since these could screw up our data.
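A sketch of the clean-up step. You would load the regions with geopandas.read_file(...) pointed at the downloaded .shp file; the column names 'name' and 'iso_a2' are assumed from Natural Earth’s Admin 1 attribute table. Because a GeoDataFrame is a pandas DataFrame subclass, the same function also works on a plain DataFrame.

```python
import pandas as pd


def clean_region_data(geo_data: pd.DataFrame, region_col: str = "name",
                      iso_col: str = "iso_a2") -> pd.DataFrame:
    """Drop rows without a region name and duplicate region names within a country."""
    geo_data = geo_data.dropna(subset=[region_col])
    # Same name in different countries is fine; same name twice in one country is not
    geo_data = geo_data.drop_duplicates(subset=[iso_col, region_col])
    return geo_data.reset_index(drop=True)
```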

B) Prepare to merge. Before we merge the ads and geographical data, we have to get a total number of impressions for each region, since multiple ads can reach the same region. Make sure to fill blank spots in region_corrected first (otherwise, those regions will be removed from the dataframe entirely — more on that next). Check the total number of impressions afterwards.
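One way to do this with pandas, as a sketch: the placeholder 'unknown' for blank region names and the ISO column name 'iso_a2' are assumptions. Blank regions need filling because groupby drops rows whose key is missing by default.

```python
import pandas as pd


def total_impressions_by_region(ads_df):
    """Sum region_impressions over all ads for each (country, region) pair."""
    ads_df = ads_df.copy()
    # groupby drops NaN keys by default, so give blank regions a placeholder name first
    ads_df["region_corrected"] = ads_df["region_corrected"].fillna("unknown")
    return (
        ads_df.groupby(["iso_a2", "region_corrected"], as_index=False)["region_impressions"]
        .sum()
    )
```

Comparing the sum of region_impressions before and after is the check the text suggests.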

C) First merge. In Step 4C, we made a new column ‘region_corrected’, which matches the column ‘name’ in geo_data. Some region names are the same in multiple countries, so we’ll merge on both the region name and the ISO code to make sure we’ve got the correct country. Use a left merge with geo_data on the left to keep all the regions in the geographical data. Check the number of impressions again.
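A sketch of the first merge, assuming geo_data has 'name' and 'iso_a2' columns and region_totals (a hypothetical name for the per-region sums) has 'region_corrected', 'iso_a2' and 'region_impressions'. Keeping geo_data on the left means the result is still a GeoDataFrame when geo_data is one; regions that no ad reached get 0.

```python
import pandas as pd


def merge_impressions(geo_data, region_totals):
    """Left-merge per-region totals onto the map regions, matching on name and ISO code."""
    merged = geo_data.merge(
        region_totals,
        how="left",  # keep every region in the geographical data
        left_on=["name", "iso_a2"],
        right_on=["region_corrected", "iso_a2"],
    )
    # Regions with no matching ads get 0 instead of NaN
    merged["region_impressions"] = merged["region_impressions"].fillna(0)
    return merged
```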

D) Preparing for a second merge. You may have noticed that the number of impressions after the merge was less than the number of impressions before. Some ads data was lost because some of the region_corrected entries could not be merged.

All the ads data at least has a country name and ISO code. So, we’ll “distribute” the impressions from those regions to the whole country the region belongs to. For instance, if an ad shown in California got 50 impressions and we couldn’t find California in the geographical data, we’d add 1 impression each to all 50 states.

First, let’s figure out where the merge failed. For each row in ads_df, we check which rows in geo_data have a matching ISO code and region name, and return True if there are no such rows (i.e. if the ads_df row couldn’t be merged). We select those rows and sum up their impressions by country.

Then, we count how many regions exist in each country using the geographical dataframe. We merge that into the not_merged dataframe and divide the total missing impressions per country by the number of regions, to see how many “extra” impressions have to be added to each region.

E) Second merge. Finally, merge the result into geo_data, and add those extra impressions to each region’s number of impressions. Since we’re copying one row of not_merged (for a country) into multiple rows of geo_data (one for each region in that country), we’ll make sure we’re doing a many-to-one merge (validate='m:1'). Make sure to fill missing values with 0.
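A sketch of the two steps above, under assumed column names ('name' and 'iso_a2' in geo_data; 'region_corrected', 'iso_a2' and 'region_impressions' in the per-region totals, here called region_totals). Instead of a row-by-row lookup against geo_data, it tests membership in a set of (ISO code, region name) pairs, which gives the same True/False answer. It also works on the per-region totals rather than on ads_df, which yields the same per-country sums. merged is the output of the first merge; redistribute_unmatched and total_impressions are hypothetical names.

```python
import pandas as pd


def redistribute_unmatched(merged, region_totals, geo_data):
    """Spread impressions that failed the first merge evenly over their country's regions."""
    # Every (ISO code, region name) pair present in the map data
    known = set(zip(geo_data["iso_a2"], geo_data["name"]))
    # True where a per-region total has no matching map region
    failed = [
        (iso, region) not in known
        for iso, region in zip(region_totals["iso_a2"], region_totals["region_corrected"])
    ]
    not_merged = (
        region_totals.loc[failed]
        .groupby("iso_a2", as_index=False)["region_impressions"]
        .sum()
        .rename(columns={"region_impressions": "missing_impressions"})
    )
    # How many map regions each country has, and each region's share
    region_counts = geo_data.groupby("iso_a2").size().rename("n_regions").reset_index()
    not_merged = not_merged.merge(region_counts, on="iso_a2")
    not_merged["extra_impressions"] = not_merged["missing_impressions"] / not_merged["n_regions"]

    # Many regions (left) per country row (right), hence validate="m:1"
    result = merged.merge(
        not_merged[["iso_a2", "extra_impressions"]], on="iso_a2", how="left", validate="m:1"
    )
    result["extra_impressions"] = result["extra_impressions"].fillna(0)
    result["total_impressions"] = result["region_impressions"].fillna(0) + result["extra_impressions"]
    return result
```

The total of total_impressions should then equal the total of region_totals["region_impressions"], as long as every country with unmatched impressions appears in geo_data.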

Now, the total number of impressions should match the original total!

Step 6: Map the results!

We’re going to use regions data to visualize each region, but countries data to visualize borders. We’ll use geopandas to map (check out this tutorial, too). First, we’ll draw the country borders. We get the axis so that we can draw the regions on the same axis.

Then, plot the region data. The figure at the top uses the colormap ‘Oranges’, but there are lots of other colormaps you can use [Note 4]. Once you run the plotting code, you should be able to see your map [Note 5]!
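A sketch of the plotting step with GeoPandas, following the order described: countries first, then regions on the returned axis. countries and regions are GeoDataFrames; the column name total_impressions, the figure size and the border styling are assumptions. After calling it, matplotlib’s plt.show() (or a notebook cell) displays the figure.

```python
def plot_reach(countries, regions, column="total_impressions", cmap="Oranges"):
    """Draw country borders, then shade each region by its impressions on the same axis.

    countries and regions are GeoDataFrames (e.g. loaded with geopandas.read_file).
    """
    # Countries first: white fill with black edges; plot() returns the matplotlib axis
    ax = countries.plot(figsize=(15, 8), color="white", edgecolor="black", linewidth=0.5)
    # Regions on the same axis, colored by the impressions column
    regions.plot(ax=ax, column=column, cmap=cmap, legend=True)
    ax.set_axis_off()
    return ax
```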

Global reach of ads by media companies sponsored by the Chinese government

What uses does this technique have? One obvious use is to visualize advertising reach. If you make a map like this for Amazon or Allbirds, you can see where the company’s ads for its products are reaching the most people.

But beyond this, some ads serve political purposes. Mapping out political campaign ads within the United States (e.g. for Biden or for Trump) could suggest where one party is likely to gain traction. Some ads, like these ads from PragerU or these from ExxonMobil, serve political goals outside of a political campaign; you could use them to see where the political narratives associated with those ads are most influential. You could also make a map for ads sponsored by other governments (perhaps contrasting two governments’ media efforts side by side) to see how states compete for influence on social media.

The Ads Library API can serve a vast array of purposes. Hopefully, this tutorial has helped you extract powerful insights from the data!
