Better Blog Post Analysis with googleAnalyticsR


In my previous role as a marketing data analyst for a blogging company, one of my most important tasks was to track how blog posts performed.

On the surface, it’s a fairly straightforward goal. With Google Analytics, you can quickly get almost any metric you need for your blog posts, for any date range.

But when it comes to comparing blog post performance, things get a bit trickier.

For example, let’s say we want to compare the performance of the blog posts we published on the Dataquest blog in June (using the month of June as our date range).

But wait… the two blog posts with more than 1,000 pageviews were published early in the month, and the two with fewer than 500 pageviews were published at the end of the month. That’s hardly a fair comparison!

My first solution to this problem was to look up each post individually, so that I could make an even comparison of how each post performed in its first day, first week, first month, etc.

However, that required a lot of manual copy-and-paste work, which was extremely tedious if I wanted to compare more than a few posts, date ranges, or metrics at a time.

But then I learned R, and realized that there was a much better way.

In this post, we’ll walk through how it’s done, so you can do this better blog post analysis for yourself!

What we’ll want

To complete this tutorial, you’ll need basic knowledge of R syntax and the tidyverse, and access to a Google Analytics account.

Not yet familiar with the basics of R? We can help with that! Our interactive online courses teach you R from scratch, with no prior programming experience required. Sign up and start today!

You’ll also need the dplyr, lubridate, and stringr packages installed, which, as a reminder, you can do with the install.packages() command.

Finally, you will need a CSV of the blog posts you want to analyze. Here’s what’s in my dataset:

post_url: the page path of the blog post
post_date: the date the post was published (formatted m/d/yy)
category: the blog category the post was published in (optional)
title: the title of the blog post (optional)
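
If you are not sure whether your own CSV matches this layout, a quick round trip in base R can help. This is only a sketch: the two rows, the paths, and the titles are made-up placeholders, not real Dataquest data.

```r
# Hypothetical example rows; replace them with your own posts.
sample_posts <- data.frame(
  post_url  = c("/blog/example-post-one/", "/blog/example-post-two/"),
  post_date = c("6/3/20", "6/24/20"),   # m/d/yy strings
  category  = c("tutorials", "news"),   # optional column
  title     = c("Example Post One", "Example Post Two"),  # optional column
  stringsAsFactors = FALSE
)

# Write to a temporary file, then read it back to confirm the layout.
csv_path <- file.path(tempdir(), "articles.csv")
write.csv(sample_posts, csv_path, row.names = FALSE)

check <- read.csv(csv_path, stringsAsFactors = FALSE)
str(check)
```

At this stage post_date is still plain text; it only becomes a Date once it is parsed.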

Depending on your content management system, there may be a way for you to automate gathering this data, but that’s out of the scope of this tutorial!

For this tutorial, we’ll use a manually-gathered dataset of the past ten Dataquest blog posts.

Setting up the googleAnalyticsR package

To access data from the Google Analytics API, we’ll use the excellent googleAnalyticsR package by Mark Edmondson.

As described in the documentation, there are two “modes” to the googleAnalyticsR package. The first mode, which we’ll use here, is a “Try it out” mode, which uses a shared Google Project to authorize your Google Analytics account.

If you want to make this report a recurring tool for your blog or client, be sure to create your own Google Project, which will help keep the traffic on the shared Project to a minimum. To find out how to set this up, head over to the package setup documentation.

For now, though, we’ll stick with “Try it out” mode.

First, we’ll install the package using this code:

install.packages('googleAnalyticsR', dependencies = TRUE)

This installs the package, as well as the required dependencies.

Next, we’ll load the library, and authorize it with a Google Analytics account using the ga_auth() function.

library(googleAnalyticsR)
ga_auth()

When you run this code the first time, it’ll open a browser window and prompt you to log in to your Google account. Then, it will give you a code to paste into your R console. After that, it’ll save an authorization token so you only have to do this once!

Once you’ve completed the Google Analytics authorization, we’re ready to set up the rest of the libraries and load in our blog posts. We’ll also use dplyr::mutate() to change the post_date to a Date class while we’re at it!


library(dplyr)
library(lubridate)
library(stringr)

blog_posts <- read.csv("articles.csv") %>%
  mutate(post_date = as.Date(post_date, "%m/%d/%y"))  # parse m/d/yy strings into Dates
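
To see what that format string does on its own, here is a small base R sketch. The date strings are hypothetical examples in the m/d/yy style, not rows from the real dataset.

```r
# Hypothetical m/d/yy strings, like the ones in a post_date column.
raw_dates <- c("6/3/20", "6/24/20", "12/31/19")

# %m = month, %d = day, %y = two-digit year
parsed <- as.Date(raw_dates, format = "%m/%d/%y")
class(parsed)
format(parsed)

# A string that does not fit the format becomes NA instead of raising an
# error, so it is worth checking for NAs after parsing.
bad <- as.Date("2020-06-03", format = "%m/%d/%y")
is.na(bad)
```

If your CSV stores dates in another layout, the format string is the only part that should need to change.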

Here’s what the blog post data frame looks like:

Finally, to get data from your Google Analytics account, you will need the ID of the Google Analytics view you want to access. ga_account_list() will return a list of your available accounts.

accounts <- ga_account_list()

view_id <- accounts$viewId[which(accounts$viewName == "All Web Site Data" & accounts$webPropertyName == "Dataquest")]
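
If the lookup above returns nothing, or more than one ID, the later requests will fail in confusing ways. Here is a sketch of the same filter with a guard, run against a mock data frame. The IDs and the "Another Site" property are invented for illustration; only the column names come from the code above.

```r
# Mock stand-in for the data frame returned by ga_account_list().
# The view IDs and the second property name are made up.
accounts <- data.frame(
  viewId          = c("111111111", "222222222", "333333333"),
  viewName        = c("All Web Site Data", "All Web Site Data", "Test View"),
  webPropertyName = c("Dataquest", "Another Site", "Dataquest"),
  stringsAsFactors = FALSE
)

matches <- accounts$viewName == "All Web Site Data" &
  accounts$webPropertyName == "Dataquest"

# Stop early unless exactly one view matches both conditions.
stopifnot(sum(matches) == 1)

view_id <- accounts$viewId[matches]
view_id
```

Indexing with the logical vector directly does the same job as which() here.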

Now, we’re ready to make our first Google Analytics API requests!

Accessing blog post data with googleAnalyticsR

In this tutorial, our goal is to gather data for the first week each post was active, and compile it in a dataframe for analysis. To do this, we’ll create a function that runs a for loop and requests this data for each post in our blog_posts dataframe.

So, let’s take a look at how to send a request to the Google Analytics API using googleAnalyticsR.

google_analytics(view_id,
                  date_range = c(as.Date("2020-06-01"), as.Date("2020-06-30")),
                  metrics = c("pageviews"),
                  dimensions = c("pagePath"))

This request has a few components. First, enter the view_id, which we already saved from our ga_account_list() dataframe.

Next, specify the date range, which needs to be passed in as a vector of two dates: the start date and the end date.

Then, we enter the metrics (like pageviews, landing page sessions, or time on page) and dimensions (like page path, channel, or device). We can use any dimension or metric that’s available in the Google Analytics UI; here’s a helpful reference for finding the API name of any UI metric or dimension.

So, the request above will return a dataframe of all pageviews in June, by page path (by default googleAnalyticsR will only return the first 1,000 results).

But, in our case, we only want to retrieve pageviews for a specific page, so we need to filter on the pagePath dimension using a dimension filter, which looks like this:

page_filter <- dim_filter(dimension = "pagePath",
                          operator = "REGEXP",
                          expressions = "^$")

To use this filter in our request, googleAnalyticsR wants us to create a filter clause, which is how you’d combine filters if you wanted to use multiple dimension filters. But in our case, we just need the one:

page_filter_clause <- filter_clause_ga4(list(page_filter))
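
Because the filter uses the REGEXP operator, how the expression is anchored can change which pages get counted. The sketch below uses base R’s grepl() on made-up page paths to show the difference between an anchored and an unanchored pattern. Google Analytics runs its own regex engine, so treat this as a local illustration rather than a guarantee of how the API will match.

```r
# Hypothetical page paths, for illustration only.
paths <- c("/blog/example-post/",
           "/blog/example-post/amp/",
           "/blog/example-post/?utm_source=newsletter")

unanchored <- "/blog/example-post/"
anchored   <- "^/blog/example-post/$"

# Without anchors, any path containing the pattern matches.
loose_match <- grepl(unanchored, paths)

# With ^ and $, only the exact path matches.
exact_match <- grepl(anchored, paths)

data.frame(paths, loose_match, exact_match)
```

Whether you want the loose or the exact behavior depends on whether variants of a URL should count toward the same post.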

Now, let’s try sending a request with this filter:

google_analytics(view_id,
              date_range = c(as.Date("2020-07-01"), Sys.Date()),
              metrics = c("pageviews"),
              dimensions = c("pagePath"),
              dim_filters = page_filter_clause)

The result is a dataframe with the pageviews for the R Markdown post!

Creating the for loop

Now that we can gather data and filter it by dimension, we’re ready to build out our function to run our for loop! The steps to the function are:

  • Set up a data frame to hold the results
  • Begin the loop based on the number of rows in the data frame
  • Access the post URL and post date for each post
  • Create a page filter based on the post URL
  • Send a request to Google Analytics using the post_date as the start date, and the date one week later as the end date
  • Add the post URL and pageview data to the final data frame

I have also added a print() command to let us know how far along the loop is (because it can take a while) and a Sys.sleep() command to keep us from hitting the Google Analytics API rate limit.
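
Before pointing a loop like this at the real API, it can help to dry-run the pattern offline. In the sketch below, fake_fetch() is a hypothetical stand-in for the Google Analytics request and returns placeholder numbers, and the posts are invented. The one-week window uses base Date arithmetic (start date plus 7 days).

```r
# Hypothetical stand-in for the API request; returns a placeholder number.
fake_fetch <- function(post_url, start_date, end_date) {
  data.frame(pageviews = nchar(post_url) * 10)  # not real data
}

# Made-up posts with already-parsed publish dates.
posts <- data.frame(
  post_url  = c("/blog/example-post-one/", "/blog/example-post-two/"),
  post_date = as.Date(c("2020-06-03", "2020-06-24")),
  stringsAsFactors = FALSE
)

# Empty data frame to collect one row per post.
results <- data.frame(pageviews = numeric(), post_url = character(),
                      stringsAsFactors = FALSE)

for (i in seq_len(nrow(posts))) {
  start_date <- posts$post_date[i]
  end_date   <- start_date + 7   # one week after publication

  page_data <- fake_fetch(posts$post_url[i], start_date, end_date)
  page_data$post_url <- posts$post_url[i]
  results <- rbind(results, page_data)

  message("Completed row ", nrow(results), " of ", nrow(posts))
  Sys.sleep(0.1)  # short pause for the dry run; a real run would wait longer
}

results
```

Using seq_len(nrow(posts)) means the loop body is skipped entirely when the data frame has no rows.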

Here’s what that looks like all put together!

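get_pageviews <- function(posts) {

  # Empty data frame to collect one row of results per post
  final <- tibble(pageviews = numeric(),
                  post_url = character())

  for (i in seq_len(nrow(posts))) {

    post_url <- posts$post_url[i]
    post_date <- posts$post_date[i]

    # Filter the request down to this post's page path
    page_filter <- dim_filter(dimension = "pagePath",
                              operator = "REGEXP",
                              expressions = post_url)

    page_filter_clause <- filter_clause_ga4(list(page_filter))

    # Request pageviews from the publish date through one week later
    page_data <- google_analytics(view_id,
                                  date_range = c(post_date, post_date %m+% weeks(1)),
                                  metrics = c("pageviews"),
                                  dim_filters = page_filter_clause)

    page_data$post_url <- post_url

    final <- rbind(final, page_data)

    print(paste("Completed row", nrow(final), "of", nrow(posts)))

    # Pause between requests to stay under the API rate limit
    # (the one-second length is an assumption; adjust as needed)
    Sys.sleep(1)

  }

  return(final)

}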

We could potentially speed this up with a “functional” in R, such as purrr::map(). The map() function applies a function to each element of a list or vector and returns the results as a list. Check out Dataquest’s interactive online lesson on the map function if you’d like to deepen your knowledge!

For this tutorial, though, we’ll use a for loop because it’s a bit less abstract.
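
For the curious, here is roughly what the functional style looks like. To keep the sketch free of extra packages it uses base R’s lapply(), which plays a similar role to purrr::map(), and a hypothetical fake_fetch() helper in place of the real API request.

```r
# Hypothetical stand-in for one API request; returns placeholder numbers.
fake_fetch <- function(post_url) {
  data.frame(pageviews = nchar(post_url) * 10,  # not real data
             post_url  = post_url,
             stringsAsFactors = FALSE)
}

# Made-up page paths.
post_urls <- c("/blog/example-post-one/", "/blog/example-post-two/")

# Apply the function to every URL; the result is a list of data frames.
per_post <- lapply(post_urls, fake_fetch)

# Stack the list into a single data frame.
combined <- do.call(rbind, per_post)
combined
```

The shape of the result is the same as with the for loop; the main difference is that the bookkeeping for the results data frame goes away.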

Now, we’ll run the loop on our blog_posts dataframe, and merge the results to our blog_posts data.

recent_posts_first_week <- get_pageviews(blog_posts)
recent_posts_first_week <- merge(blog_posts, recent_posts_first_week)
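
One thing to keep in mind about merge(): by default it joins on the columns the two data frames share and keeps only the rows that appear in both. The sketch below uses invented posts and numbers to show the default next to all.x = TRUE, which keeps posts that have no pageview row.

```r
# Made-up posts and pageview counts, for illustration only.
posts <- data.frame(
  post_url = c("/blog/a/", "/blog/b/", "/blog/c/"),
  title    = c("Post A", "Post B", "Post C"),
  stringsAsFactors = FALSE
)
views <- data.frame(
  post_url  = c("/blog/a/", "/blog/b/"),
  pageviews = c(120, 45),
  stringsAsFactors = FALSE
)

# Default: join on the shared post_url column, dropping unmatched posts.
inner <- merge(posts, views)

# all.x = TRUE keeps every post and fills missing pageviews with NA.
outer <- merge(posts, views, all.x = TRUE)

nrow(inner)
nrow(outer)
```

If a post seems to vanish after the merge, a missing pageview row is a likely cause.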


And that’s it! Now, we can get on to the good stuff: analyzing and visualizing the data.

Blog post data, visualized!

For demonstration, here’s a ggplot bar chart that shows first-week pageviews for each of our recent blog posts:


library(ggplot2)
library(scales)  # provides comma() for the axis labels

recent_posts_first_week %>%
  mutate(
    # Shorten each title to its first five words
    pretty_title = str_c(str_extract(title, "^(\\S+\\s+\\n?){1,5}"), "..."),
    # Order the bars by publish date
    pretty_title = factor(pretty_title, levels = pretty_title[order(post_date)])
  ) %>%
  ggplot(aes(pretty_title, pageviews)) +
  geom_bar(stat = "identity", fill = "#39cf90") +
  coord_flip() +
  theme_minimal() +
  theme(axis.title = element_blank()) +
  labs(title = "Recent Dataquest blog posts by first week pageviews") +
  scale_y_continuous(labels = comma)
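
The title-shortening step is easier to follow in isolation. The sketch below applies a pattern of the same shape (up to five words, each followed by whitespace) using base R instead of stringr, on made-up titles. Note that a word only counts if whitespace follows it, so the last word of a short title is dropped and a one-word title does not match at all.

```r
# Made-up titles, for illustration only.
titles <- c("How to Use R Markdown for Reports Today",
            "Short Title",
            "Single")

# Up to five runs of non-whitespace, each followed by whitespace.
pattern <- "^(\\S+\\s+){1,5}"

m <- regexpr(pattern, titles, perl = TRUE)

# Keep the matched prefix, or NA where the pattern did not match.
first_words <- ifelse(m == -1,
                      NA_character_,
                      substring(titles, 1, attr(m, "match.length")))

pretty <- ifelse(is.na(first_words),
                 NA_character_,
                 paste0(first_words, "..."))
pretty
```

If your titles are often shorter than five words, you may want a fallback to the full title where the result is NA.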

Now we can see how useful it is to be able to compare blog posts on “even footing”!

For more information on the googleAnalyticsR package and what you can do with it, check out its very helpful resource page.

