Insights from Visualizing Public Data on Twitch | by Kiran Gershenfeld | Dec, 2020


Legend [By Author]

What is this image?

This is a computer-generated network graph created from a one-week snapshot of Twitch viewership.

A higher resolution version is available here.

Each node represents a single streamer that appeared in the top 100 streams on Twitch during data collection. Each node is analogous to one TV show like “Breaking Bad”.

The size of each node is determined by the number of unique viewers found in their stream throughout data collection.

Each line between two nodes represents the number of viewers shared by those two streamers; a thicker line indicates more overlap.

Those in the outer ring are streamers that didn’t have any significant viewership overlap with anyone else. I put them there manually so they were not flung off into the void and forgotten.

The colors represent algorithmically found viewership communities. In this context, I am defining a community to be a collection of streamers watched by the same viewers.

Where am I?

If you watch Twitch, I encourage you to find where you fall as a viewer on the graph. Here is where I watch:

My viewing habits [By Author]

I am partial to Dota 2 rather than League, but the inclusion of a bit of MOBA in there is accurate nonetheless.

With this circle identified, I can now recommend myself new streamers that are watched by people similar to me. It is interesting to consider the potential impact of giving users more control over recommendations by making this type of data public. From here I will zoom out and explain some other interesting phenomena on the graph.
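
One way to picture that kind of recommendation is the sketch below, which ranks streamers you do not watch yet by how many viewers they share with the ones you do. This is an illustration only, not something the graph or Twitch does: the function name recommend, the streamer names, and the overlap counts are all made up.

```python
from collections import defaultdict

def recommend(watched, overlaps, top_n=3):
    """Rank unwatched streamers by viewers shared with the ones you watch."""
    scores = defaultdict(int)
    for (a, b), shared in overlaps.items():
        # An edge counts only when exactly one endpoint is already watched.
        if a in watched and b not in watched:
            scores[b] += shared
        elif b in watched and a not in watched:
            scores[a] += shared
    # Highest shared-viewer total first; ties broken alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]

# Hypothetical overlap counts, not real Twitch data.
example_overlaps = {
    ("streamer_a", "streamer_b"): 900,
    ("streamer_a", "streamer_c"): 400,
    ("streamer_b", "streamer_c"): 700,
    ("streamer_c", "streamer_d"): 350,
}
suggestions = recommend({"streamer_a"}, example_overlaps)
```

Here suggestions lists streamer_b ahead of streamer_c, because streamer_b shares more viewers with the one streamer already watched.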

Who is Tommyinnit?

“Tommyinnit”, a 16-year-old Minecraft streamer, had the largest number of unique viewers in my data. Despite his size, he does not appear in the center of the graph, instead lying on the outskirts of the huge English-speaking cluster. Tommy and his group of friends all stream on the same Minecraft server, frequently interacting and talking to each other on stream. The group also has a massive presence on YouTube and, judging by his engagement metrics, Twitter as well.

The group’s massive viewership, but relatively insular placement on the graph, means that they are likely converting fans from other platforms into viewers on Twitch, and for the most part these viewers aren’t sticking around for any other streamers.

Keeping the Minecraft Viewers on Twitch

We can tell from the graph that Tommyinnit’s Minecraft community is huge but generally isolated from most of the platform. Twitch likely wants to push these viewers to bigger communities with more streamers to watch. The OfflineTV and Friends (Pink) community would be the natural choice here given its placement on the graph. This community is very popular on Twitch and YouTube, its members play games with each other frequently, and they recently started their own Minecraft server. This means they could provide familiar content while also being more in tune with Twitch culture, easing new viewers into broader Twitch communities.

Recommending OfflineTV and Friends to Minecraft viewers could potentially capture the viewing hours of a massive younger demographic. A real-life example of this is “Ludwig” (Pink) recently doing a Minecraft stream with some of Tommy’s friends and pulling in huge viewership.

Language Layout

The higher-level structure of the graph is, unsurprisingly, decided by language. The large cluster in the middle is English, and the surrounding clusters generally represent one language each. The communities (colors) don’t always follow the same boundaries, though.

For instance, the red nodes in the English cluster like BLASTPremier are English CS:GO streams, not Russian as the color implies. This means CS:GO streams in English and Russian are sharing significant viewership despite the language barrier. The same phenomenon occurs with English FIFA streams being categorized as German, indicative of FIFA’s cultural dominance in Germany.

Unfortunately, languages with non-Latin characters are not shown. This is discussed further in the technical details below.

What on earth is “Variety”?

The largest community, and the one that appears in the center, is labeled as Variety (Purple). This is Twitch’s own name for the community, and it comes from the fact that there is no defining game or category for them. This group is in the center because every community shares viewership with Variety; they are essentially the common denominator of Twitch.

That isn’t to say that they aren’t a real community though. The vast majority of the large streamers in Variety frequently talk to each other on stream, host podcasts with each other, compete in tournaments with each other, and sometimes live together. Though there may not be a defining game, the Variety community very much has an identity and a dedicated, massive audience for their antics.

Where did the data come from?

I collected the data directly from Twitch using their API. During the week of December 6–12, I queried the top 100 streams and their viewers. I added the results to a database, removing duplicate viewers, to end up with a massive list of streamers and the unique viewers who appeared in each stream. This database included 1,123 streamers and was 800 MB in size.
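
The deduplication step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes each poll has already been fetched and reduced to a mapping from streamer name to a list of viewer names, and the function name merge_snapshots and the sample data are hypothetical.

```python
from collections import defaultdict

def merge_snapshots(snapshots):
    """Combine repeated polls into one set of unique viewers per streamer."""
    viewers_by_streamer = defaultdict(set)
    for snapshot in snapshots:
        for streamer, viewers in snapshot.items():
            # set.update ignores viewers already recorded in an earlier poll.
            viewers_by_streamer[streamer].update(viewers)
    return dict(viewers_by_streamer)

# Two hypothetical polls; real ones would come from the Twitch API.
polls = [
    {"streamer_a": ["v1", "v2"], "streamer_b": ["v2"]},
    {"streamer_a": ["v2", "v3"]},
]
unique_viewers = merge_snapshots(polls)
```

Storing viewers in sets means a viewer who shows up in many polls of the same stream is still counted once, which is what node size on the graph relies on.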

Next, I counted the overlapping viewers between each pair of streamers, writing any overlap above 300 into a large dictionary. This dictionary was processed and imported into Gephi, where the graph could be constructed. Code for this is available here.
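
As a rough sketch of this step (not the linked code), the overlap between two streamers can be computed as the size of the intersection of their viewer sets, keeping only pairs above the threshold. The function names, the toy data, and the choice of a Source/Target/Weight CSV as the hand-off format for Gephi are assumptions made for illustration.

```python
import csv
import io
from itertools import combinations

def shared_viewers(viewers_by_streamer, threshold=300):
    """Return {(a, b): count} for streamer pairs whose overlap exceeds threshold."""
    overlaps = {}
    for a, b in combinations(sorted(viewers_by_streamer), 2):
        count = len(viewers_by_streamer[a] & viewers_by_streamer[b])
        if count > threshold:
            overlaps[(a, b)] = count
    return overlaps

def to_edge_csv(overlaps):
    """Serialize the overlaps as an edge list with Source,Target,Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), count in sorted(overlaps.items()):
        writer.writerow([a, b, count])
    return buffer.getvalue()

# Tiny hypothetical data, with the threshold lowered so one edge survives.
toy = {
    "streamer_a": {"v1", "v2", "v3"},
    "streamer_b": {"v2", "v3", "v4"},
    "streamer_c": {"v9"},
}
toy_overlaps = shared_viewers(toy, threshold=1)
edge_csv = to_edge_csv(toy_overlaps)
```

Comparing every pair is quadratic in the number of streamers, which is manageable at roughly a thousand streamers but would need a smarter approach at a much larger scale.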


This data is a snapshot and depends on each stream’s viewership during one specific week at very specific times. Unfortunately, I couldn’t process streamers with non-Latin characters in their names, so they are not included in the data. Chinese and Indian streamers generally prefer other websites for livestreaming (Douyu, Huya Live, YouTube Gaming), so I do not believe this significantly impacts the data. Finally, I manually removed some specific event streams like “TheGameAwards” because they were very large and do not represent a particular community.
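
For illustration, the two exclusions described above could be expressed as a single filter. This is a sketch under stated assumptions: it treats "Latin characters" as plain ASCII, which is stricter than the article's wording, and the function name keep_streamer and the non-Latin sample name are invented.

```python
def keep_streamer(name, excluded=frozenset({"TheGameAwards"})):
    """Keep names that are plain ASCII and not on the manual exclusion list."""
    # str.isascii() requires Python 3.7 or newer.
    return name.isascii() and name not in excluded

# "BLASTPremier" and "TheGameAwards" appear in the article; the third name is made up.
names = ["BLASTPremier", "TheGameAwards", "стример"]
kept = [n for n in names if keep_streamer(n)]
```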

Graphing and layout

In Gephi, I first ran modularity-based community detection on the data. Modularity is a metric that scores how well a partition of the graph separates it into densely connected groups of nodes. In practice, this splits the nodes into many communities, which are denoted by the colors on the graph. I then ran a few layout algorithms, ending with ForceAtlas2, to spatialize (create the layout of) the graph. After some manual adjustment to bring clusters closer together for readability, I put all the lonely nodes in the outer circle and exported the graph to an SVG.
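
To make the modularity metric more concrete, the sketch below computes the standard weighted modularity score Q for a partition that is already given. It does not reproduce Gephi's search for a good partition, only the score such a search tries to maximize; the function name and the toy graph (two triangles joined by a single edge) are assumptions for illustration.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Weighted modularity Q of a partition of an undirected graph.

    edges: {(u, v): weight}, each undirected edge listed once, no self-loops.
    community_of: {node: community label}.
    """
    total_weight = sum(edges.values())
    internal = defaultdict(float)  # edge weight that stays inside each community
    degree = defaultdict(float)    # summed weighted degree of each community
    for (u, v), w in edges.items():
        degree[community_of[u]] += w
        degree[community_of[v]] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    # Q = sum over communities of (internal share) - (expected share)^2
    return sum(
        internal[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )

# Toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
toy_edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}
q_split = modularity(toy_edges, two_groups)
q_merged = modularity(toy_edges, one_group)
```

Splitting the toy graph into its two triangles scores higher than lumping every node into one community, which is the sense in which modularity rewards densely connected groups.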

I think that there are a lot of interesting insights to be gained from looking at data like this. I encourage everyone to view and download the image in a higher resolution and draw your own conclusions. This was an incredibly rewarding project, and I hope I can expand on it in the future. Thanks to Twitch for making so much data available to the public and also for the endless entertainment. All source code and images can be found on my GitHub here.
