Creating hashtag co-occurrence networks

06 Apr 2023

Why create co-occurrence networks?

Social networks are a “powerful way to represent and study simple and complex interactions” (Menczer et al., 2020). One way to do this is to generate a bipartite user/hashtag network, seeded by a hashtag (in this example, #MeToo), and then project the hashtags into a weighted co-occurrence network.

By analysing the characteristics of the co-occurrence network, we can cluster nodes into ‘communities’. Detecting these communities and examining their coherence can support the understanding of the ‘sub-themes’ of activism (or other non-activism-related topics) that people organise around.

Gathering the data

The first step in this process is data collection. At the time of writing, it is possible to access the Twitter API - who knows (Elon aside) if this will still be the case in the future!

The dataset can be extracted using Twython (a Twitter API wrapper package) to access Twitter API v1.1. See the Twython documentation for more information.

Note: OAuth 2 provides simpler authentication than OAuth 1 and is better suited to read-only calls.

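The seed hashtag (i.e. “#MeToo”) should be passed as the q parameter of the search, which will return up to 50,000 tweets from the last 7 days that used that hashtag. Each request returns at most 100 tweets (the count parameter), so larger collections mean paging through the results. e.g.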

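from twython import Twython

# API credentials from the Twitter developer portal (placeholders)
consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'

# the seed hashtag to search for
hashtag = '#MeToo'

# authenticate using API credentials
twitter = Twython(consumer_key, consumer_secret, oauth_version=2)

# generate access token and apply
access_token = twitter.obtain_access_token()
twitter = Twython(consumer_key, access_token=access_token)

# search; matching tweets are under the 'statuses' key of the response
results = twitter.search(q=hashtag,
                         tweet_mode='extended',
                         count=100,
                         lang='en',
                         result_type='mixed')
tweets = results['statuses']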

The returned tweets can then be stored in a JSON file.
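As a minimal sketch of that storage step, the snippet below writes a list of tweet dictionaries to a JSON file and reads it back. The helper names (save_tweets, load_tweets), the file name and the toy tweets are assumptions made for illustration; in practice the list would come from the 'statuses' key of the search response.

```python
import json
import tempfile
from pathlib import Path


def save_tweets(tweets, path):
    """Write a list of tweet dictionaries to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tweets, f, ensure_ascii=False, indent=2)


def load_tweets(path):
    """Read the list of tweet dictionaries back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# toy stand-ins for tweets returned by the search (not real data)
sample = [
    {"id": 1, "full_text": "Solidarity #MeToo #TimesUp"},
    {"id": 2, "full_text": "Listening today #MeToo"},
]

# a temporary directory keeps the example self-contained
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "metoo_tweets.json"
    save_tweets(sample, path)
    restored = load_tweets(path)
```

Keeping the raw tweets on disk means the networks can be rebuilt later without calling the API again.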

Creating the bipartite network

A bipartite user/hashtag network can provide useful information about influential users within a network, and can also support bot detection by providing metrics that indicate an anomalous volume of hashtag usage for a single user within the network.

The image below shows the #MeToo bipartite network, which was generated with networkx by creating a DiGraph object with the users as the source nodes and the hashtags as the targets. This is visualised with pyvis:
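A rough sketch of how such a DiGraph could be built is shown here. It assumes the tweets follow the Twitter API v1.1 layout (user.screen_name and entities.hashtags); the function name build_bipartite, the '@' and '#' prefixes, the lowercasing of hashtags and the toy tweets are all assumptions made for illustration rather than the exact method used for the image.

```python
import networkx as nx


def build_bipartite(tweets):
    """Build a directed user -> hashtag network from v1.1-style tweets."""
    B = nx.DiGraph()
    for tweet in tweets:
        # prefixes stop a user and a hashtag with the same name colliding
        user = "@" + tweet["user"]["screen_name"]
        # assumption: hashtags are treated as case-insensitive
        tags = {"#" + h["text"].lower() for h in tweet["entities"]["hashtags"]}
        if not tags:
            continue
        B.add_node(user, bipartite="user")
        for tag in tags:
            B.add_node(tag, bipartite="hashtag")
            if B.has_edge(user, tag):
                # count repeat usage of a hashtag by the same user
                B[user][tag]["weight"] += 1
            else:
                B.add_edge(user, tag, weight=1)
    return B


# toy tweets in the v1.1 shape (not real data)
sample_tweets = [
    {"user": {"screen_name": "alice"},
     "entities": {"hashtags": [{"text": "MeToo"}, {"text": "TimesUp"}]}},
    {"user": {"screen_name": "bob"},
     "entities": {"hashtags": [{"text": "MeToo"}, {"text": "FirstThem"}]}},
    {"user": {"screen_name": "alice"},
     "entities": {"hashtags": [{"text": "metoo"}]}},
]

B = build_bipartite(sample_tweets)

# weighted out-degree: total hashtag usage for one user
alice_volume = B.out_degree("@alice", weight="weight")
# in-degree: number of distinct users of one hashtag
metoo_users = B.in_degree("#metoo")
```

The weighted out-degree of a user node gives the volume of hashtag usage mentioned above, so an unusually high value is one simple signal worth checking when looking for bots.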


Projecting the network

A hashtag co-occurrence network is created by projecting the bipartite network onto its ‘hashtag’ nodes: the ‘user’ nodes are dropped and the hashtags they connected are linked directly, generating a weighted network.

Nodes co-occur if they have common neighbours (Menczer et al. 2020, section 4.5). Hashtags therefore co-occur when a user (or users) mentions both hashtags in a tweet.

This projection retains the hashtag nodes and links are added to connect nodes that co-occur. Note: It would also be possible to project the users in the same way.

The projected network is an undirected, weighted network. It is undirected because the hashtag co-occurrence is a bilateral relationship, and it is weighted, with the link weights representing the number of times the hashtags have co-occurred (both mentioned) within a tweet.
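The weighting described above can be sketched by counting, for every tweet, each pair of hashtags that appears together. The function name cooccurrence_network, the lowercasing of hashtags and the toy tweets are assumptions for illustration, and the tweets are again assumed to follow the v1.1 layout (entities.hashtags).

```python
from itertools import combinations

import networkx as nx


def cooccurrence_network(tweets):
    """Build an undirected, weighted hashtag co-occurrence network."""
    G = nx.Graph()
    for tweet in tweets:
        # unique hashtags in this tweet, sorted for stable pair order
        tags = sorted({"#" + h["text"].lower()
                       for h in tweet["entities"]["hashtags"]})
        G.add_nodes_from(tags)
        # every pair of hashtags in the same tweet co-occurs once
        for a, b in combinations(tags, 2):
            if G.has_edge(a, b):
                G[a][b]["weight"] += 1
            else:
                G.add_edge(a, b, weight=1)
    return G


# toy tweets in the v1.1 shape (not real data)
sample_tweets = [
    {"entities": {"hashtags": [{"text": "MeToo"}, {"text": "TimesUp"}]}},
    {"entities": {"hashtags": [{"text": "MeToo"}, {"text": "TimesUp"},
                               {"text": "FirstThem"}]}},
    {"entities": {"hashtags": [{"text": "MeToo"}]}},
]

G = cooccurrence_network(sample_tweets)
```

networkx also provides bipartite.weighted_projected_graph, which weights each link by the number of shared neighbours (here, users) rather than by the number of tweets, so the two approaches can give different weights for the same data.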


Community detection

Nodes in networks are often grouped in communities (or modules or clusters), which are “sets of nodes with a relatively higher density of connections between them” (Menczer et al. 2020, section 6).

These communities are interesting because they provide us with information about network structure and what purpose these clusters serve in relation to the network. An example of this could be polarised political opinions, which may divide a social media network into ideological clusters (these can also be conceived as ‘echo chambers’ or ‘filter bubbles’) (Menczer et al. 2020, section 6).

The image below shows the different communities under the #MeToo banner, with partitions created by the Louvain algorithm (available in networkx), which produced the most cohesive clusters.
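For illustration, here is a small sketch of Louvain partitioning on a toy weighted network. It assumes a networkx release that includes louvain_communities; the placeholder hashtags, the link weights and the seed value are made up for the example and are not taken from the #MeToo data.

```python
import networkx as nx
from networkx.algorithms import community

# toy co-occurrence network: two tightly linked groups of hashtags
G = nx.Graph()
group_a = ["#a1", "#a2", "#a3", "#a4"]
group_b = ["#b1", "#b2", "#b3", "#b4"]
for group in (group_a, group_b):
    for i, u in enumerate(group):
        for v in group[i + 1:]:
            # strong links inside each group
            G.add_edge(u, v, weight=5)
# a single weak link between the two groups
G.add_edge("#a1", "#b1", weight=1)

# partition the network; the seed makes the result repeatable
communities = community.louvain_communities(G, weight="weight", seed=42)

# modularity gives a rough measure of how cohesive the partition is
score = community.modularity(G, communities, weight="weight")

# map each hashtag to the index of its community
membership = {tag: i for i, nodes in enumerate(communities) for tag in nodes}
```

The membership mapping can be attached to the nodes as an attribute, which is one way to colour each community differently when drawing the network.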

The #MeToo network has influential hashtags that are in line with the aims of the network; however, there are also communities organised around hashtags (like #FirstThem) that question the approach of the movement, and others that outright oppose it (such as #FeminismIsCancer).

The cohesiveness within the different clusters, and the separation between them, are apparent when visualised with pyvis:


Bibliography

Menczer, F., Fortunato, S. and Davis, C. A. (2020) A First Course in Network Science. Cambridge University Press.