Fake Tweets

Hashtags analysis

The analysis covers a timespan of nine years, from November 2009 to November 2018. It was performed both on the full set of hashtags and on hashtags written in English. The total number of unique hashtags is approximately 72 thousand, of which approximately 56 thousand are written in the Latin alphabet.
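The split between the full set and the Latin-alphabet subset can be sketched as follows. This is only an illustration: the rule used here (ASCII letters, digits, and underscores after the leading `#`), the case-insensitive comparison, and the names `is_latin_hashtag` and `unique_hashtags` are assumptions, not the procedure actually used for the counts above.

```python
import re

# Assumption: a hashtag counts as "Latin" if, after the leading '#',
# it contains only ASCII letters, digits, and underscores.
LATIN_HASHTAG = re.compile(r"#[A-Za-z0-9_]+")


def is_latin_hashtag(tag: str) -> bool:
    """Return True if the whole hashtag matches the ASCII-only pattern."""
    return LATIN_HASHTAG.fullmatch(tag) is not None


def unique_hashtags(tags):
    """Return the set of distinct hashtags.

    Assumption: hashtags are compared case-insensitively.
    """
    return {tag.lower() for tag in tags}


# Hypothetical sample, not taken from the data set.
sample = ["#news", "#СПБ", "#News", "#новости", "#usa"]
all_unique = unique_hashtags(sample)
latin_unique = {tag for tag in all_unique if is_latin_hashtag(tag)}
```

Note that a Latin-alphabet filter of this kind also keeps hashtags in other languages that use Latin script, so it only approximates "written in English".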

TF-IDF

To calculate the TF-IDF (term frequency–inverse document frequency) of each hashtag, we processed the data so that, for each day in the specified period, we had a list of all the hashtags used on that day (with repetitions). We also created a list containing every unique hashtag for the whole period.
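A minimal sketch of this computation, treating each day's list of hashtags as one document, is shown below. It uses the textbook form tf × log(N / df); the exact TF-IDF variant and the way per-day scores were combined into a single value per hashtag are not stated here, so the unsmoothed formula, the summation in `aggregate`, and all function names are assumptions.

```python
import math
from collections import Counter, defaultdict


def tfidf_per_day(daily_hashtags):
    """Compute TF-IDF per day.

    daily_hashtags: dict mapping day -> list of hashtags (with repetitions).
    Returns a dict mapping day -> {hashtag: tf-idf score}.
    """
    n_days = len(daily_hashtags)

    # Document frequency: number of days on which each hashtag appears.
    df = Counter()
    for tags in daily_hashtags.values():
        df.update(set(tags))

    scores = {}
    for day, tags in daily_hashtags.items():
        counts = Counter(tags)
        total = len(tags)
        scores[day] = {
            # Term frequency within the day times inverse document frequency.
            tag: (count / total) * math.log(n_days / df[tag])
            for tag, count in counts.items()
        }
    return scores


def aggregate(scores):
    """Assumption: combine per-day scores by summing them per hashtag."""
    totals = defaultdict(float)
    for day_scores in scores.values():
        for tag, value in day_scores.items():
            totals[tag] += value
    return dict(totals)


# Hypothetical two-day example, not taken from the data set.
daily = {
    "2016-01-01": ["#news", "#news", "#usa"],
    "2016-01-02": ["#news", "#sport"],
}
scores = tfidf_per_day(daily)
totals = aggregate(scores)
```

With this unsmoothed form, a hashtag that appears on every day gets an inverse document frequency of zero. Smoothed variants avoid this, which matters when very common hashtags are expected to rank highly.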

The results for the whole set of tweets are displayed in Figure 1. The size of each word depends on its TF-IDF value. It can be observed that the largest word is #СПБ (a Russian abbreviation for Saint Petersburg). Moreover, a large share of the largest words are Russian.

The second word is #news, followed by #новости (news), #sport, and #politics. Overall, most of the top words are fairly generic, including #music, #crime, #tech, etc. Further down the list, we find hashtags related to specific events or heavily discussed topics, for example #ДНР (Donetsk People's Republic), #Крым (Crimea), #НевскиеНовости (a news portal based in Saint Petersburg), #Украина (Ukraine), etc.

Figure 1: TF-IDF for full set of hashtags

Figure 2: TF-IDF for English set of hashtags

The same is true if we calculate TF-IDF only on hashtags written in the Latin alphabet (see Figure 2). The cloud is full of generic hashtags such as #news, #sports, #usa, #world, #local, #politics, etc.

However, further down the list we can observe hashtags such as #policebrutality, #blacktolive, #americafirst, #trump, #makeamericancreatagain, and more.

Hashtag frequencies over time

According to the plot, the activity of these accounts can be separated into several phases:

  1. 2009 - 2011: accounts were not using hashtags.
  2. January 2012 - December 2014: accounts started to use hashtags such as #usa and #news.
  3. January 2015 - December 2016: hashtag usage spikes sharply and stays high for two years. The most popular hashtags are #news, #politics, and #breaking.
  4. November 2015 (within phase 3): a sudden drop in activity across all hashtags; hashtags such as #world and #politics drop almost threefold (possibly connected to the Paris attacks or to Turkey shooting down a Russian plane).
  5. January 2017 - January 2018: accounts actively use new hashtags such as #maga (Make America Great Again) and #fakenews (which also spikes in November 2017); there is also a large spike in #makeamericagreatagain and #americafirst in June 2017.
  6. January 2018 until now: almost no activity.

The activity pattern tends to change in January.
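Frequencies of this kind can be obtained by counting hashtags per time bucket and comparing adjacent buckets. The sketch below assumes monthly buckets and input records of the form (date, hashtag); the bucket size, the record format, and the helper names `monthly_counts` and `change_ratio` are illustrative assumptions rather than the procedure behind the plot.

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_counts(records):
    """Count hashtag occurrences per calendar month.

    records: iterable of (date, hashtag) pairs.
    Returns a dict mapping (year, month) -> Counter of hashtags.
    """
    counts = defaultdict(Counter)
    for day, tag in records:
        counts[(day.year, day.month)][tag] += 1
    return counts


def change_ratio(counts, tag, prev_month, month):
    """Return count(month) / count(prev_month) for one hashtag.

    Returns None when the hashtag was not used in prev_month.
    """
    before = counts.get(prev_month, Counter())[tag]
    after = counts.get(month, Counter())[tag]
    if before == 0:
        return None
    return after / before


# Hypothetical records, not taken from the data set.
records = [
    (date(2015, 10, 3), "#world"),
    (date(2015, 10, 12), "#world"),
    (date(2015, 10, 27), "#world"),
    (date(2015, 11, 5), "#world"),
]
counts = monthly_counts(records)
ratio = change_ratio(counts, "#world", (2015, 10), (2015, 11))
```

A ratio close to one third between two consecutive months would correspond to the kind of threefold drop described for November 2015.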