Fake Tweets

Overview

Data

Twitter recently released a large dataset of tweets posted by troll accounts. The dataset is publicly available on Twitter's official page.

5.3 GB in size

10m tweets

31 variables

Network

3k nodes

~2k nodes in the largest component

160k links
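The source does not say how links were defined. As a minimal sketch, assuming a link runs from the retweeting account (`userid`) to the original author (`retweet_userid`), the link set and the largest component could be computed as follows. The function names `build_links` and `largest_component` are hypothetical, and the project may have used mentions or another relation instead.

```python
from collections import defaultdict, deque


def build_links(rows):
    """Collect unique (source, target) pairs from tweet records.

    Assumption: a link goes from `userid` to `retweet_userid`.
    Rows without a retweet target and self-retweets are skipped.
    """
    links = set()
    for row in rows:
        src, dst = row.get("userid"), row.get("retweet_userid")
        if src and dst and src != dst:
            links.add((src, dst))
    return links


def largest_component(links):
    """Return the node set of the largest weakly connected component."""
    # Ignore link direction when looking for connected components.
    neighbours = defaultdict(set)
    for src, dst in links:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen, best = set(), set()
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best
```

Treating the graph as undirected for the component search is a design choice made here for simplicity; a strongly connected component would give a smaller set.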

Variables / Features

Most used variables during the project:

Variable                  Description
tweetid                   tweet identification number (unique per tweet)
userid                    user identifier (hashed if the account has fewer than 5k followers)
user_profile_description  profile description at the time of suspension
account_language          user-specified account language
tweet_time                time the tweet was posted (UTC)
tweet_text                text of the tweet
hashtags                  a list of hashtags used in the tweet
retweet_userid            for retweets, the userid of the account that authored the original tweet
user_mentions             a list of userids mentioned in the tweet
likes_count               number of likes the tweet received

and many more...
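With 31 variables and several gigabytes of data, it can help to keep only the columns that are actually needed. The following is a minimal sketch using Python's standard `csv` module; it assumes the data is stored as CSV with a header row using the variable names listed above, and the helper name `read_selected` is hypothetical.

```python
import csv

# Columns taken from the variable list above; adjust as needed.
COLUMNS = ["tweetid", "userid", "tweet_time", "tweet_text",
           "retweet_userid", "likes_count"]


def read_selected(lines, columns=COLUMNS):
    """Yield one dict per tweet, keeping only the requested columns.

    `lines` is any iterable of CSV lines, such as an open file handle.
    Columns missing from the file are filled with an empty string.
    """
    reader = csv.DictReader(lines)
    for row in reader:
        yield {name: row.get(name, "") for name in columns}
```

Because the function is a generator, rows are processed one at a time and the whole file never has to fit in memory. It could be used with a file opened as `open(path, newline="", encoding="utf-8")`, where `path` points to a local copy of the data.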

Summary

Full graph: daily activity of the troll-network
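A daily activity series like this can be obtained by counting tweets per calendar day from `tweet_time`. The sketch below assumes timestamps look like `2016-11-08 10:00`; that format is an assumption and may need adjusting to the actual files. The function name `daily_activity` is hypothetical.

```python
from collections import Counter
from datetime import datetime


def daily_activity(tweet_times, fmt="%Y-%m-%d %H:%M"):
    """Count tweets per UTC calendar day.

    `fmt` is an assumed timestamp format, not confirmed by the source.
    Returns a list of (date, count) pairs sorted by date.
    """
    counts = Counter()
    for stamp in tweet_times:
        counts[datetime.strptime(stamp, fmt).date()] += 1
    return sorted(counts.items())
```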

Likes

Log-log plot of the like distribution across all tweets in the dataset.
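The points for such a plot can be derived by counting how many tweets share each like count and taking the base-10 logarithm of both quantities. This is a minimal sketch of that idea, not necessarily how the original figure was produced; `loglog_points` is a hypothetical helper name.

```python
import math
from collections import Counter


def loglog_points(values):
    """Return (log10(value), log10(frequency)) pairs sorted by value.

    Zero counts are dropped because they cannot be placed on a log axis.
    """
    freq = Counter(v for v in values if v > 0)
    return [(math.log10(v), math.log10(n)) for v, n in sorted(freq.items())]
```

The same function applies unchanged to retweet counts and follower counts, since it only needs a sequence of non-negative integers.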

Retweets

Log-log plot of the retweet distribution across all tweets in the dataset.

Followers

Log-log plot of the follower distribution across all accounts in the dataset.