

InstaFake Dataset

Dataset of the Instagram Fake and Automated Account Detection paper.

Installation

Install Miniconda

https://conda.io/en/latest/miniconda.html

Set up a conda environment

Create a new virtual environment named instafake:

conda create --name instafake python=3.6

Activating the virtual environment

If you are inside the virtual environment, your shell prompt should look like (instafake) user@computer:~$. If it does not, activate the virtual environment with:

conda activate instafake 

To deactivate the virtual environment, use:

conda deactivate
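To confirm from Python that the running interpreter belongs to the instafake environment, a minimal check could look like the following. It relies on conda setting the CONDA_DEFAULT_ENV variable when an environment is activated; the helper name in_conda_env is hypothetical and not part of this repository.

```python
import os
import sys


def in_conda_env(env_name="instafake"):
    """Return True if the active conda environment is `env_name`.

    Hypothetical helper: conda exports CONDA_DEFAULT_ENV on activation,
    so a missing variable means no environment is considered active.
    """
    return os.environ.get("CONDA_DEFAULT_ENV") == env_name


if __name__ == "__main__":
    # Print the interpreter version and whether instafake is active.
    print("Python version:", sys.version.split()[0])
    print("instafake environment active:", in_conda_env())
```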

Install required packages

To install the required packages, run the following command in your instafake virtual environment:

pip install -r requirements.txt

Import Datasets as Dataframes

To import the fake and automated datasets as pandas dataframes, set the dataset folder path (dataset_path, here "data") and the dataset version (dataset_version), then call import_data from utils:

from utils import import_data

dataset_path = "data"
dataset_version = "fake-v1.0"

fake_dataset = import_data(dataset_path, dataset_version)

dataset_path = "data"
dataset_version = "automated-v1.0"

automated_dataset = import_data(dataset_path, dataset_version)
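The internals of import_data are not described here. As a rough illustration only, the hypothetical load_version_folder below assumes that each version folder (for example data/fake-v1.0) holds one or more .json files, each containing a JSON list of per-account records, and concatenates them into a single pandas DataFrame. The real utils.import_data may organize or label the files differently.

```python
import glob
import json
import os

import pandas as pd


def load_version_folder(dataset_path, dataset_version):
    """Hypothetical stand-in for utils.import_data.

    Assumes <dataset_path>/<dataset_version>/ contains .json files, each
    holding a JSON list of per-account records (dicts). Files are read in
    sorted order and all records are combined into one DataFrame.
    """
    folder = os.path.join(dataset_path, dataset_version)
    records = []
    for json_file in sorted(glob.glob(os.path.join(folder, "*.json"))):
        with open(json_file, encoding="utf-8") as f:
            records.extend(json.load(f))
    return pd.DataFrame(records)
```

Under those assumptions, load_version_folder("data", "fake-v1.0")["is_fake"].value_counts() would give a quick look at the class balance.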

Dataset Structures

The dataset consists of two sets of JSON files with the following features:

Fake Account Detection
  1. user_media_count - Total number of posts the account has.
  2. user_follower_count - Total number of followers the account has.
  3. user_following_count - Total number of accounts the account follows.
  4. user_has_profil_pic - Whether the account has a profile picture.
  5. user_is_private - Whether the account is private.
  6. user_biography_length - Number of characters in the account biography.
  7. username_length - Number of characters in the account username.
  8. username_digit_count - Number of digits in the account username.
  9. is_fake - True if the account is a spam/fake account, False otherwise.
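The three length-based features above follow directly from their definitions. A minimal sketch, assuming they are computed from the raw username and biography strings (the original collection code is not shown here, and the helper name username_features is hypothetical):

```python
def username_features(username, biography=""):
    """Compute the length-based features from raw profile strings.

    username_length       -> number of characters in the username
    username_digit_count  -> number of digits in the username
    user_biography_length -> number of characters in the biography

    Digits are counted with str.isdigit, which also accepts non-ASCII
    digits; the dataset's exact rule is an assumption here.
    """
    return {
        "username_length": len(username),
        "username_digit_count": sum(ch.isdigit() for ch in username),
        "user_biography_length": len(biography),
    }
```

For example, username_features("john_doe42", "Photographer") returns a username length of 10, a digit count of 2 and a biography length of 12.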
Automated Account Detection
  1. user_media_count - Total number of posts the account has.
  2. user_follower_count - Total number of followers the account has.
  3. user_following_count - Total number of accounts the account follows.
  4. user_has_highlight_reels - Whether the account has at least one highlight reel.
  5. user_has_url - Whether the account biography contains a URL.
  6. user_biography_length - Number of characters in the account biography.
  7. username_length - Number of characters in the account username.
  8. username_digit_count - Number of digits in the account username.
  9. media_comment_numbers - Total number of comments on a given media item.
  10. media_comments_are_disabled - Whether comments are disabled on a given media item.
  11. media_has_location_info - Whether a given media item includes location information.
  12. media_hashtag_numbers - Total number of hashtags in a given media item.
  13. media_upload_times - Upload timestamps of the media items.
  14. automated_behaviour - True if the account is an automated account, False otherwise.
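Because the two sets share some features but not others, it can help to check a loaded record against the expected field names before training. The tuples below copy the names from the two lists above; the helper missing_fields is a hypothetical sketch, not part of utils.

```python
# Field names copied from the two feature lists above.
FAKE_FEATURES = (
    "user_media_count", "user_follower_count", "user_following_count",
    "user_has_profil_pic", "user_is_private", "user_biography_length",
    "username_length", "username_digit_count", "is_fake",
)

AUTOMATED_FEATURES = (
    "user_media_count", "user_follower_count", "user_following_count",
    "user_has_highlight_reels", "user_has_url", "user_biography_length",
    "username_length", "username_digit_count", "media_comment_numbers",
    "media_comments_are_disabled", "media_has_location_info",
    "media_hashtag_numbers", "media_upload_times", "automated_behaviour",
)


def missing_fields(record, expected):
    """Return the expected names absent from `record`, in list order.

    `record` can be a dict (one JSON record) or anything supporting `in`,
    such as a pandas DataFrame's `columns`.
    """
    return [name for name in expected if name not in record]
```

For instance, missing_fields(fake_dataset.columns, FAKE_FEATURES) should be empty if every listed feature was loaded.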

Dataset Metadata

The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.

property value
name InstaFake Dataset: An Instagram fake and automated account detection dataset
alternateName InstaFake Dataset
url
sameAs https://github.com/fcakyon/instafake-dataset
description The InstaFake Dataset is comprised of anonymized Instagram user data collected by Fatih Cagatay Akyon and Esat Kalfaoglu over the second half of 2018. We’re releasing this dataset publicly to aid the research community in making advancements in machine learning based social media analysis.
provider
  property value
  name Fatih C. Akyon and Esat Kalfaoglu
  sameAs https://scholar.google.com.tr/citations?user=RHGyDE0AAAAJ&hl=en
license
  property value
  name Attribution-NonCommercial
  url
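Google Dataset Search discovers datasets through schema.org Dataset markup, usually embedded in a web page as JSON-LD. As a sketch, assuming you host such a page, the metadata table could be turned into JSON-LD with the standard json module. The function name dataset_jsonld is hypothetical, nesting provider and license as objects is an assumption, and the table's empty url fields are left out rather than guessed.

```python
import json


def dataset_jsonld():
    """Build schema.org Dataset JSON-LD from the metadata table values.

    Only values present in the table are used; the empty url entries
    are omitted instead of being filled in.
    """
    metadata = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": ("InstaFake Dataset: An Instagram fake and automated "
                 "account detection dataset"),
        "alternateName": "InstaFake Dataset",
        "sameAs": "https://github.com/fcakyon/instafake-dataset",
        "description": (
            "The InstaFake Dataset is comprised of anonymized Instagram user "
            "data collected by Fatih Cagatay Akyon and Esat Kalfaoglu over "
            "the second half of 2018. We’re releasing this dataset publicly "
            "to aid the research community in making advancements in machine "
            "learning based social media analysis."
        ),
        # Assumption: provider and license expressed as nested objects.
        "provider": {
            "name": "Fatih C. Akyon and Esat Kalfaoglu",
            "sameAs": "https://scholar.google.com.tr/citations?user=RHGyDE0AAAAJ&hl=en",
        },
        "license": {"@type": "CreativeWork", "name": "Attribution-NonCommercial"},
    }
    # ensure_ascii=False keeps the typographic apostrophe readable.
    return json.dumps(metadata, indent=2, ensure_ascii=False)
```

The resulting string would go inside a script element of type application/ld+json on the dataset's page.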
