justintimperio / gdelt-diff

An Automated File Manager for Maintaining a Local Copy of GDELT Source Files

License: MIT License

GDELT-Diff

Abstract

This small tool is designed to automate the download, organization, and storage of GDELT source files. GDELT-Diff includes a daemon that runs every 60 minutes, fetching any new or missing files and sorting them into folders for easy storage. Additionally, an extremely lightweight tool is provided to maintain a copy of only the streams' most recent files in /tmp/gdelt-live. This is for anyone who is doing real-time analysis of the GDELT and doesn't require a full copy of the source files.
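
The idea of fetching only "new or missing" files can be illustrated as a set difference between a remote file list and the names already on disk. The following is a minimal sketch, not GDELT-Diff's actual implementation: the function name `missing_urls` is hypothetical, and it assumes a master list in which each line holds a size, a hash, and a URL separated by whitespace. The sizes and hashes in the sample data are placeholders.

```python
from typing import Iterable, List, Set


def missing_urls(master_lines: Iterable[str], local_names: Set[str]) -> List[str]:
    """Return the URLs in a master list whose file name is not held locally.

    Assumes each line looks like "<size> <hash> <url>" (an assumption
    about the list format, not something this README specifies).
    """
    missing = []
    for line in master_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        url = parts[2]
        name = url.rsplit("/", 1)[-1]  # file name is the last URL segment
        if name not in local_names:
            missing.append(url)
    return missing


# Placeholder sizes and hashes; only the URL shape matters here.
sample_master = [
    "1000 aaaa http://data.gdeltproject.org/gdeltv2/20150218230000.export.CSV.zip",
    "2000 bbbb http://data.gdeltproject.org/gdeltv2/20150218230000.translation.export.CSV.zip",
]
already_synced = {"20150218230000.export.CSV.zip"}

for url in missing_urls(sample_master, already_synced):
    print(url)
```

Comparing by file name rather than by full path keeps the check independent of where the local copy is stored.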

What is the GDELT?

The GDELT Project is the largest, most comprehensive, and highest resolution open database of human society ever created. Just the 2015 data alone records nearly three quarters of a trillion emotional snapshots and more than 1.5 billion location references, while its total archives span more than 215 years, making it one of the largest open-access spatio-temporal datasets in existence and pushing the boundaries of "big data" study of global human society. Advanced users and those with unique use cases can download the entire underlying event and graph datasets in CSV format. Deep technical knowledge and extensive experience working with large datasets is required to make use of these datasets, with the GKG alone requiring more than 2.5TB of storage compressed.

To learn more about the GDELT and the records that make up its database, check out the official documentation page.

Install Instructions

NOTE: This utility is designed for large servers with a MINIMUM +100GB OS Drive, +10TB of storage, and +32GB of RAM. Also, please consider how many files you need to sync before running.

  1. If you have a pre-existing directory of GDELT files, YOU MUST ensure that the files are organized into folders by stream, year, and month (/path/stream/2015/05/).
  2. Install GDELT-Diff:
curl https://raw.githubusercontent.com/JustinTimperio/gdelt-diff/master/build/install.sh | bash
  3. Edit Your User Config File With the Paths You Wish to Use:
sudo vi /etc/gdelt-diff/config
  4. Manually Run GDELT-Diff to Ensure Everything Is Set Up:
sudo gdelt-diff -d
  5. Enable Automatic Downloads With:
sudo systemctl enable gdelt-diff.timer
  6. Enable Automatic Live Downloads With:
sudo systemctl enable gdelt-live.timer
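
If you need to reorganize an existing directory into the stream/year/month layout required above, the target folder for a file can be derived from its name. This is a rough sketch under stated assumptions: it assumes file names begin with a 14-digit YYYYMMDDHHMMSS timestamp, the helper name `target_dir` is hypothetical, and the stream name "english" is borrowed from the CLI flag names rather than confirmed as the folder name the tool expects.

```python
from pathlib import PurePosixPath


def target_dir(root: str, stream: str, filename: str) -> PurePosixPath:
    """Build root/stream/YYYY/MM from a timestamp-prefixed file name.

    Assumes the first 14 characters are a YYYYMMDDHHMMSS timestamp
    (an assumption about naming, not taken from this README).
    """
    stamp = filename[:14]
    if len(stamp) != 14 or not stamp.isdigit():
        raise ValueError(f"no leading timestamp in {filename!r}")
    year, month = stamp[:4], stamp[4:6]
    return PurePosixPath(root) / stream / year / month


# "english" is an assumed stream folder name, used for illustration only.
print(target_dir("/path", "english", "20150518230000.export.CSV.zip"))
# prints /path/english/2015/05
```

Check the paths in /etc/gdelt-diff/config before moving any files, since the real stream folder names may differ from the ones assumed here.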

Uninstall GDELT-Diff:

This will NOT remove the files you have downloaded

sudo /opt/gdelt-diff/build/remove.sh

CLI-Tool

To use the utility manually, simply stop the systemd timers and call gdelt-diff directly:

sudo gdelt-diff --diff

To sync only one stream use:

sudo gdelt-diff --diff_english

OR

sudo gdelt-diff --diff_translation

To force a fetch of 404'd URLs use:

sudo gdelt-diff --retry

To refresh the database of synced files:

sudo gdelt-diff --refresh_database
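
Conceptually, refreshing a record of synced files amounts to rescanning the storage directory and rebuilding the set of file names found there. The sketch below illustrates that idea only; it is not the tool's own code, the function name `scan_synced_files` is hypothetical, and the format of GDELT-Diff's actual database is not described in this README.

```python
import os
from typing import Set


def scan_synced_files(root: str) -> Set[str]:
    """Walk root recursively and collect every file name found.

    Illustrative only: the real database format used by GDELT-Diff
    is not documented here.
    """
    names: Set[str] = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        names.update(filenames)
    return names
```

A set of names like this is the kind of input a "what is missing?" comparison against a remote file list would need.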

To see all options and flags:

sudo gdelt-diff -help
