pachadotdev / tradestatistics-database-postgresql

Tidy trade data from UN COMTRADE and also countries, commodities, RTAs and tariffs tables. Uses RDS and Apache Arrow, then uploads to PostgreSQL.

License: Apache License 2.0

R 100.00%
apache-arrow comtrade postgresql r trade

tradestatistics-database-postgresql's Introduction

UN COMTRADE Datasets in Arrow Parquet

Source the file 00-download-data.R. The script asks whether you have already configured the environment variable for your UN COMTRADE token.
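As a minimal sketch of that setup, the snippet below sets and checks a token from R. The variable name `COMTRADE_TOKEN` and the placeholder value are assumptions for illustration only; check 00-download-data.R for the name the scripts actually read.

```r
# "COMTRADE_TOKEN" is a hypothetical variable name, not taken from the scripts.
token_var <- "COMTRADE_TOKEN"

# Option 1: set the token for the current R session only.
Sys.setenv(COMTRADE_TOKEN = "your-token-here")

# Option 2 (persistent): add a line such as
#   COMTRADE_TOKEN=your-token-here
# to ~/.Renviron and restart R.

# Check that the variable is visible before sourcing the download script.
token_is_set <- nzchar(Sys.getenv(token_var))

if (token_is_set) {
  message("Token found. You can now run: source(\"00-download-data.R\")")
} else {
  message("No token found. Set the environment variable first.")
}
```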

What does the script do?

It lets you select and download the complete yearly records under different trade classifications:

  1. HS rev 1992 (1992-2020)
  2. HS rev 1996 (1996-2020)
  3. HS rev 2002 (2002-2020)
  4. HS rev 2007 (2007-2020)
  5. SITC rev 1 (1962-2020)
  6. SITC rev 2 (1976-2020)
  7. SITC rev 3 (1988-2020)
  8. SITC rev 4 (2007-2020)

The downloaded files are then saved locally in both ZIP (as they come from UN COMTRADE) and Parquet formats.

How is this done?

The scripts complete the following steps:

  1. A prompt asks whether you have already obtained and saved a token for downloading files from UN COMTRADE. You are then asked which classification you want to download (e.g. HS92), and whether to replace local files that are older than the versions available from UN COMTRADE at the time the scripts run.
  2. A CSV file listing the downloaded files is saved locally (e.g. see LINK). It records the local download date, the date the file was uploaded to UN COMTRADE, and the download link.
  3. Another CSV file, containing only the subset of files that were updated, is saved locally (e.g. see LINK).
  4. If you chose to replace old files, the old ZIP and Parquet files for the affected years are replaced with new ones. The Parquet files are created by extracting the CSV file for each year from the ZIP and reading it to save a Parquet version with minimal editing (just replacing NAs with "0-unspecified" for the hive-style partitioning). The CSV files, which are extracted one at a time, are deleted after the Parquet files for that year are created.
  5. If the Parquet files for a given year are not present, the scripts create them with the same changes as in point (4).
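The steps above can be sketched in a few lines of R. This is an illustration under stated assumptions, not the repository's actual code: the column names, the toy data and the helper `fill_partition_nas()` are all hypothetical. The Parquet write only runs if the `arrow` package is installed.

```r
# Hypothetical download log; column names are assumptions for illustration.
download_log <- data.frame(
  year = c(2018L, 2019L, 2020L),
  local_download_date = as.Date(c("2021-03-01", "2021-03-01", "2021-03-01")),
  comtrade_upload_date = as.Date(c("2021-01-15", "2021-04-10", "2021-05-02"))
)

# A local file is outdated when UN COMTRADE uploaded a newer version
# after the local copy was downloaded (steps 1 to 3).
outdated <- download_log[
  download_log$comtrade_upload_date > download_log$local_download_date,
]

# Hypothetical helper: replace NAs in the partition columns so that every
# row has an explicit partition value (step 4).
fill_partition_nas <- function(d, cols, label = "0-unspecified") {
  for (col in cols) {
    values <- as.character(d[[col]])
    values[is.na(values)] <- label
    d[[col]] <- values
  }
  d
}

# Toy stand-in for one year's extracted CSV (hypothetical columns).
trade <- data.frame(
  reporter_iso = c("chl", NA, "per"),
  commodity_code = c("0101", "0102", "0101"),
  trade_value_usd = c(100, 250, 75),
  stringsAsFactors = FALSE
)
trade <- fill_partition_nas(trade, "reporter_iso")

# Write hive-style partitions (directories named reporter_iso=<value>).
# Skipped when the arrow package is not available.
if (requireNamespace("arrow", quietly = TRUE)) {
  out_dir <- file.path(tempdir(), "parquet-sketch")
  arrow::write_dataset(
    trade, out_dir,
    format = "parquet",
    partitioning = "reporter_iso",
    hive_style = TRUE
  )
}
```

In this toy log, only the years whose upload date is later than the local download date end up in `outdated`, so only those years would be downloaded and converted again.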

Required free disk space

The required space depends on the classification. For example, SITC rev 2 (1976-2020) needs about 20GB of free space during the process and about 18GB afterwards:

  • 7.1GB for the downloaded ZIP files
  • 10.6GB for the Parquet files, which are created year by year
  • Up to 2.7GB for each extracted CSV file, needed only while that year's Parquet files are created
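The totals follow from the three figures above. Here is a short worked check in R, using only the numbers quoted for SITC rev 2 (the variable names are illustrative):

```r
# Figures quoted above for SITC rev 2, in GB.
zip_gb <- 7.1          # downloaded ZIP files
parquet_gb <- 10.6     # Parquet files
largest_csv_gb <- 2.7  # largest single extracted CSV (temporary)

# Peak usage: ZIP and Parquet files plus one extracted CSV at a time.
peak_gb <- zip_gb + parquet_gb + largest_csv_gb

# Usage afterwards: the extracted CSV is deleted once its Parquet files exist.
final_gb <- zip_gb + parquet_gb

cat(sprintf("Peak: %.1f GB, afterwards: %.1f GB\n", peak_gb, final_gb))
# prints "Peak: 20.4 GB, afterwards: 17.7 GB"
```

Rounded, these give the 20GB and 18GB figures stated above.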

Notes

The scripts require 7-Zip to be installed.
