Giter Site home page Giter Site logo

parse_ghcn_data's Introduction

GHCN climate data

The Global Historical Climatology Network (GHCN) has daily weather data from 75,000 weather stations in 180 countries and territories (see https://www.ncdc.noaa.gov/oa/climate/ghcn-daily/). The data are in text files with ".dly" extension. Unfortunately, .dly files are written in an arcane format that is extremely difficult to use. Several tools exist to download and view data, for example:

However, FedData seems to drop the measurement, quality and source flags, and the data are still in a somewhat awkward format. I don't own Matlab, and I was unable to get the Python script to work due to an installation problem with missing libraries in a Python virtual environment.

By contrast, this R function is very simple (and R is free). You pass it the name of a .dly file, it parses the file and loads it into a regular R dataframe that retains all of the original information and can easily be saved as a .csv file. The function is designed for a single file, but it would be trivial to automate it to process a batch of files.

Note: you should have the R library "reshape2" installed (it is a popular library for manipulating dataframes).

Finding weather stations and downloading files:

Download the list of GHCN stations from ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt. Then read the list into R, select stations in your area of interest, and download them. This method is dead simple and there are much slicker alternatives, but it's not hard to use and you can see exactly what is happening. The FedData library has a more sophisticated version that allows you to choose years, variables and extent, but it missed some stations when I gave it a bounding box.)

#The schema of the station data, from readme.txt.  
station_schema <- read.table(text = 
  "ID           1-11   Character
  LATITUDE     13-20   Real
  LONGITUDE    22-30   Real
  ELEVATION    32-37   Real
  STATE        39-40   Character
  NAME         42-71   Character
  GSNFLAG      73-75   Character
  HCNFLAG      77-79   Character
  WMOID        81-85   Character", 
  header = FALSE, stringsAsFactors = FALSE)

#Read in the GHCN list of stations from It's big (100,000 lines).
stations<-read.fwf('ghcnd-stations.txt',
                widths = c(11,9,10,7,3,31,4,4,6),stringsAsFactors=F,
                strip.white=TRUE,comment.char="",col.names = tolower(station_schema$V1))

#Find the stations within a bounding box (alternatively could use the 'sp' package to make this spatial, overlay polygons, etc.).
studyarea_stations<-stations[stations$latitude > 41.44 & stations$latitude < 45.97453 & stations$longitude > 104.3587 & stations$longitude < 111.9546,]
#Get the station ids
id_list<-studyarea_stations$id

#Download the files
library(RCurl)
source("get_ghcn.R") #load the function (add filepath if necessary)
FTP<-"ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/all/"
get_ghcn(FTP,id_list) #will put them in your working directory

Convert a GHCN daily climate file to a useable .csv file

Let's assume that you have downloaded a file called "MBM00044354.dly" from the daily FTP directory (either /all or /by_year) into your working directory. The file contains all weather records for the station. Now use the function to convert the data to a reasonable format.

source("parse_GHCN_data.R") #load the function (add filepath if necessary)
#Parse the file
fname<-'MGM00044354.dly' 
df<-parse_GHCN_data(fname)
#Write out the dataframe as a csv file
write.csv(df,"MGM00044354.csv",row.names=FALSE,na="")

That's all there is to it. The main variables are:

  • PRCP = Precipitation (tenths of mm)
  • SNOW = Snowfall(mm)
  • SNWD = Snow depth (mm)
  • TMAX = Maximum temperature (tenths of degrees C)
  • TMIN = Minimum temperature (tenths of degrees C)

but data from some weather stations include many other variables. See GHCN readme for more information, at ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt. If you're only interested in one variable, just use regular R subsetting to get it: {r, eval=FALSE}rain<-df[df$variable=="PRCP",] but it's kind of nice having the variables together in the right format so that you can plot them against each other.

parse_ghcn_data's People

Contributors

patagonian avatar jcpayne avatar

Stargazers

Matt Brousil avatar César Alejandro Martínez Ortíz avatar Matúš Chládek avatar

Watchers

James Cloos avatar

Forkers

mbrousil

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.