lterwg-som's Introduction

Soil Organic Matter

Repository for the Soil Organic Matter Synthesis working group

Principal Investigators

Will Wieder and Kate Lajtha

Award Date: September 20, 2017

Description

Soil organic matter is a massive storehouse for carbon, as well as a key regulator of nutrient cycling and soil quality in terrestrial ecosystems, yet ecology lacks a full understanding of the controls on the stabilization and breakdown of soil organic matter. Two competing sets of theories underlie models that adequately predict site-specific dynamics but make different predictions about the response of soil organic matter to perturbations.

Website

https://lter.github.io/som-website/

lterwg-som's People

Contributors

avnimalhotra, brunj7, j-am-moore, katerina-georgiou, msleckman, piersond, srearl, srweintraub, wwieder

lterwg-som's Issues

add min and max values to key file

Add expected min and max values to the key file, ultimately to be pulled out into a separate lookup table or other resource (but initially added to the key file v2 template for convenience). This will become a QC step in the homogenization script.

add tarball column to filter data by "control" = Y/N

An existing function may be useful. However, it may be simpler to work directly on the tarball: use if statements to look for the "ctl" value in the treatment levels and, if it is found in any of them, set the "control" column to YES, otherwise NO.
Wondering if I had a reason not to do it this way originally?
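
The tarball-side if-statement approach might look like the following minimal Python sketch. The level column names (`L1`, `L2`, `L3`) and the literal `"ctl"` are assumptions standing in for the actual treatment-level columns in the tarball:

```python
def flag_control(row, level_cols=("L1", "L2", "L3")):
    """Set the control column to YES if any treatment level equals 'ctl'.

    level_cols is a hypothetical list of treatment-level column names;
    substitute the columns actually present in the tarball.
    """
    levels = (str(row.get(col, "")).lower() for col in level_cols)
    row["control"] = "YES" if "ctl" in levels else "NO"
    return row

rows = [{"L1": "ctl", "L2": "n_add"}, {"L1": "burn", "L2": "n_add"}]
rows = [flag_control(r) for r in rows]
# rows[0]["control"] == "YES"; rows[1]["control"] == "NO"
```

The same logic would translate directly to a vectorized column operation on the full tarball.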

some hands-on management may be appropriate

Hi @piersond - I reverted the pierson branch back to its state before I added comments, which I should not have done in the code, so am replying in an issue.

Regarding... By "manually", do you mean adjusting the files in the directory to only contain the ones we need (i.e., the directory should contain two files: the keykey and the csv or sheet)? I think this is a good approach and will be simpler in the long run.

Yeah, I think going into these directories and doing some hands-on management if and as needed is probably a reasonable approach. It is less efficient, for sure, but I worry that simply running these scripts without any sight of or feel for the source data could create problems. The reality is that we cannot possibly address with scripts all of the craziness we will encounter, so we are almost guaranteed to be doing some manual work anyway.

key-key consistency

There seem to be several key-key files out there that use slightly different variable names; e.g., some of the NEON data have c_tot and soc instead of lyr_c_tot and lyr_soc, see example here.
Is this common across many datasets? How can we check and throw an error when a dataset doesn't use the variable names we're trying to align on?
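
One way to catch this early would be to validate key-file variable names against the controlled vocabulary before homogenizing. A minimal Python sketch; the vocabulary shown is a tiny illustrative excerpt, not the real list from the key template:

```python
# Excerpt only -- the real set would be read from the key file v2 template.
CONTROLLED_VARS = {"lyr_c_tot", "lyr_soc", "lyr_n_tot"}

def check_key_vars(key_vars):
    """Raise with a list of offending names when a key file uses
    variables outside the controlled vocabulary."""
    unknown = sorted(set(key_vars) - CONTROLLED_VARS)
    if unknown:
        raise ValueError(f"unrecognized variable names in key file: {unknown}")

check_key_vars(["lyr_c_tot", "lyr_soc"])   # passes silently
# check_key_vars(["c_tot", "soc"])         # would raise with ['c_tot', 'soc']
```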

NEON Litterfall

NEON litterfall data were homogenized, but additional complications remain:

  • ANPP is actually just fine litterfall, not biomass increment, for woody biomes;
  • ANPP was already converted automatically from gDW/m2/y (given units) to gC/m2/y (target units) by multiplying values by 0.5;
  • ANPP: not sure how grazing was or was not factored in here (@srweintraub )?;
  • ANPP should be summed across functionalGroup (L3) for each trapID (L2);
  • Chemistry is measured just once per year (@ max litterfall, and only for needles and leaves), but litter fluxes are collected regularly throughout the growing season;
  • Chemistry is provided as %, not converted in homogenization!
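
The summing and unit-conversion bullets could be sketched as below (Python, for concreteness); the column names `trapID`, `functionalGroup`, and `mass_gdw_m2_y` are stand-ins for the actual NEON field names, and 0.5 is the carbon fraction already noted above:

```python
from collections import defaultdict

# Sum fine-litterfall mass across functional groups within each trap,
# then convert gDW/m2/y to gC/m2/y with the 0.5 carbon fraction.
obs = [
    {"trapID": "T1", "functionalGroup": "needles", "mass_gdw_m2_y": 120.0},
    {"trapID": "T1", "functionalGroup": "twigs",   "mass_gdw_m2_y": 30.0},
    {"trapID": "T2", "functionalGroup": "leaves",  "mass_gdw_m2_y": 200.0},
]

anpp_gc = defaultdict(float)
for o in obs:
    anpp_gc[o["trapID"]] += o["mass_gdw_m2_y"] * 0.5

# anpp_gc == {"T1": 75.0, "T2": 100.0}  (gC/m2/y per trap)
```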

soc vs. c_tot

How are these two variables in our homog data similar / different?

get_latest_som broken?

I'm not going to spend a ton of time on this, but it seems like this function may be broken (error message below).
Easy enough for me to just pull down the latest tarball, but how should others access it? I'm guessing it would be through EDI once the database is published?

Error in add_id_path(nodes, root_id = root_id, leaf = leaf) : !anyDuplicated(nodes$id) is not TRUE

rename master to main

Trying to rename master to main with the steps below, but I'm unable to remove the master branch.
I also don't want to mess with others' workflows...
@srearl can you advise?

git branch -m master main
git push -u origin main
# switch the default branch to main in the GitHub repo settings first;
# GitHub will not let you delete the default branch. Then:
git push origin --delete master
git branch -u origin/main main

Fix time series format and units

Currently the time series data are spread across observation_year, observation_year_1, and observation_year_2. A further issue: the formats of these data are all over the place. They need to be converted to a unified format, perhaps as separate columns for year, month, day, and full date.

@piersond wrote a script to quickly fix the time series data formats. It may be a starting place, but it is mostly a hack to let the group work during the week in Santa Barbara.
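
A more durable version might normalize each raw date against a list of known formats and split it into year/month/day columns. A Python sketch; the format list is intentionally short and illustrative and would grow as new variants turn up in the source data:

```python
from datetime import datetime

# Illustrative, not exhaustive -- extend as new date variants appear.
FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%Y"]

def split_date(raw):
    for fmt in FORMATS:
        try:
            dt = datetime.strptime(str(raw).strip(), fmt)
        except ValueError:
            continue
        # Year-only records get NA month/day rather than a fake January 1.
        if fmt == "%Y":
            return {"year": dt.year, "month": None, "day": None}
        return {"year": dt.year, "month": dt.month, "day": dt.day}
    return {"year": None, "month": None, "day": None}  # unparseable: flag for review

split_date("2014-07-03")  # {'year': 2014, 'month': 7, 'day': 3}
split_date("1998")        # {'year': 1998, 'month': None, 'day': None}
```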

reset the repo

Hi @piersond - Sorry, I meant to write earlier but was in meetings all morning. As I began going through your additions/edits, it struck me that we should really be working off branches and doing this correctly with a proper code review. I think we need to take a mulligan and start over; fixing this will be a bit of a mess, but we will get back on track and it will be better going forward. I still need to work in the units conversion, and we have the file upload and so much yet to integrate.

I put all of the work that you had addressed in the first pull request in a branch called pierson. Unfortunately, I had to squash all of the commits in the process. Getting that code back to a point where we can re-merge that work will be a bit of a hassle, but once done, we should be good. That will go something like the following, where the steps refer to a general workflow further down the message.

  1. in GitHub rename your old forked repo under settings
  2. refork lterwg-som
  3. clone the new repo (that you forked)
  4. checkout pierson branch
  5. copy data processing folder to local computer (Desktop or wherever)
  6. checkout master
  7. create new feature branch (see step 4 and 0c)
  8. copy files from local computer (Desktop or wherever) to directory of your new branch
  9. continue at step 5

Generally, the work flow should then look something like:
  • General rule: never commit or merge work into local/master.
  • General rule: always synchronize your local/master with upstream/master before branching for new feature work.
  • General rule: always create a new branch from your synchronized origin/master to commit your new code changes. (Suggestion: name your new feature branch piersond-what-this-fixes.)

  1. Make sure local/master is current with origin/master. (origin is your Github repository, not LTER's)
  2. Synchronize your local/master with upstream (LTER)/master. (Follow these instructions: https://help.github.com/articles/syncing-a-fork/)
    If you do this correctly (never commit your changes into local and origin/master), you will always just Fast-Forward your origin up to current with LTER master.
  3. Push local/master to your origin/master (your Github account).
  4. Checkout a new branch from local/master for your new code work
  5. Commit code changes until you are done and ready to push back to LTER.
  6. Checkout local/master and synchronize with upstream/master again. (In case LTER code changes have occurred since the last time you synchronized.)
  7. Merge local/master into your feature branch. (If any file changes were pulled down from upstream.)
  8. Submit Pull Request to merge your feature branch into LTER/master.

Once PR is reviewed and merged into LTER/master, synchronize your local/master with upstream/master to get all your updated changes. Then push your updated local/master to origin/master, so your Github repo is now up-to-date with LTER's.
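
Steps 1-4 of the cycle can be walked through in a throwaway sandbox. The block below stands up fake upstream and origin repositories purely for illustration ("upstream" plays the LTER repo, "origin" your fork; the branch name is the suggested example), then runs the sync-and-branch sequence:

```shell
set -e
# --- sandbox setup (stand-ins for the real repositories) ---
work=$(mktemp -d)
git init -q --bare "$work/upstream.git"
git init -q --bare "$work/origin.git"
git clone -q "$work/origin.git" "$work/local"
cd "$work/local"
git remote add upstream "$work/upstream.git"
git -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "initial commit"
git branch -M master
git push -q origin master
git push -q upstream master

# --- steps 1-4 of the workflow ---
git fetch -q upstream                        # bring down LTER changes
git checkout -q master
git merge -q --ff-only upstream/master       # fast-forward only; never commit to master
git push -q origin master                    # fork's master now matches LTER's
git checkout -q -b piersond-what-this-fixes  # feature branch for new work
git branch --show-current
```

If the `--ff-only` merge ever refuses to run, that is the signal that commits were made directly on master, which the general rules above are meant to prevent.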

NEON site with no name?

I've found a NEON site or sites with no "site_name" or "location_name." We need a way to trace this back to a keykey and data file.

Update givenUnits in keyV2 to match unit conversion in soilHarmonization

There are currently many mismatches. This does not appear to be a homogenization issue, simply a matter of providing the correct information to users across all tables/platforms.

Use unitsConversions %>% print(n = Inf) with the pkg loaded to see the units conversions that the soilHarmonization pkg is using.

Also need to update the units in the key table we're using for the Shiny app.

aligning L1 data?

How are we going to align data within sites?

These are most likely data that were read in from different files and linked via key-key files, such that soil C and productivity measured in the same 'plots' end up in different rows of our homogenized (L1) data. Do these get merged when we bring together the big tarball, or in post-processing?

Aligning data

This follows up on closed #8,

I have renewed terror about how best to handle datasets that need to be aligned.
CDR and HRV are two huge sites with lots of manipulations and datasets where this will need to happen (KNZ is similar, but to a lesser extent).

@srearl you wrote a really slick script to handle this for NutNet (zipped up in that directory). Would similar pre-alignment of the raw data be preferable to trying to do so within the tarball?

modified at timestamp to key file?

Considering ways in which we might keep track of if and when a key file was modified, and use that in deciding whether a data package should be homogenized.

givenUnits for Location data

Is there a standardized conversion to givenUnits for user defined data in locations tab of the key-key file (e.g. mm for precipitation, m for elevation, etc)? Are these conversions being done in the homog script?

experimental treatments

I also realized that when experimental treatment levels are present, we don't have any clear way to distinguish what kind of treatment these are from information in the merged files. This may be in the HMGZD notes, but it's not immediately clear when we query the tarball...

web site

add authorship and projects tab to website (abstract for 'proposed' papers)

test shiny app for data entry

Create a proof of concept of how it could work with Shiny.

Julien will ask a data intern to work on a first pass and then sync with Dereck

NEON Roots & productivity

@srweintraub, maybe you already knew this, but I'm a little slow. It looks like another step is needed to match variables from multiple sheets / measurements across root and litterfall data from NEON. There's an example of this in your code that generates the 'master_database' here.

# Each dataframe has few actual data variables, with lots of metadata

Seems like ANPP and root biomass estimates may be pretty useful, and they are thus far sparse in the database. Would you be willing to help out with this one?

units homogenization error

Suggestions on this error? Error in googleDirData[[i]][[dataCol]] * PDU_UCP[PDU_UCP$var == dataCol, : non-numeric argument to binary operator in HF/Prospect_hill warming

I'm getting it in the SWaN directory for Harvard Forest too.

negative SOC stocks

There are 154 observations of negative SOC stocks. Which sites are these? Can we fix it?
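
To answer "which sites are these?", the negative observations could be tallied by site so the offending source datasets can be traced. A Python sketch; `site_code` and `soc_stock` are assumed column names:

```python
from collections import Counter

# Tally negative SOC-stock observations per site (toy rows shown;
# in practice these would be the tarball records).
rows = [
    {"site_code": "HFR", "soc_stock": 4.2},
    {"site_code": "CDR", "soc_stock": -0.3},
    {"site_code": "CDR", "soc_stock": -1.1},
]
negatives = Counter(r["site_code"] for r in rows
                    if r["soc_stock"] is not None and r["soc_stock"] < 0)
# negatives == Counter({"CDR": 2})
```

Negative stocks most often point upstream, to sign errors, bad unit conversions, or layer top/bottom swaps in the source data, so grouping by site is a useful first triage step.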

EDIutils 0.0.0.9000

Hello,

EDIutils has undergone a major refactor for submission to rOpenSci and CRAN. This new and improved version covers the full data repository REST API, handles authentication more securely, better matches API call and result syntax, improves documentation, and opens the door for development of wrapper functions to support common data management tasks. In the process of this refactor the function names and call patterns have changed and several functions supporting other EDI R packages have been removed, thereby creating backward-compatibility-breaking changes with the previous major release (version 1.6.1). The previous version will be available until 2022-06-01 on the deprecated branch. Install the previous version with:

remotes::install_github("EDIorg/EDIutils", ref = "deprecated")

EDIutils functions used in your code and suggested replacements

  • Replace api_get_provenance_metadata() with get_provenance_metadata()

Please let me know if you have any questions,
@clnsmth

rendering latex

I noticed that tinytex doesn't like to render δ in 13C or 15N data. I've changed the Var_long column to replace δ with a d to avoid this issue; seems like we'll be OK, but maybe it's worth modifying master going forward? @srearl

! Package inputenc Error: Unicode character δ (U+3B4)
(inputenc) not set up for use with LaTeX.
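
An alternative to rewriting δ as "d" would be teaching inputenc about the character. Assuming a pdflatex build with the standard utf8 inputenc setup, one preamble line maps U+03B4 to math-mode delta:

```latex
% Map the Unicode character δ (U+03B4) to math-mode \delta,
% so labels like δ13C render instead of triggering the inputenc error.
\DeclareUnicodeCharacter{03B4}{\ensuremath{\delta}}
```

This keeps the original δ in the data while fixing only the rendering, so Var_long would not need to be edited.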

Tarball errors, location & site code data

Location data are missing from BES, SSCZO, and an undetermined NEON site (and dataset).
Site code is missing from New England Pastures and the same undetermined NEON site.
NE Pastures also has no network.

additional calculations

Can we provide some simple scripts for users to fill in data in the tarball or in the smaller datasets they pull down from the Shiny app? These could include:

  • Climate data (from LTER site info, or based on lat-lon & world clim or other datasets),
  • Calculated variables (C:N, stocks [when BD and lyr_soc are provided], layer_mid [from layer_top & _bottom], others)?

I'm inclined to merge these calculated data into the column where they belong, to avoid a proliferation of columns, but maybe we should provide an additional 'flag' to identify where we calculated these values from the raw data provided.

It would be good to discuss the workflow for this and when (or if) users would interface with these tools.
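
The calculated-variables bullet, with the 'flag' idea, could be sketched like this in Python. The column names (`layer_top`, `layer_bot`, `bd_samp`, `lyr_soc`) and units (depths in cm, bulk density in g/cm3, lyr_soc in percent C, stock in kg C/m2) are assumptions to be checked against the actual key file:

```python
def add_calculated(row):
    """Fill layer_mid and SOC stock when the inputs are present,
    flagging derived values so they are distinguishable from
    provider-reported ones."""
    top, bottom = row.get("layer_top"), row.get("layer_bot")
    if top is not None and bottom is not None:   # 'is not None' so a 0-cm top counts
        row["layer_mid"] = (top + bottom) / 2
    bd, soc = row.get("bd_samp"), row.get("lyr_soc")
    if None not in (top, bottom, bd, soc):
        depth_cm = bottom - top
        # g/cm3 * cm * fraction C = g C/cm2; * 10 converts to kg C/m2
        row["lyr_soc_stock"] = bd * depth_cm * (soc / 100) * 10
        row["soc_stock_calc_flag"] = True        # derived, not provider-reported
    return row

row = add_calculated({"layer_top": 0, "layer_bot": 10,
                      "bd_samp": 1.2, "lyr_soc": 2.5})
# row["layer_mid"] == 5.0; row["lyr_soc_stock"] == 3.0 (kg C/m2)
```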

QC modifications

Do we need tighter quality control for data generated by the homog script, to avoid subsequent analysis issues?

  • non-decimal degrees in lat-lon

  • strings and weird characters

  • force numerical data types where expected
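
The "force numerical data types" check could work by coercing each value and collecting the failures for review rather than letting them crash later arithmetic (e.g., the non-numeric-argument error seen in unit conversion). A minimal Python sketch; note that degree-minute latitude strings land in the failure list, which also surfaces the non-decimal-degrees problem:

```python
def coerce_numeric(values):
    """Coerce values to float; unparseable entries become None and are
    returned separately so they can be inspected and fixed upstream."""
    clean, bad = [], []
    for v in values:
        try:
            clean.append(float(v))
        except (TypeError, ValueError):
            clean.append(None)
            bad.append(v)
    return clean, bad

clean, bad = coerce_numeric(["3.2", "n/a", "41 30' N", "7"])
# clean == [3.2, None, None, 7.0]; bad == ["n/a", "41 30' N"]
```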
