lter / lterwg-som Goto Github PK

View Code? Open in Web Editor NEW

8.0 19.0 6.0 98.69 MB

Soil Organic Matter Synthesis working group

Home Page: https://lter.github.io/som-website/

R 9.90% HTML 90.10%

lter-science synthesis-working-group

lterwg-som's Introduction

Soil Organic Matter

Repository for the Soil Organic Matter Synthesis working group

Principal Investigators

Will Wieder and Kate Lajtha

Award Date: September 20, 2017

Description

Soil organic matter is a massive storehouse for carbon, as well as a key regulator of nutrient cycling and soil quality in terrestrial ecosystems, yet ecology lacks a full understanding of the controls on stabilization and breakdown of soil organic matter. Two sets of competing theories underlie models that adequately predict site-specific dynamics, but result in different sets of predictions about the response of soil organic matter to perturbations.

Website

https://lter.github.io/som-website/

lterwg-som's People

Contributors

Stargazers

Watchers

Forkers

msleckman brunj7 katerina-georgiou swood-ecology maiermike jianlingfan

lterwg-som's Issues

add min and max values to key file

add expected values to key file, ultimately to be pulled out into a separate lookup table or other resource (but initially added to the key file v2 template for convenience). this will be a QC step in the homogenization script.

add tarball column to filter data by "control" = Y/N

Existing function may be useful. However, may be simple to work directly on tarball --> Use if statements to look for ctl value in levels, if found in any update "Control" column to YES, else NO.
Wondering if I had a reason not to do it this way originally?

some hands-on management may be appropriate

Hi @piersond - I reverted the pierson branch back to its state before I added comments, which I should not have done in the code, so am replying in a issue.

Regarding... By "manually", do you mean adjusting the files in the directory to only contain the ones we need? (i.e. directory should contain 2 files(keykey and csv or sheet). I think this is a good approach and will be simpler in the long run.

Yeah, I think actually going into these directories and addressing some hands-on management if and as needed is probably a reasonable approach. It is less efficient for sure but I worry about simply running these scripts without any sight or feel for the source data could create problems. And a reality is that we cannot possibly address all of the craziness that we will encounter with scripts so we are almost guaranteed to be doing some manual work anyway.

key-key consistency

There seem to be several key-key files out there that are using slightly different variable names. e.g. some of the neon data have c_tot and soc instead of lyr_c_tot and lyr_soc, see example here.
Is this common in many datasets? How can we check and throw an error when a dataset doesn't have the 'right' variable names that we're trying to align?

fix searl orcid next upversion of edi.521

Add fix for date input in keykey, convert to Y-M-D in HMGZD data, fail QC if no YYYY

NEON Litterfall

NEON litterfall data were homoged, but additional complications remain:

ANPP actually just fine litterfall, not biomass increment growth for woody biomes;
ANPP has already converted automatically from gDW/m2/y (given units) to gC/m2/y (target units) by multiplying values by 0.5;
ANPP, not sure how grazing was or was factored in here (@srweintraub )?;
ANPP should be sum functionalGroup (L3) for each trapID (L2);
Chemistry just measured one time per year (@ max litterfall, and only for needle and leaves), but litter fluxes are collected regularly throughout growing seasonl
Chemistry provided as %, not converted in homog!

soc vs. c_tot

How are these two variables in our homog data similar / different?

get_latest_som broken?

I'm not going to spend a ton of time on this, but it seems like this function may be broken (error message below).
Easy enough for me to just pull down the latest tarball, but how should others access it? I'm guessing it would be through EDI once the database is published?

Error in add_id_path(nodes, root_id = root_id, leaf = leaf) : !anyDuplicated(nodes$id) is not TRUE

rename master to main

Trying to rename master to main with the steps below, but unable to remove master branch
I also don't want to mess with workflow from others...
@srearl can you advise?

git branch -m master main
git push -u origin main
git branch -d origin/master
git branch -u main origin/main
git branch -u origin/main main

Fix time series format and units

Currently the time series data is spread across observation_year, observation_year_1 and observation_year_2. Further issue, the format of this data is all over the place. These data need to be converted to a unified format, perhaps as separate columns for year, month, day, full date.

@piersond wrote a script to quick fix the time series data formats. May be a starting place, but is mostly a hack to allow the group to work during week in Santa Barbara.

reset the repo

Hi @piersond - Sorry, meant to write earlier but in meetings all morning. As I began going through your additions/edits, it struck me that we should really be working off branches and doing this correctly with a proper code review. I think we need to take a mulligan and start over; fixing this will be a bit of a mess, but we will get back on track and it will be better going forward. I still need to work in the units conversion and we need to file upload so much yet to integrate.

I put all of the work that you had addressed in the first pull request in a branch called pierson. Unfortunately, I had to squash all of the commits in the process. Getting that code back to a point where we can re-merge that work will be a bit of a hassle but once done, we should be good. That will go something like following where the steps refer to a general workflow further down the message.

in GitHub rename your old forked repo under settings
refork lterwg-som
clone the new repo (that you forked)
checkout pierson branch
copy data processing folder to local computer (Desktop or wherever)
checkout master
create new feature branch (see step 4 and 0c)
copy files from local computer (Desktop or wherever) to directory of your new branch
continue at step 5

Generally, the work flow should then look something like:
general rule: Never commit or merge work into local/master
general rule: Always synchronize your local/master up to upstream/master before branching for new feature work
general rule: Always create a new branch from your synchronized origin/master to commit your new code changes (Suggestion: name your new feature branch: piersond-what-this-fixes)

Make sure local/master is current with origin/master. (origin is your Github repository, not LTER's)
Synchronize your local/master with upstream (LTER)/master. (Follow these instructions: https://help.github.com/articles/syncing-a-fork/)
If you do this correctly (never commit your changes into local and origin/master), you will always just Fast-Forward your origin up to current with LTER master.
Push local/master to your origin/master (your Github account).
Checkout a new branch from local/master for your new code work
Commit code changes until you are done and ready to push back to LTER.
Checkout local/master and synchronize with upstream/master again. (In case LTER code changes have occurred since the last time you synchronized.)
Merge local/master into your feature branch. (If any file changes were pulled down from upstream.)
Submit Pull Request to merge your feature branch into LTER/master.

Once PR is reviewed and merged into LTER/master, synchronize your local/master with upstream/master to get all your updated changes. Then push your updated local/master to origin/master, so your Github repo is now up-to-date with LTER's.

NEON site with no name?

I've found a NEON site or sites with no "site_name" or "location_name." Need to find a way to track this down to a keykey and data file.

rehomog arc2

@wwieder - could you please rehomog ARC2 (https://drive.google.com/open?id=1aPus8aUW3jajr5kh3-9DYLiP8g4KqQgJ), it is ready to go just need to hit go. Thank you!

Update givenUnits in keyV2 to match unit conversion in soilHarmonization

There are currently many mismatches. Does not appear to be a homogenization issue, simply an issue for providing the correct information to users across all tables/platforms.

Use unitsConversions %>% print(n = Inf) with the pkg loaded to see the units conversions that the soilHarmonization pkg is using.

Also need to update the units in the key table we're using for the Shiny app.

aligning L1 data?

How are we going to align data within sites?

These are most likely data that were read in from different files and linked via key-key files such that data for soil C and productivity measured in the same 'plots', but are in different rows of our homogenized (L1) data? Do these get merged when we bring together the big tarball, or in post processing?

include metadata for newly published datasets

scm czo

Aligning data

This follows up on closed #8,

I have renewed terror about how best to handle datasets that need to be aligned.
CDR and HRV are two huge sites with lots of manipulations and datasets where this will need to happen (KNZ is similar, but to a lesser extent).

@srearl you wrote a really slick script to handle this for NutNet (zipped up in that directory). Would similar pre-alignment of the raw data be preferable to trying to do so within the tarball?

Script to calc site level mean SOC and standard dev

How do we do this with the variable experimental structures? missing data? outliers? etc.

Create lookup table to constrain keykey user inputs

modified at timestamp to key file?

considering ways in which we might keep track of if and when a key file was modified, and use that as a factor as to whether a data package should be homogenized

givenUnits for Location data

Is there a standardized conversion to givenUnits for user defined data in locations tab of the key-key file (e.g. mm for precipitation, m for elevation, etc)? Are these conversions being done in the homog script?

check for errors in lit c data - some lter sites reporting extremely high values

experimental treatments

I also realized that when experimental treatment levels are present, we don't have any clear way to distinguish what kind of treatment these are from information in the merged files. This may be in the HMGZD notes, but it's not immediately clear when we query the tarball...

content of key files in a version controlled system?

web site

add authorship and projects tab to website (abstract for 'proposed' papers)

addressing inconsistent lat long formats

we need to pull out information on the experimental levels (block/plot/rep) from the dropdown choices already provided by data providers

test shiny app for data entry

create a proof of concept how it could work with Shiny

Julien will ask a data intern to work on a first pass and then sync with Dereck

rename `^bd_` to remove special char

tarball %>%
rename('bd_methods_notes' = ^bd_) %>%
colnames() %>%
sort()

Check about Google Vault Snapshot the drive before updating keykey v2

Ask Nick about that option

NEON Roots & productivity

@srweintraub , maybe you already knew this, but I'm a little slow. It looks like there's another step needed to match variables from multiple sheets / measurement across root and litterfall data from NEON. There's an example of this in your code that generates the 'master_database' here.

lterwg-som/data-processing/getAndWrangleNEONSoilData.R

Line 210 in 3a74bc4

# Each dataframe has few actual data variables, with lots of metadata

seems like ANPP and root biomass estimates may be pretty useful- and are thus far sparse in the database. Would you be willing to help out with this one?

Adding git update branch to tutorial

Adding how to delete branch and update branch from the command line

repo: https://lter.github.io/som-website/

units homogenization error

suggestions on this error? Error in googleDirData[[i]][[dataCol]] * PDU_UCP[PDU_UCP$var == dataCol, : non-numeric argument to binary operator in HF/Prospect_hill warming

getting in the SWaN directory for Harvard forest too?

Update "Tarball-V2_control data_filtering.R" to handle csv control values

Add functionality to script that will isolate data rows based on multiple control values when given comma separated value in the keykey control_value_id field.

Create Shiny app to handle keykey data entry and first stage QC

SHINY app for key-key and file upload (Derek's Sunday night project)
--> Step towards long-term keykey use, off Google Drive

Intern help from Julien @ NCEAS
Shiny dashboard? (# data contributed, #data points , etc)

NutNet join script duplicated chemistry values across years

Need to fix NutNet join script to properly carry over chemistry (e.g. lyr_soc) for each sample year. Right now, the values for all chem analytes (columns) are duplicated across years.

negative SOC stocks

There are 154 observations of negative SOC stocks. Which sites are these? Can we fix it?

accessing the latest tarball locally

@srweintraub - here is a script for downloading and importing the latest tarball:

https://github.com/lter/lterwg-som/blob/master/data-processing/get_latest_som.R

EDIutils 0.0.0.9000

Hello,

EDIutils has undergone a major refactor for submission to rOpenSci and CRAN. This new and improved version covers the full data repository REST API, handles authentication more securely, better matches API call and result syntax, improves documentation, and opens the door for development of wrapper functions to support common data management tasks. In the process of this refactor the function names and call patterns have changed and several functions supporting other EDI R packages have been removed, thereby creating back compatibility breaking changes with the previous major release (version 1.6.1). The previous version will be available until 2022-06-01 on the deprecated branch. Install the previous version with:

remotes::install_github("EDIorg/EDIutils", ref = "deprecated")

EDIutils functions used in your code and suggested replacements

Replace api_get_provenance_metadata() with get_provenance_metadata()

Please let me know if you have any questions,
@clnsmth

rendering latex

I noticed that tinytex doesn't like to render δ in 13C or 15N data. I've change the Var_long column to replace δ with a d to avoid this issue, seems like we'll be OK, but maybe it's worth modifying master going forward? @srearl

! Package inputenc Error: Unicode character δ (U+3B4)
(inputenc) not set up for use with LaTeX.

Tarball errors, location & site code data

Location data missing from BES, SSCZO and undetermined NEON site (and dataset).
Site code missing from New England Pastures and the same undetermined NEON site.
NE pastures also has no network

additional calculations

Can we provide some simple scripts to provide to users to fill in data in the tarball or the smaller datasets they pull down from the shiny ap? These could include:

Climate data (from LTER site info, or based on lat-lon & world clim or other datasets),
Calculated variables (C:N, stocks [when BD and lyr_soc are provided], layer_mid [from layer_top & _bottom], others)?

I'm inclined to merge these calculated data with the column where they belong to avoid the proliferation of column, but maybe we should provide an additional 'flag' to identify where we're calculating these values from the raw data provided.

It would be good to discuss the workflow for this and when (or if) users would interface with these
tools.

QC modifications

Do we need tighter quality control for data generated by homog script to avoid subsequent analysis issues?

non-decimal degrees in lat-lon
strings and weird characters
force numerical data types where expected