Comments (11)
My first thought is to perhaps offer several different versions of the database, depending on the user's needs. For example, one version that downloads just what is needed to generate the standardized species-specific abundance data (as we did for METE), one version that includes all data (or just abundance?) for sites that have been surveyed a minimum of X times for time series analyses, one version that includes carbon and biomass data, etc. This is not an ideal solution, but could help limit the data required for download while making the nature of the omissions clearer.
from retriever.
ahh github...this is new for me. Ethan you are so cutting edge. So you have the nuts and bolts in that list, but there are some things missing that I might want, and others would probably like to get this info easily/quickly too. One is that there is a soils field or two that a plant ecologist would want. We can of course extract bioclim type variables from the plot location by ourselves, but there is actual soil data taken from the site by the FIA and recorded in the db. The others that are important in my view are DIA and HT (diameter and height). These columns themselves are useful for obvious reasons (size distributions, biomass estimates, individual level trait data, etc), but there are accompanying columns that tell you how these measures were taken or estimated. For example, sometimes height is a direct measurement, sometimes it is estimated by diameter allometry (I think) and perhaps even by eye in some cases. The FIA represents each type of measure using an integer. So if you wanted to do anything with that data it is essential you know how the data were recorded in the field.
I second Nate's comments. For my own purposes, I'll need to be able to ID not only unique sites but also unique individual trees, with DBH data (and height, method of height estimate). This might be an example of the greatest amount of detail that someone might want to get out of FIA...I'd second fluby's comment that it might be best for the user to be able to choose how much detail they extract.
Thanks to everyone for the feedback (both here and via email). Here's a very rough first draft of a database structure that should contain most things that folks have mentioned broken out into a reasonable set of tables. Let me know what I've missed.
Sites:
Site ID
Shared location information (info that is shared by all plots in a site, e.g. State, County, an averaged lat and long)
Plots:
Unique Plot ID (combining the Site ID and Plot Number)
Site ID
Plot Number
Location Info that differs among plots (Lat, Long, etc.)
Substrate
Elevation
Aspect
Slope
Soils
Trees:
Plot ID
Year
Species
Diameter
Height
Measurement Method Info
I could also see adding some of the biomass/carbon fields here if folks think that's really valuable
Survey:
Site ID
Year
Standardized Inventory (Y or N)
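As a rough illustration only, the draft structure above could be sketched as a small SQLite schema. Every table and column name here is a hypothetical rendering of the list above (none of it is actual Retriever or FIA naming), and the column types are guesses:

```python
import sqlite3

# Hypothetical rendering of the draft tables; names and types are assumptions.
SCHEMA = """
CREATE TABLE sites (
    site_id  TEXT PRIMARY KEY,
    state    TEXT,
    county   TEXT,
    mean_lat REAL,   -- averaged over all plots in the site
    mean_lon REAL
);
CREATE TABLE plots (
    plot_id     TEXT PRIMARY KEY,  -- combines site_id and plot_number
    site_id     TEXT NOT NULL REFERENCES sites(site_id),
    plot_number INTEGER NOT NULL,
    lat REAL,
    lon REAL,
    substrate TEXT,
    elevation REAL,
    aspect REAL,
    slope REAL,
    soils TEXT
);
CREATE TABLE trees (
    plot_id       TEXT NOT NULL REFERENCES plots(plot_id),
    year          INTEGER NOT NULL,
    species       TEXT,
    diameter      REAL,
    height        REAL,
    height_method INTEGER  -- integer code for how height was obtained
);
CREATE TABLE survey (
    site_id TEXT NOT NULL REFERENCES sites(site_id),
    year    INTEGER NOT NULL,
    standardized_inventory TEXT CHECK (standardized_inventory IN ('Y', 'N')),
    PRIMARY KEY (site_id, year)
);
"""


def build_schema(conn):
    """Create the four draft tables on an open SQLite connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
build_schema(conn)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

Keeping the measurement method as its own column in the trees table means users can filter on how height was recorded without losing the raw values.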
I also like Kate's idea of multiple versions, but I'm not sure that doing it for individual datasets is the best way to start. If there's a clear need for a specific dataset in a specific widely used format down the line we'd be happy to work on it. I just start to worry about ending up with BBS-time-series, FIA-time-series, MCDB-time-series... We're definitely planning on doing some combined datasets down the line (assemblies of time-series data, abundance data, etc.), which might address a lot of the same goals.
You may run into some issues making the plot table: most interesting plot level variables (FORTYPCD, STDAGE, STDORGCD, SLOPE, ASPECT, PHYSCLCD, DSTRBCD, TRTCD) are recorded as "conditions" which are then linked to subplots, so that a plot may have multiple conditions. In my own work I've been taking the condition numbered "1" as the condition for the plot, but there is probably a better way to go about this.
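A minimal sketch of the "take condition 1" workaround described above, assuming each condition record carries a plot identifier and a condition number. The keys `PLT_CN` and `CONDID` are assumed column names, and the values in the sample rows are made up for illustration:

```python
def plot_level_conditions(conditions, condition_number=1):
    """Return {plot id: condition row}, keeping only the chosen condition number.

    Plots that have no condition with that number are simply absent
    from the result.
    """
    selected = {}
    for row in conditions:
        if row["CONDID"] == condition_number:
            selected[row["PLT_CN"]] = row
    return selected


# Made-up records: plot "A" has two conditions, plot "B" has one.
conditions = [
    {"PLT_CN": "A", "CONDID": 1, "SLOPE": 10},
    {"PLT_CN": "A", "CONDID": 2, "SLOPE": 25},
    {"PLT_CN": "B", "CONDID": 1, "SLOPE": 5},
]
by_plot = plot_level_conditions(conditions)
```

This discards whatever the other conditions say about the plot, so it is a convenience rather than a recommended practice.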
The plot table should also include whether the subplots were sampled using a "macroplot" rather than the standard radius plot and, if so, what the threshold diameter was for sampling trees in the macroplot. Standard inventories are still sometimes sampled with a macroplot.
I would also like to see trees recorded in the seedling table put into the tree table with a column indicating that they are seedlings. Not sure how many other folks are looking at seedling data, though.
In the long run, it would be cool to link up some of the Phase 3 sampling. I've been working with the lichen data, but they also appear to have stuff on woody debris and vegetation. This is also where the good soil data is located.
Also, were you planning on having the retriever automatically remove non-sampled plots? If not, it would be good to include fields that would allow users to do so.
Another convenience (but not a necessity) would be for the retriever to convert species codes (SPCD) into three columns: family/genus/species.
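One way this conversion might look, as a sketch only: a lookup from SPCD to a (family, genus, species) triple. The two entries below are illustrative placeholders and should be checked against FIA's own species reference table; `SPECIES_LOOKUP` and `expand_species` are hypothetical names, not Retriever functions:

```python
# Illustrative entries only; a real lookup would be built from FIA's
# species reference table rather than typed in by hand.
SPECIES_LOOKUP = {
    316: ("Sapindaceae", "Acer", "rubrum"),
    202: ("Pinaceae", "Pseudotsuga", "menziesii"),
}


def expand_species(tree, lookup=SPECIES_LOOKUP):
    """Return a copy of a tree record with family/genus/species columns added.

    Unknown codes get None in all three columns instead of raising.
    """
    family, genus, species = lookup.get(tree["SPCD"], (None, None, None))
    return {**tree, "family": family, "genus": genus, "species": species}


expanded = expand_species({"SPCD": 316, "DIA": 12.4})
unknown = expand_species({"SPCD": -1, "DIA": 3.0})
```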
Lastly, beware the strange subplot numbers on plots sampled using the standard design. There should only be four subplots numbered 1-4, but I frequently found numbers such as 101-104, 201-204. When I talked with the FIA, they told me to ignore subplots not numbered 1-4 if working with aggregated plot-level data. Not sure if you are going to have the retriever do this, but it might be worth checking with FIA on the best practice. I've found them to be pretty responsive.
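The filtering described above could be sketched like this, assuming each record stores its subplot number under a key such as `SUBP` (an assumed column name; adjust to whatever the table actually uses):

```python
STANDARD_SUBPLOTS = {1, 2, 3, 4}


def keep_standard_subplots(records, subplot_key="SUBP"):
    """Drop records whose subplot number is not one of the standard 1-4."""
    return [r for r in records if r[subplot_key] in STANDARD_SUBPLOTS]


# Made-up records mixing standard and odd subplot numbers.
records = [{"SUBP": 1}, {"SUBP": 4}, {"SUBP": 101}, {"SUBP": 203}]
standard_only = keep_standard_subplots(records)
```

Whether this should happen automatically or be left to the user is exactly the question to confirm with FIA.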
Thanks Jes! Those are all excellent suggestions. I do have an FIA contact here at USU and will definitely run all of this by him down the line, as well as check with him on his thoughts about the complexities that you've mentioned.
Chris Woodall ([email protected]) with the FS has kindly offered to answer any questions we have along the way (putting here as a note to self).
Hi @ethanwhite,
I'm currently working on a project called QUICC-FOR with Dominique Gravel. To calibrate and validate models, I'm developing a PostgreSQL database linking several forest plot network "databases" (FIA, Quebec, Ontario, New Brunswick) into one final relational database. One advantage of using PostgreSQL is that I can easily intersect the plot locations with rasters of past, present and future climate.
Now (and more related to the initial topic), to reach this goal, I simplified the structure of all the databases into a simpler relational design. You will find the structure here. When I'm done, I'm pretty sure I'll be able to give you access to the simplified version of the FIA database.
I wish I could one day provide you access to the rest of the databases (the Canadian part), but all of them are under agreements with the ministries. Unfortunately, Canada is still far from having an open data policy like the US.
If you're interested, let me know!
Cheers,
Hi @SteveViss - that sounds like a great resource! I certainly appreciate the challenges of working with data under agreements that don't allow sharing.
It would be great if you could keep us up to date with what you're doing. We could either access the data directly from you if you're allowed to redistribute (and are OK funding the bandwidth) or mirror the structure with the Retriever if we get around to implementing the ability to do that.
Definitely looking forward to seeing what you get up to, both wrt the data and the science that comes from it.
Closing since we've gotten the necessary feedback and this sort of thing will be implemented in a new project we're working on. Thanks again to everyone for their input!