The FIA database is an awesome resource, but it is overwhelmingly complex requiring a

What should a simplified version of the FIA database look like? about retriever HOT 11 CLOSED

weecology commented on July 2, 2024

What should a simplified version of the FIA database look like?

from retriever.

Comments (11)

fluby commented on July 2, 2024

My first thought is to perhaps offer several different versions of the database, depending on the user's needs. For example, one version that downloads just what is needed to generate the standardized species-specific abundance data (as we did for METE), one version that includes all data (or just abundance?) for sites that have been surveyed a minimum of X times for time series analyses, one version that includes carbon and biomass data, etc. This is not an ideal solution, but it could help limit the data required for download while making the nature of the omissions clearer.

nateswenson commented on July 2, 2024

ahh github...this is new for me. Ethan you are so cutting edge. So you have the nuts and bolts in that list, but there are some things missing that I might want, and others would probably like to get this info easily/quickly. One is that there is a soils field or two that a plant ecologist would want. We can of course extract bioclim type variables from the plot location by ourselves, but there is actual soil data taken from the site by the FIA and recorded in the db. The others that are important in my view are DIA and HT (diameter and height). These columns themselves are useful for obvious reasons (size distributions, biomass estimates, individual level trait data, etc), but there are accompanying columns that tell you how these measures were taken or estimated. For example, sometimes height is a direct measurement, sometimes it is estimated by diameter allometry (I think) and perhaps even by eye in some cases. The FIA represents each type of measure using an integer. So if you wanted to do anything with that data, it is essential you know how the data were recorded in the field.

mekevans commented on July 2, 2024

I second Nate's comments. For my own purposes, I'll need to be able to ID not only unique sites but also unique individual trees, with DBH data (and height, method of height estimate). This might be an example of the greatest amount of detail that someone might want to get out of FIA...I'd second fluby's comment that it might be best for the user to be able to choose how much detail they extract.

ethanwhite commented on July 2, 2024

Thanks to everyone for the feedback (both here and via email). Here's a very rough first draft of a database structure that should contain most things that folks have mentioned broken out into a reasonable set of tables. Let me know what I've missed.

Sites:
Site ID
Shared location information (info that is shared by all plots in a site, e.g. State, County, an averaged lat and long)

Plots:
Unique Plot ID (combining the Site ID and Plot Number)
Site ID
Plot Number
Location Info that differs among plots (Lat, Long, etc.)
Substrate
Elevation
Aspect
Slope
Soils

Trees:
Plot ID
Year
Species
Diameter
Height
Measurement Method Info
I could also see adding some of the biomass/carbon fields here if folks think that's really valuable

Survey:
Site ID
Year
Standardized Inventory (Y or N)
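
As a rough illustration only, the draft outline above could be expressed as a small SQLite schema. Every table name, column name, and type below is an assumption made for the sketch (including the `height_method` column standing in for "Measurement Method Info" and the `site-plot` format of the plot ID); none of it is an agreed design or the Retriever's actual output.

```python
import sqlite3

# Hypothetical table and column names, loosely following the draft outline.
SCHEMA = """
CREATE TABLE sites (
    site_id  TEXT PRIMARY KEY,
    state    TEXT,
    county   TEXT,
    mean_lat REAL,              -- averaged over the plots in the site
    mean_lon REAL
);
CREATE TABLE plots (
    plot_id     TEXT PRIMARY KEY,   -- site_id combined with plot_number
    site_id     TEXT NOT NULL REFERENCES sites(site_id),
    plot_number INTEGER NOT NULL,
    lat         REAL,
    lon         REAL,
    substrate   TEXT,
    elevation   REAL,
    aspect      REAL,
    slope       REAL,
    soils       TEXT
);
CREATE TABLE trees (
    plot_id       TEXT NOT NULL REFERENCES plots(plot_id),
    year          INTEGER NOT NULL,
    species       TEXT,
    diameter      REAL,
    height        REAL,
    height_method INTEGER           -- integer code for how height was obtained
);
CREATE TABLE surveys (
    site_id                TEXT NOT NULL REFERENCES sites(site_id),
    year                   INTEGER NOT NULL,
    standardized_inventory INTEGER NOT NULL
        CHECK (standardized_inventory IN (0, 1)),
    PRIMARY KEY (site_id, year)
);
"""


def create_database(path=":memory:"):
    """Create the four draft tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def make_plot_id(site_id, plot_number):
    """Build the unique plot ID from the site ID and plot number."""
    return f"{site_id}-{plot_number}"
```

Keeping `height_method` next to `height` in the trees table is one way to carry along the "how was this measured" information that was requested earlier in the thread.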

ethanwhite commented on July 2, 2024

I also like Kate's idea of multiple versions, but I'm not sure that doing it for individual datasets is the best way to start. If there's a clear need for a specific dataset in a specific widely used format down the line we'd be happy to work on it. I just start to worry about ending up with BBS-time-series, FIA-time-series, MCDB-time-series... We're definitely planning on doing some combined datasets down the line (assemblies of time-series data, abundance data, etc.), which might address a lot of the same goals.

jescoyle commented on July 2, 2024

You may run into some issues making the plot table- most interesting plot level variables (FORTYPCD, STDAGE, STDORGCD, SLOPE, ASPECT, PHYSCLCD, DSTRBCD, TRTCD ) are recorded as "conditions" which are then linked to subplots, so that a plot may have multiple conditions. In my own work I've been taking the condition numbered "1" as the condition for the plot, but there is probably a better way to go about this.
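
A minimal sketch of the "take condition 1" workaround, assuming condition records arrive as dictionaries keyed by a plot identifier and a condition number. The key names `PLT_CN` and `CONDID` are assumptions about the column names; check them against the FIA documentation. The helper `condition_counts` is a hypothetical addition that shows which plots have more than one condition, so the simplification is at least visible.

```python
def primary_condition(cond_rows, plot_key="PLT_CN", cond_key="CONDID"):
    """Return {plot id: condition row}, keeping only the condition numbered 1."""
    chosen = {}
    for row in cond_rows:
        if row[cond_key] == 1:
            chosen[row[plot_key]] = row
    return chosen


def condition_counts(cond_rows, plot_key="PLT_CN"):
    """Count conditions per plot so multi-condition plots can be flagged."""
    counts = {}
    for row in cond_rows:
        counts[row[plot_key]] = counts.get(row[plot_key], 0) + 1
    return counts
```

Plots whose count is greater than 1 are the ones where picking condition 1 discards information, and these are the cases worth reviewing with an FIA contact.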

The plot table should also include whether the subplots were sampled using a "macroplot" rather than the standard radius plot and, if so, what the threshold diameter was for sampling trees in the macroplot. Standard inventories are still sometimes sampled with a macroplot.

I would also like to see trees recorded in the seedling table put into the tree table with a column indicating that they are seedlings- not sure how many other folks are looking at seedling data, though.

In the long run, it would be cool to link up some of the Phase 3 sampling- I've been working with the lichen data, but they also appear to have stuff on woody debris and vegetation. This is also where the good soil data is located.

Also, were you planning on having the retriever automatically remove non-sampled plots? If not, it would be good to include fields that would allow users to do so.

Another convenience (but not a necessity) would be for the retriever to convert species codes (SPCD) into three columns: family/genus/species.
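
One way the species-code conversion could look, as a sketch: a lookup from SPCD to a (family, genus, species) tuple applied to each tree record. The two entries below are hand-written for illustration and should be verified against the FIA species reference table before any real use; the function names and the output column names are hypothetical.

```python
# Illustrative entries only; a real lookup would be built from the FIA
# species reference table rather than typed by hand.
SPECIES_LOOKUP = {
    316: ("Sapindaceae", "Acer", "rubrum"),
    802: ("Fagaceae", "Quercus", "alba"),
}


def split_species(spcd, lookup=SPECIES_LOOKUP):
    """Return (family, genus, species) for a code, or Nones if it is unknown."""
    return lookup.get(spcd, (None, None, None))


def add_taxonomy(tree_rows, lookup=SPECIES_LOOKUP):
    """Return copies of the tree rows with three taxonomy columns added."""
    out = []
    for row in tree_rows:
        family, genus, species = split_species(row["SPCD"], lookup)
        out.append({**row, "family": family, "genus": genus, "species": species})
    return out
```

Unknown codes are kept with empty taxonomy rather than dropped, so missing lookup entries show up in the output instead of silently removing trees.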

Lastly- beware the strange subplot numbers on plots sampled using the standard design. There should only be four subplots numbered 1-4, but I frequently found numbers such as 101-104, 201-204. When I talked with the FIA, they told me to ignore subplots not numbered 1-4 if working with aggregated plot-level data. Not sure if you are going to have the retriever do this, but it might be worth checking with FIA as to the best practice. I've found them to be pretty responsive.
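
The subplot advice can be sketched as a simple filter that keeps only records from subplots 1-4. The key name `SUBP` is an assumption about the subplot column; whether this filter is appropriate for a given analysis is exactly the question to confirm with FIA.

```python
STANDARD_SUBPLOTS = {1, 2, 3, 4}


def keep_standard_subplots(rows, subplot_key="SUBP"):
    """Drop records whose subplot number is not one of the standard 1-4."""
    return [row for row in rows if row[subplot_key] in STANDARD_SUBPLOTS]
```

For example, records with subplot numbers such as 101 or 203 would be excluded, while those numbered 1 through 4 pass through unchanged.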

ethanwhite commented on July 2, 2024

Thanks Jes! Those are all excellent suggestions. I do have an FIA contact here at USU and will definitely run all of this by him down the line, as well as check with him on his thoughts about the complexities that you've mentioned.

ethanwhite commented on July 2, 2024

Chris Woodall (cwoodall@fs.fed.us) with the FS has kindly offered to answer any questions we have along the way (putting here as a note to self).

SteveViss commented on July 2, 2024

Hi @ethanwhite,

I'm currently working on a project called QUICC-FOR with Dominique Gravel. To calibrate and validate models, I'm developing a PostgreSQL database linking several forest plot network databases (FIA, Quebec, Ontario, New Brunswick) into one final relational database. One advantage of using PostgreSQL is that I can easily intersect the plot locations with rasters of the past, present and future climate.

Now (and more related to the initial topic), to reach this goal, I simplified the structure of all the databases into a simpler relational design. You will find the structure here. When I'm done, I'm pretty sure I'll be able to give you access to the simplified version of the FIA database.

I wish I could one day give you access to the rest of the databases (the Canadian part), but all of them are under agreement with the ministries. Unfortunately, Canada is far from having an open data policy like the US.

If you're interested, let me know!
Cheers,

ethanwhite commented on July 2, 2024

Hi @SteveViss - that sounds like a great resource! I certainly appreciate the challenges of working with data under agreements that don't allow sharing.

It would be great if you could keep us up to date with what you're doing. We could either access the data directly from you if you're allowed to redistribute (and are OK funding the bandwidth) or mirror the structure with the Retriever if we get around to implementing the ability to do that.

Definitely looking forward to seeing what you get up to, both wrt the data and the science that comes from it.

ethanwhite commented on July 2, 2024

Closing since we've gotten the necessary feedback and this sort of thing will be implemented in a new project we're working on. Thanks again to everyone for their input!
