terraref / reference-data
Coordination of Data Products and Standards for TERRA reference data
Home Page: https://terraref.org
License: BSD 3-Clause "New" or "Revised" License
Need to re-process existing environmental logger files per #26
Something like:

```sh
for file in /projects/arpae/terraref/raw_data/ua-mac/EnvironmentLogger/*/*_enviromentlogger.json; do
  cp "$file" "$file.backup"              # keep a backup until the results are verified
  echo "[$(cat "$file")]" > "$file"      # wrap the concatenated objects in a JSON array
  sed -i 's/}{/},{/g' "$file"            # separate adjacent objects with commas
  rename enviroment environment "$file"  # fix the 'enviroment' typo in the filename
done
```
It may not be the most efficient approach, but it seems to work.
When everything checks out, we can `rm` the `*.backup` files.
@yanliu-chn talked today about creating 3 custom SRS's for this project that I didn't capture very well in my notes:
Could you say a tiny bit about these? I wanted to create an issue to capture that. For my own understanding, I found this:
http://geeohspatial.blogspot.com/2013/03/custom-srss-in-postgis-and-qgis.html
http://daniel-azuma.com/articles/georails/part-9
...where ultimately we might simply need to define our custom SRID in PostGIS with something like:
```sql
INSERT INTO spatial_ref_sys (srid, auth_name, auth_srid, proj4text, srtext)
VALUES (
  96703, 'sr-org', 6703,
  '+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs',
  'PROJCS["USA_Contiguous_Albers_Equal_Area_Conic_USGS_version",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Albers"],PARAMETER["False_Easting",0.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",-96.0],PARAMETER["Standard_Parallel_1",29.5],PARAMETER["Standard_Parallel_2",45.5],PARAMETER["Latitude_Of_Origin",23.0],UNIT["Meter",1.0]]'
);
```
(note that the authority name and ID are not required if the system is not specified by an outside authority like EPSG)
@robkooper
We could then define columns in Clowder PostGIS for each of the custom SRIDs - a setup script could check for some custom definitions file in Clowder, insert those into PostGIS if found, and change how the datapoints/etc. tables are defined accordingly. The geog column (which is currently the only one) could still be SRID 4326 and that can always be present, but then we could also have geog_96703 or other SRIDs for project-specific SRS's.
So for Terra we might end up with geog, geog_999001, geog_999002, geog_999003 where the 99900x are our 3 SRIDs. Then we require data to be submitted in one of the 4 reference systems. If a new sensor emerges with a new one, we add a column for it.
This is mostly thinking out loud... maybe I'm way off.
An initial draft of calibration protocols: https://docs.google.com/document/d/132_dkGAIQJ3cG7bQkPIkX7-RgXyWLDoQWJFDj5c-5uU/edit#
For each sensor, define:
(by Mike Gore and Elodie Gazave)
Genomic data have reached a high level of standardization in the scientific community. Below are the most widely accepted formats that are relevant to the data and analyses that will be generated in this project.
Today, all high-impact journals typically ask the author to deposit their genomic data in either or both of these databases before publication.
Raw reads + quality scores are stored in FASTQ format. FASTQ files can be manipulated for QC with FASTX-Toolkit
Reference genome assembly (for alignment of reads or BLAST) is in FASTA format. FASTA files generally need indexing and formatting that can be done by aligners, BLAST, or other applications that provide built-in commands for this purpose.
Sequence alignments are in BAM format – in addition to the nucleotide sequence, the BAM format contains fields describing mapping and read quality. BAM files are binary but can be visualized with IGV. If needed, BAM can be converted to SAM (a text format) with SAMtools.
BAM is the preferred format for the SRA (Sequence Read Archive) database.
Not part of milestone, though some groups may wish to do this. Will implement if / when needed. Reference sequences will be 17x, (perhaps not enough for de novo alignment).
SNP and genotype variants are in VCF format. VCF contains all information about read mapping and SNP and genotype calling quality. VCF files are typically manipulated with vcftools
VCF is also the format required by dbSNP, the largest public repository of SNPs.
Genomic coordinates are given in a BED format – gives the start and end positions of a feature in the genome (for single nucleotides, start = end). http://www.ensembl.org/info/website/upload/bed.html BED files can be manipulated with bedtools
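For illustration, the three required BED columns can be read with a few lines of code; this is a sketch with an illustrative helper name, not part of bedtools or any tool mentioned above:

```python
def parse_bed_line(line):
    """Parse the three required BED fields: chrom, chromStart, chromEnd.
    BED is tab-separated; additional optional columns are ignored here."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return chrom, int(start), int(end)

parse_bed_line("chr7\t127471196\t127472363")
```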
Genome annotations are in GFF format. GFF contains genes and other genomic features, and allows "track" info for visualization: http://useast.ensembl.org/info/website/upload/gff.html
Gbrowse is a comprehensive database + interactive web application for manipulating and displaying annotations on genomes.
Analysis tools that all use SNP data (vcf) as input.
Submit variant calls to Phytozome, has embedded Jbrowse
It would be great to have daily historical weather data (to include t_min, t_max, and precipitation) for Maricopa (or a nearby location) available somewhere. I will use this to build a weather model to base the regression on.
Vocabularies to support, and a framework for relating synonyms across vocabularies, enabling dataset search with translation across vocabularies.
Vocabularies to support:
Related issues:
See also:
EarthCube GeoSemantics project
Currently the files only contain the start position. We don't have the width of the line and we don't have the location of each line.
Presumably the gantry knows the x location where each line is triggered.
Additional questions:
@solmazhajmohammadi and @FlyingWithJerome and @czender please update / clarify / revise this issue
@markus-radermacher-lemnatec suggestions?
The standards committee requested information about the specification of sensors and their location on the gantry. This information is spread across multiple documents. Some could be shared in their entirety, others may need to be modified to have proprietary information redacted. Currently, we do not have a way of identifying the intended audience (e.g. some information that can be shared publicly is in the same folder or document as proprietary information).
We should create a simple protocol for identifying the intended audience and any restrictions on sharing and reuse.
We should first address the clear identification of any restrictions on sharing (Part 1). The mechanisms for controlling access (Part 2) are built into tools we use (Box, Google Drive, computer file systems).
Identifying the intended audience could be simplified by creating folders named "private", "internal", "shared", etc. But it would also be useful to clearly identify the intended audience in a file named README, COPYRIGHT, or LICENSE that indicates the owner, intended audience, and conditions for reuse.
There should also be an option to embargo some materials, e.g. where information required to set up the data pipeline might not be needed by pipeline users for a few years.
We define as private anything that has not been clearly marked for sharing. The owner / creator of information defines the audience.
There is a subset of this information that will need to be made public before the data can be interpreted and reported in a scientific publication.
At this time, the standards committee (and others program wide) have requested
A short-term solution will be for a LemnaTec representative to identify and relocate information that can be shared publicly into a folder marked "shared" on Box, such as https://uofi.box.com/terraref-box-shared. Internal documents can be kept where they are or placed in a subdirectory of a folder such as https://uofi.box.com/terraref-internal (e.g. a 'lemnatec' folder could be owned/administered by LemnaTec).
Note below is a proposal, though worded as a declaration. Please indicate what needs to be changed, what could be changed, what is just right, and what really doesn't matter as long as it is clearly defined.
The current proposal is to use key-value pair JSON files for metadata. Each data file will be in a separate directory, alongside an `info.json` metadata file, using a directory layout like:

```
sensorname/YYYY-MM-DD/HH-MM-SStz/
```

The metadata will be comprehensive and redundant in order to be portable.
We will later extract metadata from these files into a searchable database (BETYdb).
Metadata should contain anything that is not 'default' about the sensor, i.e. anything that can be set by the user (including position, optics, etc.). @robkooper suggested using URLs to point to some resources (Rob, please clarify).
There has been some discussion of using JSON vs. XML. JSON is more flexible and preferred among many developers, although XML formats can be more strictly defined using a DTD. Notably, the Postgres hstore extension provides an `hstore` data type and `hstore_to_json` and `hstore_to_jsonb` functions (from "PostGIS in Action", 2nd edition).
The scope of information that would be provided by an RGB camera is in this gist, which contains the original proposal (`info.xml`) and other structures (`info2.xml`, `info2.json`).
The directory structure will be nested by date then sensor, and divided into 'location independent' data like met:

```
/Date/Met/met.csv
/Sensorname/Date/Time/data.raw
```

Each raw datafile will be accompanied by a metadata file, e.g. `info.json` or `info.xml`.
This is based on a file system organization of all data. Examples on Box: https://gist.github.com/dlebauer/4a20f0d4512bbe2d1cdb
The folder structure can be accessed and is intuitive for users. The data within the folder structure is organized so that each sensor system stores an independent unit of data. This way, the different sensor systems stay independent from each other. The folder structure may be provided through an ftp server, a simple, robust, and widely used protocol.
definition: Sensors, like cameras, that fetch data only during an imaging job of the gantry system.
In the folder "locationDependent" you find a subfolder for each camera/sensor unit. The next level of subfolders divides the measurements by date (see folder "VisCamera"). For each measurement within a day, a new folder is created that carries the timestamp. Within that folder you find an `info.xml` file containing all metadata and a `data.XXX` file containing the raw binary data as provided by the sensor. The `info.xml` file is supposed to contain the full set of metadata required to interpret a specific measurement, such as camera type, camera settings, timestamp, and location. All metadata files will have the same layout. The binary data format depends on the sensor/camera and contains the data as provided by the sensor. In the case of the Hyperspec cameras these files will be hypercubes; in the case of the 3D scanner they will be ".ply" point clouds.
definition: Sensors with a constant data flow, independent of the current location of the gantry, such as wind or temperature sensors. These sensors deliver data all the time, 24/7, in short intervals (or perhaps synced daily?).
The folder "locationIndependent" stores the constant stream of gantry-location-independent sensor data. It is divided into subfolders by date and then by timestamp. Each timestamp folder contains a single `info.json` file that contains the data and metadata of all sensors.
From @solmazhajmohammadi on March 18, 2016 20:31
@czender, At the moment there is no accurate GPS at the gantry box. It is planned to add geographical coordinates to metadata, based on the fixed point on the ground and a RTK GPS location.
Here are locations from each corner of the field that @rjstrand sent from his phone:
SE Corner 33° 04.470' N / -111° 58.485' W
SW Corner. 33° 04.474' N / -111° 58.505' W
NW Corner. 33° 04.592' N / -111° 58.505' W
NE Corner. 33° 04.591' N / -111° 58.487' W
I have used a linear scaling formula to convert the coordinates; it should be accurate enough for the current GPS location.
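A minimal sketch of that kind of linear scaling, using the corner coordinates above. The function names and the fractional (0..1) gantry-position parameterization are illustrative, not the actual pipeline code:

```python
def dms_to_decimal(deg, minutes):
    """Convert degrees + decimal minutes (e.g. 33° 04.470') to decimal degrees."""
    return deg + minutes / 60.0

# The four field corners reported above (latitude north, longitude west).
corners = {
    "SE": (dms_to_decimal(33, 4.470), -dms_to_decimal(111, 58.485)),
    "SW": (dms_to_decimal(33, 4.474), -dms_to_decimal(111, 58.505)),
    "NW": (dms_to_decimal(33, 4.592), -dms_to_decimal(111, 58.505)),
    "NE": (dms_to_decimal(33, 4.591), -dms_to_decimal(111, 58.487)),
}

def gantry_to_latlon(x_frac, y_frac):
    """Linearly scale a fractional gantry position (0..1 along each axis)
    between the SE and NW corners; a sketch, not survey-grade."""
    lat = corners["SE"][0] + y_frac * (corners["NW"][0] - corners["SE"][0])
    lon = corners["SE"][1] + x_frac * (corners["NW"][1] - corners["SE"][1])
    return lat, lon
```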
To determine the location of each pixel within the image, Extrinsic and Intrinsic calibration matrices are needed.
3 coordinate systems exist:
1 ) Extrinsic calibration matrix:
The extrinsic calibration parameters specify the transformation from the field to the camera coordinates, and it is represented as,
[Xc Yc Zc]'= Ro [Xf Yf Zf]'+ to
Since there is no rotation (Ro = I), the extrinsic transformation reduces to the translation vector to, from the control point (0,0,0) to the camera position [Xc Yc Zc]. Since the gantry moves at constant speed in the x direction, and the metadata gives the start time and start position of a scan:
to = [xg+Vx*(t) yg zg]'
where (xg, yg, zg) is the camera position in the gantry box. For the hyperspectral camera, (xg, yg, zg) is (1.9, 0.855, 0.635). Vx is the velocity in the x direction, t is the time difference, and ()' is the transpose operation.
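As a sketch, the translation vector can be evaluated as follows; the hyperspectral offset is taken from the text, but the gantry speed value here is hypothetical:

```python
def camera_position(t, offset=(1.9, 0.855, 0.635), vx=0.05):
    """Translation vector to = [xg + Vx*t, yg, zg]'.
    offset: camera position in the gantry box (hyperspectral values from the text);
    vx: gantry speed in the x direction in m/s (hypothetical value);
    t: time since the start of the scan in seconds."""
    xg, yg, zg = offset
    return (xg + vx * t, yg, zg)
```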
2 ) Intrinsic calibration matrix:
Notes:
The intrinsic calibration parameters specify the transformation from the camera coordinate to the pixel coordinates.
Coordinate transformation between camera plane and image plane:
[Xp Yp f]'= A [ I 0 ] [Xc Yc Zc]'
A simple orientation matrix has αx = αy = α, which is the focal length divided by the pixel pitch. u0 and v0 denote the principal point (ideally the center of the image).
For the SWIR camera the focal length is 24 mm and the pixel pitch is 25 µm. I don't know the value of γ. I have contacted the Headwall group and hopefully they will soon provide more information on the calibration of the camera; I will update this response then.
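With the stated focal length (24 mm) and pixel pitch (25 µm), the diagonal entries of a zero-skew intrinsic matrix work out to 960 pixels. A sketch, assuming γ = 0 and placeholder sensor dimensions for the principal point:

```python
# Intrinsic matrix A for the SWIR camera.
focal_length = 24e-3  # m (from the text)
pixel_pitch = 25e-6   # m (from the text)
alpha = focal_length / pixel_pitch  # 960 pixels; alpha_x = alpha_y = alpha

# Principal point: the image dimensions below are placeholders, not specs.
u0, v0 = 384 / 2, 288 / 2

A = [[alpha, 0.0,   u0],   # gamma (skew) assumed 0 pending Headwall's answer
     [0.0,   alpha, v0],
     [0.0,   0.0,   1.0]]
```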
Copied from original issue: terraref/documentation#9
While the VNIR is uninstalled I have taken the opportunity to try out different aperture settings on the StereoVIS system while the RGB cameras are accessible. I have just discovered that the cameras have been physically set at f/16 while the metadata has been reporting them as set at f/4.
I have taken the liberty of changing the fixed metadata file so it will write correct values moving forward. This does mean that all past metadata files should be edited to be correct.
Our plans for settings on the cameras in the future are now much less limited. My initial "by eyeball" look at the output of the cameras, even at f/8 with gain and exposure both significantly reduced, is that image quality will be much improved. If the cameras were indeed intended to be set at f/4 and the DoF is not an issue I anticipate no further issue with noise from high gain, as well as greatly reduced if not eliminated motion blur.
PEcAn uses Climate Forecasting 'standard names' and 'canonical units' conventions (widely used in climate / met community) for meteorological and ecosystem-level mass and energy balance variables.
Here are some examples. Note that we can change from canonical units to match the appropriate scale, e.g. "C" instead of "K"; time can use any base time and time step (e.g. `hours since 2015-01-01 00:00:00 UTC`). But the time zone has to be UTC, where 12:00:00 is approximately (+/- 15 min) solar noon at Greenwich.
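The "hours since a base time" convention can be sketched with the standard library; the helper name is illustrative:

```python
from datetime import datetime, timedelta, timezone

# Base time from the example unit string "hours since 2015-01-01 00:00:00 UTC".
BASE = datetime(2015, 1, 1, tzinfo=timezone.utc)

def hours_since_base(ts):
    """Convert a UTC timestamp to CF-style hours since the base time."""
    return (ts - BASE) / timedelta(hours=1)

hours_since_base(datetime(2015, 1, 2, 12, 0, 0, tzinfo=timezone.utc))  # 36.0
```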
Examples:
CF standard-name | units |
---|---|
time | days since 1700-01-01 00:00:00 UTC |
air_temperature | K |
air_pressure | Pa |
mole_fraction_of_carbon_dioxide_in_air | mol/mol |
moisture_content_of_soil_layer | kg m-2 |
soil_temperature | K |
relative_humidity | % |
specific_humidity | 1 |
water_vapor_saturation_deficit | Pa |
surface_downwelling_shortwave_flux_in_air | W m-2 |
surface_downwelling_photosynthetic_photon_flux_in_air | mol m-2 s-1 |
precipitation_flux | kg m-2 s-1 |
wind_speed | m/s |
eastward_wind | m/s |
northward_wind | m/s |
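For example, the scalar `wind_speed` can be recovered from the `eastward_wind` and `northward_wind` components (a one-liner, shown for clarity):

```python
import math

def wind_speed(eastward, northward):
    """Scalar wind_speed (m/s) from CF eastward_wind/northward_wind components."""
    return math.hypot(eastward, northward)

wind_speed(3.0, 4.0)  # 5.0
```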
Currently the spectral radiances are uncalibrated, and provided in the environmental logger as:
```json
"spectrometer": {
  "maxFixedIntensity": "16383",
  "integration time in µs": "5000",
  "band": {
    "wavelength": 337.70483,
    "spectrum": 1500.0
  },
  "band": {
    "wavelength": 338.16013791719934,
    "spectrum": 1500.0
  },
  "band": {
    "wavelength": 338.61548740418232,
    "spectrum": 1503.0
  },
  "band": {
    "wavelength": 339.07087845402685,
    "spectrum": 1500.0
  }, ...
```
from `2016-04-13_00-38-15_environmentlogger.zip`. Calibration files are in `EnvironmentLogger/CalibrationData/Calibrations.zip`.
The output of the spectrometer is 'raw' counts.
@TinoDornbusch in #26 you wrote
You need to use the attached calibration files to convert it to units of µW m-2 s-1. Careful you need to take the bandwidth of the chip into account (0.4nm) if you want to convert to µmol m-2 s-1.
Could you or @markus-radermacher-lemnatec please clarify, and provide an equation for converting the information in the calibration files to reflectances?
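While we wait for that clarification, here is a hedged sketch of the power-to-photon-flux conversion hinted at in the quote. It assumes the vendor calibration turns counts into spectral irradiance in µW m-2 per band (the `cal` factor and that functional form are assumptions, not from the calibration files); the 0.4 nm bandwidth is from the quote:

```python
H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
N_A = 6.02214076e23  # Avogadro constant, 1/mol

def counts_to_umol(counts, wavelength_nm, cal=1.0, bandwidth_nm=0.4):
    """Convert raw counts to photon flux (µmol m-2 s-1), assuming `cal`
    maps counts to µW m-2 nm-1 for this band (hypothetical placeholder)."""
    irradiance_uw_m2 = counts * cal * bandwidth_nm  # µW m-2 in this band
    watts_m2 = irradiance_uw_m2 * 1e-6
    e_photon = H * C / (wavelength_nm * 1e-9)       # energy per photon, J
    mol_m2_s = watts_m2 / (e_photon * N_A)          # photon flux, mol m-2 s-1
    return mol_m2_s * 1e6                           # µmol m-2 s-1
```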
TERRA-Ref Maricopa Sorghum 2016 first season summary:
Summary of manually-collected field measurements (ground truth data collection):
`/scripts` directory. This will help us understand who can help where, and what we need to fill in.
@czender Can you please take a look at the SWIR data from the following dates:
"time": "06/08/2016 10:57:09"
"Time": "05/06/2016 10:51:25"
It seems that all the bands are overexposed.
@TinoDornbusch have any changes been made to the camera settings since last month?
We need a minimum of three additional external members for the standards committee.
The first sample of hyperspectral image data is ready with units of "Exposure on scale from 0 to 2^16-1 = 65535". Data available as 134MB file foo.nc and much smaller metadata text file foo.cdl.
This has been prepared using the hyperspectral/terraref.sh script.
Next steps include calibration and conversion of exposure to reflectance as defined in #14.
For consistent interpretation of calibration panels, Larry Biehl suggests making sure that the panel is level.
Would it be feasible to attach a circular level similar to this one to the test objects?
Most of the data generated by the Cat 5 platform is intended for distribution and reuse with attribution (e.g. CC-BY or a BSD-compatible license) or with no restrictions (CC0). It is equally important to allow restrictions on access to and use of proprietary algorithms and data products.
What specific features are required? What solutions are available?
In particular:
Why should a researcher release code / data / etc before publication?
Because it is the TERRA REF team mandate: current plans are to release these to maximize reuse with attribution (e.g. MIT or CC-BY). However, this could be done after some embargo, e.g. 6-12 months. And we aren't technically on the hook for public release until Nov 2018 (though for the most part, we are making data and code available as we produce and develop it).
So ... even if we make these resources public, they may not start with permissive licenses. In addition to acknowledgement, prior to planned open release we could require co-authorship. If we make the conditions of use and reuse clear, disobeying these conditions would be unethical, and possibly illegal. An academic violating such conditions would risk institutional or professional society discipline and publication retraction. Stealing IP for profit would be theft (though difficult to catch).
Creating a list or catalog of existing and planned data products for the TERRA project -- more than just the individual sensor metadata. It might be worth compiling a catalog similar to the NEON project (http://www.neonscience.org/sites/default/files/basic-page-files/NEON_DOC_002652.pdf). Based on the NEON data catalog, possible attributes of a data product description (e.g., catalog record) are listed in the "Data product catalog" section of the following Google Doc:
https://docs.google.com/document/d/13gXD_OVLffm0hqahDZ3tUvru8IV1fRfM6DiuOcfjr3s/edit?usp=sharing
The resulting list should be put in the documentation repository.
For each sensor, what is the instrument precision, and how does this translate to the numeric precision that we need to store.
Use floats with a fixed number of significant digits. No optical sensor provides more than 5 significant digits, so single-precision float is more than enough.
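As a quick check of that claim: a value round-tripped through IEEE 754 single precision retains about 7 significant decimal digits, comfortably above the ~5 digits the sensors provide (the helper is illustrative):

```python
import struct

def to_float32(x):
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

# The round-trip error stays below ~6e-8 relative, i.e. ~7 decimal digits.
to_float32(0.123456789)
```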
@ZongyangLi @pless
Right now the origin of the coordinate system is set to the location of calibration object during the first calibration run.
We need to know the part numbers and specs for the calibration panels we are using. For example, SphereOptics says that for each panel
Calibration will be performed on a Perkin Elmer Lambda 19, data will be supplied electronically in 1 nm steps, 50 nm step printed documentation with NIST/PTB traceability with certificate for the range from 250 nm-2500 nm.
We need to collect this documentation. Please attach such documents to this issue.
It will also be useful to have pictures of each panel made with a standard RGB camera.
First draft is here: https://docs.google.com/document/d/1iP8b97kmOyPmETQI_aWbgV_1V6QiKYLblq1jIqXLJ84/edit
The intent is to begin to get feedback on some rough sketches of what some data products might look like.
To this end, I have simulated the type of data that a sensor might observe, along with some of the underlying environmental drivers and physiological traits.
Note that there will be numerical artifacts, quasi-meaningful error terms, and liberal re-application of core concepts for the purposes of developing these datasets.
All of these simulated datasets are released CC-BY-0: do with as you please, but these are not production quality - just trying to meet demand and begin getting feedback.
I have used the names currently used in BETYdb.org/variables, along with names inspired by the more standardized naming Climate Forecasting conventions. However, at this point this is a very early pre-release, and comments on how such data should be formatted and accessed can be discussed in issue #18.
227 lines grown at each of three sites along a N-S transect in Illinois over five years (2021-2025). Two years were dry, two were wet, and one was average.
These are historic data, but the years have been changed to emphasize the point that these are not real data.
year | drought index |
---|---|
2021 | wet |
2022 | dry |
2023 | normal |
2024 | wet |
2025 | dry |
These are approximate locations used to query the meteorological and soil data used in the simulations.
site name | latitude | longitude |
---|---|---|
north | 42.0 | -88.5 |
central | 40.0 | -88.5 |
south | 37.0 | -88.5 |
Each site has four replicate fields: A, B, C, D. This simulated dataset assumes each field within a site has similar, but different meteorology (e.g., as if they were all in the same county).
Two-hundred and twenty-seven lines were grown at each site. They are identified uniquely by an integer in the range [9915:10141]
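As a quick consistency check, the inclusive identifier range above covers exactly 227 lines:

```python
# Python ranges exclude the upper bound, so 10142 gives the inclusive [9915, 10141].
line_ids = range(9915, 10142)
len(line_ids)  # 227
```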
The phenotypes associated with each genotype are in the file `phenotypes.csv`.
These 'phenotypes' are used as input parameters to the simulation model. We often refer to these as 'traits' (as opposed to biomass or growth rates, which are states and processes). In this example, we assume that 'phenotypes' are time-invariant.
variable_id | name | standard_name | units | Description |
---|---|---|---|---|
 | genotype | | | genetically and phenotypically distinct line |
 | Vmax | | umol m-2 s-1 | maximum carboxylation of Rubisco according to the Collatz model |
38 | cuticular_cond | conductance_of_fully_closed_stomata | umol H2O m-2 s-1 | leaf conductance when stomata fully closed |
15 | SLA | specific_leaf_area | m2 kg-1 | Specific Leaf Area |
39 | quantum_efficiency | mole_ratio_of_carbon_dioxide_to_irradiance_in_leaf | fraction | see Farqhuar model |
18 | LAI | leaf_area_index | m2 leaf m-2 ground | Leaf Area Index |
31 | c2n_leaf | mass_ratio_of_carbon_to_nitrogen_in_leaf | ratio | C:N ratio in leaves |
493 | growth_respiration_coefficient | respiration_coefficient_for_growth | mol CO2 / mol net assimilation | amount of CO2 released due to growth per unit net photosynthesis |
7 | leaf_respiration_rate_m2 | respiration_rate_per_unit_area_in_leaf | umol CO2 m-2 s-1 | Not really "dark respiration"; often this is respiration that occurs in the light. Date and time fields "should" identify pre-dawn (nighttime/dark) leaf respiration vs. the Rd that comes from an A-Ci or A-PPFD curve |
4 | Vcmax | rubisco_carboxylation_rate_in_leaf_assuming_saturated_rubp | umol CO2 m-2 s-1 | maximum rubisco carboxylation capacity |
404 | stomatal_slope.BB | stomatal_slope_parameter_assuming_ball_berry_model | ratio | slope parameter for Ball-Berry Model of stomatal conductance |
5 | Jmax | electron_transport_flux_in_thylakoid_assuming_saturated_light | umol photons m-2 s-1 | maximum rate of electron transport |
492 | extinction_coefficient_diffuse | extinction_coefficient_for_diffuse_light_in_canopy | | canopy extinction coefficient for diffuse light |
This dataset includes what a sensor might observe, daily for five years during the growing season.
Note: a sensor won't observe roots or rhizomes; furthermore, sorghum doesn't have rhizomes. The simulated biology is a little different.
variable_id | name | standard_name | units | Description |
---|---|---|---|---|
sitename | Name of site | |||
plotid | experimental replicate plot | |||
year | ||||
date | YYYY-MM-DD | |||
Stem | stem_biomass_content | Mg / ha | ||
Leaf | leaf_biomass_content | Mg / ha | ||
Root | root_biomass_content | Mg / ha | ||
Rhizome | rhizome_biomass_content | Mg / ha | ||
18 | LAI | leaf_area_index | ratio | Leaf Area Index is the ratio of leaf area to ground area |
NDVI | normalized_difference_vegetation_index | ratio | commonly used vegetation index | |
Height | canopy_height | m |
Please provide feedback: email [email protected], visit the TerraRef Reference Data chatroom, or comment in our GitHub repository.
If you do something cool, please send comments and figures!
Data are located on Box: https://uofi.box.com/sorghum-simulation
Recent metadata generated by the LemnaTec field system is here: https://gist.github.com/dlebauer/ccc1940fefbacaa60296
We will also need the following information:
I'll note that any data that is stable in time (e.g. especially calibration data, but also position and structure of each camera) could be stored separately.
Anything else?
From 2016-06-03 meeting, @LTBen offered to draft sensor calibration protocols, and would do this with help from @smarshall-bmr , @JeffWhiteAZ, @TinoDornbusch
Here is the list of sensors on google drive
Draft of Calibration Protocols
For each sensor, define:
def: a few sentences that describe a task someone wants to accomplish.
Use cases will help prioritize feature development and project organization, and help us get feedback.
ex1: someone is looking at an image in Clowder, has identified a particular trait, and wants to find all plants with this trait (within some range, or greater than a threshold, e.g. top 10% biomass), and then find other data associated with these plants
ex2: I have an interesting thing I’ve noticed, can I find all plants w/ same feature +/- X%
ex3: Want to upload data so someone else can get to it and its metadata
ex4: want to publish a collection from Clowder
for example:
name | units | description | method of measurement |
---|---|---|---|
SLA | m2 kg-1 | Specific leaf area: leaf area per unit mass | hole punch |
Ultimately, these should be entered into the trait database which will look like this: https://www.betydb.org/variables/15.
Tool for managing synonyms integrated w/ Clowder:
Other vocabularies:
@JeffWhiteAZ has data from the Hot Cereal experiment and cotton yield trials that comply with ICASA variable naming conventions, stored in both JSON and spreadsheet formats.
Noah Fahlgren, Rob Pless at WUSTL and Charlie Zender
Include sample data
Private owncloud server?
Just had a query for wheat data. Do we know if any usable data were obtained from the Scanalyzer?
Is Clowder the best place to monitor data availability? I'd like a data overview that shows the status of anything in the pipeline: raw, initial QC, ..., final product.
add users
request that users encourage their postdocs and graduate students to sign up
replaced by terraref/computing-pipeline#10
Write a document that includes definitions of
Friday 8/26 the SWIR was uninstalled because its sensor thermoregulation system is failing to maintain a constant temperature. We are currently waiting for an RMA tag from Headwall before we ship it out for repairs. LemnaTec is heading this process but I'll act as a liaison and keep everyone updated as the RMA process continues. If the RMA on the VNIR is anything to go by then we will be without the SWIR for at least a month.
This is a proposal for spectral and imaging data to be provided as HDF-5 / NetCDF-4 data cubes for computing and downloading by end users.
Following CF naming conventions [1], these would be in a netcdf-4 compatible / well behaved hdf format. Also see [2] for example formats by NOAA
Questions to address:
see also PecanProject/pecan#665
variable name | units | dim 1 | dim 2 | dim 3 | dim 4 | dim 5 |
---|---|---|---|---|---|---|
surface_bidirectional_reflectance | 0-1 | lat | lon | time | radiation_wavelength | |
bandwidth | 0-1 | lat | lon | time | radiation_wavelength | |
upwelling_spectral_radiance_in_air | W m-2 m-1 sr-1 | lat | lon | time | radiation_wavelength | zenith_angle |
note: upwelling_spectral_radiance_in_air may only be an intermediate product (and perhaps isn't exported from some sensors?) so the focus is really on the reflectance as a Level 2 product.
dimension | units | notes |
---|---|---|
time | hours since 2016-01-01 | first dimension |
latitude | degrees_north | (or alt. projection_y_coordinate) |
longitude | degrees_east | (or alt. projection_x_coordinate below) |
projection_x_coordinate | m | can be mapped to lat/lon with grid_mapping attribute |
projection_y_coordinate | m | can be mapped to lat/lon with grid_mapping attribute |
radiation_wavelength | m | |
zenith_angle | degrees | |
optional | ||
sensor_zenith_angle | degrees | |
platform_zenith_angle | degrees |
[1] http://cfconventions.org/Data/cf-standard-names/29/build/cf-standard-name-table.html
[2] http://www.nodc.noaa.gov/data/formats/netcdf/v1.1/
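Since sample metadata is already being distributed as a `.cdl` file, here is a hedged CDL sketch of how the Level 2 reflectance variable might be declared. All dimension sizes and the dataset name are placeholders, not product specifications:

```
netcdf terraref_example {
dimensions:
    time = UNLIMITED ;
    projection_y_coordinate = 1024 ;
    projection_x_coordinate = 1024 ;
    radiation_wavelength = 955 ;
variables:
    double time(time) ;
        time:units = "hours since 2016-01-01" ;
    float surface_bidirectional_reflectance(time, projection_y_coordinate,
            projection_x_coordinate, radiation_wavelength) ;
        surface_bidirectional_reflectance:standard_name = "surface_bidirectional_reflectance" ;
        surface_bidirectional_reflectance:units = "1" ;
        surface_bidirectional_reflectance:grid_mapping = "crs" ;
}
```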
I propose that we keep a separate file containing 'fixed sensor metadata'. It should have the same fields / schema that are currently in the *_metadata.json files as well as additional information.
We should not change the current _metadata.json files that are generated, just have a new canonical source for the information.
There have been a number of times when the 'fixed' information has been changed or updated. This will make it easier to update meta-data without having to rewrite the original metadata files that had incorrect information.
Here are some of the issues that such a static (or infrequently updated) file would help:
We will be producing diverse data products, protocols, and software (hereafter 'products'). These are intended for distribution and reuse with attribution. Products will be continuously revised, updated, and expanded, with new versions released annually and opportunities for community feedback. Other teams will potentially contribute protocols and similar types of data. Data sets from different teams could be merged for analysis (i.e., with more than one attribution).
See also meeting notes: https://goo.gl/QhpwcH
TODO
Recorded camera location in camera box for some of the sensors (including SWIR) is not correct. Stuart is going to fix it.
The first environmental data samples (e.g. 2016-02-15_21-20-08_enviromentlogger.json.txt) are in a json key:value format.
I propose the following changes:
The files contain `spectrum` and `wavelength`, but nothing measuring irradiance.
In order to validate the algorithm for image alignment and stitching, I need images from full-grown plants.
Our goal is to create data products that are easy to access and use.
There are a few classes of data:
For each class of data:
Each data format should have brief description, focusing on
This is a proposal open for comments and contributions. We plan to update these specifications annually, starting with v0 in Nov. 2016
From @LTBen: The researchers marked the measurement “center” of each range (plot). Stuart and Tino collected the gantry coordinates of these positions
@smarshall-bmr and @TinoDornbusch could you please send these coordinates to me or upload them to google drive?
National Data Service: