ankitkumar9018 / research-2020-spectraml

This project forked from velexi-research/spectraml-2020

Lab notebook for research project applying machine learning to classify reflectance spectra from USGS High Resolution Spectral Library.

License: Other

SpectraML Project

Authors
Kevin T. Chu, Bonita Song, Srikar Munukutla


Table of Contents

  1. Overview

    1.1. Software Dependencies

    1.2. Directory Structure

    1.3. Template Files

  2. Setting Up

    2.1. Python Environment

    2.2. Preparing Spectra Data

  3. References


1. Overview

The SpectraML project team researches applications of machine learning to the analysis of spectroscopic data. We are currently focused on the following core areas:

  • feature engineering (e.g., preprocessing algorithms for spectra);

  • machine learning algorithms (e.g., artificial neural networks, CNNs); and

  • performance evaluation framework (e.g., bootstrap, k-fold cross-validation).

As a model problem, we are developing a machine learning system for classifying reflectance spectra from the USGS Spectral Library Version 7 dataset.

1.1. Software Dependencies

Base Requirements

  • Python

Required Python Packages

See requirements.txt for the list of Python packages required for this project.

Recommended Python Packages

  • autoenv
  • virtualenv
  • virtualenvwrapper

1.2. Directory Structure

README.markdown
requirements.txt
bin/
config/
data/
docs/
lab-notebook/
lib/
reports/

  • README.markdown: this file

  • requirements.txt: pip requirements file containing Python packages for data science, testing, and assessing code quality

  • bin: directory containing utility programs

  • config: directory containing template configuration files (e.g., autoenv configuration file)

  • data: directory where project datasets should be placed. Note: in general, datasets should not be committed to the git repository. Instead, datasets should be placed into this directory (either manually or using automation scripts) and referenced by Jupyter notebooks. See Section 2 for details.

  • docs: directory containing project documentation and notes

  • lab-notebook: directory containing Jupyter notebooks used for experimentation and development. Jupyter notebooks saved in this directory should (1) have a single author and (2) be dated.

  • lib: directory containing source code developed to support the project

  • reports: directory containing Jupyter notebooks that present and record final results. Jupyter notebooks saved in this directory should be polished, contain final analysis results, and be the work product of the entire data science team.

1.3. Template Files

Template files and directories are indicated by the 'template' suffix. These files and directories are intended to simplify the setup of the lab notebook. When appropriate, they should be renamed (with the 'template' suffix removed).


2. Setting Up

2.1. Python Environment

  • Create a Python virtual environment for the project.

    $ mkvirtualenv -p /PATH/TO/PYTHON PROJECT_NAME
  • Install required Python packages.

    $ pip install -r requirements.txt
  • Set up autoenv.

    • Copy config/env.template to .env in the project root directory.

    • Set template variables in .env (indicated by {{ }} notation).
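
After editing .env, it can help to confirm that no {{ }} placeholders were left behind. The following is a minimal sketch of such a check, not part of this repository: the function name find_unset_placeholders is hypothetical, and the script assumes that .env lives in the current working directory.

```python
import re
from pathlib import Path

# Matches placeholders such as {{ NAME }} or {{NAME}}.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]*?)\s*\}\}")


def find_unset_placeholders(text):
    """Return (line_number, placeholder_name) pairs for each {{ }} marker left in text."""
    found = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            found.append((line_number, match.group(1)))
    return found


if __name__ == "__main__":
    env_path = Path(".env")  # assumption: run from the project root directory
    if env_path.exists():
        for line_number, name in find_unset_placeholders(env_path.read_text()):
            print(f"{env_path}:{line_number}: template variable {name!r} is not set")
```

Running the script from the project root prints one line per placeholder that still needs a value and prints nothing when the file is fully configured.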

2.2. Preparing Spectra Data

A zip file containing the full USGS Spectral Library (Version 7) is included in the data directory. To prepare the spectra data for use in Jupyter notebooks, use the following instructions.

  • Extract the data files in ASCIIdata_splib07a.zip.

    $ cd data
    $ unzip ASCIIdata_splib07a.zip
  • Generate standardized versions of the spectra by using the standardize-spectra script. standardize-spectra carries out the following operations:

    • fills in missing data points with interpolated values;

    • resamples spectra so that they all have the same abscissa values;

    • saves spectra to CSV files containing wavelength and reflectance values;

    • generates the spectra-metadata.csv database containing metadata for each spectrum; and

    • names each spectrum file using the unique ID (in spectra-metadata.csv) associated with the spectrum.
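
The first two operations above (filling in missing points and resampling onto a common set of wavelengths) can be illustrated with a short sketch. This is not the code in standardize-spectra. It assumes linear interpolation, an evenly spaced wavelength grid, wavelengths in ascending order, and missing reflectance values marked with NaN; the script itself may make different choices. The function names are hypothetical.

```python
import math
from bisect import bisect_left


def interpolate(x, xs, ys):
    """Linearly interpolate at x from known points (xs ascending); clamp outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # guarantees xs[i - 1] < x <= xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def standardize(wavelengths, reflectances, num_wavelengths):
    """Drop missing points, then resample onto an evenly spaced wavelength grid.

    Assumes num_wavelengths >= 2 and that missing reflectances are NaN.
    """
    known = [(w, r) for w, r in zip(wavelengths, reflectances) if not math.isnan(r)]
    xs = [w for w, _ in known]
    ys = [r for _, r in known]
    low, high = wavelengths[0], wavelengths[-1]
    step = (high - low) / (num_wavelengths - 1)
    grid = [low + k * step for k in range(num_wavelengths)]
    return grid, [interpolate(w, xs, ys) for w in grid]


# Example: the second reflectance value is missing and is filled by interpolation.
grid, values = standardize([1.0, 2.0, 3.0, 4.0, 5.0], [0.1, math.nan, 0.3, 0.4, 0.5], 5)
```

Because every spectrum is resampled onto the same grid, the resulting reflectance lists all have the same length and can be stacked into a single feature matrix.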

    Usage

    The following examples show how to use standardize-spectra. Note: if the standardize-spectra command cannot be found, check that bin is on your path.

    • Show help message.

      $ standardize-spectra --help
    • Basic usage with the default output directory and wavelength values.

      $ cd data
      $ standardize-spectra ASCIIdata_splib07a spectrometers
    • Set a custom output directory by using the -o OUTPUT_DIR option.

      $ cd data
      $ standardize-spectra ASCIIdata_splib07a spectrometers -o custom-location
    • Set the number of wavelengths in the standardized spectra by using the --num-wavelengths NUM_WAVELENGTHS option.

      $ cd data
      $ standardize-spectra ASCIIdata_splib07a spectrometers \
        --num-wavelengths 2000
  • Use lists of spectra IDs to define collections of spectra. Within a Jupyter notebook, use the following directory paths to facilitate access to spectra files.

    import os

    # Data directories
    data_dir = os.environ['DATA_DIR']
    spectra_data_dir = os.path.join(data_dir, 'ASCIIdata_splib07a')
    
    # Path to data file for spectra with ID=12345
    spectrum_path = os.path.join(spectra_data_dir, '12345.csv')
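
Building on the paths above, a collection defined by a list of spectra IDs could be loaded as follows. This is a minimal sketch rather than code from this repository: the helper names are hypothetical, and it assumes that each CSV file stores the wavelength in the first column and the reflectance in the second, with any non-numeric rows (such as a header) skipped.

```python
import csv
import os


def spectrum_path(spectra_dir, spectrum_id):
    """Build the path to the CSV file for one spectrum ID."""
    return os.path.join(spectra_dir, f"{spectrum_id}.csv")


def load_spectrum(path):
    """Read (wavelength, reflectance) pairs, skipping rows that are not numeric."""
    points = []
    with open(path, newline="") as csv_file:
        for row in csv.reader(csv_file):
            try:
                wavelength, reflectance = float(row[0]), float(row[1])
            except (ValueError, IndexError):
                continue  # header row, blank row, or malformed row
            points.append((wavelength, reflectance))
    return points


def load_collection(spectra_dir, spectrum_ids):
    """Load every spectrum in a list of IDs into a dict keyed by ID."""
    return {
        spectrum_id: load_spectrum(spectrum_path(spectra_dir, spectrum_id))
        for spectrum_id in spectrum_ids
    }
```

In a notebook, a collection is then just a list such as `[12345, 12346]` passed to `load_collection` together with the directory that holds the standardized CSV files.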

3. References

