
raman-noodles / raman-spectra-decomp-analysis


Python code to identify and calculate decomposition of materials using Raman spectroscopy

License: MIT License

Python 2.18% Jupyter Notebook 97.82%
raman-spectroscopy decomposition-products analysis chemistry-development-kit data data-visualization data-science engineering-tools analytical-chemistry university-of-washington

raman-spectra-decomp-analysis's Introduction

raman-spectra-decomp-analysis

Code to wrangle Raman spectra, visualize them, identify components in mixture spectra, and ultimately identify the decomposition or presence of materials using Raman spectroscopy.

Team members (alphabetical by first name): Brandon Kern, Elizabeth Rasmussen, Parker Steichen

Overall Project Objective

This project identifies and calculates decomposition in Raman spectra to output rate data. Advantages of this method are:

  1. FULLY open source: no part of the project depends on a paid service
  2. AUTOMATED: the analysis process is automated, leading to fast results
  3. VERIFIABLE: the user is made aware of how confident they can be in the results via a statistical software stack

Assumptions and Project Scope

  1. The stored data library covers only the decomposition products of formic acid (hydrogen, water, carbon dioxide, carbon monoxide); other components are beyond the scope of the project at this time.
  • It is assumed that the user is analyzing the decomposition products of formic acid, or a mixture that consists only of: formic acid, hydrogen, water, carbon dioxide, and carbon monoxide.
  2. This project will not be predictive - that is, it will require the user to specify which compounds may be present in the spectra to be analyzed. This list does not have to be exhaustive; however, the more inclusive the list, the better the fitting and prediction results will be.

Project Breakdown

The project can be thought of as broken down into 3 steps:

  1. Data Wrangling
  2. Peak fitting and identification
  3. Statistical analysis for peak fits

Each of these sections has its own wiki document and filled Jupyter notebooks with more detail throughout; see those for more detail on the individual steps.

User Flow and Example of Using Raman Noodles

A user can follow these steps to apply Raman Noodles to their own formic acid data set. An example of using the software can be seen on the Example Use Case wiki page.
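For orientation, here is a minimal sketch of that workflow, assembled from the function calls that appear in the issues below (dataprep.new_hdf5, dataprep.add_calibration, dataprep.add_experiment, peakidentify.peak_assignment); the filenames, the experiment key, and the precision value of 10 are illustrative placeholders, not a canonical example:

    from ramandecompy import dataprep, peakidentify

    # 1. Data wrangling: create hdf5 stores and load calibration and experiment spectra.
    dataprep.new_hdf5('calibration_file')            # writes calibration_file.hdf5
    dataprep.new_hdf5('experiment_file')             # writes experiment_file.hdf5
    dataprep.add_calibration('calibration_file.hdf5', 'CO2_100wt%.csv', label='CO2')
    dataprep.add_experiment('experiment_file.hdf5', 'FA_300C_25s.csv')  # hypothetical file

    # 2. Peak fitting and identification: assign experimental peaks against the
    #    calibration spectra within a +/-10 wavenumber precision window.
    df = peakidentify.peak_assignment('experiment_file.hdf5', '300C/25s',
                                      'calibration_file.hdf5', 10)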

Testing of Raman Noodles and Travis-CI

In order to have manageable code, we are using Travis CI's open source continuous integration testing. One thing to note is that on March 1, 2018, Travis CI switched their model for open source software; the press release about this can be read here.

So while our team repo is viewable on travis-ci.com, it will ultimately redirect you to the old platform for open source software on travis-ci.org.

There is a way to migrate from travis-ci.org (open source repos only) to travis-ci.com (now private repos and, in closed beta, open source repos), as can be seen by following this link. At this time, our team has decided not to join the closed beta, as the current (old) dashboard on travis-ci.org works just fine.

Future Work

Next steps would include automatic baseline subtraction of spectra to decrease pre-processing time, molar formation calculations to predict reaction pathways, and increased robustness of machine learning for component selection including unsupervised methods.

Conclusion

In conclusion, our team successfully created a platform code base for researchers to visualize and analyze their Raman spectra in a fast, automated manner - reducing post-processing time by days and enabling future work to continue on a solid base of open source tools.

  • This software has passed tests to successfully identify and analyze components in mixture Raman data.

  • This work sets up a free and user-friendly platform for researchers to analyze their own Raman spectra.

Acknowledgements

  • Project Sponsor: Igor Novosselov
  • Additional assistance: Dave Beck, and Kelly Thornton
  • Data sets are publicly available from Mendeley Data: “Raman Spectra of Formic Acid Gasification Products in Subcritical and Supercritical Water”
  • Only open source packages were used in this work.

raman-spectra-decomp-analysis's People

Contributors

erasmuss, kernb2, parkersteichen


raman-spectra-decomp-analysis's Issues

formic_supervised_NN_example should remain as supervised and needs better markdown explanations

Although the neural network example takes in unknown experimental spectra and predicts them after being initially trained and tested on the known interpolated calibration spectra, the NN uses previously generated classifications to fit its algorithm. This means the NN will only work after the peak identification and classification have been done.
This still makes it a supervised machine learning example. Additionally, the combination of interpolated spectra, logistic regression, and support vector machines fits squarely within the definition of supervised machine learning.

hdf5 files need to be added to get formic_supervised_NN_example to work

The example uses a preexisting interpolated spectra file: formic_supervised_calibration_interpL-Copy1.hdf5
NOTE: To prevent overwriting preexisting interpolated spectra files, always keep the original file unused and duplicate it instead.

For example, if 'formic_supervised_calibration_interpL-Copy1.hdf5' ever gets overwritten, please make sure to duplicate 'formic_supervised_calibration_interpL-Copy1.hdf5' from 'formic_supervised_calibration_interpL.hdf5' before continuing.
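Restoring the working copy from the pristine original is a one-liner with the standard library (the filenames are the ones named above):

    import shutil

    # Recreate the disposable working copy from the untouched original file.
    shutil.copyfile('formic_supervised_calibration_interpL.hdf5',
                    'formic_supervised_calibration_interpL-Copy1.hdf5')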

formic_supervised_calibration_interpL-Copy1.hdf5 & dataimport_ML_df-Copy1.hdf5 need to be uploaded to the repo.

unnecessary dataframe & syntax error

just comment out the unnecessary stuff
Right before Step 6:

    area_f = []

    area_f_db = pd.DataFrame =( columns = 'ratioAUC')

    for index, row in area_base.iterrows():
        AUCratio = area_base.auc[index]/area_base.cal_auc[index]
        area_f.append((area_base.exp_cond[index], AUCratio))
        print("experimental conditions %s with ratio AUC is %s" % (area_base.exp_cond[index], AUCratio))

    area_base.append()
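For reference, a corrected sketch of what this cell presumably intends, assuming area_base is a DataFrame with auc, cal_auc, and exp_cond columns (the pd.DataFrame =(...) line and the bare area_base.append() call are the reported errors):

    import pandas as pd

    area_f = []
    for index, row in area_base.iterrows():
        # Ratio of experimental area under the curve to the calibration AUC.
        AUCratio = area_base.auc[index] / area_base.cal_auc[index]
        area_f.append((area_base.exp_cond[index], AUCratio))
        print("experimental conditions %s with ratio AUC is %s"
              % (area_base.exp_cond[index], AUCratio))

    # Build the ratio table in one step rather than via the broken assignment.
    area_f_db = pd.DataFrame(area_f, columns=['exp_cond', 'ratioAUC'])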

Multiple file exists errors in test_dataprep.py & test_dataimport.py

======================================================================
ERROR: This function tests the operation of the peak_assignment

Traceback (most recent call last):
File "C:\Users\user1\Anaconda3\lib\site-packages\nose\case.py", line 197, in runTest
self.test(*self.arg)
File "C:\Users\user1\Desktop\raman-spectra-decomp-analysis\ramandecompy\tests\test_dataimport.py", line 16, in test_data_import
dataprep.new_hdf5('exp_test_3')
File "C:\Users\user1\Desktop\raman-spectra-decomp-analysis\ramandecompy\dataprep.py", line 36, in new_hdf5
hdf5 = h5py.File('{}.hdf5'.format(new_filename), 'w-')
File "C:\Users\user1\Anaconda3\lib\site-packages\h5py_hl\files.py", line 394, in init
swmr=swmr)
File "C:\Users\user1\Anaconda3\lib\site-packages\h5py_hl\files.py", line 174, in make_fid
fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
File "h5py_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\h5f.pyx", line 105, in h5py.h5f.create
OSError: Unable to create file (unable to open file: name = 'exp_test_3.hdf5', errno = 17, error message = 'File exists', flags = 15, o_flags = 502)

======================================================================
ERROR: A function that tests the add_calibration function from dataprep. It first tests that no

Traceback (most recent call last):
File "C:\Users\user1\Anaconda3\lib\site-packages\nose\case.py", line 197, in runTest
self.test(*self.arg)
File "C:\Users\user1\Desktop\raman-spectra-decomp-analysis\ramandecompy\tests\test_dataprep.py", line 28, in test_add_calibration
dataprep.new_hdf5('test')
File "C:\Users\user1\Desktop\raman-spectra-decomp-analysis\ramandecompy\dataprep.py", line 36, in new_hdf5
hdf5 = h5py.File('{}.hdf5'.format(new_filename), 'w-')
File "C:\Users\user1\Anaconda3\lib\site-packages\h5py_hl\files.py", line 394, in init
swmr=swmr)
File "C:\Users\user1\Anaconda3\lib\site-packages\h5py_hl\files.py", line 174, in make_fid
fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
File "h5py_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\h5f.pyx", line 105, in h5py.h5f.create
OSError: Unable to create file (unable to open file: name = 'test.hdf5', errno = 17, error message = 'File exists', flags = 15, o_flags = 502)

======================================================================
ERROR: A function that tests the add_experiment function from dataprep. It first tests that no

Traceback (most recent call last):
File "C:\Users\user1\Anaconda3\lib\site-packages\nose\case.py", line 197, in runTest
self.test(*self.arg)
File "C:\Users\user1\Desktop\raman-spectra-decomp-analysis\ramandecompy\tests\test_dataprep.py", line 73, in test_add_experiment
dataprep.new_hdf5('exp_test_1')
File "C:\Users\user1\Desktop\raman-spectra-decomp-analysis\ramandecompy\dataprep.py", line 36, in new_hdf5
hdf5 = h5py.File('{}.hdf5'.format(new_filename), 'w-')
File "C:\Users\user1\Anaconda3\lib\site-packages\h5py_hl\files.py", line 394, in init
swmr=swmr)
File "C:\Users\user1\Anaconda3\lib\site-packages\h5py_hl\files.py", line 174, in make_fid
fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
File "h5py_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\h5f.pyx", line 105, in h5py.h5f.create
OSError: Unable to create file (unable to open file: name = 'exp_test_1.hdf5', errno = 17, error message = 'File exists', flags = 15, o_flags = 502)

======================================================================
ERROR: A function that tests the adjust_peaks function from dataprep. The function first looks to

Traceback (most recent call last):
File "C:\Users\user1\Anaconda3\lib\site-packages\nose\case.py", line 197, in runTest
self.test(*self.arg)
File "C:\Users\user1\Desktop\raman-spectra-decomp-analysis\ramandecompy\tests\test_dataprep.py", line 111, in test_adjust_peaks
dataprep.new_hdf5('exp_test_2')
File "C:\Users\user1\Desktop\raman-spectra-decomp-analysis\ramandecompy\dataprep.py", line 36, in new_hdf5
hdf5 = h5py.File('{}.hdf5'.format(new_filename), 'w-')
File "C:\Users\user1\Anaconda3\lib\site-packages\h5py_hl\files.py", line 394, in init
swmr=swmr)
File "C:\Users\user1\Anaconda3\lib\site-packages\h5py_hl\files.py", line 174, in make_fid
fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
File "h5py_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\h5f.pyx", line 105, in h5py.h5f.create
OSError: Unable to create file (unable to open file: name = 'exp_test_2.hdf5', errno = 17, error message = 'File exists', flags = 15, o_flags = 502)

Solution:
Make sure to hdf5.close() & os.remove() these files beforehand, or figure out why it doesn't work with Brandon's nosetests.
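A minimal sketch of such a guard, assuming the tests simply need a clean slate before calling dataprep.new_hdf5 (which opens files in h5py's exclusive-create 'w-' mode and therefore refuses to overwrite leftovers from a previous failed run):

    import os

    def remove_if_present(filename):
        # 'w-' mode fails with errno 17 if the file survives a crashed run,
        # so delete any leftover copy before re-creating it.
        if os.path.exists(filename):
            os.remove(filename)

    for name in ['test.hdf5', 'exp_test_1.hdf5', 'exp_test_2.hdf5', 'exp_test_3.hdf5']:
        remove_if_present(name)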

Peakidentify_example hdf5 files not deleting themselves after finishing

PermissionError Traceback (most recent call last)
in
----> 1 os.remove('peakidentify_calibration_file.hdf5')
2 os.remove('peakidentify_experiment_file.hdf5')

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'peakidentify_calibration_file.hdf
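WinError 32 means an h5py handle still has the file open when os.remove runs; a hedged sketch of the likely fix, assuming the notebook keeps the open files in variables (cal_file and exp_file are hypothetical names):

    # Close every open handle first; Windows will not delete a file
    # that is still held open by the process.
    cal_file.close()
    exp_file.close()
    os.remove('peakidentify_calibration_file.hdf5')
    os.remove('peakidentify_experiment_file.hdf5')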

molardecomp example use of exp key list instead of keylist

Need to update key_list to exp_key_list in the molardecomp example

Example:

    # making a loop to identify peaks in all experimental data sets and add labels to them
    frames = []
    for i, key in enumerate(exp_key_list):
        df = peakidentify.peak_assignment(hdf5_expfilename, key, hdf5_calfilename, 10)
        frames.append(df)

dataprep csv input error

Given:
dataprep.add_calibration('dataprep_calibration_test.hdf5','CO2_100wt%.csv',label='CO2')

Error:

XLRDError Traceback (most recent call last)
in
----> 1 dataprep.add_calibration('dataprep_calibration_test.hdf5','CO2_100wt%.csv',label='CO2test')

~\Anaconda3\lib\site-packages\ramandecompy-1.0b0-py3.7.egg\ramandecompy\dataprep.py in add_calibration(hdf5_filename, data_filename, label)
31 # r+ is read/write mode and will fail if the file does not exist
32 cal_file = h5py.File(hdf5_filename, 'r+')
---> 33 data = pd.read_excel(data_filename, header=None, names=('x', 'y'))
34 if data_filename.split('.')[-1] == 'xlsx':
35 data = pd.read_excel(data_filename, header=None, names=('x', 'y'))

~\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
186 else:
187 kwargs[new_arg_name] = new_arg_value
--> 188 return func(*args, **kwargs)
189 return wrapper
190 return _deprecate_kwarg

~\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
186 else:
187 kwargs[new_arg_name] = new_arg_value
--> 188 return func(*args, **kwargs)
189 return wrapper
190 return _deprecate_kwarg

~\Anaconda3\lib\site-packages\pandas\io\excel.py in read_excel(io, sheet_name, header, names, index_col, parse_cols, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, verbose, parse_dates, date_parser, thousands, comment, skip_footer, skipfooter, convert_float, mangle_dupe_cols, **kwds)
348
349 if not isinstance(io, ExcelFile):
--> 350 io = ExcelFile(io, engine=engine)
351
352 return io.parse(

~\Anaconda3\lib\site-packages\pandas\io\excel.py in __init__(self, io, engine)
651 self._io = _stringify_path(io)
652
--> 653 self._reader = self._engines[engine](self._io)
654
655 def __fspath__(self):

~\Anaconda3\lib\site-packages\pandas\io\excel.py in __init__(self, filepath_or_buffer)
422 self.book = xlrd.open_workbook(file_contents=data)
423 elif isinstance(filepath_or_buffer, compat.string_types):
--> 424 self.book = xlrd.open_workbook(filepath_or_buffer)
425 else:
426 raise ValueError('Must explicitly set engine if not passing in'

~\Anaconda3\lib\site-packages\xlrd\__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
155 formatting_info=formatting_info,
156 on_demand=on_demand,
--> 157 ragged_rows=ragged_rows,
158 )
159 return bk

~\Anaconda3\lib\site-packages\xlrd\book.py in open_workbook_xls(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
90 t1 = perf_counter()
91 bk.load_time_stage_1 = t1 - t0
---> 92 biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
93 if not biff_version:
94 raise XLRDError("Can't determine file's BIFF version")

~\Anaconda3\lib\site-packages\xlrd\book.py in getbof(self, rqd_stream)
1276 bof_error('Expected BOF record; met end of file')
1277 if opcode not in bofcodes:
-> 1278 bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
1279 length = self.get2bytes()
1280 if length == MY_EOF:

~\Anaconda3\lib\site-packages\xlrd\book.py in bof_error(msg)
1270
1271 def bof_error(msg):
-> 1272 raise XLRDError('Unsupported format, or corrupt file: ' + msg)
1273 savpos = self._position
1274 opcode = self.get2bytes()

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'250.34,-'
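The quoted add_calibration calls pd.read_excel on line 33 before the extension check ever runs, so .csv input reaches xlrd and fails. A hedged sketch of the likely fix, dispatching on the extension instead (only the affected lines of the function body are shown):

    # Pick the reader from the file extension instead of always trying Excel first.
    if data_filename.split('.')[-1] == 'xlsx':
        data = pd.read_excel(data_filename, header=None, names=('x', 'y'))
    else:
        data = pd.read_csv(data_filename, header=None, names=('x', 'y'))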

Code review for peakidentify update dev

I have functioning plots and unit tests for peakidentify update dev in the example folder, but the peak assignments are off. Need help with a code review for peakidentify.

pandas not updated - concat() got an unexpected keyword argument 'join_axes'

In peakidentify.py

ERROR: This function tests the operation of the peak_assignment

Traceback (most recent call last):
File "/home/travis/virtualenv/python3.6.7/lib/python3.6/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
File "/home/travis/build/raman-noodles/raman-spectra-decomp-analysis/ramandecompy/tests/test_peakidentify.py", line 42, in test_peak_assignment
precision, False, plot = False)
File "/home/travis/build/raman-noodles/raman-spectra-decomp-analysis/ramandecompy/peakidentify.py", line 157, in peak_assignment
copy=True,sort=True)
TypeError: concat() got an unexpected keyword argument 'join_axes'

ERROR: Evaluates the functionality of the score_table function

Traceback (most recent call last):
File "/home/travis/virtualenv/python3.6.7/lib/python3.6/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
File "/home/travis/build/raman-noodles/raman-spectra-decomp-analysis/ramandecompy/tests/test_peakidentify.py", line 872, in test_score_table
peakidentify.score_table(unknown_peaks,H2O_peaks, precision,unknownname,knownname)
File "/home/travis/build/raman-noodles/raman-spectra-decomp-analysis/ramandecompy/peakidentify.py", line 752, in score_table
copy=True,sort=True)
TypeError: concat() got an unexpected keyword argument 'join_axes'

Solution: update pandas on Travis.
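For context, join_axes was deprecated in pandas 0.25 and removed in 1.0; the documented migration is to reindex the concatenated result. A sketch of the replacement, with df1 and df2 standing in for the frames joined in peakidentify.py:

    import pandas as pd

    # Old (removed): pd.concat([df1, df2], axis=1, join_axes=[df1.index])
    # New: concatenate, then restrict the result to df1's index.
    result = pd.concat([df1, df2], axis=1, copy=True, sort=True).reindex(df1.index)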

test_dataimport.py and dataimport_example issues

Documentation issues in test_dataimport.py.
Testing in test_dataimport.py:

  • mostly just need to figure out how to test against the indexing error

In the dataimport_example
IndexError Traceback (most recent call last)
in
2 directory = '../ramandecompy/tests/test_files/'
3 # open hdf5 file as read/write
----> 4 dataimport.data_import(hdf5_filename,directory)
5 dataprep.view_hdf5(hdf5_filename+'.hdf5')

~\anaconda3\lib\site-packages\ramandecompy-1.0b0-py3.7.egg\ramandecompy\dataimport.py in data_import(hdf5_filename, directory)
45 if filename.startswith('FA_') and filename.endswith('.csv'):
46 locationandfile = directory + filename
---> 47 dataprep.add_experiment(str(hdf5_filename)+'.hdf5', locationandfile)
48 print('Data from {} fit with compound pseudo-Voigt model. Results saved to {}.'.format(filename, hdf5_filename))
49 # printing out to user the status of the import (because it can take a long time if importing a lot of data,

~\anaconda3\lib\site-packages\ramandecompy-1.0b0-py3.7.egg\ramandecompy\dataprep.py in add_experiment(hdf5_filename, exp_filename)
202 specs = specs.split('_')
203 time = specs[-1]
--> 204 temp = specs[-2]
205 # write data to .hdf5
206 exp_file['{}/{}/wavenumber'.format(temp, time)] = data['wavenumber']

IndexError: list index out of range
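The IndexError comes from dataprep.add_experiment indexing specs[-2] after splitting the filename on underscores; a hedged sketch of a guard around the quoted lines, assuming filenames are expected to follow the FA_<temp>_<time>.csv pattern used by dataimport:

    specs = specs.split('_')
    if len(specs) < 2:
        # Anything without the expected FA_<temp>_<time> fields would
        # otherwise raise IndexError at specs[-2]; fail with a clear message.
        raise ValueError('unexpected experiment filename format: {}'.format(exp_filename))
    time = specs[-1]
    temp = specs[-2]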

variable names in hdf5

Currently the hdf5 file names the x-axis/y-axis columns from the Excel (.csv) files as 'x' and 'y', but they should be properly named 'wavelength' (for x) and 'counts' (for y).

Think this is under @ParkerSteichen's realm, but also not sure how much it impacts @kernb2.

Note: Sometimes the y-axis is also called 'intensity' or 'arb. units' in the literature, because the counts / amount of resonance that the Raman probe and cell emit can be a function of the setup.
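In h5py, a dataset rename is a hard link under the new name followed by deletion of the old key; a minimal sketch, assuming flat 'x'/'y' keys (the filename is hypothetical):

    import h5py

    with h5py.File('calibration_file.hdf5', 'r+') as hdf5:
        # Link each dataset under its descriptive name, then drop the old key.
        hdf5['wavelength'] = hdf5['x']
        hdf5['counts'] = hdf5['y']
        del hdf5['x']
        del hdf5['y']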

interpolatespectra.interp_and_norm function Failed Test

FAIL: A function that tests that the interpolatespectra.interp_and_norm function is behaving

Traceback (most recent call last):
File "/home/travis/virtualenv/python3.6.7/lib/python3.6/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
File "/home/travis/build/raman-noodles/raman-spectra-decomp-analysis/ramandecompy/tests/test_interpolatespectra.py", line 32, in test_interp_and_norm
assert isinstance(tuple_list[0][0], np.int32), 'first element of tuple is not a np.int64'
AssertionError: first element of tuple is not a np.int64
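The test asserts np.int32 while its message says np.int64; numpy's default integer is platform-dependent (32-bit on Windows, 64-bit on most Linux builds), so a hedged fix is to assert against the platform-independent base class:

    import numpy as np

    # np.integer matches np.int32 and np.int64 alike, sidestepping the
    # Windows-vs-Linux default integer width mismatch.
    assert isinstance(tuple_list[0][0], np.integer), \
        'first element of tuple is not an integer'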

Test hdf5

It looks like there are a couple hdf5 files for peakidentify that need to be added to the repo @kernb2
