conor-horgan / DeepeR
DeepeR: deep learning enabled Raman spectroscopy
License: MIT License
Dear Dr. Horgan:
Good morning!
I have read your paper "High-throughput molecular imaging via deep learning enabled Raman spectroscopy" and your code on GitHub (conor-horgan/DeepeR: deep learning enabled Raman spectroscopy).
Both are a great help for my current research, which is about Raman spectroscopy and bacterial metabolism.
I wonder if you could help me with several questions:
In your training dataset (159618 x 500), how can I convert the 500-point data array back to actual Raman shift? That is, what formula maps the x-axis from 0 to 1800 Raman shift (cm-1) onto the 0-500 data array in Training_Input?
In our research, a D2O/H2O peak between 1700 and 2700 cm-1 is important for predicting the metabolism of the bacteria. However, in your dataset there are only 500 data points (I assume your data's Raman shift range is about 500-1800 cm-1).
Do you have an untruncated dataset that covers a longer range (for example, 500-4000 cm-1)?
We would be very grateful if you could share such a longer dataset with us.
We have about 10K Raman spectra for E. coli (500-4000 cm-1). I am currently considering randomly concatenating our 10K longer-range spectra with your 160K spectra (500-1800 cm-1). Do you think this is a viable approach to data augmentation? Randomly sampling the 10K longer-range dataset and concatenating it with your 160K shorter-range dataset would produce another 160K+ longer dataset.
What about overfitting? I found that my current encoder-decoder DL model tends to memorize the average spectrum of the training set. If we encounter a totally new bacterium, the denoising produces a false image (which is understandable). Encoder-decoder models produce the clear output half from encoder-side information and half from decoder-side memorization.
In our use cases, we often find new bacteria on the slices that are outside the training set.
However, my current model still produces the spectrum of E. coli (a bacterium in the training set) rather than the spectrum of Lactobacillus (a new bacterium) in the real test samples.
Do you have any suggestions for preventing the DL model from producing output purely from memory when an incoming sample is clearly a new specimen?
It might be better to give up some accuracy to prevent the model from producing every spectrum according to the training set specimen average.
Maybe by putting more weight on the encoder-side information and reducing the memorization capacity on the decoder side?
I currently don't have a good solution for this problem and hope to hear your insight.
Regards!
WEI LI
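For anyone with the same question about mapping array indices to Raman shift and combining spectra recorded over different ranges, here is a minimal sketch. It assumes the 500 points are evenly spaced in wavenumber, which this thread does not confirm, and the 500-1800 cm-1 range is only the poster's guess. The function names `index_to_shift` and `resample_spectrum` are hypothetical and are not part of the DeepeR code.

```python
import numpy as np

def index_to_shift(n_points, shift_min, shift_max):
    """Return an evenly spaced Raman shift axis (cm-1) for n_points samples.

    Assumption: the spectra were sampled on a uniform wavenumber grid.
    The true axis of the published dataset must come from the authors.
    """
    return np.linspace(shift_min, shift_max, n_points)

def resample_spectrum(spectrum, src_axis, dst_axis):
    """Linearly interpolate a spectrum from src_axis onto dst_axis.

    Points of dst_axis outside the source range are set to NaN so that
    missing regions are never silently filled with invented intensities.
    """
    return np.interp(dst_axis, src_axis, spectrum, left=np.nan, right=np.nan)

# Hypothetical axes: a short-range 500-point grid and a longer 500-4000 cm-1 grid.
short_axis = index_to_shift(500, 500.0, 1800.0)
long_axis = index_to_shift(1000, 500.0, 4000.0)

# Synthetic long-range spectrum, used only to exercise the functions.
long_spectrum = 2.0 * long_axis

# Bring the long-range spectrum onto the short-range grid for comparison.
on_short_grid = resample_spectrum(long_spectrum, long_axis, short_axis)

# Going the other way leaves NaN above 1800 cm-1, where no short-range data exists.
short_on_long = resample_spectrum(on_short_grid, short_axis, long_axis)
```

Resampling onto a shared axis makes the two datasets comparable only in the overlapping region. The NaN values above 1800 cm-1 show that the shorter spectra carry no information about the 1700-2700 cm-1 peak beyond that limit, so concatenation alone cannot supply it.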
Thank you for making this dataset publicly available! However, I think the file "Image_IDs.csv" is missing from the Hyperspectral Super-Resolution dataset. Could you also upload this file to the drive or show me how to create it?
Would it be possible to open-source the dataset now? We want to collect spectral images for our research. It would be very kind of you to open-source your data. Thanks!