The deeper from conor-horgan

Questions for: High-throughput molecular imaging via deep learning enabled Raman spectroscopy

Dear Dr. Horgan:

Good morning!

I have read your paper "High-throughput molecular imaging via deep learning enabled Raman spectroscopy" and your code in the conor-horgan/DeepeR repository (DeepeR: deep learning enabled Raman spectroscopy).
They have been a great help for my current research on Raman spectroscopy and bacterial metabolism.

I wonder if you could help me with several questions:

In your training dataset (159618 x 500), how can I convert the 500-point data array back to actual Raman shift? (What is the formula that maps the X-axis from Raman shift, 0 to 1800 cm-1, to the 0-500 array index in Training_Input?)
In our research, a D2O/H2O peak in the 1700-2700 cm-1 range is important for predicting the metabolism of the bacteria. However, your dataset has only 500 data points (I assume your data's Raman shift range is about 500-1800 cm-1).
Do you have an untruncated dataset that covers a longer range (for example, 500-4000 cm-1)?
We would be very grateful if you could share such a dataset with us.
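
To make the first question concrete: if the 500 points were evenly spaced between two known Raman shifts, the mapping would be a simple linear one, as in the sketch below. The even spacing, the 500-1800 cm-1 end points, and the function names are all assumptions for illustration; the actual axis used for Training_Input is not stated here and may differ.

```python
# Hypothetical linear mapping between array index and Raman shift (cm^-1).
# ASSUMPTION: the axis is evenly spaced from shift_min to shift_max.
# The defaults (500 points, 500-1800 cm^-1) are guesses, not confirmed values.

def index_to_shift(index, n_points=500, shift_min=500.0, shift_max=1800.0):
    """Return the Raman shift (cm^-1) for a given array index."""
    step = (shift_max - shift_min) / (n_points - 1)
    return shift_min + index * step


def shift_to_index(shift, n_points=500, shift_min=500.0, shift_max=1800.0):
    """Return the nearest array index for a given Raman shift (cm^-1)."""
    step = (shift_max - shift_min) / (n_points - 1)
    return round((shift - shift_min) / step)


if __name__ == "__main__":
    print(index_to_shift(0))       # first point of the assumed axis
    print(shift_to_index(1000.0))  # index nearest to 1000 cm^-1
```

If the real axis is not evenly spaced (for example, if it follows the spectrometer's pixel calibration), a lookup table of the measured shifts would be needed instead.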

We have about 10K Raman spectra for E. coli (500-4000 cm-1). I am currently considering randomly concatenating our 10K longer-range spectra with your 160K spectra (500-1800 cm-1). Do you think this is a viable approach to data augmentation? Randomly sampling the 10K longer-range dataset and concatenating the samples with your 160K shorter-range dataset would produce another 160K+ longer-range dataset.
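
The splicing idea in this question can be sketched as follows. This is only an illustration of the proposal, not a recommendation: it assumes both datasets have already been resampled to the same point spacing and normalized the same way, and it assumes the long spectra have been cut so that only the region above the short range (the "tail") remains. The function name and arguments are hypothetical.

```python
import random

# Hypothetical sketch of the proposed augmentation: append a randomly chosen
# high-wavenumber tail (from the long-range dataset) to each short spectrum.
# ASSUMPTIONS: common point spacing and normalization; no intensity matching
# is done at the junction, so a visible step may appear there.

def splice_spectra(short_spectra, long_tails, rng=None):
    """Return one spliced spectrum per short spectrum."""
    rng = rng or random.Random()
    spliced = []
    for short in short_spectra:
        tail = rng.choice(long_tails)  # random donor for the missing region
        spliced.append(list(short) + list(tail))
    return spliced


if __name__ == "__main__":
    short = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    tails = [[0.9, 0.8], [0.7, 0.6]]
    for spectrum in splice_spectra(short, tails, rng=random.Random(0)):
        print(spectrum)
```

One consequence of this construction is that each spliced spectrum pairs a low-wavenumber region from one sample with a high-wavenumber region from an unrelated sample, so any real correlation between the two regions is lost.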
What about overfitting? I found that my current encoder-decoder DL model tends to memorize the average spectrum of the training set. If we encounter a totally new bacterium, the denoising produces a false image (which is understandable). Encoder-decoder models produce the clean output partly from encoder-side information and partly from decoder-side memorization.

In our use cases, we often find new bacteria on the slices that are not in the training set.
However, my current model still produces the spectrum of E. coli (the bacterium in the training set) rather than the spectrum of Lactobacillus (the new bacterium) for the real test samples.

Do you have any suggestions for preventing the DL model from producing output purely from memory when an incoming sample is clearly a new specimen?
It might be better to give up some accuracy to keep the model from producing every spectrum according to the average of the training-set specimens.
Maybe putting more weight on the encoder-side information and trying to reduce the memorization capacity on the decoder side would help?
I currently don’t have a good solution for this problem and hope to hear your insight.
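
As a way to at least detect the failure described above (it does not fix it, and it is not a method from the paper), one could check whether the residual between the noisy input and the denoised output still contains structure. If the model has replaced real peaks with a memorized spectrum, the residual should contain smooth, peak-like features instead of pure noise, which shows up as a positive lag-1 autocorrelation. The sketch below assumes spectra are plain lists of floats; the function names and any threshold applied to the result are hypothetical.

```python
# Hypothetical diagnostic: lag-1 autocorrelation of (noisy input - denoised output).
# Values near zero suggest the residual is mostly uncorrelated noise; clearly
# positive values suggest smooth structure (for example, peaks) was removed.
# ASSUMPTION: the measurement noise is roughly uncorrelated between neighbouring points.

def lag1_autocorrelation(values):
    """Return the lag-1 autocorrelation of a sequence (0.0 if it is constant)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return 0.0
    num = sum(centered[i] * centered[i + 1] for i in range(n - 1))
    return num / denom


def residual_structure(noisy_input, denoised_output):
    """Return the lag-1 autocorrelation of the denoising residual."""
    residual = [x - y for x, y in zip(noisy_input, denoised_output)]
    return lag1_autocorrelation(residual)


if __name__ == "__main__":
    noisy = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0]  # input containing a peak
    flat = [0.0] * len(noisy)                          # output that dropped the peak
    print(residual_structure(noisy, flat))
```

A useful cut-off would have to be calibrated on held-out spectra from species the model was trained on, since real detector noise is not always perfectly uncorrelated.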

Regards!
WEI LI

Dataset missing csv file

Thank you for making this dataset publicly available! However, I think the "Image_IDs.csv" file is missing from the Hyperspectral Super-Resolution dataset. Could you also upload this file to the drive, or show me how to create it?

Dataset

Would it now be convenient to open-source the dataset? We want to collect spectral images in our research. It would be very kind of you to open-source your data. Thanks!
