
spectralentropy's Introduction


If you use this package, please cite the following manuscript:

Li, Y., Kind, T., Folz, J. et al. Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification. Nat Methods 18, 1524–1531 (2021). https://doi.org/10.1038/s41592-021-01331-z

Search spectra with entropy similarity

To search spectral files with entropy similarity, you can download the pre-compiled program from https://github.com/YuanyueLi/EntropySearch/releases.

For advanced users who want to calculate spectral entropy, entropy similarity, or other spectral similarities themselves, please use the Python code below.

A Jupyter notebook example is provided here: https://github.com/YuanyueLi/SpectralEntropy/blob/master/example.ipynb

The detailed reference for using the 43 different algorithms to calculate spectral similarity can be found here: https://SpectralEntropy.readthedocs.io/en/master/

You might have noticed an entropy similarity score higher than 1 in your self-implemented code. This is caused by a mistake in merging peaks within the MS2 tolerance; you can use the code implemented here to avoid this problem. We are working on an R implementation of entropy similarity, which will be released soon.

Requirements

Python 3.7, numpy>=1.17.4, scipy>=1.3.2

cython>=0.29.13 (Not required but highly recommended)

# The command below is not required but strongly recommended, as it will compile the cython code to run faster
python setup.py build_ext --inplace

Spectral entropy

To calculate spectral entropy, the spectrum needs to be centroided first. When you are focusing on the fragment ions' information, the precursor ion may need to be removed from the spectrum before calculating spectral entropy. If isotope peaks exist in the MS/MS spectrum, they should be removed first, as isotope peaks do not contain useful information for identifying the molecule.

Calculating spectral entropy for a centroided spectrum in Python is very simple (just one line with the scipy package).

import numpy as np
import scipy.stats

spectrum = np.array([[41.04, 37.16], [69.07, 66.83], [86.1, 999.0]], dtype=np.float32)

entropy = scipy.stats.entropy(spectrum[:, 1])
print("Spectral entropy is {}.".format(entropy))
# The output should be: Spectral entropy is 0.3737888038158417.
print('-' * 30)
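
For reference, the same value can be obtained without scipy. The sketch below assumes the usual Shannon entropy definition, S = -sum(p_i * ln(p_i)), where p_i is each peak's intensity divided by the total intensity (scipy.stats.entropy normalizes its input in the same way and uses the natural logarithm by default). The helper name spectral_entropy_manual is hypothetical and is not part of this package.

```python
import math


def spectral_entropy_manual(intensities):
    """Shannon entropy (natural log) of a list of peak intensities.

    Hypothetical helper for illustration; not part of spectral_entropy.
    """
    total = sum(intensities)
    # Convert intensities to probabilities; zero-intensity peaks contribute nothing.
    probabilities = [i / total for i in intensities if i > 0]
    return -sum(p * math.log(p) for p in probabilities)


# Intensity column of the example spectrum above.
entropy = spectral_entropy_manual([37.16, 66.83, 999.0])
print("Spectral entropy is {:.4f}.".format(entropy))
# Prints: Spectral entropy is 0.3738.
```

A spectrum with a single peak has entropy 0, and a spectrum with N equally intense peaks has entropy ln(N).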

For a profile spectrum that has not been centroided, you can use clean_spectrum to centroid the spectrum, for example:

import numpy as np
import scipy.stats
import spectral_entropy

spectrum = np.array([[69.071, 7.917962], [86.066, 1.021589], [86.0969, 100.0]], dtype=np.float32)

spectrum = spectral_entropy.clean_spectrum(spectrum)
entropy = scipy.stats.entropy(spectrum[:, 1])
print("Spectral entropy is {}.".format(entropy))
# The output should be: Spectral entropy is 0.2605222463607788.
print('-' * 30)

We provide the function clean_spectrum to help you remove the precursor ion, centroid the spectrum, and remove noise ions. Please note that this function will not remove isotope peaks; you need to remove them yourself. For example:

import numpy as np
import spectral_entropy

spectrum = np.array([[41.04, 0.3716], [69.071, 7.917962], [69.071, 100.], [86.0969, 66.83]], dtype=np.float32)
clean_spectrum = spectral_entropy.clean_spectrum(spectrum,
                                                 max_mz=85,
                                                 noise_removal=0.01,
                                                 ms2_da=0.05)
print("Clean spectrum will be:{}".format(clean_spectrum))
# The output should be: Clean spectrum will be:[[69.071  1.   ]]
print('-' * 30)
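
The cleaning steps can be pictured with a small pure-Python sketch. It is an illustration only, not the package's implementation: the function name clean_spectrum_sketch, the order of the steps, and the rule of merging neighbouring peaks into an intensity-weighted m/z are assumptions made for this example.

```python
def clean_spectrum_sketch(peaks, max_mz=None, noise_removal=0.01, ms2_da=0.05):
    """Illustrative cleaning of a list of (m/z, intensity) peaks.

    Hypothetical helper; the real clean_spectrum may differ in detail.
    """
    # 1. Drop empty peaks and peaks above max_mz (e.g. the precursor region).
    kept = [(mz, i) for mz, i in peaks
            if i > 0 and (max_mz is None or mz <= max_mz)]

    # 2. Centroid: merge m/z-sorted neighbours closer than ms2_da,
    #    summing intensities and averaging m/z weighted by intensity.
    kept.sort()
    merged = []
    for mz, i in kept:
        if merged and mz - merged[-1][0] <= ms2_da:
            prev_mz, prev_i = merged[-1]
            total = prev_i + i
            merged[-1] = ((prev_mz * prev_i + mz * i) / total, total)
        else:
            merged.append((mz, i))
    if not merged:
        return []

    # 3. Remove noise: peaks below noise_removal * the most intense peak.
    threshold = noise_removal * max(i for _, i in merged)
    merged = [(mz, i) for mz, i in merged if i >= threshold]

    # 4. Normalize so that the intensities sum to 1.
    total = sum(i for _, i in merged)
    return [(mz, i / total) for mz, i in merged]


peaks = [(41.04, 0.3716), (69.071, 7.917962), (69.071, 100.0), (86.0969, 66.83)]
for mz, intensity in clean_spectrum_sketch(peaks, max_mz=85):
    print("m/z {:.3f}, intensity {:.3f}".format(mz, intensity))
# Prints: m/z 69.071, intensity 1.000
```

The peak at 86.0969 is dropped by max_mz, the two peaks at 69.071 are merged, and the peak at 41.04 falls below 1% of the merged peak, which matches the result of the example above.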

Entropy similarity

Before calculating entropy similarity, the spectrum needs to be centroided first. Removing noise ions is highly recommended. Also, based on our tests on the NIST20 and Massbank.us databases, removing ions with m/z higher than the precursor ion's m/z - 1.6 greatly improves spectral identification performance.
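
As an illustration of the precursor cutoff, the sketch below keeps only peaks with m/z at or below the precursor m/z minus 1.6. The function name remove_precursor_region and the precursor m/z of 87.1 are hypothetical, chosen only for this example; with this package, the same cutoff can be passed to clean_spectrum through its max_mz argument.

```python
def remove_precursor_region(peaks, precursor_mz, offset=1.6):
    """Keep (m/z, intensity) peaks with m/z <= precursor_mz - offset.

    Hypothetical helper for illustration; not part of spectral_entropy.
    """
    max_mz = precursor_mz - offset
    return [(mz, intensity) for mz, intensity in peaks if mz <= max_mz]


peaks = [(41.04, 37.16), (69.07, 66.83), (86.1, 999.0)]
filtered = remove_precursor_region(peaks, precursor_mz=87.1)
print(filtered)
# Prints: [(41.04, 37.16), (69.07, 66.83)]
```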

We provide the calculate_entropy_similarity function to calculate the entropy similarity between two spectra.

import numpy as np
import spectral_entropy

spec_query = np.array([[69.071, 7.917962], [86.066, 1.021589], [86.0969, 100.0]], dtype=np.float32)
spec_reference = np.array([[41.04, 37.16], [69.07, 66.83], [86.1, 999.0]], dtype=np.float32)

# Calculate entropy similarity.
similarity = spectral_entropy.calculate_entropy_similarity(spec_query, spec_reference, ms2_da=0.05)
print("Entropy similarity:{}.".format(similarity))
# The output should be: Entropy similarity:0.8984397722577456.
print('-' * 30)
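
To show what is being computed, here is a minimal pure-Python sketch. It assumes the definition given in the cited paper: similarity = 1 - (2 * S_AB - S_A - S_B) / ln(4), where S_AB is the entropy of the 1:1 mixture of the two spectra, and where the intensities of a spectrum with entropy S below 3 are first raised to the power 0.25 + 0.25 * S. All function names are hypothetical, the peak matching is a simplified pairing of m/z-sorted peaks, and the inputs must already be cleaned and centroided; it is not the package's implementation.

```python
import math


def entropy(intensities):
    """Shannon entropy (natural log) of non-negative intensities."""
    total = sum(intensities)
    return -sum(i / total * math.log(i / total) for i in intensities if i > 0)


def normalize(intensities, weighted):
    """Scale intensities to sum to 1, optionally applying the entropy weighting."""
    if weighted:
        s = entropy(intensities)
        if s < 3:
            power = 0.25 + 0.25 * s
            intensities = [i ** power for i in intensities]
    total = sum(intensities)
    return [i / total for i in intensities]


def align(spec_a, spec_b, ms2_da):
    """Pair peaks of two m/z-sorted spectra; unmatched peaks are paired with 0."""
    pairs, a, b = [], 0, 0
    while a < len(spec_a) and b < len(spec_b):
        (mz_a, int_a), (mz_b, int_b) = spec_a[a], spec_b[b]
        if abs(mz_a - mz_b) <= ms2_da:
            pairs.append((int_a, int_b))
            a, b = a + 1, b + 1
        elif mz_a < mz_b:
            pairs.append((int_a, 0.0))
            a += 1
        else:
            pairs.append((0.0, int_b))
            b += 1
    pairs += [(i, 0.0) for _, i in spec_a[a:]]
    pairs += [(0.0, i) for _, i in spec_b[b:]]
    return pairs


def entropy_similarity_sketch(spec_a, spec_b, ms2_da=0.05, weighted=True):
    """Entropy similarity of two cleaned, centroided, m/z-sorted spectra."""
    int_a = normalize([i for _, i in spec_a], weighted)
    int_b = normalize([i for _, i in spec_b], weighted)
    spec_a = [(mz, i) for (mz, _), i in zip(spec_a, int_a)]
    spec_b = [(mz, i) for (mz, _), i in zip(spec_b, int_b)]
    mixed = [(x + y) / 2 for x, y in align(spec_a, spec_b, ms2_da)]
    return 1 - (2 * entropy(mixed) - entropy(int_a) - entropy(int_b)) / math.log(4)


# Query from the example above after merging the peaks at 86.066 and 86.0969
# (merged m/z is approximate; merged intensity is the sum of the two peaks).
query = [(69.071, 7.917962), (86.0966, 101.021589)]
reference = [(41.04, 37.16), (69.07, 66.83), (86.1, 999.0)]
print(round(entropy_similarity_sketch(query, reference), 3))
# Prints: 0.898
print(round(entropy_similarity_sketch(query, reference, weighted=False), 3))
# Prints: 0.983
```

Because each spectrum is normalized to sum to 1 and every peak is counted once in the mixture, the score stays between 0 and 1; counting an unmerged peak twice is one way a self-written implementation can leave that range.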

Spectral similarity

We also provide 43 different spectral similarity algorithms for MS/MS spectral comparison.

You can find the detailed reference here: https://SpectralEntropy.readthedocs.io/en/master/

Example code

Before calculating spectral similarity, it's highly recommended to remove spectral noise. For example, peaks with intensity less than 1% of the maximum intensity can be removed to improve identification performance.

import numpy as np
import spectral_entropy

spec_query = np.array([[69.071, 7.917962], [86.066, 1.021589], [86.0969, 100.0]], dtype=np.float32)
spec_reference = np.array([[41.04, 37.16], [69.07, 66.83], [86.1, 999.0]], dtype=np.float32)

# Calculate entropy similarity.
similarity = spectral_entropy.similarity(spec_query, spec_reference, method="entropy",
                                         ms2_da=0.05)
print("Entropy similarity:{}.".format(similarity))
# The output should be: Entropy similarity:0.8984397722577456.
print('-' * 30)

# Calculate unweighted entropy similarity.
similarity = spectral_entropy.similarity(spec_query, spec_reference, method="unweighted_entropy",
                                         ms2_da=0.05)
print("Unweighted entropy similarity:{}.".format(similarity))
# The output should be: Unweighted entropy similarity:0.9826668790176113.
print('-' * 30)

# Calculate all similarities.
all_dist = spectral_entropy.all_similarity(spec_query, spec_reference, ms2_da=0.05)
for dist_name in all_dist:
    method_name = spectral_entropy.methods_name[dist_name]
    print("Method name: {}, similarity score:{}.".format(method_name, all_dist[dist_name]))

# A list of the different spectral similarity scores will be printed.
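
Many classic similarity measures compare two intensity vectors whose entries have been aligned by m/z, with 0 where a peak is missing. As a hedged illustration, the sketch below computes the standard cosine (dot product) score on hand-aligned vectors. The function name cosine_similarity is hypothetical and the code is not taken from this package, which may preprocess spectra or scale scores differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two aligned intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


# Intensities at m/z of about 41.04, 69.07 and 86.1 (0.0 where a peak is absent).
# The query values are the example query after merging its two peaks near 86.1.
query = [0.0, 7.917962, 101.021589]
reference = [37.16, 66.83, 999.0]
print(round(cosine_similarity(query, reference), 3))
# Prints: 0.999
```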

Supported similarity algorithm list:

"entropy": Entropy distance
"unweighted_entropy": Unweighted entropy distance
"euclidean": Euclidean distance
"manhattan": Manhattan distance
"chebyshev": Chebyshev distance
"squared_euclidean": Squared Euclidean distance
"fidelity": Fidelity distance
"matusita": Matusita distance
"squared_chord": Squared-chord distance
"bhattacharya_1": Bhattacharya 1 distance
"bhattacharya_2": Bhattacharya 2 distance
"harmonic_mean": Harmonic mean distance
"probabilistic_symmetric_chi_squared": Probabilistic symmetric χ2 distance
"ruzicka": Ruzicka distance
"roberts": Roberts distance
"intersection": Intersection distance
"motyka": Motyka distance
"canberra": Canberra distance
"baroni_urbani_buser": Baroni-Urbani-Buser distance
"penrose_size": Penrose size distance
"mean_character": Mean character distance
"lorentzian": Lorentzian distance
"penrose_shape": Penrose shape distance
"clark": Clark distance
"hellinger": Hellinger distance
"whittaker_index_of_association": Whittaker index of association distance
"symmetric_chi_squared": Symmetric χ2 distance
"pearson_correlation": Pearson/Spearman Correlation Coefficient
"improved_similarity": Improved Similarity
"absolute_value": Absolute Value Distance
"dot_product": Dot-Product (cosine)
"dot_product_reverse": Reverse dot-Product (cosine)
"spectral_contrast_angle": Spectral Contrast Angle
"wave_hedges": Wave Hedges distance
"cosine": Cosine distance
"jaccard": Jaccard distance
"dice": Dice distance
"inner_product": Inner Product distance
"divergence": Divergence distance
"avg_l": Avg (L1, L∞) distance
"vicis_symmetric_chi_squared_3": Vicis-Symmetric χ2 3 distance
"ms_for_id_v1": MSforID distance version 1
"ms_for_id": MSforID distance
"weighted_dot_product": Weighted dot product distance
