
massbank2db's People

Contributors

bachi55

Forkers

mattoslmp

massbank2db's Issues

Generate a new accession title when merging spectra

When multiple accessions (spectra / MassBank records) are merged, we need to generate a new accession title and expose it as a property of the returned MBSpectrum object.

We can use the procedure from _to_metfrag_output:

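import re
from hashlib import sha1

# Dataset prefix (leading capital letters) of the first accession
ds = re.compile("[A-Z]+").match(self.get("accession")[0])[0]
# Prefix + truncated SHA-1 of all concatenated accessions, 8 characters in total
_base_fn = ds + sha1("".join(self.get("accession")).encode('utf-8')).hexdigest()[:(8 - len(ds))]
# e.g. AU3a1fd8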
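
The same procedure can be sketched as a standalone helper that does not depend on the spectrum object. This is only an illustration: the function name `merged_accession_id` and its `length` parameter are hypothetical and not part of massbank2db. It assumes that every accession starts with a run of capital letters identifying the dataset.

```python
import re
from hashlib import sha1
from typing import Sequence


def merged_accession_id(accessions: Sequence[str], length: int = 8) -> str:
    """Build an identifier for a merged spectrum from its source accessions.

    Hypothetical helper: dataset prefix of the first accession, padded to
    `length` characters with the start of the SHA-1 hex digest of all
    concatenated accessions.
    """
    prefix_match = re.match(r"[A-Z]+", accessions[0])
    if prefix_match is None:
        raise ValueError("Accession does not start with a capital-letter prefix.")
    ds = prefix_match[0]

    digest = sha1("".join(accessions).encode("utf-8")).hexdigest()

    # max(0, ...) guards against prefixes that are already longer than `length`
    return ds + digest[:max(0, length - len(ds))]


new_id = merged_accession_id(["AU123401", "AU123402"])
```

Note that the digest depends on the order of the accessions. Whether the accessions should be sorted first, so that the same set of records always yields the same title, is a design decision left open here.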

Return merged spectra

When the option to group the spectra is set to True, iterating over a dataset currently returns a list of spectra per iteration. The spectra in such a list can correspond to, e.g., multiple collision energies.

We should add an option to return merged spectra, as those are required by some downstream functions like MetFrag.

Options to add merging:

  • xcms implements the function mzClust_hclust, which uses hierarchical clustering to merge the peaks in a spectrum
    -> take the function (C) and run it from Python
    -> hierarchical clustering as described in this publication: https://link.springer.com/article/10.1007/s11306-006-0021-7
  • CSI:FingerID implements a merging function as well
    -> reimplement in Python
  • implement our own merging function
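
For the third option, a minimal sketch of what an own merging function could look like is given below. It is neither the xcms nor the CSI:FingerID algorithm: it uses a simple gap-based grouping, in which all peaks are sorted by m/z and a new group starts whenever the gap to the previous peak exceeds a tolerance. The names `merge_peaks` and `mz_tol`, the peak representation as (m/z, intensity) tuples, and the choice to keep the maximum intensity per group are all assumptions made for illustration.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def _collapse(group: List[Peak]) -> Peak:
    """Collapse a group of peaks: intensity-weighted mean m/z, maximum intensity."""
    total = sum(intensity for _, intensity in group)
    if total > 0:
        mz = sum(m * intensity for m, intensity in group) / total
    else:
        mz = sum(m for m, _ in group) / len(group)
    return mz, max(intensity for _, intensity in group)


def merge_peaks(spectra: List[List[Peak]], mz_tol: float = 0.01) -> List[Peak]:
    """Merge the peaks of several spectra into a single peak list."""
    peaks = sorted(peak for spectrum in spectra for peak in spectrum)

    merged: List[Peak] = []
    group: List[Peak] = []
    for mz, intensity in peaks:
        # Start a new group when the gap to the previous peak is too large
        if group and mz - group[-1][0] > mz_tol:
            merged.append(_collapse(group))
            group = []
        group.append((mz, intensity))
    if group:
        merged.append(_collapse(group))

    return merged


low_energy = [(100.0, 10.0), (200.0, 5.0)]
high_energy = [(100.005, 30.0), (300.0, 1.0)]
merged_spectrum = merge_peaks([low_energy, high_energy])
```

Because neighbouring peaks are chained, a dense run of peaks can produce a group wider than `mz_tol`. A proper hierarchical clustering with a different linkage would avoid that at the cost of more code.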

Move MassbankDB class parameters to the dataset insertion function

The following parameters should be moved to the dataset insertion function:

only_with_rt=True
only_ms2=True
use_pubchem_structure_info=True
exclude_deprecated=True
min_number_of_unique_compounds_per_dataset=50
pc_dbfn=None

These parameters are only relevant when data is inserted. Furthermore, the MassbankDB class only acts as a wrapper around the SQLite DB file and provides standardized functionality to access the data in the DB. It does not, however, (need to) store information about the state of the SQLite DB.
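
A rough sketch of the resulting interface is shown below. Only the parameter names and defaults are taken from the list above. The constructor argument `mb_dbfn`, the method name `insert_dataset`, its `accession_prefix` argument, and the returned dictionary are assumptions for illustration, and the actual insertion logic is omitted.

```python
import sqlite3
from typing import Any, Dict, Optional


class MassbankDB:
    """Thin wrapper around the SQLite DB file; stores no insertion settings."""

    def __init__(self, mb_dbfn: str):
        self._conn = sqlite3.connect(mb_dbfn)

    def insert_dataset(
        self,
        accession_prefix: str,
        *,
        only_with_rt: bool = True,
        only_ms2: bool = True,
        use_pubchem_structure_info: bool = True,
        exclude_deprecated: bool = True,
        min_number_of_unique_compounds_per_dataset: int = 50,
        pc_dbfn: Optional[str] = None,
    ) -> Dict[str, Any]:
        """Insert one dataset; the filter options only live for this call."""
        options = {
            "only_with_rt": only_with_rt,
            "only_ms2": only_ms2,
            "use_pubchem_structure_info": use_pubchem_structure_info,
            "exclude_deprecated": exclude_deprecated,
            "min_number_of_unique_compounds_per_dataset":
                min_number_of_unique_compounds_per_dataset,
            "pc_dbfn": pc_dbfn,
        }
        # ... parse the records, apply the filters, write to self._conn ...
        return options

    def close(self) -> None:
        self._conn.close()


db = MassbankDB(":memory:")
used_options = db.insert_dataset("AU", only_ms2=False)
db.close()
```

Making the options keyword-only keeps call sites readable and allows different datasets to be inserted with different filters through the same MassbankDB instance.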

Test without PubChem Update

Check what happens if we skip the PubChem update step.

  • We cannot use the CID as row-identifier (primary key) in the molecules table
  • Which information do we currently assume to be added to the MassBank DB, although it requires PubChem?
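
One way to explore the first point is a molecules table whose primary key does not depend on PubChem. The sketch below is an assumption, not the current massbank2db schema: the column names are hypothetical, and using the InChIKey as key is only one possible choice. The CID becomes an optional column that stays NULL when the PubChem step is skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: the InChIKey identifies a molecule, the CID is optional
conn.execute(
    """
    CREATE TABLE molecules (
        inchikey TEXT PRIMARY KEY NOT NULL,
        cid      INTEGER,
        smiles   TEXT
    )
    """
)

# Insert a molecule (water) without any PubChem information
conn.execute(
    "INSERT INTO molecules (inchikey, smiles) VALUES (?, ?)",
    ("XLYOFNOQVPJJNP-UHFFFAOYSA-N", "O"),
)

row = conn.execute("SELECT inchikey, cid FROM molecules").fetchone()
n_without_cid = conn.execute(
    "SELECT COUNT(*) FROM molecules WHERE cid IS NULL"
).fetchone()[0]
conn.close()
```

Counting the rows with a NULL CID after an insertion run without PubChem gives a quick overview of how much information is missing.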

Implement filter for early eluting molecules

Implement a record filter based on the (estimated) column dead time to filter out early-eluting / non-retained molecules.

Questions here:

  • Is my definition of "early" eluting molecules actually a universally applicable definition?
  • How do we deal with datasets for which we cannot estimate the column-dead-time because the necessary meta-information is not provided?
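
A minimal sketch of such a filter is given below, with both open questions made explicit rather than answered. It assumes that "early" means a retention time below a multiple of the dead time. The function name `is_early_eluting`, the `factor` parameter and its default are hypothetical. When no dead time can be estimated, the function returns None so that the caller has to decide whether to keep or drop the record.

```python
from typing import Optional


def is_early_eluting(
    rt: float, dead_time: Optional[float], factor: float = 1.0
) -> Optional[bool]:
    """Flag a record as early eluting relative to the column dead time.

    rt and dead_time must use the same unit. Returns None when the dead
    time is unknown, i.e. when no decision can be made.
    """
    if dead_time is None:
        return None
    return rt < factor * dead_time


# (retention time, estimated dead time) pairs in minutes; values are made up
records = [(0.5, 1.0), (2.0, 1.0), (2.0, None)]
flags = [is_early_eluting(rt, t0) for rt, t0 in records]

# Keep records that are retained or for which no decision is possible
kept = [rec for rec, flag in zip(records, flags) if flag is not True]
```

Keeping the undecidable records, as in the last line, is only one policy. Dropping whole datasets without dead-time information would be the stricter alternative.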
