
ann-solo's People

Contributors

bittremieux, issararab, wfondrie

ann-solo's Issues

Read query spectra from mzML and mzXML

Extend the support for input peak files to mzML and mzXML. Optionally support reading compressed files (zip, gz, xz) directly without the need for explicit unzipping first.

  • Read spectra from mzML files.
  • Read spectra from mzXML files.
  • Read spectra from compressed peak files.

See e.g. peak file reading in GLEAMS.
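
Reading compressed peak files without explicit unzipping could look roughly like the sketch below. It only uses the Python standard library; the helper name open_peak_file is hypothetical and not part of ANN-SoLo, and the zip branch assumes the archive holds exactly one peak file.

```python
import gzip
import io
import lzma
import zipfile
from pathlib import Path
from typing import IO


def open_peak_file(path: str) -> IO[bytes]:
    """Open a peak file for binary reading, decompressing based on its extension.

    Hypothetical helper for illustration; not an existing ANN-SoLo function.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".gz":
        return gzip.open(path, "rb")
    if suffix == ".xz":
        return lzma.open(path, "rb")
    if suffix == ".zip":
        with zipfile.ZipFile(path) as archive:
            members = archive.namelist()
            if len(members) != 1:
                raise ValueError(f"Expected one file in {path}, found {len(members)}")
            # Read the single member into memory so the archive can be closed.
            return io.BytesIO(archive.read(members[0]))
    # No known compression extension: treat it as an uncompressed file.
    return open(path, "rb")
```

The returned handle can then be passed to whichever mzML, mzXML, or MGF parser is used, provided that parser accepts file-like objects.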

Multi-file searching

To more efficiently process large datasets, enable searching multiple files simultaneously using a glob file pattern, rather than having to search each peak file individually.

When searching multiple files, provide an option to rescore each file individually or using a joint model.
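
As a rough sketch of the glob handling, the function below expands one or more file patterns into a sorted, de-duplicated list of peak files. The name expand_peak_files is made up for this example, and raising an error on an unmatched pattern is an assumption about the desired behavior.

```python
import glob
from typing import Iterable, List


def expand_peak_files(patterns: Iterable[str]) -> List[str]:
    """Expand glob patterns into a sorted list of unique file paths.

    Hypothetical helper for illustration; not an existing ANN-SoLo function.
    """
    files = set()
    for pattern in patterns:
        matches = glob.glob(pattern, recursive=True)
        if not matches:
            # Fail early rather than silently searching fewer files.
            raise FileNotFoundError(f"No peak files match {pattern!r}")
        files.update(matches)
    return sorted(files)
```

Each returned path could then be searched in turn, with the rescoring step applied either per file or once on the pooled results.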

Adding a public Python API

Sometimes it would be nice to be able to run ANN-SoLo without ever leaving the comfort of your Python interpreter. I propose adding a Python API such that a function, say ann_solo.search(), would perform the same task as running ANN-SoLo at the command line.

The question is how best to specify all the parameters, since they are normally supplied as CLI parameters and a config file.
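
One possible answer, sketched below under assumptions: accept keyword arguments in Python and translate them into the same argument list the command line parser already understands. The helper build_cli_args and the option names in the usage example are illustrative only; they are not a confirmed part of the ANN-SoLo interface.

```python
from typing import Any, List


def build_cli_args(*positional: str, **options: Any) -> List[str]:
    """Translate Python arguments into a command-line style argument list.

    Hypothetical helper for illustration; not an existing ANN-SoLo function.
    """
    args = [str(value) for value in positional]
    for name, value in options.items():
        flag = f"--{name}"
        if isinstance(value, bool):
            # Boolean options become bare flags and are omitted when False.
            if value:
                args.append(flag)
        elif value is not None:
            args.extend([flag, str(value)])
    return args


# Example usage with made-up file and option names.
example_args = build_cli_args(
    "library.splib", "query.mgf", "out.mztab",
    precursor_tolerance_mass=20, verbose=True,
)
```

A search() function could then pass such a list on to the existing entry point, so that the config file and CLI handling stay in one place.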

Facilitate project contributions

  • Enforce a consistent code style using black.
  • Provide a git pre-commit hook for automatic code formatting.
  • Include contributing guidelines.
  • Include a code of conduct.
  • Use setuptools_scm to automatically get the package version number.
  • Set up GitHub Action for automatic package release.
  • Set up GitHub Action for unit tests.
  • Extract notebooks to a separate repository and reorganize code directories.
  • Move the documentation to readthedocs.

Can not run the module

python -m ann_solo.ann_solo --help
Traceback (most recent call last):
  File "/users/zhitingz/miniconda3/envs/work/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/users/zhitingz/miniconda3/envs/work/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/users/zhitingz/miniconda3/envs/work/lib/python3.9/site-packages/ann_solo/__init__.py", line 3, in <module>
    from .ann_solo import ann_solo
  File "/users/zhitingz/miniconda3/envs/work/lib/python3.9/site-packages/ann_solo/ann_solo.py", line 4, in <module>
    from ann_solo import spectral_library
  File "/users/zhitingz/miniconda3/envs/work/lib/python3.9/site-packages/ann_solo/spectral_library.py", line 18, in <module>
    from ann_solo import reader
  File "/users/zhitingz/miniconda3/envs/work/lib/python3.9/site-packages/ann_solo/reader.py", line 17, in <module>
    from ann_solo.parsers import SplibParser
  File "ann_solo/parsers.pyx", line 15, in init ann_solo.parsers
ImportError: cannot import name PeptideFragmentAnnotation

Looking at the source code, spectrum_utils does not contain PeptideFragmentAnnotation.

Spectral library prediction

As spectrum prediction tools are becoming increasingly accurate, we can simulate a spectral library when no spectral library is available for a specific proteome.

Implement functionality to simulate a spectral library using Prosit. Evaluate whether we want to access the web API they provide (slow but no maintenance for us, does not complicate installation) or whether we want to integrate Prosit into ANN-SoLo directly (fast but makes ANN-SoLo more complex).

  • Predict a spectral library from a FASTA file.
  • Predict a spectral library for decoys generated from a FASTA file.
  • Combine an experimental spectral library and a predicted spectral library.

GPU memory issues

As reported via email:

Regarding ANN-SoLo, I've found that there are some errors when using recent Faiss (v1.7.0+) with ANN-SoLo. I get out of GPU memory errors regardless of batch size (the same configuration in Faiss v1.6.5 had no problem at all).

TODO: Investigate this issue.

Mono-isotopic peak correction

Instrument software sometimes misassigns a different isotopic peak as the mono-isotopic peak. This can be especially problematic for open searching, as the mass shifts and their interpretation will be incorrect.

Implement a procedure for mono-isotopic peak correction. This will require input from spectrum files that also include MS1 information (e.g. mzML or mzXML #21).
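
To illustrate the size of the error involved, the sketch below lists the precursor m/z values that a correction procedure would have to choose between. It does not perform the correction itself, which needs MS1 data. The function name and the default of two isotope offsets are assumptions made for this example.

```python
from typing import List

# Mass difference between 13C and 12C in Da.
C13_MASS_DIFF = 1.0033548


def candidate_monoisotopic_mz(
    precursor_mz: float, charge: int, max_isotope: int = 2
) -> List[float]:
    """List candidate mono-isotopic m/z values for a reported precursor m/z.

    Entry k assumes the instrument picked the k-th isotopic peak instead of
    the mono-isotopic peak. Hypothetical helper for illustration only.
    """
    return [
        precursor_mz - k * C13_MASS_DIFF / charge for k in range(max_isotope + 1)
    ]
```

For a doubly charged precursor, each misassigned isotope shifts the m/z by roughly 0.5, which an open search would otherwise report as a precursor mass difference of about 1 Da.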

Combine metadata and HDF5 spectral library in one single store

Consider storing the metadata in the HDF5 file as well, instead of in a separate config file. A self-contained HDF5 file will make library files more portable and avoids the risk of separate files becoming out of sync with each other.

Rescoring features

To try:

  • Intensity correspondence for first three prefix/suffix ions. (Rationale: these are more reproducible than ions further away from the termini.)

Feature Request: Accept Separate Input files for First/Second Search Stages

Hi ANN-SoLo developers,
I'm attempting to use ANN-SoLo to identify peptides that have been chemically modified with a large adduct. We know the adducted peptides have a characteristic neutral loss fragment, and we have set up the instrument to fragment only modified peptides, so the MS2 scans should not contain unmodified peptides. We also have datasets from the same samples acquired with traditional data-dependent acquisition (DDA), which should contain the unmodified spectra. Would it be possible to run the first stage of ANN-SoLo on one dataset (DDA) and search for the modified peptides in a second dataset (targeted scan)? Apologies if this is confusing. If this is not something ANN-SoLo is designed for, I'd understand. Thanks for your help!
-Ben

Efficient internal library format

As described previously, spectrum IO takes up a large amount of the ANN-SoLo runtime.

Especially reading library spectra can be slow because this involves many random accesses to the memmapped spectral library file.

There is obvious room for improvement in how we represent the library internally. Whereas query files only need to be read once, a spectral library will be reused for many searches. We should explore and benchmark whether we can come up with a better format to quickly read spectra from (potentially very large) spectral libraries.
The HDF5 index used by depthcharge can be a first starting point.

compile failure for spectrum_match.cpp

Hi,
I'm having trouble installing ann_solo: the installation fails consistently on several different computers.
Any advice would be greatly appreciated.
Thanks!
===== pip output =====
Installing collected packages: ann-solo
Running setup.py install for ann-solo ... error
Complete output from command c:\ann_solo\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\ADMINI~1\AppData\Local\Temp\2\pip-install-1_l_j7rh\ann-solo\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\ADMINI~1\AppData\Local\Temp\2\pip-record-myttq23i\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\ann_solo
copying ann_solo\ann_solo.py -> build\lib.win-amd64-3.7\ann_solo
copying ann_solo\config.py -> build\lib.win-amd64-3.7\ann_solo
copying ann_solo\plot_ssm.py -> build\lib.win-amd64-3.7\ann_solo
copying ann_solo\reader.py -> build\lib.win-amd64-3.7\ann_solo
copying ann_solo\spectral_library.py -> build\lib.win-amd64-3.7\ann_solo
copying ann_solo\spectrum.py -> build\lib.win-amd64-3.7\ann_solo
copying ann_solo\util.py -> build\lib.win-amd64-3.7\ann_solo
copying ann_solo\writer.py -> build\lib.win-amd64-3.7\ann_solo
copying ann_solo\__init__.py -> build\lib.win-amd64-3.7\ann_solo
running build_ext
skipping 'ann_solo\spectrum_match.cpp' Cython extension (up-to-date)
building 'ann_solo.spectrum_match' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\ann_solo
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -Ic:\ann_solo\lib\site-packages\numpy\core\include -Ic:\ann_solo\include -Ic:\ann_solo\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.15063.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.15063.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.15063.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.15063.0\winrt" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.15063.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.15063.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.15063.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.15063.0\winrt" /EHsc /Tpann_solo\spectrum_match.cpp /Fobuild\temp.win-amd64-3.7\Release\ann_solo\spectrum_match.obj -O3 -march=native -ffast-math -fno-associative-math -std=c++14
cl : Command line warning D9002 : ignoring unknown option '-O3'
cl : Command line warning D9002 : ignoring unknown option '-march=native'
cl : Command line warning D9002 : ignoring unknown option '-ffast-math'
cl : Command line warning D9002 : ignoring unknown option '-fno-associative-math'
cl : Command line warning D9002 : ignoring unknown option '-std=c++14'
spectrum_match.cpp
c:\ann_solo\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(12) : Warning Msg: Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
ann_solo\spectrum_match.cpp(3136): warning C4244: 'argument': conversion from 'Py_ssize_t' to 'unsigned int', possible loss of data
ann_solo\spectrum_match.cpp(3285): warning C4244: 'argument': conversion from 'Py_ssize_t' to 'unsigned int', possible loss of data
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -Ic:\ann_solo\lib\site-packages\numpy\core\include -Ic:\ann_solo\include -Ic:\ann_solo\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.15063.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.15063.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.15063.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.15063.0\winrt" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.15063.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.15063.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.15063.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.15063.0\winrt" /EHsc /Tpann_solo/SpectrumMatch.cpp /Fobuild\temp.win-amd64-3.7\Release\ann_solo/SpectrumMatch.obj -O3 -march=native -ffast-math -fno-associative-math -std=c++14
cl : Command line warning D9002 : ignoring unknown option '-O3'
cl : Command line warning D9002 : ignoring unknown option '-march=native'
cl : Command line warning D9002 : ignoring unknown option '-ffast-math'
cl : Command line warning D9002 : ignoring unknown option '-fno-associative-math'
cl : Command line warning D9002 : ignoring unknown option '-std=c++14'
SpectrumMatch.cpp
ann_solo/SpectrumMatch.cpp(21): error C2131: expression did not evaluate to a constant
ann_solo/SpectrumMatch.cpp(21): note: failure was caused by a read of a variable outside its lifetime
ann_solo/SpectrumMatch.cpp(21): note: see usage of 'num_shifts'
ann_solo/SpectrumMatch.cpp(24): error C3863: array type 'unsigned int [num_shifts]' is not assignable
ann_solo/SpectrumMatch.cpp(26): error C2131: expression did not evaluate to a constant
ann_solo/SpectrumMatch.cpp(26): note: failure was caused by a read of a variable outside its lifetime
ann_solo/SpectrumMatch.cpp(26): note: see usage of 'num_shifts'
ann_solo/SpectrumMatch.cpp(27): error C3863: array type 'double [num_shifts]' is not assignable
ann_solo/SpectrumMatch.cpp(30): error C3863: array type 'double [num_shifts]' is not assignable
ann_solo/SpectrumMatch.cpp(53): error C2668: 'fabs': ambiguous call to overloaded function
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\include\cmath(397): note: could be 'long double fabs(long double) noexcept'
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\include\cmath(103): note: or 'float fabs(float) noexcept'
C:\Program Files (x86)\Windows Kits\10\include\10.0.15063.0\ucrt\corecrt_math.h(477): note: or 'double fabs(double)'
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\include\xtgmath.h(97): note: or '::std::enable_if<std::is_integral<double [num_shifts]>::value,double>::type fabs<double [num_shifts]>(_Ty)'
with
[
_Ty=double [num_shifts]
]
ann_solo/SpectrumMatch.cpp(53): note: while trying to match the argument list '(double [num_shifts])'
ann_solo/SpectrumMatch.cpp(82): error C2027: use of undefined type 'std::tuple<double,unsigned int,_Unrefwrap<unsigned int(&)[num_shifts]>::type>'
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\include\tuple(220): note: see declaration of 'std::tuple<double,unsigned int,_Unrefwrap<unsigned int(&)[num_shifts]>::type>'
ann_solo/SpectrumMatch.cpp(82): error C2664: 'void std::vector<std::tuple<float,unsigned int,unsigned int>,std::allocator<_Ty>>::push_back(_Ty &&)': cannot convert argument 1 from 'std::tuple<double,unsigned int,_Unrefwrap<unsigned int(&)[num_shifts]>::type>' to 'const std::tuple<float,unsigned int,unsigned int> &'
with
[
_Ty=std::tuple<float,unsigned int,unsigned int>
]
ann_solo/SpectrumMatch.cpp(82): note: Reason: cannot convert from 'std::tuple<double,unsigned int,_Unrefwrap<unsigned int(&)[num_shifts]>::type>' to 'const std::tuple<float,unsigned int,unsigned int>'
ann_solo/SpectrumMatch.cpp(82): note: No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\bin\HostX86\x64\cl.exe' failed with exit status 2

----------------------------------------

Command "c:\ann_solo\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\ADMINI~1\AppData\Local\Temp\2\pip-install-1_l_j7rh\ann-solo\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\ADMINI~1\AppData\Local\Temp\2\pip-record-myttq23i\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\ADMINI~1\AppData\Local\Temp\2\pip-install-1_l_j7rh\ann-solo\

OS Compatibility

May I ask whether ANN-SoLo has been tested on Windows? Is it compatible only with Linux?

AttributeError: 'MsmsSpectrum' object has no attribute 'query_identifier'

I installed ANN-SoLo from conda and get this error almost instantly:

Library spectra read: 0spectra [00:00, ?spectra/s]
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/bin/ann_solo", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/ann_solo/ann_solo.py", line 64, in main
    spec_lib = spectral_library.SpectralLibrary(
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/ann_solo/spectral_library.py", line 64, in __init__
    self._library_reader = reader.SpectralLibraryReader(
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/ann_solo/reader.py", line 94, in __init__
    self._create_config()
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/ann_solo/reader.py", line 137, in _create_config
    info_charge['id'].append(spectrum.query_identifier)
AttributeError: 'MsmsSpectrum' object has no attribute 'query_identifier'

Diving into source shows MsmsSpectrum comes from spectrum_utils, and I see no signs of query_identifier ever being there, only identifier.
Is there some special spectrum_utils version to be installed?

The identifier -> query_identifier rename comes from commit ea3a647.

ann_solo_plot error -- TypeError: __init__() takes from 3 to 4 positional arguments but 5 were given

Ran into this while trying to run ann_solo_plot:

(base) lindsay@pug:~/proj/cgas_pulldown/results/annsolo_plots$ ann_solo_plot ../annsolo/output_2_high.mztab 093019_822_2_high_129.3346.3346.3
WARNING:root:Missing spectral library configuration file
Library spectra read: 7091spectra [00:00, 16464.49spectra/s]
Traceback (most recent call last):
  File "/home/lindsay/miniconda3/bin/ann_solo_plot", line 10, in <module>
    sys.exit(main())
  File "/home/lindsay/miniconda3/lib/python3.7/site-packages/ann_solo/plot_ssm.py", line 106, in main
    set_matching_peaks(library_spectrum, query_spectrum)
  File "/home/lindsay/miniconda3/lib/python3.7/site-packages/ann_solo/plot_ssm.py", line 28, in set_matching_peaks
    fragment_annotation = FragmentAnnotation('z', 1, 1, 0)
TypeError: __init__() takes from 3 to 4 positional arguments but 5 were given

Complementary peaks in spectrum vectors

Currently spectra are converted to vectors by binning their peaks, after which they are transformed to lower-dimensional vectors.

We should explore whether appending complementary peaks or neutral losses to the vector improves performance of open searches. Specifically, the high-dimensional spectrum vector would consist of the binned peaks and the binned neutral losses, after which the low-dimensional transformation is performed on the concatenated vector.

The hypothesis is that this will improve the candidate selection step from the ANN index. Candidate selection is performed using the cosine similarity, without considering shifted peaks. Instead, complementary peaks might still capture some peak shifts.
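
A minimal sketch of the concatenated high-dimensional vector is given below. It makes several simplifying assumptions that are not taken from ANN-SoLo: fixed-width bins, fragments treated as singly charged, and neutral losses computed relative to the singly protonated precursor mass. The low-dimensional transformation is not shown, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

# Proton mass in Da.
PROTON_MASS = 1.007276


def bin_values(
    values: Sequence[float], intensities: Sequence[float],
    min_mz: float, max_mz: float, bin_size: float,
) -> List[float]:
    """Sum intensities into fixed-width bins covering [min_mz, max_mz)."""
    num_bins = int(math.ceil((max_mz - min_mz) / bin_size))
    vector = [0.0] * num_bins
    for value, intensity in zip(values, intensities):
        if min_mz <= value < max_mz:
            index = min(int((value - min_mz) / bin_size), num_bins - 1)
            vector[index] += intensity
    return vector


def spectrum_to_vector(
    mz: Sequence[float], intensity: Sequence[float],
    precursor_mz: float, precursor_charge: int,
    min_mz: float = 0.0, max_mz: float = 2000.0, bin_size: float = 1.0,
) -> List[float]:
    """Concatenate the binned peaks and the binned neutral losses."""
    # Singly protonated precursor mass (assumption: singly charged fragments).
    precursor_mh = (precursor_mz - PROTON_MASS) * precursor_charge + PROTON_MASS
    losses = [precursor_mh - fragment_mz for fragment_mz in mz]
    peak_part = bin_values(mz, intensity, min_mz, max_mz, bin_size)
    loss_part = bin_values(losses, intensity, min_mz, max_mz, bin_size)
    return peak_part + loss_part
```

Because a neutral loss is measured from the precursor, a modification that shifts both the precursor and a fragment by the same mass leaves that fragment's neutral loss bin unchanged, which is the effect the hypothesis relies on.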

Output Documentation

In the output documentation it says that the "accession" in the mztab refers to "The accession of the library spectrum in the spectral library.". However, in the newest ANN-SoLo version you can find this in "opt_ms_run[1]_cv_MS:1003062_spectrum_index".

Problem with v0.3.0 pip install

I tried installing the version 0.3.0 via pip and ran into this issue on both my Mac and Google Colab:

(base) $ pip install ann_solo
Collecting ann_solo
  Using cached ann_solo-0.3.0.tar.gz (397 kB)
    ERROR: Command errored out with exit status 1:
     command: /usr/local/Caskroom/miniconda/base/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/kz/w4rz5ym12sj4rz3_p657v3br0000gn/T/pip-install-e9hu9kfw/ann-solo/setup.py'"'"'; __file__='"'"'/private/var/folders/kz/w4rz5ym12sj4rz3_p657v3br0000gn/T/pip-install-e9hu9kfw/ann-solo/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/kz/w4rz5ym12sj4rz3_p657v3br0000gn/T/pip-pip-egg-info-zbrkw9w8
         cwd: /private/var/folders/kz/w4rz5ym12sj4rz3_p657v3br0000gn/T/pip-install-e9hu9kfw/ann-solo/
    Complete output (13 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/kz/w4rz5ym12sj4rz3_p657v3br0000gn/T/pip-install-e9hu9kfw/ann-solo/setup.py", line 5, in <module>
        import ann_solo
      File "/private/var/folders/kz/w4rz5ym12sj4rz3_p657v3br0000gn/T/pip-install-e9hu9kfw/ann-solo/ann_solo/__init__.py", line 3, in <module>
        from .ann_solo import ann_solo
      File "/private/var/folders/kz/w4rz5ym12sj4rz3_p657v3br0000gn/T/pip-install-e9hu9kfw/ann-solo/ann_solo/ann_solo.py", line 4, in <module>
        from ann_solo import spectral_library
      File "/private/var/folders/kz/w4rz5ym12sj4rz3_p657v3br0000gn/T/pip-install-e9hu9kfw/ann-solo/ann_solo/spectral_library.py", line 18, in <module>
        from ann_solo import reader
      File "/private/var/folders/kz/w4rz5ym12sj4rz3_p657v3br0000gn/T/pip-install-e9hu9kfw/ann-solo/ann_solo/reader.py", line 17, in <module>
        from ann_solo.parsers import SplibParser
    ModuleNotFoundError: No module named 'ann_solo.parsers'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

It is also worth noting that I do not encounter this error when I install from the source directly with cd src && python setup.py install or cd src && pip install -e .

Low-memory support

Running ANN-SoLo can lead to excessive memory requirements:

  • The candidate mask takes up O(num_candidates x num_library_spectra) memory. For a default batch size of 16,384 and a spectral library of 4 million spectra, this requires more than 8 GB (best-case scenario: 1-bit booleans). This memory requirement is duplicated for the ANN mask. A potential solution would be to iterate over batches of library candidates as well.

  • The ANN index needs to fit into the GPU memory, which will be problematic for large spectral libraries or low-memory GPUs. Potential solution: shard the index. This has some additional benefit that the shards can be processed using multiple GPUs.
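
The memory estimate for the candidate mask can be checked with a few lines of arithmetic, and the proposed fix of iterating over library batches amounts to walking the library in fixed-size chunks. Both helpers below are illustrative sketches with made-up names, not ANN-SoLo code.

```python
from typing import Iterator, Tuple


def mask_memory_bytes(
    batch_size: int, num_library_spectra: int, bits_per_entry: int = 1
) -> int:
    """Memory needed for a batch_size x num_library_spectra boolean mask."""
    return batch_size * num_library_spectra * bits_per_entry // 8


def library_chunks(
    num_library_spectra: int, chunk_size: int
) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index ranges that cover the library in chunks."""
    for start in range(0, num_library_spectra, chunk_size):
        yield start, min(start + chunk_size, num_library_spectra)


# Batch of 16,384 queries against 4 million library spectra.
best_case_gb = mask_memory_bytes(16_384, 4_000_000) / 1e9      # 8.192 GB
byte_bool_gb = mask_memory_bytes(16_384, 4_000_000, 8) / 1e9   # 65.536 GB
```

With one byte per boolean, as in a typical array of booleans, the mask is eight times larger than the 1-bit best case, so chunking the library side bounds the peak memory by batch_size x chunk_size instead.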

specify a GPU when multiple GPUs present

Hi,
It seems that ANN-SoLo always uses GPU 0 to process its job.
I wonder whether it is possible either to use multiple GPUs for a single job or to specify which GPU a job should use, so that I can run multiple ANN-SoLo instances at the same time.

Thank you so much.
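
A general workaround for CUDA programs, independent of any option ANN-SoLo itself may offer, is to restrict the GPUs visible to each process with the CUDA_VISIBLE_DEVICES environment variable. The process then sees the selected GPU as device 0. The sketch below only builds the environment; the helper name gpu_env and the file names in the commented usage are assumptions.

```python
import os
from typing import Dict, Mapping, Optional


def gpu_env(
    gpu_index: int, base_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment that exposes only one GPU.

    Hypothetical helper for illustration; not an existing ANN-SoLo function.
    """
    env = dict(os.environ if base_env is None else base_env)
    # The CUDA runtime only enumerates the devices listed here.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


# Possible usage (not executed here), one process per GPU:
#   import subprocess
#   subprocess.Popen(["ann_solo", "lib.splib", "a.mgf", "a.mztab"], env=gpu_env(0))
#   subprocess.Popen(["ann_solo", "lib.splib", "b.mgf", "b.mztab"], env=gpu_env(1))
```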

Profile and improve runtime

Profile runtime to identify bottlenecks and improve speed by optimally parallelizing as many parts of the code as possible.

  • Parallelize multiple peak file reading #26.
  • Parallelize spectrum preprocessing.
  • Parallelize spectrum similarity calculations.
  • Parallelize rescoring feature calculations.
  • Use a GPU-powered ML model for rescoring.

return code is always 1 (even on success)

I tried using ann-solo in scripts, and it seemingly always fails because of an improperly set exit code.

I checked the source: it calls sys.exit with the result of the main call. Since main returns a string (the output filename), sys.exit prints that string to stderr and exits with status 1, so the error flag is always set.

suggested patch (works for me, probably could be done better):

--- /home/ubuntu/miniconda3/lib/python3.8/site-packages/ann_solo/ann_solo.py.bak        2020-10-23 13:56:01.308069954 +0000
+++ /home/ubuntu/miniconda3/lib/python3.8/site-packages/ann_solo/ann_solo.py    2020-10-23 13:57:48.123426237 +0000
@@ -51,7 +51,7 @@
     return out_filename


-def main(args: Union[str, List[str]] = None) -> str:
+def main(args: Union[str, List[str]] = None) -> int:
     # Initialize logging.
     logging.basicConfig(format='{asctime} [{levelname}/{processName}] '
                                '{module}.{funcName} : {message}',
@@ -70,7 +70,7 @@

     logging.shutdown()

-    return out_filename
+    return 0


 if __name__ == '__main__':
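
The behavior described in this issue can be reproduced independently of ANN-SoLo. The sketch below runs tiny Python snippets in a child interpreter and records their exit status; the file name passed to sys.exit is just a placeholder.

```python
import subprocess
import sys


def exit_status(snippet: str) -> int:
    """Run a Python snippet in a child interpreter and return its exit status."""
    completed = subprocess.run(
        [sys.executable, "-c", snippet], stderr=subprocess.DEVNULL
    )
    return completed.returncode


# A string argument is printed to stderr and the process exits with status 1.
string_status = exit_status("import sys; sys.exit('out.mztab')")
# An integer 0 or None both signal success.
zero_status = exit_status("import sys; sys.exit(0)")
none_status = exit_status("import sys; sys.exit(None)")
```

Returning 0 (or None) from main, as in the patch, therefore gives scripts the success status they expect.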

installation

Hi,

I'm attempting to install ANN-SoLo in a new conda environment. When I run pip install, I get the following error

Building wheels for collected packages: ann-solo
  Building wheel for ann-solo (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [28 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.9
      creating build/lib.linux-x86_64-3.9/ann_solo
      copying ann_solo/plot_ssm.py -> build/lib.linux-x86_64-3.9/ann_solo
      copying ann_solo/spectrum.py -> build/lib.linux-x86_64-3.9/ann_solo
      copying ann_solo/reader.py -> build/lib.linux-x86_64-3.9/ann_solo
      copying ann_solo/utils.py -> build/lib.linux-x86_64-3.9/ann_solo
      copying ann_solo/__init__.py -> build/lib.linux-x86_64-3.9/ann_solo
      copying ann_solo/config.py -> build/lib.linux-x86_64-3.9/ann_solo
      copying ann_solo/writer.py -> build/lib.linux-x86_64-3.9/ann_solo
      copying ann_solo/spectral_library.py -> build/lib.linux-x86_64-3.9/ann_solo
      copying ann_solo/ann_solo.py -> build/lib.linux-x86_64-3.9/ann_solo
      running build_ext
      Compiling ann_solo/spectrum_match.pyx because it changed.
      [1/1] Cythonizing ann_solo/spectrum_match.pyx
      warning: ann_solo/spectrum_match.pyx:17:58: The keyword 'nogil' should appear at the end of the function signature line. Placing it before 'except' or 'noexcept' will be disallowed in a future version of Cython.
      warning: ann_solo/spectrum_match.pyx:23:40: The keyword 'nogil' should appear at the end of the function signature line. Placing it before 'except' or 'noexcept' will be disallowed in a future version of Cython.
      building 'ann_solo.spectrum_match' extension
      creating build/temp.linux-x86_64-3.9
      creating build/temp.linux-x86_64-3.9/ann_solo
      gcc -pthread -B /home/colin/miniconda3/envs/ann_solo_39/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/colin/miniconda3/envs/ann_solo_39/include -fPIC -O2 -isystem /home/colin/miniconda3/envs/ann_solo_39/include -fPIC -I/home/colin/miniconda3/envs/ann_solo_39/lib/python3.9/site-packages/numpy/core/include -I/home/colin/miniconda3/envs/ann_solo_39/include/python3.9 -c SpectrumMatch.cpp -o build/temp.linux-x86_64-3.9/SpectrumMatch.o -O3 -march=native -ffast-math -fno-associative-math -std=c++14
      gcc: error: SpectrumMatch.cpp: No such file or directory
      gcc: fatal error: no input files
      compilation terminated.
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for ann-solo
  Running setup.py clean for ann-solo
Failed to build ann-solo
ERROR: Could not build wheels for ann-solo, which is required to install pyproject.toml-based projects

Any help appreciated,
Colin

Problem with 0.3.3 install

Inside NGC docker container nvcr.io/nvidia/pytorch:22.02-py3:

>>> conda install numpy faiss-cpu
[succeeds]
>>> pip install ann-solo
[...]
      /opt/conda/envs/ppx-workflow/include/python3.10/cpython/unicodeobject.h:551:42: note: declared here
        551 | Py_DEPRECATED(3.3) PyAPI_FUNC(PyObject*) PyUnicode_FromUnicode(
            |                                          ^~~~~~~~~~~~~~~~~~~~~
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> ann-solo

note: This is an issue with the package mentioned above, not pip.

gcc is the newest version.
