bittremieux / falcon

Large-scale tandem mass spectrum clustering using fast nearest neighbor searching.

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
clustering mass-spectrometry nearest-neighbor-search


falcon's Issues

Error thrown when no cluster was found for at least one charge

Issue Description

When clustering a small dataset and no cluster is found for at least one charge (e.g., for charge 2+), the following error is thrown:

Traceback (most recent call last):
  File "falcon.py", line 255, in <module>
    main()
  File "falcon.py", line 113, in main
    current_label = np.amax(clusters[mask_no_noise]) + 1
  File "<__array_function__ internals>", line 5, in amax
  File "/home/maesk/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 2733, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/home/maesk/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity

This is because clusters[mask_no_noise] is an empty array, which np.amax() does not support.

Solution

When no cluster is found for a specific charge, the current label should stay the same (see the PR); there is no need to call np.amax().
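A minimal sketch of the proposed guard (variable names follow the traceback above; the actual PR may differ): only advance the label counter when clusters were actually found for this charge.

```python
import numpy as np

# All spectra are noise for this charge, so clusters[mask_no_noise] is empty.
clusters = np.array([-1, -1, -1])
mask_no_noise = clusters != -1

current_label = 5                    # label carried over from the previous charge
if mask_no_noise.any():              # guard avoids np.amax() on an empty array
    current_label = np.amax(clusters[mask_no_noise]) + 1
print(current_label)  # → 5
```

With the guard in place, the label simply carries over unchanged when a charge produces no clusters.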

@bittremieux could you check if that seems correct? Thank you!

EDIT: I used an environment built from your environment.yml. If you need an .mgf file to reproduce the error, I can upload it.

Several issues encountered during execution

Dear bittremieux,

I just tried the falcon program for the first time (starting with python3 falcon.py --help), and Python raises errors during execution.

The first error was caused by a failing import of __version__, which produced the following error message from Python:

Traceback (most recent call last):
  File "/home/eli_rousseau/.local/lib/python3.10/site-packages/falcon/falcon.py", line 21, in <module>
    from . import __version__
ImportError: attempted relative import with no known parent package
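This ImportError typically occurs when a module that uses relative imports is executed directly as a script instead of through its package (e.g. via the installed `falcon` entry point). A self-contained reproduction with a toy package (not falcon itself):

```python
# Toy package demonstrating why running a module file directly breaks
# relative imports, while running it via the package works.
import os
import subprocess
import sys
import tempfile

tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "pkg")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("__version__ = '0.1'\n")
with open(os.path.join(pkg, "mod.py"), "w") as f:
    f.write("from . import __version__\nprint(__version__)\n")

# Running the file directly reproduces the ImportError from the report.
direct = subprocess.run([sys.executable, os.path.join(pkg, "mod.py")],
                        capture_output=True, text=True)
print("no known parent package" in direct.stderr)  # → True

# Running it as a module of the package succeeds.
as_module = subprocess.run([sys.executable, "-m", "pkg.mod"],
                           capture_output=True, text=True, cwd=tmp)
print(as_module.stdout.strip())  # → 0.1
```

By the same logic, invoking the installed console script (or `python -m falcon`, assuming the package is importable) avoids the error without editing the source.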

When I then edited the source code of both config.py and falcon.py to remove the lines causing the error, I got another error message, this time because the config module is missing the parse attribute:

Traceback (most recent call last):
  File "c:\Users\rouss\AppData\Local\Programs\Python\Python311\Lib\site-packages\falcon\falcon.py", line 439, in <module>
    sys.exit(main())
             ^^^^^^
  File "c:\Users\rouss\AppData\Local\Programs\Python\Python311\Lib\site-packages\falcon\falcon.py", line 47, in main
    config.parse(args)
    ^^^^^^^^^^^^
AttributeError: module 'config' has no attribute 'parse'

So that's where I'm stuck for now. It seems that many errors occur during configuration; maybe you have an idea of what could cause them. I installed the program with the pip package manager as described, and tried it both on the Linux command line and in Windows PowerShell.

Thanks in advance for your support.

Eli Rousseau.

Investigating splitting

Sometimes spectra with the same precursor m/z and very similar MS/MS do not end up in the same cluster, even with high EPS values. One example here:

Falcon Clustering
Networking

Here we can see repetitions at 327 m/z:


Specifically, we can see the repetitions in the clustering:

Link

Just one example, two clusters

mzspec:GNPS:TASK-48f893dc8a4147e59798910e6c866ce2-workflow_results/clustered_result.mgf:scan:327
mzspec:GNPS:TASK-48f893dc8a4147e59798910e6c866ce2-workflow_results/clustered_result.mgf:scan:326


Further dimensionality reduction

Hello,

As discussed by email with @bittremieux, I compared the hashing trick currently used in falcon with other dimensionality reduction methods. Here are my results.

Dimensionality reduction methods

I compared the following methods:

  • Hashing trick (by using the functions directly available in falcon);
  • PCA (sklearn implementation);
  • Gaussian random projection (sklearn implementation).

Since the dimensionality reduction methods proposed in sklearn require high-dimensional vectors as input, I've implemented a function to convert spectra to high-dimensional vectors, sp_to_vecHD (this is not required for the hashing trick, since the spectra are converted to low-dimensional vectors on the fly, without storing the high-dimensional vectors in memory).

Comparison for 800 components

Since falcon uses the cosine distance to measure distances between vectors, I compared how well the cosine distance between pairs of spectra was preserved between the HD vector representation and the corresponding LD vectors. Ideally, the distance should be preserved as much as possible, s.t. the loss of precision caused by the dimensionality reduction is insignificant.

In the following figure, I plotted these distances (HD distance vs. LD distance) for all pairs of a subset of 5,000 spectra (charge 2, the first 5,000 spectra with a precursor m/z greater than 600 in the draft human proteome).
[figure: HD vs. LD cosine distance, 800 components]
It shows that for close pairs of spectra (distances near 0), the hashing trick does the job: it works well if small epsilons are used in DBSCAN (e.g., 0.1). However, for larger distances (e.g., if we use an epsilon of 0.3), the distance between some pairs of points is distorted: some pairs of spectra whose HD vectors are separated by a large distance are converted to LD vectors whose cosine distance is underestimated (bottom-right part of the plot). PCA does not seem to be a good candidate: its mean squared error (used to assess the difference in distance between HD and LD pairs of vectors) is worse than for the hashing trick. Gaussian random projection, however, shows good results: it better preserves the cosine distance during dimensionality reduction.

Note that some distances between LD vectors can become larger than 1: LD vectors obtained with random projection can have negative components (which was not the case with the hashing trick), so the cosine similarity can be negative and the cosine distance > 1. This is not a problem.

Comparison for a similar MSE

Since Gaussian random projection preserves distances better than the hashing trick (of course, it would be useful to verify this on other subsets to make sure these results generalize), we could consider reducing the dimensionality further: instead of using 800 components, we can obtain precision similar to the hashing trick with only 200 components using Gaussian random projection:
[figure: HD vs. LD cosine distance, Gaussian random projection, 200 components]
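The comparison described above can be sketched as follows (the random spectra below are hypothetical stand-ins for binned spectrum vectors; the dimensions d = 28,000 and d' = 200 follow the discussion):

```python
import numpy as np
from scipy.spatial.distance import cosine
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(42)
# Toy stand-in for binned spectrum vectors: sparse and non-negative,
# with ~1% of the 28,000 bins populated per spectrum.
X_hd = rng.random((50, 28_000)) * (rng.random((50, 28_000)) < 0.01)

# Project to d' = 200 dimensions with a Gaussian random projection.
grp = GaussianRandomProjection(n_components=200, random_state=42)
X_ld = grp.fit_transform(X_hd)
print(X_ld.shape)  # → (50, 200)

# Cosine distances before and after projection should be similar.
for i, j in [(0, 1), (2, 3)]:
    print(f"pair ({i},{j}): HD={cosine(X_hd[i], X_hd[j]):.3f} "
          f"LD={cosine(X_ld[i], X_ld[j]):.3f}")
```

By the Johnson–Lindenstrauss lemma, the distortion shrinks roughly as 1/sqrt(n_components), which is why 200 components can already be adequate.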

Performance

Of course, performance is crucial for falcon. I cannot provide a clean comparison between the two methods, since the implementations differ: during my tests, I generated the HD vectors explicitly and then used the sklearn implementation of random projections. However, it would easily be possible to perform the conversion on the fly, as is the case for the hashing trick. Note that random projection introduces a memory overhead: it requires storing the projection matrix, which is of size d × d' (where d is the number of HD components and d' is the number of LD components). For example, for d = 28,000 (default) and d' = 200, the matrix contains ~5.6M entries, which (only) requires ~45 MB (8 bytes/float).

I can propose a PR implementing random projection in falcon so that we can evaluate the performance gain (reducing the number of dimensions should speed up the approximate nearest neighbor search).

Singleton cluster number

Singletons always have the label -1 in the cluster assignment CSV. Instead, make sure they have a unique cluster number when export_include_singletons is enabled.
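One way to sketch the requested behavior (an assumed labeling scheme, not falcon's actual implementation): give every -1 singleton its own fresh label following the largest existing cluster label.

```python
import numpy as np

# Toy cluster assignments: -1 marks singletons (noise).
clusters = np.array([0, 1, -1, 1, -1, 2])
singletons = clusters == -1
next_label = clusters.max() + 1
# Assign each singleton a unique, previously unused cluster number.
clusters[singletons] = np.arange(next_label, next_label + singletons.sum())
print(clusters)  # → [0 1 3 1 4 2]
```

The resulting CSV would then contain no shared -1 label when export_include_singletons is enabled.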

Help needed with positional arguments error

Hi, I keep getting this error when trying to run falcon:

TypeError: __init__() takes from 6 to 7 positional arguments but 8 were given

This is the command I am using:
falcon spectra_Bt1/*.mzML network_Bt1 --work_dir ~/Documents/falcon_t1 --export_representatives --export_include_singletons --precursor_tol 0.05 Da --rt_tol 0.1 --min_peaks 3 --min_mz_range 100.0 --min_mz 150.0 --max_mz 2200.0 --min_intensity 0.01 --max_peaks_used 500

And the full log I get when trying to run it:

2022-11-03 16:07:43,222 INFO [falcon/MainProcess] falcon.main : falcon version 0.1.3
2022-11-03 16:07:43,222 DEBUG [falcon/MainProcess] falcon.main : work_dir = /Users/dulce_guillen/Documents/falcon_t1
2022-11-03 16:07:43,222 DEBUG [falcon/MainProcess] falcon.main : overwrite = False
2022-11-03 16:07:43,222 DEBUG [falcon/MainProcess] falcon.main : export_representatives = True
2022-11-03 16:07:43,222 DEBUG [falcon/MainProcess] falcon.main : usi_pxd = USI000000
2022-11-03 16:07:43,222 DEBUG [falcon/MainProcess] falcon.main : precursor_tol = 0.05 Da
2022-11-03 16:07:43,222 DEBUG [falcon/MainProcess] falcon.main : rt_tol = 0.1
2022-11-03 16:07:43,223 DEBUG [falcon/MainProcess] falcon.main : fragment_tol = 0.05
2022-11-03 16:07:43,223 DEBUG [falcon/MainProcess] falcon.main : eps = 0.100
2022-11-03 16:07:43,223 DEBUG [falcon/MainProcess] falcon.main : min_samples = 2
2022-11-03 16:07:43,223 DEBUG [falcon/MainProcess] falcon.main : mz_interval = 1
2022-11-03 16:07:43,223 DEBUG [falcon/MainProcess] falcon.main : hash_len = 800
2022-11-03 16:07:43,223 DEBUG [falcon/MainProcess] falcon.main : n_neighbors = 64
2022-11-03 16:07:43,223 DEBUG [falcon/MainProcess] falcon.main : n_neighbors_ann = 128
2022-11-03 16:07:43,223 DEBUG [falcon/MainProcess] falcon.main : batch_size = 65536
2022-11-03 16:07:43,223 DEBUG [falcon/MainProcess] falcon.main : n_probe = 32
2022-11-03 16:07:43,223 DEBUG [falcon/MainProcess] falcon.main : min_peaks = 3
2022-11-03 16:07:43,223 DEBUG [falcon/MainProcess] falcon.main : min_mz_range = 100.00
2022-11-03 16:07:43,223 DEBUG [falcon/MainProcess] falcon.main : min_mz = 150.00
2022-11-03 16:07:43,223 DEBUG [falcon/MainProcess] falcon.main : max_mz = 2200.00
2022-11-03 16:07:43,223 DEBUG [falcon/MainProcess] falcon.main : remove_precursor_tol = 1.50
2022-11-03 16:07:43,223 DEBUG [falcon/MainProcess] falcon.main : min_intensity = 0.01
2022-11-03 16:07:43,223 DEBUG [falcon/MainProcess] falcon.main : max_peaks_used = 500
2022-11-03 16:07:43,223 DEBUG [falcon/MainProcess] falcon.main : scaling = off
2022-11-03 16:07:43,223 WARNING [root/MainProcess] falcon.main : Working directory /Users/dulce_guillen/Documents/falcon_t1 already exists, previous results might get overwritten
2022-11-03 16:07:43,224 INFO [falcon/MainProcess] falcon._prepare_spectra : Read spectra from 51 peak file(s)
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/dulce_guillen/opt/anaconda3/envs/py3/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 428, in _process_worker
    r = call_item()
  File "/Users/dulce_guillen/opt/anaconda3/envs/py3/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/Users/dulce_guillen/opt/anaconda3/envs/py3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
  File "/Users/dulce_guillen/opt/anaconda3/envs/py3/lib/python3.8/site-packages/joblib/parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "/Users/dulce_guillen/opt/anaconda3/envs/py3/lib/python3.8/site-packages/joblib/parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "/Users/dulce_guillen/opt/anaconda3/envs/py3/lib/python3.8/site-packages/falcon/falcon.py", line 348, in _read_spectra
    for spec in ms_io.get_spectra(filename):
  File "/Users/dulce_guillen/opt/anaconda3/envs/py3/lib/python3.8/site-packages/falcon/ms_io/ms_io.py", line 40, in get_spectra
    for spec in spectrum_io.get_spectra(filename):
  File "/Users/dulce_guillen/opt/anaconda3/envs/py3/lib/python3.8/site-packages/falcon/ms_io/mzml_io.py", line 38, in get_spectra
    yield _parse_spectrum(spectrum_dict)
  File "/Users/dulce_guillen/opt/anaconda3/envs/py3/lib/python3.8/site-packages/falcon/ms_io/mzml_io.py", line 75, in _parse_spectrum
    return sus.MsmsSpectrum(spectrum_id, precursor_mz, precursor_charge,
TypeError: __init__() takes from 6 to 7 positional arguments but 8 were given
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/dulce_guillen/opt/anaconda3/envs/py3/bin/falcon", line 8, in <module>
    sys.exit(main())
  File "/Users/dulce_guillen/opt/anaconda3/envs/py3/lib/python3.8/site-packages/falcon/falcon.py", line 122, in main
    buckets = _prepare_spectra()
  File "/Users/dulce_guillen/opt/anaconda3/envs/py3/lib/python3.8/site-packages/falcon/falcon.py", line 300, in _prepare_spectra
    for file_spectra in joblib.Parallel(n_jobs=-1)(
  File "/Users/dulce_guillen/opt/anaconda3/envs/py3/lib/python3.8/site-packages/joblib/parallel.py", line 1098, in __call__
    self.retrieve()
  File "/Users/dulce_guillen/opt/anaconda3/envs/py3/lib/python3.8/site-packages/joblib/parallel.py", line 975, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/Users/dulce_guillen/opt/anaconda3/envs/py3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 567, in wrap_future_result
    return future.result(timeout=timeout)
  File "/Users/dulce_guillen/opt/anaconda3/envs/py3/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/Users/dulce_guillen/opt/anaconda3/envs/py3/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
TypeError: __init__() takes from 6 to 7 positional arguments but 8 were given

My file names don't contain any underscores or dots (not sure if that's relevant), and the data comes from a timsTOF and was converted using the timsconvert workflow on the GNPS server. I can successfully run falcon on the server, so I know there is nothing wrong with the files. I wanted to change some of the default parameters, so I switched to the command line.

I'm very new to coding and Python, so I'm not sure what the mistake is or how to fix it. Any help would be greatly appreciated! Thanks!
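The traceback points at the sus.MsmsSpectrum(...) call, which suggests a version mismatch between falcon and the spectrum_utils dependency: the constructor's signature changed between releases, so falcon passes one argument more than the installed version accepts. A minimal illustration with a hypothetical class (not spectrum_utils itself) of how such a signature change produces this exact TypeError:

```python
# Hypothetical "new API" constructor: 6 required + 1 optional parameter
# (counting `self`), one fewer than the old API accepted.
class SpectrumNewAPI:
    def __init__(self, identifier, precursor_mz, precursor_charge,
                 mz, intensity, retention_time=None):
        self.identifier = identifier

err_msg = ""
try:
    # A caller written against the old API passes one extra positional argument.
    SpectrumNewAPI("scan=1", 500.0, 2, [100.0], [1.0], 12.3, "extra_arg")
except TypeError as exc:
    err_msg = str(exc)

print(err_msg)  # → ...takes from 6 to 7 positional arguments but 8 were given
```

If this diagnosis is right, installing the dependency versions pinned in falcon's own requirements (rather than the latest releases) should resolve it.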

DLL load failed while importing swigfaiss

When trying to run Falcon in Windows PowerShell (x64, run as admin) with Python 3.9, I get the following error:

falcon : Traceback (most recent call last):
At line:1 char:1
+ ~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (Traceback (most recent call last)::String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError
 
  File "c:\users\chras\appdata\local\programs\python\python39\lib\site-packages\faiss\loader.py", line 34, in <module>
    from .swigfaiss import *
  File "c:\users\chras\appdata\local\programs\python\python39\lib\site-packages\faiss\swigfaiss.py", line 13, in <module>
    from . import _swigfaiss
ImportError: DLL load failed while importing _swigfaiss: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\chras\appdata\local\programs\python\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\chras\appdata\local\programs\python\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\chras\AppData\Local\Programs\Python\Python39\Scripts\falcon.exe\__main__.py", line 4, in <module>
  File "c:\users\chras\appdata\local\programs\python\python39\lib\site-packages\falcon\falcon.py", line 22, in <module>
    from .cluster import cluster, spectrum
  File "c:\users\chras\appdata\local\programs\python\python39\lib\site-packages\falcon\cluster\cluster.py", line 6, in <module>
    import faiss
  File "c:\users\chras\appdata\local\programs\python\python39\lib\site-packages\faiss\__init__.py", line 17, in <module>
    from .loader import *
  File "c:\users\chras\appdata\local\programs\python\python39\lib\site-packages\faiss\loader.py", line 39, in <module>
    from .swigfaiss import *
  File "c:\users\chras\appdata\local\programs\python\python39\lib\site-packages\faiss\swigfaiss.py", line 13, in <module>
    from . import _swigfaiss
ImportError: DLL load failed while importing _swigfaiss: The specified module could not be found.

Falcon was installed via pip.
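This DLL failure usually means the native faiss extension could not load its runtime libraries, a known weak point of pip-installed faiss on Windows. A hedged workaround, assuming conda is available (package and channel names below are assumptions; check the project's installation instructions):

```shell
# Sketch: create a clean environment, install faiss-cpu from conda-forge
# (which bundles the required native libraries), then install falcon on top.
conda create -n falcon-env python=3.9
conda activate falcon-env
conda install -c conda-forge faiss-cpu
pip install falcon-ms
```

Installing faiss from conda rather than pip lets conda resolve the native DLL dependencies that the pip wheel expects to find on the system.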

Automatic precursor/fragment mass tolerance

Automatically determine the precursor mass tolerance and fragment mass tolerance.

Can this be done using Param-Medic? Porting the software to Python 3 would be necessary. Alternatively, relevant code could be integrated into falcon directly.

np.object has been deprecated

NumPy 1.24.4 results in the following error:

2023-07-12 15:21:00,375 WARNING [py.warnings/MainProcess] warnings._showwarnmsg : /home/wout/.conda/envs/falcon/lib/python3.11/site-packages/falcon/cluster/cluster.py:509: FutureWarning: In the future `np.object` will be defined as the corresponding NumPy scalar.
neighborhoods_arr = np.empty(len(neighborhoods), dtype=np.object)

Traceback (most recent call last):
  File "/home/wout/.conda/envs/falcon/bin/falcon", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/wout/.conda/envs/falcon/lib/python3.11/site-packages/falcon/falcon.py", line 181, in main
    clusters = cluster.generate_clusters(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wout/.conda/envs/falcon/lib/python3.11/site-packages/falcon/cluster/cluster.py", line 509, in generate_clusters
    neighborhoods_arr = np.empty(len(neighborhoods), dtype=np.object)
                                                           ^^^^^^^^^
  File "/home/wout/.conda/envs/falcon/lib/python3.11/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'object'.
`np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations. Did you mean: 'object_'?

This can be fixed by changing np.object to object here.
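The one-line fix looks roughly like this (toy data; the real neighborhoods come from falcon's clustering): the builtin object replaces the removed np.object alias, with identical behavior on all NumPy versions.

```python
import numpy as np

# Toy ragged neighborhoods, as a stand-in for falcon's neighbor lists.
neighborhoods = [[0, 1], [2], [3, 4, 5]]

# Fixed line: dtype=object instead of the removed dtype=np.object.
neighborhoods_arr = np.empty(len(neighborhoods), dtype=object)
neighborhoods_arr[:] = neighborhoods
print(neighborhoods_arr.dtype)  # → object
```

Using an object array this way is the standard idiom for storing ragged (variable-length) sequences in NumPy.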

Full File Path in output results

It would be great to list the full file paths of the input files, rather than just the file name without extension, in the USI, and also to list the scan number explicitly as a separate column.

Considerably fewer clusters when scaling is applied

Hello,

I'm using falcon to cluster a large proteomics dataset (20M spectra). I've compared the number of clusters generated depending on the scaling method applied to the peaks (root, log, rank, or none), and I get very different results:

Parameters

I kept all the default parameters except:
eps = 0.35
charge = (2, 3, 4, 5)
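For reference, the scaling options being compared transform peak intensities roughly as follows (assumed formulas for illustration; falcon's exact implementation may differ):

```python
import numpy as np

# Toy peak intensities spanning two orders of magnitude.
intensity = np.array([100.0, 400.0, 2500.0, 10000.0])

root = np.sqrt(intensity)                   # square-root scaling
log = np.log1p(intensity)                   # log scaling
rank = intensity.argsort().argsort() + 1.0  # rank scaling (1 = smallest peak)

print(root)  # → [ 10.  20.  50. 100.]
print(rank)  # → [1. 2. 3. 4.]
```

All three compress the dynamic range so that a few dominant peaks do not dominate the cosine similarity, which is why the choice of scaling can change the clustering so strongly.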

Results for ~39 000 MS2 spectra (one fraction)

Scaling   # clustered spectra   % clustered spectra   # clusters
root      148                   0.38%                 54
log       2                     0.01%                 1
rank      2                     0.01%                 1
None      883                   2.24%                 359

Results for ~460 000 MS2 spectra (12 fractions)

Scaling   # clustered spectra   % clustered spectra   # clusters
root      4642                  1.00%                 1656
log       26                    0.01%                 12
rank      39                    0.01%                 19
None      34519                 7.45%                 13336

Discussion

I expected to get a different number of clusters depending on the scaling method applied to the peaks, but the difference is very large for my dataset. Moreover, even when no scaling is applied (and the number of clusters is largest), only a few percent of the spectra are clustered, which is far from the order of magnitude in the published results, where the proportion of clustered spectra ranges from ~20% to ~60% (with 'rank' scaling).

@bittremieux did you observe similar results during your experiments on some dataset? Do you have any idea that could explain the difference presented above? I would be very grateful. If needed, I can upload the files to reproduce these results.

Cc @lgatto

Similar MS/MS did not make it into clusters output

This set of spectra is nearly identical, but does not appear in the clustered result set - Link.

Some examples to make it more clear:

mzspec:GNPS:TASK-f5094e83a4f042e88f0d423dcb52b11c-query_results/extracted/extracted_mzML/extracted_53.mzML:scan:2678
mzspec:GNPS:TASK-f5094e83a4f042e88f0d423dcb52b11c-query_results/extracted/extracted_mzML/extracted_59.mzML:scan:2544


We can note that in the output from Falcon, there is no precursor m/z in this mass range - Link.

MS2 spectra without a precursor charge are ignored

Hello, I was trying to cluster mzXML files from https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=88a7dfeeecb74131a6d6bfb7a9db0a46 in WSL: Ubuntu-22.04, but falcon does not seem to recognize any spectra. My parameters and output are below:

falcon BAX89_BA1_01_23240.mzXML falcon
2024-05-04 18:26:51,147 INFO [falcon/MainProcess] falcon.main : falcon version 0.1.3
2024-05-04 18:26:51,147 DEBUG [falcon/MainProcess] falcon.main : work_dir = None
2024-05-04 18:26:51,147 DEBUG [falcon/MainProcess] falcon.main : overwrite = False
2024-05-04 18:26:51,148 DEBUG [falcon/MainProcess] falcon.main : export_representatives = True
2024-05-04 18:26:51,148 DEBUG [falcon/MainProcess] falcon.main : usi_pxd = USI000000
2024-05-04 18:26:51,148 DEBUG [falcon/MainProcess] falcon.main : precursor_tol = 20.00 ppm
2024-05-04 18:26:51,148 DEBUG [falcon/MainProcess] falcon.main : rt_tol = None
2024-05-04 18:26:51,148 DEBUG [falcon/MainProcess] falcon.main : fragment_tol = 0.05
2024-05-04 18:26:51,149 DEBUG [falcon/MainProcess] falcon.main : eps = 0.100
2024-05-04 18:26:51,149 DEBUG [falcon/MainProcess] falcon.main : min_samples = 2
2024-05-04 18:26:51,149 DEBUG [falcon/MainProcess] falcon.main : mz_interval = 1
2024-05-04 18:26:51,149 DEBUG [falcon/MainProcess] falcon.main : hash_len = 800
2024-05-04 18:26:51,150 DEBUG [falcon/MainProcess] falcon.main : n_neighbors = 64
2024-05-04 18:26:51,150 DEBUG [falcon/MainProcess] falcon.main : n_neighbors_ann = 128
2024-05-04 18:26:51,150 DEBUG [falcon/MainProcess] falcon.main : batch_size = 65536
2024-05-04 18:26:51,150 DEBUG [falcon/MainProcess] falcon.main : n_probe = 32
2024-05-04 18:26:51,150 DEBUG [falcon/MainProcess] falcon.main : min_peaks = 5
2024-05-04 18:26:51,150 DEBUG [falcon/MainProcess] falcon.main : min_mz_range = 10.00
2024-05-04 18:26:51,150 DEBUG [falcon/MainProcess] falcon.main : min_mz = 40.00
2024-05-04 18:26:51,151 DEBUG [falcon/MainProcess] falcon.main : max_mz = 1500.00
2024-05-04 18:26:51,151 DEBUG [falcon/MainProcess] falcon.main : remove_precursor_tol = 1.50
2024-05-04 18:26:51,151 DEBUG [falcon/MainProcess] falcon.main : min_intensity = 0.01
2024-05-04 18:26:51,151 DEBUG [falcon/MainProcess] falcon.main : max_peaks_used = 50
2024-05-04 18:26:51,151 DEBUG [falcon/MainProcess] falcon.main : scaling = off
2024-05-04 18:26:51,156 INFO [falcon/MainProcess] falcon._prepare_spectra : Read spectra from 1 peak file(s)
2024-05-04 18:27:02,645 DEBUG [falcon/MainProcess] falcon._prepare_spectra : 0 spectra written to 0 buckets by precursor charge and precursor m/z
2024-05-04 18:27:02,655 ERROR [falcon/MainProcess] falcon.main : No valid spectra found for clustering

I tried converting the mzXML to mzML and MGF via ProteoWizard 3.0.24124, but that did not solve the issue. I have confirmed that the files do contain MS2 spectra.
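As the issue title suggests, falcon skips MS2 spectra that lack a precursor charge annotation. A quick way to check whether that is the cause is to count charge-annotated MS2 spectra in the file; the sketch below works on records shaped like pyteomics' mzXML output (the toy records stand in for a parsed file):

```python
def count_ms2_charges(spectra):
    """Count MS2 spectra with and without an annotated precursor charge."""
    with_charge = without_charge = 0
    for spec in spectra:
        if spec.get("msLevel") != 2:
            continue  # skip MS1 and other levels
        precursor = spec.get("precursorMz", [{}])[0]
        if precursor.get("precursorCharge"):
            with_charge += 1
        else:
            without_charge += 1
    return with_charge, without_charge

# Toy records shaped like pyteomics mzXML output (hypothetical values).
spectra = [
    {"msLevel": 2, "precursorMz": [{"precursorMz": 500.2, "precursorCharge": 2}]},
    {"msLevel": 2, "precursorMz": [{"precursorMz": 612.8}]},  # no charge
    {"msLevel": 1},
]
print(count_ms2_charges(spectra))  # → (1, 1)
```

If every MS2 spectrum in the file falls into the "no charge" bucket, the "0 spectra written to 0 buckets" log line would be explained.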

Thank you for the support!

Experiencing Error in Script Execution.

Greetings,

Firstly, I would like to thank you for providing this open-source tool. I am new to programming and was trying to run the package you provided, but I am consistently experiencing errors. Please refer to the screenshot of the error message below, and advise me on how to troubleshoot.

Regards,
Ben
[screenshot: error message]

Memory consumption

Manage memory consumption when reading the input files initially. Currently all files are read in bulk; potentially, a producer–consumer model can be used to chunk the input instead.
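The suggested producer–consumer model can be sketched as follows (an assumed design, not falcon's current implementation): a reader thread streams spectra into a bounded queue so that only a small chunk of the input is in memory at any time.

```python
import queue
import threading

SENTINEL = None

def produce(filenames, q):
    """Read spectra file by file and push them into the bounded queue."""
    for filename in filenames:
        for scan in range(3):          # stand-in for parsing one peak file
            q.put((filename, scan))    # blocks when the queue is full
    q.put(SENTINEL)                    # signal that all input is exhausted

def consume(q):
    """Process spectra as they arrive, never holding all files in memory."""
    processed = []
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        processed.append(item)         # stand-in for preprocessing/bucketing
    return processed

q = queue.Queue(maxsize=8)             # the bound caps peak memory usage
reader = threading.Thread(target=produce, args=(["a.mzML", "b.mzML"], q))
reader.start()
processed = consume(q)
reader.join()
print(len(processed))  # → 6
```

The queue's maxsize is the knob: the producer blocks once the buffer is full, so peak memory is bounded by the chunk size rather than by the total input size.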
