emanuega / merlin
MERlin is an extensible analysis pipeline for decoding MERFISH data
License: MIT License
I have been getting a true-divide RuntimeWarning during the FilterBarcodes task. Would you know what is causing this? Any help would be appreciated.
-Alex
Error:
Job 9: Running AdaptiveFilterBarcodes 0
C:\Users\alexws2\Anaconda3\envs\merlin_env\lib\site-packages\matplotlib\axes\_axes.py:8192: RuntimeWarning: invalid value encountered in true_divide
vals = 0.5 * width * vals / vals.max()
C:\Users\alexws2\Anaconda3\envs\merlin_env\lib\site-packages\pandas\core\series.py:856: RuntimeWarning: divide by zero encountered in log10
result = getattr(ufunc, method)(*inputs, **kwargs)
MERlin - the MERFISH decoding pipeline
Running AdaptiveFilterBarcodes
c:\users\alexws2\merlin\merlin\analysis\filterbarcodes.py:122: RuntimeWarning: divide by zero encountered in true_divide
blankFraction = blankHistogram / totalHistogram
c:\users\alexws2\merlin\merlin\analysis\filterbarcodes.py:122: RuntimeWarning: invalid value encountered in true_divide
blankFraction = blankHistogram / totalHistogram
c:\users\alexws2\merlin\merlin\analysis\filterbarcodes.py:130: RuntimeWarning: overflow encountered in true_divide
blankBarcodeCount + codingBarcodeCount)
C:\Users\alexws2\Anaconda3\envs\merlin_env\lib\site-packages\scipy\optimize\zeros.py:341: RuntimeWarning: Tolerance of 0.09999999999999998 reached.
warnings.warn(msg, RuntimeWarning)
c:\users\alexws2\merlin\merlin\analysis\filterbarcodes.py:122: RuntimeWarning: invalid value encountered in true_divide
blankFraction = blankHistogram / totalHistogram
c:\users\alexws2\merlin\merlin\analysis\filterbarcodes.py:130: RuntimeWarning: overflow encountered in true_divide
blankBarcodeCount + codingBarcodeCount)
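The warnings above all point at dividing histograms that contain empty bins. Here is a minimal sketch of a guarded division, assuming the `blankHistogram`/`totalHistogram` names from the traceback; the helper and the zero-fill policy are my own illustration, not MERlin's actual code:

```python
import numpy as np

def safe_fraction(blank_hist, total_hist):
    """Divide two histograms, returning 0 where the denominator is empty.

    Avoids the "divide by zero" / "invalid value" RuntimeWarnings that a
    plain blank_hist / total_hist emits when total_hist has empty bins.
    """
    blank_hist = np.asarray(blank_hist, dtype=float)
    total_hist = np.asarray(total_hist, dtype=float)
    return np.divide(blank_hist, total_hist,
                     out=np.zeros_like(blank_hist),
                     where=total_hist > 0)

# bins with zero total counts yield 0 instead of nan/inf
blank_fraction = safe_fraction([1, 0, 2], [4, 0, 8])
```

Whether zero is the right fill value for empty bins depends on how the downstream threshold search treats them, so this is a sketch rather than a drop-in fix.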
Sometimes when rerunning analysis that has been copied to a new location, snakemake will rerun all analysis since the timestamps are no longer in the correct order. Should we update the snakefile writer to ignore timestamps by adding the ancient tag (https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#ignoring-timestamps)?
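For concreteness, a hypothetical rule as the snakefile writer might emit it with the `ancient` tag applied; the rule name, paths, and shell command here are illustrative, not MERlin's actual generated output:

```python
# ancient() tells snakemake to ignore the input's modification time, so an
# analysis copied to a new location (with reordered timestamps) is not rerun.
rule filter_barcodes:
    input:
        ancient("Decode/barcodes_{fragment}.h5")
    output:
        "AdaptiveFilterBarcodes/barcodes_{fragment}.h5"
    shell:
        "merlin --run-task AdaptiveFilterBarcodes {wildcards.fragment}"
```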
Some processes could be faster if they were done on a GPU. Thoughts on whether this is worth testing? GPU clusters are, I think, rare and probably expensive, so it may not make much sense for the typical use case of analyzing hundreds of FOVs in parallel.
I'm a little bit confused about the plotting function. I want to plot my own data using your plotting function to create those atlas images with my data points on them. Which specific function should I implement? The docs left me a bit confused. Thanks!
It might be a good idea to update Zhuanglab/MERlin so that people know that it is not the most recent version. The README.md of the current version on Zhuanglab does not have a link to this repository.
Just wanted to mention that I noticed I've built up a lot (85 GB) of .snakemake/tmp.* files in my MERlin directory. I'm assuming this happens because we don't lock the directory, so when an analysis fails nothing removes these before the next instance runs. I'm not sure what would be safe to use to eliminate these directories at the start of an analysis, but I think it might be worth including something to do this.
I've been encountering this error in my most recent data analyses during the adaptive filtering. Have you run into this, or have a sense of what this might be about? I can dig into it more tomorrow.
File "/n/home13/seichhorn/MERlin/merlin/analysis/filterbarcodes.py", line 379, in _run_analysis
self.parameters['misidentification_rate'])
File "/n/home13/seichhorn/MERlin/merlin/analysis/filterbarcodes.py", line 178, in calculate_threshold_for_misidentification_rate
args=[targetMisidentificationRate], tol=0.001, x1=0.3)
File "/n/home13/seichhorn/.conda/envs/MERlin/lib/python3.6/site-packages/scipy/optimize/zeros.py", line 340, in newton
raise RuntimeError(msg)
RuntimeError: Tolerance of -0.003639641998025245 reached. Failed to converge after 7 iterations, value is 0.44139608611659104.
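The traceback shows scipy's `newton` failing to converge, which can happen when the secant/Newton iteration overshoots on a flat or noisy objective. A bracketing method (e.g. `scipy.optimize.brentq`) cannot overshoot; here is a pure-Python bisection sketch of the idea, with a stand-in function in place of the actual misidentification-rate curve:

```python
def bisect_root(f, lo, hi, tol=1e-6, max_iter=100):
    """Find a root of f in [lo, hi] by bisection.

    Unlike an unbracketed Newton iteration, bisection requires f(lo) and
    f(hi) to have opposite signs, but then convergence is guaranteed.
    """
    flo, fhi = f(lo), f(hi)
    if flo * fhi > 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if abs(fmid) < tol or hi - lo < tol:
            return mid
        if flo * fmid <= 0:   # sign change in [lo, mid]: keep that half
            hi = mid
        else:                 # sign change in [mid, hi]
            lo, flo = mid, fmid
    return 0.5 * (lo + hi)

# stand-in for the threshold curve: find x where x^2 - 0.25 crosses zero
root = bisect_root(lambda x: x**2 - 0.25, 0.0, 1.0)
```

Swapping the filter's `newton` call for a bracketed solver would be a code change to `calculate_threshold_for_misidentification_rate`, so this is only a sketch of the direction, not a confirmed fix.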
I'm getting this error in the CleanCellBoundaries step of the pipeline. Is this a bug in the code, or something I'm doing wrong?
Traceback (most recent call last):
File "C:\Users\ckern\Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\ckern\Anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "c:\users\ckern\merlin\merlin\__main__.py", line 3, in <module>
merlin()
File "c:\users\ckern\merlin\merlin\merlin.py", line 138, in merlin
e.run(task, index=args.fragment_index)
File "c:\users\ckern\merlin\merlin\core\executor.py", line 52, in run
task.run(index)
File "c:\users\ckern\merlin\merlin\core\analysistask.py", line 329, in run
raise e
File "c:\users\ckern\merlin\merlin\core\analysistask.py", line 320, in run
self._run_analysis(fragmentIndex)
File "c:\users\ckern\merlin\merlin\analysis\segment.py", line 168, in _run_analysis
.read_features(currentFOV)
File "c:\users\ckern\merlin\merlin\util\spatialfeature.py", line 500, in read_features
self._load_feature_from_hdf5_group(featureGroup[k]))
File "c:\users\ckern\merlin\merlin\util\spatialfeature.py", line 456, in _load_feature_from_hdf5_group
zGroup['p_' + str(p)]))
File "c:\users\ckern\merlin\merlin\util\spatialfeature.py", line 443, in _load_geometry_from_hdf5_group
return geometry.shape(geometryDict)
File "C:\Users\ckern\Anaconda3\lib\site-packages\shapely\geometry\geo.py", line 35, in shape
if not ob["coordinates"]:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I'm attaching my analysis tasks file (renamed from .json to .txt because Github wouldn't let me attach a JSON). Let me know if you need any other information.
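The ValueError comes from evaluating a multi-element numpy array in a boolean context: shapely's `geo.shape()` does `if not ob["coordinates"]`, which is ambiguous for arrays. A small sketch reproducing the failure and one possible workaround (converting coordinates to plain lists before handing the dict to shapely; shapely itself is left out here, and whether MERlin should do this conversion in `spatialfeature.py` is an open question):

```python
import numpy as np

coords = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])

# This is effectively what shapely's geo.shape() does, and why it fails:
try:
    truthy = not coords   # multi-element array -> ambiguous truth value
except ValueError:        # "use a.any() or a.all()"
    truthy = None

# Workaround: store plain Python lists in the geometry dict instead
geometry_dict = {"type": "LineString", "coordinates": coords.tolist()}
```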
MERlin/merlin/analysis/globalalign.py
Line 10 in 9024430
I am wondering if the code that distinguishes the nuclear from the cytoplasmic RNA is available?
Thank you!
I was thinking it might be nice to allow users to run adaptive filter based on partitioned barcodes. The higher background in cells seems to result in a somewhat higher false-positive rate when considering only parsed barcodes. I was thinking the generate adaptive filter task could have a parameter that enables it to consider either all or just partitioned barcodes.
Since shapely version 1.7a2, it is no longer possible to create a geometry object from a numpy array of coordinates. We should update SpatialFeature to be compatible with this change in shapely.
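One way to stay compatible is to convert arrays to lists of coordinate tuples before constructing geometries. The helper below is hypothetical (not part of SpatialFeature), and the shapely call is shown only in a comment:

```python
def coords_as_tuples(coords):
    """Convert an (N, 2) array-like of coordinates into a list of tuples,
    a form that shapely >= 1.7a2 still accepts in geometry constructors.
    """
    return [(float(x), float(y)) for x, y in coords]

# e.g. Polygon(coords_as_tuples(boundary_array)) instead of Polygon(boundary_array)
ring = coords_as_tuples([[0, 0], [1, 0], [1, 1], [0, 1]])
```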
I followed the instructions to install merlin. I have also created the appropriate .env file with the paths to the parameters, data, and analysis folders. I wanted to test the setup using the command structure mentioned in the documentation:
$ merlin -a analysis.json -m microscope_parameters.json -o dataorganization.csv -c codebook_0_M22E1_0.csv -n 5 testdata
When run, the code shows an error importing the tables module:
File "C:\ProgramData\Anaconda3\envs\merfish2\Scripts\merlin-script.py", line 33, in <module>
sys.exit(load_entry_point('merlin', 'console_scripts', 'merlin')())
File "C:\ProgramData\Anaconda3\envs\merfish2\Scripts\merlin-script.py", line 25, in importlib_load_entry_point
return next(matches).load()
File "C:\ProgramData\Anaconda3\envs\merfish2\lib\site-packages\importlib_metadata\__init__.py", line 194, in load
module = import_module(match.group('module'))
File "C:\ProgramData\Anaconda3\envs\merfish2\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 941, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "c:\users\dpant\downloads\merlin\merlin\__init__.py", line 8, in <module>
from merlin.core import dataset
File "c:\users\dpant\downloads\merlin\merlin\core\dataset.py", line 20, in <module>
import tables
File "C:\ProgramData\Anaconda3\envs\merfish2\lib\site-packages\tables\__init__.py", line 99, in <module>
from .utilsextension import (
ImportError: DLL load failed: The specified module could not be found.
Here is the list of packages I have installed:
$ conda list
# packages in environment at C:\ProgramData\Anaconda3\envs\merfish2:
#
# Name Version Build Channel
alabaster 0.7.12 pypi_0 pypi
appdirs 1.4.4 pypi_0 pypi
atomicwrites 1.4.0 pypi_0 pypi
attrs 21.2.0 pypi_0 pypi
babel 2.9.1 pypi_0 pypi
blas 1.0 mkl
blosc 1.21.0 h19a0ad4_0
boto3 1.20.3 pypi_0 pypi
botocore 1.23.3 pypi_0 pypi
bzip2 1.0.8 he774522_0
cached-property 1.5.2 pypi_0 pypi
cachetools 4.2.4 pypi_0 pypi
certifi 2021.5.30 py36haa95532_0
charset-normalizer 2.0.7 pypi_0 pypi
colorama 0.4.4 pypi_0 pypi
configargparse 1.5.3 pypi_0 pypi
connection-pool 0.0.3 pypi_0 pypi
coverage 6.1.2 pypi_0 pypi
cycler 0.11.0 pypi_0 pypi
cython 0.29.24 pypi_0 pypi
datrie 0.8.2 pypi_0 pypi
decorator 4.4.2 pypi_0 pypi
docutils 0.15.2 pypi_0 pypi
filelock 3.3.2 pypi_0 pypi
gitdb 4.0.9 pypi_0 pypi
gitpython 3.1.18 pypi_0 pypi
google-api-core 2.2.2 pypi_0 pypi
google-auth 2.3.3 pypi_0 pypi
google-cloud-core 2.2.1 pypi_0 pypi
google-cloud-storage 1.42.3 pypi_0 pypi
google-crc32c 1.3.0 pypi_0 pypi
google-resumable-media 2.1.0 pypi_0 pypi
googleapis-common-protos 1.53.0 pypi_0 pypi
h5py 2.10.0 py36h5e291fa_0
hdf5 1.10.4 h7ebc959_0
icc_rt 2019.0.0 h0cc432a_1
idna 3.3 pypi_0 pypi
imageio 2.10.3 pypi_0 pypi
imagesize 1.3.0 pypi_0 pypi
importlib-metadata 4.8.2 pypi_0 pypi
iniconfig 1.1.1 pypi_0 pypi
intel-openmp 2021.4.0 haa95532_3556
ipython-genutils 0.2.0 pypi_0 pypi
jinja2 3.0.3 pypi_0 pypi
jmespath 0.10.0 pypi_0 pypi
joblib 1.1.0 pypi_0 pypi
jsonschema 3.2.0 pypi_0 pypi
jupyter-core 4.9.1 pypi_0 pypi
kiwisolver 1.3.1 pypi_0 pypi
libspatialindex 1.9.3 h6c2663c_0
lz4-c 1.9.3 h2bbff1b_1
lzo 2.10 he774522_2
markupsafe 2.0.1 pypi_0 pypi
matplotlib 3.3.4 pypi_0 pypi
merlin 0.1.6 dev_0 <develop>
mkl 2020.2 256
mkl-service 2.3.0 py36h196d8e1_0
mkl_fft 1.3.0 py36h46781fe_0
mkl_random 1.1.1 py36h47e9c7a_0
mock 4.0.3 pyhd3eb1b0_0
nbformat 5.1.3 pypi_0 pypi
networkx 2.5.1 pypi_0 pypi
numexpr 2.7.3 py36hcbcaa1e_0
numpy 1.19.5 pypi_0 pypi
numpy-base 1.19.2 py36ha3acd2a_0
opencv-python 4.5.4.58 pypi_0 pypi
packaging 21.2 pypi_0 pypi
pandas 1.1.5 pypi_0 pypi
pillow 8.4.0 pypi_0 pypi
pip 21.2.2 py36haa95532_0
pluggy 1.0.0 pypi_0 pypi
protobuf 3.19.1 pypi_0 pypi
psutil 5.8.0 pypi_0 pypi
pulp 2.5.1 pypi_0 pypi
py 1.11.0 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pyclustering 0.10.1.2 pypi_0 pypi
pygments 2.10.0 pypi_0 pypi
pyparsing 2.4.7 pypi_0 pypi
pyqt5 5.15.6 pypi_0 pypi
pyqt5-qt5 5.15.2 pypi_0 pypi
pyqt5-sip 12.9.0 pypi_0 pypi
pyreadline 2.1 py36_1
pyrsistent 0.18.0 pypi_0 pypi
pytables 3.6.1 py36h1da0976_0
pytest 6.2.5 pypi_0 pypi
pytest-cov 3.0.0 pypi_0 pypi
python 3.6.13 h3758d61_0
python-dateutil 2.8.2 pypi_0 pypi
python-dotenv 0.19.2 pypi_0 pypi
pytz 2021.3 pypi_0 pypi
pywavelets 1.1.1 pypi_0 pypi
pywin32 302 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
ratelimiter 1.2.0.post0 pypi_0 pypi
requests 2.26.0 pypi_0 pypi
rsa 4.7.2 pypi_0 pypi
rtree 0.9.7 py36h2eaa2aa_1
s3transfer 0.5.0 pypi_0 pypi
scikit-image 0.17.2 pypi_0 pypi
scikit-learn 0.24.2 pypi_0 pypi
scipy 1.5.4 pypi_0 pypi
seaborn 0.11.2 pypi_0 pypi
setuptools 58.0.4 py36haa95532_0
shapely 1.5.9 pypi_0 pypi
six 1.16.0 pyhd3eb1b0_0
smart-open 5.2.1 pypi_0 pypi
smmap 5.0.0 pypi_0 pypi
snakemake 6.10.0 pypi_0 pypi
snowballstemmer 2.1.0 pypi_0 pypi
sphinx 4.3.0 pypi_0 pypi
sphinx-rtd-theme 1.0.0 pypi_0 pypi
sphinxcontrib-applehelp 1.0.2 pypi_0 pypi
sphinxcontrib-devhelp 1.0.2 pypi_0 pypi
sphinxcontrib-htmlhelp 2.0.0 pypi_0 pypi
sphinxcontrib-jsmath 1.0.1 pypi_0 pypi
sphinxcontrib-qthelp 1.0.3 pypi_0 pypi
sphinxcontrib-serializinghtml 1.1.5 pypi_0 pypi
sqlite 3.36.0 h2bbff1b_0
stopit 1.1.2 pypi_0 pypi
tables 3.6.1 pypi_0 pypi
tabulate 0.8.9 pypi_0 pypi
threadpoolctl 3.0.0 pypi_0 pypi
tifffile 2020.9.3 pypi_0 pypi
toml 0.10.2 pypi_0 pypi
tomli 1.2.2 pypi_0 pypi
toposort 1.7 pypi_0 pypi
traitlets 4.3.3 pypi_0 pypi
typing-extensions 3.10.0.2 pypi_0 pypi
urllib3 1.26.7 pypi_0 pypi
vc 14.2 h21ff451_1
vs2015_runtime 14.27.29016 h5e58377_2
wheel 0.37.0 pyhd3eb1b0_1
wincertstore 0.2 py36h7fe50ca_0
wrapt 1.13.3 pypi_0 pypi
xmltodict 0.12.0 pypi_0 pypi
zipp 3.6.0 pypi_0 pypi
zlib 1.2.11 h62dcd97_4
zstd 1.4.9 h19a0ad4_0
FutureWarning: item has been deprecated and will be removed in a future version
MERlin/merlin/data/dataorganization.py
Line 137 in 907f4c2
Line 207 in be3c994
Setting imagej=True raises the error "ImageJ does not support non-contiguous series"; it fails to save an additional stack to an existing tif file. Setting imagej to False gets rid of this error.
It would be useful to be notified if a run crashes due to snakemake/cluster failure so we could resubmit the job asap.
Hello,
I plan to use MERlin for some experiments that we will perform in the lab in the near future, so I was trying to use it on some published data in order to familiarize myself with the tool.
The image I wanted to use as "toy" data is "aligned_images0.tif" from here:
https://download.brainimagelibrary.org/cf/1c/cf1c1a431ef8d021
The name I gave to this file in my local folder is "aligned_images_0.tif" and it is in the folder: /DATA_HOME/experiment1
When I run this command:
merlin -a test_decode_and_segment.json -m microscope.json -o dataorganization.csv -c codebook.fasta -n 1 experiment1
I get the following error:
MERlin - the MERFISH decoding pipeline
Traceback (most recent call last):
File "/home/ssarnataro/MERlin/merlin/data/dataorganization.py", line 272, in _map_image_files
self.fileMap = self._dataSet.load_dataframe_from_csv('filemap')
File "/home/ssarnataro/MERlin/merlin/core/dataset.py", line 361, in load_dataframe_from_csv
with open(savePath, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ssarnataro/Work/data/MERFISH_Atlas_Zhang_et_al/MERlin/ANALYSIS_HOME//experiment1/filemap.csv'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ssarnataro/miniconda3/envs/merlin_env/bin/merlin", line 33, in <module>
sys.exit(load_entry_point('merlin', 'console_scripts', 'merlin')())
File "/home/ssarnataro/MERlin/merlin/merlin.py", line 111, in merlin
analysisHome=_clean_string_arg(args.analysis_home)
File "/home/ssarnataro/MERlin/merlin/core/dataset.py", line 1008, in __init__
self, dataOrganizationName)
File "/home/ssarnataro/MERlin/merlin/data/dataorganization.py", line 71, in __init__
self._map_image_files()
File "/home/ssarnataro/MERlin/merlin/data/dataorganization.py", line 309, in _map_image_files
currentType))
merlin.core.dataset.DataFormatException: Unable to identify image files matching regular expression (?P[\w|-]+)(?P[0-9]+).tif for image type aligned_images_0.
It seems like my regular expression is not correct.
Note: I do not have any fiducial image, the images are already aligned.
I attach the dataorganization.csv file that I created for that.
dataorganization.zip
Do you have any idea on what is wrong?
Thank you very much
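One thing worth checking here: markdown stripped the named groups out of the regex in the error above, so the group names below are assumptions, but the character classes match what the traceback shows. With a pattern of this shape, the greedy word-character group keeps the trailing underscore, so the extracted image type for "aligned_images_0.tif" would not equal an imageType of "aligned_images_0" listed in dataorganization.csv:

```python
import re

# Illustrative pattern in the style of the error's imageRegExp; the group
# names "imageType" and "fov" are my guesses at the stripped originals.
pattern = re.compile(r"(?P<imageType>[\w|-]+)(?P<fov>[0-9]+)\.tif")

m = pattern.fullmatch("aligned_images_0.tif")
# [\w|-]+ is greedy but must leave at least one digit for the fov group,
# so the split lands at "aligned_images_" + "0" -- note the underscore.
image_type, fov = m.group("imageType"), m.group("fov")
```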
I recently installed MERlin on a new machine, and discovered that the installation instructions no longer work. I had to make 3 changes to get it to work.
First, I was getting a DLL load failed error when the tables package was trying to import hdf5extension. To fix this, I removed the pytables installed through conda and re-ran the pip install command for MERlin so that pytables was installed through pip. So the note in the installation instructions that pip doesn't correctly install tables seems to no longer apply, and following the instructions to install pytables with conda no longer works.
Next, I was getting an error in my DeconvolutionPreprocess tasks saying that they couldn't find the files that should have been created by the FiducialCorrelationWarp tasks, even though these files did exist when I checked for them. I believe the DeconvolutionPreprocess tasks weren't waiting for the FiducialCorrelationWarp tasks to complete. I was able to fix this by reverting my snakemake version to 5.12.0. I don't know exactly which version introduced this bug, but 5.12.0 was the version installed on another machine where MERlin does work, so I just used that version.
Finally, I got an error from tifffile that "ImageJ does not support non-contiguous series". I reverted tifffile to 0.14.0, the minimum version required in requirements.txt, which also required reverting scikit-image to 0.15.0 to avoid dependency issues. Everything seems to be working now.
The installation instructions do not work for current versions of pip. The flag --process-dependency-links was removed in pip v 19.0 and above.
But I guess you already knew that.
I was gearing up to update the clustering code I and others use to make it 1) compatible with the most recent stable release of scanpy instead of the development version I was originally wrapping and 2) integrated into merlin. In thinking about it though, it didn't fit as cleanly into the merlin framework as I was originally thinking, largely because the clustering will often be performed on several datasets.
I felt like the two cleanest options were to make a "metaanalysis" class in merlin that takes in many datasets and performs analysis tasks on the aggregated data, and clustering would be one such analysis. The other was to just not integrate the clustering code and instead just make it easy to port the data from one to the other. Do you have any thoughts on this?
I feel like the metaanalysis class would only be worth it if we were going to use it for more than only clustering analyses. It's also possible to just let the clustering be a normal analysis task, tie it to a particular dataset, but let the user pass in multiple datasets via parameters. This seemed like something you wouldn't like, and I don't really favor it.
I encountered a segmentation fault when trying to run GenerateMosaic using opencv-python 4.1.2.30. When I downgraded opencv-python to 4.0.1.24 the problem was resolved.
Full Environment:
name: merlin_env
channels:
Hi there,
I installed MERlin following the instructions in the documentation and implemented some workarounds, such as adding "axis=" to the pandas concat() calls. However, I am still encountering an issue with test_merfish.py running extremely slowly (over 16 hours and not finished yet). I came across a previous issue posted by @HazenBabcock, where it seems his computer took less than 2 minutes to complete all the tests, including test_merfish.
Is this extended test run time expected? Should I continue waiting, or do you have any suggestions on how to monitor the progress and identify the slow-running processes?
Additionally, could I request information on your current working environment's package versions in case it's still an environment-related issue?
PS: My processor information
Caption DeviceID MaxClockSpeed Name NumberOfCores Status
AMD64 Family 23 Model 49 Stepping 0 CPU0 3800 AMD Ryzen Threadripper 3960X 24-Core Processor 24 OK
Best,
Yuan
It isn't clear what needs to be done to get the tests to run. Do I need to do anything beyond installing the project and all its dependencies?
Also every test is failing with the following:
`
______________ ERROR at teardown of test_remove_overlapping_cells ______________
item =
@pytest.mark.hookwrapper
@pytest.mark.trylast
def pytest_runtest_teardown(item):
"""
Hook called after each test tear down, to process any pending events and
avoiding leaking events to the next test. Also, if exceptions have
been captured during fixtures teardown, fail the test.
"""
_process_events()
_close_widgets(item)
_process_events()
yield
_process_events()
capture_enabled = _is_exception_capture_enabled(item)
../../../pyenv/mydev/lib/python3.6/site-packages/pytestqt/plugin.py:156:
item =
def _is_exception_capture_enabled(item):
"""returns if exception capture is disabled for the given test item.
"""
disabled = item.get_marker('qt_no_exception_capture') or \
item.config.getini('qt_no_exception_capture')
E AttributeError: 'Function' object has no attribute 'get_marker'
../../../pyenv/mydev/lib/python3.6/site-packages/pytestqt/exceptions.py:81: AttributeError
`
Which I think is actually an issue with pytest or pytest-cov. Is pytest the recommended way to run the tests?
`
$ pip list | grep pytest
pytest 5.3.1
pytest-cov 2.8.1
pytest-faulthandler 1.5.0
pytest-forked 0.2
pytest-mock 1.10.0
pytest-qt 2.3.1
pytest-remotedata 0.3.2
`