losonczylab / sima

Python package for analysis of dynamic fluorescence microscopy data

License: GNU General Public License v2.0


sima's Introduction


Overview

SIMA (Sequential IMage Analysis) is an Open Source package for analysis of time-series imaging data arising from fluorescence microscopy. The functionality of this package includes:

  • correction of motion artifacts
  • segmentation of imaging fields into regions of interest (ROIs)
  • extraction of dynamic signals from ROIs

The included ROI Buddy software provides a graphical user interface (GUI) supporting the following functionality:

  • manual creation of ROIs
  • editing of ROIs resulting from automated segmentation
  • registration of ROIs across separate imaging sessions

Installation and Use

For complete documentation go to <http://www.losonczylab.org/sima>

Dependencies

Optional dependencies

  • OpenCV >= 2.4.8, required for segmentation, registration of ROIs across multiple datasets, and the ROI Buddy GUI
  • picos >= 1.0.2, required for spike inference (>= 1.1 required for Python 3)
  • pyfftw, allows faster performance of some motion correction methods when installed together with FFTW.
  • h5py >= 2.2.1 (2.3.1 recommended), required for HDF5 file format
  • bottleneck >=0.8, for faster calculations
  • matplotlib >= 1.2.1, for saving extraction summary plots
  • mdp, required for ICA demixing of channels

If you build the package from source, you may also need:

If you are using the spike inference feature, we strongly recommend installing MOSEK (free for academic use) which greatly speeds up the inference.

Citing SIMA

If you use SIMA for your research, please cite the following paper in any resulting publications:

Kaifosh P, Zaremba J, Danielson N, and Losonczy A. SIMA: Python software for analysis of dynamic fluorescence imaging data. Frontiers in Neuroinformatics. 2014; 8:80. doi: 10.3389/fninf.2014.00080.

License

Unless otherwise specified in individual files, all code is

Copyright (C) 2014 The Trustees of Columbia University in the City of New York.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

sima's People

Contributors

9sdv, andres9713, angelc134, gitter-badger, james-priestley, jbowler06, jiangxl, jzaremba, llerussell, nbdanielson, neurodroid, pkaifosh, sebastianoltmanns, tamachado, vjlbym


sima's Issues

use bottleneck if available

We should use bottleneck if it is installed, but not make it a requirement. The following strategy should work:

try:
    from bottleneck import nanmean
except ImportError:
    from numpy import nanmean

Single plane ROIs don't always show in ROIBuddy

If you load a multi-plane dataset, switch to plane 1, and then load a single-plane dataset, none of the ROIs will show. It looks like the active plane for the new dataset is being set correctly, so I'm not sure where this is getting confused. The set loads fine on its own, or if you are on plane 0 of a multi-plane dataset when you load the new one. If anyone has a chance to look into this I have example datasets.

3D test cases

Test cases need to be adapted to, or created for, the new 3D version of SIMA.

EOFError after export_signals or signals is called

OS: Windows 7
Python: Python 2.7.7 |Anaconda 2.0.1 (64-bit)
SIMA: 0.3.0

For example, this line: raw_signals = dataset.signals('GCaMP')['GCaMP_signals']['raw'] leads to an EOFError (see attached screenshot).

Also, this line: dataset.export_signals('example_signals.csv', channel='GCaMP', signals_label='GCaMP_signals') leads to the same error (see attached screenshot).

more helpful error messages

We need to provide more helpful error messages. This will make the software easier to use, save time debugging, and reduce the amount of support that users require. Test cases should verify that these errors are correctly raised, e.g. in the case of invalid input arguments. If the user passes invalid arguments to a function, we should try to catch these as soon as possible, rather than having them cause a confusing error message in some subordinate function. Examples of errors that should be caught:

  • initializing an ImagingDataset with a Sequence instead of a list of Sequences
  • passing the wrong number of filenames to an export function

memory efficient use of numpy functions with Sequence objects

Sequences are array-like objects, so numpy functions can be applied to them. However, doing so may (not confirmed) convert the entire object to a numpy array held in memory all at once, which is not feasible for large datasets. Ideally, the numpy functions could run without performing such a conversion. Perhaps some ideas on how to make this work could be gleaned from the source code for numpy.memmap objects:

https://github.com/numpy/numpy/blob/master/numpy/core/memmap.py
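As an interim pattern, reductions over a Sequence can be computed frame by frame rather than by materializing the whole array. A minimal sketch (the iterable of numpy frames stands in for a SIMA Sequence; this is not the SIMA implementation):

```python
import numpy as np

def streaming_time_average(sequence):
    """Compute the mean frame of an iterable of equally shaped numpy
    frames without holding the whole sequence in memory at once."""
    total = None
    count = 0
    for frame in sequence:
        frame = np.asarray(frame, dtype=np.float64)
        # accumulate a running sum instead of stacking all frames
        total = frame.copy() if total is None else total + frame
        count += 1
    if count == 0:
        raise ValueError("empty sequence")
    return total / count
```

Peak memory stays at two frames regardless of sequence length, which is the property the issue asks for.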

Making im_shape a mandatory argument to 3D ROI class?

For 3D ImagingDatasets, it could be possible to initialize an ROI in any of the following ways...

1.) a 2D array (applied across all planes)
2.) a 3D array
3.) a lil_matrix (applied across all planes)
4.) a list of lil_matrices (one for each plane)
5.) a list of 2D arrays (one for each plane)

For case 2, should we enforce mask.shape[2] == ROI.im_shape[2]? Alternatively, if the user passes in a list of lil_matrices, should we enforce that lil_matrix.shape == ROI.im_shape[:2]? And that the length of the lists in cases 4 and 5 is equal to the number of imaging planes?

If we want to enforce this, I think this would require making im_shape a mandatory argument to the ROI class (currently it's only mandatory when initializing with polygons). I'm inclined to do this -- it seems potentially dangerous to allow masks that don't match the image shape. If we do want to support mask shapes not matching im_shape, then we should make the resulting behavior more obvious in the API. What do you guys think?
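The checks under discussion could look like the following hypothetical helper (validate_mask is not part of the SIMA API; it sketches the shape enforcement for cases 2, 4, and 5, where im_shape is (n_planes, n_rows, n_cols)):

```python
import numpy as np

def validate_mask(mask, im_shape):
    """Raise ValueError if an ROI mask is inconsistent with im_shape.

    mask may be a 3D array (case 2) or a list of per-plane 2D arrays
    or sparse matrices (cases 4 and 5)."""
    n_planes, n_rows, n_cols = im_shape
    if isinstance(mask, np.ndarray) and mask.ndim == 3:
        if mask.shape[0] != n_planes:
            raise ValueError("3D mask has %d planes, expected %d"
                             % (mask.shape[0], n_planes))
        plane_shapes = [plane.shape for plane in mask]
    elif isinstance(mask, (list, tuple)):
        if len(mask) != n_planes:
            raise ValueError("mask list has length %d, expected %d"
                             % (len(mask), n_planes))
        # both ndarrays and scipy lil_matrices expose .shape
        plane_shapes = [plane.shape for plane in mask]
    else:
        raise TypeError("unsupported mask type: %r" % type(mask))
    for shape in plane_shapes:
        if tuple(shape) != (n_rows, n_cols):
            raise ValueError("plane mask shape %s != %s"
                             % (shape, (n_rows, n_cols)))
    return True
```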

motion correction with missing data

Motion correction should be able to work with some amount of missing data (e.g. if one channel is not recorded during some time interval). The 2D version had some support for missing rows in one channel. In the 3D version, we plan to have a simpler approach of making the motion correction calculations treat NaN values as missing data.
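The NaN-as-missing approach falls out naturally from numpy's nan-aware reductions; for example, a reference image for correlation can be computed so that unrecorded intervals simply contribute nothing (an illustrative sketch, not the SIMA implementation):

```python
import numpy as np

def reference_image(frames):
    """Time-average of frames, treating NaN pixels as missing data.

    np.nanmean averages only the non-NaN observations at each pixel,
    so frames where one channel was not recorded are skipped per-pixel
    rather than corrupting the average."""
    stack = np.asarray(frames, dtype=np.float64)
    return np.nanmean(stack, axis=0)
```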

multiprocessing on Windows

From Tim:

I just updated my repository and re-ran the test suite and
there's a fourth error that seems related to Nathan's commit from 2
days ago:

======================================================================
ERROR: test__hmm.Test_HiddenMarkov2D.test_correlation_based_correction
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\Users\tamachado\Documents\My
Dropbox\Code\Python\sima\build\testenv\Lib\site-packages\sima\motion\tests\test__hmm.py",
line 189, in test_correlation_based_correction
    shifts = self.hm2d._correlation_based_correction(self.dataset)
  File "C:\Users\tamachado\Documents\My
Dropbox\Code\Python\sima\build\testenv\Lib\site-packages\sima\motion\_hmm.py",
line 416, in _correlation_based_correction
    max_displacement, n_processes=n_processes).estimate(dataset)
  File "C:\Users\tamachado\Documents\My
Dropbox\Code\Python\sima\build\testenv\Lib\site-packages\sima\motion\motion.py",
line 36, in estimate
    return self._make_nonnegative(self._estimate(dataset))
  File "C:\Users\tamachado\Documents\My
Dropbox\Code\Python\sima\build\testenv\Lib\site-packages\sima\motion\frame_align.py",
line 95, in _estimate
    params.n_processes)
  File "C:\Users\tamachado\Documents\My
Dropbox\Code\Python\sima\build\testenv\Lib\site-packages\sima\motion\frame_align.py",
line 188, in _frame_alignment_base
    next(map_generator)
  File "c:\Python27\lib\multiprocessing\pool.py", line 655, in next
    raise value
AttributeError: __exit__

generalize HMM motion correction beyond row-wise assumption

We currently assume that all pixels within a row have the same displacement, but that each row can have a different displacement. It would be nice to allow for more flexibility here. For example, some imaging techniques may acquire entire planes at the same time, but allow different displacements for each plane. At high acquisition rates, it might be reasonable to assume that a chunk of several consecutive rows shares the same displacement.
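The chunked model is cheap to map back onto the existing per-row representation; a sketch of a hypothetical helper (not part of SIMA) that expands per-chunk estimates to per-row displacements:

```python
import numpy as np

def expand_chunk_displacements(chunk_disp, n_rows, chunk_size):
    """Expand displacements estimated once per chunk of consecutive
    rows into the per-row (n_rows, 2) array the HMM currently uses.

    chunk_disp: array of shape (n_chunks, 2) holding (y, x) shifts."""
    per_row = np.repeat(chunk_disp, chunk_size, axis=0)
    return per_row[:n_rows]  # trim rows belonging to a partial final chunk
```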

FastICA from scikit-learn instead of mdp

We should use just one FastICA implementation throughout our code (currently we use two different ones in different places). Let's go with scikit-learn and thus eliminate the mdp dependency.

faster motion correction

Faster motion correction would be helpful when analyzing very large data sets. Some possible approaches include:

  • spatial down-sampling
  • algorithm changes
  • profiling and optimization of implementation
  • CPU parallelization
  • GPU support
  • cluster support

3D example data

need to add example 3D data and associated ImagingDataset directory to sima/tests/data

ROI_Buddy documentation

Locking ROI lists during alignment and importing ROI lists from one set to another are not explained in the online docs.

TIFF file saving bug

Problem

TIFF files saved with SIMA's ImagingDataset.export_frames() cannot be opened with MATLAB.

Details

Since this problem can be corrected by changing the RowsPerStrip tag of the TIFF file to equal the number of rows in the image, this problem may be due to improper setting of these tags in tifffile.py.

Work-arounds

A) Open and then re-save the TIFF files using ImageJ. They can then be opened with MATLAB.
B) Edit the RowsPerStrip tag of the image files.

ROI mask interface

We should rethink the interface for the ROI masks. Currently, they are returned as a list of 2D sparse arrays, which makes them a little awkward to work with. It's okay if we store them that way, but their interface should be independent of how they are stored. Ideally they should have the same interface as a numpy array.

Manual ROI alignment/registration in ROI buddy

Some datasets in which the same cell population was imaged produce projected time-series images that are too dissimilar, at least according to the ROI Buddy algorithm, for the datasets to be matched in alignment mode. It would be a nice feature to override the automatic matching and allow manual alignment, where all the ROIs can be shifted at once or individually as needed. This could be especially useful for ROIs hand-drawn in ImageJ.

dataset.pkl filesize for motion-corrected data is large

displacements is an int64 array of shape (n_frames, n_planes, n_rows, 2), which can grow relatively large: a 14 GB imaging file gave a 250 MB dataset.pkl file.

Two options that might help:

  1. storing as int8 or int16 instead of int64.
  2. storing the array sparsely, i.e. subtract off the mode, unravel the array, and store it as a sparse matrix.

Or maybe this isn't an issue...thoughts?
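To gauge the savings, here is a minimal sketch of option 1 combined with the mode-subtraction idea from option 2 (hypothetical helpers, pure numpy; the residuals could additionally be stored as a sparse matrix):

```python
import numpy as np

def compact_displacements(displacements):
    """Subtract the modal displacement and downcast the residuals to
    int16, cutting storage by 4x relative to int64."""
    values, counts = np.unique(displacements, return_counts=True)
    mode = int(values[np.argmax(counts)])
    residual = displacements - mode
    info = np.iinfo(np.int16)
    if residual.min() < info.min or residual.max() > info.max:
        raise ValueError("residual displacements exceed int16 range")
    return mode, residual.astype(np.int16)

def restore_displacements(mode, residual):
    """Invert compact_displacements back to the original int64 array."""
    return residual.astype(np.int64) + mode
```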

No good planes in single-plane imaging dataset

Because the plane occupancy is calculated over the largest bounding box on the plane, the mean occupancy per pixel could be < min_occupancy, resulting in no good planes (even when only one plane is imaged).

This results in an error on line 154 (calling .min() on an empty list) in _trim_coords of motion.py

creation of sequences from thunder arrays

Make sure that it is possible to make Sequence objects from Thunder "block" structures. This will allow SIMA to be used for segmentation on a cluster within the Thunder framework.

I expect that this may already be possible without any changes, since SIMA already supports creation of Sequence objects from numpy arrays, although some axis swapping may be required on the Thunder side before calling Sequence.create().

StrictVersion and OpenCV

StrictVersion fails on the version format for some OpenCV releases. We should fix this on the 0.3.x branch and then pull into master.

patrick@herschel:~/code/sima$ python runtests.py 
Building, see build.log...
Build OK
SIMA: Python package for sequential image analysis.
Developed by Patrick Kaifosh, Jeffrey Zaremba, Nathan Danielson.
Copyright (C) 2014 The Trustees of Columbia University in the City of New York.
Licensed under the GNU GPL version 2 or later.
Documentation: http://www.losonczylab.org/sima
Version 1.0.0-dev
Traceback (most recent call last):
  File "runtests.py", line 419, in <module>
    main(argv=sys.argv[1:])
  File "runtests.py", line 222, in main
    __import__(PROJECT_MODULE)
  File "/home/patrick/code/sima/build/testenv/lib/python2.7/site-packages/sima/__init__.py", line 9, in <module>
    from sima.imaging import ImagingDataset
  File "/home/patrick/code/sima/build/testenv/lib/python2.7/site-packages/sima/imaging.py", line 20, in <module>
    import sima.misc
  File "/home/patrick/code/sima/build/testenv/lib/python2.7/site-packages/sima/misc/__init__.py", line 12, in <module>
    cv2_available = StrictVersion(cv2.__version__) >= StrictVersion('2.4.8')
  File "/usr/lib/python2.7/distutils/version.py", line 40, in __init__
    self.parse(vstring)
  File "/usr/lib/python2.7/distutils/version.py", line 107, in parse
    raise ValueError, "invalid version number '%s'" % vstring
ValueError: invalid version number '2.4.9.1'
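One way to tolerate four-part version strings such as '2.4.9.1' is to compare tuples of the numeric components instead of using StrictVersion (a sketch, not necessarily the fix adopted on the 0.3.x branch):

```python
def parse_version(vstring):
    """Parse a dotted version string into a tuple of ints, stopping at
    the first non-numeric component, so versions with any number of
    numeric parts compare correctly via tuple ordering."""
    parts = []
    for piece in vstring.split("."):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break  # e.g. stop at '0-dev' in '1.0.0-dev'
    return tuple(parts)
```

With this, parse_version('2.4.9.1') >= parse_version('2.4.8') evaluates without raising.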

GUI for workflow

I have received some requests/suggestions for a simple GUI to manage the basic workflow, i.e. loading raw data, performing motion correction, performing segmentation, performing extraction, and exporting data. This could be quite helpful for some of the less Python-savvy users.

support for single-image TIFF files

It would be useful if SIMA could read data where each frame is saved in a separate TIF file, e.g.

20141021 GCaMP ruby_C00_t0000.ome.tif
20141021 GCaMP ruby_C00_t0001.ome.tif
20141021 GCaMP ruby_C00_t0002.ome.tif
20141021 GCaMP ruby_C00_t0003.ome.tif
20141021 GCaMP ruby_C00_t0004.ome.tif
20141021 GCaMP ruby_C00_t0005.ome.tif
20141021 GCaMP ruby_C00_t0006.ome.tif
20141021 GCaMP ruby_C00_t0007.ome.tif
20141021 GCaMP ruby_C00_t0008.ome.tif
20141021 GCaMP ruby_C00_t0009.ome.tif
20141021 GCaMP ruby_C00_t0010.ome.tif
....
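Whatever reader is used, the frame files must first be ordered by their time index; a sketch of a sorting key (the regex pattern is a guess at the naming scheme above and would need adjusting per microscope; zero-padded counters sort lexicographically anyway, but a numeric key is robust to unpadded ones):

```python
import re

def ordered_frame_paths(paths, pattern=r"_t(\d+)"):
    """Sort per-frame TIFF filenames by the numeric frame counter
    embedded in names like '..._C00_t0003.ome.tif'."""
    def key(path):
        match = re.search(pattern, path)
        if match is None:
            raise ValueError("no frame index found in %r" % path)
        return int(match.group(1))
    return sorted(paths, key=key)
```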

os.path.samefile not in Windows

From Tim:

os.path.samefile is not a valid function on Windows machines and
therefore building fails even if all dependencies are up to date. To
fix this problem, I added the following to the top of sequence.py:

try:
    from os.path import samefile
except ImportError:
    # Windows does not have the samefile function
    from os import stat

    def samefile(file1, file2):
        return stat(file1) == stat(file2)

This workaround should work as long as the two files being compared
don't change during runtime (which might mean that file1 != file2 even
if they are actually the same file).
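A fallback that avoids comparing full stat tuples (which include access times and can mismatch for the very same file) is to compare normalized absolute paths. This misses hard links and mapped drives naming one file by two paths, but it is stable at runtime; a hypothetical sketch:

```python
import os

def samefile_fallback(file1, file2):
    """Fallback for platforms lacking os.path.samefile: compare
    case- and separator-normalized absolute paths. Two spellings of
    the same path compare equal; distinct paths to one file do not."""
    def norm(path):
        return os.path.normcase(os.path.normpath(os.path.abspath(path)))
    return norm(file1) == norm(file2)
```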

more efficient slicing of HDF5-based Sequences

Currently, SIMA loads the whole frame into memory as a numpy array, and then slices that frame in the case of a sliced Sequence. It would be better to just load the data required to return the sliced frame.
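The key observation is that h5py datasets (like numpy memmaps) perform partial I/O when indexed with slices, so passing the slice through to the dataset reads only the needed bytes; a sketch assuming a (time, row, column, channel) layout:

```python
def read_subframe(dset, t, rows, cols, channels=slice(None)):
    """Read only the requested sub-region of frame t.

    When dset is an h5py dataset or numpy memmap, slice indexing reads
    just the selected region from disk, instead of loading the whole
    frame into memory and slicing the in-memory copy afterwards."""
    return dset[t, rows, cols, channels]
```

(The test below uses a plain numpy array as a stand-in for the on-disk dataset, since the indexing semantics are identical.)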

loading datasets after files have moved

We would like to be able to open saved datasets even if the raw data has been moved. This will require prompting the user for the new location of the data. The prompt should be issued in sima.sequence._resolve_paths in place of the exception that is currently raised.

frame alignment on partitions

If the magnitude of the displacements is small relative to the size of the images, we can partition the images (in x and y, e.g. into 4 quarters) and estimate the displacements separately on each partition. This can be advantageous for the following reasons:

  • The Fourier transform scales supra-linearly in the number of pixels
  • The quality of the alignment can be judged by the degree to which estimates from different partitions agree, and the time average can then be computed with only the displacements that agree well between the partitions
  • If most of the displacement estimates from the first few partitions are in close agreement (or if the correlations are high), the estimates from the later partitions need not be computed
  • Calculation of the displacements on different partitions can be performed in parallel with no shared data or synchrony issues
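A minimal sketch of the idea using phase correlation per quadrant (an assumption on my part; SIMA's actual frame-alignment estimator may differ). Agreement between the four quadrant estimates serves as the quality check described above:

```python
import numpy as np

def phase_correlation_shift(a, b):
    """Estimate the circular (y, x) shift d such that b = roll(a, d),
    via the peak of the normalized cross-power spectrum."""
    F = np.conj(np.fft.fft2(a)) * np.fft.fft2(b)
    F /= np.abs(F) + 1e-12  # keep only phase information
    corr = np.fft.ifft2(F).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # map peaks past the midpoint to negative shifts
    return tuple(int(p) - s if p > s // 2 else int(p)
                 for p, s in zip(peak, corr.shape))

def partition_shifts(a, b):
    """Estimate shifts independently on the four quadrants; close
    agreement between quadrants indicates a reliable alignment."""
    h, w = a.shape[0] // 2, a.shape[1] // 2
    quadrants = [(slice(0, h), slice(0, w)), (slice(0, h), slice(w, None)),
                 (slice(h, None), slice(0, w)), (slice(h, None), slice(w, None))]
    return [phase_correlation_shift(a[r, c], b[r, c]) for r, c in quadrants]
```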

user query when writing to existing location

Currently, when SIMA is asked to save an imaging dataset to a location with an existing imaging dataset, an exception is raised. It would be better if there were a request for the user to decide whether or not to overwrite the pre-existing dataset, or whether to save to a different location.
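The proposed behavior could be wrapped as follows (a hypothetical sketch; the confirm callable is injectable so the prompt can be bypassed in scripts and tests):

```python
import os

def save_with_confirmation(save_fn, path, confirm=None):
    """Call save_fn(path); if path already exists, ask the user whether
    to overwrite instead of raising immediately."""
    if confirm is None:
        def confirm(p):
            reply = input("%s exists; overwrite? [y/N] " % p)
            return reply.strip().lower() == "y"
    if os.path.exists(path) and not confirm(path):
        raise IOError("refusing to overwrite %s" % path)
    save_fn(path)
```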

Using Shapely Polygons to set and store z-coordinate information

In the current implementation of ROI.ROI, the user can initialize an ROI with a Polygon, a list of Polygons, a MultiPolygon, or a list of 2D coordinate sequences.

For 3D ROIs, we should maintain this flexibility. However, this requires implementation of a _validate_z method for ensuring that each polygon has the same z-coordinate.

If the user passes in shapely Polygon objects initialized with 2D coordinates, a value of np.nan is assigned to z. However, it seems shapely.Polygon.exterior.coords is not "set-able". We must initialize a new shapely.Polygon object with the z-coordinates.

An additional problem is illustrated by the following example:

In [7]: polys = [[0, 0, 1], [1, 0, 1], [.5,  .5, 1]]

In [8]: Polygon(polys).has_z
Out[8]: True

In [9]: polys = [[0, 0, np.nan], [1, 0, np.nan], [.5, .5, np.nan]]

In [10]: Polygon(polys).has_z
Out[10]: False

So it seems that the has_z property does not accept np.nan.

So there are 2 issues here:
1.) If the user passes in 2D shapely Polygons as ROIs, we need to re-initialize them using 3D coordinates
2.) np.nan is not valid for Polygon.has_z

I don't see a problem with 1, and as a work-around for 2, we could define a polygon for each plane with the appropriate z-coordinates and initialize that list as a shapely Polygon?

motion correction for 3D displacements

Although we have implemented motion correction for 3D datasets, it currently only corrects motion artifacts within imaging planes (i.e. in X-Y), but not between imaging planes (i.e. in Z).

Import ROIs issues

  1. If a pair of datasets fails to transform, there is no error message, it just quietly returns.

  2. The affine transform is run from sima.misc, while UI_tSeries.transform is used elsewhere, so the results are not necessarily the same.

  3. The transform does not use the processed image, which is probably the main reason that it gives different results from the transform calculated by UI_tSeries.transform.

_todict() methods save __class__

Saving the class requires that it exist in future versions. This could cause problems with backward compatibility. Currently, this is a problem when trying to create a parser in sima 1.0 that can read .sima files from sima 0.x

suppress annoying TIFF tag warnings

When we read TIFF files (e.g. in _Sequence_TIFF_Interleaved) with libtiff, the following warning repeats many many times.

TIFFReadDirectory: Warning, Unknown field with tag 34362 (0x863a) encountered.

@jzaremba Can you modify the code below so that it only suppresses warnings containing "TIFFReadDirectory: Warning, Unknown field with tag"?

import os

class suppress_stdout_stderr(object):
    """
    A context manager for doing a "deep suppression" of stdout and stderr in
    Python, i.e. will suppress all print, even if the print originates in a
    compiled C/Fortran sub-function.
       This will not suppress raised exceptions, since exceptions are printed
    to stderr just before a script exits, and after the context manager has
    exited (at least, I think that is why it lets exceptions through).

    From:
    http://stackoverflow.com/questions/11130156/suppress-stdout-stderr-print-from-python-functions

    """
    def __init__(self):
        # Open a pair of null files
        self.null_fds = [os.open(os.devnull, os.O_RDWR) for x in range(2)]
        # Save the actual stdout (1) and stderr (2) file descriptors.
        self.save_fds = (os.dup(1), os.dup(2))

    def __enter__(self):
        # Assign the null pointers to stdout and stderr.
        os.dup2(self.null_fds[0], 1)
        os.dup2(self.null_fds[1], 2)

    def __exit__(self, *_):
        # Re-assign the real stdout/stderr back to (1) and (2)
        os.dup2(self.save_fds[0], 1)
        os.dup2(self.save_fds[1], 2)
        # Close all duplicated descriptors so repeated use does not leak fds
        for fd in self.null_fds + list(self.save_fds):
            os.close(fd)
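For the selective suppression requested above, one option is a Python-level line filter. Note the caveat: output written directly to file descriptor 2 by C code in libtiff bypasses sys.stderr, so the fd-level suppression may still be needed for those messages; this sketch covers warnings emitted through Python:

```python
import io
import sys
from contextlib import contextmanager

@contextmanager
def filter_stderr(substring):
    """Suppress only stderr lines containing substring, forwarding
    everything else to the real stderr unchanged."""
    real = sys.stderr

    class _Filter(io.TextIOBase):
        def write(self, text):
            for line in text.splitlines(True):
                if substring not in line:
                    real.write(line)  # pass non-matching lines through
            return len(text)

    sys.stderr = _Filter()
    try:
        yield
    finally:
        sys.stderr = real
```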

Transform structure

Ben (@poolio) has suggested that we use an abstract Transformation class to define the interface for the various possible ways in which images may be transformed to correct for motion artifacts. This would help to clean up a number of parts of the code and improve flexibility going forward.

I have started sketching this in transfrom.py, included in commit 28ee215.

acquisition times information

Overview

Information about the times at which imaging data are acquired is useful for motion correction, and also for interpreting the exported signals.

Considerations

Simplicity: It should be easy to enter the meta-data, especially for simple cases such as a single imaging plane acquired at a constant frame rate.

Missing entries: There should be reasonable default behavior if some of the meta-data is not supplied.

Flexibility: Ideally this would be flexible enough to support spinning disk or spatial light modulation (SLM) data acquisition (see #7), but these could be handled separately.

Possibility

We could store an object with the following attributes:

  • frame_times: either a single number (the frame rate) or a list of all the times at which frames start (the latter could be useful in the case of dropped frames)
  • planes: the order in which planes are acquired. This defaults to [0, 1, 2, ..., P-1]. If planes are acquired first on an upward sweep and then in reverse order, this can be [0, 1, 2, 2, 1, 0].
  • plane_times: Either a single number (the plane acquisition rate) or a list of times at which planes are acquired relative to the beginning of the period. The list should have the same number of entries as planes. The default is to assume that the planes are evenly spaced in time throughout the period.
  • rows, pixels: similar to planes
  • row_times, pixel_times: similar to plane_times -- Note, pixel_times could also be a single dwell time.
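The default behavior described above can be made concrete with a small hypothetical helper (not part of SIMA): frames at a constant rate, planes evenly spaced within each frame period, and plane order given explicitly.

```python
def plane_acquisition_time(frame_idx, plane, frame_rate, plane_order):
    """Absolute acquisition time of the first occurrence of `plane`
    within frame `frame_idx`, under the default assumptions: constant
    frame rate, planes evenly spaced across the frame period."""
    period = 1.0 / frame_rate
    frame_start = frame_idx * period
    slot = plane_order.index(plane)  # position within the sweep
    return frame_start + slot * period / len(plane_order)
```

For a bidirectional sweep such as plane_order = [0, 1, 2, 2, 1, 0], .index() picks the first (upward) acquisition of each plane; plane_times would override the even-spacing assumption.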

slicing of imaging datasets

This needs to be implemented properly. The implementation needs to consider the slicing of ROIs and other associated structures that may result.

Currently, the doctests related to this are marked to be skipped. These should be re-included again once the implementation is complete.
