fish-quant / big-fish
Toolbox for the analysis of smFISH images.
Home Page: https://big-fish.readthedocs.io/en/stable/
License: BSD 3-Clause "New" or "Revised" License
It would be nice to have the option to specify the delimiter in this function. It calls save_data_to_csv here: https://github.com/fish-quant/big-fish/blob/master/bigfish/stack/postprocess.py#L559, which already provides this option.
Hi I just installed Big-FISH but I am a bit lost on how to use it. I created the virtual environment with conda, activated it, and installed Big-FISH. I recently performed RNAScope on two genes. I have 3 channel z-stacks (DAPI, Cy3, Cy5) acquired on the Zeiss Z2 epifluorescence microscope and saved as .CZI files. How can I load these files into Big-FISH so that I can start the analysis?
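Big-FISH itself works on plain numpy arrays, so .czi files are usually read with a third-party reader such as czifile (czifile.imread) or AICSImageIO and then split per channel. The snippet below sketches only the splitting step on a simulated array; the axis layout of a real .czi varies with the acquisition, so check it before slicing:

```python
import numpy as np

def split_channels(arr, channel_axis=0):
    """Split a multi-channel z-stack into one (z, y, x) array per channel."""
    arr = np.squeeze(arr)  # drop singleton axes (scene, time, ...)
    return [np.take(arr, c, axis=channel_axis) for c in range(arr.shape[channel_axis])]

# Simulate a 3-channel acquisition with layout (scene, channel, z, y, x);
# a real array would come from e.g. czifile.imread("image.czi").
fake_czi = np.zeros((1, 3, 20, 64, 64), dtype=np.uint16)
dapi, cy3, cy5 = split_channels(fake_czi, channel_axis=0)
print(dapi.shape)  # (20, 64, 64)
```

Each (z, y, x) channel can then be passed to the bigfish.stack and bigfish.detection functions.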
Hi,
Could you edit the plot elbow function so that it returns the detected optimal threshold in addition to the graph?
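Until such an option exists, the optimal threshold can be recomputed outside the plotting function with the same kind of elbow heuristic. The sketch below uses one common variant (the point farthest from the chord joining the curve's endpoints); it is an illustration, not big-fish's exact internal logic:

```python
import numpy as np

def breaking_point(x, y):
    """Return the x value at the 'elbow' of a decreasing L-shaped curve,
    taken as the point farthest from the chord joining the endpoints."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # normalise both axes so distances are comparable
    xn = (x - x.min()) / (x.max() - x.min())
    yn = (y - y.min()) / (y.max() - y.min())
    # perpendicular distance from each point to the endpoint chord
    dx, dy = xn[-1] - xn[0], yn[-1] - yn[0]
    dist = np.abs(dy * xn - dx * yn + dx * yn[0] - dy * xn[0]) / np.hypot(dx, dy)
    return x[int(np.argmax(dist))]

thresholds = np.arange(11)
count_spots = np.array([100, 50, 25, 12, 6, 5, 4, 3, 2, 1, 0])
print(breaking_point(thresholds, count_spots))  # 3.0
```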
Hi,
I am not sure about spot detection: can the user set different thresholds for spots in nuclei and spots in the cytoplasm?
We want to prevent confusion between the cluster decomposition step and the cluster detection step (foci, transcription sites, etc.).
Cluster decomposition -> dense decomposition
foci detection -> clusters detection
The function extract_cell allows combining spot/foci detection results with cell/nuclei segmentation results. Cell segmentation results are stored as a label image, where each segmented entity has a unique pixel value that can serve as an identifier. It would be good to store this value in cell_results, maybe as index_cell or something like that. This would allow cell_results to be linked directly with the label image.
We should add a generic function to tell if a specific coordinate is in the nucleus or not. It could be used with the spots, the foci, whatever.
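A minimal sketch of such a helper, assuming coordinates in (z)yx order and a 2-d nucleus mask or label image (the function name is illustrative, not an existing big-fish API):

```python
import numpy as np

def is_in_nucleus(coords, nuc_mask):
    """Return one boolean per coordinate: True if it falls in the nucleus.
    coords is an (n, 2) or (n, 3) integer array in (z)yx order; only the
    last two columns are used to index the 2-d mask."""
    y = coords[:, -2]
    x = coords[:, -1]
    return nuc_mask[y, x] > 0

nuc_mask = np.zeros((10, 10), dtype=np.uint8)
nuc_mask[2:5, 2:5] = 1
spots = np.array([[0, 3, 3], [0, 8, 8]])  # one spot inside, one outside
print(is_in_nucleus(spots, nuc_mask).tolist())  # [True, False]
```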
Super-expressing cells are slowing down the analysis and I'm not confident that they are being quantified accurately.
Any ideas on how to remove these cells from the quantification within bigfish?
My thought would be to have a segment-first mode where you segment the cells first and then run spot detection on each cell individually. But I don't know how much this would slow things down.
The super-expressing cells could then be treated as one big cluster, and then the number of spots could be estimated based on integrated density of an average single spot.
Is this problem too obscure?
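The analogue estimate mentioned above could be sketched like this (a hypothetical helper, not an existing big-fish function; the numbers are made up):

```python
import numpy as np

def estimate_spots_in_cluster(cluster_intensity_sum, single_spot_sums):
    """Estimate the number of RNAs in a saturated cluster as its integrated
    intensity divided by the median integrated intensity of single spots."""
    ref = np.median(single_spot_sums)
    return int(round(cluster_intensity_sum / ref))

# e.g. a cluster with integrated density 5200 vs. single spots around 50
print(estimate_spots_in_cluster(5200.0, [48.0, 50.0, 52.0]))  # 104
```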
Hi all,
Thanks a lot for developing this great software. I wanted to ask if you are planning to update the version of pandas. I noticed that you are using pandas 0.24.0. Unfortunately, this version is quite old, and it is causing some issues when I try to integrate Big-Fish with other modules or when used in Google Colab.
Best,
Luis
Is this already possible? Would be very helpful!
j
Is there a way within big-fish to manually segment cells, similar to the tool included in the Matlab-based FISHQuant? The current methods of segmenting cells (U-Net or watershed) are challenging to use for neurons and do not allow me to segment cells the way I need to. Or is there another program recommended for manual segmentation? Thanks!
We should gather information from cell extraction in a dataframe-like object and export it in a readable format.
Might be useful to permit header rows in the results files when read from a csv file (to indicate what the columns contain). To read those, a slight modification in the function would be sufficient: allowing access to the skiprows parameter of numpy.loadtxt.
The function multistack.from_binary_to_coord() returns external boundary coordinates. This helps rebuild the original mask with multistack.from_coord_to_surface(), but it might lead to inconsistencies when the coordinates are processed directly. The user should be able to extract cell (and nucleus) coordinates with either external or internal boundaries, according to their needs.
The notebooks pointed to by the link on the main page ( https://mybinder.org/v2/gh/fish-quant/fq-imjoy/binder?urlpath=git-pull%3Frepo%3Dhttps%253A%252F%252Fgithub.com%252Ffish-quant%252Fbig-fish-examples%26urlpath%3Dtree%252Fbig-fish-examples%252Fnotebooks%26branch%3Dmaster )
fail with this error:
Step 40/53 : RUN ${KERNEL_PYTHON_PREFIX}/bin/pip install --no-cache-dir -r "requirements.txt"
---> Running in 5b17aeb977fd
Collecting big-fish==0.6.2
Downloading big_fish-0.6.2-py3-none-any.whl (123 kB)
ERROR: Could not find a version that satisfies the requirement imjoy>=0.11.38 (from versions: 0.7.20, 0.7.22, 0.7.24, 0.7.25, 0.7.26, 0.7.28, 0.7.30, 0.7.31, 0.7.40, 0.7.41, 0.7.50, 0.7.51, 0.7.52, 0.7.53, 0.7.54, 0.7.56, 0.7.57, 0.7.58, 0.7.59, 0.7.60, 0.7.63, 0.7.64, 0.7.65, 0.7.66, 0.7.67, 0.7.68, 0.8.7, 0.8.8, 0.8.13, 0.8.14, 0.8.16, 0.8.17, 0.8.18, 0.8.19, 0.8.20, 0.8.21, 0.8.22, 0.9.0, 0.9.1, 0.9.2, 0.9.3, 0.9.4, 0.9.5, 0.9.6, 0.9.7, 0.9.9, 0.9.10, 0.9.11, 0.9.12, 0.10.0, 0.10.2, 0.10.3, 0.10.4, 0.10.5, 0.10.6, 0.10.7, 0.10.8, 0.10.9, 0.10.10, 0.10.12, 0.11.0, 0.11.1, 0.11.2, 0.11.3, 0.11.4, 0.11.5, 0.11.6, 0.11.7, 0.11.8, 0.11.9, 0.11.10, 0.11.11, 0.11.12, 0.11.13, 0.11.14, 0.11.15, 0.11.16, 0.11.17, 0.11.18, 0.11.20)
ERROR: No matching distribution found for imjoy>=0.11.38
Compute time on our HPC ranges from 1-30 minutes, depending on the number of spots.
Any general advice on how to speed this up? I've split up the analysis using the multiprocessing module, but returns are rather limited.
Is there GPU implementation for bigfish?
Data and computer specs
2047x2047x20 px
100-10000 spots per image
(Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz) x 30 processors
115GB RAM
Also opening a separate issue to discuss the possibility of analyzing images by cell instead of whole frame, and analyzing super-expressing cells in analogue mode (or ignoring these cells to process separately).
Cheers,
Josh
How can I do intensity analysis of all the FISH spots, or a selected group of those spots?
Sorry for my french...
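For a basic version of this, spot intensities can be read straight off the image with fancy indexing, assuming a (z, y, x) image and an (n, 3) spot array in zyx order as returned by big-fish detection:

```python
import numpy as np

# simulate an image and two detected spot coordinates (zyx order)
rng = np.random.default_rng(0)
image = rng.integers(100, 1000, size=(10, 64, 64), dtype=np.uint16)
spots = np.array([[2, 10, 10], [5, 30, 30]])

# intensity of each spot at its detected coordinate
intensities = image[spots[:, 0], spots[:, 1], spots[:, 2]]
print(intensities.shape)  # (2,)
```

A selected group of spots can be analysed the same way by boolean-masking the spots array before indexing.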
Integrate code coverage in the repository.
The current function reads header-less csv files. To increase its flexibility, it would be nice to have an option to specify how many header rows a file has.
https://github.com/fish-quant/big-fish/blob/master/bigfish/stack/io.py#L118
This could be done with the skiprows option of the underlying np.loadtxt: https://numpy.org/doc/1.20/reference/generated/numpy.loadtxt.html
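For illustration, a file with one header row and ";"-separated columns could be read like this (column names and values are made up):

```python
import io
import numpy as np

# a spot csv with one header row, columns separated by ";"
csv_text = "z;y;x\n3;120;45\n7;88;200\n"
coords = np.loadtxt(io.StringIO(csv_text), delimiter=";", skiprows=1, dtype=np.int64)
print(coords.shape)  # (2, 3)
```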
Hi, thanks for creating this great tool. I am really enjoying it.
I ran into some unusual behavior with one of my images at the decompose_cluster() step. I think it may be related to having a couple bright pixels at the very edge of the image. After decomposition, the number of spots assigned to z-slice 40 goes from 5 to 469, even though there obviously aren't many spots there. This region is then called as a focus in the detect_foci() step and ends up having >3-fold more RNAs assigned to it than any of the other foci (despite there being no visible focus in that region of the original image).
Here I’ve shown the original spots detected and I also plotted the spots in xyz with different foci shown in different colors. The strange focus with 469 ‘invisible’ RNAs is circled in red. Please let me know if there’s anything you think I should do to troubleshoot this more or if you have any ideas what might be causing it.
I put a notebook showing the analysis and the input image here:
https://github.com/marykthompson/temp_images
Thanks,
MK
Implement a 3D version of the feature.
Hi, is there a way to change the colour scheme for the plots generated in Big-FISH? The heat maps currently generated aren't ideal for my purposes.
Sachin
The current function reads csv files with a header and columns separated by ";".
It might be interesting to add some more flexibility.
Hi,
Thanks for the useful tool!
I'm now analyzing our own data with Big-FISH. When segmenting the nuclei and cytoplasm, I failed to see the nucleus in the picture, and I also failed to generate a line labeling the boundary as the example shows; instead, I saw a dotted line. This seems weird to me. One difference from the example I found is that the dtype of my nuc_label is int32 instead of int64, so I changed it with astype for the watershed and the plot. Do you have any idea why I ran into this problem and how to solve it?
More about the result: attached are pictures of the nucleus segmentation and the boundary segmentation.
and here are the codes I used :
nuc_mask = segmentation.thresholding(dapi_2d, threshold=6)
nuc_mask = segmentation.clean_segmentation(nuc_mask, small_object_size=50, fill_holes=True)
nuc_label = segmentation.label_instances(nuc_mask)
# apply watershed
cell_label = segmentation.cell_watershed(posS_2d, nuc_label.astype(np.int64), threshold=0.00001, alpha=0.00001)
plot.plot_segmentation(dapi_2d, nuc_label.astype(np.uint16), rescale=True, framesize=(15, 5))
Many thanks if you have any suggestions!
Iris
Is there an integrated density method to estimate the # of spots in foci, similar to FQ in Matlab?
I see nb_rna_in_foci = foci_coord[:, ndim].sum() in the features.foci_features() function. My understanding is that it works by counting the number of spots in the foci ROI. Is that correct, and is that the only method available?
Sorry if I am overlooking something obvious.
Thanks,
j
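For reference, the counting described above can be reproduced directly from the foci array: for 3-d data each row is (z, y, x, nb_spots, focus index), so column ndim holds the spot count per focus. The values below are made up:

```python
import numpy as np

ndim = 3  # 3-d data: zyx coordinates
foci_coord = np.array([
    [5, 10, 10, 4, 0],   # z, y, x, nb_spots, focus index
    [7, 40, 40, 6, 1],
])
nb_rna_in_foci = foci_coord[:, ndim].sum()
print(nb_rna_in_foci)  # 10
```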
I accidentally entered mismatched values for the voxel size and PSF parameters in detection.detect_spots, resulting in no spots detected. The error message I got was the default:
index -1 is out of bounds for axis 0 with size 0
I think it might be helpful to tell the user that no spots were detected, and maybe they should check the input.
Values entered (for my image of simulated points):
voxel_size_yx = 1
voxel_size_z = 1
psf_yx = 4
psf_z = 4
Otherwise, it's a great package! Very helpful.
Thanks!
Add a function to display the L-curve spot-threshold.
Is this possible?
Hello, I noticed that some elements in the 'sharpness measure' computed with stack.compute_focus were below 1, which cannot happen by its definition. I tested the original code in bigfish/stack/quality.py and realised that ratio_1 is overwritten after ratio_2 = np.divide(), resulting in elements < 1 in ratio_1 where the original image > filtered image. I guess this is because ratio_1 shares memory with ratio_default, which is overwritten when ratio_2 is assigned with np.divide(..., out=ratio_default). One simple workaround would be to use two distinct ratio_default buffers, one for ratio_1 and one for ratio_2.
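One possible shape of that workaround, sketched with two distinct output buffers (a simplified illustration, not the actual quality.py code):

```python
import numpy as np

def focus_ratio(image, image_filtered):
    """Elementwise ratio >= 1 between an image and its filtered version."""
    image = image.astype(np.float64)
    image_filtered = image_filtered.astype(np.float64)
    # one distinct output buffer per ratio, so neither divide overwrites the other
    out_1 = np.ones_like(image)
    out_2 = np.ones_like(image)
    ratio_1 = np.divide(image, image_filtered, out=out_1, where=image_filtered > 0)
    ratio_2 = np.divide(image_filtered, image, out=out_2, where=image > 0)
    # keep, per pixel, whichever ratio is >= 1
    return np.maximum(ratio_1, ratio_2)

measure = focus_ratio(np.array([[2., 1.]]), np.array([[1., 2.]]))
print(measure)  # [[2. 2.]]
```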
Refactor bigfish.deep_learning and update examples with pre-trained deep learning models.
Hello, I have been using the cellpose algorithm for cell and nuclei segmentation. The segmentation appears to work well; however, when I proceed to perform cell information extraction, it severely underrepresents the number of cells present. For instance, for the image provided, it says only 12 cells are detected.
Refactor bigfish.classification and add examples in a new notebook.
Currently, we don't report if an RNA is in the cytoplasm or nucleus. However, this information might be useful for different biological questions.
Add parameters to control number of spots decomposed per cluster and the number of clusters detected.
Sphinx Sphinx Sphinx !!!
Using big-fish 0.6.1 and running the following code from your example notebook
spots, threshold = detection.detect_spots(
    images=rna,
    return_threshold=True,
    voxel_size=(300, 103, 103),   # in nanometer (one value per dimension zyx)
    spot_radius=(300, 150, 150))  # in nanometer (one value per dimension zyx)
Generates the following error:
TypeError Traceback (most recent call last)
File ~\anaconda3\envs\fish\lib\site-packages\bigfish\detection\spot_detection.py:516, in automated_threshold_setting(image, mask_local_max)
514 # select threshold where the break of the distribution is located
515 if count_spots.size > 0:
--> 516 optimal_threshold, _, _ = get_breaking_point(thresholds, count_spots)
518 # case where no spots were detected
519 else:
520 optimal_threshold = None
File ~\anaconda3\envs\fish\lib\site-packages\bigfish\detection\utils.py:668, in get_breaking_point(x, y)
645 """Select the x-axis value where a L-curve has a kink.
646
647 Assuming a L-curve from A to B, the 'breaking_point' is the more distant
(...)
665
666 """
667 # check parameters
--> 668 stack.check_array(x, ndim=1, dtype=[np.float64, np.int64])
669 stack.check_array(y, ndim=1, dtype=[np.float64, np.int64])
671 # select threshold where curve break
File ~\anaconda3\envs\fish\lib\site-packages\bigfish\stack\utils.py:135, in check_array(array, ndim, dtype, allow_nan)
133 # check the dtype
134 if dtype is not None:
--> 135 _check_dtype_array(array, dtype)
137 # check the number of dimension
138 if ndim is not None:
File ~\anaconda3\envs\fish\lib\site-packages\bigfish\stack\utils.py:172, in _check_dtype_array(array, dtype)
169 break
171 if error:
--> 172 raise TypeError("{0} is not supported yet. Use one of those dtypes "
173 "instead: {1}.".format(array.dtype, dtype))
TypeError: int32 is not supported yet. Use one of those dtypes instead: [<class 'numpy.float64'>, <class 'numpy.int64'>].
The error was suppressed by changing line 546 of spot_detection.py
from: thresholds = np.array(thresholds)
to: thresholds = np.array(thresholds, dtype=np.int64)
Hi,
How would I go about calculating the standard deviation for the PSF required by the automated spot detection function as input?
Sachin
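For what it's worth, a common route is the Gaussian approximation of the PSF (Zhang et al., 2007); the constants below are the ones FISH-quant-style tools typically use for widefield imaging, but they depend on the modality, so treat them as an assumption and check them against your optics. Wavelengths are in nanometers:

```python
def psf_sigma_nm(emission_wavelength, numerical_aperture, refractive_index):
    """Approximate widefield PSF standard deviations (z, yx) in nanometers."""
    sigma_yx = 0.225 * emission_wavelength / numerical_aperture
    sigma_z = 0.78 * refractive_index * emission_wavelength / numerical_aperture ** 2
    return sigma_z, sigma_yx

# e.g. Cy3 emission ~570 nm with a 1.4 NA oil objective (n = 1.518)
sigma_z, sigma_yx = psf_sigma_nm(570, 1.4, 1.518)
print(round(sigma_z), round(sigma_yx))  # 344 92
```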
A package which is marked as stable should not pin exact versions of its requirements.
A mature Python project does not break backward compatibility until the major version number increases: any code that works with matplotlib 3.0.2 should work with any 3.x.x release, so the marker should be >=3.0.2,<4.
Other packages are also pinned to old versions. This forces users to create a separate environment to use big-fish, or to edit the sources to change == to >=.
Only pandas may be problematic, because a few things broke in the bump to the 1.x.x line. But that line also introduced many improvements, and forcing the use of 0.24.0 may be too big a price to pay to use big-fish in many projects (for example, numpy 1.16.0 cannot simply be installed with Python 3.8 and 3.9).
GitHub provides an amazing tool for open-source projects called GitHub Actions, which allows tests to be run automatically (even daily) to surface any compatibility problem.
The current implementation of the SNR calculation is very slow for full-size images with many spots.
Case example: images were 2024x2024x30 px with 10K spots; the SNR computation had not finished after 1 h.
A more brute-force method looping over all spots was substantially faster (see below). Maybe we can add this approach as an alternative and let the user decide which implementation to use?
Also, I think it might be informative to additionally return the values for signal, noise, and background. These might be useful for users to judge other aspects of their smFISH signal quality.
def compute_snr_per_spot_3d(image, spot, crop_spot, w_bgd):
    """Extract a 3-d volume around a spot. From this volume, the
    signal (I) is determined as the maximum pixel intensity. The outermost
    layer(s) are used to calculate the background (B) as the mean intensity,
    and the noise (N) as the standard deviation. The signal-to-noise ratio
    (SNR) is then calculated as SNR = (I - B) / N for each spot.

    Parameters
    ----------
    image : np.ndarray
        Image with shape (z, y, x).
    spot : np.ndarray, np.int64
        Coordinate of a spot, with shape (3,). One coordinate per dimension
        (zyx coordinates).
    crop_spot : Tuple, np.int64
        Size of the cropping area to which the analysis is restricted,
        defined around each detected spot, one scalar per dimension (z, y, x).
    w_bgd : Tuple, np.int64
        Size of the outer pixel layer of the cropped volume used to
        determine the background, one scalar per dimension (z, y, x).

    Returns
    -------
    snr : float
        Signal-to-noise ratio of the spot.
    signal : float
        Maximum spot intensity.
    background : float
        Mean of the background.
    noise : float
        Standard deviation of the background.
    """
    # get spot coordinates
    spot_z, spot_y, spot_x = spot

    # get spot and background radii
    crop_z, crop_y, crop_x = crop_spot
    w_bgd_z, w_bgd_y, w_bgd_x = w_bgd

    # crop a volume around the spot, clipped to the image borders
    z_min = max(0, int(spot_z - crop_z))
    z_max = min(image.shape[0], int(spot_z + crop_z))
    y_min = max(0, int(spot_y - crop_y))
    y_max = min(image.shape[1], int(spot_y + crop_y))
    x_min = max(0, int(spot_x - crop_x))
    x_max = min(image.shape[2], int(spot_x + crop_x))
    image_crop = image[z_min:z_max + 1,
                       y_min:y_max + 1,
                       x_min:x_max + 1].copy().astype(np.float64)

    # measure the signal as the maximum value in this volume
    signal = image_crop.max()

    # extract the background as the outermost layers of the cropped volume
    image_crop[w_bgd_z:-w_bgd_z, w_bgd_y:-w_bgd_y, w_bgd_x:-w_bgd_x] = -1.
    image_background = image_crop[image_crop >= 0]  # >= 0 keeps zero-valued pixels

    # compute background and noise
    background = image_background.mean()
    noise = max(image_background.std(), 1e-5)

    # compute SNR
    snr = max(0, signal - background) / noise

    return snr, signal, background, noise
Add cell label while extracting cell level results.
ModuleNotFoundError Traceback (most recent call last)
/var/folders/45/8191dpy56n373_4kt1cgwb2w0000gn/T/ipykernel_35809/740557443.py in
2 import bigfish
3 import bigfish.stack as stack
----> 4 import bigfish.multistack as multistack
5 print("Big-FISH version: {0}".format(bigfish.__version__))
ModuleNotFoundError: No module named 'bigfish.multistack'
Integrate continuous integration in the repository.
When running check_input_data(path_input) on Windows, the example data is not downloaded.
The following error is shown
downloading experience_1_dapi_fov_1.tif...
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
in
4
5 # check input images are loaded
----> 6 check_input_data(path_input)
~\Anaconda3\envs\fq-imjoy\lib\site-packages\data\utils.py in check_input_data(input_directory)
56 stack.load_and_save_url(url_input_dapi,
57 input_directory,
---> 58 filename_input_dapi)
59 stack.check_hash(path, hash_input_dapi)
60
~\Anaconda3\envs\fq-imjoy\lib\site-packages\bigfish\stack\utils.py in load_and_save_url(remote_url, directory, filename)
676
677 # download and save data
--> 678 urlretrieve(remote_url, path)
679
680 return
~\Anaconda3\envs\fq-imjoy\lib\urllib\request.py in urlretrieve(url, filename, reporthook, data)
255 # Handle temporary file setup.
256 if filename:
--> 257 tfp = open(filename, 'wb')
258 else:
259 tfp = tempfile.NamedTemporaryFile(delete=False)
FileNotFoundError: [Errno 2] No such file or directory: 'D:\\Work\\Documents\\data-test\\fish-quant\\big-fish\\data\\input\\experience_1_dapi_fov_1.tif'
Hello,
I'm trying to use the automated threshold detection feature of bigfish. However, I find that the threshold is consistently estimated too high, leading to a lot of missed spots.
When looking at the elbow plot, you can clearly see the elbow to the left of the set threshold, and the selected threshold does not actually lie on top of any bend in the line. Any idea what could be the reason for this and how to improve the threshold setting?
Thanks for your help!