krishnaswamylab / magic

MAGIC (Markov Affinity-based Graph Imputation of Cells) is a method for imputing missing values and restoring the structure of large biological datasets.

License: GNU General Public License v2.0

Python 0.92% Jupyter Notebook 59.48% MATLAB 0.54% R 0.37% Makefile 0.01% M 0.02% HTML 38.67%

magic's Introduction

Markov Affinity-based Graph Imputation of Cells (MAGIC)

Markov Affinity-based Graph Imputation of Cells (MAGIC) is an algorithm for denoising high-dimensional data, most commonly applied to single-cell RNA sequencing data. MAGIC learns the data manifold and uses the resulting graph to smooth the features and restore the structure of the data.

To see how MAGIC can be applied to single-cell RNA-seq, elucidating the epithelial-to-mesenchymal transition, read our publication in Cell.

David van Dijk, et al. Recovering Gene Interactions from Single-Cell Data Using Data Diffusion. 2018. Cell.

MAGIC has been implemented in Python, Matlab, and R.

To get started immediately, check out our tutorials:

Python

R

MAGIC reveals the interaction between Vimentin (VIM), Cadherin-1 (CDH1), and Zinc finger E-box-binding homeobox 1 (ZEB1, encoded by color).

Python
- Installation
  - Installation with pip
  - Installation from GitHub
- Usage
  - Quick Start
  - Tutorials
Matlab
- Instructions for the Matlab version
R
- Installation
  - Installation from CRAN
  - Installation from GitHub
- Usage
  - Quick Start
  - Tutorials
Help

Python

Installation

Installation with pip

To install with pip, run the following from a terminal:

pip install --user magic-impute

Installation from GitHub

To clone the repository and install manually, run the following from a terminal:

git clone https://github.com/KrishnaswamyLab/MAGIC.git
cd MAGIC/python
python setup.py install --user

Usage

Quick Start

The following code runs MAGIC on test data located in the MAGIC repository.

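import magic
import pandas as pd
import matplotlib.pyplot as plt
# Load the example data shipped with the repository
X = pd.read_csv("MAGIC/data/test_data.csv")
# Fit MAGIC and return imputed values for three genes
magic_operator = magic.MAGIC()
X_magic = magic_operator.fit_transform(X, genes=['VIM', 'CDH1', 'ZEB1'])
# Plot VIM against CDH1, colored by ZEB1
plt.scatter(X_magic['VIM'], X_magic['CDH1'], c=X_magic['ZEB1'], s=1, cmap='inferno')
plt.show()
# Animate how the gene-gene relationship changes as MAGIC smooths the data
magic.plot.animate_magic(X, gene_x='VIM', gene_y='CDH1', gene_color='ZEB1', operator=magic_operator)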

Tutorials

You can read the MAGIC documentation at https://magic.readthedocs.io/. We have included two tutorial notebooks on MAGIC usage and results visualization for single cell RNA-seq data.

EMT data notebook: http://nbviewer.jupyter.org/github/KrishnaswamyLab/MAGIC/blob/master/python/tutorial_notebooks/emt_tutorial.ipynb

Bone Marrow data notebook: http://nbviewer.jupyter.org/github/KrishnaswamyLab/MAGIC/blob/master/python/tutorial_notebooks/bonemarrow_tutorial.ipynb

Matlab

Instructions for the Matlab version

run_magic.m -- MAGIC imputation function
test_magic.m -- Shows how to run MAGIC. Also included is a function for loading 10x format data (load_10x.m)

R

Installation

To use MAGIC, you will need to install both the R and Python packages.

If Python or pip is not installed, you will need to install them. We recommend Miniconda3, which installs Python and pip together; otherwise, you can install pip from https://pip.pypa.io/en/stable/installing/.

Installation from CRAN

In R, run this command to install MAGIC and all dependencies:

install.packages("Rmagic")

In a terminal, run the following command to install the Python package:

pip install --user magic-impute

Installation from GitHub

To clone the repository and install manually, run the following from a terminal:

git clone https://github.com/KrishnaswamyLab/MAGIC.git
cd MAGIC/python
python setup.py install --user
cd ../Rmagic
R CMD INSTALL .

Usage

Quick Start

After installing the package, MAGIC can be run by loading the library and calling magic():

library(Rmagic)
library(ggplot2)
data(magic_testdata)
MAGIC_data <- magic(magic_testdata, genes=c("VIM", "CDH1", "ZEB1"))
ggplot(MAGIC_data) +
  geom_point(aes(x=VIM, y=CDH1, color=ZEB1))

Tutorials

You can read the MAGIC tutorial by running help(Rmagic::magic). For a working example, see the Rmarkdown tutorials at http://htmlpreview.github.io/?https://github.com/KrishnaswamyLab/MAGIC/blob/master/Rmagic/inst/examples/bonemarrow_tutorial.html and http://htmlpreview.github.io/?https://github.com/KrishnaswamyLab/MAGIC/blob/master/Rmagic/inst/examples/emt_tutorial.html or in Rmagic/inst/examples.

Help

If you have any questions or require assistance using MAGIC, please contact us at https://krishnaswamylab.org/get-help.

magic's People


rintukutum lkmklsmn kmoon3 cyang-2014 iandriver ml-lab vincent6liu olgabot dpeerlab genesofeve iosonofabio caot jellepiepenbrock ruchira-ray airysen dongjt0727 chlee-tabin zhangxf-ccnu yaelba tomkellygenetics him72 rymdpiloten tyoung1221 dotafterfootball eegk anu-bioinfo dejoelson taoshengxu yurasong flo-compbio esadr csbioazim softbear n01261 sandy4321 rowhit bacemdatascience cloudfora jcastr01 volkerbergen dustincys nsavalia23 xpl1986 ran485 hzongyao shicheng-guo drizzlezyk atarashansky mbernste poshine p4rkerw kpandey008 metricix vishalbelsare jiahuaqu nvrivera davidtingley zb-wcm jianguozhou3 yuanzhiyuan jpickard1 sadiexiaoyu mossi8 xingzhis noissee python-repository-hub biosyy mpmbq2 jsgro dumitru1216 kunppbu evertonrocha2 nbahti mariusrklein n1-inc yamajackr ambre339 yonda310 nitanshuj szawan jinhuili-lab epickack

magic's Issues

Issues with MAGIC GUI

Hi,

I've been trying to run MAGIC to analyze some single-cell RNA-seq data, but I'm running into a number of issues that make the program hard to use. I installed MAGIC on my Mac, which runs macOS 10.12.5.

Here is one error pertaining to loading a saved session:

Exception in Tkinter callback
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/tkinter/__init__.py", line 1699, in __call__
    return self.func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/bin/magic_gui.py", line 412, in processData
    scdata = magic.mg.SCData.load(os.path.expanduser(self.dataFileName))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/magic/mg.py", line 140, in load
    scdata = cls(data['_data'], data['_metadata'])
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/magic/mg.py", line 96, in __init__
    if not data_type in ['sc-seq', 'masscyt']:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/generic.py", line 955, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I believe I have Python 3.6. However, when I type 'python' in Terminal, the version reported is 2.7, so frankly I'm not sure which one I'm actually using. Any help would be appreciated.
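Judging from the last frames (an inference, not a confirmed diagnosis), a DataFrame rather than a string reached the `data_type` check in mg.py: `load` passes `data['_metadata']` positionally, and the membership test `data_type in [...]` compares the value with each list element using `==`. For a DataFrame, that comparison returns another DataFrame, and Python then needs its truth value, which pandas refuses to give. The sketch below reproduces that behavior and adds a type guard. `validate_data_type` and `reproduce_ambiguous_truth_value` are hypothetical names, not part of MAGIC.

```python
import pandas as pd

VALID_TYPES = ("sc-seq", "masscyt")


def validate_data_type(data_type):
    """Hypothetical guard: reject non-strings before the membership test."""
    if not isinstance(data_type, str):
        raise TypeError(
            f"data_type must be a string, got {type(data_type).__name__}")
    if data_type not in VALID_TYPES:
        raise ValueError(f"data_type must be one of {VALID_TYPES}")
    return data_type


def reproduce_ambiguous_truth_value():
    """Show that `DataFrame in list_of_str` raises pandas' ValueError."""
    metadata = pd.DataFrame({"cell_type": ["a", "b"]})
    try:
        # list.__contains__ evaluates `element == metadata`, which yields a
        # DataFrame of booleans; converting that to bool raises ValueError
        metadata in ["sc-seq", "masscyt"]
    except ValueError as err:
        return str(err)
    return None
```

With the guard in place, a mis-ordered argument fails with an explicit `TypeError` naming the offending type, instead of the more confusing pandas message.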

cannot import properly

I installed magic on a Mac. If I load it from within the magic directory, I get this warning:

/usr/local/lib/python3.6/site-packages/matplotlib/__init__.py:1405: UserWarning:
This call to matplotlib.use() has no effect because the backend has already
been chosen; matplotlib.use() must be called before pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.

warnings.warn(_use_error_msg)

That still works.

If I try loading it in any other directory, I get an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/chenlingantelope/ResearchProjects/scRNA_simulation/magic.py", line 16, in <module>
    scdata = magic.mg.SCData.from_csv(os.path.expanduser(rawcounts),
AttributeError: module 'magic' has no attribute 'mg'

I can run it in a Python session, but not at all from the command line. Any suggestions?
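The second traceback points at `scRNA_simulation/magic.py`, which looks like the user's own script rather than the installed package. If a script is itself named `magic.py`, `import magic` run from that directory can resolve to the script. That would explain `module 'magic' has no attribute 'mg'`. This is an inference from the path, not a confirmed diagnosis. One way to see which file Python resolves is `importlib.util.find_spec`; `locate_module` below is a hypothetical helper name.

```python
import importlib
import importlib.util


def locate_module(name):
    """Return the file Python would import for `name`, or None if not found.

    Note: if `name` is already imported, find_spec reports the copy in
    sys.modules rather than searching sys.path again.
    """
    importlib.invalidate_caches()  # pick up files created since start-up
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin
```

Running `print(locate_module("magic"))` from the failing directory should show whether the local script wins. If it does, renaming the script (and removing any stale `__pycache__` entry for it) stops it from shadowing the package.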

TypeError: init() got an unexpected keyword argument 'metavar'

I just installed the most recent update, but I get the error above. What might be the issue?

$ MAGIC.py
Traceback (most recent call last):
  File "/opt/anaconda3/bin/MAGIC.py", line 120, in <module>
    main(sys.argv[1:])
  File "/opt/anaconda3/bin/MAGIC.py", line 82, in main
    args = parse_args(args)
  File "/opt/anaconda3/bin/MAGIC.py", line 69, in parse_args
    help='Plot R2 plot generated in optimal t calculation (Default=False).')
  File "/opt/anaconda3/lib/python3.6/argparse.py", line 1334, in add_argument
    action = action_class(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'metavar'

$ git log -n 1
commit 044038e4b70a79af4950a63075c3aaeb51eea893
Merge: 0929f2b 03bb376
Author: Pooja Kathail <[email protected]>
Date:   Fri Jan 26 17:19:52 2018 -0500

    fix merge conflicts
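The failing `add_argument` call has help text ending in "(Default=False)", which suggests a boolean flag. Under that assumption, the error fits a known argparse rule: `action="store_true"` flags take no value, so their action class does not accept `metavar`. On the Python 3.6 shown in the report, passing it raises exactly this `TypeError`. The sketch below shows the broken and fixed forms. The flag name `--plot-optimal-t` is hypothetical, not MAGIC.py's actual option.

```python
import argparse


def broken_parser():
    """Reproduces the error: store_true actions do not accept `metavar`."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--plot-optimal-t", action="store_true",
                        metavar="PLOT")  # invalid for a store_true flag
    return parser


def build_parser():
    """Fixed version: a plain boolean flag without `metavar`."""
    parser = argparse.ArgumentParser(description="Sketch of a MAGIC-like CLI")
    parser.add_argument(
        "--plot-optimal-t",  # hypothetical flag name
        action="store_true",
        help="Plot R2 plot generated in optimal t calculation (Default=False).",
    )
    return parser
```

With the fix, `build_parser().parse_args(["--plot-optimal-t"]).plot_optimal_t` is `True`, and it is `False` when the flag is omitted.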

Installation of R package

Hi, I am trying to install the package on my local system, but I am getting some errors:

Downloading GitHub repo pkathail/magic@master
from URL https://api.github.com/repos/pkathail/magic/zipball/master
Installing Rmagic
"C:/PROGRA~1/R/R-34~1.1/bin/x64/R" --no-site-file --no-environ --no-save --no-restore --quiet CMD
INSTALL
"C:/Users/ydhungan/AppData/Local/Temp/RtmpMfhQae/devtools2f6842e33966/pkathail-magic-25f1c5f"
--library="C:/Users/ydhungan/Documents/R/win-library/3.4" --install-tests

installing source package 'Rmagic' ...
** libs

*** arch - i386
no DLL was created
ERROR: compilation failed for package 'Rmagic'

removing 'C:/Users/ydhungan/Documents/R/win-library/3.4/Rmagic'
Installation failed: Command failed (1)

Can you please let me know what's going on here? Thank you for the help.

kNN-DREMI

Hi, would it be possible to add a section to the tutorial on how to calculate kNN-DREMI? Thank you!

Rationale for filtering based on library size

Dear MAGIC developers,
I am running the example notebook, and I'd like to understand how to properly filter out cells based on library size. From the notebook:

The first step in data processing for MAGIC is to determine the molecule per cell and molecule per gene cutoffs with which to filter the data.

From these histograms, choose the appropriate cutoffs to filter the data.

Could you point me to some rationale for selecting the appropriate cutoffs? If I read the paper correctly, this is not defined in your original paper. It may be defined elsewhere, but I could not find any paper about it online.

Thank you,
Francesco
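The notebook leaves the cutoffs to the user. One common heuristic, stated here as an assumption rather than a rule from the MAGIC paper, is to compute each cell's library size (its total molecule count), inspect the histogram, and keep cells between a lower and an upper percentile. Very small libraries often indicate empty or broken droplets, and very large ones may indicate doublets. `filter_by_library_size` is a hypothetical helper, and the 5th/95th percentile defaults are illustrative only.

```python
import numpy as np
import pandas as pd


def filter_by_library_size(counts, lower_pct=5.0, upper_pct=95.0):
    """Keep cells (rows) whose total counts lie within two percentiles.

    counts: cells x genes DataFrame of raw molecule counts.
    """
    library_size = counts.sum(axis=1)  # total molecules per cell
    low, high = np.percentile(library_size, [lower_pct, upper_pct])
    keep = (library_size >= low) & (library_size <= high)
    return counts.loc[keep]
```

In practice, the percentiles (or absolute thresholds) are chosen by looking at the library-size histogram, which is what the notebook asks for.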

error loading phenograph

Hello,

Thanks for developing such a good tool. However, I ran into a problem installing it on my MacBook.

I used the Anaconda distribution of Python.

After running the 'pip3 install .' command, everything seems okay:

Requirement already satisfied (use --upgrade to upgrade): magic==0.0 from file:///Users/rui/AUDREY_LAB/imputation/magic/magic in /Users/rui/anaconda/lib/python3.5/site-packages
Requirement already satisfied: numpy>=1.10.0 in /Users/rui/anaconda/lib/python3.5/site-packages (from magic==0.0)
Requirement already satisfied: pandas>=0.18.0 in /Users/rui/anaconda/lib/python3.5/site-packages (from magic==0.0)
Requirement already satisfied: scipy>=0.14.0 in /Users/rui/anaconda/lib/python3.5/site-packages (from magic==0.0)
Requirement already satisfied: matplotlib in /Users/rui/anaconda/lib/python3.5/site-packages (from magic==0.0)
Requirement already satisfied: seaborn in /Users/rui/anaconda/lib/python3.5/site-packages (from magic==0.0)
Requirement already satisfied: sklearn in /Users/rui/anaconda/lib/python3.5/site-packages (from magic==0.0)
Requirement already satisfied: networkx in /Users/rui/anaconda/lib/python3.5/site-packages (from magic==0.0)
Requirement already satisfied: fcsparser in /Users/rui/anaconda/lib/python3.5/site-packages (from magic==0.0)
Requirement already satisfied: statsmodels in /Users/rui/anaconda/lib/python3.5/site-packages (from magic==0.0)
Requirement already satisfied: python-dateutil>=2 in /Users/rui/anaconda/lib/python3.5/site-packages (from pandas>=0.18.0->magic==0.0)
Requirement already satisfied: pytz>=2011k in /Users/rui/anaconda/lib/python3.5/site-packages (from pandas>=0.18.0->magic==0.0)
Requirement already satisfied: cycler in /Users/rui/anaconda/lib/python3.5/site-packages (from matplotlib->magic==0.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,>=1.5.6 in /Users/rui/anaconda/lib/python3.5/site-packages (from matplotlib->magic==0.0)
Requirement already satisfied: scikit-learn in /Users/rui/anaconda/lib/python3.5/site-packages (from sklearn->magic==0.0)
Requirement already satisfied: decorator>=3.4.0 in /Users/rui/anaconda/lib/python3.5/site-packages (from networkx->magic==0.0)
Requirement already satisfied: setuptools in /Users/rui/anaconda/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg (from fcsparser->magic==0.0)
Requirement already satisfied: six>=1.5 in /Users/rui/anaconda/lib/python3.5/site-packages (from python-dateutil>=2->pandas>=0.18.0->magic==0.0)

But when I run magic_gui.py, I get an error. Here is the error message:

magic_gui.py
Traceback (most recent call last):
  File "/Users/rui/anaconda/bin/magic_gui.py", line 12, in <module>
    import magic
  File "/Users/rui/anaconda/lib/python3.5/site-packages/magic/__init__.py", line 2, in <module>
    from . import mg
  File "/Users/rui/anaconda/lib/python3.5/site-packages/magic/mg.py", line 42, in <module>
    import phenograph
ImportError: No module named 'phenograph'

I thought this might be a problem with installing 'phenograph', so I ran 'pip3 install phenograph'.

But I got the following error message:
Collecting phenograph
Could not find a version that satisfies the requirement phenograph (from versions: )
No matching distribution found for phenograph

Any suggestions? Thanks!

installation on cluster

Is there a way to install MAGIC in a local folder and run it on a cluster?

Thanks!

re-producing the figures in MAGIC paper

Hello! To make sure we are using MAGIC correctly, I've been trying to run MAGIC imputation on the mouse bone marrow dataset and compare the results with Fig. 3C-H, Fig. 12D, and Fig. 14B.

I downloaded GSE72857_umitab.txt.gz from the GEO database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE72857). I skipped filtering, normalized the data by library size, and log-transformed it with log10(x+1). I then imported the normalized, log-transformed data into MAGIC without further normalization and used the default settings (n_pca_components=20, random_pca=True, t=6, k=30, ka=10, epsilon=1, rescale_percent=99) in the Python console. I created gene-gene biaxial plots before and after MAGIC imputation with ‘scdata.scatter_gene_expression([gene1, gene2])’ and ‘scdata.magic.scatter_gene_expression([gene1, gene2])’.

For most genes, the gene-gene biaxial plots after MAGIC look broadly similar to, but noticeably different from, the trends shown in the MAGIC paper. Sometimes the trend is less obvious (more smeared) in my reproduced figures. Sometimes there is an isolated ‘island’ of cells in the lowly expressed region (near the origin). The original data before MAGIC also do not look exactly like the gene-gene scatterplots in your paper.

I've also tried t = 9 and n_pca_components=1000, because you mentioned that the paper used the PCs that explain 70% of the variance. Setting t = 9 seems to help, but n_pca_components=1000 makes my figures differ even more from those in your paper.

Is there anything I can do to reproduce the figures exactly as they appear in your paper? We want the exact figures before and after MAGIC processing.

Thanks!
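The preprocessing described in this report (library-size normalization followed by log10(x+1)) and the rule of keeping the PCs that explain 70% of the variance can be sketched with pandas and scikit-learn. This is an illustration under stated assumptions, not the authors' pipeline. The normalization target (the median library size) is an assumption, because the report does not state one. The helper names are hypothetical. The sketch also uses scikit-learn's exact PCA rather than the randomized PCA that the settings above enable (random_pca=True).

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def library_size_normalize(counts, scale=None):
    """Rescale each cell (row) so every cell has the same total count.

    counts: cells x genes DataFrame of raw counts. The target total
    defaults to the median library size (an assumption).
    """
    library_size = counts.sum(axis=1)
    if (library_size <= 0).any():
        raise ValueError("remove cells with zero total counts first")
    if scale is None:
        scale = float(np.median(library_size))
    return counts.div(library_size, axis=0) * scale


def log10_transform(data, pseudocount=1.0):
    """log10(x + pseudocount), the transform described in the report."""
    return np.log10(data + pseudocount)


def n_components_for_variance(X, target=0.70):
    """Smallest number of PCs whose cumulative explained variance >= target."""
    cumulative = np.cumsum(PCA().fit(X).explained_variance_ratio_)
    # First index at which the cumulative ratio reaches the target
    idx = int(np.searchsorted(cumulative, target))
    return min(idx + 1, len(cumulative))
```

For a large matrix, fitting a full PCA is expensive. Passing a cap such as `PCA(n_components=200)` is one way to limit the cost, at the price of measuring the ratio only among the retained components.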

correcting for batch effects

Hi,
is there a way to filter out batch effects before or during imputation? For example, if one component explains a substantial part of the variance and captures a batch effect, how can I exclude it from the imputation?

Thanks,
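The following is a generic linear-algebra sketch, not a MAGIC feature. If one principal component is judged to capture a batch effect, it can be projected out of the data before they are passed to MAGIC. Deciding that a component is purely technical is a judgment call, and removing it can also remove biological signal. `remove_principal_component` is a hypothetical helper name.

```python
import numpy as np


def remove_principal_component(X, component):
    """Project one principal component (0-based index) out of X.

    Returns an array with the same shape as X; the column means are kept.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of vt are the principal directions of the centred data
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    v = vt[component]
    # Subtract each cell's projection onto that direction
    return Xc - np.outer(Xc @ v, v) + mean
```

The other components are unaffected, because the principal directions are orthogonal to each other.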

remove "MAGIC " from output

Currently, writing scdata.magic.data to a text file results in "MAGIC " being prefixed to every gene name, which makes the file incompatible with downstream analyses that require standard gene names. Without a proper way to export the data, the package is hard to use beyond running the examples.
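Until there is a dedicated export option, the prefix can be stripped with pandas before writing the file. The prefix string "MAGIC " comes from the report. `strip_column_prefix` is a hypothetical helper, and it assumes that scdata.magic.data is a pandas DataFrame with genes as columns.

```python
import pandas as pd


def strip_column_prefix(df, prefix="MAGIC "):
    """Return a copy of df with `prefix` removed from the start of column names."""
    return df.rename(
        columns=lambda c: c[len(prefix):]
        if isinstance(c, str) and c.startswith(prefix)
        else c
    )
```

For example, `strip_column_prefix(scdata.magic.data).to_csv("magic_imputed.csv")` would write standard gene names. The output file name is illustrative.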

Discrepancy between the MAGIC paper and Matlab code

Hello Pooja:

Could you possibly answer the following question? In the paper, a Gaussian kernel is used, and it is recommended to set ka = 1/3 * k, apparently so that the corresponding sigma gives substantial weight to 1/3 of the nearest neighbors. Take a fixed pair of points, i and j. For simplicity, assume that after the kNN search, i and j are mutual neighbors with Euclidean distances Dij = Dji = D and sigma_ij = sigma_ji = sigma.

After these lines

disp 'Computing distances'
[idx, dist] = knnsearch(data_pc, data_pc, 'k', k);

disp 'Adapting sigma'
dist = bsxfun(@rdivide, dist, dist(:,ka));

the adjusted distance is D/sigma. Note that D is not the squared Euclidean distance, because knnsearch() does take the square root.

After this line:

disp 'Symmetrize distances'
W = W + W';

the (i,j) and (j,i) elements in W are equal to 2D/sigma. Then you compute the kernel here:

if epsilon > 0
    disp 'Computing kernel'
    [i,j,s] = find(W);
    i = [i; (1:N)'];
    j = [j; (1:N)'];
    s = [s./(epsilon^2); zeros(N,1)];
    s = exp(-s);
    W = sparse(i,j,s);
end

and the corresponding elements are set to exp(-2D/sigma). That is apparently not a Gaussian kernel, because D is never squared (epsilon is squared instead, but it is 1 by default). According to the paper, we should end up with exp{-(D^2) / (2 * sigma^2)}. Could you please clarify the discrepancy?

Regards,
Nik
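The arithmetic in this question can be checked numerically. The sketch reproduces the two expressions under the issue's own simplifying assumptions: mutual neighbors, a shared sigma, and both directed entries present before symmetrization. It does not establish which behavior any released version of MAGIC has. Both function names are hypothetical.

```python
import math


def paper_gaussian(d, sigma):
    """Affinity quoted from the paper: exp(-d^2 / (2 sigma^2))."""
    return math.exp(-(d ** 2) / (2 * sigma ** 2))


def matlab_as_described(d, sigma, epsilon=1.0):
    """Affinity implied by the snippet as the issue reads it: d / sigma from
    the adaptive step, doubled by W = W + W', divided by epsilon^2, then
    passed through exp(-s)."""
    return math.exp(-(2 * d / sigma) / epsilon ** 2)


for ratio in (0.1, 0.5, 1.0, 2.0):
    print(ratio, paper_gaussian(ratio, 1.0), matlab_as_described(ratio, 1.0))
```

At d = sigma, the paper's form gives exp(-0.5), about 0.61, while the described Matlab form gives exp(-2), about 0.14. The second form also decays linearly rather than quadratically in d inside the exponent, which is the discrepancy the question points out.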

Test data

Hi, is there any way you could provide a small test data set?

Log-transformation before MAGIC

I noticed that log-transforming the data before MAGIC shrinks the variance explained by the principal components.

Using

scdata.plot_pca_variance_explained(n_components=n)

was useful for choosing the number-of-PCs parameter in MAGIC. However, with log-transformed data it is not clear which value to choose, since the cumulative explained variance of the first 100 components usually reaches only 10-25% of the total variance.

I am aware that I shouldn't use rescaling (i.e., I should set it to 0) for log-transformed data. However, it seems that log-transformed data can be used with MAGIC. Is that correct, and how should I choose the cutoff for the number of principal components?

I have attached a related question from Stack Exchange.

mmread dependency issue

Hello,
Which mmread is the Matlab code trying to use and how can I get it?
Thanks

tutorial link not working

It seems that the tutorial link returns 404 Not Found:
http://nbviewer.jupyter.org/github/pkathail/magic/blob/magic_develop/notebooks/Magic_single_cell_RNAseq.ipynb

ModuleNotFoundError: No module named '_tkinter'

Hello,
May I get help solving this?

MacBook-Pro:magic $ magic_gui.py
Traceback (most recent call last):
  File "/usr/local/bin/magic_gui.py", line 5, in <module>
    from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg, NavigationToolbar2TkAgg
  File "/usr/local/lib/python3.6/site-packages/matplotlib/backends/backend_tkagg.py", line 6, in <module>
    from six.moves import tkinter as Tk
  File "/usr/local/lib/python3.6/site-packages/six.py", line 92, in __get__
    result = self._resolve()
  File "/usr/local/lib/python3.6/site-packages/six.py", line 115, in _resolve
    return _import_module(self.mod)
  File "/usr/local/lib/python3.6/site-packages/six.py", line 82, in _import_module
    __import__(name)
  File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/tkinter/__init__.py", line 36, in <module>
    import _tkinter # If this fails your Python may not be configured for Tk
ModuleNotFoundError: No module named '_tkinter'

Thanks, A

install.py doesn't correctly request pytables

When running the packaged setup.py script ("pip install ." on a fresh Ubuntu 16.04 LTS image with Miniconda installed), the script completes successfully, but MAGIC fails to run with the following error:

  File "/home/ubuntu/miniconda3/bin/MAGIC.py", line 8, in <module>
    import magic
  File "/home/ubuntu/miniconda3/lib/python3.6/site-packages/magic/__init__.py", line 2, in <module>
    from . import mg
  File "/home/ubuntu/miniconda3/lib/python3.6/site-packages/magic/mg.py", line 12, in <module>
    import tables
ModuleNotFoundError: No module named 'tables'

This is rectified by installing pytables separately.

This may also underlie the discrepancy referenced in issue #39 (that user installs pytables when explicitly fetching dependencies, while everything else gets installed as needed).

Sklearn dependency incorrect

It should be 'scikit-learn' in the setup.py file, not 'sklearn'. The 'sklearn' package (https://pypi.python.org/pypi/sklearn/0.0) is a malformed placeholder that seems simply to depend on scikit-learn.
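Both reports come down to the dependency list in setup.py. The sketch below shows a list with both fixes applied: `tables` (the PyPI name under which PyTables is published, and the module mg.py imports) and `scikit-learn` in place of `sklearn`. The other entries are copied from the requirement output shown elsewhere in this thread. `INSTALL_REQUIRES` is an illustrative name, passed to `setuptools.setup(install_requires=...)` in a real setup.py.

```python
# Hypothetical dependency list for setup.py, e.g.
# setuptools.setup(..., install_requires=INSTALL_REQUIRES)
INSTALL_REQUIRES = [
    "numpy>=1.10.0",
    "pandas>=0.18.0",
    "scipy>=0.14.0",
    "matplotlib",
    "seaborn",
    "scikit-learn",  # the real package; "sklearn" is only a placeholder name
    "networkx",
    "fcsparser",
    "statsmodels",
    "tables",  # PyTables on PyPI; imported as `import tables`
]
```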

How to install MAGIC

Your instructions for installation do not work for me:

git clone git://github.com/pkathail/magic.git
cd magic
sudo -H pip3 install .

However, this does work (and it doesn't require sudo):

# First, install miniconda https://conda.io/miniconda.html

# Next, install dependencies
conda install numpy scipy statsmodels scikit-learn pytables seaborn networkx

# Finally, install MAGIC
git clone git://github.com/pkathail/magic.git
cd magic
python setup.py install

Running the python implementation in R

Hello,

Would it be possible to write a tutorial or script showing how to use this python implementation in R?

Thanks,
Assaf

divide by zero error if matrix contains any genes with zero total counts

These genes should be filtered out automatically rather than causing an error.
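Until such filtering is automatic, the genes can be removed before running MAGIC. The sketch below assumes a cells x genes pandas DataFrame of counts. `drop_empty_genes` is a hypothetical helper name.

```python
import pandas as pd


def drop_empty_genes(counts):
    """Remove genes (columns) whose total count across all cells is zero.

    Such genes carry no information, and any per-gene normalization would
    divide by their zero total.
    """
    totals = counts.sum(axis=0)  # total count per gene
    return counts.loc[:, totals > 0]
```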

unable to import gene expression matrix in csv format

Hi, I have an scRNA-seq gene expression matrix in CSV format (with rows as genes and columns as cells), but I couldn't compute data statistics or perform any analysis in the MAGIC Python GUI. This is the error I got:

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/magic/mg.py:465: RuntimeWarning: divide by zero encountered in log10
  rowsum = np.log10(self.data.sum(axis=1))
-inf
7.51657024826
Exception in Tkinter callback
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/tkinter/__init__.py", line 1549, in __call__
    return self.func(*args)
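The warning shows the log10 of a row sum evaluating to -inf, so at least one row sums to zero. This matrix has genes as rows, and the command-line tool elsewhere in this thread accepts `--cell-axis columns`, which suggests that rows are treated as cells by default. One plausible cause, stated as an assumption, is therefore that every all-zero gene is being read as an empty cell. The sketch transposes a genes x cells CSV into cells x genes and drops cells with no counts. `load_as_cells_by_genes` is a hypothetical helper name.

```python
import pandas as pd


def load_as_cells_by_genes(path_or_buffer):
    """Read a genes x cells CSV (gene names in the first column) and return
    a cells x genes DataFrame without zero-count cells."""
    genes_by_cells = pd.read_csv(path_or_buffer, index_col=0)
    cells = genes_by_cells.T  # rows become cells, columns become genes
    # Cells with zero total counts would give log10(0) = -inf
    return cells.loc[cells.sum(axis=1) > 0]
```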

R package of magic

Hello,

I would like to know whether the R version of MAGIC is still under construction, since I ran into a couple of errors.

First, the R function lacks @import or @importFrom statements (and therefore the corresponding NAMESPACE entries), so one has to load the packages listed under Imports in the DESCRIPTION file manually.
Related to this, functions defined for sparseMatrix objects from Matrix are not called properly, e.g. the transpose function t(). This can be avoided by specifying the namespace explicitly, e.g. Matrix::t(). #60
Second, while running the function line by line, I found that the Markov normalization ( W <- W / rowSums(W) ) creates a large vector that my local desktop R cannot handle (memory limits), although it runs on our number crunchers. The subsequent conversion as.matrix(W) also creates a 2.4 GB matrix. I was testing MAGIC on a medium-sized matrix with 18000 genes measured across 58 cells.
Lastly, the diffusion step ( W_t <- W %^% t ) throws an error (Error in W %^% t : could not find function "%^%"). I have never seen this operator in R before.

Kind regards
Beate

meaning of parameters

Hi,
I'm trying to understand the meaning of the parameters.
While I understand n_pca_components and t, I don't see which parameter corresponds to ka in the paper: is it knn or knn_autotune? Also, what is epsilon?

Thanks!
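To make the roles of k, ka, and t concrete, here is a simplified dense sketch of MAGIC-style diffusion. It follows the steps in the Matlab snippet quoted in an earlier issue (kNN search, adaptive bandwidth from the ka-th neighbor, symmetrization, Markov normalization) and uses the Gaussian form exp(-D^2 / (2 sigma^2)) quoted from the paper. It is not the magic-impute implementation: there is no PCA step, epsilon is omitted, it builds dense n x n matrices so it only suits small inputs, and `magic_sketch` is a hypothetical name.

```python
import numpy as np


def magic_sketch(X, k=10, ka=3, t=3):
    """Illustrative dense MAGIC-style diffusion (not the library's code).

    X  : (n_cells, n_features) array
    k  : number of nearest neighbours kept per cell
    ka : neighbour whose distance sets each cell's adaptive bandwidth
    t  : number of diffusion steps (power of the Markov matrix)
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= ka <= k < n:
        raise ValueError("need 1 <= ka <= k < n_cells")
    # Dense pairwise Euclidean distances
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Neighbours sorted by distance; column 0 is normally the cell itself
    order = np.argsort(D, axis=1)
    # Adaptive bandwidth: distance to the ka-th neighbour (guarded against 0)
    sigma = np.maximum(D[np.arange(n), order[:, ka]], 1e-12)
    # Gaussian affinities restricted to each cell's k nearest neighbours
    rows = np.repeat(np.arange(n), k + 1)
    cols = order[:, : k + 1].ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-D[rows, cols] ** 2 / (2 * sigma[rows] ** 2))
    # Symmetrize, then row-normalize into a Markov (row-stochastic) matrix
    W = W + W.T
    M = W / W.sum(axis=1, keepdims=True)
    # Diffuse: apply the t-step transition matrix to the data
    return np.linalg.matrix_power(M, t) @ X
```

Because each row of the Markov matrix sums to 1, each imputed value is a weighted average of the original values. A larger t spreads that average over more distant cells, while ka controls how quickly affinities fall off around each cell.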

How to reproduce Figure 7A of original paper to estimate optimal values for t and ka

I am trying to understand how to set t and ka, since I noticed that changing them severely affects how the values in my data are imputed.
Does MAGIC have a method to reproduce a plot similar to Figure 7A?

Figure 7: Finding the optimal diffusion time (t) using intrinsic dimensionality estimation. A) Graph shows intrinsic dimensionality (as measured by correlation dimension) computed on EMT data for different amounts of diffusion time (t) for three values of adaptive kernel (ka = 4, 10, 30). The peak values suggest optimal diffusion times that restore maximal dimensionality (information) to the data.

Thanks,
Francesco
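The caption says dimensionality was measured by correlation dimension. A common way to estimate that quantity is the Grassberger-Procaccia approach: C(r) is the fraction of point pairs closer than r, and the dimension is the slope of log C(r) against log r over small radii. The sketch below is that generic estimator. It is not necessarily the exact procedure behind Figure 7A. The percentile range used to pick the radii is an illustrative assumption, and `correlation_dimension` is a hypothetical name.

```python
import numpy as np


def correlation_dimension(X, low_pct=1.0, high_pct=20.0, n_radii=10):
    """Estimate correlation dimension as the slope of log C(r) vs log r.

    C(r) is the fraction of point pairs closer than r. Radii are spaced
    geometrically between two percentiles of the pairwise distances.
    """
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    d = D[np.triu_indices(len(X), k=1)]  # each pair counted once
    d = d[d > 0]  # ignore duplicate points
    radii = np.geomspace(np.percentile(d, low_pct),
                         np.percentile(d, high_pct), n_radii)
    C = np.array([(d < r).mean() for r in radii])
    slope, _intercept = np.polyfit(np.log(radii), np.log(C), 1)
    return float(slope)
```

To produce a curve like Figure 7A, one could evaluate this estimate on the imputed data for a range of t values (and several ka values) and look for the peak. The dense distance matrix limits the sketch to a few thousand cells, so subsampling would be needed for larger datasets.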

404 on ipython nb in README

As of February 27, 2017, the link to the EMT IPython notebook appears to be dead.

mg.py - TypeError: Can't convert 'int' object to str implicitly

I found the following bug in mg.py on line 613 when using the python package:

Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/tkinter/__init__.py", line 1549, in __call__
    return self.func(*args)
  File "/usr/local/bin/magic_gui.py", line 624, in _runTSNE
    n_iter=self.iterVar.get(), theta=self.angleVar.get())
  File "/usr/local/lib/python3.5/site-packages/magic/mg.py", line 613, in run_tsne
    index=self.data.index, columns=[self._data_prefix + 'tSNE' + i for i in range(1, 3)])
  File "/usr/local/lib/python3.5/site-packages/magic/mg.py", line 613, in <listcomp>
    index=self.data.index, columns=[self._data_prefix + 'tSNE' + i for i in range(1, 3)])
TypeError: Can't convert 'int' object to str implicitly

This can be fixed by converting i to string:
self.tsne = pd.DataFrame(tsne.fit_transform(data),index=self.data.index, columns=[self._data_prefix + 'tSNE' + str(i) for i in range(1, 3)])

Error when trying to run magic

I'm having trouble getting MAGIC.py to run. I tried both the GUI and the command line. I'm using macOS 10.12.5. Python stops responding, and I get this error message:

(python3) SLE-BTR-C02TN2K4HF1R:magic kevin$ MAGIC.py -d /Users/kevin/Desktop/test.csv -o /Users/kevin/Desktop/ csv
^CTraceback (most recent call last):
  File "/usr/local/bin/MAGIC.py", line 112, in <module>
    main(sys.argv[1:])
  File "/usr/local/bin/MAGIC.py", line 95, in main
    scdata.filter_scseq_data(filter_cell_min=args.mols_per_cell_min, filter_cell_max=args.mols_per_cell_max)
  File "/usr/local/lib/python3.6/site-packages/magic/mg.py", line 476, in filter_scseq_data
    self.data = self.data.ix[self.data.index[to_keep], :].astype(np.float32)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexing.py", line 118, in __getitem__
    return self._getitem_tuple(key)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexing.py", line 856, in _getitem_tuple
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1059, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1118, in _getitem_iterable
    keyarr)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2916, in _reindex_non_unique
    indexer, missing = self.get_indexer_non_unique(target)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2708, in get_indexer_non_unique
    indexer, missing = self._engine.get_indexer_non_unique(tgt_values)
  File "pandas/_libs/index.pyx", line 356, in pandas._libs.index.IndexEngine.get_indexer_non_unique (pandas/_libs/index.c:8161)
  File "/usr/local/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 1136, in resize
    a = concatenate((a,)*n_copies)
KeyboardInterrupt

Introducing artificial dropout issue

For Fig. 8 of the MAGIC paper, it's stated that random values sampled from an exponential distribution were subtracted from the expression values, such that 0%, 60%, 80%, and 90% of the values are 0 after down-sampling. For Fig. 1, it's stated that 'artificial dropout' was introduced by randomly setting 80% of the values to 0.
Which method better mimics 'dropout' in real RNA-seq data? How is the method in Fig. 8 implemented? Would it be appropriate to set a threshold and set every expression value below it to 0?
Thanks!
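Both procedures described in the question can be sketched in NumPy. This is only an illustration of the two descriptions, not the code used for the paper. In particular, the Fig. 8 description does not say how the exponential scale was chosen, so the bisection search for a scale that reaches the target zero fraction is an assumption. Both function names are hypothetical.

```python
import numpy as np


def random_dropout(X, fraction=0.8, seed=None):
    """Fig. 1 style: set a random `fraction` of all entries to zero."""
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)  # copy, so the input is left untouched
    X[rng.random(X.shape) < fraction] = 0.0
    return X


def exponential_dropout(X, target_zero_fraction=0.8, seed=None, n_iter=60):
    """Fig. 8 style: subtract exponential noise and clip at zero.

    The exponential scale is found by bisection so that at least
    `target_zero_fraction` of entries become zero (an assumption).
    """
    if not 0.0 <= target_zero_fraction < 1.0:
        raise ValueError("target_zero_fraction must be in [0, 1)")
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    noise = rng.exponential(scale=1.0, size=X.shape)  # unit-scale draws

    def zero_fraction(scale):
        return np.mean(X - scale * noise <= 0.0)

    lo, hi = 0.0, 1.0
    while zero_fraction(hi) < target_zero_fraction:  # grow until reachable
        hi *= 2.0
    for _ in range(n_iter):  # shrink the bracket around the target
        mid = (lo + hi) / 2.0
        if zero_fraction(mid) < target_zero_fraction:
            lo = mid
        else:
            hi = mid
    return np.clip(X - hi * noise, 0.0, None)
```

One difference is visible in the code. Random dropout zeroes entries regardless of their value. Exponential subtraction zeroes small values far more often than large ones while also shrinking every nonzero value, so the two procedures are not interchangeable.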

optional deps for Python package

Hi and thanks for your awesome work!

I'm including MAGIC into a larger RNA-Seq pipeline and I'm wondering whether any of the Python MAGIC dependencies could be made optional, e.g.:

I/O stuff (fcsparser, tables)
plotting stuff (e.g. seaborn)
statistics stuff (statsmodels)

In particular, the I/O deps are used only in mg.py, for parsers that could perhaps be made optional. For instance, the FCS parsing method has, on line 328:

metadata_channels=['Time', 'Event_length', 'DNA1', 'DNA2', 'Cisplatin', 'beadDist', 'bead1']):

which is probably irrelevant for most users, yet makes fcsparser a hard dependency for MAGIC.

Note that setuptools already has a mechanism for optional deps, so not much code is needed:

http://setuptools.readthedocs.io/en/latest/setuptools.html#declaring-extras-optional-features-with-their-own-dependencies

What do you think? I can submit a PR if you want; it's a three-liner.
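The pattern proposed here usually has two halves: an `extras_require` declaration in setup.py and a guarded import in the module that needs the dependency. The sketch shows both with fcsparser. The extra names ("fcs", "hdf5", "plots") and the helper `require_fcsparser` are hypothetical. With a declaration like this, the extra would be installed with a command of the form `pip install "<package>[fcs]"`.

```python
# Guarded import: the module still loads when fcsparser is absent.
try:
    import fcsparser  # optional: only needed to read FCS files
except ImportError:
    fcsparser = None


def require_fcsparser():
    """Return the fcsparser module, or raise a helpful error if missing."""
    if fcsparser is None:
        raise ImportError(
            "Reading FCS files needs the optional 'fcsparser' package; "
            "install it with: pip install fcsparser")
    return fcsparser


# Matching declaration for setup.py, e.g.
# setuptools.setup(..., extras_require=EXTRAS_REQUIRE)
EXTRAS_REQUIRE = {
    "fcs": ["fcsparser"],
    "hdf5": ["tables"],
    "plots": ["seaborn"],
}
```

An FCS loader would then call `require_fcsparser()` at the top of the function, so users who never read FCS files never need the package.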

request: port to bioconda

conda is an excellent package manager for Python 3 and seems to be gaining in popularity. Could you create a conda recipe for MAGIC?

R Magic error in "Computing Kernel"

Hi Magic developers!

I keep running into a problem with R MAGIC during the "Computing kernel" step.
It gives this error (please see the attached screenshot):

Error in Q$i : $ operator is invalid for atomic vectors

What am I doing wrong? Any help would be greatly appreciated. Thank you so much!

interactive 3d plots compatibility

Dear Magic developer,

Any advice on how to make the 3D plots interactive (rotatable) in an IPython notebook? Or is that just impossible with Jupyter Notebook?

Thanks a ton,
Wen

error on import magic: No module named 'Phenograph'

Hi,
I installed MAGIC on a MacBook Pro (master branch). The installation was successful, but I get an error when I import magic: ImportError: No module named 'phenograph'

It seems that some dependencies were not installed.

Thanks for your help,

log-transform data

Is the input data supposed to be log-transformed? It doesn't seem to be in the examples.

Error when trying different -k and -l parameters

Hi,
I am very excited to use MAGIC on my data, but I run into issues when I try different parameters. I installed the software without problems, and running in the default mode gives no errors. However, when I change -k:

MAGIC.py csv -d input.csv -o output.csv --cell-axis columns -k 15

doing PCA
Computing distances
Traceback (most recent call last):
  File "/usr/local/bin/MAGIC.py", line 112, in <module>
    main(sys.argv[1:])
  File "/usr/local/bin/MAGIC.py", line 104, in main
    k=args.k, ka=args.ka, epsilon=args.epsilon, rescale_percent=args.rescale)
  File "/usr/local/lib/python3.5/dist-packages/magic/mg.py", line 1140, in run_magic
    k=k, ka=ka, epsilon=epsilon, rescale=rescale_percent)
  File "/usr/local/lib/python3.5/dist-packages/magic/MAGIC_core.py", line 21, in magic
    distance_metric='euclidean', ka=ka)
  File "/usr/local/lib/python3.5/dist-packages/magic/MAGIC_core.py", line 95, in compute_markov
    nbrs = NearestNeighbors(n_neighbors=k, metric=distance_metric).fit(data)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/neighbors/base.py", line 803, in fit
    return self._fit(X)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/neighbors/base.py", line 229, in _fit
    self.n_neighbors < self._fit_X.shape[0] // 2) and
TypeError: unorderable types: str() < int()

This happens even when I set -k to its default value (30). I also tried changing -ka (to 5), since it is recommended to be about 3 times lower than -k. When I don't specify -k, it runs fine.

Another issue arises when I set the -l parameter (log transformation), e.g. -l 1:

MAGIC.py csv -d input.csv -o output.csv --cell-axis columns -l 1

Traceback (most recent call last):
  File "/usr/local/bin/MAGIC.py", line 112, in <module>
    main(sys.argv[1:])
  File "/usr/local/bin/MAGIC.py", line 101, in main
    scdata.log_transform_scseq_data(pseudocount=args.log_transform)
  File "/usr/local/lib/python3.5/dist-packages/magic/mg.py", line 514, in log_transform_scseq_data
    self.data = np.log(np.add(self.data, pseudocount))
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')

I would really appreciate any advice on how to overcome these issues.
Thanks a lot.
Gustavo
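The `dtype('<U32')` in the -l traceback means NumPy was asked to add string values, which is consistent with `-l 1` reaching `log_transform_scseq_data` as the string `"1"` rather than a number. This is an inference from the error message, not something confirmed against `MAGIC.py`. A defensive log transform could cast both operands to float first; `log_transform` here is a hypothetical helper, not part of MAGIC:

```python
import numpy as np


def log_transform(data, pseudocount):
    """Hypothetical helper: log(data + pseudocount) with explicit float casts.

    The casts guard against values that arrive as strings, e.g. a pseudocount
    parsed from the command line without type=float.
    """
    values = np.asarray(data, dtype=float)
    return np.log(values + float(pseudocount))


counts = np.array([[0, 1], [3, 7]])
print(log_transform(counts, "1"))  # a string pseudocount is converted before adding
```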

'~/Documents/Sophomore/Lab/wishbone/data/sdata_nn_TGFb_day_8_10.csv'

Hi,
I was wondering where I can find the data used in the Jupyter notebook in the notebooks folder (https://github.com/pkathail/magic/blob/63fa1feaa47c02a753de2df0432cd7b1cdb46d0e/notebooks/Magic_single_cell_RNAseq.ipynb).
Thanks,
Yun

R Implementation?

Is there any chance of an R implementation in the future? It could work nicely with other R-based processing pipelines for scRNA-seq data (e.g. scater, monocle).

Following MAGIC Notebook Tutorial

Hello,
I am fairly new to scRNA-seq analysis, but I think your tool has great potential for resolving the overwhelming number of zero values in our lab's dataset.

Could you please provide more description of the Data preprocessing section in the notebook tutorial? I am not sure what the parameters (CELL_MIN, CELL_MAX, GENE_NONZERO, GENE_MOLECULES) signify, or how to set them appropriately before imputing the zero values in the dataset.

I appreciate your help,
Mustafa
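The notebook's preprocessing code is not reproduced here, so the following reading of the four constants is an assumption based only on their names: CELL_MIN and CELL_MAX as lower and upper bounds on a cell's total molecule count (library size), GENE_NONZERO as the minimum number of cells in which a gene must be detected, and GENE_MOLECULES as the minimum total molecule count for a gene. Under that reading, the filters could be sketched in pandas with cells as rows; `filter_counts` is a hypothetical helper:

```python
import pandas as pd


def filter_counts(counts: pd.DataFrame, cell_min: float, cell_max: float,
                  gene_nonzero: int, gene_molecules: float) -> pd.DataFrame:
    """Filter a cells x genes count matrix (assumed meaning of CELL_MIN,
    CELL_MAX, GENE_NONZERO and GENE_MOLECULES; hypothetical helper)."""
    # Keep cells whose library size (total molecules) lies in [cell_min, cell_max]
    library_size = counts.sum(axis=1)
    counts = counts.loc[(library_size >= cell_min) & (library_size <= cell_max)]
    # Keep genes detected (nonzero) in at least gene_nonzero of the remaining cells
    detected_in = (counts > 0).sum(axis=0)
    # ... and with at least gene_molecules molecules in total
    total_molecules = counts.sum(axis=0)
    keep = (detected_in >= gene_nonzero) & (total_molecules >= gene_molecules)
    return counts.loc[:, keep]


counts = pd.DataFrame(
    [[0, 5, 1], [0, 2, 0], [1, 40, 3], [0, 3, 2]],
    index=["c1", "c2", "c3", "c4"],
    columns=["g1", "g2", "g3"],
)
filtered = filter_counts(counts, cell_min=3, cell_max=20, gene_nonzero=2, gene_molecules=3)
print(filtered)  # cells c1 and c4; genes g2 and g3
```

Filtering like this removes low-quality cells and rarely detected genes; it does not impute anything by itself.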

Install fails on macOS

Hi, I've had some issues installing the package on macOS Sierra (10.12.3). Below is the error message after following the install instructions:

(py3)[~/Documents/GitHub/magic]$ pip3 install .                                                                            *[develop]
Processing /Users/timstuart/Documents/GitHub/magic
    Complete output from command python setup.py egg_info:
    Collecting git+https://github.com/jacoblevine/phenograph.git
      Cloning https://github.com/jacoblevine/phenograph.git to /private/var/folders/9s/hygtzxkd46j8g85t08_9wtlh0000gn/T/pip-79gv61do-build
      Requirement already satisfied (use --upgrade to upgrade): PhenoGraph==1.5.2 from git+https://github.com/jacoblevine/phenograph.git in /Users/timstuart/.virtualenvs/py3/lib/python3.5/site-packages
    Requirement already satisfied: setuptools>=18.0.1 in /Users/timstuart/.virtualenvs/py3/lib/python3.5/site-packages (from PhenoGraph==1.5.2)
    Requirement already satisfied: numpy>=1.9.2 in /Users/timstuart/.virtualenvs/py3/lib/python3.5/site-packages (from PhenoGraph==1.5.2)
    Requirement already satisfied: scipy>=0.16.0 in /Users/timstuart/.virtualenvs/py3/lib/python3.5/site-packages (from PhenoGraph==1.5.2)
    Requirement already satisfied: scikit_learn>=0.17 in /Users/timstuart/.virtualenvs/py3/lib/python3.5/site-packages (from PhenoGraph==1.5.2)
    Requirement already satisfied: psutil>4 in /Users/timstuart/.virtualenvs/py3/lib/python3.5/site-packages (from PhenoGraph==1.5.2)
    Requirement already satisfied: six>=1.6.0 in /Users/timstuart/.virtualenvs/py3/lib/python3.5/site-packages (from setuptools>=18.0.1->PhenoGraph==1.5.2)
    Requirement already satisfied: appdirs>=1.4.0 in /Users/timstuart/.virtualenvs/py3/lib/python3.5/site-packages (from setuptools>=18.0.1->PhenoGraph==1.5.2)
    Requirement already satisfied: packaging>=16.8 in /Users/timstuart/.virtualenvs/py3/lib/python3.5/site-packages (from setuptools>=18.0.1->PhenoGraph==1.5.2)
    Requirement already satisfied: pyparsing in /Users/timstuart/.virtualenvs/py3/lib/python3.5/site-packages (from packaging>=16.8->setuptools>=18.0.1->PhenoGraph==1.5.2)
    running egg_info
    creating pip-egg-info/magic.egg-info
    writing pip-egg-info/magic.egg-info/PKG-INFO
    writing requirements to pip-egg-info/magic.egg-info/requires.txt
    writing dependency_links to pip-egg-info/magic.egg-info/dependency_links.txt
    writing top-level names to pip-egg-info/magic.egg-info/top_level.txt
    writing manifest file 'pip-egg-info/magic.egg-info/SOURCES.txt'
    reading manifest file 'pip-egg-info/magic.egg-info/SOURCES.txt'
    writing manifest file 'pip-egg-info/magic.egg-info/SOURCES.txt'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/9s/hygtzxkd46j8g85t08_9wtlh0000gn/T/pip-upwnp3_8-build/setup.py", line 52, in <module>
        shutil.copytree(setup_dir + '/data/', data_dir)
      File "/Users/timstuart/.virtualenvs/py3/lib/python3.5/shutil.py", line 303, in copytree
        names = os.listdir(src)
    FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/9s/hygtzxkd46j8g85t08_9wtlh0000gn/T/pip-upwnp3_8-build/data/'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/9s/hygtzxkd46j8g85t08_9wtlh0000gn/T/pip-upwnp3_8-build/

This seems to be caused by the setup script trying to access a nonexistent data/ directory. I commented out the last few lines of the setup script (https://github.com/pkathail/magic/blob/develop/setup.py#L49 onwards) and the install then ran successfully, although without the test data.
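Rather than commenting the lines out, the copy could be guarded so that a missing data/ directory is skipped. The names `setup_dir` and `data_dir` come from the traceback above; the function wrapping them is a hypothetical sketch, not the project's actual fix:

```python
import os
import shutil
import tempfile


def copy_test_data(setup_dir: str, data_dir: str) -> bool:
    """Copy setup_dir/data/ to data_dir if it exists; return whether it was copied."""
    src = os.path.join(setup_dir, "data")
    if not os.path.isdir(src):
        # e.g. a pip build directory that does not include data/
        print("warning: {} not found; skipping test data".format(src))
        return False
    # data_dir must not already exist (shutil.copytree's default behavior)
    shutil.copytree(src, data_dir)
    return True


# Usage: an empty directory has no data/, so the copy is skipped
with tempfile.TemporaryDirectory() as tmp:
    copied = copy_test_data(tmp, os.path.join(tmp, "installed_data"))
print(copied)  # False
```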

How can I export the imputed/modified matrix?

Hi @dvdijk,

After running MAGIC, how can I export the imputed matrix to a file (in Python)?
That is, how do I convert the SCData object into a regular data frame?

Thank you,
Ajit.
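If the imputed values are available as a pandas DataFrame (cells as rows, genes as columns), writing them out is a single `to_csv` call. Where the SCData object keeps that DataFrame is an assumption here: the traceback in the first issue shows `mg.py` storing its matrix in `self.data`, but which attribute holds the imputed values should be checked against the installed version. The example uses a small made-up DataFrame as a stand-in:

```python
import pandas as pd

# Stand-in for MAGIC's imputed output: cells as rows, genes as columns (made-up values)
imputed = pd.DataFrame(
    [[0.12, 1.30], [0.45, 0.98]],
    index=["cell_1", "cell_2"],
    columns=["MAGIC VIM", "MAGIC CDH1"],
)

# Write the cell and gene names along with the values
imputed.to_csv("magic_imputed.csv")

# Round trip: index_col=0 restores the cell names as the index
reloaded = pd.read_csv("magic_imputed.csv", index_col=0)
print(reloaded.equals(imputed))
```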

R package 'Error in t.default(W) : argument is not a matrix'

Hi MAGIC developers. I keep running into a problem at the "Symmetrize distances" step: an error pops up saying 'Error in t.default(W) : argument is not a matrix'.
Any help would be greatly appreciated!

Regards,
Wen

Gene name error in the tutorial

In the tutorial (http://nbviewer.jupyter.org/github/pkathail/magic/blob/develop/notebooks/Magic_single_cell_RNAseq.ipynb), it says:

2D scatter plot after MAGIC:
In [6]: fig, ax = scdata.magic.scatter_gene_expression(['VIM', 'CDH1'], color ='ZEB1')

However, after MAGIC the gene names are prefixed, e.g. "MAGIC VIM" and "MAGIC CDH1", so when I ran the above code it reported "no such genes". It would be great if you could correct this in the tutorial.
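Going by this report, the imputed genes carry a "MAGIC " prefix, so the tutorial call would need the prefixed names (presumably `scdata.magic.scatter_gene_expression(magic_genes, color=color_gene)`). The prefix string below is taken from the issue, not from MAGIC's source. The sketch builds the prefixed names and, alternatively, strips the prefix from a DataFrame's columns:

```python
import pandas as pd

PREFIX = "MAGIC "  # prefix reported in this issue for imputed gene names

genes = ["VIM", "CDH1"]
magic_genes = [PREFIX + g for g in genes]  # ['MAGIC VIM', 'MAGIC CDH1']
color_gene = PREFIX + "ZEB1"               # 'MAGIC ZEB1'

# Alternatively, remove the prefix from the columns of an imputed DataFrame
imputed = pd.DataFrame([[0.1, 0.2, 0.3]], columns=["MAGIC VIM", "MAGIC CDH1", "MAGIC ZEB1"])
imputed.columns = imputed.columns.str.replace("^" + PREFIX, "", regex=True)
print(list(imputed.columns))  # ['VIM', 'CDH1', 'ZEB1']
```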

Getting bad results using different normalization

Hi,

I am comparing MAGIC with a few other methods and get unsatisfactory results in all benchmarks, which made me wonder whether I am doing something wrong. I mask parts of the expression matrix, predict the masked entries using MAGIC, and then compare the predictions with the known values that were masked. I log-transformed the expression data (log2(1+x), so all values are non-negative), then normalized each column by its infinity norm (so the maximum value in each column is one). I used the default parameters (k = 30; ka = 10; npca = 20) and increased t from 6 up to 100 (6 is absolutely terrible; by 100 the results get more reasonable). Still, MAGIC's predictions have poor correlation and high relative error compared to the true values that were masked out in cross-validation. One of the datasets I tried this on is https://support.10xgenomics.com/single-cell-gene-expression/datasets/pbmc4k. Also, since my normalization preserves positivity, I rescaled to 99% (which improved the results partially, but not enough).
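The normalization and masking protocol described above can be sketched with NumPy so that every method is scored the same way. Two choices here are assumptions, not taken from the post: masked entries are set to zero (mimicking dropout), and only nonzero entries are eligible for masking. The imputation step itself is left as a hypothetical placeholder:

```python
import numpy as np


def normalize(counts: np.ndarray) -> np.ndarray:
    """log2(1 + x), then scale each gene (column) by its max, i.e. its infinity norm."""
    logged = np.log2(1.0 + counts)
    col_max = logged.max(axis=0)
    col_max[col_max == 0] = 1.0  # leave all-zero genes unchanged instead of dividing by 0
    return logged / col_max


def mask_entries(data: np.ndarray, fraction: float, rng: np.random.Generator):
    """Zero out a random fraction of the nonzero entries; return masked copy and mask."""
    mask = (data > 0) & (rng.random(data.shape) < fraction)
    masked = data.copy()
    masked[mask] = 0.0
    return masked, mask


def score(predicted: np.ndarray, truth: np.ndarray, mask: np.ndarray):
    """Pearson correlation and relative L2 error on the masked entries only."""
    p, t = predicted[mask], truth[mask]
    corr = np.corrcoef(p, t)[0, 1]
    rel_err = np.linalg.norm(p - t) / np.linalg.norm(t)
    return corr, rel_err


rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(200, 50)).astype(float)  # synthetic counts, cells x genes
truth = normalize(counts)
masked, mask = mask_entries(truth, 0.1, rng)
# predicted = run_imputation(masked)  # hypothetical: plug in MAGIC or another method
corr, rel_err = score(truth, truth, mask)  # sanity check with a perfect predictor
print(corr, rel_err)
```

Scoring only the masked entries keeps the unmasked values, which any method reproduces trivially, from inflating the metrics.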

Index out of range when changing ka,t,pca values

Hello,
I sometimes get this error, depending on which values I choose for ka, t and pca.
I guess that at some step certain combinations of values make a computation fail, which leads to an out-of-range index.

How could this be caught? If you could point me towards a solution, I'd be happy to implement it, since I'm already experimenting with those values.

  File "/home/proelli/programs/magic/src/magic/MAGIC.py", line 113, in <module>
    main(sys.argv[1:])
  File "/home/proelli/programs/magic/src/magic/MAGIC.py", line 105, in main
    k=args.k, ka=args.ka, epsilon=args.epsilon, rescale_percent=args.rescale)
  File "/home/proelli/programs/magic/src/magic/mg.py", line 1154, in run_magic
    k=k, ka=ka, epsilon=epsilon, rescale=rescale_percent)
  File "/home/proelli/programs/magic/src/magic/MAGIC_core.py", line 21, in magic
    distance_metric='euclidean', ka=ka)
  File "/home/proelli/programs/magic/src/magic/MAGIC_core.py", line 103, in compute_markov
    if lMaxTempIdxs == 0 or temp[lMaxTempIdxs] == 0:
IndexError: list index out of range
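One plausible, unverified explanation is that `compute_markov` looks up a per-cell neighbor index derived from `ka`, and certain parameter combinations (for example `ka` not smaller than `k`, or `k` close to the number of cells) make that index fall outside the list. Checking parameters up front would turn the deep `IndexError` into a readable message. `check_magic_params` is a hypothetical helper, and its constraints are assumptions rather than rules from MAGIC's documentation:

```python
def check_magic_params(n_cells: int, n_genes: int, k: int, ka: int, t: int, n_pca: int) -> None:
    """Raise ValueError with a readable message (assumed constraints, hypothetical helper)."""
    # Also catches values that arrived as strings from the command line
    for name, value in {"k": k, "ka": ka, "t": t, "n_pca": n_pca}.items():
        if not isinstance(value, int):
            raise ValueError("{} must be an int, got {}".format(name, type(value).__name__))
    problems = []
    if not 0 < k < n_cells:
        problems.append("k={} must satisfy 0 < k < n_cells ({})".format(k, n_cells))
    # Assumed: the ka-th neighbor must exist among the k computed neighbors
    if not 0 <= ka < k:
        problems.append("ka={} must satisfy 0 <= ka < k ({})".format(ka, k))
    if t < 1:
        problems.append("t={} must be at least 1".format(t))
    max_pca = min(n_cells, n_genes)
    if not 0 < n_pca <= max_pca:
        problems.append("n_pca={} must satisfy 0 < n_pca <= {}".format(n_pca, max_pca))
    if problems:
        raise ValueError("; ".join(problems))


check_magic_params(n_cells=500, n_genes=2000, k=30, ka=10, t=6, n_pca=20)  # passes
try:
    check_magic_params(n_cells=500, n_genes=2000, k=30, ka=40, t=6, n_pca=20)
except ValueError as err:
    print(err)
```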

Recommended normalization methods

Hello,

I've been trying out MAGIC on my lab's drop-seq data. I've found that the results vary between different normalization methods. I just wanted to check and see if you've tested different methods and have any recommendations.

The two methods I've tried are simple library-size normalization, and library-size normalization followed by a log transformation. Using log-transformed values seems to give nicer results (although I'm not sure how biologically valid it is). With the log-transformed data, expression of marker genes is boosted but stays localized to the cell clusters we had already identified. Without the log transformation, gene expression looks smeared across the entire dataset, with most cells expressing a little of most genes.

I just wanted to check whether this matches your experience and whether you have any recommendations. I also haven't tried adjusting any parameters yet, so perhaps changing k or the number of diffusion steps could help?
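The two preprocessing variants being compared can be written in a few lines. This is a generic sketch, not MAGIC's own preprocessing; rescaling to the median library size and using a pseudocount of 1 are assumptions:

```python
import numpy as np


def library_size_normalize(counts: np.ndarray) -> np.ndarray:
    """Rescale each cell (row) so its total count equals the median library size."""
    library_size = counts.sum(axis=1)
    target = np.median(library_size)
    # Avoid dividing by zero for empty cells (they stay all-zero)
    safe = np.where(library_size > 0, library_size, 1.0)
    return counts / safe[:, None] * target


counts = np.array([[1.0, 3.0], [2.0, 2.0], [10.0, 30.0]])  # 3 cells x 2 genes
normalized = library_size_normalize(counts)  # variant 1: rows now sum to the median, 4.0
logged = np.log1p(normalized)                # variant 2: additionally log(1 + x)
print(normalized)
print(logged)
```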

Thanks,
Brian
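On the diffusion-steps question: the idea behind MAGIC, as described in the Cell paper, is to build a Markov (transition) matrix M from a cell-cell affinity graph and impute by computing M^t X, so a larger t averages each cell over a wider neighborhood. The sketch below is deliberately simplified and is not MAGIC's implementation: it uses a fixed-bandwidth Gaussian kernel on the k nearest neighbors in the raw feature space, whereas MAGIC computes distances after PCA and uses an adaptive kernel controlled by ka. `sigma` is a parameter of this sketch only:

```python
import numpy as np


def markov_matrix(data: np.ndarray, k: int, sigma: float) -> np.ndarray:
    """Row-stochastic kNN Gaussian affinity matrix (simplified: fixed sigma)."""
    sq = np.sum(data ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * data @ data.T, 0.0)
    affinity = np.exp(-dist2 / sigma ** 2)
    # Keep each cell's k nearest neighbors plus itself; zero out the rest
    far = np.argsort(dist2, axis=1)[:, k + 1:]
    np.put_along_axis(affinity, far, 0.0, axis=1)
    affinity = (affinity + affinity.T) / 2.0  # symmetrize
    return affinity / affinity.sum(axis=1, keepdims=True)  # rows sum to 1


def diffuse(data: np.ndarray, markov: np.ndarray, t: int) -> np.ndarray:
    """Apply t diffusion steps: M^t X."""
    return np.linalg.matrix_power(markov, t) @ data


rng = np.random.default_rng(0)
cells = rng.normal(size=(60, 5))  # 60 synthetic cells x 5 features
M = markov_matrix(cells, k=5, sigma=1.0)
for t in (1, 3, 8):
    smoothed = diffuse(cells, M, t)
    print(t, smoothed.std(axis=0).round(3))  # per-feature spread across cells
```

Printing the per-feature spread for several values of t is a quick way to see how strongly each setting smooths the data before comparing biological results.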

Starting Magic: ImportError: No module named '_tkinter'

I'm on Ubuntu 17.04. MAGIC gave me the error above; installing tk fixed it:

sudo apt-get install python3-tk
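Where installing python3-tk is not an option (for example on a headless server), an alternative is to select matplotlib's non-interactive Agg backend before pyplot is imported, so Tk is never loaded. This assumes the `_tkinter` import comes from matplotlib's default backend, which the issue does not show. Figures then have to be saved to files instead of being shown in a window:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; call this before importing pyplot
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
fig.savefig("plot.png")  # write to a file instead of calling plt.show()
```

Setting the environment variable MPLBACKEND=Agg has the same effect without changing any code.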

Output csv files from MAGIC?

Hello, I am really interested in how MAGIC imputes gene expression to fill in the abundant dropout events in single-cell RNA-seq, and I think it would be very helpful in my case. I am using MAGIC in Python: I loaded a CSV file into MAGIC, and my aim is to get an output file, ideally also in CSV format, after applying MAGIC's normalization. I've tried to do this using both the command line and the user-friendly Python GUI (thanks a lot for developing this tool, by the way!). I noticed that the output can be saved as a pickle file, but is there a way to save it as CSV or another text format? Thank you.

save out to txt

Hello,

How do I save the MAGIC-processed results (cell-by-gene matrix) to a txt file so that I can import them into R?

Thank you.
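A tab-separated text file is easy to read in R. The sketch assumes the results are available as a pandas DataFrame (cells as rows); the values below are made up. In R, the file could then be loaded with `read.table("magic_imputed.txt", header = TRUE, sep = "\t", row.names = 1, check.names = FALSE)`, where `check.names = FALSE` keeps gene names containing spaces intact:

```python
import pandas as pd

# Stand-in for the imputed cells x genes matrix (made-up values)
imputed = pd.DataFrame(
    [[0.5, 1.25], [0.75, 0.0]],
    index=pd.Index(["cell_1", "cell_2"], name="cell"),
    columns=["VIM", "CDH1"],
)

# Tab-separated text; the first column holds the cell names
imputed.to_csv("magic_imputed.txt", sep="\t")
```

Many R single-cell packages expect genes as rows, in which case `imputed.T.to_csv("magic_imputed.txt", sep="\t")` writes the transposed matrix instead.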
