liulab-dfci / maestro Goto Github PK
View Code? Open in Web Editor NEWSingle-cell Transcriptome and Regulome Analysis Pipeline
License: GNU General Public License v3.0
Single-cell Transcriptome and Regulome Analysis Pipeline
License: GNU General Public License v3.0
When installing MAESTRO (774ce46
) on top of satijalab/seurat/release/4.0.0
:
** byte-compile and prepare package for lazy loading
Error: object ‘CreateGeneActivityMatrix’ is not exported by 'namespace:Seurat'
Execution halted
ERROR: lazy loading failed for package ‘MAESTRO’
The file XXX.PredictedTFTop10.txt has list of transcription factors. The tutorial mentions target genes should be in XXX.GIGGLE folder, but I couldn't find it. Could you point out which file I should look for? Thanks!
Yuping
Hi!
A very good tool.
I now have five scATAC samples.How do I merge multiple samples?
Hi @liulab-dfci, I'm interested in trying out MAESTRO, and the Dockerfile recipe works great. One thing I noticed is that the conda recipe is currently managed under a custom channel (see https://anaconda.org/liulab-dfci/maestro). I was wondering if it would be possible to submit the recipe to bioconda, which makes developing other recipes that depend on MAESTRO easier to manage.
Best,
Mike
The current (v1.0.2) HTML report only includes figures and tables without explanations. Please add more texts explaining how to interpret the results in the report.
https://github.com/liulab-dfci/MAESTRO/tree/master/MAESTRO/html
according to https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02136-7
"scDblFinder achieved comparable or better accuracy than top alternatives while being the fastest method"
Please rename it as "mouse.brain.ALLEN"
The following line for calculating RP scores assumes that peaks are named as "chr_start_end" in peak matrix.
MAESTRO/MAESTRO/scATAC_genescore.py
Line 97 in ffdfffe
However, the standard output "filtered_peak_bc_matrix.h5" from 10X cellranger-atac may use chr:start-end", according to
https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/advanced/h5_matrices
Suggest replacing the separator in the peak column to a standard character (either "_" or "-") in MAESTRO R script after loading the H5 file.
Hi Chenfei,
Thank you for making MAESTRO available! I encountered an error while processing some 10x scRNA-seq data from mice.
It seems that MAESTRO couldn't find PredictedTFTop10.txt
. I have RABIT installed and specified the RABIT index in the yaml file.
[Mon Dec 9 15:03:42 2019]
rule scrna_report:
input: Result/Analysis, Result/QC/M1R1_MAESTRO_scRNA_read_distr.png, Result/QC/M1R1_MAESTRO_scRNA_read_quality.png, Result/QC/M1R1_MAESTRO_scRNA_NVC.png, Result/QC/M1R1_MAESTRO_scRNA_GCcontent.png, Result/QC/M1R1_MAESTRO_scRNA_genebody_cov.png, Result/QC/M1R1_MAESTRO_scRNA_cell_filtering.png
output: Result/M1R1_MAESTRO_scRNA_report.html
jobid: 1
Traceback (most recent call last):
File "/omg/home/username/miniconda3/envs/MAESTRO/lib/python3.7/site-packages/MAESTRO/scRNA_report.py", line 36, in <module>
for line in open(cluster_regulator_file,"r").readlines():
FileNotFoundError: [Errno 2] No such file or directory: 'Result/Analysis/M1R1_MAESTRO.PredictedTFTop10.txt'
[Mon Dec 9 15:03:42 2019]
Error in rule scrna_report:
jobid: 1
output: Result/M1R1_MAESTRO_scRNA_report.html
shell:
python /omg/home/username/miniconda3/envs/MAESTRO/lib/python3.7/site-packages/MAESTRO/scRNA_report.py M1R1_MAESTRO /home/username/Projects/maestro_eval/results/2019-12-06-first/M1R1_MAESTRO/M1R1 GRCm38 10x-genomics True
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/username/Projects/maestro_eval/results/2019-12-06-first/M1R1_MAESTRO/.snakemake/log/2019-12-09T150340.656779.snakemake.log
This appears to the last missing step in MAESTRO as rerunning the pipeline gave me:
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
1 scrna_report
2
We need documentation of APIs. Since users need to know the options of each function, the data structures, in order to fine-tune their pipeline or skip certain steps. Currently, the only available documentation is the tutorial, which is not enough.
As shown in the current code to invoke macs2:
MAESTRO/MAESTRO/Snakemake/scATAC/Snakefile
Lines 138 to 152 in ffdfffe
The option '-n' contains a folder structure. This is not a recommended way to specify an output directory of macs2. Please use '--outdir' instead and keep the '-n' simple such as:
name = config["outprefix"] + "_all"
and
macs2 callpeak -g hs --outdir Result/Analysis -n {params.name} -B -q 0.05 --nomodel --extsize=50 --SPMR -t {input.bam}
The issue is also related to other places in this Snakemake file where MACS2 is invoked.
Dear Tao,
When I ran install_github("liulab-dfci/MAESTRO"), I encountered the below error. Have you ever encountered the same error?
Error: Failed to install 'unknown package' from GitHub:
Failed to connect to api.github.com port 443: Connection refused
Thanks very much for your help.
Best regards,
Leon.
Could you add it to travis.yml? Currently, only the conda
installation part is tested.
the genedistance parameter https://github.com/liulab-dfci/MAESTRO/blob/master/MAESTRO/Snakemake/scATAC/Snakefile#L1166
is not used in the shell command https://github.com/liulab-dfci/MAESTRO/blob/master/MAESTRO/Snakemake/scATAC/Snakefile#L1174
Using default 10,000 bp.
Hi,
I got the error below when running the pipeline starting from bam. Any suggestion what I should try to fix this? Also I wonder if the pipeline can be started at any point. In my case here, I would like to resume the pipeline from this scatac_qcfilter step once I figure out the problem, rather than starting over the whole pipeline again.
rule scatac_qcfilter:
input: Result/Analysis/SI_25417_peak_count.h5
output: Result/QC/SI_25417_filtered_peak_count.h5
jobid: 4
benchmark: Result/Benchmark/SI_25417_QCFilter.benchmark
Traceback (most recent call last):
File "/root/miniconda3/envs/MAESTRO/bin/MAESTRO", line 4, in
import('pkg_resources').run_script('MAESTRO==1.2.1', 'MAESTRO')
File "/root/miniconda3/envs/MAESTRO/lib/python3.7/site-packages/pkg_resources/init.py", line 665, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/root/miniconda3/envs/MAESTRO/lib/python3.7/site-packages/pkg_resources/init.py", line 1463, in run_script
exec(code, namespace, namespace)
File "/root/miniconda3/envs/MAESTRO/lib/python3.7/site-packages/MAESTRO-1.2.1-py3.7.egg-info/scripts/MAESTRO", line 112, in
main()
File "/root/miniconda3/envs/MAESTRO/lib/python3.7/site-packages/MAESTRO-1.2.1-py3.7.egg-info/scripts/MAESTRO", line 94, in main
scatac_qc(directory = args.directory, outprefix = args.outprefix, fileformat = args.format, peakcount = args.peakcount, feature = args.feature, barcode = args.barcode, single_stat = args.single_stat, peak_cutoff = args.peak_cutoff, count_cutoff = args.count_cutoff, frip_cutoff = args.frip_cutoff, cell_cutoff = args.cell_cutoff, species = args.species)
File "/root/miniconda3/envs/MAESTRO/lib/python3.7/site-packages/MAESTRO/scATAC_QC.py", line 147, in scatac_qc
Filter(rawmatrix = peakmatrix, feature = features, barcode = barcodes, peak_cutoff = peak_cutoff, cell_cutoff = cell_cutoff, validcell = validcells_list, outprefix = filename, species = species)
File "/root/miniconda3/envs/MAESTRO/lib/python3.7/site-packages/MAESTRO/scATAC_QC.py", line 84, in Filter
passed_cell_matrix = rawmatrix[np.ix_(passed_peak.tolist()[0], passed_cell.tolist()[0])]
File "/root/miniconda3/envs/MAESTRO/lib/python3.7/site-packages/scipy/sparse/_index.py", line 75, in getitem
return self._get_arrayXarray(row, col)
File "/root/miniconda3/envs/MAESTRO/lib/python3.7/site-packages/scipy/sparse/compressed.py", line 665, in _get_arrayXarray
major.size, major.ravel(), minor.ravel(), val)
ValueError: could not convert integer scalar
[Wed Sep 2 20:52:51 2020]
Error in rule scatac_qcfilter:
jobid: 4
output: Result/QC/SI_25417_filtered_peak_count.h5
shell:
MAESTRO scatac-qc --format h5 --peakcount Result/Analysis/SI_25417_peak_count.h5 --peak-cutoff 100 --cell-cutoff 10 --directory Result/QC --outprefix SI_25417
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Thanks,
Yuping
Hello,
I found two data files were not accessible:
http://cistrome.org/~chenfei/MAESTRO/Refdata_scATAC_MAESTRO_GRCh38_1.1.0.tar.gz
http://cistrome.org/~chenfei/MAESTRO/giggle.all.tar.gz
The error is 403 Forbidden.
I'm currently hitting this error on the last step of the Dockerfile
recipe:
Step 10/10 : RUN cd /root/Annotation && tar -xzvf rabit.tar.gz && tar -xzvf giggle.tar.gz && rm rabit.tar.gz && rm giggle.tar.gz
---> Running in a935b7a7a681
tar (child): rabit.tar.gz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
The command '/bin/sh -c cd /root/Annotation && tar -xzvf rabit.tar.gz && tar -xzvf giggle.tar.gz && rm rabit.tar.gz && rm giggle.tar.gz' returned a non-zero code: 2
How would one go about using MAESTRO (starting at Step 4 of the pipeline) with multiple sc-RNA-Seq samples?
Hi,
I ran MAESTRO by the Docker image successfully, then I tried to explore the data using R following the instruction. In the step 4, there is:
p <- VisualizeTFenrichment(TFs = tfs,
cluster.1 = 0,
type = "ATAC",
SeuratObj = pbmc.ATAC.res$ATAC,
GIGGLE.table = "10X_PBMC_10k_giggle.txt",
visual.totalnumber = 100,
name = "10X_PBMC_10k_Monocyte_filtered")
I think the GIGGLE.table should be '10X_PBMC_10k.PredictedTFScore.txt', because in the directory, there is no such file named '10X_PBMC_10k_giggle.txt'.
Thanks!
I followed the instruction https://github.com/liulab-dfci/MAESTRO to install MAESTRO on HPC cluster for our institution research community.
My python version is 3.7.3-anaconda; conda version 4.9.2-py37h89c1867_0
I followed the instruction “ Installing the full solution of MAESTRO workflow through conda “, here is my procedures
[ris_hpc_apps@tdragon4 ~]$ module load python/3.7.3-anaconda
[ris_hpc_apps@tdragon4 ~]$ which conda
/risapps/rhel7/python/3.7.3/bin/conda
[ris_hpc_apps@tdragon4 ~]$ conda config --add channels defaults
Warning: 'defaults' already in 'channels' list, moving to the top
[ris_hpc_apps@tdragon4 ~]$ conda config --add channels bioconda
Warning: 'bioconda' already in 'channels' list, moving to the top
[ris_hpc_apps@tdragon4 ~]$ conda config --add channels conda-forge
Warning: 'conda-forge' already in 'channels' list, moving to the top
-- no need to add those channels; they are already there.
[ris_hpc_apps@tdragon4 ~]$ conda install mamba -c conda-forge
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /risapps/rhel7/python/3.7.3
added / updated specs:
- mamba
The following packages will be downloaded:
package | build
---------------------------|-----------------
ca-certificates-2020.11.8 | ha878542_0 145 KB conda-forge
certifi-2020.11.8 | py37h89c1867_0 150 KB conda-forge
conda-4.9.2 | py37h89c1867_0 3.0 MB conda-forge
libsolv-0.7.16 | h8b12597_0 455 KB conda-forge
------------------------------------------------------------
Total: 3.8 MB
The following NEW packages will be INSTALLED:
libsolv conda-forge/linux-64::libsolv-0.7.16-h8b12597_0
mamba conda-forge/linux-64::mamba-0.1.2-py37h99015e2_0
The following packages will be UPDATED:
ca-certificates 2020.6.20-hecda079_0 --> 2020.11.8-ha878542_0
certifi 2020.6.20-py37he5f6b98_2 --> 2020.11.8-py37h89c1867_0
conda 4.9.0-py37he5f6b98_1 --> 4.9.2-py37h89c1867_0
Proceed ([y]/n)? *y*
# I choose yes to update the conda to the latest version here
Downloading and Extracting Packages
libsolv-0.7.16 | 455 KB | ########################################################### | 100%
certifi-2020.11.8 | 150 KB | ########################################################### | 100%
ca-certificates-2020 | 145 KB | ########################################################### | 100%
conda-4.9.2 | 3.0 MB | ########################################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
[ris_hpc_apps@tdragon4 ~]$ mamba create -n MAESTRO-1.2.1 -c liulab-dfci
(since we already have MAESTO created for version 1.0.1, I choose a different conda environment name MAESTRO-1.2.1 )
.......
Getting liulab-dfci linux-64
Getting liulab-dfci noarch
Getting conda-forge linux-64
Getting conda-forge noarch
Getting bioconda linux-64
Getting bioconda noarch
Getting pkgs/main linux-64
Getting pkgs/main noarch
Getting pkgs/r noarch
Getting r linux-64
Getting pkgs/r linux-64
Getting dranew linux-64
Getting r noarch
Getting dranew noarch
Looking for: []
9 packages in https://conda.anaconda.org/liulab-dfci/linux-64
0 packages in https://conda.anaconda.org/liulab-dfci/noarch
139892 packages in https://conda.anaconda.org/conda-forge/linux-64
43184 packages in https://conda.anaconda.org/conda-forge/noarch
33562 packages in https://conda.anaconda.org/bioconda/linux-64
21237 packages in https://conda.anaconda.org/bioconda/noarch
18648 packages in https://repo.anaconda.com/pkgs/main/linux-64
2356 packages in https://repo.anaconda.com/pkgs/main/noarch
6637 packages in https://repo.anaconda.com/pkgs/r/linux-64
5025 packages in https://repo.anaconda.com/pkgs/r/noarch
6620 packages in https://conda.anaconda.org/r/linux-64
5025 packages in https://conda.anaconda.org/r/noarch
170 packages in https://conda.anaconda.org/dranew/linux-64
0 packages in https://conda.anaconda.org/dranew/noarch
## Package Plan ##
environment location: /risapps/rhel7/python/3.7.3/envs/MAESTRO-1.2.1
Proceed ([y]/n)? y
#
# To activate this environment, use
#
# $ conda activate MAESTRO-1.2.1
#
# To deactivate an active environment, use
#
# $ conda deactivate
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
At this point, I check the environment directory /risapps/rhel7/python/3.7.3/envs/MAESTRO-1.2.1;
I only notice one directory was generated, is this normal?
[ris_hpc_apps@tdragon4 MAESTRO]$ ls -l /risapps/rhel7/python/3.7.3/envs/MAESTRO-1.2.1
total 0
drwxr-xr-x 2 ris_hpc_apps rists 29 Nov 20 16:07 conda-meta
Note: I used our software installation account ris_hpc_apps to install this on HPC cluster. I suppose to let user to do "conda activate MAESTRO-1.2.1" ( "conda init bash" is needed before this)
Regards,
Rong
First, I readRDS for my Seurat/harmony and Signac clustered scRNA and scATAC objects, respectively.
Then, I tried to run the Incorporate function:
coembed <- Incorporate(RNA = rna, ATAC = atac, project = "Maestro_integrated", method = "MAESTRO")
But I get the following error:
Error: Cannot find 'ACTIVITY' in this Seurat object
I also tried:
coembed <- Incorporate(RNA = rna$RNA, ATAC = atac$ATAC, project = "Maestro_integrated", method = "MAESTRO")
But,
Error: Cannot find 'ATAC' in this Seurat object
I used the following command to create a Snakefile following the instructions at https://github.com/liulab-dfci/MAESTRO/blob/master/example/RNA_infrastructure_10x/RNA_infrastructure_10x.md.
Unfortunately, the Snakefile does not work since I get the following error:
IndexError in line 34 of /storage/mathelierarea/processed/anthoma/Projects/single_cell/results/20201216_maestro/10X_PBMC_RNA_8k_MAESTRO_V122/Snakefile:
list index out of range
File "/storage/mathelierarea/processed/anthoma/Projects/single_cell/results/20201216_maestro/10X_PBMC_RNA_8k_MAESTRO_V122/Snakefile", line 34, in <module>
File "/lsc/MAESTRO/1.2.2/envs/MAESTRO/lib/python3.8/site-packages/MAESTRO/scRNA_utility.py", line 44, in getfastq_10x
The problem comes from the fact that the scrna-init command did not provide values for the transcript
, barcode
, and decompress
variable; and these variables are used in the Snakefile for the STAR command.
This should not be the case given your documentation that specifies that --fastq-barcode
and --fastq-transcript
should only be provided for the Dropseq format (not 10x-genomics)
You can find the Snakefile and config.xml files in the archive enclosed.
snakefile_and_config.zip
In the current version 1.0.2, if users use the snakemake pipeline, there is no way to set their customized cell signatures, and they are forced to use CIBERSORT immune cell signatures. The pipeline would be more useful if the cell annotation results from snakemake pipeline can make more sense.
https://github.com/liulab-dfci/MAESTRO/tree/master/MAESTRO/Snakemake
In some cases, we do not know the actual cell types, or we do not know the signature genes of possible cell types, so we'd like to skip cell-type annotation step, and plan to identify the driver TFs for each cluster. Can we do that? I am asking this since according to the tutorial, the regulation potentials (RP, or gene scores) are only to be included in the 'cell type annotation' step, but RPs are needed in the TF identification step.
When peaks are from unusual chromosomes such as "chr11_KI270721v1_random", scATAC_genescore.py and ATACCalculateGenescore.R can't correctly split the peak name such as "chr11_KI270721v1_random_12345_67890" to find the chromosome name, start and end positions since it will use "_" as the separator:
MAESTRO/MAESTRO/scATAC_genescore.py
Lines 96 to 98 in 4af5c11
Please use rsplit("_",maxsplit=2)
instead. Here is an example:
>>> a="chr11_KI270721v1_random_12345_67890"
>>> a.split("_") # this will incorrectly split the chromosome name
['chr11', 'KI270721v1', 'random', '12345', '67890']
>>> a.rsplit("_",maxsplit=2) # this will give exactly three values
['chr11_KI270721v1_random', '12345', '67890']
Is it possible to pass in an existing Seurat object (or converted SingleCellExperiment object)? I have a pretty robust pipeline that does QC, clustering and such, would like to have some level of preservation of those clusters and such.
Hello,
I tried to cluster my scRNA and scATAC datasets using a resolution other than 0.6 (this number was used in the MAESTRO vignettes), but I got an error saying that "ATAC_snn_res.0.6" or "RNA_snn_res.0.6" cannot be found in the Seurat object when I used the "Incorporate" function to coembed the two datasets. Do I need to use a default resolution of 0.6 before coembedding? But can I change the resolution after coembedding?
Thank you!
Currently, R 3.5 is specifically required. Any reason that we should keep using the old version? Or is there any plan to upgrade? The reason why I asked is that the latest Bioconductor (3.10) requires R 3.6.
MAESTRO/conda/MAESTRO/meta.yaml
Line 32 in da64dcb
The current implementation has a potential problem. The bedtools intersect
with -f 1.00
will get rid of the region that is not ENTIRELY included in the chr_limit
file. It would be more reasonable to clip instead of discarding the whole region.
MAESTRO/MAESTRO/Snakemake/scATAC/Snakefile
Lines 637 to 651 in ffdfffe
Check my gist: https://gist.github.com/taoliu/2469050
The value range of gene_peak matrix in RP_model should between 0 and 1. Why does "genes_peaks_score_csr" in scATAC_Genescore.py have a value greater than 1, with a maximum of 1600+.
Scripts like scRNA_qc.py use many positional arguments, which is not particularly user friendly if the user needs to run the script outside of the snakemake pipeline. It would be easier for the user if argparse was used so that they can specify command line flags and have a help doc for the script.
The version of MAESTRO: v1.1.0
Command line bugs:
--fastq-dir
ends with /, the snakemake
cmd will give a ‘double /’ warning.--fastq-dir
is a relative path, it won’t work since this relative path will be added to the config file in a new child directory; the same problem can be found for the option for whitelist file.Tutorial errors:
--directory
option but -d
.GRCh38
or GRCm38
are not species
names!Hello,
I'm currently installing the MAESTRO prerequisites and, after reading the paper, I'd like to ask if MAESTRO is compatible with 10X data derived from nuclear RNA, particularly if I'm looking to integrate single-modal snRNA- and snATAC-seq data?
And more specifically, could the use of a pre-mRNA reference and GTF files for alignment, as opposed to standard reference/annotation files, impact a MAESTRO analysis at all?
Until now I have been using Cell Ranger 4 for my analysis which recommends using a pre-mRNA reference and GTF file for nuclear RNA. I had started creating STARsolo compatible versions of these files for my MAESTRO analysis and wondered if this is the best course of action, particularly as 10X have recently released v5 which includes a new function for dealing with intronic reads without the need of a pre-RNA reference, and STARsolo also provides a similar function.
Regardless, it would be useful to hear if you have any recommendations or points of interest that I should consider when running MAESTRO using single-nuclear data.
Many Thanks,
Darren
I have to process a scATAC-seq dataset with over 48,000 cells after merging three conditions and 2 replicates each condition. I kept 200,000 top peaks. The scATAC_cellranger_count.py can finish successfully, however, even our high memory node (260G mem) of our HPC keeps killing scATAC_genescore.py. We need to optimize the memory usage of this script. This issue ticket will track the progress of the optimization.
It would be appreciated if the documentation explicitly mentioned that rownames
of the input matrixes need to be gene symbols and not other annotations. Took me a while to figure out that LISA
requires symbols, and that the mismatch in rownames
of my ATAC and RNA data kept causing Incorporate()
to segfault. However, I also admit I am using a slightly processed counts matrix for my RNA input, which used ensemble ids as the rownames.
Hello,
I am using MAESTRO docker image winterdongqing/maestro, 1.2.1, 57f41ffbe200. When I ran the scRNA-seq module, it reported errors like:
unable to open file: name = '/project/dev/qqin/LISA/lisa_web/cistromedb_data/hg38/cluster_human/DNase_median_for_each_cluster.h5'
This error should be caused by a hardcoded path of LISA. I changed to web LISA, then this module finished successfully.
If users use MAESTRO for the first time or more accurately use "reticulate" package through ATACCalculateGenescore script for the first time, they will be prompted to decide whether "reticulate" should install an independent miniconda environment by itself.
No non-system installation of Python could be found.
Would you like to download and install Miniconda?
Miniconda is an open source environment management system for Python.
See https://docs.conda.io/en/latest/miniconda.html for more details.
Would you like to install Miniconda? [Y/n]: n
Assuming users are utilizing conda/miniconda version of MAESTRO for their analysis, they should stick to the python in that environment, so answer "n". This has to be done just once. We'd better describe it in our tutorial otherwise users may have a lot of trouble later on if they accidentally answer "Y" to the above question. See the discussion regarding this in rstudio/reticulate#607.
The code related to the issue is
MAESTRO/R/ATACCalculateGenescore.R
Lines 29 to 30 in 92b1d3f
The version should be set as 1.0.2
.
Line 50 in 1eba0a2
genes = ATACCalculateGenescore(peaks, organism = "GRCm38")
this just give me a zero matrix. And the column names are human gene names not mouse gene names.
Hi @DongqingSun96 ,
can you please fix this command for multisample scATACseq? https://github.com/liulab-dfci/MAESTRO/blob/feat/multisample/MAESTRO/Snakemake/multisample/scATAC/rules/sc_atac_report.smk#L27
currently, it assumes the files are in Result/Analysis/, but for multisample, the files are in /Result/Analysis/{sample}/
Thanks a lot!
If we look at MAESTRO/R/ATACRunSeurat.R script, it includes many of Seurat functions, such as RunLSI, FindCluster, and RunUMAP. However, the parameters for these functions are mainly fixed with only few tunable parameters for users. How can users tweak the parameter such as n.neighbors or min.dist for UMAP with this script, or the maximum number of components from LSI (n=50 is fixed)? My suggestion is to break this big script into at least three modules -- LSI, UMAP, and DE, and provide more options!
Hi,
Thank you for this great work! I just have one question when I go through the Galleries & Tutorials. For the scRNA-seq example, after the RNARunSeurat() function, you use str() to check those features. How to plot those figures below the line of str(pbmc.RNA.res$RNA)?
Thanks,
Frank
The main executable script MAESTRO will throw exceptions when no argument is set for init
command:
$ MAESTRO init
Traceback (most recent call last):
File "/mnt/lustre/users/tliu/miniconda3/envs/MAESTRO/bin/MAESTRO", line 4, in <module>
__import__('pkg_resources').run_script('MAESTRO==1.0.1', 'MAESTRO')
File "/mnt/lustre/users/tliu/miniconda3/envs/MAESTRO/lib/python3.7/site-packages/pkg_resources/__init__.py", line 667, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/mnt/lustre/users/tliu/miniconda3/envs/MAESTRO/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1463, in run_script
exec(code, namespace, namespace)
File "/mnt/lustre/users/tliu/miniconda3/envs/MAESTRO/lib/python3.7/site-packages/MAESTRO-1.0.1-py3.7.egg-info/scripts/MAESTRO", line 79, in <module>
main()
File "/mnt/lustre/users/tliu/miniconda3/envs/MAESTRO/lib/python3.7/site-packages/MAESTRO-1.0.1-py3.7.egg-info/scripts/MAESTRO", line 72, in main
init_workflow(args.directory, args.module)
File "/mnt/lustre/users/tliu/miniconda3/envs/MAESTRO/lib/python3.7/site-packages/MAESTRO-1.0.1-py3.7.egg-info/scripts/MAESTRO", line 21, in init_workflow
os.makedirs(directory)
File "/mnt/lustre/users/tliu/miniconda3/envs/MAESTRO/lib/python3.7/os.py", line 206, in makedirs
head, tail = path.split(name)
File "/mnt/lustre/users/tliu/miniconda3/envs/MAESTRO/lib/python3.7/posixpath.py", line 107, in split
p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType
The solution is either to add required=True
for "-m"
and "-d"
, or to set default=
for these options.
Lines 61 to 62 in 1eba0a2
Is it possible to generate the giggle and lisa tables for drosophila melanogaster? I would like to use these to generate the driver TFs for both scRNA and scATAC-seq samples.
Hi Chenfei,
Does MASTRO allow to process data of other species, like zebrafish? If so, how should I prepare the reference genome files, like giggle annotation files?
Thank you!
Yujie
To make this script more flexible for customized analysis, please add DefaultAssay(ATAC) <- "ACTIVITY"
to the end of this code block.
Lines 47 to 64 in d58fc18
Also, please fix the hard code at:
Line 56 in d58fc18
Line 84 in d58fc18
Line 86 in d58fc18
Another suggestion is to separate the plotting functions from this script since when users get the CombinedObj
, they can plot whatever they want. To combine plotting with computation is unnecessary and will potentially cause some bugs.
Hi, I am running MAESTRO 1.2.2 following the tutorial. The scRNA-seq module complained:
lisa unpack: error: argument tarball: ERROR: /home/qtu/work/maestro/lisa/hg38_2.1.tar.gz is not a valid file.
Actually I could directly execute lisa unpack hg38_2.1.tar.gz
successful, not sure what caused the error.
BTW, does the option --lisadir lisa/hg38_2.1.tar.gz
mean the pipeline will unpack the huge file each time?
Thanks!
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.