broadinstitute / depmap_omics
What you need to process the Quarterly DepMap-Omics releases from Terra
Home Page: https://depmap.org/portal/
Hi. I have downloaded the segment file from the DepMap portal, and the Segment_Mean column (relative copy ratio for that segment) seems somewhat different from the Seg_Mean column in the standard segment file. So I don't know whether this segment file can be used to run ABSOLUTE and GISTIC2 directly.
The following is a placeholder for all the issues/PRs that need to be opened for productionalization:
We may think about using sphinx-docs to build a document like this.
Thank you for giving such a good way to download depmap database for those who have difficulty installing depmap from Bioconductor (like me). However, I still have some issues in installing from github...
It shows that:
install_github("depmap_omics")
Error in parse_repo_spec(repo) :
Invalid git repo specification: 'depmap_omics'
Are there any solutions to this issue? Thank you in advance for any response.
yuwei
Hi,
I am interested in obtaining the reference and baits files used for the exome data in the PureCN pipeline here.
How can I get access to this data? Or is this publicly available somewhere?
Thank you
Hello,
I'm trying to process WGS data from a cell line in a way similar to your pipeline at DepMap. I noticed in your workflow doc there was a note under the mutations slide that "this pipeline requires a matched normal, so we use a pseudo normal for all cell lines samples". Could you explain what this pseudo normal is, and would it be possible for you to share this data with me?
Thank you
Hello!
By any chance, is there a way to release the mutational profiling of CCLE as a .vcf file instead of .maf? Also, related to the same data, are you planning to align against GRCh38 instead of GRCh37, as you did, for example, with the transcriptome?
Thank you!
Pedro
Hi DepMap team,
I'm trying to load transcript-level counts from the RNA-seq expression data from the OmicsExpressionTranscriptsTPMLogp1Profile.csv file. I'm having trouble uniquely resolving the profile identifiers (e.g. "PR-lqUArB", "PR-pOBrMJ") to the model identifiers ("ACH-000029") using the "OmicsProfiles.csv" file. I'm seeing 16 cell lines that currently have this issue with the 23q2 release.
Is there a rational way to pick between the duplicate profiles for the 16 cell lines?
Here's a working example (in R) with more details:
library(dplyr)
library(pipette)
df <-
    import(
        con = "https://figshare.com/ndownloader/files/40449635",
        format = "csv"
    ) |>
    filter(Datatype == "rna") |>
    arrange(ModelID)
dupes <- sort(df[["ModelID"]][duplicated(df[["ModelID"]])])
print(dupes)
## [1] "ACH-000029" "ACH-000095" "ACH-000143" "ACH-000206" "ACH-000328"
## [6] "ACH-000337" "ACH-000455" "ACH-000468" "ACH-000517" "ACH-000532"
## [11] "ACH-000556" "ACH-000597" "ACH-000700" "ACH-000931" "ACH-000975"
## [16] "ACH-001192"
df <- df[df[["ModelID"]] %in% dupes, ]
print(df)
## ProfileID ModelConditionID ModelID Datatype WESKit
## 28 PR-lqUArB MC-000029-BMZc ACH-000029 rna <NA>
## 29 PR-pOBrMJ MC-000029-BMZc ACH-000029 rna <NA>
## 94 PR-6E5fvI MC-000095-UcYl ACH-000095 rna <NA>
## 95 PR-9bHyjI MC-000095-UcYl ACH-000095 rna <NA>
## 143 PR-dlwhbG MC-000143-xMKb ACH-000143 rna <NA>
## 144 PR-eLOZCF MC-000143-xMKb ACH-000143 rna <NA>
## 207 PR-by8s63 MC-000206-Jmpg ACH-000206 rna <NA>
## 208 PR-xissjH MC-000206-Jmpg ACH-000206 rna <NA>
## 328 PR-DjTYZp MC-000328-gA4f ACH-000328 rna <NA>
## 329 PR-S409MD MC-000328-gA4f ACH-000328 rna <NA>
## 338 PR-ZJC2Tm MC-000337-VmHG ACH-000337 rna <NA>
## 339 PR-zvd6KC MC-000337-VmHG ACH-000337 rna <NA>
## 456 PR-HCodtv MC-000455-QvVM ACH-000455 rna <NA>
## 457 PR-JWn3XA MC-000455-QvVM ACH-000455 rna <NA>
## 470 PR-8iDtve MC-000468-c6hY ACH-000468 rna <NA>
## 471 PR-qf7nCW MC-000468-c6hY ACH-000468 rna <NA>
## 518 PR-aOml9R MC-000517-kcbL ACH-000517 rna <NA>
## 519 PR-i9CVhO MC-000517-kcbL ACH-000517 rna <NA>
## 534 PR-Q8g8M0 MC-000532-NN9r ACH-000532 rna <NA>
## 535 PR-t6ctGM MC-000532-NN9r ACH-000532 rna <NA>
## 559 PR-1hnFd4 MC-000556-YK2Z ACH-000556 rna <NA>
## 560 PR-Iug0GM MC-000556-YK2Z ACH-000556 rna <NA>
## 601 PR-3ATZmJ MC-000597-RDyO ACH-000597 rna <NA>
## 602 PR-w00MaJ MC-000597-RDyO ACH-000597 rna <NA>
## 704 PR-hs3wNI MC-000700-mndS ACH-000700 rna <NA>
## 705 PR-uQ6qid MC-000700-mndS ACH-000700 rna <NA>
## 934 PR-CHQ9Av MC-000931-3a7D ACH-000931 rna <NA>
## 935 PR-k0J8JP MC-000931-3a7D ACH-000931 rna <NA>
## 979 PR-eCRyEu MC-000975-PUD5 ACH-000975 rna <NA>
## 980 PR-s28NRl MC-000975-PUD5 ACH-000975 rna <NA>
## 1056 PR-RJkk8B MC-001192-OhhV ACH-001192 rna <NA>
## 1057 PR-V4rEyG MC-001192-OhhV ACH-001192 rna <NA>
Best,
Mike
Currently some boolean columns are populated with either "Y" or NaN. We need to convert them to True/False.
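One way the conversion described above might look, as a minimal standard-library sketch. The function names, the row layout (a list of dicts, as csv.DictReader would yield) and the column name "ExampleFlag" are assumptions made for illustration and are not taken from the depmap_omics code base. Treating empty strings and the literal text "nan" as missing is also an assumption.

```python
import math


def to_bool(value):
    """Map "Y" to True and a missing value (None, NaN, "" or "nan") to False."""
    if isinstance(value, bool):
        return value  # already converted, pass through unchanged
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    if isinstance(value, str):
        text = value.strip()
        if text == "Y":
            return True
        if text == "" or text.lower() == "nan":
            return False
    # Anything else is unexpected, so fail loudly instead of guessing.
    raise ValueError(f"unexpected value in boolean column: {value!r}")


def convert_boolean_columns(rows, columns):
    """Return copies of the row dicts with the named columns converted to bool."""
    converted = []
    for row in rows:
        new_row = dict(row)
        for column in columns:
            new_row[column] = to_bool(row.get(column))
        converted.append(new_row)
    return converted


# Hypothetical rows; "ExampleFlag" is a made-up column name.
rows = [
    {"ID": "example-1", "ExampleFlag": "Y"},
    {"ID": "example-2", "ExampleFlag": float("nan")},
]
print(convert_boolean_columns(rows, ["ExampleFlag"]))
```

Raising on unrecognised values is a deliberate choice here: a column that contains something other than "Y" or a missing value is probably not a boolean column and should not be silently coerced.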
Hello,
Thank you so much for this repo!
Just wanted to know - for the most recent DepMap data release, I was able to find the relative CN and CN ratios per gene, but couldn't find the absolute copy numbers.
I saw from a discussion on the DepMap forum that ABSOLUTE calls are available up to CCLE 2019, but that PureCN is being optimized for future releases.
(https://forum.depmap.org/t/classifying-copy-number-alterations/2287)
Just wanted to check in to see if this data is available for the latest release, and if so, where I could find it.
Thank you so much!
Why are the cell lines in https://github.com/broadinstitute/depmap_omics/blob/master/data/blacklist.txt blacklisted? Just curious, thank you!
In 2018, CCLE provided RNA-seq data for 1019 cancer cell lines (https://sites.broadinstitute.org/ccle/), which could also be downloaded from DepMap. However, the recent DepMap release contains gene expression profiles for 1408 cancer cell lines. We observed that the expression values of the same gene in the same cell line differ between DepMap and CCLE. This raises some questions: for the cell lines that overlap between CCLE 2018 and DepMap, did DepMap use CCLE's raw RNA-seq data and only apply a different analysis process? Were only specific cancer cell lines in DepMap sequenced by DepMap itself?
Dear Team,
Can we download the latest data shown on the website from this repository? Downloading from the website currently requires clicking buttons.
Looking forward to a command-line-based approach to downloading the data.
Thanks.
Shicheng
We will extract all information from RNASeqQC.
Hi,
First, I am attempting to run copynumbers.py. However, the error message "No module named 'gumbo_client'" appeared. So I did 'pip install gumbo_client', but I got an error message saying "No matching distribution found for gumbo_client", so I can't find a way. How can I solve it?
Second, using this pipeline, I want to obtain copy number alteration from my cell-line WES data. How can I do it?
Thank you.
While examining the somatic mutation pipeline I noticed recent "debug" commits making corrections for Mutect2 clustered_events: d901564, 559daef. These seem to change clustered_events to PASS when there are <=2 non-germline mutations in a +/- 50 bp region around the site.
It happens that I have also been looking for a reasonable solution aiming to prevent some clustered_events from being filtered out, as it has been known that this filter sometimes removes potentially "real" somatic mutations (e.g. this post). Therefore I am very curious what you find after applying the correction above -- does it help to improve performance (e.g. reducing false negatives w/o greatly increasing false positives)? Is this approach official yet -- will the next DepMap release adopt this correction for clustered_events?
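The rule described in this post could be sketched as follows. This is one reading of the issue text only, not the code from the referenced commits: the Variant class, its fields, and the function name are hypothetical, and the sketch assumes the count of non-germline mutations includes the site itself.

```python
from dataclasses import dataclass, field

WINDOW_BP = 50   # half-width of the window around each site (from the issue text)
MAX_NEARBY = 2   # rescue when at most this many non-germline calls fall in the window


@dataclass
class Variant:
    """Hypothetical minimal variant record; an empty filter set stands for PASS."""
    chrom: str
    pos: int
    filters: set = field(default_factory=set)
    is_germline: bool = False


def rescue_clustered_events(variants):
    """Drop the clustered_events filter in place where the neighbourhood is sparse."""
    for v in variants:
        if "clustered_events" not in v.filters:
            continue
        # Count non-germline calls within +/- WINDOW_BP on the same chromosome.
        # Assumption: the site itself is counted when it is non-germline.
        nearby = sum(
            1
            for other in variants
            if other.chrom == v.chrom
            and abs(other.pos - v.pos) <= WINDOW_BP
            and not other.is_germline
        )
        if nearby <= MAX_NEARBY:
            v.filters.discard("clustered_events")
    return variants


# Hypothetical calls: two nearby somatic sites plus one germline site.
calls = [
    Variant("1", 100, {"clustered_events"}),
    Variant("1", 120, {"clustered_events"}),
    Variant("1", 130, set(), is_germline=True),
]
for v in rescue_clustered_events(calls):
    print(v.chrom, v.pos, sorted(v.filters) or "PASS")
```

The count depends only on positions and germline status, not on the filters being edited, so the result does not depend on the order in which variants are visited. The pairwise scan is quadratic and is meant only to make the rule explicit; a real pipeline would use a position-sorted sweep.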
This might be a long term plan..
Hello!
I tried to find information about the chromosomal reference you used for CCLE_segment_cn.csv, but I cannot find it. I guess hg38 was used?
With conda Python 3.9, running poetry add git+https://github.com/broadinstitute/gumbo_client.git raises the error:
Note: This error originates from the build backend, and is likely not a problem with poetry but with psycopg2 (2.9.5) not supporting PEP 517 builds. You can verify this by running 'pip wheel --use-pep517 "psycopg2 (==2.9.5) ; python_version >= "3.6""'.
A simple workaround for now is to install gumbo_client manually: pip install git+https://github.com/broadinstitute/gumbo_client.git
First, I want to thank you all for the great work on the CCLE/DepMap database!
I have recently been processing whole-exome sequencing samples from the CCLE_PRJNA523380 project. I am looking for the capture kit and PoN that were used for each of the WES samples:
• I found the sample_for_pon.tsv file containing 1668 lanes of records (from GTEx, according to the GitHub page). Can you let me know if these samples are also used for the CCLE WES CNV process? And can you let me know where I can directly download the PoN file that corresponds to the CCLE WES data?
• On the GitHub page, you say you are using Illumina ICE intervals and Agilent intervals for WES samples. I can only find a wes_agilent_hg19_baits.interval_list, which is very different from any capture kit BED file released by Agilent. Can you let me know the exact capture kit that was used for exon capture for all the CCLE WES data?
both in CN gene-level matrix and variant-level in the maf
Problem:
The current installation on Mac (not sure about Linux) still needs troubleshooting.
Idea:
Submit the package to conda or pypi.
Hi DepMap team,
I noticed that there are 5 new cell lines in the DepMap 23q2 release that are not annotated at Cellosaurus currently. Can we work on adding them? Happy to help if possible.
DepMap ID    Name          Source
ACH-001134   MYLA          Academic lab
ACH-001172   U251MGDM      HSRRB via Bandhyopadhyay (Broad)
ACH-002002   A375SKINCJ2   Cory Johannessen (Broad)
ACH-002471   PSS008        Alejandro Sweet-Cordero (UCSF)
ACH-002834   PSS131R       Alejandro Sweet-Cordero (UCSF)
See related issue filed with the Cellosaurus team calipho-sib/cellosaurus#8
Best,
Mike