peka's Issues

Working with cross-link sites identified using iCount with overlapping indices

I have been running iCount xlsites to identify cross-link sites in my eCLIP sample replicates for PEKA, using read quantification. I chose read quantification because the input BAM files have already been UMI-pruned and PCR-deduplicated. I have noticed several overlapping indices between the cross-link sites identified in each replicate. I have considered using another cross-link site detection tool such as htseq-clip, but its output has no column for cDNA counts. How would you recommend I deal with the overlapping sites?
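One possible way to handle this is sketched below: collapse rows that share chromosome, coordinates and strand by summing their scores. It assumes the xlsites output is a six-column BED (chrom, start, end, name, score, strand) with the count in the score column. The function name `collapse_sites` and the column layout are assumptions for illustration, not part of iCount or PEKA.

```python
from collections import defaultdict


def collapse_sites(bed_lines):
    """Sum scores of BED6 rows that share chrom, start, end and strand."""
    totals = defaultdict(float)
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, _name, score, strand = line.split("\t")[:6]
        totals[(chrom, int(start), int(end), strand)] += float(score)
    out = []
    for (chrom, start, end, strand), score in sorted(totals.items()):
        out.append(f"{chrom}\t{start}\t{end}\t.\t{score:g}\t{strand}")
    return out


# Hypothetical example rows (assumed layout, not real data).
example = [
    "chr1\t100\t101\t.\t2\t+",
    "chr1\t100\t101\t.\t3\t+",
    "chr1\t100\t101\t.\t1\t-",
]
collapsed = collapse_sites(example)
```

Here the two plus-strand rows at the same position become a single row with score 5, while the minus-strand row is kept separately.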

Question about Figure 1g: heatmaps showing relative occurrences (RtXn) and PEKA-scores for the top 40 k-mers

@kkuret Hi:
I am trying PEKA to perform motif identification on our in-house eCLIP-seq data from plants. Based on your bioRxiv preprint, PEKA seems better suited to peaks identified from CLIP-seq (unlike ChIP-seq peaks, for which people usually use MEME-ChIP or HOMER to identify motifs). Thanks for your great tool!
I have tested PEKA on my data and generated a series of outputs. A file with the suffix '*5mer_distribution_whole_gene.tsv' seems to contain information like that in your Figure 1g. I want to convert the table into a heatmap to better understand your paper, but I am confused about how the k-mer sequences on the left of the heatmap relate to the first column of the TSV file. How do I convert the V1 column into the left part of the heatmap? I also chose the top 20 ranked rows based on PEKA-score.
Another question is how to present CLIP-seq motif enrichment results the way ChIP-seq results are shown in a typical experiment-centered paper, like panel A in https://iiif.elifesciences.org/lax/53278%2Felife-53278-fig6-v2.tif/full/1500,/0/default.jpg Do you have any suggestions? I am not sure which value to show for significance (PEKA-score?). Thanks a lot.
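For the heatmap question, a minimal sketch of turning such a table into heatmap input might look like the following. It assumes the first column of the TSV holds the k-mer (presumably what shows up as V1 when the file is read without a header name), that position columns have integer headers, and that the score column is named `PEKA-score`. All of these are assumptions about the file layout, and `top_kmer_matrix` is a hypothetical helper, not a PEKA function.

```python
import csv
import io


def top_kmer_matrix(tsv_text, score_col="PEKA-score", top_n=20):
    """Return (kmers, positions, matrix) for the top_n rows by score_col."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    score_idx = header.index(score_col)
    # Columns whose header parses as an integer are treated as positions.
    pos_idx = [i for i, h in enumerate(header) if h.lstrip("-").isdigit()]
    # Drop rows without a usable score before ranking.
    rows = [r for r in reader if r and r[score_idx] not in ("", "NA", "nan")]
    rows.sort(key=lambda r: float(r[score_idx]), reverse=True)
    rows = rows[:top_n]
    kmers = [r[0] for r in rows]  # first column used as row labels
    positions = [int(header[i]) for i in pos_idx]
    matrix = [[float(r[i]) for i in pos_idx] for r in rows]
    return kmers, positions, matrix


# Hypothetical two-row table (assumed layout, made-up numbers).
demo = (
    "\t-1\t0\t1\tPEKA-score\n"
    "AAAAA\t0.1\t0.5\t0.2\t3.0\n"
    "UGUGU\t0.2\t0.9\t0.3\t7.5\n"
)
kmers, positions, matrix = top_kmer_matrix(demo, top_n=1)
```

The `matrix` rows can then be handed to any heatmap plotting function, with `kmers` as the row labels on the left and `positions` along the x-axis.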

test data works, but own data gives: ValueError: Overlapping IntervalIndex is not accepted.

Hi there

Thanks for the program. I installed it, and running the test data works fine. Moving to my own data, I tried the following (the first is what I understood from the documentation, but all fail):

peka -i iCount.deDupBCQFdemux_barcode_RT6.peaks.forPeka.bed -x iCount.deDupBCQFdemux_barcode_RT6.cDNA_unique.forPeka.bed -g refGenome.fasta -gi refGenome.fasta.fai -r refGenome.segs.forPeka.gtf
peka -i iCount.deDupBCQFdemux_barcode_RT6.clusters.forPeka.bed -x iCount.deDupBCQFdemux_barcode_RT6.cDNA_unique.forPeka.bed -g refGenome.fasta -gi refGenome.fasta.fai -r refGenome.segs.forPeka.gtf
peka -i iCount.deDupBCQFdemux_barcode_RT6.clusters.forPeka.bed -x iCount.deDupBCQFdemux_barcode_RT6.peaks.forPeka.bed -g refGenome.fasta -gi refGenome.fasta.fai -r refGenome.segs.forPeka.gtf

gives me an error:

Getting thresholded crosslinks
Thresholding intron
lenght of df_reg for intron is: 1038414
Traceback (most recent call last):
  File "/home/name/miniconda3/envs/peka/bin/peka", line 8, in <module>
    sys.exit(main())
  File "/home/name/miniconda3/envs/peka/bin/peka.py", line 1462, in main
    set_seeds
  File "/home/name/miniconda3/envs/peka/bin/peka.py", line 1051, in run
    df_txn = get_threshold_sites(sites_file, percentile=percentile)
  File "/home/name/miniconda3/envs/peka/bin/peka.py", line 497, in get_threshold_sites
    df_cut = cut_sites_with_region(df_reg, df_region)
  File "/home/name/miniconda3/envs/peka/bin/peka.py", line 422, in cut_sites_with_region
    df_temp = cut_per_chrom(chrom, df_p, df_m, df_region_p, df_region_m)
  File "/home/name/miniconda3/envs/peka/bin/peka.py", line 409, in cut_per_chrom
    df_xl_p["cut"] = pd.cut(df_xl_p["start"], interval_index_p)
  File "/home/name/miniconda3/envs/peka/lib/python3.7/site-packages/pandas/core/reshape/tile.py", line 226, in cut
    raise ValueError('Overlapping IntervalIndex is not accepted.')
ValueError: Overlapping IntervalIndex is not accepted.

Chromosome names and sorting seem to match, and the files are iCount outputs (I removed some chromosomes and re-sorted while testing, but the original files did not work either). I uploaded my files:

https://e.pcloud.link/publink/show?code=XZCJGeZ4hXmsgv6hmLEf71rt8yzRhSgzqWk

What's wrong? :)
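A possible first diagnostic, judging only from the traceback (`cut_per_chrom` builds an interval index from the regions and passes it to `pd.cut`), is to check whether the regions file contains overlapping intervals on the same chromosome and strand. The sketch below assumes standard tab-separated GTF columns (seqname, source, feature, start, end, score, strand) and that PEKA groups regions by chromosome and strand. The grouping is inferred from the variable names in the traceback, not verified against the source, and `find_overlaps` is a hypothetical helper.

```python
from collections import defaultdict


def find_overlaps(gtf_lines):
    """Return overlapping (start, end) pairs keyed by (chrom, strand)."""
    by_key = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        # GTF columns: seqname, source, feature, start, end, score, strand, ...
        by_key[(f[0], f[6])].append((int(f[3]), int(f[4])))
    overlaps = defaultdict(list)
    for key, ivs in by_key.items():
        ivs.sort()
        max_end, max_iv = None, None
        for iv in ivs:
            # GTF ends are inclusive, so start == previous end is an overlap.
            if max_end is not None and iv[0] <= max_end:
                overlaps[key].append((max_iv, iv))
            if max_end is None or iv[1] > max_end:
                max_end, max_iv = iv[1], iv
    return dict(overlaps)


# Hypothetical region lines (made-up coordinates).
demo = [
    'chr1\tx\tintron\t1\t100\t.\t+\t.\tgene_id "a";',
    'chr1\tx\tintron\t90\t200\t.\t+\t.\tgene_id "b";',
    'chr1\tx\tintron\t201\t300\t.\t+\t.\tgene_id "c";',
    'chr1\tx\tintron\t50\t150\t.\t-\t.\tgene_id "d";',
]
found = find_overlaps(demo)
```

If a check like this reports overlaps for the real regions file, that would at least be consistent with the error message.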

PEKA on intronless genomes

When I try to run PEKA I get this message:

"Getting thresholded crosslinks
Thresholding intron
Not able to find any thresholded sites in your sample (NoneType). Exiting."

I have tried this with iCount xlsites and with both iCount peaks and Clippy peaks as inputs. Is this because my GTF has no annotated introns? I have a custom GTF file with only "gene" in the 3rd column.
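To see which region types the GTF actually provides, one could count the values in its third column, as in the minimal sketch below. It assumes a tab-separated GTF. Whether PEKA strictly requires an "intron" entry is only suggested by the "Thresholding intron" log line, not confirmed here, and `count_features` is a hypothetical helper.

```python
from collections import Counter


def count_features(gtf_lines):
    """Count how often each feature type appears in GTF column 3."""
    counts = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Hypothetical gene-only annotation (made-up coordinates).
demo = [
    'chrI\tcustom\tgene\t100\t900\t.\t+\t.\tgene_id "g1";',
    'chrI\tcustom\tgene\t1200\t2000\t.\t-\t.\tgene_id "g2";',
]
features = count_features(demo)
has_intron = "intron" in features
```

For a gene-only file like the one described, `has_intron` is `False`, which would fit the message about finding no thresholded sites while thresholding introns.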

ValueError: Overlapping IntervalIndex is not accepted.

I downloaded the xl and peak files from https://imaps.goodwright.com/collections/868/ and ran:
peka -i tardbp-egfp-hd-hek293-1-20201021-ju_mapped_to_genome_single.bed1 -x tardbp-egfp-hd-hek293-1-20201021-ju_mapped_to_genome_single_peaks.bed1 -g $genome -gi $gfai -r $segment -k 6
But I got this error:
Namespace(alloutputs=False, clusters=5, distalwindow=150, genomefasta='/mnt/1/genome/hg38/hg38.fa', genomeindex='/mnt/1/genome/hg38/hg38.fa.fai', inputpeaks='tardbp-egfp-hd-hek293-1-20201021-ju_mapped_to_genome_single.bed1', inputxlsites='tardbp-egfp-hd-hek293-1-20201021-ju_mapped_to_genome_single_peaks.bed1', kmerlength=6, outputpath='/mnt/9/yuan_jianwen/scTRIBE_project/09.fastaMotif/00.rmPseudo/test', percentile=0.7, regions='/mnt/12/yuan_jianwen/hg38/Homo_sapiens.GRCh38.103.segment.gtf.gz', repeats='unmasked', smoothing=6, specificregion=None, subsample=True, topn=20, window=25)
Getting thresholded crosslinks
Thresholding intron
lenght of df_reg for intron is: 70328
Traceback (most recent call last):
  File "/home/yuan_jianwen/anaconda3/envs/peka/bin/peka", line 8, in <module>
    sys.exit(cli())
  File "/home/yuan_jianwen/anaconda3/envs/peka/bin/peka.py", line 1317, in cli
    subsample,
  File "/home/yuan_jianwen/anaconda3/envs/peka/bin/peka.py", line 1002, in run
    df_txn = get_threshold_sites(sites_file, percentile=percentile)
  File "/home/yuan_jianwen/anaconda3/envs/peka/bin/peka.py", line 445, in get_threshold_sites
    df_cut = cut_sites_with_region(df_reg, df_region)
  File "/home/yuan_jianwen/anaconda3/envs/peka/bin/peka.py", line 376, in cut_sites_with_region
    df_temp = cut_per_chrom(chrom, df_p, df_m, df_region_p, df_region_m)
  File "/home/yuan_jianwen/anaconda3/envs/peka/bin/peka.py", line 363, in cut_per_chrom
    df_xl_p["cut"] = pd.cut(df_xl_p["start"], interval_index_p)
  File "/home/yuan_jianwen/anaconda3/envs/peka/lib/python3.7/site-packages/pandas/core/reshape/tile.py", line 226, in cut
    raise ValueError('Overlapping IntervalIndex is not accepted.')
ValueError: Overlapping IntervalIndex is not accepted.
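Since `pd.cut` refuses bins that overlap, one conceivable workaround is to merge overlapping intervals into disjoint ones before they are used as bins. The sketch below shows only the merging step on plain (start, end) pairs with inclusive ends. Whether merging regions is appropriate for an iCount segmentation file is an open assumption, and `merge_intervals` is a hypothetical helper, not something PEKA provides.

```python
def merge_intervals(intervals):
    """Merge overlapping closed intervals [(start, end), ...] into disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


# Hypothetical coordinates for one chromosome and strand.
demo = [(1, 100), (90, 200), (201, 300), (250, 260)]
result = merge_intervals(demo)
```

In this example the first two intervals collapse into (1, 200) and the last two into (201, 300), so the resulting bins no longer overlap.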

Example input files?

Great work!

I am trying to run PEKA on a dataset we have here. Would it be possible to get an example dataset complete with small example files (see below for the requirements)? It would greatly help to see exactly what kind of input (format) is accepted or required. Thanks! Gregor

required arguments:
  -i INPUTPEAKS, --inputpeaks INPUTPEAKS
                        CLIP peaks (intervals of crosslinks) in BED file
                        format
  -x INPUTXLSITES, --inputxlsites INPUTXLSITES
                        CLIP crosslinks in BED file format
  -g GENOMEFASTA, --genomefasta GENOMEFASTA
                        genome fasta file, ideally the same as was used for
                        read alignment
  -gi GENOMEINDEX, --genomeindex GENOMEINDEX
                        genome fasta index file (.fai)
  -r REGIONS, --regions REGIONS
                        genome segmentation file produced as output of "iCount
                        segment" function
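Until example files are available, a quick sanity check of one's own BED inputs might look like the sketch below. The help text only says "BED file format". The six-column layout checked here (chrom, start, end, name, score, strand) is an assumption based on common BED6 usage, and `check_bed6` is a hypothetical helper.

```python
def check_bed6(lines):
    """Return a list of (line_number, problem) for rows that are not valid BED6."""
    problems = []
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            problems.append((n, "fewer than 6 tab-separated columns"))
            continue
        chrom, start, end, _name, score, strand = fields[:6]
        if not (start.isdigit() and end.isdigit() and int(start) < int(end)):
            problems.append((n, "start/end must be integers with start < end"))
        if strand not in ("+", "-"):
            problems.append((n, "strand must be + or -"))
        try:
            float(score)
        except ValueError:
            problems.append((n, "score is not numeric"))
    return problems


# Hypothetical rows: one valid, one zero-length, one space-separated.
demo = [
    "chr1\t100\t101\t.\t4\t+",
    "chr1\t200\t200\t.\t1\t-",
    "chr1 300 301 . 2 +",
]
issues = check_bed6(demo)
```

Here the second row is flagged for its coordinates and the third for using spaces instead of tabs.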

Question on how to make use of CLIP-Seq biological replicates?

Hi @kkuret
Thank you for your great work and the related bioRxiv paper. I am learning to analyse our peaks and XLS with your PEKA software. I used iCount for the eCLIP-seq analysis, but I am wondering how to make use of the replicates. We designed three replicates. Should I run PEKA on the clusters and peaks from each replicate separately and then intersect the motifs from Rep1, Rep2, and Rep3? Or should I merge the peaks and sum or average the raw XLS BED files, then run a single motif enrichment? Thank you so much.
Linhua
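The first option (run each replicate separately, then intersect) could be sketched as follows. It assumes each replicate's k-mers have already been ranked, for example by PEKA-score, into a list from best to worst. `shared_top_kmers` is a hypothetical helper, and the choice of top 20 is arbitrary.

```python
def shared_top_kmers(ranked_lists, top_n=20):
    """Return k-mers found in the top_n of every replicate, ordered by mean rank."""
    tops = [lst[:top_n] for lst in ranked_lists]
    shared = set(tops[0]).intersection(*tops[1:])

    def mean_rank(kmer):
        # Average zero-based position of the k-mer across replicates.
        return sum(t.index(kmer) for t in tops) / len(tops)

    # Ties in mean rank are broken alphabetically for a stable order.
    return sorted(shared, key=lambda k: (mean_rank(k), k))


# Hypothetical ranked k-mer lists for three replicates.
rep1 = ["UGUGU", "GUGUG", "AAAAA", "CCCCC"]
rep2 = ["GUGUG", "UGUGU", "UUUUU", "AAAAA"]
rep3 = ["UGUGU", "AAAAA", "GUGUG", "GGGGG"]
shared = shared_top_kmers([rep1, rep2, rep3], top_n=3)
```

With these made-up lists, only UGUGU and GUGUG appear in the top three of all replicates. This says nothing about which of the two strategies is preferable for PEKA.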
