NORAD manuscript

MYCN knock-down and overlapping gene signature with NORAD knock-down

Raw sequencing data from MYCN knock-down in SK-N-BE(2) cells (ArrayExpress accession [E-GEOD-84389](https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-84389/), GEO series GSE84389) were mapped to GRCh38 coordinates with HISAT2:

#!/bin/bash

#SBATCH -N1 -n16 --mem-per-cpu=2000 -t4:00:00
#SBATCH --array=1-9
#SBATCH -e hisat-%A_%a.err
#SBATCH -o hisat-%A_%a.out

i=$SLURM_ARRAY_TASK_ID
index=HISAT2_INDEX/grch38_snp_tran/genome_snp_tran
# SRA.txt contains the list of SRA accessions (one per line); pick line $i
sample=$(sed "${i}q;d" SRA.txt)
sam_outdir=PRJNA329050_GSE84389/sam/
mkdir -p "$sam_outdir"
sam="${sam_outdir}${sample}.sam"
hisat2 -p "$SLURM_TASKS_PER_NODE" -x "$index" --sra-acc "$sample" -S "$sam"
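The array job above relies on each task reading one line of SRA.txt. A minimal sketch of that lookup, which can be checked locally without Slurm, follows. The helper name `nth_accession` and the temporary list are assumptions for illustration; the real order of SRA.txt is not stated in this document.

```shell
#!/bin/bash
# Return the i-th line (1-based) of an accession list, the same way the
# array job maps SLURM_ARRAY_TASK_ID to one SRA accession.
nth_accession() {
    local i=$1 list=$2
    sed "${i}q;d" "$list"
}

# Hypothetical local check using three of the accessions that appear in
# the featureCounts call; the real SRA.txt may list them in another order.
demo_list=$(mktemp)
printf '%s\n' SRR3922065 SRR3922066 SRR3922067 > "$demo_list"
for i in 1 2 3; do
    echo "task $i -> $(nth_accession "$i" "$demo_list")"
done
rm -f "$demo_list"
```

Because `--array=1-9` produces task IDs 1 to 9, SRA.txt needs exactly nine lines for every task to receive an accession.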

Counts were obtained with featureCounts:

#!/bin/bash

# -D (--chdir) sets the working directory; lowercase -d would mean --dependency
#SBATCH -D PRJNA329050_GSE84389/sam/
#SBATCH -N1 -n4 --mem-per-cpu=8000 -t4:00:00
#SBATCH -e featureCounts.err
#SBATCH -o featureCounts.out

gtffile=HISAT2_INDEX/grch38_snp_tran/Homo_sapiens.GRCh38.94.gtf
results=all.gene.counts.txt

featureCounts -T 4 -g gene_id  -a $gtffile -o $results SRR3922065.sam SRR3922066.sam SRR3922067.sam SRR3922068.sam SRR3922069.sam SRR3922070.sam SRR3922071.sam SRR3922072.sam SRR3922073.sam
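The featureCounts output starts with a `#` line recording the command, followed by a header and the columns Geneid, Chr, Start, End, Strand and Length before the per-sample counts. As a sketch of how a plain count matrix could be pulled out of all.gene.counts.txt for downstream analysis (the helper name `counts_matrix` and the mock table are assumptions, not part of the original pipeline):

```shell
#!/bin/bash
# Drop the leading '#' command line and the annotation columns (Chr, Start,
# End, Strand, Length), keeping Geneid plus one count column per SAM file.
counts_matrix() {
    grep -v '^#' "$1" | cut -f1,7-
}

# Tiny mock of a featureCounts table; gene ID and counts are made up.
demo=$(mktemp)
printf '# Program:featureCounts; Command:...\n' > "$demo"
printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\tSRR3922065.sam\tSRR3922066.sam\n' >> "$demo"
printf 'geneA\tchr1\t1\t100\t+\t100\t12\t7\n' >> "$demo"
counts_matrix "$demo"
rm -f "$demo"
```

On the real file this would be called as `counts_matrix all.gene.counts.txt > counts.tsv`, where counts.tsv is a hypothetical output name.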

Code for gene set enrichment, differential expression and the overlapping gene signature is available in the script MYCN.NORAD.knockdown.BE2.R. Differential expression after NORAD knock-down was based on the data in NORAD.knockdown.CPM.BE2.txt from the neuroblastoma cell line SK-N-BE(2)c. The differential expression results were later compared with the hg19-based differential expression results taken from the Excel sheet "TEAD4-MYCN KD DGE signature" in Table S7 ("RNA-seq derived signatures and pathway analysis from shTEAD4, shMYCN and shWWTR1") of the article "Cross-Cohort Analysis Identifies a TEAD4–MYCN Positive Feedback Loop as the Core Regulatory Element of High-Risk Neuroblastoma". Pearson's product-moment correlation between the hg19 and hg38 adjusted p-values was r = 0.96 (p < 2.2e-16; 95% CI 0.959 to 0.962).
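To make the hg19 versus hg38 comparison concrete, the sketch below computes Pearson's r from a two-column table. It assumes a hypothetical tab-separated file with one gene per line, the hg19 adjusted p-value in column 1 and the hg38 adjusted p-value in column 2. The actual analysis lives in MYCN.NORAD.knockdown.BE2.R and may be implemented differently; this version does not compute the p-value or confidence interval.

```shell
#!/bin/bash
# Pearson's product-moment correlation of columns 1 and 2 of a
# tab-separated file, printed with four decimals ("NA" if undefined).
pearson_r() {
    awk -F'\t' '
        { n++; sx += $1; sy += $2; sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2 }
        END {
            if (n < 2) { print "NA"; exit }
            num = n * sxy - sx * sy
            den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
            if (den == 0) {
                print "NA"
            } else {
                printf "%.4f\n", num / den
            }
        }' "$1"
}

# Made-up adjusted p-values for four genes (hg19 <TAB> hg38).
demo=$(mktemp)
printf '0.001\t0.002\n0.040\t0.050\n0.300\t0.280\n0.900\t0.950\n' > "$demo"
pearson_r "$demo"
rm -f "$demo"
```

Genes must be matched between the two result tables before the columns are paired, since the correlation is only meaningful for rows that refer to the same gene.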

ENCODE project and identification of proteins, transcription factors and histones binding to NORAD

To identify proteins binding to the NORAD lncRNA, human RNA-binding (eCLIP) data were downloaded as BED narrowPeak files from [ENCODE](https://www.encodeproject.org/) and restricted to peaks within the NORAD gene:

library(doParallel)
library(foreach)
library(data.table)

# NORAD gene coordinates on chr20 (the same values used for bigWigToBedGraph)
NORAD.coord <- list(chrom = "chr20",
                    hg19.start = 34633544, hg19.end = 34638882,
                    hg38.start = 36045622, hg38.end = 36050960)

metadata <- fread("https://www.encodeproject.org/metadata/type=Experiment&status=released&assay_slims=RNA+binding&assay_title=eCLIP/metadata.tsv")
metadata <- metadata[`File format` == "bed narrowPeak"]

cores <- detectCores(); cl <- makeCluster(cores); registerDoParallel(cl)

temp <- foreach(i = seq_len(nrow(metadata)), .packages = "data.table") %dopar% {
    # Stream each peak file, decompress it and read it as a table
    peaks <- fread(cmd = paste("wget -nc -O -", metadata$`File download URL`[i], "| gzip -d"))

    if (ncol(peaks) == 10 && nrow(peaks) > 0) {
        colnames(peaks)[1:10] <- c("chrom", "chromStart", "chromEnd", "name", "score", "strand", "signalValue", "pValue", "qValue", "peak")
        # Keep only peaks on the NORAD chromosome that lie entirely within the gene
        peaks <- peaks[chrom == NORAD.coord$chrom]
        if (metadata$Assembly[i] == "hg19") peaks <- peaks[chromStart >= NORAD.coord$hg19.start & chromEnd <= NORAD.coord$hg19.end]
        if (metadata$Assembly[i] == "GRCh38") peaks <- peaks[chromStart >= NORAD.coord$hg38.start & chromEnd <= NORAD.coord$hg38.end]

        if (nrow(peaks) > 0) {
            # Annotate the surviving peaks with their source experiment
            peaks$Assembly <- metadata$Assembly[i]
            peaks$CellLine <- metadata$`Biosample term name`[i]
            peaks$metadata.index <- i
        }
    }
    peaks
}
stopCluster(cl)
NORAD.ENCODE <- rbindlist(temp, fill = TRUE)

The ENCODE narrowPeak (Narrow or Point-Source Peaks) format is used to provide called peaks of signal enrichment based on pooled, normalized (interpreted) data. It is a BED6+4 format with the following columns:

   1. chrom - Name of the chromosome (or contig, scaffold, etc.).
   2. chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
   3. chromEnd - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
   4. name - Name given to a region (preferably unique). Use '.' if no name is assigned.
   5. score - Indicates how dark the peak will be displayed in the browser (0-1000). If all scores were '0' when the data were submitted to the DCC, the DCC assigned scores 1-1000 based on signal value. Ideally the average signalValue per base spread is between 100-1000.
   6. strand - +/- to denote strand or orientation (whenever applicable). Use '.' if no orientation is assigned.
   7. signalValue - Measurement of overall (usually, average) enrichment for the region.
   8. pValue - Measurement of statistical significance (-log10). Use -1 if no pValue is assigned.
   9. qValue - Measurement of statistical significance using false discovery rate (-log10). Use -1 if no qValue is assigned.
   10. peak - Point-source called for this peak; 0-based offset from chromStart. Use -1 if no point-source called.
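Given the column layout above, restricting a narrowPeak file to one gene amounts to a filter on columns 1 to 3. The sketch below assumes an already decompressed, tab-separated file. The function name `norad_peaks` and the demo peaks are made up; the chr20 chromosome name and the coordinates are the ones given for bigWigToBedGraph in this document.

```shell
#!/bin/bash
# Print narrowPeak records on chr20 that lie entirely inside [start, end].
# $1 = narrowPeak file, $2 = region start, $3 = region end
norad_peaks() {
    awk -F'\t' -v s="$2" -v e="$3" \
        'NF == 10 && $1 == "chr20" && $2 >= s && $3 <= e' "$1"
}

# Three invented peaks: inside NORAD, wrong chromosome, outside the gene.
demo=$(mktemp)
printf 'chr20\t36046000\t36046050\tpeak_demo1\t1000\t-\t5.1\t8.2\t-1\t-1\n' >  "$demo"
printf 'chr1\t36046000\t36046050\tpeak_demo2\t1000\t+\t5.1\t8.2\t-1\t-1\n'  >> "$demo"
printf 'chr20\t100\t150\tpeak_demo3\t1000\t-\t5.1\t8.2\t-1\t-1\n'           >> "$demo"
norad_peaks "$demo" 36045622 36050960   # GRCh38 NORAD coordinates
rm -f "$demo"
```

Requiring `chromStart >= start` and `chromEnd <= end` keeps only peaks fully contained in the gene; peaks that merely overlap a gene boundary would need the conditions `chromEnd > start` and `chromStart < end` instead.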

For visual inspection, all accessions were inspected within the NORAD gene by extracting the region with bigWigToBedGraph -chrom=chr20 -start=36045622 -end=36050960. The resulting bedGraph files were then manually inspected with the plotBedgraph function from the R package Sushi. The same pipeline was followed for the NORAD hg19 coordinates (-chrom=chr20 -start=34633544 -end=34638882).

The same pipeline was also followed for ChIP-seq data, except that the region was extended to include 500 bp upstream and 2000 bp downstream of the NORAD transcriptional start site (TSS).
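Since the two assemblies need different coordinates, a small wrapper can keep them in one place. The sketch below only prints the bigWigToBedGraph command for a given assembly (a dry run), so it can be reviewed before anything is executed. The function name and the file names in the usage lines are hypothetical.

```shell
#!/bin/bash
# Print the bigWigToBedGraph command that extracts the NORAD region.
# $1 = assembly (GRCh38 or hg19), $2 = input bigWig, $3 = output bedGraph
norad_bedgraph_cmd() {
    local start end
    case "$1" in
        GRCh38) start=36045622; end=36050960 ;;
        hg19)   start=34633544; end=34638882 ;;
        *) echo "unknown assembly: $1" >&2; return 1 ;;
    esac
    echo "bigWigToBedGraph -chrom=chr20 -start=$start -end=$end $2 $3"
}

# Dry run with placeholder file names (not real ENCODE accessions).
norad_bedgraph_cmd GRCh38 example_hg38.bigWig example_hg38.bedGraph
norad_bedgraph_cmd hg19   example_hg19.bigWig example_hg19.bedGraph
```

Each printed line can then be run as is once the file names point at real bigWig files.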

Profiling 3' UTR in SK-N-BE(2)c and HCT116 after NORAD knock-down

Profiling of effective length within 3' UTRs was done using Salmon. The effective length is the computed effective length of the target transcript. It takes into account all factors being modeled that affect the probability of sampling fragments from this transcript, including the fragment length distribution and sequence-specific and fragment GC bias. A Salmon index was built from the sequences of all transcripts together with their 5' and 3' UTRs, using the FASTA file generated by getUTR.sequence.R. Raw FASTQ files from both SK-N-BE(2)c and HCT116 after NORAD knock-down were then quantified with the script salmon.sh. The effective length change was computed as the effective length in NORAD knock-down cells minus the effective length in control-treated cells.
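As an illustration of the last step, the sketch below subtracts the control EffectiveLength from the knock-down EffectiveLength for every target present in both Salmon quant.sf files (columns Name, Length, EffectiveLength, TPM, NumReads). The function name, the file layout and the demo values are assumptions; how salmon.sh and the downstream scripts organise their output is not described here.

```shell
#!/bin/bash
# Effective length change per target: knock-down minus control.
# $1 = knock-down quant.sf, $2 = control quant.sf
efflen_change() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1  { next }                    # skip the header of each file
        NR == FNR { kd[$1] = $3; next }       # first file: store knock-down values
        ($1 in kd) { print $1, kd[$1] - $3 }  # second file: subtract control
    ' "$1" "$2"
}

# Two mock quant.sf files with invented target names and values.
kd=$(mktemp); ctrl=$(mktemp)
printf 'Name\tLength\tEffectiveLength\tTPM\tNumReads\n' | tee "$kd" > "$ctrl"
printf 'utr3_demoA\t1500\t1200.000\t10.0\t300\n' >> "$kd"
printf 'utr3_demoA\t1500\t1000.000\t12.0\t320\n' >> "$ctrl"
efflen_change "$kd" "$ctrl"
rm -f "$kd" "$ctrl"
```

A positive value means the effective length is larger after NORAD knock-down than in control-treated cells; targets missing from either file are skipped.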

Artwork presentation

Inkscape was used in the development of the artwork associated with the manuscript.
