Giter Site home page Giter Site logo

tfbs_visualization's Introduction

######################################
# Rebecca Worsley Hunt -- 2014.05.05 #
# Wasserman Lab                      #
# University of British Columbia     #
######################################

README

===========================================================================
R code to generate visualization plots

Visualization_methods_Rcode.R
===========================================================================

If R is not yet installed, visit http://cran.r-project.org

To load the functions within the file: Visualization_methods_Rcode.R, use the command source() e.g. source("/directory1/directory2/Visualization_methods_Rcode.R"), where /directory1/directory2/ is the location (absolute path) of the R Visualization code.

To view the usage of the four plotting functions, with details on function arguments, use the command view.help() (this is the last function provided in the R file). Below is a brief synopsis.

If data files are in one directory, and/or you intend to save output files to the same directory, you can use the setwd() command (set working directory) to avoid typing the absolute path to your files every time you call a Visualization function.
e.g. all your data files are in /Users/yourName/dir1/dir2/. Thus, use: setwd("/Users/yourName/dir1/dir2/") to work within this directory. You can use getwd() command to check that the working directory is set as you expected. R will now automatically use the path specified for retrieving and saving files until you change the path with setwd() or by providing an absolute path to the function arguments. Now, to specify a data file (or to save a plot), you only need to use the name of the file instead of the full path e.g. data.file="testdata_CBplot_Hnf4a_Hepg_200bp.txt" or save.plot.to="CBplot_Hnf4a"

Example:
The Visualization code and examples are in: /directory1/directory2/

   setwd("/directory1/directory2/")
   getwd()
   source("Visualization_methods_Rcode.R")
   view.help()

   landscape.view( data.file="testdata_landscape_or_bimotif_Nfyb_K562_1000bp.txt", col.X=4, col.Y=5)


------------------------------
Composition-bias plot function
------------------------------

function call:  CB.plot( data.file=NA, col.X=0, col.Y=0, col.Y.pval=F, yaxis.lim=F, col.TFnames=0, headerline=T, title.plot="", save.plot.to=F )

Generates a plot with PFMs GC nucleotide composition on the x-axis and motif enrichment scores on the y-axis. The PFM GC nucleotide composition must be provided in the submitted file. We are aware of one motif over-representation software that provides PFM GC composition: oPOSSUM 3.0; however this value can be calculated simply from PFM binding profiles.

Example file provided:

filename = "testdata_CBplot_Hnf4a_Hepg_200bp.txt"

Example function calls:
#provide column numbers
   CB.plot( data.file= filename, col.X=3, col.Y=4  )

#or provide column names; provide TF names
   CB.plot( data.file= filename, col.X="GC.Content", col.Y="Zscore", col.TFnames=1 )


----------------------------
TFBS-landscape view function
----------------------------

function call:  landscape.view( data.file=NA,  col.X=0, col.Y=0, score.type="relative", yaxis.lim1=F, yaxis.lim2=F, headerline=T, title.plot="", save.plot.to=F, threshold=F, heuristic.plot=F , heur.resid.fold=2, heur.score.diff=0.20)

X-axis requires TF motif distances to a point in a sequence, such as a ChIP-Seq peak local maximum (or peak centre), for a set of sequences. The y-axis requires the motif scores for those motifs. Distance measurement (x-axis) is best done from motif centre to peakMax. PWM motif scores can be either the raw score, or the relative score, but the code will not handle p-values. If PWM raw scores are used, they may produce different results than the relative score. Relative scores permit single value hard-coded parameters to be used in the code (such as cutoff values) that are comparable across all PWMs; the hard-coded parameters are not exactly converted to raw score equivalents because the R code could not assume access to the full range of each PWM's raw scores which is necessary to convert the hard-coded parameters. The hard-coded parameters were approximated but will not be a perfect match for each PWM. Use the optional plots (i.e. set 'heuristic.plot=T') which show the heuristic boundaries for motif density (distance from peak local maximum, and motif score threshold) to confirm results.

When the optional arguments 'threshold' or 'heuristic.plot' are set to T (true), the motif enrichment zone boundary and the motif score threshold  values are returned as a vector. They can be stored in a variable, by setting a variable equal to the function call e.g. myresult = landscape.view( data.file, col.X=2, col.Y=5, score.type="relative", headerline=T, threshold=T)
To view the results type the variable name (e.g. myresults) and hit enter/return. If a variable is not set equal to the function, the results will be printed to the screen.


Example file provided:

filename = "testdata_landscape_or_bimotif_Nfyb_K562_1000bp.txt"

Example function calls:
#print results to screen; provide only column numbers
  landscape.view( data.file=filename, col.X=4, col.Y=5 )

#print results to screen; provide column names; set limits to the y-axis; provide a title for the plot
  landscape.view( data.file=alias, col.X="motif1.distancePeakMax", col.Y="motif1.score", yaxis.lim1=0.02, title.plot="Nfyb", threshold=T )

#save results to the variable: myresults
  myresults = landscape.view( data.file=alias, col.X="motif1.distancePeakMax", col.Y="motif1.score" )


---------------------------
TFBS-bi-motif view function
---------------------------

function call: bimotif.view( data.file=NA,  col.X=0, col.Y=0, resolution=5, yaxis.lim=F, headerline=T, title.plot="", save.plot.to=F )

While col.X and col.Y provide the column number for the motif_1 distance to peakMax and motif_2 distance to peakMax respectively within the data.file, the R code will use those distances to calculate and plot the spacing between motif_2 and motif_1 on the x-axis. The distance from motif_1 to the peakMax will be retained for the y-axis. For the user, the initial distance from motif to peakMax is best calculated from the centre of the motif to the peakMax.


Example file provided:

filename = "testdata_landscape_or_bimotif_Nfyb_K562_1000bp.txt"

Example function calls:
#provide column numbers
  bimotif.view( data.file=filename,  col.X=4, col.Y=6 )  

#or provide column names; set resolution; set a plot title
  bimotif.view( filename, col.X="motif1.distancePeakMax", col.Y="motif2.distancePeakMax", resolution=10, title.plot="Nfyb")


--------------------------------------
Dinucleotide-environment view function
--------------------------------------

To obtain the alignment files that are submitted to the Dinucleotide-environment R code, see below: "preVisualization_dinucleotide_alignment_perlCode.pl"

function call: dinucleotide.view( data.file=NA, x1.adj=100, x2.adj=100, yaxis.lim=F, title.plot="", save.plot.to=F )

Plotting this view first requires the use of the provided perl code: preVisualization_dinucleotide_alignment_perlCode.pl
The perl code generates an output file with the extension *.dinuc.align which is submitted to the R function: dinucleotide.view(). If the plot provides a motif alignment that is confusing, check the input into the perl code.

Example file provided:

filename = "testdata_dinucleotideView_Stat1_helas_1000bp.dinuc.align"

Example function call:
  dinucleotide.view( data.file=filename, x1.adj=100, x2.adj=100 )


===========================================================================
perl code to generate dinucleotide alignment files

preVisualization_dinucleotide_alignment_perlCode.pl
===========================================================================

The dinucleotide alignment code, requires that the user have a fasta file for the sequences of interest, and know the location of the feature of interest (e.g. TFBS motif) relative to the start of each sequence. The offset of the feature must be from sequence start (left edge) to feature's left edge (do not use the centre of the feature). If no strand is provided, all features are assumed to be in a 5'-3' orientation (also indicated with +ve sign). If strand is provided, the assumption is that a -ve sign indicates the feature is in a 3'-5' orientation in the sequence and thus the code will reverse compliment the sequence to orient the feature in a 5'-3' direction to align with other 5'-3' features.  

The perl code requires four arguments and has two optional arguments. Required is a fasta file of sequences, a seperate meta file with at least one column providing the offset of the feature start relative to the sequence start (start = 1). There is no expected format for the meta file, therefore the column containing the offset information must be specified (argument 3). The last required argument is an output name for the results file, to which an extension of *.dinuc.align will be appended.  The two optional arguments are necessary when there is an expected orientation to the feature: a column in the metafile for the strand (strand is - or +), and the width of the feature. The width is used to calculate strand-based alignment.

If there are header lines in either of the two files, the lines must start with a number sign (#) to show they are comments and not data.

To view the options, enter on the command line:  

  perl preVisualization_dinucleotide_alignment_perlCode.pl

The following options will be shown:
-f  = fasta seq file
-m  = meta file: motif offset RELATIVE TO START OF SEQ and strand if using it
-p  = column number containing motif offset (a column in -m metafile). First column is 1. 
-o  = output name  (output extension will be: *.dinuc.align)
-s  = (optional) column number containing strand +/- (a column in -m meta file). First columnn is 1.
-w  = (optional) motif 'width' i.e. length of PFM. Necessary for alignment adjustment between +/- strands.


Example files provided:

testdata_dinucleotideView_Stat1_helas_1000bp.fa      # fasta sequence file
testdata_dinucleotideView_Stat1_helas_1000bp.metafile    # meta file with offset and strand

Example command line:

 perl preVisualization_dinucleotide_alignment_perlCode.pl  -f testdata_dinucleotideView_Stat1_helas_1000bp.fa  -m testdata_dinucleotideView_Stat1_helas_1000bp.metafile  -o testdata_Stat1_helas_1000bp  -p 5  -s 6  -w 15 

tfbs_visualization's People

Contributors

rebeccawh avatar amathelier avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.