Giter Site home page Giter Site logo

kalibera / vcfppr Goto Github PK

View Code? Open in Web Editor NEW

This project forked from zilong-li/vcfppr

0.0 1.0 0.0 9.07 MB

Rapid processing of VCF/BCF files in R!

Home Page: https://doi.org/10.1093/bioinformatics/btae049

License: Other

Shell 0.81% C++ 2.38% Perl 1.97% C 90.04% R 0.81% Scilab 0.47% Makefile 1.53% M4 1.12% Roff 0.87%

vcfppr's Introduction

vcfppR: rapid manipulation of the VCF/BCF file

R-CMD-check CRAN status https://github.com/Zilong-Li/vcfppR/releases/latest codecov

The vcfppR package implements various useful functions for manipulating the VCF/BCF file in R using the C++ API of vcfpp.h.

Installation

You can install the development version of vcfppR with the system dependencies of libcurl.

## install.package("vcfppR") ## from CRAN
devtools::install_github("Zilong-Li/vcfppR")

API

There are two classes i.e. vcfreader and vcfwriter offering the full R-bindings of vcfpp.h. Check out the examples in the tests folder or refer to the manual.

?vcfppR::vcfreader

In addtion to two classes for R-bindings of vcfpp.h, there are some useful functions in the package, e.g. vcftable, vcfsummary and vcfpopgen. In the following examples, we use the URL link as filename, which can be directly fed to vcfppR, and the performance will depend on your connection to the servers.

Read VCF as tabular data in R

This example shows how to read only SNP variants with genotypes in the VCF/BCF into R list:

library(vcfppR)
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chr21.filtered.SNV_INDEL_SV_phased_panel.vcf.gz"
res <- vcftable(vcffile, "chr21:1-5100000", vartype = "snps")
str(res)

This example shows how to read only SNP variants with PL format into R list and drop the INFO column in the VCF/BCF:

vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_raw_GT_with_annot/20201028_CCDG_14151_B01_GRM_WGS_2020-08-05_chr21.recalibrated_variants.vcf.gz"
res <- vcftable(vcffile, "chr21:1-5100000", vartype = "snps", format = "PL", info = FALSE)
str(res)

This example shows how to read only indels variants with DP format in the VCF/BCF into R list:

vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_raw_GT_with_annot/20201028_CCDG_14151_B01_GRM_WGS_2020-08-05_chr21.recalibrated_variants.vcf.gz"
res <- vcftable(vcffile, "chr21:1-5100000", vartype = "indels", format = "DP")
str(res)

Variants summarization

This example shows how to summarize small variants discovered by GATK. The data is from the 1000 Genome Project.

vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_raw_GT_with_annot/20201028_CCDG_14151_B01_GRM_WGS_2020-08-05_chr21.recalibrated_variants.vcf.gz"
region <- "chr21:10000000-10010000"
res <- vcfsummary(vcffile, region)
str(res)
# get labels and do plottiing
ped <- read.table("https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/20130606_g1k_3202_samples_ped_population.txt", h=T)
ped <- ped[order(ped$Superpopulation),]
out <- sapply(unique(ped$Superpopulation), function(pop) {
  id <- subset(ped, Superpopulation == pop)[,"SampleID"]
  ord <- match(id, res$samples)
  res$SNP[ord] + res$INDEL[ord]
})

boxplot(out, main = paste0("Average number of SNP & INDEL variants\nin region ", region))

This example shows how to summarize complex structure variants discovered by GATK-SV.

svfile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20210124.SV_Illumina_Integration/1KGP_3202.gatksv_svtools_novelins.freeze_V3.wAF.vcf.gz"
sv <- vcfsummary(svfile, svtype = TRUE, region = "chr20")
str(sv)
allsvs <- sv$summary[-1]
bar <- barplot(allsvs, ylim = c(0, 1.1*max(allsvs)),
               main = "Variant Counts on chr20 (all SVs)")

vcfppr's People

Contributors

zilong-li avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.