bigbio / py-pgatk

Python tools for proteogenomics analysis toolkit

License: Apache License 2.0

Python 99.73% Dockerfile 0.27%
python proteogenomics ensembl proteomics mass-spectrometry vcf proteogenomics-analysis-toolkit

py-pgatk's Introduction

ProteoGenomics Analysis Toolkit

pypgatk is a Python library - part of the ProteoGenomics Analysis Toolkit. It provides different bioinformatics tools for proteogenomics data analysis.

Requirements:

The package requirements vary depending on the way that you want to install it (you need one of the following):

  • pip: if installation goes through pip, you will require Python3 and pip3 installed.
  • Bioconda: if installation goes through Bioconda, you will require that conda is installed and configured to use bioconda channels.
  • Docker container: to use pypgatk from its docker container you will need Docker installed.
  • Source code: to use and install from the source code directly, you will need to have git, Python3 and pip.

Installation

pip

You can install pypgatk with pip:

pip install pypgatk

Bioconda

You can install pypgatk with bioconda (please set up conda and the bioconda channel first if you haven't already, as explained here):

conda install pypgatk

Available as a container

You can use the pypgatk tool already set up in a Docker container. Choose one of the available tags here and substitute it for <tag> in the call below.

docker pull quay.io/biocontainers/pypgatk:<tag>

NOTE: Biocontainers containers do not have a latest tag, so a docker pull/run without an explicit tag will fail. For instance, a valid call would be (for version 0.0.2):

docker run -it quay.io/biocontainers/pypgatk:0.0.2--py_0

Inside the container, you can either use the Python interactive shell or the command line version (see below).

Use latest source code

Alternatively, for the latest version, clone this repo and go into its directory, then execute pip3 install . :

git clone https://github.com/bigbio/py-pgatk
cd py-pgatk
# you might want to create a virtualenv for pypgatk before installing
pip3 install .

Usage

The pypgatk design combines multiple modules and tools into one framework. All the possible commands are accessible using the commandline tool pypgatk_cli.py.

The library provides multiple commands to download, translate and generate protein sequence databases from reference and mutation genome databases.

$: pypgatk_cli -h

Usage: pypgatk [OPTIONS] COMMAND [ARGS]...

  This is the main tool that give access to all commands and options
  provided by the pypgatk

Options:
  --version   Show the version and exit.
  -h, --help  Show this message and exit.

Commands:
  cbioportal-downloader    Command to download the the cbioportal studies
  cbioportal-to-proteindb  Command to translate cbioportal mutation data into
                           proteindb
  cosmic-downloader        Command to download the cosmic mutation database
  cosmic-to-proteindb      Command to translate Cosmic mutation data into
                           proteindb
  dnaseq-to-proteindb      Generate peptides based on DNA sequences
  ensembl-check            Command to check ensembl database for stop codons,
                           gaps
  ensembl-downloader       Command to download the ensembl information
  generate-decoy           Create decoy protein sequences using multiple
                           methods DecoyPYrat, Reverse/Shuffled Proteins.
  generate-deeplc          Generate input for deepLC tool from idXML,mzTab or
                           consensusXML
  msrescore-configuration  Command to generate the msrescore configuration
                           file from idXML
  peptide-class-fdr        Command to compute the Peptide class FDR
  threeframe-translation   Command to perform 3'frame translation
  vcf-to-proteindb         Generate peptides based on DNA variants VCF files

Full Documentation

https://pgatk.readthedocs.io/en/latest/pypgatk.html

Cite as

Husen M Umer, Enrique Audain, Yafeng Zhu, Julianus Pfeuffer, Timo Sachsenberg, Janne Lehtiö, Rui M Branca, Yasset Perez-Riverol. Generation of ENSEMBL-based proteogenomics databases boosts the identification of non-canonical peptides. Bioinformatics, Volume 38, Issue 5, 1 March 2022, Pages 1470–1472. https://doi.org/10.1093/bioinformatics/btab838

py-pgatk's People

Contributors

dependabot[bot], dongdongdongw, enriquea, husensofteng, ypriverol

py-pgatk's Issues

Ensembl release

Allow users to specify the release number to be used in ensembl-downloader. This would enable using older genomes (such as GRCh37) and previous gene annotations.

implement spectrumAI in pypgatk

SpectrumAI (https://github.com/yafeng/SpectrumAI) is a tool that detects the corresponding b and y ions for a specific mutation. The original algorithm was implemented in R, but for better integration with the quantms pipeline and pypgatk it would be great to have an implementation in Python.

I suggest the following structure:

The command-line tool consumes a TSV file with the following format:

canonical peptide | variant peptide | canonical aa | variant aa | position | spectra file | scan
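A minimal sketch of reading such a file, assuming it is tab-separated with no header row and that position and scan are integers. The column keys, the `read_variant_table` helper and the example row are hypothetical and only illustrate the proposed layout; they are not part of pypgatk or SpectrumAI.

```python
import csv
import io

# Hypothetical column keys mirroring the proposed layout.
COLUMNS = [
    "canonical_peptide", "variant_peptide", "canonical_aa",
    "variant_aa", "position", "spectra_file", "scan",
]


def read_variant_table(handle):
    """Yield one dict per row of a headerless, tab-separated variant table."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        # Assumption: position and scan are stored as plain integers.
        row["position"] = int(row["position"])
        row["scan"] = int(row["scan"])
        yield row


# Made-up example row, used only to exercise the parser.
example = io.StringIO("DFPIANGER\tDFPIANGEK\tR\tK\t9\trun1.mzML\t1520\n")
rows = list(read_variant_table(example))
print(rows[0]["variant_peptide"], rows[0]["position"], rows[0]["scan"])
```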

Instead of using the code to generate the theoretical spectra, I suggest using the OpenMS function for that:

example:

from pyopenms import AASequence, MSSpectrum, Param, TheoreticalSpectrumGenerator

tsg = TheoreticalSpectrumGenerator()
spec1 = MSSpectrum()
peptide = AASequence.fromString("DFPIANGER")
# standard behavior is adding b- and y-ions of charge 1
p = Param()
p.setValue("add_b_ions", "false")
p.setValue("add_metainfo", "true")
tsg.setParameters(p)
tsg.getSpectrum(spec1, peptide, 1, 1) # charge range 1:1

# Iterate over annotated ions and their masses
print("Spectrum 1 of", peptide, "has", spec1.size(), "peaks.")
for ion, peak in zip(spec1.getStringDataArrays()[0], spec1):
    print(ion.decode(), "is generated at m/z", peak.getMZ())

reference: https://pyopenms.readthedocs.io/en/latest/theoreticalspectrumgenerator.html

@husensofteng can you provide an example in this format of a valid variant and a wrong variant, including the mzML file?

Web application containing all databases and URLs to be downloaded

We need a simple web application that allows users to download specific precomputed databases depending on their needs. Ideally it would offer a customized form that exposes the parameters of the Nextflow workflow from the previous issue and creates the database on the fly in our cloud Kubernetes cluster.

Incompatible COSMIC mutation records

Some mutation records from the COSMIC database are not parsable by the cosmic-to-proteindb command due to complexity of the variants and incompatible records.

e.g. in the following record, it is not clear from the designated columns what the alternative allele is (c.? and p.R882(H^C)):

DNMT3A ENST00000264709.7 2739 2978 2646118 2646118 2506326 haematopoietic_and_lymphoid_tissue NS NS NS haematopoietic_neoplasm acute_myeloid_leukaemia NS NS n COSM6498615 178530426 c.? p.R882(H^C) Substitution - Missense - - Variant of unknown origin 25858894 blood-bone marrow primary

Also, in this record the mutation is written as c.? and p.614_615>21:

FLT3 ENST00000241453.11 2982 3765 1291198 1291198 1202297 haematopoietic_and_lymphoid_tissue NS NS NS haematopoietic_neoplasm acute_myeloid_leukaemia M3 NS n COSM36079 178534521 c.? p.614_615>21 Complex - insertion inframe het - - Variant of unknown origin 9305596 blood-bone marrow NS

vcf-to-proteindb returned incorrect amino acid sequences

Dear developers of py-pgatk,

I am trying to obtain protein sequences from a TCGA MAF file. I converted the MAF file to VCF with maf2vcf.pl and then ran vcf-to-proteindb. For this I also downloaded the GDC reference files: the GRCh38.d1.vd1 reference sequence and the GDC.h38 GENCODE v36 GTF.
It returned incorrect amino acid sequences for some mutations.

This is the command I ran:
python ../py-pgatk/pypgatk/pypgatk_cli.py vcf-to-proteindb --vcf TCGA-06-A5U1-01A-11D-A33T-08.vcf --input_fasta input_fasta.fa --gene_annotations_gtf gencode.v36.annotation.gtf --annotation_field_name '' --output_proteindb var_peptides.fa

How can I fix this problem?

Thanks.

Cleaning the code

Currently, the dependency list in the package is large. We need to clean the dependencies to only those packages needed to run the tool.

  • Clean the dependency list
  • Upgrade the packages to the latest versions if possible.
  • Support python 3.8 if possible.

Download gnomAD variants and matching fasta

Translate gnomAD control variants:
Input:
gene_annotations_gtf: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
genome_fasta: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/GRCh37.p13.genome.fa.gz
vep_annotated_vcf: https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/genomes/gnomad.genomes.r2.1.1.sites.vcf.bgz

Input parameters to be set for VCFtoProteinDB.py to process gnomAD VCF file:
--transcript_index 6
--annotation_field_name vep
--AF_field controls_AF
--biotype_str transcript_type

Implementation of Class FDR

Class FDR is used to filter peptide identifications / PSMs using the target/decoy approach within the class they represent, e.g. pseudo_X. The method should allow filtering by a specific peptide class such as pseudo_ or COSMIC_, or by a group of classes such as mutations, which groups a set of classes [COSMIC, cbio].

The class-FDR should be implemented for the following file formats:

  • idXML
  • Triqler
  • MSstats: note that this file format does not contain the search score for the peptide, which makes it impossible to perform the FDR calculation.
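A minimal sketch of a class-specific target/decoy FDR, assuming each PSM carries a score where higher is better, a decoy flag, and an accession whose prefix names its class. The `class_fdr` and `prefix_class` helpers and the example PSMs are hypothetical; this is a generic q-value calculation, not the pypgatk implementation.

```python
from collections import defaultdict


def class_fdr(psms, class_of):
    """Return (psm, q_value) pairs with FDR estimated separately per class.

    psms: iterable of dicts with 'accession', 'score' and 'is_decoy' keys.
    class_of: function mapping an accession to a class label.
    """
    by_class = defaultdict(list)
    for psm in psms:
        by_class[class_of(psm["accession"])].append(psm)

    results = []
    for members in by_class.values():
        ranked = sorted(members, key=lambda p: p["score"], reverse=True)
        targets = decoys = 0
        fdrs = []
        for psm in ranked:
            if psm["is_decoy"]:
                decoys += 1
            else:
                targets += 1
            # Estimated FDR at this score threshold: decoys / targets.
            fdrs.append(decoys / max(targets, 1))
        # q-value: the lowest FDR at this rank or any lower-scoring rank.
        running_min = float("inf")
        qvalues = []
        for fdr in reversed(fdrs):
            running_min = min(running_min, fdr)
            qvalues.append(running_min)
        qvalues.reverse()
        results.extend(zip(ranked, qvalues))
    return results


def prefix_class(accession):
    # Assumption: the class is the text before the first underscore.
    return accession.split("_", 1)[0]


# Made-up PSMs used only to exercise the function.
psms = [
    {"accession": "COSMIC_1", "score": 10, "is_decoy": False},
    {"accession": "COSMIC_2", "score": 8, "is_decoy": True},
    {"accession": "COSMIC_3", "score": 6, "is_decoy": False},
    {"accession": "pseudo_1", "score": 9, "is_decoy": False},
]
qvalues = {p["accession"]: q for p, q in class_fdr(psms, prefix_class)}
print(qvalues)
```

To filter by a group of classes, `class_of` could map several prefixes (for example COSMIC and cbio) to one shared label such as mutations.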

Documentation improvement

@husensofteng it would probably be good to add a table in the documentation listing all the possible values of each parameter, for example biotypes and translation tables. This can help users.

move pepBedR to python.

The current version of pepBedR is in R, but we will probably need a Python version inside py-pgatk.

Generating protein DB from custom VCF fails

Hi @ypriverol and @husensofteng ,

I followed the example in the documentation (https://pgatk.readthedocs.io/en/latest/pypgatk.html#variants-vcf-to-protein-sequences) and tried to generate a protein DB from a custom VEP-annotated VCF.
The command exits without error; however, the output is empty.

Here is what I'm running:

python pypgatk_cli.py vcf-to-proteindb --vcf vcf_vep/veped_snp_indel.sorted.vcf \
                                    --input_fasta ensembl_files/Homo_sapiens.GRCh38.cdna.all.fa \
                                    --gene_annotations_gtf ensembl_files/Homo_sapiens.GRCh38.104.gtf \
                                    --annotation_field_name "" \
                                    --output_proteindb var_peptides.fa

Terminal output:

2021-10-12 14:24:44,956 - INFO - Committing changes: 3146000 features
2021-10-12 14:24:46,059 - INFO - Populating features table and first-order relations: 3146131 features
2021-10-12 14:24:46,063 - INFO - Creating relations(parent) index
2021-10-12 14:24:50,761 - INFO - Creating relations(child) index
2021-10-12 14:24:53,919 - INFO - Creating features(featuretype) index
2021-10-12 14:24:58,641 - INFO - Creating features (seqid, start, end) index
2021-10-12 14:25:01,295 - INFO - Creating features (seqid, start, end, strand) index
2021-10-12 14:25:04,036 - INFO - Running ANALYSE features

But there are no records in the output fasta file.
Any idea?

Thanks

Error from ENSEMBL URL retry

  Traceback (most recent call last):
    File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
      timeout=timeout
    File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 641, in urlopen
      _stacktrace=sys.exc_info()[2])
    File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 399, in increment
      raise MaxRetryError(_pool, url, error or ResponseError(cause))
  urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='rest.ensembl.org', port=80): Max retries exceeded with url: /info/species (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2abe03a8f4e0>: Failed to establish a new connection: [Errno -2] Name or service not known'))

Download population variants from ENSEMBL

get SpeciesName_incl_consequences.vcf.gz from ENSEMBL:

file path:
ftp://ftp.ensembl.org/pub/current_variation/vcf/*/*_incl_consequences*.vcf.gz

Only for human is the file split by chromosome; for other species it is a single file covering all chromosomes.

GTF to DNA seq

Generate DNA fasta for transcripts in a given GTF file

Input:
file: genome fasta
file: GTF

Output:
file: DNA fasta sequences for each transcript from the GTF
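A minimal sketch of the core step, assuming the genome is already loaded into a dict and the exon coordinates of one transcript have been read from the GTF (1-based, inclusive). The `transcript_sequence` helper and the toy genome are hypothetical; parsing the GTF and FASTA files is left out.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def transcript_sequence(genome, chrom, exons, strand):
    """Concatenate the exon sequences of one transcript.

    genome: dict mapping chromosome name to its sequence.
    exons: list of (start, end) pairs, 1-based and inclusive as in GTF.
    strand: '+' or '-'.
    """
    chrom_seq = genome[chrom]
    # Join exons in genomic order, then reverse-complement minus-strand transcripts.
    spliced = "".join(chrom_seq[start - 1:end] for start, end in sorted(exons))
    if strand == "-":
        spliced = spliced.translate(COMPLEMENT)[::-1]
    return spliced


# Toy genome used only to exercise the function.
genome = {"chr1": "AAACCCGGGTTT"}
plus = transcript_sequence(genome, "chr1", [(1, 3), (7, 9)], "+")
minus = transcript_sequence(genome, "chr1", [(1, 3), (7, 9)], "-")
print(plus, minus)
```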

Header of the proteins for search engines

@husensofteng the protein headers should be modified so that search engines can understand them. SearchGUI doesn't understand the ENSEMBL ids. We probably need to move to the following header (
https://github.com/compomics/searchgui/wiki/DatabaseHelp#non-standard-fasta):

>generic[your tag]|[protein accession]|[protein description]

or 

>generic[your tag]|[protein accession]

Note that [your tag] can be empty.

Examples:

>generic_contig-535081|AC:123132|Hypothetical protein
>generic|AC:123132|Hypothetical protein
>generic|AC:123132
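A minimal sketch of building headers in this layout; the `generic_header` helper is a hypothetical name, and it only reproduces the three example shapes shown above.

```python
def generic_header(accession, description=None, tag=""):
    """Build a generic FASTA header line; tag and description are optional."""
    parts = ["generic" + tag, accession]
    if description:
        parts.append(description)
    return ">" + "|".join(parts)


print(generic_header("AC:123132", "Hypothetical protein", "_contig-535081"))
print(generic_header("AC:123132", "Hypothetical protein"))
print(generic_header("AC:123132"))
```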

Cosmic mutations for celllines

@husensofteng :

I have implemented the download of the COSMIC cell-line mutations file (03fccf4). It would be great if we could implement:

  • The conversion to proteinDB, with tests.
  • The filter by sample name, which is the cancer cell line used. This can be used in the same way as the tissue filter in the tumor mutations file.

wrong cosmic proteins output

Hey,
I've used py-pgatk and I find it a very useful tool. I took a deep look at some COSMIC proteins and I believe the code has an undesired effect.

file: cgenomes_proteindb.py, line: 71

 @staticmethod
  def get_mut_pro_seq(snp, seq):
    nucleotide = ["A", "T", "C", "G"]
    mut_pro_seq = ""
    # problem with the line below
    if "?" not in snp.dna_mut:  # unambiguous DNA change known in CDS sequence

The problem

Since snp.dna_mut is the CDS mutation reported by COSMIC, it can contain 5'UTR, intronic and 3'UTR references.

[image; source doi: 10.2353/jmoldx.2007.060081]

Since the fasta file contains CDS sequences, which lack 5'UTRs, introns and 3'UTRs, line 71 should be modified to

if "?" not in snp.dna_mut and "?" not in snp.aa_mut :
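The proposed condition can be tried in isolation. This is a minimal sketch that wraps it in a hypothetical `has_unambiguous_change` helper taking the two strings directly instead of the `snp` object; the second example record is illustrative only.

```python
def has_unambiguous_change(dna_mut, aa_mut):
    """Proposed guard: both the CDS change and the protein change must be known."""
    return "?" not in dna_mut and "?" not in aa_mut


# Record from this issue: 3'UTR deletion with an unknown protein change.
utr_record = has_unambiguous_change("c.*185_*209del", "p.?")
# Illustrative record with both changes specified.
coding_record = has_unambiguous_change("c.2644C>T", "p.R882C")
print(utr_record, coding_record)
```

With this guard, the UTR record is skipped because its protein change is unknown, while a fully specified record still passes.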

Example

The COSMIC mutation COSV52350905 for transcript RNASEK_ENST00000549393 (ENST00000549393.2) has a CDS mutation c.*185_*209del and an unknown protein change p.?:Unknown

current behavior

original CDS

RNASEK_ENST00000549393 ENST00000549393.2 17:7012648-7014519(+)
atgaagaagtgccggttctccctcccctcttccgcactgtcccgtgatgatgacgcctccagagaggacgataatctgggttcctgggagagatggcttggtcactattcccacccttgcctcgaccacttgtctcaatgtcaccacctcacgccctgttccaggtggctgagtccgaatccaa taatgctcggaatatttttcaatgt ccattccgctgtgttgattgaggacgttcccttcacggagaaagattttgagaatggcccccagaacatatacaacctttacgagcaagtcagctacaactgtttcatcgctgcaggcctttacctcctcctcggaggcttctctttctgccaagttcggctcaataagcgcaaggaatacatggtgcgctag

wrongly mutated CDS since the mutation is in the UTR region

RNASEK_ENST00000549393 ENST00000549393.2 17:7012648-7014519(+)
atgaagaagtgccggttctccctcccctcttccgcactgtcccgtgatgatgacgcctccagagaggacgataatctgggttcctgggagagatggcttggtcactattcccacccttgcctcgaccacttgtctcaatgtcaccacctcacgccctgttccaggtggctgagtccgaatccaataatgctcggaatatttttcaatgtccattccgctgtgttgattgaggacgttcccttcacggagaaagattttgagaatggcccccagaacatatacaacctttacgagcaagtcagctacaactgtttcatcgctgcaggcctttacctcctcctcggaggcttctctttctgccaagttcggctcaataagcgcaaggaatacatggtgcgctag

wrong protein

COSV52350905:RNASEK_ENST00000549393:ENST00000549393.2:c.*185_*209del:p.?:Unknown
MKKCRFSLPSSALSRDDDASREDDNLGSWERWLGHYSHPCLDHLSQCHHLTPCSRWLSPNPTIPLC*LRTFPSRRKILRMAPRTYTTFTSKSATTVSSLQAFTSSSEASLSAKFGSISARNTWCA

Thank you.

gffread fasta generated file uses space and semicolon

Prior to version v0.12.6, gffread used a space to separate features in the FASTA header. From v0.12.6 onwards, a space separates the CDS field whereas a semicolon separates the other features. Currently, transcript_description_sep is used to specify the separator, but this has to be fixed here to also allow for the space produced by the newer version of gffread.

Alt ORFs from protein sequences

Extract an alternative open reading frame for a given transcript.

Input:
string: Transcript ID
file: Canonical proteins fasta
file: GTF
file: Genome DNA fasta

Output:
str: Proteins sequence containing the translated transcript sequence
str: record ID

Post-filter peptides

Filter a list of peptides based on specified filters

Input:
int: minimum length accepted peptide
int: maximum length accepted peptide
list: reference peptides to filter against (default null)
list: variant peptides to keep if not found in reference peptides (default null, only used when variant peptides are generated).

Output:
list: peptide sequences
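A minimal sketch of such a filter, assuming peptides are plain strings and that the reference list is optional. The `filter_peptides` helper and the example peptides are hypothetical.

```python
def filter_peptides(peptides, min_length, max_length, reference=None):
    """Keep peptides within the length bounds that are absent from the reference."""
    reference_set = set(reference or [])
    return [
        peptide for peptide in peptides
        if min_length <= len(peptide) <= max_length
        and peptide not in reference_set
    ]


# Made-up peptides: one too short, one present in the reference, one kept.
kept = filter_peptides(["PEPTIDEK", "AK", "SAMPLER"], 6, 30, ["SAMPLER"])
print(kept)
```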

Create a Nextflow workflow to create custom databases

We would like to have a Nextflow workflow whose parameters define which custom databases will be added:

Parameters:

  • Taxonomy (mandatory)
  • Include (lncRNA, sncRNA, ..): We basically need to expose in the workflow the parameter --include_biotypes
  • Include cancer mutations.
  • Tissue (This can be used in combination with the previous parameter).

Fasta input for vcf-to-proteindb

Hi,
I'm trying to use your package to translate VCF files into mutated protein sequences. I don't quite understand how to generate the FASTA file. Could you clarify what the arguments to the following command should be:

gffread -F -w input_fasta.fa -g genome.fa gene_annotations_gtf

Thanks!

changing values in the config files have no effect

The values are not used as specified in the configuration file!

For example, changing the --num_orfs parameter in config/ensembl_config.yaml has no effect.

pypgatk_cli.py dnaseq-to-proteindb \
    --config_file config/ensembl_config.yaml \
    --input_fasta Meleagris_gallopavo.Turkey_5.1.106.fa \
    --output_proteindb lncRNAs.fa \
    --include_biotypes lncRNA

Produces 1703*3 = 5109 proteins, since there are 1703 lncRNA transcripts in the fasta file and num_orfs is set to 3 as the default value here.

pypgatk_cli.py dnaseq-to-proteindb \
    --config_file config/ensembl_config.yaml \
    --input_fasta Meleagris_gallopavo.Turkey_5.1.106.fa \
    --output_proteindb lncRNAs.fa \
    --include_biotypes lncRNA \
    --num_orfs 1

Produces 1703 proteins since there are only 1703 lncRNAs in the fasta file.

However changing the same parameter in the config/ensembl_config.yaml file still produces 5109 proteins.
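The behaviour the issue expects can be sketched as a precedence rule: a value given on the command line wins, otherwise the configuration file is used, otherwise the built-in default. The `resolve_param` helper and the dictionaries below are hypothetical and do not reflect how pypgatk actually reads its configuration.

```python
def resolve_param(name, cli_args, config, defaults):
    """Return the CLI value if given, else the config value, else the default."""
    if cli_args.get(name) is not None:
        return cli_args[name]
    if name in config:
        return config[name]
    return defaults[name]


defaults = {"num_orfs": 3}
config = {"num_orfs": 1}          # value set in the YAML file
no_cli = {"num_orfs": None}       # option not passed on the command line

from_config = resolve_param("num_orfs", no_cli, config, defaults)
from_cli = resolve_param("num_orfs", {"num_orfs": 2}, config, defaults)
from_default = resolve_param("num_orfs", no_cli, {}, defaults)
print(from_config, from_cli, from_default)
```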

VCF to proteinDB

Convert a VEP annotated VCF into protein sequences.

Input:

  • GTF file (matching the one used to annotate the VCF file)
  • VCF file (VEP-annotated, e.g from ENSEMBL or gnomAD)
  • Genome fasta file

Process:

  • Convert the GTF file into fasta based on the given genome assembly.
  • For each variant change the variant position in the transcript sequence and translate

Output:

  • fasta file containing protein sequences per variant per affected transcript.

Error: Codon '-CT' is invalid

Hi @ypriverol and @husensofteng ,

I'm trying to generate a protein fasta database from the VCF at https://github.com/ypriverol/pgdb-manuscript/blob/main/Nassar_et_al.vcf.

After annotating the VCF using the VEP tool, I ran the vcf-to-proteindb cli tool and I'm getting the following error:

Traceback (most recent call last):
  File "/Users/enrique/opt/anaconda3/lib/python3.7/site-packages/Bio/Seq.py", line 2621, in _translate_str
    amino_acids.append(forward_table[codon])
  File "/Users/enrique/opt/anaconda3/lib/python3.7/site-packages/Bio/Data/CodonTable.py", line 402, in __getitem__
    raise KeyError(codon)  # stop codon
KeyError: '-CT'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pypgatk/pypgatk_cli.py", line 49, in <module>
    main()
  File "pypgatk/pypgatk_cli.py", line 45, in main
    cli()
  File "/Users/enrique/opt/anaconda3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/enrique/opt/anaconda3/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/enrique/opt/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/enrique/opt/anaconda3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/enrique/opt/anaconda3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/enrique/opt/anaconda3/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/enrique/opt/anaconda3/envs/py-pgatk/lib/python3.7/site-packages/pypgatk-0.0.19-py3.7.egg/pypgatk/commands/vcf_to_proteindb.py", line 71, in vcf_to_proteindb
    ensembl_data_service.vcf_to_proteindb(vcf, input_fasta, gene_annotations_gtf)
  File "/Users/enrique/opt/anaconda3/envs/py-pgatk/lib/python3.7/site-packages/pypgatk-0.0.19-py3.7.egg/pypgatk/ensembl/ensembl.py", line 650, in vcf_to_proteindb
    num_orfs)
  File "/Users/enrique/opt/anaconda3/envs/py-pgatk/lib/python3.7/site-packages/pypgatk-0.0.19-py3.7.egg/pypgatk/ensembl/ensembl.py", line 336, in get_orfs_vcf
    alt_orfs.append(alt_seq[n::].translate(translation_table))
  File "/Users/enrique/opt/anaconda3/lib/python3.7/site-packages/Bio/Seq.py", line 1185, in translate
    cds, gap=gap)
  File "/Users/enrique/opt/anaconda3/lib/python3.7/site-packages/Bio/Seq.py", line 2638, in _translate_str
    "Codon '{0}' is invalid".format(codon))
Bio.Data.CodonTable.TranslationError: Codon '-CT' is invalid

Any idea related to this issue?

Thanks

PS: If I run the tool restricted to missense_variant and coding_sequence_variant, the database is generated fine. So I'm guessing the error is related to a particular consequence/transcript, but I'm not 100% sure.
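One way to narrow this down is to check sequences for non-nucleotide characters before translating. The sketch below assumes, without confirmation from this issue, that the '-' comes from an annotation that writes deleted alleles as '-'. The `invalid_bases` and `clean_allele` helpers are hypothetical and not part of pypgatk.

```python
VALID_BASES = set("ACGTN")


def invalid_bases(sequence):
    """Return the characters in the sequence that are not nucleotide letters."""
    return set(sequence.upper()) - VALID_BASES


def clean_allele(allele):
    """Treat '-' (assumed here to be a deletion placeholder) as an empty allele."""
    return "" if allele == "-" else allele


# The codon reported in the traceback contains a gap character.
bad = invalid_bases("ATG-CT")
print(bad, repr(clean_allele("-")), clean_allele("A"))
```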

DNA seq to ORFs:

Extract ORFs from a given DNA sequence.

Input:
string: DNA sequence
int: number of ORFs from the forward side
int: number of ORFs from the backward side
int: translation table (1 for the standard code or 2 for vertebrate mitochondrial)
file: genome fasta

Output:
str: protein sequence for each extracted ORF
str: new record ID
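A minimal sketch of frame-wise translation in pure Python, assuming only the standard genetic code (translation table 1) and treating each requested reading frame as one translated sequence. The `translate`, `reverse_complement` and `frames` helpers are hypothetical; unknown codons are written as X.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate complete codons only; trailing bases are ignored."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def reverse_complement(seq):
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def frames(seq, num_forward=3, num_reverse=0):
    """Translate the first num_forward forward and num_reverse reverse frames."""
    forward = [translate(seq[n:]) for n in range(num_forward)]
    rc = reverse_complement(seq)
    backward = [translate(rc[n:]) for n in range(num_reverse)]
    return forward + backward


three_forward = frames("ATGGCCTAA", 3, 0)
one_each = frames("ATGGCCTAA", 1, 1)
print(three_forward, one_each)
```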

ORFs to peptides

Digest a given sequence into peptides based on a specified enzyme.

Input:
str: ORF sequence
str: digestion enzyme
int: number of allowed missed cleavages

Output:
list: peptide sequences
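A minimal sketch of in-silico digestion for a single enzyme, assuming trypsin with the common rule of cleaving after K or R unless the next residue is P. The `digest_trypsin` helper and the example sequence are hypothetical; supporting other enzymes would need one cleavage pattern per enzyme.

```python
import re


def digest_trypsin(sequence, missed_cleavages=0):
    """Return peptides from cleaving after K/R (not before P)."""
    # Zero-width split points: preceded by K or R and not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for start in range(len(fragments)):
        # Join up to missed_cleavages extra adjacent fragments.
        for extra in range(missed_cleavages + 1):
            end = start + extra + 1
            if end > len(fragments):
                break
            peptides.append("".join(fragments[start:end]))
    return peptides


fully_cleaved = digest_trypsin("AKRPGKC")
one_missed = digest_trypsin("AKRPGKC", missed_cleavages=1)
print(fully_cleaved, one_missed)
```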

replace --list_taxonomies with list_names

Since the tool takes Ensembl (species) names as input for downloading, we should also change the listing option (--list_taxonomies) to list all species names instead of taxonomy ids.
