

Synthego Inference of CRISPR Edits (ICE)



The ICE tool is a CRISPR editing analysis tool that infers the presence of indels and other mutations. ICE uses non-negative least squares (NNLS) regression to detect edits and estimate their relative abundance. In contrast to TIDE [1], ICE can analyze insertions, deletions, HDR, multiplex edits, and base editing, and it is available under an open-source license for non-commercial use. ICE can also be used to analyze other genome engineering methods, such as TALENs and homing endonucleases.
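To make the regression idea concrete, here is a toy sketch of non-negative least squares: each candidate edit (a "proposal") contributes a predicted signal vector, and the solver finds the non-negative weights whose mix best matches the observed signal. This is not ICE's code. The vectors are invented toy numbers rather than real trace data, the function name is hypothetical, and plain cyclic coordinate descent stands in for a dedicated solver such as SciPy's `scipy.optimize.nnls`.

```python
def nnls_coordinate_descent(columns, observed, iterations=500):
    """Toy NNLS solver: minimize ||A x - b||^2 subject to x >= 0.

    columns  -- candidate signal vectors, one per edit proposal (the columns of A)
    observed -- measured signal vector b
    Uses cyclic coordinate descent with clipping at zero (illustration only).
    """
    weights = [0.0] * len(columns)
    residual = list(observed)  # r = b - A x, equal to b while x == 0
    norms = [sum(v * v for v in col) for col in columns]
    for _ in range(iterations):
        for j, col in enumerate(columns):
            if norms[j] == 0.0:
                continue  # an all-zero proposal cannot explain any signal
            # exact minimizer along coordinate j, clipped to stay non-negative
            step = sum(c * r for c, r in zip(col, residual)) / norms[j]
            new_weight = max(0.0, weights[j] + step)
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                weights[j] = new_weight
    return weights


# Toy proposals over four signal values (invented numbers, not real traces)
wild_type = [1.0, 0.0, 1.0, 0.0]
deletion = [0.0, 1.0, 0.0, 1.0]
insertion = [1.0, 1.0, 0.0, 0.0]
observed = [0.7, 0.3, 0.7, 0.3]  # 70% wild type + 30% deletion

weights = nnls_coordinate_descent([wild_type, deletion, insertion], observed)
total = sum(weights)
relative = [w / total for w in weights]
print([round(w, 4) for w in relative])  # [0.7, 0.3, 0.0]
```

Dividing the weights by their total is just one simple normalization; ICE's own normalization of contributions may differ.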

Audience

This document is intended for technical users who have prior experience with CRISPR editing analysis. For more detailed user documentation, please visit Synthego’s Help Center, where you can find our ICE User Guide and additional documentation.

Citing ICE

The preprint Inference of CRISPR Edits from Sanger Trace Data [2] provides an overview of ICE and an empirical test of it on over 1,800 real-world edits. We ask that you cite this paper if you use ICE in work that leads to publication.

We will continue to improve ICE, so please report the version number you used in your publication. The version number can be found by running ICE with the --version option.

Inputs

  1. Control sample Sanger ab1 file
  2. Edited sample Sanger ab1 file
  3. Sequence of protospacer or gRNA target

Outputs

Overall editing efficiency, plots of the distribution of edit types, plots of discordance (a measure of how well the trace signal agrees with the control sequence), annotated Sanger traces of the region flanking the cut site, and JSON files containing the data for all of these plots.

Trace JSON

Contains the Sanger sequence traces of the region around the cut site for the edited and control samples.

Discordance & indel distribution files

The discordance JSON shows how well both the control and edited samples agree with the called base at each position. A discordance of 1 indicates that there is zero signal in the called base at that position, whereas a discordance of 0 means that all non-reference bases have zero signal (all signal agrees with the called base). The alignment window is used to align the control sample to the edited sample, while the inference window denotes the subsection of data used for the NNLS regression.
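A minimal sketch of a per-position discordance score, assuming each position of a trace is summarized as peak heights for the four bases. Defining discordance as the share of signal outside the called base reproduces the two endpoints described above (1 when the called base has no signal, 0 when all signal agrees with it); ICE's exact computation and window boundaries may differ, and `discordance`, `mean_discordance`, and the window arguments are hypothetical.

```python
def discordance(peaks, called_base):
    """Fraction of the signal at one position that is NOT in the called base.

    peaks -- mapping of base to peak height, e.g. {"A": 120, "C": 3, "G": 0, "T": 5}
    Returns 0.0 when all signal agrees with the called base and 1.0 when the
    called base has no signal at all.
    """
    total = sum(peaks.values())
    if total <= 0:
        raise ValueError("no signal at this position")
    return (total - peaks.get(called_base, 0)) / total


def mean_discordance(trace, reference, start, end):
    """Average discordance over positions start..end-1 (a hypothetical window)."""
    values = [discordance(trace[i], reference[i]) for i in range(start, end)]
    if not values:
        raise ValueError("empty window")
    return sum(values) / len(values)


trace = [
    {"A": 100, "C": 0, "G": 0, "T": 0},   # clean A peak
    {"A": 0, "C": 60, "G": 20, "T": 20},  # mixed signal, as after a cut site
]
print(discordance(trace[0], "A"))           # 0.0
print(discordance(trace[1], "C"))           # 0.4
print(mean_discordance(trace, "AC", 0, 2))  # 0.2
```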

The indel distribution JSON shows the distribution of the identified indels, summarized by length. Thus, two different -1 indels are counted in the same bin.
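A minimal sketch of this binning, assuming the contributions are available as (indel size, fraction) pairs; `indel_distribution` is a hypothetical helper, not part of ICE's API. The sample rows are taken from the example Sequence Contributions output in this README.

```python
from collections import defaultdict


def indel_distribution(contributions):
    """Sum contributions that share an indel size.

    contributions -- iterable of (indel_size, fraction) pairs
    Returns a dict mapping indel size to total fraction, sorted by size.
    """
    bins = defaultdict(float)
    for size, fraction in contributions:
        bins[size] += fraction
    return dict(sorted(bins.items()))


# A few rows from the example "Sequence Contributions" output in this README
rows = [(-1, 0.3006), (0, 0.1996), (1, 0.1818), (-2, 0.1128),
        (-1, 0.0317), (-16, 0.0078), (-16, 0.0053), (-16, 0.0042)]
dist = indel_distribution(rows)
print({size: round(total, 4) for size, total in dist.items()})
# {-16: 0.0173, -2: 0.1128, -1: 0.3323, 0: 0.1996, 1: 0.1818}
```

Both -1 rows (two different single-base deletions) land in the same -1 bin, as do the three distinct -16 deletions.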

Sequence Contributions

Relative contribution of each sequence (normalized)
--------------------------------------------------------
0.3006 	 -1[g1] 	 CCCAACACAACCAGTTGCAGGCGCC|-CATGGTGAGCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.1996 	 0[g1] 	     CCCAACACAACCAGTTGCAGGCGCC|CCATGGTGAGCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.1818 	 1[g1] 	     CCCAACACAACCAGTTGCAGGCGCC|nCCATGGTGAGCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.1128 	 -2[g1] 	 CCCAACACAACCAGTTGCAGGCGC-|-CATGGTGAGCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.0541 	 2[g1] 	     CCCAACACAACCAGTTGCAGGCGCC|nnCCATGGTGAGCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.0317 	 -1[g1] 	 CCCAACACAACCAGTTGCAGGCGC-|CCATGGTGAGCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.0296 	 -4[g1] 	 CCCAACACAACCAGTTGCAGGC---|-CATGGTGAGCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.0185 	 -3[g1] 	 CCCAACACAACCAGTTGCAGGC---|CCATGGTGAGCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.0134 	 -19[g1] 	 CCCAACACAACCAGT----------|---------GCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.0078 	 -16[g1] 	 CCCAACACAACCAGTTG--------|--------AGCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.0067 	 -18[g1] 	 CCCAACACAA---------------|---TGGTGAGCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.0060 	 -20[g1] 	 CCCAACACAA---------------|-----GTGAGCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.0053 	 -16[g1] 	 CCCAACACAACCAGTTGCAGGC---|-------------CAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.0042 	 -16[g1] 	 CCCAACACAA---------------|-CATGGTGAGCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.0032 	 -15[g1] 	 CCCAACACAACC-------------|--ATGGTGAGCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.0028 	 -4[g1] 	 CCCAACACAACCAGTTGCAGGCGCC|----GGTGAGCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.0021 	 -20[g1] 	 CCCAACACAACCA------------|--------AGCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.0012 	 -17[g1] 	 CCCAACACAA---------------|--ATGGTGAGCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA
0.0005 	 -13[g1] 	 CCCAACACAACC-------------|CCATGGTGAGCATCAGCCTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCAAGGTCACATGCTTGTTCATGAGCTCTCAGGCA

The first column gives the inferred proportion of that sequence in the pool. The second column is a summary label giving the indel size and the guide it is attributed to; for example, -1[g1] denotes a 1-base deletion at guide g1. The third column is a human-readable representation of the sequence, with dashes indicating deleted bases, 'n' indicating inserted bases, and '|' marking the cut site.
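A minimal parser for one row of the listing above, assuming single-guide labels such as -1[g1], whitespace-separated fields, and that '-' and lowercase 'n' appear in the third column only as deletion and insertion markers. The row used below is taken from the example output with its sequence truncated; `parse_contribution` is a hypothetical helper, not part of ICE.

```python
import re

# Matches single-guide labels such as "-1[g1]" or "2[g1]"
LABEL_RE = re.compile(r"(-?\d+)\[(\w+)\]")


def parse_contribution(line):
    """Parse one row of the contributions listing into a dict.

    Expects three whitespace-separated fields: fraction, label, sequence.
    """
    fraction_text, label, sequence = line.split()
    match = LABEL_RE.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    size = int(match.group(1))
    # '-' marks a deleted base and 'n' an inserted base, so the net indel
    # size can be recounted from the sequence itself as a consistency check
    counted = sequence.count("n") - sequence.count("-")
    if counted != size:
        raise ValueError(f"label says {size} but sequence shows {counted}")
    return {
        "fraction": float(fraction_text),
        "size": size,
        "guide": match.group(2),
        "cut_index": sequence.index("|"),  # position of the cut-site marker
        "sequence": sequence,
    }


row = parse_contribution("0.1128\t-2[g1]\tCCCAACACAACCAGTTGCAGGCGC-|-CATGGTGAGC")
print(row["fraction"], row["size"], row["guide"], row["cut_index"])  # 0.1128 -2 g1 25
```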

Additional files, such as alignment verification files, are generated for each sample.

Using ICE

A hosted free version of the ICE tool is available online at https://ice.synthego.com. The online ICE tool supports batch analysis, figure generation, and error handling & sample QC.

The source code behind the core ICE analysis is open source and free to use for non-commercial applications. Commercial use and other licensing options are available. For details, see LICENSE.

Installation

Synthego ICE can be installed as a docker container or directly via pip. Additional developer instructions are located in DEVELOP.md. All examples below use test data found in ice/tests/test_data. The test file (./ice/tests/test_data/batch_example.xlsx) is an example of how to specify batch inputs.

Method 1. Pip install

Install into your favorite python3 virtual environment (virtualenv, conda).

conda create --name ice_env python=3 # create a python3 virtual environment

source activate ice_env # activate the virtual environment

pip install synthego_ice # install synthego ice from pip

After installation, you can use Synthego ICE as a module (see python_example.py) or directly via command line.

Command line tools

synthego_ice

usage: synthego_ice [-h] --control CONTROL --edited EDITED --target TARGET
                    [--out OUT] [--donor DONOR] [--verbose] [--version]

Analyze Sanger reads to Infer Crispr Edit outcomes

optional arguments:
  -h, --help         show this help message and exit
  --control CONTROL  The wildtype / unedited ab1 file (REQUIRED)
  --edited EDITED    The edited ab1 file (REQUIRED)
  --target TARGET    Target sequence(s) (17-23 bases, RNA or DNA, comma
                     separated), (REQUIRED)
  --out OUT          Output base path (Defaults to ./results/single)
  --donor DONOR      Donor DNA sequence for HDR (Optional)
  --verbose
  --version          show program's version number and exit
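When ICE is driven from a script, it can help to assemble and sanity-check the arguments before calling the tool. This sketch uses only the flags listed in the usage text above and mirrors its 17-23 base rule for --target; `build_ice_command` and `run_ice` are hypothetical helpers, and ICE still performs its own validation of targets and trace files.

```python
import re
import subprocess

# --target accepts 17-23 bases, RNA or DNA, per the usage text
TARGET_RE = re.compile(r"[ACGTUacgtu]{17,23}")


def build_ice_command(control, edited, targets, out=None, donor=None, verbose=False):
    """Build an argument list for the synthego_ice command line tool."""
    for target in targets:
        if not TARGET_RE.fullmatch(target):
            raise ValueError(f"target must be 17-23 bases of A/C/G/T/U: {target!r}")
    cmd = ["synthego_ice",
           "--control", str(control),
           "--edited", str(edited),
           "--target", ",".join(targets)]  # multiple targets are comma separated
    if out is not None:
        cmd += ["--out", str(out)]
    if donor is not None:
        cmd += ["--donor", donor]  # donor DNA sequence for HDR
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_ice(**kwargs):
    """Run synthego_ice; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_ice_command(**kwargs), check=True)


cmd = build_ice_command(
    "./ice/tests/test_data/good_example_control.ab1",
    "./ice/tests/test_data/good_example_edited.ab1",
    ["AACCAGTTGCAGGCGCCCCA"],
    out="results/testing",
    verbose=True,
)
print(" ".join(cmd))
```

Passing an argument list to `subprocess.run` (rather than a shell string) avoids quoting problems with file paths that contain spaces.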

synthego_ice_batch

usage: synthego_ice_batch [-h] --in INPUT [--out OUT] --data DATA [--verbose]
                          [--line LINE] [--allprops] [--version]

Analyze Sanger reads to infer crispr edit outcomes

optional arguments:
  -h, --help   show this help message and exit
  --in INPUT   Input definition file in Excel xlsx format (required)
  --out OUT    Output directory path (defaults to .)
  --data DATA  Data path, where .ab1 files are located (required)
  --verbose    Display verbose output
  --line LINE  Only run specified line in the Excel xlsx definition file
  --allprops   Output all Edit Proposals, even if they have zero contribution
  --version    show program's version number and exit

Analyzing example data via command line tools

After installing via pip, grab the example data by cloning this repository:

git clone git@github.com:synthego-open/ice.git ice

cd ice # change into the ice directory

Analyzing a single sample

synthego_ice \
	--control ./ice/tests/test_data/good_example_control.ab1 \
	--edited ./ice/tests/test_data/good_example_edited.ab1 \
	--target AACCAGTTGCAGGCGCCCCA \
	--out results/testing \
	--verbose

When complete, you'll find the ICE analysis outputs in the ./results folder.

Analyzing multiple samples (batch analysis)

synthego_ice_batch \
	--in ./ice/tests/test_data/batch_example.xlsx \
	--out ./results/ \
	--data ./ice/tests/test_data/ \
	--verbose

When complete, you'll find the ICE analysis outputs in the ./results folder.

Method 2. Installing via Docker container

Requirements: Docker must be installed.

Installation

From a command line, grab the latest version of Synthego ICE from Docker Hub.

docker pull synthego/ice

After installation, you'll be able to run Synthego ICE from the docker container.

Analyzing example data with docker

Grab the example data by cloning this repository:

git clone git@github.com:synthego-open/ice.git ice

cd ice # change into the ice directory

Analyzing a single sample

docker run -it -v ${PWD}:/data -w /ice synthego/ice:latest \
	python ice_analysis_single.py \
	--control /data/ice/tests/test_data/good_example_control.ab1 \
	--edited /data/ice/tests/test_data/good_example_edited.ab1 \
	--target AACCAGTTGCAGGCGCCCCA \
	--out /data/results/testing \
	--verbose

When complete, you'll find the ICE analysis outputs in the ./results folder.

Analyzing multiple samples (batch analysis)

docker run -it -v ${PWD}:/data -w /ice synthego/ice:latest \
	python ice_analysis_batch.py \
	--in /data/ice/tests/test_data/batch_example.xlsx \
	--data /data/ice/tests/test_data/ \
	--out /data/results/ \
	--verbose

When complete, you'll find the ICE analysis outputs in the ./results folder.

Contributing

Pull requests are welcome. Please follow the steps below to ensure your work is merged as efficiently as possible.

  1. Make a GitHub issue outlining the bug you aim to fix or the feature you want to add. This prevents redundant work and lets us reach an agreement on your proposal before you put significant effort into it.
  2. Fork the repository and create your branch from master.
  3. Code your feature or bug fix.
  4. Add tests for your new code and make sure all other tests still pass.
  5. Submit a pull request referencing the issue from step 1.

Testing

Tests are written using pytest. Test files can be found in ice/tests.

  • Run all tests (can take a few minutes):
    • $ pytest
  • Run a specific test:
    • $ pytest ice/tests/{{filename}}.py::{{test_function_name}}
    • $ pytest ice/tests/test_utility.py::test_sequence_util
  • Run all tests with coverage report:
    • $ pytest --cov-report term-missing --cov=ice ice/tests

References

[1] Eva K. Brinkman, Tao Chen, Mario Amendola, and Bas van Steensel. Easy quantitative assessment of genome editing by sequence trace decomposition. Nucleic Acids Res. 2014 Dec 16; 42(22): e168. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4267669/

[2] Hsiau et al. Inference of CRISPR Edits from Sanger Trace Data. bioRxiv. 2018. https://www.biorxiv.org/content/early/2018/01/20/251082

License

Copyright 2018 Synthego Corporation All Rights Reserved

The Synthego ICE software was developed at Synthego Corporation.

Permission to use, copy, modify and distribute any part of Synthego ICE for educational, research and non-profit purposes, without fee, and without a written agreement is hereby granted, provided that the above copyright notice, this paragraph and the following paragraphs appear in all copies.

Those desiring to incorporate this Synthego ICE software into commercial products or use for commercial purposes should contact Synthego support at Ph: (888) 611-6883 ext:1, E-MAIL: [email protected].

In no event shall Synthego Corporation be liable to any party for direct, indirect, special, incidental, or consequential damages, including lost profits, arising out of the use of Synthego ICE, even if Synthego Corporation has been advised of the possibility of such damage.

The Synthego ICE tool provided herein is on an "as is" basis, and the Synthego Corporation has no obligation to provide maintenance, support, updates, enhancements, or modifications. The Synthego Corporation makes no representations and extends no warranties of any kind, either implied or express, including, but not limited to, the implied warranties of merchantability or fitness for a particular purpose, or that the use of Synthego ICE will not infringe any patent, trademark or other rights.

Contributors

andrewcchang, biogeek, hsiaut, humbertoevans, nicholasarossi, richstoner, thetechnocrat-dev


Issues

pytest fail on v1.1.0

Hello,

It is essential that I run a prior version to validate prior results. Please advise on the pytest failure, also observed in treatment data. Thanks.

-------------------------------------------------- analyzing 8 donor_no_alignment AAGTGCAGCTCGTCCGGCGT
Base dir: /tmp/tmpxqi1g4d4/batch_analysis
Single Sanger analysis failed ord() expected string of length 1, but int found
Traceback (most recent call last):
  File "/home/tdfyoder/GIT/ice_v1.X/ice/ice/analysis.py", line 186, in multiple_sanger_analysis
    result = single_sanger_analysis(*job_args, **job_kwargs)
  File "/home/tdfyoder/GIT/ice_v1.X/ice/ice/analysis.py", line 77, in single_sanger_analysis
    donor=donor)
  File "/home/tdfyoder/GIT/ice_v1.X/ice/ice/classes/sanger_analysis.py", line 126, in initialize_with
    control_sample.initialize_from_path(control_path)
  File "/home/tdfyoder/GIT/ice_v1.X/ice/ice/classes/sanger_object.py", line 80, in initialize_from_path
    phred_scores.append(ord(c))
TypeError: ord() expected string of length 1, but int found
None of the samples were able to be analyzed
False
===================================== 14 failed, 31 passed in 1.82 seconds ======================================
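For context on the TypeError above: in Python 3, iterating over a `bytes` object yields `int` values, and calling `ord()` on an `int` raises exactly `ord() expected string of length 1, but int found`. That points to the quality values read from the ab1 file arriving as `bytes` rather than `str`, which can happen when a newer version of the ab1 parsing library (likely Biopython) is installed; that specific cause is an assumption worth checking against the installed version. A version-tolerant conversion might look like this; `phred_from_raw` is a hypothetical helper, not ICE code.

```python
def phred_from_raw(raw):
    """Return quality scores as ints from raw values given as bytes or str."""
    scores = []
    for value in raw:
        # Iterating over bytes yields ints in Python 3; iterating over str
        # yields one-character strings that still need ord()
        scores.append(value if isinstance(value, int) else ord(value))
    return scores


print(phred_from_raw(b"\x1e\x28"))  # [30, 40]
print(phred_from_raw("\x1e\x28"))   # [30, 40]
```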

Substitute NNNs in insertions with an actual sequence

Putting this as a new feature request or question, as a lot of our users are asking for it.
In knockin experiments scientists want to check the actual inserted sequence. Are there any plans to have that in results instead of just a sequence of Ns?

Can one expand the indel size limit?

I did not find a way to increase the maximum indel size in the site or as an option in the command line version. I am interested in deletions generally longer than 50bp (TIDE's maximum) but perhaps as long as a few hundred bases. Is this something ICE is programmed to handle?

Data visualization

Hi, I am pretty new to data analysis for gene editing. I have successfully run the code and got the results, but how should I visualize them?

Installation Typo

There is a typo in the README file describing installation:

pip install sythego_ice

should be

pip install synthego_ice

The newest docker image is broken?

Hi there,

we've been using synthego/ice:latest for over 10 months. It's been working well till last week when we did a docker pull and got a new image. All the runs failed with the same python error, and I can reproduce it running test_analysis.py:

E TypeError: ord() expected string of length 1, but int found

ice/classes/sanger_object.py:112: TypeError

The test env is
platform darwin -- Python 3.6.4, pytest-3.4.2, py-1.8.0, pluggy-0.6.0

Now we rolled back to an old image we saved before. But would like to report this bug in case it was overlooked.

Thanks,

Pei

AttributeError: 'MultipleSeqAlignment' object has no attribute 'format'

Hi, thanks for sharing this tool.
After installation when running the example I get the following error:

$ synthego_ice --control ./ice/tests/test_data/good_example_control.ab1 --edited ./ice/tests/test_data/good_example_edited.ab1 --target AACCAGTTGCAGGCGCCCCA --out results/testing --verbose

Synthego ICE (https://synthego.com)
Version: 1.1.1-alpha1
Base dir: /rhampseq/kkiaee/ice/ice/results
Exception Caught!
Traceback (most recent call last):
  File "/home/ccadmin/anaconda3/envs/ice_env/lib/python3.8/site-packages/ice/analysis.py", line 82, in single_sanger_analysis
    sa.analyze_sample()
  File "/home/ccadmin/anaconda3/envs/ice_env/lib/python3.8/site-packages/ice/classes/sanger_analysis.py", line 799, in analyze_sample
    alignment.align_all()
  File "/home/ccadmin/anaconda3/envs/ice_env/lib/python3.8/site-packages/ice/classes/pair_alignment.py", line 94, in align_all
    self.all_aligned_clustal = self.align_list_to_clustal(aln, "control", "edited")
  File "/home/ccadmin/anaconda3/envs/ice_env/lib/python3.8/site-packages/ice/classes/pair_alignment.py", line 80, in align_list_to_clustal
    alignment_txt = aln_objs[0].format("clustal").split('\n', 2)[2]
AttributeError: 'MultipleSeqAlignment' object has no attribute 'format'

I'd appreciate your help in troubleshooting this.

docker permission error

docker pull synthego/ice

Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Thanks

Analysis failed with 311 in the notes

Hi there,

we use ICE with docker install. It is very easy and fast to use, thanks!

I am curious what exactly the error was if a sample failed analysis. The log shows:

-- analyzing 40 filename.ab1 GGTGAGTGAGTGTGTGCGTG
analyzing 462 number of edit proposals
R_SQUARED 0.998049444356253
discord (aln window): 0.07 after cutsite: 0.13
Exception Caught!
None

I tried on your website, same "311" failed error. Just curious what that means.

Thanks again,

Pei

Divide by zero

Hi there,

we recently encountered a problem running ICE (v1.2.0). See the error below:


Traceback (most recent call last):
  File "/ice/ice/analysis.py", line 82, in single_sanger_analysis
    sa.analyze_sample()
  File "/ice/ice/classes/sanger_analysis.py", line 834, in analyze_sample
    self.simple_discordance_algorithm()
  File "/ice/ice/classes/sanger_analysis.py", line 664, in simple_discordance_algorithm
    len(unexplained_discord_signal)) * 100)
ZeroDivisionError: float division by zero

Any idea what caused this?

Thanks,

Pei
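The traceback above ends in a division by `len(unexplained_discord_signal)`, so this error occurs whenever that sequence is empty, for example when no positions are left in the region being averaged. A guard of the following shape avoids the crash; the helper name and the choice to return a sentinel value (rather than, say, marking the sample as failed) are assumptions, not ICE's actual fix.

```python
def mean_percent(values, empty_value=None):
    """Mean of values expressed as a percentage.

    Returns empty_value instead of dividing by zero when values is empty.
    """
    values = list(values)
    if not values:
        return empty_value
    return sum(values) / len(values) * 100


print(mean_percent([0.25, 0.75]))  # 50.0
print(mean_percent([]))            # None
```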

Confusing error message

Hi there,

one of our recent runs produced this error: "Control ab1 trace quality too upstream of guide too low". Maybe it could be worded better; I am not sure whether this is a data issue or a design issue.

Thanks,

Pei

Difference in trace.json format

I am currently running synthego-open/ice using the latest docker (I didn't want to explicitly use latest tag): synthego/ice:fix_throwing_out_wildtype_hdr. However, it looks as though the structure of the trace.json file differs from the structure of the trace.json file on your website. For example, when running a batch analysis using the docker version, the trace.json file looks as follows:
{
'ctrl_sample': {'trace_data': [], 'cutsite': #, 'guide_start': #, 'guide_end': #., 'pam_start': #, 'pam_end': #},
'edited_sample': {'trace_data': [], 'cutsite': #}
}

But when I run the same batch on your website and ask for the results back, the format looks as follows:
[
{
'ctrl_sample': {'trace_data': [], 'guide_target': {}, 'cutsites': [{}]}
'edited_sample': {'trace_data': [], 'cutsites': [{}]}
}
]

Am I doing something wrong to not recreate the same trace.json file? This is potentially problematic for me when trying to analyze multiplexed samples where the trace.json should be returned as a list with multiple cut sites.
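Until the two outputs agree, a script that reads trace.json can accept both layouts described above by always working with a list of records and a list of cut-site entries. This is a sketch assuming the key names are exactly those in the report; the cut-site numbers are invented placeholders for the `#` values, the entries keep whatever form each layout uses (a number in one, an object in the other), and the helper names are hypothetical.

```python
import json


def trace_records(data):
    """Return parsed trace.json content as a list of per-analysis records."""
    return data if isinstance(data, list) else [data]


def cutsite_entries(sample):
    """Return a sample's cut-site entries in either reported layout."""
    if "cutsites" in sample:  # list layout, one entry per cut site
        return list(sample["cutsites"])
    if "cutsite" in sample:   # single-cut-site layout
        return [sample["cutsite"]]
    return []


docker_style = json.loads('{"ctrl_sample": {"trace_data": [], "cutsite": 120},'
                          ' "edited_sample": {"trace_data": [], "cutsite": 118}}')
web_style = json.loads('[{"ctrl_sample": {"trace_data": [], "guide_target": {},'
                       ' "cutsites": [{}]}, "edited_sample": {"trace_data": [], "cutsites": [{}]}}]')
for data in (docker_style, web_style):
    for record in trace_records(data):
        print(len(cutsite_entries(record["ctrl_sample"])))  # 1 for each layout
```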
