tirganteanga / ecolicollection

This project forked from caileancarter/ecolicollection

Scripts used for handling my E coli assemblies collection

License: MIT License

Python 90.17% Jupyter Notebook 9.59% Shell 0.24%

EcoliCollection

Scripts used for handling and analysing my E. coli genome assembly collection.

The workflow starts by pulling genome assemblies from ENA using a saved search result from the site.
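As a rough illustration of what a saved ENA search result holds, the sketch below lists the assembly accessions found in an XML export. It assumes the export contains `ASSEMBLY` elements carrying an `accession` attribute. The function name `list_accessions` and the sample record are made up for this example, and this is not the code used in fasta_from_ena.py.

```python
import xml.etree.ElementTree as ET


def list_accessions(xml_text):
    """Return the accession attribute of every ASSEMBLY element (assumed layout)."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the name of the root element does not matter.
    return [
        node.attrib["accession"]
        for node in root.iter("ASSEMBLY")
        if "accession" in node.attrib
    ]


# Tiny made-up record, shaped like the assumed export format.
SAMPLE = """<ASSEMBLY_SET>
  <ASSEMBLY accession="GCA_000000001"><TITLE>example one</TITLE></ASSEMBLY>
  <ASSEMBLY accession="GCA_000000002"><TITLE>example two</TITLE></ASSEMBLY>
</ASSEMBLY_SET>"""

if __name__ == "__main__":
    print(list_accessions(SAMPLE))
```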

Tutorial

  1. Enter a search term into ENA search
  2. Select Assembly results
  3. Download ENA records: XML
  4. Create an output directory
  5. Run fasta_from_ena.py with the XML file as input and the directory you created as output
  6. A summary Excel file is created from the search result
  7. Wait for the FASTA files to be downloaded and unzipped. This can take a while...
  8. Run fetch_Entrez_metadata.py with the directory as a positional argument (you can also supply an Entrez API key and your account email to speed up the run)
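If some downloads are left compressed after step 7, they can be unpacked by hand. The sketch below is one way to do that, assuming the files are gzip-compressed and end in `.gz`. The helper name `gunzip_fastas` is hypothetical and not part of the package.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_fastas(directory):
    """Decompress every *.gz file in `directory`, keeping the originals."""
    written = []
    for gz_path in sorted(Path(directory).glob("*.gz")):
        out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
        # Stream the data so large assemblies are never held fully in memory.
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written
```

Calling `gunzip_fastas("my_output_dir")` returns the paths of the files it wrote, which makes it easy to check that every assembly was unpacked.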

For serotyping:

  1. Use the Ectyper tool on Galaxy
  2. Download results into a folder
  3. Run utils.py with the -s or --sero flag set to the path of the directory containing the serotype data. --input should be the path to the working directory
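To check the downloaded serotype results before handing them to utils.py, something like the following sketch may help. It assumes the results are tab-separated files ending in `.tsv` with `Name` and `Serotype` columns. Both column names are parameters, so adjust them if your files differ. `collect_serotypes` is a hypothetical helper, not the routine utils.py uses.

```python
import csv
from pathlib import Path


def collect_serotypes(directory, name_col="Name", sero_col="Serotype"):
    """Map sample name -> serotype from every *.tsv file in `directory`."""
    serotypes = {}
    for tsv_path in sorted(Path(directory).glob("*.tsv")):
        with open(tsv_path, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                # Skip rows that lack either of the assumed columns.
                if row.get(name_col) and row.get(sero_col):
                    serotypes[row[name_col]] = row[sero_col]
    return serotypes
```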

For phylotyping:

  1. Run utils.py with the --input flag set to the working directory and add the -p or --phylo flag.
  2. Optionally, specify the location of the EzClermont.py script with --script; a copy of the script is included in the package.
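The phylotyping call can also be assembled from Python, for example when scripting several working directories. This is a minimal sketch that only builds the argument list from the flags named above. It assumes --phylo is a switch that takes no value and that utils.py sits in the current directory. `phylotype_command` is a hypothetical name.

```python
import sys


def phylotype_command(workdir, script=None):
    """Build the argument list for a phylotyping run of utils.py."""
    # Assumption: --phylo is a plain switch without a value.
    argv = [sys.executable, "utils.py", "--input", str(workdir), "--phylo"]
    if script is not None:
        # Only pass --script when a custom EzClermont.py location is wanted.
        argv += ["--script", str(script)]
    return argv
```

The returned list can be passed to `subprocess.run(argv, check=True)` to start the run.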

Conda support

  1. Run the setup_conda script from the command line, or
  2. Run:
$ conda env create --name ecoli-collection --file environment.yaml
$ conda activate ecoli-collection
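To confirm that the environment was created, the output of `conda env list --json` can be inspected. The sketch below does this from Python. The helper names `env_names` and `has_env` are made up for this example, and it assumes `conda` is on the PATH.

```python
import json
import subprocess
from pathlib import PurePath


def env_names(conda_json):
    """Extract environment names from `conda env list --json` output."""
    paths = json.loads(conda_json).get("envs", [])
    # Each entry is a path; its last component is the environment name
    # (for the base environment this is just the install directory name).
    return [PurePath(p).name for p in paths]


def has_env(name="ecoli-collection"):
    """Ask conda which environments exist and look for `name`."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return name in env_names(result.stdout)
```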

Snakemake support

  1. Have the XML file in the EcoliCollection directory
  2. Rename the XML file to ena_genome_assembly.xml
  3. Run snakemake --cores 1
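The first two Snakemake steps can be scripted. The sketch below copies an XML export into the repository directory under the expected file name. Note that it copies rather than renames, so the original download is kept. `stage_xml` is a hypothetical helper and not part of the package.

```python
import shutil
from pathlib import Path

EXPECTED_NAME = "ena_genome_assembly.xml"  # file name given in the steps above
SNAKEMAKE_ARGV = ["snakemake", "--cores", "1"]  # command from step 3


def stage_xml(xml_path, repo_dir):
    """Copy the ENA XML export into `repo_dir` under the expected name."""
    source = Path(xml_path)
    target = Path(repo_dir) / EXPECTED_NAME
    # Nothing to do if the file is already in place under the right name.
    if source.resolve() != target.resolve():
        shutil.copyfile(source, target)
    return target
```

After staging the file, `subprocess.run(SNAKEMAKE_ARGV, cwd=repo_dir, check=True)` would start the workflow from the repository directory.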

This is a work in progress:

  • Tidy scripts
  • Add docstrings
  • Make changes to EzClermont
  • Finish README
  • Include tutorial
  • Trialed and tested
  • CLI support
  • Snakemake support
  • conda support
  • Have the Excel spreadsheet be put in the parent directory instead of the FASTA directory

Contributors

  • caileancarter
