qlslab / extaxsi

ExTaxsI is a bioinformatics library for processing and visualizing molecular and taxonomic information.

License: MIT License

ExTaxsI

Project overview

ExTaxsI is a bioinformatics tool for processing and visualizing molecular and taxonomic information. This open-source, user-friendly project, written in Python 3.7, creates interactive plots starting from an NCBI search query or directly from offline taxonomic files.

ExTaxsI has multiple functions:

  • DATABASE: creates multi-FASTA files of nucleotide sequences, along with lists of taxonomy, genes and accessions, starting from manual input or csv/tsv files.

  • VISUALIZATION: creates interactive plots (scatter plot, sunburst plot and world map), starting from DATABASE output or external sources.

  • ID CONVERSION: converts TaxIDs into 6-rank taxonomy and vice versa; it accepts a single manual input, multiple inputs joined by a plus sign, or a tsv/csv file containing a list of TaxIDs.

Hardware requirements

ExTaxsI has no strict minimum hardware requirements for installation; however, for the software to run properly we suggest the following:

  • RAM: 4GB
  • CPU: quad-core or more.

Installation instructions in a nutshell

Please visit the examples directory of this GitHub page for step-by-step installation instructions and example usage.
A PyPI package is also available for integrating ExTaxsI functions directly into Python code; for its installation and usage, please visit the library directory.

1- Download Python 3:

https://www.python.org/

2- INSTALL EXTERNAL LIBRARIES:

To install the external libraries, open a terminal (Command Prompt for Windows users) in the ExTaxsI folder, or navigate to that folder with one of the following:

  • cd path/to/ExTaxsI-folder
  • cd path\to\ExTaxsI-folder (Windows users)

Now that you are inside the ExTaxsI folder, if you are a conda user (best option), create the environment and then activate it:

  • conda create --name myExTaxsIenv --file requirements.txt --channel default --channel etetoolkit --channel plotly
  • conda activate myExTaxsIenv

Otherwise (less recommended), run the following command:

  • pip3 install -r pip_requirements.txt

3- CUSTOMIZE YOUR SETTINGS:

Before using ExTaxsI, customize the settings.ini file:

  • entrez_email: insert your email address;
  • api_key: insert the API key generated from your NCBI account.

NCBI limits request rates to avoid overloading its servers. With your API key entered in the settings file, NCBI allows up to 10 requests/second across all activity from that key (without a key, the limit is 3 requests/second).

Here is the reference: https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
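
As a rough illustration of the two settings above, the sketch below reads them with Python's standard configparser module and derives the minimum delay between requests implied by the NCBI limits. The [ENTREZ] section name, the sample values and the helper names are assumptions made for this example; check the settings.ini shipped with ExTaxsI for its actual layout.

```python
import configparser

# Hypothetical settings.ini content; the [ENTREZ] section name is an assumption.
SAMPLE_SETTINGS = """
[ENTREZ]
entrez_email = you@example.org
api_key = 0123456789abcdef
"""


def load_entrez_settings(text):
    """Return (email, api_key) parsed from settings.ini-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["ENTREZ"]
    email = section.get("entrez_email", fallback="").strip()
    api_key = section.get("api_key", fallback="").strip()
    return email, api_key


def min_request_interval(api_key):
    """Seconds to wait between requests: 10 requests/second with a key, 3 without."""
    requests_per_second = 10 if api_key else 3
    return 1.0 / requests_per_second


email, api_key = load_entrez_settings(SAMPLE_SETTINGS)
print(email, min_request_interval(api_key))
```

Leaving api_key empty in this sketch falls back to the slower 3 requests/second interval, which is why filling it in is worthwhile for large downloads.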

How to run ExTaxsI

Open a terminal and change to the ExTaxsI folder using its full path:

Linux:

  • cd path/to/ExTaxsI-folder

Windows (Command Prompt):

  • cd path\to\ExTaxsI-folder

macOS:

  • cd path/to/ExTaxsI-folder

Now that you are in the right directory, run the following command to start:

  • python ExTaxsI.py

Operating instructions

The first time you run ExTaxsI, the program takes some time to download the local database; you can update this database as needed at startup.

Which module do you want to use?

Choose the module you are interested in by entering the corresponding number:

  1. Database creation module: taxonomy and FASTA files download;
  2. Visualization module: scatter plot and world map plot from taxonomy files or queries;
  3. Taxonomy IDs converter: conversion of taxID to 6-rank taxonomy and vice versa.

Module 1: Database creation module; taxonomy and fasta files download.

When the list of organism names, IDs or accessions contains fewer than 2500 entries, the search key is a single query; otherwise the query is split into groups of 2500, generating temporary files that are deleted at the end of the process.
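
The splitting described above can be pictured with a short sketch. This is not ExTaxsI's own code: the function name and the made-up accession labels are assumptions, and only the group size of 2500 comes from the text.

```python
def split_into_batches(items, batch_size=2500):
    """Split a list of names, IDs or accessions into consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


accessions = [f"ACC{i:05d}" for i in range(6000)]  # made-up accession labels
batches = split_into_batches(accessions)
print([len(batch) for batch in batches])  # [2500, 2500, 1000]
```

A list shorter than the batch size comes back as a single batch, which matches the single-query case.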

The output file (standard format) is saved in the Download folder.

Available formats:

  • multi-FASTA file format (NCBI standard format, with header followed by nucleotide sequence, accession, name, code, gene)
  • TSV format
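
For readers who want to inspect a multi-FASTA output from Python, a minimal reader might look like the sketch below. It relies only on the general FASTA convention that a header line starts with ">" and is followed by one or more sequence lines; the sample records are made up, and the exact fields ExTaxsI writes in each header are not assumed here.

```python
import io


def read_multi_fasta(handle):
    """Yield (header, sequence) pairs from a multi-FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records; the header fields are placeholders, not real NCBI entries.
sample = io.StringIO(
    ">ACC00001 example organism gene1\nACGT\nTTGA\n"
    ">ACC00002 another organism gene2\nGGCC\n"
)
for header, sequence in read_multi_fasta(sample):
    print(header.split()[0], len(sequence))
```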

Module 2: Statistical module; scatter plot and world map from taxonomy files or queries

The data required by Module 2 can be provided in several ways:

  • manually, by entering a query;
  • by uploading the taxonomy output file of Module 1;
  • by uploading a file from an external source that contains taxonomy lists.

Three types of interactive plots are available:

  • scatter plot: uses taxonomy as input to produce a graph showing which taxa are present and the quantity of each taxonomic unit;
  • sunburst plot: also uses taxonomy, but creates an expandable pie chart for exploring the taxonomy in depth, with less emphasis on the quantity of each taxonomic unit;
  • world map plot: uses the country metadata of the accessions to produce a map showing where each taxon was found.

Output file format: html

Output file folder: download
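
The quantity that the scatter plot reports for each taxonomic unit boils down to a count per taxon at a chosen rank. The sketch below shows that counting step only, not the plotting, and it is not ExTaxsI's implementation: it assumes each record is a semicolon-separated 6-rank lineage (phylum;class;order;family;genus;species), and the function name is hypothetical.

```python
from collections import Counter

RANKS = ("phylum", "class", "order", "family", "genus", "species")


def count_taxa(lineages, rank):
    """Count how many records fall under each taxon at the given rank."""
    index = RANKS.index(rank)
    counts = Counter()
    for lineage in lineages:
        fields = [field.strip() for field in lineage.split(";")]
        # Skip records that do not carry all six ranks or lack this rank.
        if len(fields) == len(RANKS) and fields[index]:
            counts[fields[index]] += 1
    return counts


lineages = [
    "Chordata;Mammalia;Primates;Hominidae;Homo;Homo sapiens",
    "Chordata;Mammalia;Rodentia;Muridae;Mus;Mus musculus",
    "Chordata;Mammalia;Primates;Hominidae;Pan;Pan troglodytes",
]
print(count_taxa(lineages, "order").most_common())
# [('Primates', 2), ('Rodentia', 1)]
```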

Module 3: Taxonomy ID converter.

You can convert taxonomy IDs, read from a file or entered manually, into the full taxonomy of the 6 main ranks (phylum;class;order;family;genus;species).

Note: in an input file, the list values must be placed in the first column.
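
The two input styles for the converter (manual entries joined by a plus sign, or a file with the values in its first column) can be sketched as follows. Both helper names are hypothetical and the code only prepares the list of IDs; the conversion itself is left to ExTaxsI.

```python
import csv
import io


def parse_manual_input(text):
    """Split manual input such as '9606+10090' into a list of entries."""
    return [part.strip() for part in text.split("+") if part.strip()]


def read_first_column(handle, delimiter="\t"):
    """Return the non-empty values found in the first column of a TSV/CSV stream."""
    values = []
    for row in csv.reader(handle, delimiter=delimiter):
        if row and row[0].strip():
            values.append(row[0].strip())
    return values


print(parse_manual_input("9606+10090"))  # ['9606', '10090']
print(read_first_column(io.StringIO("9606\tnote\n10090\tnote\n")))  # ['9606', '10090']
```

For a comma-separated file, pass delimiter="," to read_first_column.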

Copyright and licensing information

Copyright (c)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Contacts

Credits and acknowledgments

Throughout the creation of this tool, we relied heavily on contributions from professors and college students. Their input was invaluable, and we want to take a moment to thank them and recognize them for all of their hard work:

  • Adam Chahed, Ex-student at University of Milano-Bicocca.
  • Elena Parladori, Ex-student at University of Milano-Bicocca.
  • Bachir Balech, Researcher at CNR IBIOM.
  • Dario Pescini, Associate Professor at University of Milano-Bicocca.

Contributors

adali981, albertobrusati, bachob5, giuliaago, pescini

extaxsi's Issues

Checklist

General checks

  • Repository
    • Software code available on a public repository
  • License
  • Installation
    • List of dependencies
    • Provide the software via a package manager
      • Conda via Bioconda
      • Language specific package manager: Pypi, Bioconductor, etc
    • Installation proceeding as outlined in the documentation
  • Code structure

  • Automated tests
    • Continuous integration with automated tests verifying the functionality of the software
  • Versioning & DOI
    • Connect repository to archiving tool like Zenodo
    • Generate releases often

Documentation

  • README.md
  • Documentation together with the code: GitHub page, ReadTheDoc
  • A statement of need: Section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is
  • Installation instructions: Clearly-stated list of dependencies, installation guideline
  • Example usage: Examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Documentation of the core functionality of the software documented to a satisfactory level (e.g., API method documentation)
  • Performance
  • Community guidelines: Clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
    • Code of Conduct
    • CONTRIBUTING.md file
    • Roadmap
  • API documentation inside the functions

Credits to Bérénice Batut @bebatut

Blank geo map

If no country metadata is found, the world map plot produces an empty HTML plot; this should be noted in the tutorial.

Row line in input file

Check the first row of all input files (for the world map plot, the first row is skipped during reading).
