
This project forked from not-a-feature/minifasta


A small FASTA toolbox for small to medium size projects without dependencies.

License: GNU General Public License v3.0

Python 100.00%


miniFASTA

An easy FASTA object handler, reader, writer and translator for small to medium size projects without dependencies.


Installation

Using pip / pip3:

pip install miniFasta

Or by source:

git clone git@github.com:not-a-feature/miniFASTA.git
cd miniFASTA
pip install .

How to use

miniFASTA offers easy-to-use functions for FASTA handling. The five main parts are:

  • fasta_object()
    • toAmino()
    • toRevComp()
    • valid()
    • len() / str() / eq()
  • read()
  • write()
  • translate_seq()
  • reverse_comp()

fasta_object()

The core component of miniFASTA is the fasta_object(). This object represents an entry in a FASTA file and consists of a head and body.

import miniFasta as mf
fo = mf.fasta_object(">Atlantic dolphin", "CGGCCTTCTATCTTCTTC", stype="DNA")
print(fo.head) # >Atlantic dolphin
print(fo.body) # CGGCCTTCTATCTTCTTC

### The following functions are defined on a fasta_object():

str(fo) # will return:
# >Atlantic dolphin
# CGGCCTTCTATCTTCTTC

# Body length
len(fo) # will return 18, the length of the body

# Equality 
print(fo == fo) # True

fo_b = mf.fasta_object(">Same Body", "CGGCCTTCTATCTTCTTC")
print(fo == fo_b) # True

fo_c = mf.fasta_object(">Different Body", "ZZZZAGCTAG")
print(fo == fo_c) # False
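
The behaviour shown above can be pictured with a small stand-in class. This is only a sketch, not miniFASTA's own code: `FastaEntry` is a hypothetical name, and the assumption that equality compares bodies only is inferred from the `fo == fo_b` example.

```python
class FastaEntry:
    """Hypothetical stand-in for a FASTA entry; not miniFASTA's own class."""

    def __init__(self, head, body, stype="ANY"):
        self.head = head
        self.body = body
        self.stype = stype

    def __str__(self):
        # Header line followed by the sequence line.
        return f"{self.head}\n{self.body}"

    def __len__(self):
        # Length of the sequence only; the header is not counted.
        return len(self.body)

    def __eq__(self, other):
        # Assumption: two entries are equal when their bodies match.
        if not isinstance(other, FastaEntry):
            return NotImplemented
        return self.body == other.body


a = FastaEntry(">Atlantic dolphin", "CGGCCTTCTATCTTCTTC", stype="DNA")
b = FastaEntry(">Same Body", "CGGCCTTCTATCTTCTTC")
print(len(a))   # 18
print(a == b)   # True
```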

fasta_object(...).valid()

Checks if the body contains invalid characters. stype of fasta_object needs to be set in order to check for illegal characters in its body.

stype is one of:

  • ANY : [default] Allows all characters.
  • NA : Allows all Nucleic Acid Codes (DNA & RNA).
  • DNA : Allows all IUPAC DNA Codes.
  • RNA : Allows all IUPAC RNA Codes.
  • PROT: Allows all IUPAC amino acid codes.

Optional: allowedChars can be set to override the default character set.

# The default object allows all characters.
# True
fasta_object(">valid", "Ä'_**?.asdLLA").valid()

# Only if stype is specified, valid can check for illegal characters.
# True
fasta_object(">valid", "ACGTUAGTGU", stype="NA").valid()

# False, as W is not allowed for DNA/RNA
fasta_object(">invalid", "ACWYUOTGU", stype="NA").valid() 

# True
fasta_object(">valid", "AGGATTA", stype="ANY").valid(allowedChars = "AGTC")

# True, as stype is ignored if allowedChars is set.
fasta_object(">valid", "WYU", stype="DNA").valid(allowedChars = "WYU") 
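
The precedence between stype and allowedChars can be sketched roughly as follows. The function `is_valid` and the `ALPHABETS` table are hypothetical. The alphabets are deliberately reduced to the unambiguous bases, so they are an assumption and not the character sets miniFASTA actually ships. PROT is left out for brevity.

```python
# Simplified, assumed alphabets: unambiguous bases only.
# miniFASTA's real sets may be larger.
ALPHABETS = {
    "DNA": set("ACGT"),
    "RNA": set("ACGU"),
    "NA": set("ACGTU"),
}


def is_valid(body, stype="ANY", allowed_chars=None):
    """Return True if every character of body is permitted."""
    if allowed_chars is not None:
        # An explicit character set takes precedence over stype.
        allowed = set(allowed_chars)
    elif stype == "ANY":
        # No restriction at all.
        return True
    else:
        allowed = ALPHABETS[stype]
    return all(ch in allowed for ch in body)


print(is_valid("ACGTUAGTGU", stype="NA"))                 # True
print(is_valid("ACWYUOTGU", stype="NA"))                  # False
print(is_valid("WYU", stype="DNA", allowed_chars="WYU"))  # True
```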

fasta_object(...).toAmino(translation_dict)

Translates the body to an amino-acid sequence. See translate_seq() for more details.

fo.toAmino() 
print(fo.body) # Will return RPSIFF
d = {"CCG": "Z", "CTT": "A" ...}
fo.toAmino(d) 
print(fo.body) # Will return ZA...

fasta_object(...).toRevComp(complement_dict)

Converts the body to its reverse complement. See reverse_comp() for more details.

fo.toRevComp() 
print(fo.body) # Will return GAAGAAGATAGAAGGCCG

Reading FASTA files

read() is a FASTA reader that handles both compressed and uncompressed files. The following compression formats are supported: zip, tar, tar.gz, gz. If an archive contains multiple files, all of them are read. The function returns a list of fasta_objects. By default, entries are converted to upper case; set read("path.fasta", upper=False) to disable the conversion.

fos = mf.read("dolphin.fasta") # List of fasta entries.
fos = mf.read("mouse.fasta", upper=False) # The entries won't be converted to upper case.
fos = mf.read("reads.tar.gz") # Is able to handle compressed files.
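
For readers curious about what such a reader has to do, here is a minimal sketch using only the standard library. It is not miniFASTA's implementation: `parse_fasta` and `read_fasta` are hypothetical names, entries are returned as plain `(head, body)` tuples, and only plain text and single-file gzip input are covered (zip and tar archives are left out).

```python
import gzip


def parse_fasta(lines, upper=True):
    """Parse an iterable of text lines into (head, body) tuples."""
    entries = []
    head, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if head is not None:
                entries.append((head, "".join(chunks)))
            head, chunks = line, []
        elif head is not None:
            # Sequence lines may be wrapped; join them into one body.
            chunks.append(line.upper() if upper else line)
    if head is not None:
        entries.append((head, "".join(chunks)))
    return entries


def read_fasta(path, upper=True):
    """Open a plain or single-file gzip FASTA file and parse it."""
    # Note: this sketch does not unpack tar or zip archives.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_fasta(handle, upper=upper)


demo = [">seq1", "acgt", "ttga", ">seq2", "ggcc"]
print(parse_fasta(demo))  # [('>seq1', 'ACGTTTGA'), ('>seq2', 'GGCC')]
```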

Writing FASTA files

write() is a basic FASTA writer. It takes a single fasta_object or a list of fasta_objects and writes them to the given path.

By default, the file is overwritten. Set write(fo, "path.fasta", mode="a") to append to the file instead.

fos = mf.read("dolphin.fasta") # List of fasta entries
mf.write(fos, "new.fasta")
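
The difference between overwriting and appending can be illustrated with a small standard-library sketch. `write_fasta` is a hypothetical helper working on `(head, body)` tuples. It mirrors the behaviour described above and is not miniFASTA's own code.

```python
import os
import tempfile


def write_fasta(entries, path, mode="w"):
    """Write (head, body) tuples to path; mode "a" appends instead."""
    if isinstance(entries, tuple):
        # Accept a single entry as well as a list of entries.
        entries = [entries]
    with open(path, mode) as handle:
        for head, body in entries:
            handle.write(f"{head}\n{body}\n")


path = os.path.join(tempfile.mkdtemp(), "new.fasta")
write_fasta([(">a", "ACGT")], path)          # creates / overwrites
write_fasta((">b", "TTGA"), path, mode="a")  # appends a second entry
with open(path) as handle:
    written = handle.read()
print(written)
```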

Sequence translation

translate_seq() translates a sequence starting at position 0. Unless translation_dict is provided, the standard bacterial code is used. If a codon is not found, it is replaced by a ~. Trailing bases that do not fit into a codon are ignored.

mf.translate_seq("CGGCCTTCTATCTTCTTC") # Will return RPSIFF

d = {"CGG": "Z", "CTT": "A"}
mf.translate_seq("CGGCTT", d) # Will return ZA.
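
The three rules above (start at position 0, ~ for unknown codons, trailing bases dropped) can be sketched in a few lines. `translate` and `DEMO_CODONS` are hypothetical. The table only contains the five codons needed for the example, whereas a real genetic code has 64 entries.

```python
# Deliberately tiny codon table for illustration only.
DEMO_CODONS = {"CGG": "R", "CCT": "P", "TCT": "S", "ATC": "I", "TTC": "F"}


def translate(seq, table=DEMO_CODONS):
    """Translate seq codon by codon, starting at position 0."""
    usable = len(seq) - len(seq) % 3  # drop trailing bases
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    # Codons missing from the table become "~".
    return "".join(table.get(codon, "~") for codon in codons)


print(translate("CGGCCTTCTATCTTCTTC"))                # RPSIFF
print(translate("CGGCTT", {"CGG": "Z", "CTT": "A"}))  # ZA
print(translate("CGGAAATT"))                          # R~
```

In the last call, AAA is absent from the demo table and the trailing TT does not form a full codon.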

Reverse Complement

reverse_comp() converts a sequence to its reverse complement. Unless complement_dict is provided, the standard complement is used. If no complement is found for a nucleotide, it remains unchanged.

mf.reverse_comp("CGGCCTTCTATCTTCTTC") # Will return GAAGAAGATAGAAGGCCG

d = {"C": "Z", "T": "Y"}
mf.reverse_comp("TC", d) # Will return ZY
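
As a rough illustration of the rule that unmapped nucleotides stay unchanged, consider the sketch below. `reverse_complement` and `DEFAULT_COMPLEMENT` are hypothetical names, and the default mapping is assumed to cover only A, C, G and T.

```python
# Assumed default mapping; miniFASTA's own table may cover more symbols.
DEFAULT_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq, complement=DEFAULT_COMPLEMENT):
    # Walk the sequence backwards; bases without a mapping stay as they are.
    return "".join(complement.get(base, base) for base in reversed(seq))


print(reverse_complement("CGGCCTTCTATCTTCTTC"))        # GAAGAAGATAGAAGGCCG
print(reverse_complement("TC", {"C": "Z", "T": "Y"}))  # ZY
print(reverse_complement("ACNT"))                      # ANGT
```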

License

Copyright (C) 2021 by Jules Kreuer - @not_a_feature
This piece of software is published under the GNU General Public License v3.0
TLDR:

| Permissions      | Conditions                   | Limitations |
| ---------------- | ---------------------------- | ----------- |
| ✓ Commercial use | Disclose source              | ✕ Liability |
| ✓ Distribution   | License and copyright notice | ✕ Warranty  |
| ✓ Modification   | Same license                 |             |
| ✓ Patent use     | State changes                |             |
| ✓ Private use    |                              |             |

Go to LICENSE.md to see the full version.

Dependencies

In addition to packages included in Python 3, this piece of software uses 3rd-party software packages for development purposes that are not required in the published version. Go to DEPENDENCIES.md to see all dependencies and licenses.
