gpmap's Introduction

GPMap

A Python API for managing genotype-phenotype map data

GPMap defines a flexible object for managing genotype-phenotype (GP) map data. At its core, it stores all data in Pandas DataFrames and thus interacts seamlessly with the PyData ecosystem.

To visualize genotype-phenotype objects created by GPMap, check out GPGraph.

Basic example

Import the package's base object.

from gpmap import GenotypePhenotypeMap

Pass your data to the object.

# Data
wildtype = "AAA"
genotypes = ["AAA", "AAT", "ATA", "TAA", "ATT", "TAT", "TTA", "TTT"]
phenotypes = [0.1, 0.2, 0.2, 0.6, 0.4, 0.6, 1.0, 1.1]
stdeviations = [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]

# Initialize the object
gpm = GenotypePhenotypeMap(wildtype,
                           genotypes,
                           phenotypes,
                           stdeviations=stdeviations)

# Check out the data.
gpm.data

Or load a dataset from disk.

gpm = GenotypePhenotypeMap.read_json("data.json")
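
When passing lists directly, it can help to confirm that they line up before building the object. The following is a minimal sketch in plain Python, not part of the gpmap API: `check_inputs` and `n_mutations` are hypothetical helpers written for this illustration, and the rules they enforce (one phenotype per genotype, every genotype the same length as the wildtype) are assumptions about what well-formed input looks like.

```python
# Example data copied from the basic example above.
wildtype = "AAA"
genotypes = ["AAA", "AAT", "ATA", "TAA", "ATT", "TAT", "TTA", "TTT"]
phenotypes = [0.1, 0.2, 0.2, 0.6, 0.4, 0.6, 1.0, 1.1]


def check_inputs(wildtype, genotypes, phenotypes):
    """Hypothetical helper: raise ValueError if the inputs are inconsistent."""
    if len(genotypes) != len(phenotypes):
        raise ValueError("expected exactly one phenotype per genotype")
    for genotype in genotypes:
        if len(genotype) != len(wildtype):
            raise ValueError(
                f"genotype {genotype!r} differs in length from the wildtype"
            )


def n_mutations(wildtype, genotype):
    """Hypothetical helper: count sites where a genotype differs from the wildtype."""
    return sum(w != g for w, g in zip(wildtype, genotype))


check_inputs(wildtype, genotypes, phenotypes)

# Map each genotype to its number of mutations relative to the wildtype.
mutation_counts = {g: n_mutations(wildtype, g) for g in genotypes}
```

Here `mutation_counts["AAA"]` is 0 and `mutation_counts["TTT"]` is 3, since the triple mutant differs from the wildtype at every site.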

Installation

Users: The simplest way to install this package is with pip:

pip install gpmap

Developers: The recommended way to install this package for development is using pipenv.

  1. Clone this repository:
     git clone https://github.com/harmslab/gpmap
  2. Install the package using pipenv:
     cd gpmap
     pipenv install --dev -e .
  3. Run tests using pytest:
     pytest

Dependencies

The following modules are required. Also, the examples/tutorials are written in Jupyter notebooks and require IPython to be installed.

gpmap's People

Contributors

harmsm, lgoldbach, lperezmo, zsailer

gpmap's Issues

Is it suitable for whole gene?

Is GPMap suitable for full-length sequences?
When I run the test data and change it to amino acid genotypes, it works well. However, when I lengthen the test sequence to 242 aa, it fails.

I get this error at the SequenceSpace() step:
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 14.2 GiB for an array with shape (1912602624,) and data type float64

I hope to use it to analyse random mutations in a gene 239 amino acids long.
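
The figure in the error message can be checked by hand: a float64 takes 8 bytes, so an array of 1,912,602,624 elements needs 14.25 GiB. The sketch below redoes that arithmetic. As a separate back-of-the-envelope illustration, it also counts how many sequences a complete space would contain if each of 242 sites could hold any of 20 amino acids. It does not reproduce how SequenceSpace() sizes its arrays, which the report does not show. The helper names are hypothetical.

```python
BYTES_PER_FLOAT64 = 8


def array_gib(n_elements, bytes_per_element=BYTES_PER_FLOAT64):
    """Hypothetical helper: memory needed for a flat array, in GiB (2**30 bytes)."""
    return n_elements * bytes_per_element / 2**30


def n_possible_sequences(n_sites, n_states):
    """Hypothetical helper: size of a complete sequence space.

    Assumes every site can take any of `n_states` states independently.
    """
    return n_states ** n_sites


# Array shape reported in the traceback above.
reported_gib = array_gib(1912602624)
print(f"{reported_gib:.2f} GiB")  # 14.25 GiB

# Assumption for illustration: 20 amino acids allowed at each of 242 sites.
full_space = n_possible_sequences(242, 20)
print(f"complete space has {len(str(full_space))} decimal digits")
```

Under that assumption the complete space has a count with more than 300 decimal digits, so enumerating every sequence of a whole gene is out of reach regardless of available memory. Working only with the genotypes actually observed is the usual way around this.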

Move data under "data" key in dictionary/json format.

The dictionary and json formats put metadata and data at the same level, i.e.

{
  "name": "my_data",
  "description": "a sentence about my data",
  "genotypes": [...],
  "phenotypes": [...],
  .
  .
  .
}

I suggest that we move the data underneath a "data" key. This aligns better with GenotypePhenotypeMap anyway, which stores the data in a DataFrame under the data attribute, and it cleanly separates the data from the metadata.

This idea came up when thinking about how someone stores data in an Excel or CSV file. Metadata can't easily go next to the data in those formats, so it usually lives in a separate file (such as a JSON or YAML file). If they were to merge, say, an Excel data file and a JSON metadata file into a single JSON file, I think it's clearer to have the data become a single field with subfields.

{
  "name": "my_data",
  "description": "a sentence about my data",
  "data" : {
    "genotypes": [...],
    "phenotypes": [...],
    .
    .
    .
  }
}
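
The proposed change amounts to a small dictionary transformation. Below is a minimal sketch using only the standard library; `nest_data` is a hypothetical helper, and treating exactly "name" and "description" as metadata is an assumption taken from the example above, not a list defined by gpmap.

```python
import json

# Assumption: only these keys count as metadata; everything else is data.
METADATA_KEYS = {"name", "description"}


def nest_data(flat, metadata_keys=METADATA_KEYS):
    """Hypothetical helper: move all non-metadata fields under a "data" key."""
    nested = {k: v for k, v in flat.items() if k in metadata_keys}
    nested["data"] = {k: v for k, v in flat.items() if k not in metadata_keys}
    return nested


# A flat record in the current layout, with made-up example values.
flat = {
    "name": "my_data",
    "description": "a sentence about my data",
    "genotypes": ["AA", "AV"],
    "phenotypes": [1.0, 1.1],
}

nested = nest_data(flat)
print(json.dumps(nested, indent=2))
```

Keeping the metadata keys in one explicit set means a reader of the nested layout never has to guess which top-level fields are per-genotype columns.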

GenotypePhenotypeMap throws error

Error:

AttributeError: GenotypePhenotypeMap instance has no attribute '_indices'

When using code from tutorial:

from gpmap import GenotypePhenotypeMap

# Create list of genotypes and phenotypes
wildtype = "AA"
genotypes = ["AA", "AV", "AM", "VA", "VV", "VM"]
phenotypes = [1.0, 1.1, 1.4, 1.5, 2.0, 3.0]

# Create GenotypePhenotypeMap object
gpm = GenotypePhenotypeMap(wildtype, genotypes, phenotypes)

Attempted on two machines with same result.

what to do about extra data?

We need to properly handle extra data that isn't used by GPMap. Right now, GPMap ignores extra data annotated on each genotype. Users, however, will likely want to keep this information together.

The question is: should we expose these attributes through the GPMap interface, or should they live only in the underlying GPMap DataFrame?

Using VCF files?

I have DST information for drug-resistant and drug-susceptible bacteria. How can I convert it to a genotype-phenotype map for your gpmap and epistasis modules?
