tripal_cmap_loader's Introduction

Tripal Cmap

This module currently provides a Tripal 3 importer for Cmap files that conforms to the Chado map module schema. We hope to use existing modules for display (e.g., cmap.js by the Legume Federation). A field utilizing cmap.js will be included in this module soon.

Installation and setup

This module is not installable with drush. Navigate to where your custom modules are installed and clone this repository. For example:

cd /var/www/html/sites/all/modules/custom/
git clone https://github.com/statonlab/tripal_cmap_loader.git
drush pm-enable tripal_cmap_loader -y

Expected CMAP data

The table below shows an example CMAP file. You can find the full file in the example folder. This example map is a genetic map published here.

The importer loads features assuming that the accession is the unique name and the name is the feature name. Currently, we ignore the is_landmark and feature_aliases columns.

map_acc | map_name | map_start | map_stop | locus_acc | feature_name | feature_accession | feature_aliases | feature_start | feature_stop | feature_type_acc | is_landmark
C_mollisima_A | A | 0 | 90.4 | CmSI0385 | CmSI0385 | CmSI0385 | | 0 | 0 | microsatellite | 0
C_mollisima_A | A | 0 | 90.4 | CmSNP01340 | CmSNP01340 | CmSNP01340 | | 1.1 | 1.1 | SNP | 0
C_mollisima_A | A | 0 | 90.4 | CmSNP01086 | CmSNP01086 | CmSNP01086 | | 3.5 | 3.5 | SNP | 0
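
As a rough illustration of the naming rule above (accession becomes the unique name, name becomes the feature name), here is a minimal Python sketch. It is not the importer's code. It assumes the file is tab-delimited with a header row, and the function name `read_markers` and the output keys are made up for this example.

```python
import csv
import io

# Assumption: CMAP files are tab-delimited and start with a header row.
SAMPLE = (
    "map_acc\tmap_name\tmap_start\tmap_stop\tlocus_acc\tfeature_name\t"
    "feature_accession\tfeature_aliases\tfeature_start\tfeature_stop\t"
    "feature_type_acc\tis_landmark\n"
    "C_mollisima_A\tA\t0\t90.4\tCmSI0385\tCmSI0385\tCmSI0385\t\t0\t0\tmicrosatellite\t0\n"
    "C_mollisima_A\tA\t0\t90.4\tCmSNP01340\tCmSNP01340\tCmSNP01340\t\t1.1\t1.1\tSNP\t0\n"
)


def read_markers(text):
    """Return one dict per data row, keyed the way the README describes."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    markers = []
    for row in reader:
        markers.append({
            "uniquename": row["feature_accession"],  # accession -> unique name
            "name": row["feature_name"],             # name -> feature name
            "type": row["feature_type_acc"],
            "start": float(row["feature_start"]),
            "stop": float(row["feature_stop"]),
            # is_landmark and feature_aliases are ignored, as noted above.
        })
    return markers


markers = read_markers(SAMPLE)
```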

Using the importer

Before loading the file, create a featuremap (Tripal 3 bundle: Map) entity record.

You will need to select the following when loading a cmap file:

  • organism
  • featuremap
  • sequence ontology term for the mapping features.

You must choose a single organism when loading the map: this is the organism that will be used for new features created. Two types of things will be loaded into Chado: the mapping feature (a chromosome, a scaffold), and the marker feature (a SNP). The type_id for the marker is taken from the feature_type_acc column: that term must be in the sequence ontology. The mapping feature is chosen in the loader.
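
To make the two kinds of records concrete, the sketch below groups parsed rows into mapping features and marker features. It is only an illustration of the description above, not the loader's implementation. The function name `split_features`, the dict keys, and the organism and type values passed in are placeholders.

```python
def split_features(rows, organism, mapping_type):
    """Split CMAP rows into mapping features and marker features.

    organism and mapping_type stand in for the values chosen in the
    loader form; marker types come from the feature_type_acc column.
    """
    mapping = {}
    markers = []
    for row in rows:
        # One mapping feature per distinct map_acc (e.g. one chromosome).
        mapping.setdefault(row["map_acc"], {
            "uniquename": row["map_acc"],
            "name": row["map_name"],
            "type": mapping_type,
            "organism": organism,
        })
        markers.append({
            "uniquename": row["feature_accession"],
            "name": row["feature_name"],
            "type": row["feature_type_acc"],
            "organism": organism,
            "map": row["map_acc"],
        })
    return list(mapping.values()), markers


rows = [
    {"map_acc": "C_mollisima_A", "map_name": "A",
     "feature_accession": "CmSI0385", "feature_name": "CmSI0385",
     "feature_type_acc": "microsatellite"},
    {"map_acc": "C_mollisima_A", "map_name": "A",
     "feature_accession": "CmSNP01340", "feature_name": "CmSNP01340",
     "feature_type_acc": "SNP"},
]
# "Example organism" and "chromosome" are placeholder form selections.
mapping_features, marker_features = split_features(rows, "Example organism", "chromosome")
```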

Cmap column to Chado mapping

Note that the cmap format is not consistent, or at least we have not found a definitive list of columns. The loader expects the following 12 columns. If your file is not 12 columns wide, it will not load. The first column must be map_acc.

cmap column | chado entry
map_acc | uniquename for mapping feature, i.e. chromosome, scaffold
map_name | name for mapping feature, i.e. chromosome, scaffold
map_start | not used
map_stop | not used
locus_accession | uniquename for the locus feature
feature_name | name for marker feature
feature_accession | uniquename for marker feature
feature_aliases | not used
feature_start | featurepos record for start
feature_stop | featurepos record for stop
feature_type_accession | type_id for marker feature
is_landmark | not used
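
If you want to check a file before submitting a load job, the two rules above (12 columns, first column map_acc) can be tested with a few lines of Python. This is a hedged sketch rather than the module's own validation. It assumes a tab-delimited file, and `check_header` is a hypothetical helper name.

```python
EXPECTED_COLUMNS = 12


def check_header(line, delimiter="\t"):
    """Return (ok, message) for the first line of a CMAP file."""
    cols = line.rstrip("\r\n").split(delimiter)
    if len(cols) != EXPECTED_COLUMNS:
        return False, "expected %d columns, found %d" % (EXPECTED_COLUMNS, len(cols))
    if cols[0] != "map_acc":
        return False, "first column is %r, expected 'map_acc'" % cols[0]
    return True, ""


good_header = "\t".join([
    "map_acc", "map_name", "map_start", "map_stop", "locus_acc",
    "feature_name", "feature_accession", "feature_aliases",
    "feature_start", "feature_stop", "feature_type_acc", "is_landmark",
])
ok, message = check_header(good_header)
```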

FPC

We have written a script to convert FPC format to cmap. See here for the code.

Using this module with tripal_map

If you would like to use the Tripal Map module to visualize data loaded with this module, you need to make the following adjustments:

  1. Replace the tripal_map_genetic_markers_mview mview.
SELECT F.uniquename as marker_locus_name, F.feature_id as marker_locus_id, F2.uniquename as genetic_marker_name,
  C1.name as map_unit_type, C2.name as marker_type, FM.name as map_name, FM.featuremap_id as map_id, FMP.value as map_type,
  F3.name as linkage_group, F3.feature_id as linkage_group_id, FPP.value as marker_pos, C.name as marker_pos_type,
  O.organism_id as organism_id, O.genus as genus, O.species as species, O.common_name as common_name
  FROM {feature} F
  INNER JOIN {feature_relationship} FR 	ON FR.subject_id = F.feature_id AND
    F.type_id = (SELECT cvterm_id  FROM {cvterm} WHERE name = 'biological_region' AND
    cv_id = (SELECT cv_id FROM {cv} WHERE name = 'sequence'))
     AND
    FR.type_id = (SELECT cvterm_id  FROM {cvterm} WHERE {cvterm}.name = 'instance_of' AND
    cv_id = (SELECT cv_id FROM {cv} WHERE name = 'OBO_REL'))

  INNER JOIN {feature} F2               	ON FR.object_id = F2.feature_id 
     AND
    FR.type_id = (SELECT cvterm_id FROM {cvterm} WHERE name = 'instance_of' AND
    cv_id = (SELECT cv_id FROM {cv} WHERE name = 'OBO_REL'))
    
  INNER JOIN {featurepos} FP            	ON F2.feature_id = FP.feature_id
  
  INNER JOIN {featuremap} FM    		ON FM.featuremap_id = FP.featuremap_id
  INNER JOIN {cvterm} C1                ON C1.cvterm_id = FM.unittype_id
  	INNER JOIN {cvterm} C2 ON C2.cvterm_id = F2.type_id
  INNER JOIN {featuremapprop} FMP       ON FMP.featuremap_id = FP.featuremap_id AND
   FMP.type_id = (SELECT cvterm_id FROM {cvterm} WHERE name = 'featuremap_type' AND
   cv_id = (SELECT cv_id FROM {cv} WHERE name = 'local'))
  INNER JOIN {featuremap_organism} FMO 	ON FMO.featuremap_id = FM.featuremap_id
  INNER JOIN {feature} F3 				ON FP.map_feature_id = F3.feature_id
  INNER JOIN {featureposprop} FPP 		ON FPP.featurepos_id = FP.featurepos_id
  INNER JOIN {cvterm} C 				ON C.cvterm_id = FPP.type_id
  INNER JOIN {organism} O 				ON FMO.organism_id = O.organism_id

  
  2. Replace the tripal_map_qtl_and_mtl_mview mview.
SELECT FM.name as map_name,
FM.featuremap_id as map_id,
FMP.value as map_type,
F3.name as linkage_group,
F3.feature_id as linkage_group_id,
F.uniquename as marker_locus_name,
C1.name as map_unit_type,
C2.name as marker_type,
C4.name as marker_pos_type,
FPP.value as marker_pos,
O.organism_id as organism_id, O.genus as genus, O.species as species, O.common_name as common_name,
F.feature_id as feature_id, FPP.featurepos_id as featurepos_id
FROM {feature} F
--F is the biological_region or parent marker locus
INNER JOIN {feature_relationship} FR 	ON FR.subject_id = F.feature_id AND
F.type_id = (SELECT cvterm_id  FROM {cvterm} WHERE name = 'biological_region' AND
cv_id = (SELECT cv_id FROM {cv} WHERE name = 'sequence'))
AND
FR.type_id = (SELECT cvterm_id  FROM {cvterm} WHERE {cvterm}.name = 'instance_of' AND
cv_id = (SELECT cv_id FROM {cv} WHERE name = 'OBO_REL'))

-- F2 is the marker feature itself
INNER JOIN {feature} F2               	ON FR.object_id = F2.feature_id
-- This mview is just for QTLs
INNER JOIN {cvterm} C ON F2.type_id = C.cvterm_id AND (C.name = 'QTL' OR C.name = 'heritable_phenotypic_marker')
INNER JOIN {featurepos} FP            	ON F2.feature_id = FP.feature_id
INNER JOIN {featuremap} FM    		ON FM.featuremap_id = FP.featuremap_id
INNER JOIN {cvterm} C1                ON C1.cvterm_id = FM.unittype_id
-- C2 is the marker type term
INNER JOIN {cvterm} C2 ON C2.cvterm_id = F2.type_id
INNER JOIN {featuremapprop} FMP       ON FMP.featuremap_id = FP.featuremap_id AND
FMP.type_id = (SELECT cvterm_id FROM {cvterm} WHERE name = 'featuremap_type' AND
cv_id = (SELECT cv_id FROM {cv} WHERE name = 'local'))
INNER JOIN {featuremap_organism} FMO 	ON FMO.featuremap_id = FM.featuremap_id
--F3 is the parent feature of the map ie the linkage group
INNER JOIN {feature} F3 				ON FP.map_feature_id = F3.feature_id
INNER JOIN {featureposprop} FPP 		ON FPP.featurepos_id = FP.featurepos_id
INNER JOIN {organism} O 				ON FMO.organism_id = O.organism_id
INNER JOIN {cvterm} C4  ON FPP.type_id = C4.cvterm_id
  3. Ensure that the featuremap bundle (e.g., Genetic Map) has a Chado property with the local:featuremap_type term. I recommend setting a default value for this property so you won't forget. Note that you should only need to set this property yourself if the Tripal Entity you are using was custom created and has no type.

  4. Go to the Tripal Map admin settings at /admin/tripal/extension/tripal_map. Change the start and stop property names to start position and stop position. If you don't do this, the maps won't draw!

If your featuremap has this property set, and you've populated the altered tripal_map_genetic_markers_mview materialized view (Data Storage -> Chado -> Materialized Views, press "populate"), your field should show up on the organism and featuremap! You might need to clear the cache (drush cc all) before the field appears on the organism.

tripal_cmap_loader's People

Contributors

almasaeed2010, bradfordcondon, jwest60

tripal_cmap_loader's Issues

linkage group not valid term

We list it as a valid term in the loader, but it isn't.

It's invalid because synonyms won't work. We need to either use the correct Chado API, which replaces synonyms, or not use the autocomplete that includes synonyms.

get mview to run

INNER JOIN featurepos FP ON F.feature_id = FP.feature_id currently breaks it.

The feature is defined as below, i.e. it's the SNP feature. It should definitely have the featurepos record.

select F.name from feature F inner join feature_relationship FR on FR.subject_id = F.feature_id AND f.type_id = (SELECT cvterm_id from cvterm WHERE name = 'biological_region' AND cv_id = (SELECT cv_id FROM cv WHERE name = 'sequence'))

  AND

    FR.type_id = (SELECT cvterm_id  FROM cvterm WHERE cvterm.name = 'instance_of' AND
    cv_id = (SELECT cv_id FROM cv WHERE name = 'OBO_REL'))

 INNER JOIN feature F2 ON FR.object_id = F2.feature_id AND
 	FR.type_id = (SELECT cvterm_id FROM cvterm WHERE name = 'instance_of' AND cv_id = (SELECT cv_id FROM cv WHERE name = 'OBO_REL'))

schema

https://academic.oup.com/database/article/doi/10.1093/database/baw010/2630160

Locus names of molecular markers that have been mapped to genetic maps are stored in the feature table with type_id as the SO term ‘marker_locus’ and the relationship between the marker is stored in the feature_relationship table with type_id as the RO term ‘instance_of’. Locus names are usually the same as marker names but when the same marker is mapped to more than one position in the same genetic map, distinct locus names are associated with each position in the map. A feature of SO term ‘marker_locus’ is therefore associated with the genetic map position (Figure 2). Linkage groups stored in the feature table with SO term ‘linkage_group’ as the type_id. Genetic maps are stored in the featuremap table (Table 2). The associated genetic map, linkage group and locus (marker_locus, QTL or bin) are stored in the featurepos table (Table 3). We store map positions (cM) in the featureposprop table, not in the featurepos.mappos field, since most QTL have three associated positions, start, stop and peak (Table 3). The relationship between the locus and bin is stored in feature_relationship table using 'located_in' as the type_id. The mapping population is stored in the stock table and linked to the featuremap table via the featuremap_stock (Figure 2).

Dealing with cmap file columns not being exactly 12 columns

I get the following error when loading the chestnut cmap file:

Improper number of columns on line 0.
  This module expects a 12 column file.
WD tripal_job: Improper number of columns on line 0.                 [error]
  This module expects a 12 column file.
[site http://default] [TRIPAL ERROR] [TRIPAL_JOB] Improper number of columns on line 0.  This module expects a 12 column file.
Job execution failed: SQLSTATE[22001]: String data, right truncated: [error]
7 ERROR:  value too long for type character varying(32)

File: chestnut map - Sheet1.txt

Note you will need to change the extension (github only allows this ext)
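
One way to narrow down an error like this is to list which lines do not split into 12 fields. The sketch below is a diagnostic idea only and is not part of the module. It assumes the file is tab-delimited, numbers lines from 0 to match the error message above, and uses the hypothetical name `column_report`.

```python
def column_report(text, expected=12, delimiter="\t"):
    """Return (line_number, column_count) for every non-blank line whose
    column count differs from the expected width. Lines count from 0."""
    problems = []
    for line_no, line in enumerate(text.splitlines()):
        if not line.strip():
            continue  # skip blank lines
        count = len(line.split(delimiter))
        if count != expected:
            problems.append((line_no, count))
    return problems


# Synthetic sample: line 0 has 12 fields, line 1 has only 11.
sample = "\t".join(["x"] * 12) + "\n" + "\t".join(["x"] * 11) + "\n"
report = column_report(sample)
```

An empty report would suggest the column count is not the problem and that something else, such as the delimiter, is worth checking.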

what to do about qtl mview?

We don't currently have or support QTL markers, and we have no data to guess at them. I think for now we just don't worry about it.

markerpositiontype prop?

marker_position_type is loaded as "start".

However the other featureposprops are

3.5     start position
1.1     start position
0       start position
50.9    stop position
50.8    stop position

does this really make sense? Do we still use the marker_position_type = start prop? Or is it a remnant of our previous way of doing things?

unused cmap fields?

map start/stop: should we add it as a prop on the mapping feature ie chromosome?

anchor: what would we do with this?

synonyms: I think feature supports these somehow...

cmap viewer field

This field should, for now, just accept a PATH to the map in cmap format.

baseline map

Tripal 3 includes a base map bundle.

It inserts into featuremap: it has a name, a description field, and a unittype cvterm.

ADD featuremaptype instructions to README

TripalMap needs a map_type for the mview. I added a prop attaching to featuremap, and a warning if you try to run the loader on a featuremap without it, but for now the chado prop is added manually (and set to required manually).

We need instructions, or we need to attach the prop field and set it to required programmatically.

proposal: don't use featurerange at all

notes from talking:

case: Feature has single location:

single position goes into featurepos, with no featureposprop

case: feature has start/stop:

single feature created. 2 entries into featurepos, with featureposprop for start/end

case: feature has start/stop/peak ie QTL:

single feature created. 3 entries into featurepos, with featureposprop for start/end/peak.
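
The three cases above can be written down as a small planning function. This is a sketch of the proposal as noted here, not code from the module. The name `plan_positions`, the dict keys, and the prop labels are placeholders, and treating start == stop as a single location is an assumption.

```python
def plan_positions(start, stop=None, peak=None):
    """Return the featurepos rows to create for one feature.

    Each row is a dict with the map position and the featureposprop
    label to attach (None means no featureposprop).
    """
    # Case 1: single location (assumed when stop is missing or equals start).
    if stop is None or stop == start:
        return [{"mappos": start, "prop": None}]
    # Case 2: start/stop, two featurepos rows with props.
    rows = [
        {"mappos": start, "prop": "start"},
        {"mappos": stop, "prop": "stop"},
    ]
    # Case 3: start/stop/peak (e.g. a QTL), three featurepos rows.
    if peak is not None:
        rows.append({"mappos": peak, "prop": "peak"})
    return rows


single = plan_positions(3.5, 3.5)
ranged = plan_positions(0, 50.9)
qtl = plan_positions(10.0, 20.0, peak=14.2)
```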

Questions

  • QTL data doesn't seem like it can fit into cmap format. Where would the peak go? We just have start/stop.

featurerange vs featurepos with props

A) Why would I ever use featurerange? If I have a single start/stop, why not just use featurepos, with start/stop props?

B) Why does featurerange point to features instead of positions? It means you'd have to create a start feature and a stop feature, which are part of the parent feature, which is the actual marker you want. The featurepos solution seems so much better.

mview for cmap compatibility guide

Need to create and post a guide for using this module with tripalmap.

Pretty much just the below, but add some screenshots and remove the PHP from the mview SQL.

guide

In order to use tripal_map with our cmap loaded data, we need to replace the materialized view used to populate the tripal_map_genetic_markers mview.

To do so, go to admin->data storage -> chado -> mviews.

replace the sql code with the below mview.

    $sql = "
  SELECT F.uniquename as marker_locus_name, F.feature_id as marker_locus_id, F2.uniquename as genetic_marker_name,
  C1.name as map_unit_type, C2.name as marker_type, FM.name as map_name, FM.featuremap_id as map_id, FMP.value as map_type,
  F3.name as linkage_group, F3.feature_id as linkage_group_id, FP.mappos as marker_pos, FPP.value as marker_pos_type,
  O.organism_id as organism_id, O.genus as genus, O.species as species, O.common_name as common_name
  FROM {feature} F
  INNER JOIN {feature_relationship} FR 	ON FR.subject_id = F.feature_id AND
    F.type_id = (SELECT cvterm_id  FROM {cvterm} WHERE name = 'biological_region' AND
    cv_id = (SELECT cv_id FROM {cv} WHERE name = 'sequence'))
     AND
    FR.type_id = (SELECT cvterm_id  FROM {cvterm} WHERE {cvterm}.name = 'instance_of' AND
    cv_id = (SELECT cv_id FROM {cv} WHERE name = 'OBO_REL'))

  INNER JOIN {feature} F2               	ON FR.object_id = F2.feature_id 
     AND
    FR.type_id = (SELECT cvterm_id FROM {cvterm} WHERE name = 'instance_of' AND
    cv_id = (SELECT cv_id FROM {cv} WHERE name = 'OBO_REL'))
    
  INNER JOIN {featurepos} FP            	ON F2.feature_id = FP.feature_id
  
  INNER JOIN {featuremap} FM    		ON FM.featuremap_id = FP.featuremap_id
  INNER JOIN {cvterm} C1                ON C1.cvterm_id = FM.unittype_id
  	INNER JOIN {cvterm} C2 ON C2.cvterm_id = F2.type_id
  INNER JOIN {featuremapprop} FMP       ON FMP.featuremap_id = FP.featuremap_id AND
   FMP.type_id = (SELECT cvterm_id FROM {cvterm} WHERE name = 'featuremap_type' AND
   cv_id = (SELECT cv_id FROM {cv} WHERE name = 'local'))
  INNER JOIN {featuremap_organism} FMO 	ON FMO.featuremap_id = FM.featuremap_id
  INNER JOIN {feature} F3 				ON FP.map_feature_id = F3.feature_id
  INNER JOIN {featureposprop} FPP 		ON FPP.featurepos_id = FP.featurepos_id
  INNER JOIN {cvterm} C 				ON C.cvterm_id = FPP.type_id
  INNER JOIN {organism} O 				ON FMO.organism_id = O.organism_id
  ";

support for feature range

Right now we only support featurepos.

The featurerange function needs to be written for when start != stop.

map type not used or validated in form

oops. an arbitrary number is passed in the run method. Let's fix that.

$map_type_id = 100; //cvterm for the map type. IE.... chromosome? Linkage group? let's give this variable a better name.
