interminer's Introduction

InterMine-R

R package for accessing InterMine instances

Installation

InterMineR has been added to Bioconductor.

To install this package, start R and enter:

## try http:// if https:// URLs are not supported

if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager")

BiocManager::install("InterMineR")

If installation of RCurl fails with the error "installation of package ‘RCurl’ had non-zero exit status", install libxml2-dev, libcurl4-openssl-dev and aptitude on your system using the following commands:

sudo apt-get install aptitude

sudo apt-get install libcurl4-openssl-dev

sudo apt-get install libxml2-dev

Usage

See HTML vignettes for detailed API and tutorials:

  • vignettes/InterMineR.Rmd
  • vignettes/Enrichment_Analysis_and_Visualization.Rmd
  • vignettes/FlyMine_Genomic_Visualizations.Rmd

Contributing

  1. Fork it!
  2. Create your feature branch: git checkout -b my-new-feature
  3. Commit your changes: git commit -am 'Add some feature'
  4. Push to the branch: git push origin my-new-feature
  5. Submit a pull request :D

Credits

Bing Wang:

  • InterMineR package creation (first edition)
  • Established the first query system (queries as list objects)
  • Vignette creation (first edition)

Konstantinos Kyritsis:

  • InterMineR package update to:
    1. Operate with the InterMine web services.
    2. Comply with the standards of Bioconductor submission.
    3. Retrieve the available Mines from the InterMine registry.
  • Established the second query system (queries as InterMineR-class objects)
  • Addition of enrichment analysis functionality and conversion of the results to GeneAnswers-class objects
  • Addition of functionality for converting InterMineR-retrieved data to GRanges-class and RangedSummarizedExperiment-class objects
  • Additional functions:
    1. simplifyResult function for flattened results display.
    2. listDatasets and getDatasets to retrieve information about the available datasets in each Mine.
  • Vignette update (second edition)
  • Addition of tutorials for Enrichment Analysis and Visualization and FlyMine Genomic Visualizations with InterMineR

Celia Sanchez Laorden

  • Created the listManager class for dealing with lists of objects within InterMineR
  • Created methods get_list, delete_list and create_list
  • Created functions intersect, union, difference and subtract.
  • Created auxiliary methods: GET_api_list, which returns the response object of the request; get_unused_list_name, which checks whether the name given by the user is already in use and, if so, provides a new one; and do_operation, which creates a new list from the result of an operation.

License

LGPL. See LICENSE for details.

Stickers

Sticker templates are available from BiocStickers - or find us at a conference to pick some up in person.

interminer's People

Contributors

ashishpriyadarshicic, celions, hpages, kostaskyritsis, link-ny, nturaga, rachellyne, sowla, vobencha, yochannah

interminer's Issues

doEnrichment error after upgrading R and Bioconductor

Running the example in the tutorial (https://rdrr.io/bioc/InterMineR/f/vignettes/Enrichment_Analysis_and_Visualization.Rmd) no longer works after upgrading Bioconductor to 3.13. It throws the following error:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘doEnrichment’ for signature ‘"missing"’

This could be a dependency issue, but I could not find much information on this message.

Enhancement request:

Another idea I had, a feature related to not finding hits, that could be useful: in the InterMineR library, the function runQuery() returns only the matches. The website, however, shows a message listing the input names that could not be mapped, which is useful for curating a raw list of, say, protein names, e.g. to keep only the human ones. This can be done manually with the object returned by runQuery(), but a parameter like "return.no.matches" that gives you the non-mapped names would be useful to have.
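The manual workaround mentioned above can be sketched in base R. This is only an illustration: findUnmatched is a hypothetical helper, not part of InterMineR, and the mock data.frame stands in for whatever runQuery() returns. It assumes exact, case-sensitive matching against the listed columns, which may be stricter than the matching the server performs.

```r
# Hypothetical helper (not an InterMineR function): report which input
# identifiers never appear in the chosen columns of a query result.
findUnmatched <- function(input_ids, result, columns) {
  # Collect every value found in the columns that may hold an identifier
  matched <- unique(unlist(lapply(columns, function(col) {
    as.character(result[[col]])
  })))
  # Keep the inputs that were not found in any of those columns
  setdiff(as.character(input_ids), matched)
}

# Mock result with made-up identifiers; a real one would come from runQuery()
mock_result <- data.frame(
  Gene.primaryIdentifier = c("ID0001", "ID0002"),
  Gene.symbol = c("geneA", "geneB"),
  stringsAsFactors = FALSE
)

unmatched <- findUnmatched(
  input_ids = c("geneA", "ID0002", "notAGene"),
  result = mock_result,
  columns = c("Gene.primaryIdentifier", "Gene.symbol")
)
print(unmatched)
```

For this to work, the query view has to include every column that an input name could have matched against.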

orderBy generates too many spaces in the newQuery method

This R code

# Create a new query
Pax6List = newQuery(
  #here we're choosing which columns of data we'd like to see
  view = c("Gene.primaryIdentifier",
           "Gene.symbol"),
  sortOrder = list("Gene.primaryIdentifier", "ASC"))

# Make a constraint: 
listConstraint = setConstraints(
  paths = c("Gene"),
  operators = c("IN"),
  values = list("PL_Pax6_Targets")
)

# Add the constraint to the query
Pax6List$where <- listConstraint

generates this XML:

<query name="" model="genomic" view="Gene.primaryIdentifier Gene.symbol" longDescription="" sortOrder=" Gene.primaryIdentifier  ASC">
  <constraint path="Gene" value="PL_Pax6_Targets" code="A" op="IN" extraValue=""/>
</query>

Note that there are too many spaces in sortOrder. The space BEFORE Gene.primaryIdentifier is ignored, but there is also a double space between Gene.primaryIdentifier and ASC. This causes the InterMine server to return a 400 error when the query is run.
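One way to guard against the extra spaces is to normalise the value before it is written into the XML attribute. The sketch below is a minimal base R illustration; normaliseSortOrder is a hypothetical helper and not the package's actual fix.

```r
# Hypothetical helper: build a sortOrder attribute value with single spaces.
normaliseSortOrder <- function(sortOrder) {
  # Flatten list or vector input into one character vector
  parts <- as.character(unlist(sortOrder))
  # Join the parts, collapse any run of whitespace, and trim both ends
  joined <- paste(parts, collapse = " ")
  trimws(gsub("\\s+", " ", joined))
}

# The list form used in the issue, and the malformed string it produced
clean <- normaliseSortOrder(list("Gene.primaryIdentifier", "ASC"))
messy <- normaliseSortOrder(" Gene.primaryIdentifier  ASC")
print(clean)
print(messy)
```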

Docs are PDFs

The documentation consists of two comprehensive PDFs.

Should be online. Maybe on our read-the-docs site?

Error in curl

When I put the query code into a function, it does not go through. But when I run it plainly, it works.

It throws this error:
Error in curl::curl_fetch_memory(URL, handle = handle) :
Timeout was reached: Connection timed out after 10000 milliseconds

Or,

Failed to connect to registry.intermine.org port 80: Timed out

I could blame the internet connection for this but it doesn't do that if I run it outside the function I wrote.
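To help tell a transient network failure from a problem in the wrapping function, the call can be retried a few times. The following is a generic base R sketch; withRetry and flaky are hypothetical names, and flaky merely simulates a call that times out twice. It does not diagnose why the timeout happens only inside a function.

```r
# Hypothetical helper: call `fun` up to `attempts` times, pausing `wait`
# seconds between failed tries, and re-raise the last error if all fail.
withRetry <- function(fun, attempts = 3, wait = 0) {
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < attempts) Sys.sleep(wait)
  }
  stop(last_error)
}

# Simulated flaky call: fails twice, then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("Timeout was reached")
  "ok"
}

outcome <- withRetry(flaky, attempts = 5)
print(outcome)
```

In real use, the function passed to withRetry would wrap the network call, for example function() runQuery(im, qry).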

newQuery and setQuery have inconsistent argument names

See table for comparison of arguments for these two methods - I think the library is young enough we could get away with modifying it to be consistent.

newQuery                   | setQuery
---------------------------|------------------------
view                       | select
sortOrder                  | orderBy
(not implemented - see #5) | where
name                       | name
longDescription            | description
constraintLogic            | (?? possibly missing)
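Until the names are aligned, the pairs in the table can be captured as a lookup. This sketch is illustrative only: toSetQueryArgs is a hypothetical helper, and it translates argument names, not the format of the values, which may also differ between the two functions.

```r
# Argument-name pairs taken from the comparison table (newQuery -> setQuery)
arg_map <- c(
  view = "select",
  sortOrder = "orderBy",
  name = "name",
  longDescription = "description"
)

# Hypothetical helper: rename a list of newQuery-style arguments
toSetQueryArgs <- function(args) {
  known <- names(args) %in% names(arg_map)
  if (!all(known)) {
    warning("No setQuery equivalent listed for: ",
            paste(names(args)[!known], collapse = ", "))
  }
  # Keep only the arguments with a listed equivalent and rename them
  renamed <- args[known]
  names(renamed) <- unname(arg_map[names(renamed)])
  renamed
}

new_style <- list(view = c("Gene.symbol"), sortOrder = "Gene.symbol ASC")
set_style <- toSetQueryArgs(new_style)
print(names(set_style))
```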

convertToGeneAnswers should accept list names

convertToGeneAnswers constitutes a wrapper function for converting the results of the doEnrichment function to a GeneAnswers-class object.

To run this function, one must pass a data.frame containing the gene identifiers that were enriched on as the geneInput argument.

Given that enrichment can be run using either a list of ids or an InterMine list name, it would make sense if convertToGeneAnswers would accept a list name for geneInput as well - otherwise we have to run a query to fetch the list contents client side, even though we didn't need to do this for the enrichment in the previous step.

constructing a new query

I am using InterMineR to query TargetMine. Following the instructions in the intermineR.pdf and the manual page, I should be able to get a query template using getTemplateQuery or construct a new query using newQuery. In either case, the documentation says there should be at least three items in each query: view, constraints and constraintLogic.

Using the function names on the output object from getTemplateQuery gives:
"model" "title" "description" "select" "name" "comment" "tags" "orderBy" "where"

Using the function args on the function newQuery gives:
function (name = "", view = character(), sortOrder = "", longDescription = "", constraintLogic = NULL)

I tried modifying the where slot of the query object, as explained for the constraints item, but as a list rather than a matrix, and it works for a single value only.

There might be some changes in the query scheme that I am not aware of or that are not yet shown in the documentation.

I am using R 3.3.2 (2016-10-31) on x86_64-apple-darwin13.4.0 (64-bit), running on macOS Sierra 10.12.2. Both InterMineR and Bioconductor are the latest releases: InterMineR_0.99.4, Biobase_2.34.0.

Bad Request (HTTP 400) when runQuery

Hi,
I wrote the following function to retrieve gene information from a list of genes for Arabidopsis thaliana.

getGeneNames <- function(gene_ids){
  thaliana = initInterMine(listMines()["ThaleMine"])
  # gene_ids <- list(rownames(all_transcript_0))
  # build constraints
  constraints = InterMineR::setConstraints(
    paths = c("Gene"),
    operators = "LOOKUP",
    values = list(gene_ids)
  )
  
  print("constraints built")
  
  # define new query
  queryGeneIds = InterMineR::setQuery(
    select = c(
      "Gene.primaryIdentifier",
      "Gene.symbol",
      "Gene.briefDescription",
      "Gene.chromosome.primaryIdentifier"
      # "Gene.geneRifs.annotation"
      # "Gene.chromosomeLocation.start",
      # "Gene.chromosomeLocation.end",
      # "Gene.chromosomeLocation.strand"
    ),
    where = constraints
  ) 

  # run query and store results
  print("running query, please wait...")
  gene_info = InterMineR::runQuery(im = thaliana, qry = queryGeneIds)
  
  # rename columns
  colnames(gene_info) <- c("meta.gene_id", "gene_name", "gene_description", "chromosome")
  return(gene_info)
}

For some reason it stopped working a few months ago, throwing the error:
Error in InterMineR::runQuery(im = thaliana, qry = queryGeneIds) :
Bad Request (HTTP 400).

sessionInfo:
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.1
nickname Action of the Toes

[5] InterMineR_1.6.1

Can anyone help me figure out what is going on? Am I missing something?

Citation bug

Citation is not working. The Bioconductor page shows:

Citation (from within R, enter citation("InterMineR")):
Important note to the maintainer of the InterMineR package: An error occured while trying to generate the citation from the CITATION file. This typically occurs when the file contains R code that relies on the package to be installed e.g. it contains calls to things like packageVersion() or packageDate() instead of using meta$Version or meta$Date. See R documentation for more information.

Error given:
citation("InterMineR")

Error in tools:::.parse_CITATION_file(file, meta$Encoding) : 
  non-ASCII input in a CITATION file without a declared encoding
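The error indicates that the CITATION file contains non-ASCII characters and that no encoding is declared for the package. A likely remedy, stated here as an assumption rather than a confirmed fix, is to declare `Encoding: UTF-8` in DESCRIPTION or to replace the offending characters with ASCII escapes. The base R sketch below locates such lines; findNonASCIILines is a hypothetical helper and the temporary file merely stands in for the real CITATION file.

```r
# Hypothetical helper: return the line numbers of a file that contain
# bytes outside the 7-bit ASCII range.
findNonASCIILines <- function(path) {
  lines <- readLines(path, warn = FALSE)
  has_non_ascii <- vapply(
    lines,
    function(line) any(as.integer(charToRaw(line)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
  which(has_non_ascii)
}

# Demo on a temporary file standing in for the package's CITATION file.
# useBytes = TRUE writes the UTF-8 bytes unchanged, whatever the locale.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("title = 'Example'", "author = 'Jos\u00e9'"),
           demo_file, useBytes = TRUE)

bad_lines <- findNonASCIILines(demo_file)
print(bad_lines)
```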

Enrichment vignette: change from pie-charting p-val to pie chart of -log(pval)

https://bioconductor.org/packages/release/bioc/vignettes/InterMineR/inst/doc/Enrichment_Analysis_and_Visualization.html#piechart-and-barplot-graphs

Pies and bar charts are pretty, but given the fact that a smaller P value is better, it takes some figuring out to understand what's going on - the smaller the pie segment / the smaller the bar, the more significant it is.

One option could be to subtract all the values from 1 so smaller values get bigger slices, but that would also require some explaining. I think it might make more sense just to remove these two visualisations.
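The transformation named in the title can be sketched in base R. The p-values below are illustrative only, not results from the vignette, and toLogScores is a hypothetical helper. It clamps p-values of exactly zero to the smallest positive double so the result stays finite.

```r
# Hypothetical helper: convert p-values to -log10 scores so that more
# significant terms get larger values.
toLogScores <- function(p_values, floor = .Machine$double.xmin) {
  stopifnot(all(p_values >= 0 & p_values <= 1))
  # pmax() replaces zeros with `floor` to avoid an infinite score
  -log10(pmax(p_values, floor))
}

# Illustrative p-values; real ones would come from the enrichment results
p_values <- c(termA = 0.05, termB = 0.001, termC = 0.00001)
scores <- toLogScores(p_values)
print(scores)
```

Passing scores to barplot(), with the axis labelled "-log10(p-value)", would then draw taller bars for more significant terms.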

Changing population in doEnrichment

Hello,

It seems that the population argument in the doEnrichment function only accepts names of public lists at the moment (giving it a private list with API token/key from MyMine [passed to im argument] gives a "Bad Request (HTTP 400)" error). Since I can't seem to be able to avoid this error, is there a way to insert gene ids, rather than a list, as the population/background for this function? There's an equivalent argument ("ids") for the list of genes to investigate, but no way to pass the background genes to the population argument without putting them in a list first.

I am using R version 3.4.3 and InterMineR Version: 1.0.0.

travis failing

Why is it failing?

R for Travis-CI is not officially supported, but is community maintained.

Should we even bother?

Pass token to enrichment web service

As far as I can tell, it's impossible to enrich private lists with InterMineR. Code like this results in a 400 error: "You do not have access to a bag named listToEnrich".

# okay, to save a list on the InterMine server I'll need a token (in this case from humanmine)
# to get your own token log into the web interface (e.g. humanmine.org) and go to the
# myMine tab, then click the Account Details sub-tab, and copy and paste (or generate)
# your token into this script
myToken <- "tokenHerePlease"; #replace this with your own token

# load required libraries
library(InterMineR)
library(httr)

# This is a function to save lists on the intermine server. we'll use it later on in this script
saveIMList = function(
  # a nice intermine link please, e.g. http://www.humanmine.org/humanmine
  anInterMine, 
  #The name you'd like this to be saved on the server as... 
  listName= NULL,
  # The type of object you're saving, e.g. Gene or Protein or something else. 
  # This must match the name of the class in the InterMine model, e.g. "Gene" not "gene" or "genes"
  listType= "Gene",
  # The gene ids (or other entity, e.g. protein ids) to save
  listContents = NULL,
  # your token. it can't work without this!
  token = NULL) {
  
  # We need to pass the list of ids as a single comma (or tab, or newline) separated string
  listContentsString <- paste0(
    listContents, 
    collapse=",")
  
  # The URL we call to save the list, with the name of the list we want to save
  # and the type of objects embedded in the URL as parameters. 
  requestUrl <- paste0(anInterMine,
                       "/service/lists?name=", listName,
                       "&type=", listType)
  
  # InterMineR doesn't have a built-in method to save lists on the InterMine server,
  # so we call the API directly using POST() from the httr library loaded above
  response <- POST(
    # the URL we're making an API call against
    url=requestUrl, 
    # tell them which IDs you want saved, as a single string
    body = listContentsString,
    # prove you're you with a token
    add_headers(Authorization = paste0("Token ", token)),
    encode = "multipart", 
    # clear errors please
    verbose(),
    # required, will return 500 error without the content type
    content_type("text/plain;charset=UTF-8"))
}

# querying against my chosen InterMine
myMine <- listMines()["HumanMine"]
im <- initInterMine(mine=myMine, token=myToken)

# Let's make a list of ids I want to enrich
# These are human gene identifiers. 
myGenes <- c(2566,
             57094,
             6323,
             6324,
             6335,
             84059)

# currently enrichment results MUST have a saved list as a background population,
# and the background population MUST have all of the same genes as in the list we're enriching.
# let's make a saved, named HumanMine list with our ids.
# note that this is the same list as above with a few extra ids for demo purposes
# it probably doesn't make much biological sense
backgroundPopulation <- list(2566,
                          57094,
                          6323,
                          6324,
                          6335,
                          84059, 
                          5468, 
                          3983, 
                          10257, 
                          105, 
                          29929,
                          10019,
                          10456,
                          10459,
                          1050,
                          1890 )

#save the BackgroundPopulation list
backgroundPopulationList <- saveIMList(
  myMine, 
  listName= "myBackgroundPopulation",
  listType= "Gene",
  listContents = backgroundPopulation,
  token = myToken) 

#save the list we want to be enriched
listToEnrich <- saveIMList(
  myMine, 
  listName= "listToEnrich",
  listType= "Gene",
  listContents = myGenes,
  token = myToken) 


#enrichment result with saved list name and background population as a saved list
enrichmentResult <- doEnrichment(
  im = im,
  genelist = "listToEnrich",
  widget = "publication_enrichment",
  population = "myBackgroundPopulation")

And looking through https://github.com/intermine/InterMineR/blob/master/R/doEnrichment.R I can't see anywhere where the token is passed through. I also tried intercepting the traffic with a network sniffer so I'm pretty sure the token isn't passed through. This is probably one of the causes of #42

Content summary

Summarise what is in the mine. How to do this?

  1. counts of each data type
  2. data sets and related counts

Maybe there is a better way?

setConstraints should support constraintLogic and codes

An important part of InterMine querying is the ability to set OR logic for constraints, as well as the default AND logic.

This code throws an error, because neither code nor constraintLogic are permissible arguments in setConstraints:

pancreasConstraint2 = setConstraints(
  paths = c("Gene", "Gene.proteinAtlasExpression.level", "Gene.proteinAtlasExpression.level", "Gene.proteinAtlasExpression.tissue.name"),
  operators = c("IN", rep("=", 2), "="),
  values = list("PL_DiabetesGenes", "Medium", "High", "Pancreas"),
  code = c("A", "B", "C", "D"),
  constraintLogic = "A and (B or C) and D"
)

Remove the invalid (but needed) code and constraintLogic arguments and it all works fine.
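If code and constraintLogic were accepted, it would be useful to check that the logic string refers only to defined codes. The base R sketch below shows one such check; checkConstraintLogic is a hypothetical helper, not an InterMineR function, and it assumes that codes are single capital letters.

```r
# Hypothetical helper: compare the codes used in a logic string with the
# codes assigned to the constraints.
checkConstraintLogic <- function(codes, constraintLogic) {
  # Extract stand-alone capital letters; words such as "and" or "AND"
  # are not matched because of the word boundaries
  used <- unique(regmatches(
    constraintLogic,
    gregexpr("\\b[A-Z]\\b", constraintLogic, perl = TRUE)
  )[[1]])
  list(
    undefined = setdiff(used, codes),  # used in the logic but not defined
    unused = setdiff(codes, used)      # defined but missing from the logic
  )
}

codes <- c("A", "B", "C", "D")
ok <- checkConstraintLogic(codes, "A and (B or C) and D")
bad <- checkConstraintLogic(codes, "A and (B or E)")
print(bad)
```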

Improvement: Consider adding a less verbose method for a single constraint

Consider this code - when there is only one constraint, wrapping each element in data structures seems unnecessarily verbose.

 setConstraints(
    paths = c("Gene"),
    operators = c("IN"),
    values = list("PL_Pax6_Targets")
  )

Suggestion
It would be nice to either override the method signature for setConstraints to allow something like this

 setConstraints(
    paths = "Gene",
    operators = "IN",
    values = "PL_Pax6_Targets"
  )

or alternatively, if this isn't something that can be done in R, maybe just adding the method setConstraint (note no s at the end) would achieve the same thing.

This is low priority but would probably be easy to implement while fixing more important constraint issues such as #58
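In R, c("Gene") and "Gene" are already identical, so only the values argument needs wrapping. A thin normalising step could look like the sketch below. normaliseConstraintArgs is a hypothetical helper, and it assumes that a bare vector supplies exactly one value per constraint; multi-valued constraints would still need an explicit list.

```r
# Hypothetical helper: accept bare vectors and return the list-based
# structure used in the verbose call.
normaliseConstraintArgs <- function(paths, operators, values) {
  # A bare vector of values becomes a list with one element per constraint
  if (!is.list(values)) values <- as.list(values)
  stopifnot(length(paths) == length(operators),
            length(paths) == length(values))
  list(paths = paths, operators = operators, values = values)
}

args <- normaliseConstraintArgs("Gene", "IN", "PL_Pax6_Targets")
str(args)

# A length-one character vector needs no c() wrapper in R
print(identical(c("Gene"), "Gene"))
```

The resulting list could then be handed on with do.call(setConstraints, args).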

Add support for ONE OF constraints

ONE OF constraints aren't handled correctly by InterMineR and convert to invalid query XML.

pancreasConstraint1 = setConstraints(
  paths = c("Gene", "Gene.proteinAtlasExpression.level", "Gene.proteinAtlasExpression.tissue.name"),
  operators = c("IN", "ONE OF", "="),
  values = list("PL_DiabetesGenes", c("Medium", "High"), "Pancreas")
)

This constraint will return a 400 when run, because it's creating the following XML:

<query name="" model="genomic" view="Gene.primaryIdentifier Gene.symbol Gene.proteinAtlasExpression.cellType Gene.proteinAtlasExpression.level Gene.proteinAtlasExpression.tissue.name" longDescription="" sortOrder="Gene.primaryIdentifier ASC">
  <constraint path="Gene" value="PL_DiabetesGenes" code="A" op="IN" extraValue=""/>
  <constraint path="Gene.proteinAtlasExpression.level" value="Medium" code="B" op="ONE OF" extraValue=""/>
  <constraint path="Gene.proteinAtlasExpression.tissue.name" value="Pancreas" code="C" op="=" extraValue=""/>
</query> 

But it should be making something more like:

<query name="" model="genomic" view="Gene.primaryIdentifier Gene.symbol Gene.proteinAtlasExpression.cellType Gene.proteinAtlasExpression.level Gene.proteinAtlasExpression.tissue.name" longDescription="" sortOrder="Gene.primaryIdentifier ASC">
  <constraint path="Gene" value="PL_DiabetesGenes" code="A" op="IN" extraValue=""/>
  <constraint path="Gene.proteinAtlasExpression.level" op="ONE OF" code="B" >
    <value>Medium</value>
    <value>High</value>
  </constraint>
  <constraint path="Gene.proteinAtlasExpression.tissue.name" value="Pancreas" code="C" op="=" extraValue=""/>
</query> 
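The difference between the two XML forms can be sketched with a small base R generator. buildConstraintXML is a hypothetical helper and not the package's implementation; it performs no XML escaping and only mirrors the attribute layout shown above.

```r
# Hypothetical helper: emit one <constraint> element, using nested <value>
# children for ONE OF (or any multi-valued) constraints.
buildConstraintXML <- function(path, op, values, code) {
  if (op == "ONE OF" || length(values) > 1) {
    # One <value> child per entry
    children <- paste0("    <value>", values, "</value>", collapse = "\n")
    paste0('  <constraint path="', path, '" op="', op,
           '" code="', code, '">\n', children, "\n  </constraint>")
  } else {
    # Single-valued constraints keep the attribute form
    paste0('  <constraint path="', path, '" value="', values,
           '" code="', code, '" op="', op, '" extraValue=""/>')
  }
}

multi <- buildConstraintXML("Gene.proteinAtlasExpression.level",
                            "ONE OF", c("Medium", "High"), "B")
single <- buildConstraintXML("Gene.proteinAtlasExpression.tissue.name",
                             "=", "Pancreas", "C")
cat(multi, single, sep = "\n")
```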
