annotables's People

Contributors

aaronwolen, khughitt, mdozmorov, mjsteinbaugh, stephenturner

annotables's Issues

entrez id in grch37

Hi,
thanks for the useful package.

The entrez gene ids are missing in most cases

> length(is.na(grch38$entrez))
[1] 66531

> length(is.na(grch38$entrez))
[1] 67416
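
Note that `length(is.na(x))` returns the total number of elements in `x`, not the number of missing values, so the figures above are row counts. A minimal sketch of the difference, using a made-up toy vector rather than the real `entrez` column:

```r
# Toy vector standing in for an entrez column (values are made up).
entrez <- c(801L, NA, 805L, NA, NA)

n_total   <- length(is.na(entrez))  # 5: total number of elements
n_missing <- sum(is.na(entrez))     # 3: number of missing values
frac_na   <- mean(is.na(entrez))    # 0.6: fraction of values missing

print(c(total = n_total, missing = n_missing))
print(frac_na)
```

Running `sum(is.na(grch38$entrez))` on the real table would give the actual number of genes without an Entrez ID.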

rename in code to add grch37

Thank you for creating this package.
I tried to add grch37 and failed.
When I run the code from your README.Rmd:

fix_genes <- . %>%
  tbl_df %>%
  distinct %>%
  rename(ensgene = ensembl_gene_id,
         entrez = entrezgene,
         symbol = external_gene_name,
         chr = chromosome_name,
         start = start_position,
         end = end_position,
         biotype = gene_biotype)

myattributes <- c("ensembl_gene_id",
                  "entrezgene",
                  "external_gene_name",
                  "chromosome_name",
                  "start_position",
                  "end_position",
                  "strand",
                  "gene_biotype",
                  "description")

and adding grch37 following your code, I get:
Error in rename(., ensgene = ensembl_gene_id, entrez = entrezgene, symbol = external_gene_name, :
object 'ensembl_gene_id' not found

By just removing the rename function and last pipe, everything seems to work.
I am quite new to bioinformatics, R, and GitHub. I hope 'Issues' is the right place to ask my question.
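
One possible cause, offered as an assumption rather than a diagnosis, is that another attached package masks `rename`, so the call never reaches dplyr. The sketch below sidesteps that by namespacing the dplyr verbs. The data frame `raw` and the function name `fix_genes_ns` are hypothetical; `raw` only mimics the columns listed in `myattributes`, with made-up values.

```r
library(dplyr)

# Hypothetical stand-in for a biomaRt query result; all values are made up.
raw <- data.frame(
  ensembl_gene_id    = c("ENSG_A", "ENSG_A", "ENSG_B"),
  entrezgene         = c(1L, 1L, 2L),
  external_gene_name = c("GENE_A", "GENE_A", "GENE_B"),
  chromosome_name    = c("1", "1", "2"),
  start_position     = c(100L, 100L, 200L),
  end_position       = c(150L, 150L, 260L),
  strand             = c(1L, 1L, -1L),
  gene_biotype       = c("protein_coding", "protein_coding", "lincRNA"),
  description        = c("toy gene A", "toy gene A", "toy gene B"),
  stringsAsFactors   = FALSE
)

# Same steps as fix_genes, written as an ordinary function with namespaced
# verbs so that a rename() from another package cannot be picked up.
fix_genes_ns <- function(df) {
  df %>%
    dplyr::as_tibble() %>%
    dplyr::distinct() %>%
    dplyr::rename(
      ensgene = ensembl_gene_id,
      entrez  = entrezgene,
      symbol  = external_gene_name,
      chr     = chromosome_name,
      start   = start_position,
      end     = end_position,
      biotype = gene_biotype
    )
}

fixed <- fix_genes_ns(raw)
print(names(fixed))
```

If the namespaced version works where the original fails, checking `conflicts()` or the message printed when packages are attached may show which package is masking `rename`.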

annotables installation

hi~
I used R (3.5.0) to install the annotables package, but I got some errors.

Commands:

install.packages("devtools")
devtools::install_github("stephenturner/annotables")

Errors:

devtools::install_github("stephenturner/annotables")
Error in curl::new_handle() : An unknown option was passed in to libcurl

Could you do me a favour and tell me how to resolve this error? I have tried my best to fix it myself.
Thank you!

Versions

Can you make tables with ensembl versions (on genes & transcripts)?

update readme

need to update documentation with changes made in #6 by @aaronwolen to document automated creation of new datasets based on YAML files

Installation problems

Hello, I am rather new to R and am not able to install the annotables package. Here is what I get:

` install.packages("devtools")
Error in install.packages : Updating loaded packages

Restarting R session...

During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_COLLATE failed, using "C"
3: Setting LC_TIME failed, using "C"
4: Setting LC_MESSAGES failed, using "C"
5: Setting LC_MONETARY failed, using "C"

Use devtools to install the package

devtools::install_github("stephenturner/annotables")
Downloading GitHub repo stephenturner/annotables@master
tar: Failed to set default locale
tar: Failed to set default locale
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
v checking for file '/private/var/folders/0m/_xp6xtk96v3c7nztw552_zzh0000gn/T/RtmphObqL7/remotes107e2dec0e0d/stephenturner-annotables-805a247/DESCRIPTION' ...

  • preparing 'annotables':
    v checking DESCRIPTION meta-information ...
  • checking for LF line-endings in source and make files and shell scripts
  • checking for empty or unneeded directories
  • looking to see if a 'data/datalist' file should be added
  • building 'annotables_0.1.91.tar.gz' (2.5s)

Error: (converted from warning) Setting LC_CTYPE failed, using "C"
Execution halted
Error in i.p(...) :
(converted from warning) installation of package '/var/folders/0m/_xp6xtk96v3c7nztw552_zzh0000gn/T//RtmphObqL7/file107e1486950e/annotables_0.1.91.tar.gz' had non-zero exit status

library(dplyr)

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

filter, lag

The following objects are masked from 'package:base':

intersect, setdiff, setequal, union

library(annotables)
Error in library(annotables) : there is no package called 'annotables'`

I also tried other installing methods but in the end there was never a package called 'annotables'.
Thank you for your help!

Entries with same ENSG

I suspect mistakes in gene symbols.

I was making the wrong assumption that there is a unique correspondence between the rows in grch38/grch37 and the different ENSGs. It turned out that there are ensgene repetitions:

> sum(data.frame(table(grch38$ensgene))$Freq > 1)                                 
[1] 361

I checked several of these 361 duplicated genes and it seems that the Entrez gene IDs are the only difference:

> grch38[ grch38$ensgene == "ENSG00000198668",  ]
# A tibble: 3 x 9
  ensgene         entrez symbol chr     start    end strand biotype description
  <chr>            <int> <chr>  <chr>   <int>  <int>  <int> <chr>   <chr>      
1 ENSG00000198668    801 CALM1  14     9.04e7 9.04e7      1 protei… calmodulin…
2 ENSG00000198668    805 CALM1  14     9.04e7 9.04e7      1 protei… calmodulin…
3 ENSG00000198668    808 CALM1  14     9.04e7 9.04e7      1 protei… calmodulin…

I further looked at the NCBI website for the different Entrez gene IDs and they point to different CALM genes (not only CALM1).

CALM1: https://www.ncbi.nlm.nih.gov/gene/?term=801
CALM2: https://www.ncbi.nlm.nih.gov/gene/?term=805
CALM3: https://www.ncbi.nlm.nih.gov/gene/?term=808
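
The duplicate check above can be sketched on a small toy table. The three CALM1 rows reuse the IDs shown in the output above; the fourth row (`ENSG_HYPOTHETICAL`, `GENE_X`) is made up, and keeping the first Entrez ID per Ensembl ID is an arbitrary choice for illustration, not a recommendation for which mapping is correct.

```r
# Toy table mimicking the relevant columns of grch38.
genes <- data.frame(
  ensgene = c("ENSG00000198668", "ENSG00000198668", "ENSG00000198668",
              "ENSG_HYPOTHETICAL"),   # last ID is made up
  entrez  = c(801L, 805L, 808L, 1L),
  symbol  = c("CALM1", "CALM1", "CALM1", "GENE_X"),
  stringsAsFactors = FALSE
)

# Ensembl IDs that occur on more than one row.
dup_ids <- unique(genes$ensgene[duplicated(genes$ensgene)])
print(length(dup_ids))

# All rows belonging to a duplicated Ensembl ID, for manual inspection.
dup_rows <- genes[genes$ensgene %in% dup_ids, ]
print(dup_rows)

# One row per Ensembl ID, keeping the first Entrez ID encountered
# (assumption: the choice of which Entrez ID to keep is arbitrary here).
genes_unique <- genes[!duplicated(genes$ensgene), ]
print(nrow(genes_unique))
```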

Version:

> packageVersion("annotables")
[1] ‘0.1.91’

better version/build management

With the changes in #6 it's much easier to recreate annotation tables. The files are named e.g. galgal5, but which version/build is actually used depends on what's current in Ensembl. E.g., when I first built this package, chicken was on galgal4. I had to manually update the filenames, and I probably did the wrong thing by just deleting (rather than deprecating) the old datasets. Maybe that's okay since it's still versioned in a release. Not sure how best to handle these issues.

Are grch37 and grch38 supposed to be the exact same object?

Hi, I am using version 0.1.91 of the annotables package. I have noticed that the grch37 and grch38 objects are exactly the same.

library(annotables)

identical(grch37, grch38)

I am surprised that even the genomic positions are exactly the same, given that these are two different versions of the human genome. Am I being stupid and missing something?
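
As a rough sketch of how two such tables could be compared beyond a single `identical()` call, the example below uses two toy data frames (`a` and `b`, with made-up gene IDs and coordinates) standing in for grch37 and grch38:

```r
# Two toy tables standing in for grch37 and grch38 (all values are made up).
a <- data.frame(ensgene = c("G1", "G2"), start = c(100L, 500L),
                stringsAsFactors = FALSE)
b <- data.frame(ensgene = c("G1", "G2"), start = c(120L, 500L),
                stringsAsFactors = FALSE)

same_object <- identical(a, b)   # FALSE: one coordinate differs
print(same_object)

# Join on the gene ID and list genes whose start coordinate differs.
merged <- merge(a, b, by = "ensgene", suffixes = c("_37", "_38"))
moved  <- merged[merged$start_37 != merged$start_38, ]
print(moved)
```

If the real tables returned no differing rows at all, that would support the observation that both objects hold the same build.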

pander usage

If you don't mind, I'd like to share some tips on how to improve the pander calls and make the code more readable and easier to maintain by using global options. First of all, there is no real need to call pandoc.table; it's totally fine to use the general pander S3 method on anything.

I've seen that you do not want to split the tables. To this end, you can define a global option that will be used in all future pander (or pandoc.table) calls. E.g. you can set it to 100 like you did, or even disable this feature if needed:

panderOptions('table.split.table', Inf)

Similarly, you can set the table style as well:

panderOptions('table.style', 'rmarkdown')

The cell alignment can be specified too via the table.alignment.default option, which can also be a function, e.g. to justify numbers to the left and everything else to the right. See e.g. this thread on SO: http://stackoverflow.com/a/27014481/564164

Please feel free to close this ticket, I just wanted to share these (hopefully) useful tricks :)

Update Question

If I want to update annotables for a build more recent than the one in your GitHub repository, is it simply a matter of cloning and building the annotables package? In other words, does building the package automatically go to the latest Ensembl build? If yes, where do I change the code to reflect the current version so that ensembl_version returns the correct value?

I did clone and build the package successfully, but ensembl_version still reports Ensembl 91, and it would take some work to compare against a known older version. I'm not so familiar with these data packages and I'm having trouble dissecting the package to find the source code that queries Ensembl, so I can't answer the versioning question myself.

Thanks,
John Thompson

different organisms

Hi,

I find your package very useful, but I'm not very R savvy.
Is it possible to add new organisms to annotables?
I'm interested in Mmul10 (Macaca mulatta).
Thanks for your help.

Grcm38 gene annotations are Grcm39 based!

I loaded annotables (version 0.2.0) in R and found that the coordinates of Ensembl-annotated genes in grcm38 are actually the coordinates of Ensembl-annotated genes based on the updated genome assembly grcm39! Thanks for updating this!

version 90

Hi Stephen,

Hope everything is going well. Ensembl released version 90 last monthish. Is there a plan to update the annotations? I'm not sure what your vision was for keeping up. Thanks so much! It is surprising how useful it is to just have the annotations on hand and not have to look them up again every time.
