stephenturner / annotables

R data package for annotating/converting Gene IDs

Home Page: http://www.gettinggeneticsdone.blogspot.com/2015/11/annotables-convert-gene-ids.html

R 100.00%

annotables's Issues

Update gene->transcript tables to match mikelove/tximport#13

thelovelab/tximport@8f31541
thelovelab/tximport#13
The grch38_gt data_table is useful, but could we maybe have a grch38_tg as well?

    tx              gene
1   ENST00000387314 ENSG00000210049
2   ENST00000389680 ENSG00000211459
3   ENST00000387342 ENSG00000210077
4   ENST00000387347 ENSG00000210082
5   ENST00000386347 ENSG00000209082
6   ENST00000361390 ENSG00000198888
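For context, tximport expects its tx2gene argument to list transcript IDs in the first column and gene IDs in the second. Until a table in that order is available, a gene-to-transcript table can be reordered by hand. The following is a minimal sketch on a toy data frame; the column names `gene` and `tx` are assumptions and may not match the actual columns of grch38_gt.

```r
# Toy stand-in for a gene-to-transcript table.
# Assumption: the column names "gene" and "tx" are hypothetical.
gt <- data.frame(
  gene = c("ENSG00000210049", "ENSG00000211459", "ENSG00000210077"),
  tx   = c("ENST00000387314", "ENST00000389680", "ENST00000387342"),
  stringsAsFactors = FALSE
)

# Put the transcript column first and the gene column second.
tg <- gt[, c("tx", "gene")]
print(tg)
```

The reordered `tg` could then be passed as `tx2gene` to tximport, assuming the IDs match those in the quantification files.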

entrez id in grc37

Hi,
thanks for the useful package.

The Entrez gene IDs are missing in most cases:

> length(is.na(grch38$entrez))
[1] 66531

> length(is.na(grch38$entrez))
[1] 67416
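Note that `length(is.na(x))` returns the length of `x`, not the number of missing values, because `is.na()` returns one TRUE/FALSE per element. Counting the missing entries needs `sum()` instead. A small illustration on a made-up vector (not real annotables data):

```r
# Made-up vector standing in for an entrez column.
entrez <- c(801L, NA, 805L, NA, NA)

# length() counts every element, missing or not.
n_total <- length(is.na(entrez))

# sum() counts only the TRUE values, i.e. the missing entries.
n_missing <- sum(is.na(entrez))

print(c(total = n_total, missing = n_missing))
```

Applied to the real tables, `sum(is.na(grch38$entrez))` would give the number of genes without an Entrez ID, and `mean(is.na(grch38$entrez))` the fraction.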

rename in code to add grch37

Thank you for creating this package.
I tried to add grch37 and failed.
When I run the code from your README.Rmd:

fix_genes <- . %>%
  tbl_df %>%
  distinct %>%
  rename(ensgene=ensembl_gene_id,
         entrez=entrezgene,
         symbol=external_gene_name,
         chr=chromosome_name,
         start=start_position,
         end=end_position,
         biotype=gene_biotype)

myattributes <- c("ensembl_gene_id",
                  "entrezgene",
                  "external_gene_name",
                  "chromosome_name",
                  "start_position",
                  "end_position",
                  "strand",
                  "gene_biotype",
                  "description")

and adding grch37 following your code, I get:
Error in rename(., ensgene = ensembl_gene_id, entrez = entrezgene, symbol = external_gene_name, :
object 'ensembl_gene_id' not found

By just removing the rename function and last pipe, everything seems to work.
I am quite new to bioinformatics, R and github. I hope 'Issues' is the right place to ask my question.
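The thread does not identify the cause of this error. As a workaround that does not depend on which `rename` function R finds on the search path, the columns can be renamed with base R. A minimal sketch on a toy data frame (the values are made up; the column names are taken from the attribute list above):

```r
# Toy one-row result with three of the attribute columns.
raw <- data.frame(
  ensembl_gene_id    = "ENSG00000198668",
  entrezgene         = 801L,
  external_gene_name = "CALM1",
  stringsAsFactors   = FALSE
)

# Old name -> new name, mirroring the rename() call.
new_names <- c(
  ensembl_gene_id    = "ensgene",
  entrezgene         = "entrez",
  external_gene_name = "symbol"
)

# Rename only the columns that are present in the lookup.
hits <- names(raw) %in% names(new_names)
names(raw)[hits] <- unname(new_names[names(raw)[hits]])
print(names(raw))
```

Columns that are not in the lookup, such as strand or description, keep their original names.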

annotables installation

Hi,
I used R (3.5.0) to install the annotables package, but I got some errors.

Commands:

install.packages("devtools")
devtools::install_github("stephenturner/annotables")

Errors:

devtools::install_github("stephenturner/annotables")
Error in curl::new_handle() : An unknown option was passed in to libcurl

Could you do me a favour and tell me how to resolve this error? I have tried everything I could.
Thank you!

do something with MSigDB

See http://bioinf.wehi.edu.au/software/MSigDB/

This could be useful in all sorts of ways.

Versions

Can you make tables with ensembl versions (on genes & transcripts)?

update readme

need to update documentation with changes made in #6 by @aaronwolen to document automated creation of new datasets based on YAML files

Zebrafish annotation

Could you please add Zebrafish to the list?

@demis001

Installation problems

Hello, I am rather new to R and am not able to install the annotables package. Here is what I get:

` install.packages("devtools")
Error in install.packages : Updating loaded packages

Restarting R session...

During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_COLLATE failed, using "C"
3: Setting LC_TIME failed, using "C"
4: Setting LC_MESSAGES failed, using "C"
5: Setting LC_MONETARY failed, using "C"

# Use devtools to install the package

devtools::install_github("stephenturner/annotables")
Downloading GitHub repo stephenturner/annotables@master
tar: Failed to set default locale
tar: Failed to set default locale
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
v checking for file '/private/var/folders/0m/_xp6xtk96v3c7nztw552_zzh0000gn/T/RtmphObqL7/remotes107e2dec0e0d/stephenturner-annotables-805a247/DESCRIPTION' ...

preparing 'annotables':
v checking DESCRIPTION meta-information ...
checking for LF line-endings in source and make files and shell scripts
checking for empty or unneeded directories
looking to see if a 'data/datalist' file should be added
building 'annotables_0.1.91.tar.gz' (2.5s)

Error: (converted from warning) Setting LC_CTYPE failed, using "C"
Execution halted
Error in i.p(...) :
(converted from warning) installation of package '/var/folders/0m/_xp6xtk96v3c7nztw552_zzh0000gn/T//RtmphObqL7/file107e1486950e/annotables_0.1.91.tar.gz' had non-zero exit status

library(dplyr)

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

filter, lag

The following objects are masked from 'package:base':

intersect, setdiff, setequal, union

library(annotables)
Error in library(annotables) : there is no package called 'annotables'`

I also tried other installing methods but in the end there was never a package called 'annotables'.
Thank you for your help!
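The "(converted from warning)" lines in the log indicate that the installation stopped because a warning was promoted to an error. The remotes package, which devtools uses for install_github, reads the environment variable R_REMOTES_NO_ERRORS_FROM_WARNINGS to switch that promotion off. The sketch below is a possible workaround, not a confirmed fix from this thread, and it does not address the underlying locale warnings themselves.

```r
# Ask remotes not to turn installation warnings into errors.
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")

# Only attempt the installation in an interactive session
# where the remotes package is available.
if (interactive() && requireNamespace("remotes", quietly = TRUE)) {
  remotes::install_github("stephenturner/annotables")
}
```

If the installation then succeeds, the locale warnings may still appear, but they should no longer abort the build.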

Entries with same ENSG

I suspect mistakes in gene symbols.

I was wrongly assuming that there is a unique correspondence between the rows in grch38/grch37 and the different ENSGs. It turns out that there are ensgene repetitions:

> sum(data.frame(table(grch38$ensgene))$Freq > 1)                                 
[1] 361

I checked several of these 361 duplicated genes, and it seems that the Entrez gene IDs are the only difference:

> grch38[ grch38$ensgene == "ENSG00000198668",  ]
# A tibble: 3 x 9
  ensgene         entrez symbol chr     start    end strand biotype description
  <chr>            <int> <chr>  <chr>   <int>  <int>  <int> <chr>   <chr>      
1 ENSG00000198668    801 CALM1  14     9.04e7 9.04e7      1 protei… calmodulin…
2 ENSG00000198668    805 CALM1  14     9.04e7 9.04e7      1 protei… calmodulin…
3 ENSG00000198668    808 CALM1  14     9.04e7 9.04e7      1 protei… calmodulin…

I further looked at the NCBI website for the different Entrez gene IDs, and they point to different CALM genes (not only CALM1).

CALM1: https://www.ncbi.nlm.nih.gov/gene/?term=801
CALM2: https://www.ncbi.nlm.nih.gov/gene/?term=805
CALM3: https://www.ncbi.nlm.nih.gov/gene/?term=808

Version:

> packageVersion("annotables")
[1] ‘0.1.91’
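For anyone who needs one row per Ensembl gene ID, the repeated rows can be found and collapsed with base R. This is a minimal sketch on a toy data frame: the three CALM1 rows mirror the output above, the fourth row is made up, and keeping the first row per ID is an arbitrary choice that discards the other Entrez IDs.

```r
# Toy table: three rows share one Ensembl ID, one row is unique.
genes <- data.frame(
  ensgene = c("ENSG00000198668", "ENSG00000198668",
              "ENSG00000198668", "ENSG00000210049"),
  entrez  = c(801L, 805L, 808L, NA),
  stringsAsFactors = FALSE
)

# Ensembl IDs that occur more than once.
counts  <- table(genes$ensgene)
dup_ids <- names(counts)[counts > 1]

# All rows belonging to a repeated ID, for inspection.
dup_rows <- genes[genes$ensgene %in% dup_ids, ]

# One row per Ensembl ID, keeping the first occurrence.
unique_genes <- genes[!duplicated(genes$ensgene), ]
print(unique_genes)
```

Whether the first Entrez ID is the right one to keep depends on the analysis, so inspecting `dup_rows` before dropping anything is advisable.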

better version/build management

With the changes in #6 it's much easier to recreate annotation tables. The files are named e.g. galgal5, but which version/build is actually used depends on what's current in Ensembl. For example, when I first built this package, chicken was on galgal4. I had to manually update the filenames, and I probably did the wrong thing by just deleting (rather than deprecating) the old datasets. Maybe that's okay, since it's still versioned in a release. I'm not sure how best to handle these issues.

Github repo webpage link redirects to weird webpage

Hi Steven,

Just one more thing that I noticed. When I click on the webpage link on this repo (www.gettinggeneticsdone.com/2015/11/annotables-convert-gene-ids.html), it redirects me to this weird Japanese webpage:
https://earthgekinka.com/creditcardgenkinka/jibundedekiru.html

Are grch37 and grch38 supposed to be the exact same object?

Hi, I am using the version 0.1.91 of the annotables package. I have noticed that the grch37 and grch38 objects are exactly the same.

library(annotables)

identical(grch37, grch38)

I am surprised that even the genomic positions are exactly the same, given that these are two different builds of the human genome. Am I being stupid and missing something?
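Since identical() only returns a single TRUE or FALSE, it can help to check where two tables agree or differ. One option is to join them on the Ensembl ID and compare the coordinates. The sketch below uses two toy data frames standing in for grch37 and grch38; the IDs and positions are made up, and only the column names `ensgene` and `start` follow the package tables.

```r
# Toy stand-ins for the two builds (made-up IDs and positions).
build_a <- data.frame(
  ensgene = c("geneA", "geneB"),
  start   = c(100L, 500L),
  stringsAsFactors = FALSE
)
build_b <- data.frame(
  ensgene = c("geneA", "geneB"),
  start   = c(100L, 650L),
  stringsAsFactors = FALSE
)

# Join on the gene ID; suffixes keep the two start columns apart.
both <- merge(build_a, build_b, by = "ensgene", suffixes = c("_37", "_38"))

# Flag genes whose start position differs between the builds.
both$moved <- both$start_37 != both$start_38
n_moved <- sum(both$moved)
print(both)
```

If `n_moved` is zero for the real grch37 and grch38 tables, that would support the observation that the two objects carry the same coordinates.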

pander usage

If you don't mind, I'd like to share some tips on how to improve the pander calls and make the code more readable and easier to maintain by using global options. First of all, there is no real need to call pandoc.table; it's totally fine to use the general pander S3 method on anything.

I see that you do not want to split the tables. To this end, you can define a global option that will be used in all future pander (or pandoc.table) calls. E.g., you can set it to 100 like you did, or even disable this feature entirely if needed:

panderOptions('table.split.table', Inf)

Similarly, you can set the table style as well:

panderOptions('table.style', 'rmarkdown')

The cell alignment can also be specified via the table.alignment.default option, which can be a function as well, e.g. to justify numbers to the left and everything else to the right. See e.g. this thread on SO: http://stackoverflow.com/a/27014481/564164

Please feel free to close this ticket, I just wanted to share these (hopefully) useful tricks :)

Update Question

If I want to update annotables for a build more recent than the one on your GitHub, is it simply a matter of cloning and building the annotables package? In other words, does building the package automatically use the latest Ensembl build? If yes, where do I change the code to reflect the current version so that ensembl_version returns the correct value?

I did clone and build the package successfully, but ensembl_version still reports Ensembl 91, and it would take some work to compare the result against a known older version. I'm not very familiar with these data packages, and I'm having trouble dissecting the package to find the source code that queries Ensembl so that I can answer the versioning question myself.

Thanks,
John Thompson

different organisms

Hi,

I find your package very useful, but I'm not very R savvy.
Is it possible to add new organisms to annotables?
I'm interested in Mmul10 (Macaca mulatta).
Thanks for your help.

Grcm38 gene annotations are Grcm39 based!

I loaded annotables (version 0.2.0) in R and found that the coordinates of the Ensembl-annotated genes in grcm38 are actually the coordinates based on the updated genome assembly grcm39! Thanks for updating this!

version 90

Hi Stephen,

Hope everything is going well. Ensembl released version 90 last monthish, is there a plan to update the annotations? I'm not sure what your vision was involving keeping up. Thanks so much! It is surprising how useful it is to just have the annotations on hand and not have to re-look them up every time.
