yulab-smu / createkeggdb

Create KEGG.db Package

createkeggdb's Introduction

Query the online KEGG annotation to generate a KEGG.db package that can be used by clusterProfiler and other packages.

✍️ Authors

Guangchuang YU and Ziru Chen

⏬ Installation

## install.packages("remotes")
remotes::install_github("YuLab-SMU/createKEGGdb")

⚙️ Workflow
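
A minimal sketch of a typical session, pieced together from the calls quoted in the issues further down this page. The wrapper name `build_and_query_kegg` is hypothetical and not part of createKEGGdb, and the tarball name `KEGG.db_1.0.tar.gz` is taken from those issue reports, so it may differ between versions. Running the function needs network access and both packages installed.

```r
# Hypothetical convenience wrapper (an assumption, not part of createKEGGdb):
# build a local KEGG.db for one organism, install it, and run an enrichment
# against the local copy instead of the online KEGG API.
build_and_query_kegg <- function(species, genes) {
  # Download the KEGG annotation and write a source package tarball
  # into the current working directory.
  createKEGGdb::create_kegg_db(species)

  # Install the generated package from the local tarball
  # (file name as reported in the issues; it may change with the version).
  install.packages("./KEGG.db_1.0.tar.gz", repos = NULL, type = "source")

  # use_internal_data = TRUE tells enrichKEGG to read from KEGG.db.
  clusterProfiler::enrichKEGG(gene = genes,
                              organism = species,
                              use_internal_data = TRUE)
}
```

A call would then look like `build_and_query_kegg("hsa", gene)`, where `gene` is a character vector of gene IDs.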

📖 Documents

createkeggdb's Issues

unused argument when create_kegg_db

How to create a KEGG.db for all microbes

Hello!

How can I build a database covering all microorganisms (all bacteria, viruses, fungi, and archaea)?

no description on enrichKEGG result

Hi,

This is a great package for us, thanks! After building KEGG.db with createKEGGdb, I ran the function enrichKEGG and got a result, but the "Description" column contains only "NA".

Get NA Description

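Hi Y叔,

I installed KEGG.db using the code from the post "KEGG数据本地化，再也不用担心网络问题了" ("Localize KEGG data and never worry about network problems again"). When I ran the example code, the Description column of the result was NA. I noticed that clusterProfiler has updated the KEGG API; could the KEGG API change be causing this problem?

The code and results are as follows: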

# Build and install a local KEGG.db
remotes::install_github("YuLab-SMU/createKEGGdb")
createKEGGdb::create_kegg_db("hsa")
install.packages("./KEGG.db_1.0.tar.gz", repos = NULL)

# Usage
data(geneList, package = "DOSE")
gene <- names(geneList)[abs(geneList) > 2]

kk <- clusterProfiler::enrichKEGG(gene = gene,
                 organism     = 'hsa',
                 pvalueCutoff = 0.05,
                 qvalueCutoff = 0.05,
                 use_internal_data = TRUE)
kk
#
# over-representation test
#
#...@organism 	 hsa 
#...@ontology 	 KEGG 
#...@keytype 	 kegg 
#...@gene 	 chr [1:207] "4312" "8318" "10874" "55143" "55388" "991" "6280" "2305" "9493" "1062" "3868" "4605" "9833" ...
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...9 enriched terms found
'data.frame':	9 obs. of  9 variables:
 $ ID         : chr  "hsa04110" "hsa04114" "hsa04218" "hsa04061" ...
 $ Description: chr  NA NA NA NA ...
 $ GeneRatio  : chr  "11/94" "10/94" "10/94" "8/94" ...
 $ BgRatio    : chr  "127/8275" "131/8275" "156/8275" "100/8275" ...
 $ pvalue     : num  1.69e-07 2.05e-06 9.88e-06 1.62e-05 2.06e-05 ...
 $ p.adjust   : num  3.53e-05 2.14e-04 6.88e-04 8.48e-04 8.62e-04 ...
 $ qvalue     : num  3.45e-05 2.09e-04 6.72e-04 8.28e-04 8.42e-04 ...
 $ geneID     : chr  "8318/991/9133/890/983/4085/7272/1111/891/4174/9232" "991/9133/983/4085/51806/6790/891/9232/3708/5241" "2305/4605/9133/890/983/51806/1111/891/776/3708" "3627/10563/6373/4283/6362/6355/9547/1524" ...
 $ Count      : int  11 10 10 8 7 7 5 8 10
#...Citation
  Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
  clusterProfiler: an R package for comparing biological themes among
  gene clusters. OMICS: A Journal of Integrative Biology
  2012, 16(5):284-287

Something wrong with get_path2name Function

Something goes wrong when downloading the KEGG dataset. Here is how I corrected that part.

#options(clusterProfiler.download.method = "wget")
enrichKEGG(de,pvalueCutoff=0.01,use_internal_data = F)
--> No gene can be mapped....
--> Expected input gene ID:
--> return NULL...

Since enrichKEGG sometimes fails like this, I chose to build KEGG.db locally.
But then:
createKEGGdb::create_kegg_db('hsa')
Error in clusterProfiler:::kegg_list("pathway", species) :
unused argument (species)

The argument "species" was unused, so I checked the code and found a problem in the function "get_path2name".
Here I add line 3 and pass "new_species" instead of "species":

get_path2name <- function(species){
  if (length(species) == 1) {
    new_species <- paste0("pathway/", species)  # added: single-argument form
    keggpathid2name.df <- clusterProfiler:::kegg_list(new_species)
  } else {
    keggpathid2name.list <- vector("list", length(species))
    names(keggpathid2name.list) <- species
    for (i in species) {
      # use the same single-argument form for every species
      keggpathid2name.list[[i]] <- clusterProfiler:::kegg_list(paste0("pathway/", i))
    }
    keggpathid2name.df <- do.call(rbind, keggpathid2name.list)
    rownames(keggpathid2name.df) <- NULL
  }
  # strip the trailing " - <organism> (<common name>)" from pathway names
  keggpathid2name.df[,2] <- sub("\\s-\\s[a-zA-Z ]+\\(\\w+\\)$", "", keggpathid2name.df[,2])

  # drop the "path:map" prefix from pathway IDs (%<>% comes from magrittr)
  keggpathid2name.df[,1] %<>% gsub("path:map", "", .)

  colnames(keggpathid2name.df) <- c("path_id","path_name")
  return(keggpathid2name.df)
}

createKEGGdb::create_kegg_db('hsa')
install.packages("./KEGG.db_1.0.tar.gz",repos=NULL,type="source")

ego_KEGG <- enrichKEGG(gene = list$entrezgene,
                       organism = "hsa",
                       pvalueCutoff = 1,
                       qvalueCutoff = 1,
                       minGSSize = 1,
                       use_internal_data = TRUE)
# Result -----------------------------------
ego_KEGG@result

               ID Description GeneRatio  BgRatio       pvalue     p.adjust       qvalue                                                    geneID
hsa05202 hsa05202        <NA>     11/67 193/8292 3.366684e-07 6.093698e-05 4.961429e-05 1051/1649/3398/5966/4616/221037/1026/2120/5914/2521/51274
hsa04141 hsa04141        <NA>      8/67 171/8292 6.426462e-05 5.815948e-03 4.735288e-03                    3309/1649/7095/7184/9709/2923/468/5611
hsa03040 hsa03040        <NA>      7/67 156/8292 2.460964e-04 1.233899e-02 1.004628e-02                   10772/151903/6434/29896/25949/2521/6628

# To fix the NA values -----------------------------------
keggpathid2name.df <- clusterProfiler:::kegg_list("pathway/hsa")
# sub() with fixed = TRUE returns a character vector (strsplit() would return a list)
ego_KEGG@result$Description <- sub(" - Homo sapiens (human)", "",
    keggpathid2name.df$to[match(ego_KEGG@result$ID, keggpathid2name.df$from)],
    fixed = TRUE)

That is the whole problem and my workaround.
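
The two steps in this report, stripping the organism suffix from pathway names and filling the missing Description values by ID lookup, can be tried offline on toy data. The sketch below is only an illustration: the data frame `path2name` is a hand-written stand-in for what the KEGG list call returns (column names `from` and `to` follow the workaround above), and `strip_species_suffix` is a hypothetical helper name.

```r
# Toy stand-in for the two-column table returned by the KEGG list call;
# the column names `from` and `to` follow the workaround quoted above.
path2name <- data.frame(
  from = c("hsa04110", "hsa04114", "hsa03040"),
  to   = c("Cell cycle - Homo sapiens (human)",
           "Oocyte meiosis - Homo sapiens (human)",
           "Spliceosome - Homo sapiens (human)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove a trailing " - <organism name> (<common name>)".
strip_species_suffix <- function(x) {
  sub("\\s-\\s[a-zA-Z ]+\\(\\w+\\)$", "", x)
}

# Toy enrichment result whose descriptions are all missing.
result <- data.frame(
  ID = c("hsa03040", "hsa04110"),
  Description = NA_character_,
  stringsAsFactors = FALSE
)

# Look up each ID in the name table, then clean the matched name.
matched <- path2name$to[match(result$ID, path2name$from)]
result$Description <- strip_species_suffix(matched)

print(result)
```

Because the pattern does not hard-code "Homo sapiens (human)", the same helper should also handle other organisms whose suffix has this shape, although that is untested here.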

Compatibility of createKEGGdb with keyType option of clusterProfiler::enrichKEGG function

Hello,

Thanks for this useful package!

I have some questions on what exactly is stored in the resulting KEGG.db, and how that relates to the options of clusterProfiler::enrichKEGG.
enrichKEGG has an option keyType, which accepts kegg, ncbi-geneid, ncbi-proteinid or uniprot.

Background/context

I would like to have a solution for doing KEGG enrichment analysis, starting from gene SYMBOL. I want to be able to use the same solution from any arbitrary species.

From this reply YuLab-SMU/clusterProfiler#108 (comment)

KEGG id and ENTREZID are the same for only some of the species, but not always the same.

and this blog post https://guangchuangyu.github.io/2016/05/convert-biological-id-with-kegg-api-using-clusterprofiler/

A rule of thumb for the ‘kegg’ ID is entrezgene ID for eukaryote species and Locus ID for prokaryotes.

I conclude that KEGG IDs are not reliable enough, or not described well enough, for my use. I would thus prefer to use ncbi-geneid.

However, when opening the sqlite database created through createKEGGdb, I only see a field gene_or_orf_id in table pathway2gene.

Questions:

1. What is the gene_or_orf_id present in the KEGG.db database? Is it a KEGG ID?
2. Can I use createKEGGdb to create a KEGG.db package and then use it with clusterProfiler::enrichKEGG with keyType = ncbi-geneid (and use_internal_data = TRUE)?

Thank you in advance for your help.
All the best

Error when using remotes::install_github("YuLab-SMU/createKEGGdb")

I want to use this method to get the latest information about E. coli. But when I run remotes::install_github("YuLab-SMU/createKEGGdb") in RStudio, the program reports an error.

remotes::install_github("YuLab-SMU/createKEGGdb")
Downloading GitHub repo YuLab-SMU/createKEGGdb@master
Skipping 1 packages not available: clusterProfiler
✓ checking for file 'C:\Users\yuwt8\AppData\Local\Temp\Rtmp63vtx0\remotes123865ac3bdb\YuLab-SMU-createKEGGdb-378e7cf/DESCRIPTION' (710ms)
─ preparing 'createKEGGdb':
✓ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building 'createKEGGdb_0.0.2.tar.gz'

installing source package 'createKEGGdb' ...
** using staged installation
** R
Error : (converted from warning) unable to re-encode 'create_kegg_db.R' line 139
ERROR: unable to collate and parse R files for package 'createKEGGdb'
removing 'C:/Download/R/R-3.6.2/library/createKEGGdb'
Error: Failed to install 'createKEGGdb' from GitHub:
(converted from warning) installation of package ‘C:/Users/yuwt8/AppData/Local/Temp/Rtmp63vtx0/file123872f9683b/createKEGGdb_0.0.2.tar.gz’ had non-zero exit status

I wonder what the reason is and how to solve it.
Thank you!

Failed to install 'createKEGGdb' from GitHub

Hi,
I want to use this method to get the latest information about Zea mays. But when I run remotes::install_github("YuLab-SMU/createKEGGdb") in RStudio, the program reports an error:
Downloading GitHub repo YuLab-SMU/createKEGGdb@master
Skipping 1 packages not available: clusterProfiler
Error: Failed to install 'createKEGGdb' from GitHub: setup stdio (system error 2, The system cannot find the file specified.) @win/processx.c:970
I wonder what the reason is and how to solve it.
Thank you!

Pay attention: the API https://rest.kegg.jp/list/all does not work now!

Dear authors,
please pay attention to this issue:

r$> clusterProfiler:::kegg_list("all")
Reading KEGG annotation online: "https://rest.kegg.jp/list/all"...
fail to download KEGG data...
NULL
Warning message:
In download.file(url, method = method, ...) :
  cannot open URL 'https://rest.kegg.jp/list/all': HTTP status was '400 Bad Request'

r$> clusterProfiler:::kegg_list()
Error in clusterProfiler:::kegg_list() : 
  argument "db" is missing, with no default

r$> clusterProfiler:::kegg_list
function (db, species = NULL) 
{
    if (db == "pathway") {
        url <- paste("https://rest.kegg.jp/list", db, species, 
            sep = "/")
    }
    else {
        url <- paste("https://rest.kegg.jp/list", db, sep = "/")
    }
    kegg_rest(url)
}
<bytecode: 0x560062430ac8>
<environment: namespace:clusterProfiler>

I have already installed the newest createKEGGdb and clusterProfiler
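
To see which URL each call ends up requesting, the URL-building part of the function printed above can be isolated. The helper `kegg_list_url` below is hypothetical: it mirrors the branching of the printed `kegg_list` body, adds an explicit `NULL` check for `species` (an assumption made for clarity), and performs no download.

```r
# Hypothetical helper mirroring the URL logic of the kegg_list() listing above.
# It only builds the request URL; nothing is sent to the KEGG server.
kegg_list_url <- function(db, species = NULL) {
  base <- "https://rest.kegg.jp/list"
  if (db == "pathway" && !is.null(species)) {
    # pathway listing for one organism, e.g. .../list/pathway/hsa
    paste(base, db, species, sep = "/")
  } else {
    # every other database name is appended directly
    paste(base, db, sep = "/")
  }
}

kegg_list_url("pathway", "hsa")  # "https://rest.kegg.jp/list/pathway/hsa"
kegg_list_url("all")             # "https://rest.kegg.jp/list/all"
```

The second URL is the one that came back with "400 Bad Request" in the log above.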

KEGG.db is not consistent with Online

I followed your package's instructions, but the KEGG background gene number is not consistent with the online data for mouse.
KEGG.db gives 7650, but the online data gives 8656.
