Query online KEGG annotation to generate KEGG.db package that can be used by clusterProfiler and other packages.
Guangchuang YU and Ziru Chen
## install.packages("remotes")
remotes::install_github("YuLab-SMU/createKEGGdb")
Create KEGG.db Package
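Hello,
I fetched all organism names from http://rest.kegg.jp/list/organism
and then used the createKEGGdb package to build a KEGG database for all organisms:
create_kegg_db(keggOrganism)
The download failed at the 475th organism with the following error: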
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/vps"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/vcrb"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/vve"...
Error in content[, 1] : subscript out of bounds
How can I deal with this?
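One possible workaround, sketched here under assumptions and not taken from the package itself: check every organism code on its own first, and pass only the codes that return a usable two-column pathway table to create_kegg_db(). The helper name `filter_usable_species` and its `fetch` argument are hypothetical; `fetch` stands for whatever function downloads the pathway list for a single organism.

```r
# Hypothetical helper (not part of createKEGGdb): split organism codes into
# those whose pathway table looks usable and those that failed.
filter_usable_species <- function(species, fetch) {
  usable <- vapply(species, function(sp) {
    # Treat any download error as "no table"
    tab <- tryCatch(fetch(sp), error = function(e) NULL)
    # A usable table is a data frame with at least two columns and one row
    is.data.frame(tab) && ncol(tab) >= 2 && nrow(tab) > 0
  }, logical(1))
  list(ok = species[usable], failed = species[!usable])
}

# Offline demonstration with a stand-in fetch function (assumption: "vve"
# is the code that returns nothing, as the log above suggests).
fake_fetch <- function(sp) {
  if (sp == "vve") return(NULL)
  data.frame(from = paste0(sp, "00010"), to = "Glycolysis",
             stringsAsFactors = FALSE)
}
res <- filter_usable_species(c("vps", "vcrb", "vve"), fake_fetch)
print(res$failed)
```

With a real download function in place of `fake_fetch`, `res$ok` could then be handed to create_kegg_db(), and `res$failed` lists the codes to investigate separately.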
Y叔,
Hello!
I ran into an error in RStudio that I do not know how to resolve.
> createKEGGdb::create_kegg_db('hsa')
Error in `colnames<-`(`*tmp*`, value = c("path_id", "path_name")) :
attempt to set 'colnames' on an object with less than two dimensions
Thanks.
Hello!
How can I build a database covering all microorganisms (all bacteria, viruses, fungi, and archaea)?
Hi,
It is a great package for us. Thanks! When I run the function enrichKEGG after building KEGG.db with createKEGGdb, I do get a result, but the "Description" column contains only NA.
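Hi Y叔,
I installed KEGG.db using the code from the post "KEGG data localization: no more worrying about network problems".
When running the example code, I found that the Description column of the result is NA. I noticed that clusterProfiler updated its KEGG API; is this problem caused by the KEGG API change?
The code and the result are as follows:
# Localization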
remotes::install_github("YuLab-SMU/createKEGGdb")
createKEGGdb::create_kegg_db("hsa")
install.packages("./KEGG.db_1.0.tar.gz",repos=NULL)
# Usage
data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]
kk <- clusterProfiler::enrichKEGG(gene = gene,
organism = 'hsa',
pvalueCutoff = 0.05,
qvalueCutoff = 0.05,
use_internal_data =T)
kk
#
# over-representation test
#
#...@organism hsa
#...@ontology KEGG
#...@keytype kegg
#...@gene chr [1:207] "4312" "8318" "10874" "55143" "55388" "991" "6280" "2305" "9493" "1062" "3868" "4605" "9833" ...
#...pvalues adjusted by 'BH' with cutoff <0.05
#...9 enriched terms found
'data.frame': 9 obs. of 9 variables:
$ ID : chr "hsa04110" "hsa04114" "hsa04218" "hsa04061" ...
$ Description: chr NA NA NA NA ...
$ GeneRatio : chr "11/94" "10/94" "10/94" "8/94" ...
$ BgRatio : chr "127/8275" "131/8275" "156/8275" "100/8275" ...
$ pvalue : num 1.69e-07 2.05e-06 9.88e-06 1.62e-05 2.06e-05 ...
$ p.adjust : num 3.53e-05 2.14e-04 6.88e-04 8.48e-04 8.62e-04 ...
$ qvalue : num 3.45e-05 2.09e-04 6.72e-04 8.28e-04 8.42e-04 ...
$ geneID : chr "8318/991/9133/890/983/4085/7272/1111/891/4174/9232" "991/9133/983/4085/51806/6790/891/9232/3708/5241" "2305/4605/9133/890/983/51806/1111/891/776/3708" "3627/10563/6373/4283/6362/6355/9547/1524" ...
$ Count : int 11 10 10 8 7 7 5 8 10
#...Citation
Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
clusterProfiler: an R package for comparing biological themes among
gene clusters. OMICS: A Journal of Integrative Biology
2012, 16(5):284-287
Something is wrong with downloading the KEGG dataset. Here is how I corrected this part.
#options(clusterProfiler.download.method = "wget")
enrichKEGG(de,pvalueCutoff=0.01,use_internal_data = F)
--> No gene can be mapped....
--> Expected input gene ID:
--> return NULL...
Sometimes enrichKEGG fails like this and does not work correctly, so I chose to build KEGG.db instead.
But...
createKEGGdb::create_kegg_db('hsa')
Error in clusterProfiler:::kegg_list("pathway", species) :
unused argument (species)
The argument "species" was unused. So I checked the code and found something wrong in the function "get_path2name".
Here we add line 3 and change "species" to "new_species":
get_path2name <- function(species){
if (length(species) == 1) {
new_species=paste0("pathway/",species)
keggpathid2name.df <- clusterProfiler:::kegg_list(new_species)
} else {
keggpathid2name.list <- vector("list", length(species))
names(keggpathid2name.list) <- species
for (i in species) {
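# use the same one-argument form as above, so it also works when kegg_list() has no 'species' argument
keggpathid2name.list[[i]] <- clusterProfiler:::kegg_list(paste0("pathway/", i))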
}
keggpathid2name.df <- do.call(rbind, keggpathid2name.list)
rownames(keggpathid2name.df) <- NULL
}
keggpathid2name.df[,2] <- sub("\\s-\\s[a-zA-Z ]+\\(\\w+\\)$", "", keggpathid2name.df[,2])
colnames(keggpathid2name.df) <- c("path_id","path_name")
return(keggpathid2name.df)
}
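The regular expression in the function above strips the trailing organism label from each pathway name. A minimal offline sketch of that step, assuming names of the form "<pathway> - <organism> (<common name>)"; the wrapper name `strip_species_suffix` is hypothetical:

```r
# Hypothetical wrapper around the sub() call used in get_path2name().
# In an R string literal every regex backslash must be doubled.
strip_species_suffix <- function(path_name) {
  sub("\\s-\\s[a-zA-Z ]+\\(\\w+\\)$", "", path_name)
}

strip_species_suffix("Cell cycle - Homo sapiens (human)")
# returns "Cell cycle"
```

Names without such a suffix are returned unchanged, so the call is safe to apply to a whole column.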
createKEGGdb::create_kegg_db('hsa')
install.packages("./KEGG.db_1.0.tar.gz",repos=NULL,type="source")
ego_KEGG=enrichKEGG(gene=list$entrezgene,
organism = "hsa",
pvalueCutoff = 1,
qvalueCutoff=1,
minGSSize=1,
use_internal_data = T)
#Result-----------------------------------
ego_KEGG@result
ID Description GeneRatio BgRatio pvalue p.adjust qvalue geneID
hsa05202 hsa05202 11/67 193/8292 3.366684e-07 6.093698e-05 4.961429e-05 1051/1649/3398/5966/4616/221037/1026/2120/5914/2521/51274
hsa04141 hsa04141 8/67 171/8292 6.426462e-05 5.815948e-03 4.735288e-03 3309/1649/7095/7184/9709/2923/468/5611
hsa03040 hsa03040 7/67 156/8292 2.460964e-04 1.233899e-02 1.004628e-02 10772/151903/6434/29896/25949/2521/6628
#To fix the NA value-----------------------------------
keggpathid2name.df <- clusterProfiler:::kegg_list("pathway/hsa")
# sub() returns a character vector; strsplit() would return a list here
ego_KEGG@result$Description <- sub(" - Homo sapiens (human)", "",
keggpathid2name.df$to[match(ego_KEGG@result$ID, keggpathid2name.df$from)], fixed = TRUE)
That is the whole problem and its solution.
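The same repair can be tried offline on plain data frames. This is only a sketch under assumptions: the helper name `fill_descriptions` is hypothetical, the pathway table is assumed to have the columns `from` and `to` as in the code above, and a possible "path:" prefix on the IDs is stripped defensively.

```r
# Hypothetical helper: fill NA descriptions from a pathway ID -> name table.
fill_descriptions <- function(result, path_table) {
  # Assumption: IDs may or may not carry a "path:" prefix
  ids <- sub("^path:", "", path_table$from)
  idx <- match(result$ID, ids)
  # Drop the trailing " - <organism> (<common name>)" label
  clean <- sub("\\s-\\s[a-zA-Z ]+\\(\\w+\\)$", "", path_table$to[idx])
  missing <- is.na(result$Description)
  result$Description[missing] <- clean[missing]
  result
}

# Stand-in data mimicking the structure shown above
result <- data.frame(ID = c("hsa04110", "hsa04114"),
                     Description = NA_character_,
                     stringsAsFactors = FALSE)
path_table <- data.frame(
  from = c("hsa04114", "hsa04110"),
  to = c("Oocyte meiosis - Homo sapiens (human)",
         "Cell cycle - Homo sapiens (human)"),
  stringsAsFactors = FALSE)
fixed <- fill_descriptions(result, path_table)
print(fixed)
```

For a real enrichment object, the data frame would be `ego_KEGG@result` and the table would come from the kegg_list call shown above.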
Hello,
Thanks for this useful package!
I have some questions on what exactly is stored in the resulting KEGG.db, and how that relates to the options of clusterProfiler::enrichKEGG.
enrichKEGG has an option keyType, which accepts kegg, ncbi-geneid, ncbi-proteinid or uniprot.
Background/context
I would like to have a solution for doing KEGG enrichment analysis, starting from gene SYMBOL. I want to be able to use the same solution from any arbitrary species.
From this reply YuLab-SMU/clusterProfiler#108 (comment)
KEGG id and ENTREZID are the same for only some of the species, but not always the same.
and this blog post https://guangchuangyu.github.io/2016/05/convert-biological-id-with-kegg-api-using-clusterprofiler/
A rule of thumb for the ‘kegg’ ID is entrezgene ID for eukaryote species and Locus ID for prokaryotes.
I conclude that kegg ids are not reliable enough/not sufficiently well described for my use. I would thus prefer to use ncbi-geneid.
However, when opening the sqlite database created through createKEGGdb, I only see a field gene_or_orf_id in table pathway2gene.
Questions:
What is the gene_or_orf_id present in the KEGG.db database? Is it a kegg id?
Is it possible to use createKEGGdb to create a KEGG.db package, and then use it for clusterProfiler::enrichKEGG with keyType = ncbi-geneid (and use_internal_data = TRUE)?
Thank you in advance for your help,
All the best
I want to try the method to get the latest information about E. coli. But when I type remotes::install_github("YuLab-SMU/createKEGGdb") in RStudio, the program reports an error.
remotes::install_github("YuLab-SMU/createKEGGdb")
Downloading GitHub repo YuLab-SMU/createKEGGdb@master
Skipping 1 packages not available: clusterProfiler
✓ checking for file 'C:\Users\yuwt8\AppData\Local\Temp\Rtmp63vtx0\remotes123865ac3bdb\YuLab-SMU-createKEGGdb-378e7cf/DESCRIPTION' (710ms)
─ preparing 'createKEGGdb':
✓ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building 'createKEGGdb_0.0.2.tar.gz'
I wonder what the reason is and how to solve it.
Thank you!
Hi,
I want to try the method to get the latest information about Zea mays. But when I type remotes::install_github("YuLab-SMU/createKEGGdb") in RStudio, the program reports an error:
Downloading GitHub repo YuLab-SMU/createKEGGdb@master
Skipping 1 packages not available: clusterProfiler
Error: Failed to install 'createKEGGdb' from GitHub: setup stdio (system error 2, The system cannot find the file specified.) @win/processx.c:970
I wonder what the reason is and how to solve it.
Thank you!
Dear authors,
please pay attention to this issue:
r$> clusterProfiler:::kegg_list("all")
Reading KEGG annotation online: "https://rest.kegg.jp/list/all"...
fail to download KEGG data...
NULL
Warning message:
In download.file(url, method = method, ...) :
cannot open URL 'https://rest.kegg.jp/list/all': HTTP status was '400 Bad Request'
r$> clusterProfiler:::kegg_list()
Error in clusterProfiler:::kegg_list() :
argument "db" is missing, with no default
r$> clusterProfiler:::kegg_list
function (db, species = NULL)
{
if (db == "pathway") {
url <- paste("https://rest.kegg.jp/list", db, species,
sep = "/")
}
else {
url <- paste("https://rest.kegg.jp/list", db, sep = "/")
}
kegg_rest(url)
}
<bytecode: 0x560062430ac8>
<environment: namespace:clusterProfiler>
I have already installed the newest createKEGGdb and clusterProfiler
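To make the behaviour above easier to follow, the URL logic from the printed function can be isolated and tried offline. This is a sketch only: `kegg_list_url` is a hypothetical name, and its body copies the branching shown in the printed source without contacting the KEGG server.

```r
# Hypothetical stand-alone copy of the URL construction in kegg_list()
kegg_list_url <- function(db, species = NULL) {
  if (db == "pathway") {
    paste("https://rest.kegg.jp/list", db, species, sep = "/")
  } else {
    paste("https://rest.kegg.jp/list", db, sep = "/")
  }
}

# "all" falls into the else branch, giving the URL from the log above
kegg_list_url("all")
# returns "https://rest.kegg.jp/list/all"

# Both call styles for a single organism resolve to the same URL
kegg_list_url("pathway", "hsa")
kegg_list_url("pathway/hsa")
```

The last two calls show why passing "pathway/hsa" as a single argument works with this version of the function as well as with a version that takes only `db`.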
I followed your package, but the background gene number of KEGG is not consistent with the online data for mouse.
With KEGG.db it is 7650, but online it is 8656.