kyzhouhzau / knowledge-graph

A question-answering (QA) demo based on a knowledge graph (KG), built with Scrapy and Jena.


knowledge-graph's Introduction

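Step 1: Data acquisition and triple construction

Use Scrapy to crawl gene, mutation, disease, and description-text information (use dynamic IPs, otherwise the crawler gets blocked within seconds).
Store one copy of the crawled data in a MySQL database as a backup and another copy as a CSV file.
After cleaning and preprocessing the data, build RDF triples.
Triple structure: (not described here)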
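The triple structure is not described in this README, so the following is only a minimal sketch of how cleaned CSV rows could be turned into RDF triples. The `http://example.org/kg/` namespace, the column names (`gene`, `mutation`, `disease`, `description`), and the predicate names are all hypothetical, and the sample rows are placeholders rather than real data. It uses only the Python standard library and emits N-Triples lines, which are also valid Turtle.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/kg/"  # hypothetical namespace, not from the project


def iri(kind, name):
    """Build an IRI such as <http://example.org/kg/gene/GENE1>."""
    return f"<{BASE}{kind}/{quote(name, safe='')}>"


def literal(text):
    """Escape a string as an N-Triples literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def rows_to_ntriples(rows):
    """Yield N-Triples lines for each complete row; skip incomplete rows."""
    for row in rows:
        gene = (row.get("gene") or "").strip()
        mutation = (row.get("mutation") or "").strip()
        disease = (row.get("disease") or "").strip()
        if not (gene and mutation and disease):
            continue  # basic cleaning: drop rows with a missing entity
        # Hypothetical predicates: gene -> mutation -> disease
        yield f"{iri('gene', gene)} <{BASE}hasMutation> {iri('mutation', mutation)} ."
        yield f"{iri('mutation', mutation)} <{BASE}associatedWith> {iri('disease', disease)} ."
        description = (row.get("description") or "").strip()
        if description:
            yield f"{iri('disease', disease)} <{BASE}description> {literal(description)} ."


# Placeholder rows standing in for the crawled CSV file.
SAMPLE_CSV = (
    "gene,mutation,disease,description\n"
    'GENE1,MUT1,disease one,"a ""quoted"" note"\n'
    ",MUT2,disease two,incomplete row\n"
)

triples = list(rows_to_ntriples(csv.DictReader(io.StringIO(SAMPLE_CSV))))
for line in triples:
    print(line)
```

In a real pipeline the rows would come from the crawled CSV file, and the printed lines would be written to the file that is loaded into Jena in the next step.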

Step 2: Load the data into the Jena graph database

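# Load the triples (--loc is the TDB database directory, followed by the data file)
./tdbloader --loc=E:\apache-jena-fuseki-3.9.0\tdb E:\KG_DEMO\AGAC_KGQA_PART\static\test.ttl
# Start the Fuseki server
./fuseki-server --update --loc=E:\apache-jena-fuseki-3.9.0\tdb  /tdb
# Run the front end
python manage.py runserver 0.0.0.0:8000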
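Once the server is running, the dataset can be queried over HTTP. The sketch below assumes Fuseki is listening on its default port 3030 and that the `/tdb` dataset exposes its query service at `/tdb/query`. The `ENDPOINT` value and the helper names are illustrative and are not taken from the project code.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: default Fuseki port and the /tdb dataset name used when starting the server.
ENDPOINT = "http://localhost:3030/tdb/query"


def build_request(sparql, endpoint=ENDPOINT):
    """Build a SPARQL Protocol POST request that asks for JSON results."""
    body = urlencode({"query": sparql}).encode("utf-8")
    return Request(
        endpoint,
        data=body,  # supplying a body makes urllib send a POST
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )


def run_select(sparql, endpoint=ENDPOINT):
    """Send a SELECT query and return the list of result bindings."""
    with urlopen(build_request(sparql, endpoint), timeout=10) as response:
        return json.load(response)["results"]["bindings"]


# Building the request does not touch the network; run_select() does.
request = build_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(request.get_method(), request.full_url)
```

Calling `run_select("SELECT * WHERE { ?s ?p ?o } LIMIT 5")` against a running server is a quick way to check that the triples were loaded.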

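How the framework works:

  • Django serves the web front end, which is a single search page.
  • The natural-language query is POSTed to the backend. The backend tokenizes the English text with jieba combined with custom dictionaries (English tokenization differs from Chinese, so part of the jieba source code has to be modified). The dictionaries are a drug dictionary and a disease dictionary.
  • After tokenization, the backend checks whether any token appears in the disease or drug dictionary.
  • A SPARQL query is built from the drug or disease terms that were found. (Dictionary matching is used because the knowledge model in the current version is logically simple: three hops are enough to get from a disease to a drug, or from a drug to a disease.) (Alternative approaches: 1. use refo multi-rule matching and convert the matches to SPARQL; 2. use dependency parsing to extract the o, p, b relations and map them onto a SPARQL graph query; 3. use machine learning to train the conversion as if it were a machine-translation task. If the questions are complex, entity alignment also becomes necessary; entities can be identified by vector similarity or subgraph similarity so that the natural language is mapped onto entities. In addition, dictionary-based entity recognition after tokenization only suits searches with simple logic like this one. When the logic is complex, do named entity recognition first, then entity mapping, and finally convert the natural language to SPARQL.)
  • The JSON returned by querying the SPARQL endpoint is drawn as a relation graph with Echarts, and that relation network is returned to the user.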
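The steps above can be sketched roughly as follows. This is not the project's code: a plain regular expression stands in for the modified jieba tokenizer, the dictionary entries are placeholders, and the three-hop query uses unconstrained predicates and `rdfs:label` because the real schema is not documented here. The output dictionary uses the `data` and `links` keys that an Echarts graph series expects.

```python
import re

# Placeholder dictionaries; the real drug and disease dictionaries are not published.
DRUG_DICT = {"drugx"}
DISEASE_DICT = {"diseasey"}


def find_entities(question):
    """Tokenize with a regex (stand-in for jieba) and look tokens up in the dictionaries."""
    tokens = re.findall(r"[a-z0-9-]+", question.lower())
    found = []
    for token in tokens:
        if token in DISEASE_DICT:
            found.append(("disease", token))
        elif token in DRUG_DICT:
            found.append(("drug", token))
    return found


def build_sparql(name):
    """Build a three-hop query starting from the entity labelled `name`."""
    safe = name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?start ?mid1 ?mid2 ?end WHERE {\n"
        f'  ?start rdfs:label "{safe}" .\n'  # assumption: entities carry rdfs:label
        "  ?start ?p1 ?mid1 .\n"
        "  ?mid1 ?p2 ?mid2 .\n"
        "  ?mid2 ?p3 ?end .\n"
        "} LIMIT 50"
    )


def bindings_to_graph(bindings, path=("start", "mid1", "mid2", "end")):
    """Convert SPARQL JSON bindings into node and link lists for an Echarts graph series."""
    nodes, links = {}, set()
    for binding in bindings:
        values = [binding[var]["value"] for var in path if var in binding]
        for value in values:
            nodes.setdefault(value, {"name": value})  # de-duplicate nodes by name
        links.update(zip(values, values[1:]))  # one link per consecutive pair on the path
    return {
        "data": list(nodes.values()),
        "links": [{"source": s, "target": t} for s, t in sorted(links)],
    }


entities = find_entities("Which drug treats diseasey?")
query = build_sparql(entities[0][1])

# Hand-written bindings in the SPARQL JSON results shape, used instead of a live endpoint.
sample_bindings = [
    {"start": {"value": "a"}, "mid1": {"value": "b"}, "mid2": {"value": "c"}, "end": {"value": "d"}},
    {"start": {"value": "a"}, "mid1": {"value": "b"}, "mid2": {"value": "c"}, "end": {"value": "e"}},
]
graph = bindings_to_graph(sample_bindings)
print(query)
print(graph)
```

In a Django view, `graph` would be serialized to JSON and passed to the page, where the Echarts graph series takes `graph["data"]` and `graph["links"]`.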

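Note: this is a very bare-bones QA demo that can be improved along the lines described above, and I will refine it when I find the time. This is not my research area; I spent a little over two days on it to get familiar with the basic workflow and learn something, so it has many shortcomings and I welcome pointers from experts. In addition, the data cannot conveniently be shared for now; some of its nodes were predicted by a neural network. Only a few sample records are provided.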

Reference: http://www.openkg.cn/tool/refo-kbqa
