ysc / questionansweringsystem

QuestionAnsweringSystem is a question answering system implemented in Java. It automatically analyzes a question and returns candidate answers.

License: Apache License 2.0

Java 99.63% Shell 0.34% Batchfile 0.03%

questionansweringsystem's Introduction

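QuestionAnsweringSystem is a question answering system implemented in Java. It automatically analyzes a question and returns candidate answers. In February 2011, IBM's artificial intelligence computer system Watson defeated two human champions on the popular American TV quiz show Jeopardy!. QuestionAnsweringSystem is an open-source Java implementation of IBM Watson.

A brief analysis of the QuestionAnsweringSystem implementation

QuestionAnsweringSystem won the Most Popular award in the Side Project sponsorship event "Looking for technical talent that builds and perseveres", organized by 100offer.

Donations and acknowledgements

Usage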

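1. Install JDK 8 and Maven 3.3.3
    Add the bin directories of the JDK and of Maven to the PATH environment variable, and make sure the java and mvn commands can be run from the command line:
    java -version
        java version "1.8.0_60"
    mvn -v
        Apache Maven 3.3.3

2. Get the source code
    git clone https://github.com/ysc/QuestionAnsweringSystem.git
    cd QuestionAnsweringSystem
    It is recommended to register a GitHub account, fork the project into it, and check out the source from your fork,
    which makes it easier to collaborate through GitHub pull requests.

3. Run the project
    On Unix-like systems:
        chmod +x startup.sh && ./startup.sh
    On Windows:
        .\startup.bat

4. Use the system
    Open http://localhost:8080/deep-qa-web/index.jsp in a browser.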

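How it works

1. Determine the question type (answer type). Pattern matching is used at present; other methods, such as a naive Bayes classifier, are planned.
2. Extract keywords from the question.
3. Search several data sources with those keywords. The current sources are mainly a manually annotated corpus, Google, and Baidu.
4. Extract candidate answers from the search results according to the question type (answer type).
5. Score the candidate answers against the question and the search results.
6. Return the top N candidate answers by score.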
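
Steps 5 and 6 come down to sorting the scored candidates and keeping the best N. The following is a minimal sketch of that last step only. The `Candidate` type and the `topN` method are hypothetical names made up for illustration; they are not the project's own `CandidateAnswer` class or scoring code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of steps 5-6; not the project's actual classes.
class TopNSketch {
    static final class Candidate {
        final String answer;
        final double score;

        Candidate(String answer, double score) {
            this.answer = answer;
            this.score = score;
        }
    }

    // Returns at most n candidates, highest score first; the input list is not modified.
    static List<Candidate> topN(List<Candidate> candidates, int n) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Candidate c) -> c.score).reversed());
        int limit = Math.min(Math.max(n, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        List<Candidate> scored = new ArrayList<>();
        scored.add(new Candidate("A", 0.2));
        scored.add(new Candidate("B", 0.7));
        scored.add(new Candidate("C", 0.1));
        for (Candidate c : topN(scored, 2)) {
            System.out.println(c.answer + ":" + c.score);
        }
    }
}
```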

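Five question types (answer types) are currently supported

1. Person names
	Examples:
	APDPlat的作者是谁? (Who is the author of APDPlat?)
	APDPlat的发起人是谁? (Who initiated APDPlat?)
	谁死后布了七十二疑冢? (Who had seventy-two decoy tombs built after his death?)
	***最爱的女人是谁? (Who is the woman *** loved most?)
2. Place names
	Examples:
	“海的女儿”是哪个城市的城徽? ("The Little Mermaid" is the emblem of which city?)
	世界上流经国家最多的河流是哪一条? (Which river flows through the most countries?)
	世界上最长的河流是什么? (What is the longest river in the world?)
	汉城是哪个国家的首都? (Seoul is the capital of which country?)
3. Organization names
	Examples:
	BMW是哪个汽车公司制造的? (Which car company makes BMW?)
	长城信用卡是哪家银行发行的? (Which bank issues the Great Wall credit card?)
	美国历史上第一所高等学府是哪个学校? (Which school was the first institution of higher education in American history?)
	前身是红色中华通讯社的是什么? (Which organization was formerly the Red China News Agency?)
4. Numbers
	Examples:
	全球表面积有多少平方公里? (What is the surface area of the Earth in square kilometers?)
	撒哈拉有多少平方公里? (How many square kilometers is the Sahara?)
	北京大学占地多少平方米? (How many square meters does Peking University cover?)
5. Times
	Examples:
	哪一年第一次提出“大跃进”的口号? (In which year was the "Great Leap Forward" slogan first put forward?)
	大庆油田是哪一年发现的? (In which year was the Daqing oil field discovered?)
	澳门是在哪一年回归祖国怀抱的? (In which year did Macau return to China?)
	***在什么时候进行南巡讲话? (When did *** give the Southern Tour talks?)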

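Adding a new question type (answer type)

1. In the enum org.apdplat.qa.model.QuestionType,
   add the new question type and map parts of speech to it.

2. In the resource directory src/main/resources/questionTypePatterns, add new pattern-matching rules so that the new question type can be detected.
   The 3 files in that directory hold patterns at different levels of abstraction; adding the new pattern to one of them is enough.

3. In the class org.apdplat.qa.questiontypeanalysis.QuestionTypeTransformer,
   map the pattern-matching rule to the corresponding instance of the enum org.apdplat.qa.model.QuestionType.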
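
To make steps 1 and 3 concrete, here is a rough sketch of an enum that ties part-of-speech tags to question types and resolves a rule name to a constant. Everything in it is an assumption for illustration: `QuestionTypeSketch`, `accepts`, `fromRuleName`, the `BOOK_TITLE` constant, and the tag choices `ns` and `nz` are hypothetical, and the real `QuestionType` and `QuestionTypeTransformer` may be organized quite differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch only: the real org.apdplat.qa.model.QuestionType
// may declare its constants and tag mapping differently.
enum QuestionTypeSketch {
    PERSON_NAME("nr"),
    LOCATION_NAME("ns"),
    // Step 1: a new type is one more constant plus its part-of-speech tags.
    BOOK_TITLE("nz");

    private final Set<String> posTags;

    QuestionTypeSketch(String... posTags) {
        this.posTags = new HashSet<>(Arrays.asList(posTags));
    }

    // True if a word carrying this part-of-speech tag can answer this question type.
    boolean accepts(String posTag) {
        return posTags.contains(posTag);
    }

    // Step 3: map the name used in a pattern rule to an enum constant (null if unknown).
    static QuestionTypeSketch fromRuleName(String ruleName) {
        String wanted = ruleName.toUpperCase(Locale.ROOT);
        for (QuestionTypeSketch type : values()) {
            if (type.name().equals(wanted)) {
                return type;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(fromRuleName("book_title") + " accepts nz: " + BOOK_TITLE.accepts("nz"));
    }
}
```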

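API

Endpoint:
	http://127.0.0.1/deep-qa-web/api/ask?n=1&q=APDPlat的作者是谁?
Parameters:
	n is the number of answers to return
	q is the question
Encoding:
	Both server and client use UTF-8
	On the server, edit the Tomcat configuration file conf/server.xml and add URIEncoding="UTF-8" to the relevant Connector
JSON response: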
	[
		{
			"answer": "杨尚川",
			"score": 1
		}
	]
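
Because the question contains non-ASCII characters, a client should percent-encode the q parameter as UTF-8 before sending the request. A minimal sketch of building such a URL with the standard library follows; the class and method names (`AskUrlSketch`, `buildAskUrl`) are made up for this example, and the base URL is whatever host and port your deployment uses.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for building the request URL of the ask endpoint.
class AskUrlSketch {
    // n: number of answers wanted; question: raw question text, encoded here as UTF-8.
    static String buildAskUrl(String baseUrl, int n, String question)
            throws UnsupportedEncodingException {
        return baseUrl + "/deep-qa-web/api/ask?n=" + n
                + "&q=" + URLEncoder.encode(question, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Prints the encoded URL; fetching it is left to the HTTP client of your choice.
        System.out.println(buildAskUrl("http://127.0.0.1", 1, "APDPlat的作者是谁?"));
    }
}
```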

Instructions

1. Initialize the MySQL database (MySQL serves as a data cache; this step is optional):

Run the script in QuestionAnsweringSystem/deep-qa/src/main/resources/mysql/questionanswer.sql from the MySQL command line
MySQL encoding: UTF-8
Host: 127.0.0.1
Port: 3306
Database: questionanswer
User: root
Password: root

2. Build the war file and deploy it to Tomcat:

cd QuestionAnsweringSystem
mvn install
cp deep-qa-web/target/deep-qa-web-1.2.war apache-tomcat-8.0.27/webapps/
Start Tomcat

3. Open in a browser:

http://localhost:8080/deep-qa-web-1.2/index.jsp

Deployable war package download

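Integrating QuestionAnsweringSystem into your application

QuestionAnsweringSystem can be integrated in two ways: embedded in the application as a library, or deployed independently as a platform.

Both are described below.

1. Embedding as a library.

This approach supports only the Java platform. Add the library to the build path through a Maven dependency: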

<dependency>
    <groupId>org.apdplat</groupId>
    <artifactId>deep-qa</artifactId>
    <version>1.2</version>
</dependency>

How is it used in an application? Sample code:

String questionStr = "APDPlat的作者是谁?";
// Ask the shared system instance; a null result is skipped by the check below.
Question question = SharedQuestionAnsweringSystem.getInstance().answerQuestion(questionStr);
if (question != null) {
    List<CandidateAnswer> candidateAnswers = question.getAllCandidateAnswer();
    // Print every candidate with its rank and score.
    int i = 1;
    for (CandidateAnswer candidateAnswer : candidateAnswers) {
        System.out.println((i++) + "、" + candidateAnswer.getAnswer() + ":" + candidateAnswer.getScore());
    }
}

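Running the program creates a deep-qa directory in the current directory, containing two subdirectories, dic and questionTypePatterns.
dic holds the dictionaries used by the Chinese word segmentation component, and questionTypePatterns holds the pattern definitions used for question type analysis. Both can be modified as needed.

2. Deploying independently as a platform.

First deploy the system on your own server, for example 192.168.0.1. It then serves requests as JSON over HTTP:

Endpoint:
http://192.168.0.1/deep-qa-web/api/ask?n=1&q=APDPlat的作者是谁?
Parameters:
n is the number of answers to return
q is the question
Encoding:
UTF-8
JSON response: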
[
    {
        "answer": "杨尚川",
        "score": 1
    }
]

In depth

QuestionAnsweringSystem consists of two subprojects, deep-qa and deep-qa-web.
deep-qa is the core. deep-qa-web provides a web interface for users and a JSON over HTTP interface that makes integration with heterogeneous systems easier.
deep-qa is packaged as a jar and can be referenced through Maven, using the same dependency and sample code shown earlier for the library integration; the deep-qa directory generated at run time is described there as well.

About Watson

Watson is a computer system like no other ever built. 
It analyzes natural language questions and content well enough and fast enough 
to compete and win against champion players at Jeopardy!

IBM Watson: How it Works

Building Watson - A Brief Overview of the DeepQA Project

This is Watson: A detailed explanation of how Watson works

The DeepQA Research Team

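Related articles

3760 questions for testing the intelligence of a question answering system

The past and present of question answering systems

Categories of question answering systems

What is Question Answering?

Other question answering systems

1. OpenEphyra (Java, open source)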

Ephyra is a modular and extensible framework for open domain question answering (QA). 
The system retrieves accurate answers to natural language questions from the Web and 
other sources. 

OpenEphyra home page

2. Watsonsim (Java, open source)

Open-domain question answering system from UNCC.
Watsonsim works using a pipeline of operations on questions, candidate answers, and 
their supporting passages. 
In many ways it is similar to IBM's Watson, and Petr's YodaQA. 
It's not all that similar to more logic based systems like OpenCog or Wolfram Alpha.

Watsonsim home page

3. YodaQA (Java, open source)

YodaQA is an open source Question Answering system
using on-the-fly Information Extraction from various data sources (mainly enwiki).
YodaQA stands for "Yet anOther Deep Answering pipeline" and 
the system is inspired by the DeepQA (IBM Watson) papers. 
It is built on top of the Apache UIMA.

YodaQA home page

4. OpenQA (Java, open source)

OpenQA is an open source question answering framework that unifies approaches from 
several domain experts. 
The aim of OpenQA is to provide a common platform that can be used to promote advances 
by easy integration and measurement of different approaches.

OpenQA home page

5. START (commercial)

START, the world's first Web-based question answering system, has been on-line 
and continuously operating since December, 1993. 

It has been developed by Boris Katz and his associates of the InfoLab Group 
at the MIT Computer Science and Artificial Intelligence Laboratory. 

Unlike information retrieval systems (e.g., search engines), 
START aims to supply users with "just the right information" 
instead of merely providing a list of hits. 

Currently, the system can answer millions of English questions about 
places (e.g., cities, countries, lakes, coordinates, weather, maps, demographics, 
political and economic systems), movies (e.g., titles, actors, directors), 
people (e.g., birth dates, biographies), dictionary definitions, and much, much more.

START home page

6. IBM Watson (commercial)

Watson is built to mirror the same learning process that we have.
Watson has been learning the language of professions and is trained 
by experts to work across many different industries. 

IBM Watson home page

7. Siri (commercial)

Siri /ˈsɪri/ is a part of Apple Inc.'s iOS which works as 
an intelligent personal assistant and knowledge navigator. 
The feature uses a natural language user interface to 
answer questions, make recommendations, and perform actions 
by delegating requests to a set of Web services. 

Siri home page

8. Wolfram|Alpha (commercial)

Wolfram|Alpha introduces a fundamentally new way to get knowledge and answers 
not by searching the web, but by doing dynamic computations based on a vast collection 
of built-in data, algorithms, and methods.

Wolfram|Alpha home page

9. Evi (commercial)

Evi was founded in August 2005, originally under the name of True Knowledge, with the mission 
of powering a new kind of search experience where users can access the world's knowledge simply 
by asking for the information they need in a way that is completely natural.

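Evi home page

10. Microsoft XiaoIce (微软小冰) (commercial)

Microsoft XiaoIce is an intelligent chatbot built on Microsoft's search engine and accumulated big data. All of its data comes from publicly available web pages.

Microsoft XiaoIce home page

11. Magi Semantic Search (commercial)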

Magi is a search engine that gives you answers instead of references. 
It's designed to be General, Feasible and Useful. 

Magi Semantic Search home page

questionansweringsystem's People

Contributors

hankcs, intfloat, ysc, yuchaozhou

questionansweringsystem's Issues

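Can new training data be added to the database?

The model here takes a question and returns an answer. Can it also be used to add extra data to the database for training the model? Also, where does the data in the original database come from? Did it go through a feature-extraction step like the one in question analysis, with keywords extracted and then saved to the database?

Null pointer exception

After one question has been answered and the result returned, submitting another question throws a NullPointerException.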

type Exception report

message

description The server encountered an internal error () that prevented it from fulfilling this request.

exception

org.apache.jasper.JasperException: An exception occurred processing JSP page /index.jsp at line 40

37: List candidateAnswers = null;
38: if (questionStr != null && questionStr.trim().length() > 3) {
39: questionStr = new String(questionStr.getBytes("ISO8859-1"), "UTF-8");
40: Question question = questionAnsweringSystem.answerQuestion(questionStr);
41: if (question != null) {
42: candidateAnswers = question.getAllCandidateAnswer();
43: }

Stacktrace:
org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:568)
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:470)
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:390)
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:334)
javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
root cause

java.lang.NullPointerException
edu.stanford.nlp.trees.international.pennchinese.ChineseGrammaticalStructure.collapsePrepAndPoss(ChineseGrammaticalStructure.java:94)
edu.stanford.nlp.trees.international.pennchinese.ChineseGrammaticalStructure.collapseDependencies(ChineseGrammaticalStructure.java:73)
edu.stanford.nlp.trees.GrammaticalStructure.typedDependenciesCCprocessed(GrammaticalStructure.java:737)
org.apdplat.qa.questiontypeanalysis.patternbased.MainPartExtracter.getMainPart(MainPartExtracter.java:150)
org.apdplat.qa.questiontypeanalysis.patternbased.MainPartExtracter.getMainPart(MainPartExtracter.java:129)
org.apdplat.qa.questiontypeanalysis.patternbased.MainPartExtracter.getMainPart(MainPartExtracter.java:107)
org.apdplat.qa.questiontypeanalysis.patternbased.PatternBasedMultiLevelQuestionClassifier.extractQuestionPatternFromQuestion(PatternBasedMultiLevelQuestionClassifier.java:336)
org.apdplat.qa.questiontypeanalysis.patternbased.PatternBasedMultiLevelQuestionClassifier.classify(PatternBasedMultiLevelQuestionClassifier.java:126)
org.apdplat.qa.system.QuestionAnsweringSystemImpl.answerQuestions(QuestionAnsweringSystemImpl.java:182)
org.apdplat.qa.system.QuestionAnsweringSystemImpl.answerQuestion(QuestionAnsweringSystemImpl.java:174)
org.apdplat.qa.system.QuestionAnsweringSystemImpl.answerQuestion(QuestionAnsweringSystemImpl.java:158)
org.apache.jsp.index_jsp._jspService(index_jsp.java:92)
org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:432)
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:390)
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:334)
javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
note The full stack trace of the root cause is available in the Apache Tomcat/7.0.26 logs.

Running the BaiduDemo.java class fails with: Unable to resolve "chineseFactored.ser.gz" as either class path, filename or URL

17:17:39.892 [main] INFO o.a.q.q.p.MainPartExtracter - 模型:chineseFactored.ser.gz
Loading parser from serialized file chineseFactored.ser.gz ...
java.io.IOException: Unable to resolve "chineseFactored.ser.gz" as either class path, filename or URL
at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:434)
at edu.stanford.nlp.io.IOUtils.readStreamFromString(IOUtils.java:368)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromSerializedFile(LexicalizedParser.java:606)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromFile(LexicalizedParser.java:401)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:158)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:144)
at org.apdplat.qa.questiontypeanalysis.patternbased.MainPartExtracter.(MainPartExtracter.java:57)
Loading parser from text file chineseFactored.ser.gz java.io.IOException: Unable to resolve "chineseFactored.ser.gz" as either class path, filename or URL
at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:434)
at edu.stanford.nlp.io.IOUtils.readerFromString(IOUtils.java:512)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromTextFile(LexicalizedParser.java:540)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromFile(LexicalizedParser.java:403)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:158)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:144)
at org.apdplat.qa.questiontypeanalysis.patternbased.MainPartExtracter.(MainPartExtracter.java:57)

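Adding a new question type

Object and Definition are commented out in QuestionTypeTransformer, so how do I add the question and answer types I want by following "Adding a new question type"? For example, how would I add a question-answer type such as "How would you rate the GitHub website: it is a website of type XXX..."?

About custom questions

I am using custom question-answer files. I added two files under the path /org/apdplat/qa/files/, but they are never read correctly, and reading them directly throws an exception. Is the set of files to read configured somewhere, or are they read from the referenced jar?
After I removed the four referenced qa libraries, the error went away and the path being read is the file I defined, but the content that is read is that of time_material.txt, the test file I had copied earlier (its content has since been changed). I do not understand what is happening or whether something is still linked to the preset files. Any pointers would be appreciated.

Can anyone help me with this exception when Java calls Python?

I am testing ways for my Java application to call Python code (.py).

I tried jython.jar: it worked well with simple Python, but it did not work when the Python file included a third-party jar.

I then tried Runtime.getRuntime().exec(), which failed with: Cannot run program "F:/WorkSpace2/testPython/src/testPython/python.py": CreateProcess error=193, %1 is not a valid Win32 application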
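
Windows error 193 means the file handed to CreateProcess is not an executable, which is the case for a .py script. A possible way around it is to launch the interpreter and pass the script as its argument. The sketch below assumes that an interpreter named python is on the PATH; `RunPythonSketch` and `command` are hypothetical names used only here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: run a Python script by starting the interpreter, not the script file.
class RunPythonSketch {
    // Builds the command line: interpreter first, script path second.
    static List<String> command(String interpreter, String scriptPath) {
        return Arrays.asList(interpreter, scriptPath);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: RunPythonSketch <script.py>");
            return;
        }
        // Assumes "python" resolves on PATH; replace with a full path if it does not.
        ProcessBuilder builder = new ProcessBuilder(command("python", args[0]));
        builder.inheritIO(); // show the script's output in this console
        int exitCode = builder.start().waitFor();
        System.out.println("exit code: " + exitCode);
    }
}
```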

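Request for help

I looked at the source. After the question pattern has been extracted, is it compared with the patterns in the file by strict character matching? Take the pattern (RW.RWOrdinaryMulti)/(nr|nr1|nr2|nrj|nrf). from the file as an example: do . and () carry their usual regular-expression meaning? I could not find where the matching takes place.

About custom questions

I added two Definition-type question files under QuestionAnsweringSystem/deep-qa/src/main/resources/org/apdplat/qa/files and referenced them in FilesConfig, hoping to link specific questions to my predefined answers. In testing, the files are still not picked up and the answers come from the web instead. What do I need to do to link questions to custom answers? Thanks.

How are the question type patterns obtained?

I read the question classification module of this project.
It contains three question pattern files, level1 to level3.
My guess at how these patterns are generated:
1. Level1 lists actual questions, each with its classification.
2. Level2 and level3 are generated directly from level1 according to question type.
I would like to add several question types. Could you explain how to obtain these patterns? @ysc

搜索到Evidence 0 条 (0 Evidence items found)

Hello. When I ran the code yesterday the queries returned results, but today every query reports 搜索到Evidence 0 条 (0 Evidence items found). How can this be fixed? I am running on Windows, and my editor is IDEA.

Word segmentation is always wrong. Why?

Segmenting the question: 苏格兰属于哪个洲 (Which continent does Scotland belong to?)
Segmentation result: 苏 格 兰 属 于 哪 个 洲

I checked out the source and ran it many times in Eclipse. As far as I remember, it answered correctly only once.

My preliminary conclusion is that segmentation is at fault, because the class CommonCandidateAnswerSelect contains: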

if (word.getText().length() < 2){
                LOG.debug("忽略长度小于2的候选答案:"+word);
                continue;
}

This ignores candidate answers shorter than 2 characters, so 苏格兰, split into single characters above, is dropped.
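
The effect described above can be reproduced in isolation. The sketch below applies the same length rule to a list of tokens; `ShortTokenFilterSketch` and `keepCandidates` are hypothetical names, and the real CommonCandidateAnswerSelect works on the project's own word objects rather than plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "skip candidates shorter than 2 characters" rule.
class ShortTokenFilterSketch {
    // Keeps only tokens with at least two characters, as the quoted check does.
    static List<String> keepCandidates(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() < 2) {
                continue; // single-character tokens are never candidates
            }
            kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Character-by-character segmentation leaves nothing to choose from.
        System.out.println(keepCandidates(Arrays.asList("苏", "格", "兰", "属", "于", "哪", "个", "洲")));
        // A word-level segmentation keeps the multi-character words.
        System.out.println(keepCandidates(Arrays.asList("苏格兰", "属于", "哪个", "洲")));
    }
}
```

With single-character tokens the candidate list is empty, so the answer can only be found when the segmenter returns 苏格兰 as one word.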

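[Suspected bug in Tomcat itself] "回答问题失败" (failed to answer the question) is reported when the Tomcat path contains a space

1. Problem
When the Tomcat path contains a space (for example D:\Program Files\apache-tomcat-8.0.38), the console reports 加载资源失败 (failed to load resource),
and the QuestionAnsweringSystem web front end shows "回答问题失败" (failed to answer the question).

2. Tomcat console error output (partial)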

开始加载资源
classpath:web/dic/word_v_1_3/part_of_speech_des.txt
类路径资源:web/dic/word_v_1_3/part_of_speech_des.txt
类路径资源URL:file:/D:/Te%20st/apache-tomcat-8.0.38/webapps/deep-qa-web-2017-06-01/WEB-IN
F/classes/web/dic/word_v_1_3/part_of_speech_des.txt
加载资源:D:\Te%20st\apache-tomcat-8.0.38\webapps\deep-qa-web-2017-06-01\WEB-INF\classes\w
eb\dic\word_v_1_3\part_of_speech_des.txt
加载资源失败:D:\Te%20st\apache-tomcat-8.0.38\webapps\deep-qa-web-2017-06-01\WEB-INF\class
es\web\dic\word_v_1_3\part_of_speech_des.txt


java.io.FileNotFoundException: D:\Te%20st\apache-tomcat-8.0.38\webapps\deep-qa-web-2017-06
-01\WEB-INF\classes\web\dic\word_v_1_3\part_of_speech_des.txt (系统找不到指定的路径。)
        at java.io.FileInputStream.open(Native Method) ~[na:1.8.0_31]
        at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[na:1.8.0_31]
        at java.io.FileInputStream.<init>(FileInputStream.java:93) ~[na:1.8.0_31]
        at org.apdplat.word.util.AutoDetector.load(AutoDetector.java:354) [word-1.3.jar:na
]
        at org.apdplat.word.util.AutoDetector.loadClasspathResource(AutoDetector.java:133)
 [word-1.3.jar:na]

3. Simplest workaround
Put Tomcat in a path that contains no spaces.
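
The log shows %20 left in the file-system path. That is what java.net.URL.getPath() returns for a resource URL, because it does not decode percent-escapes, whereas java.net.URI.getPath() does. Whether this is the exact code path taken inside the word library is an assumption; the sketch below only demonstrates the difference, using made-up names (`ResourcePathSketch`, `rawPath`, `decodedPath`).

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical sketch: raw versus decoded path of a file: URL containing a space.
class ResourcePathSketch {
    // URL.getPath() keeps percent-escapes such as %20.
    static String rawPath(String url) throws MalformedURLException {
        return new URL(url).getPath();
    }

    // URI.getPath() decodes percent-escapes, giving a usable file-system path.
    static String decodedPath(String url) throws URISyntaxException {
        return new URI(url).getPath();
    }

    public static void main(String[] args) throws Exception {
        String url = "file:/D:/Te%20st/apache-tomcat-8.0.38/webapps/part_of_speech_des.txt";
        System.out.println(rawPath(url));
        System.out.println(decodedPath(url));
    }
}
```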

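NullPointerException and garbled text in the database

After running org/apdplat/qa/BaiduDemo.java, a NullPointerException is thrown, and everything in the database appears garbled. Console output: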

从类路径的 /org/apdplat/qa/files/person_name_questions.txt 中加载Question:谁发现的万有引力定理:牛顿
Question:谁发现的万有引力定理
ExpectAnswer:牛顿
从数据库中查询到Question:??????????
使用【模式匹配】的方法判断问题类型: ??????????
问题:??????????
词和词性序列:
词性序列:
对问题进行分词:??????????
分词结果为:
句法树:
X
Exception in thread "main" java.lang.NullPointerException
at edu.stanford.nlp.trees.international.pennchinese.ChineseGrammaticalStructure.collapsePrepAndPoss(ChineseGrammaticalStructure.java:94)
at edu.stanford.nlp.trees.international.pennchinese.ChineseGrammaticalStructure.collapseDependencies(ChineseGrammaticalStructure.java:73)
at edu.stanford.nlp.trees.GrammaticalStructure.typedDependenciesCCprocessed(GrammaticalStructure.java:737)
at org.apdplat.qa.questiontypeanalysis.patternbased.MainPartExtracter.getMainPart(MainPartExtracter.java:150)
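
The question comes back from the database as ten question marks, one per character of the original ten-character question. That is the typical result of passing Chinese text through a charset that cannot represent it, so the MySQL table or connection may not be using UTF-8 as the setup instructions require; this is an assumption, not something the log proves. The sketch below shows the effect with the standard library only; `MojibakeSketch` and `roundTrip` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: what happens to Chinese text in a charset that cannot hold it.
class MojibakeSketch {
    // Encodes the text with the given charset and decodes it again with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String question = "谁发现的万有引力定理";
        // ISO-8859-1 has no Chinese characters, so each one is replaced while encoding.
        System.out.println(roundTrip(question, StandardCharsets.ISO_8859_1));
        // UTF-8 preserves the text.
        System.out.println(roundTrip(question, StandardCharsets.UTF_8));
    }
}
```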

About the cosine similarity algorithm and word segmentation

Segmentation:

** -> **
大** -> 大 傻 逼

Is there a problem with the cosine similarity algorithm? Computing it another way gives results that differ from yours. My code:

package DOC.Similarity;

import java.io.UnsupportedEncodingException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * 2017/7/20
 * Created by dylan.
 * Home: http://www.devdylan.cn
 */
public class CosineSimilarAlgorithm {
    public static double getSimilarity(String doc1, String doc2) {
        if (doc1 != null && doc1.trim().length() > 0 && doc2 != null
                && doc2.trim().length() > 0) {

            if (Math.abs(doc2.length() - doc1.length()) > 10) {
                return 0;
            }
            Map<Integer, int[]> AlgorithmMap = new HashMap<Integer, int[]>();

            // Count the Chinese characters of both strings in AlgorithmMap:
            // index 0 holds the count in doc1, index 1 the count in doc2
            for (int i = 0; i < doc1.length(); i++) {
                char d1 = doc1.charAt(i);
                if(isHanZi(d1)){
                    int charIndex = getGB2312Id(d1);
                    if(charIndex != -1){
                        int[] fq = AlgorithmMap.get(charIndex);
                        if(fq != null && fq.length == 2){
                            fq[0]++;
                        }else {
                            fq = new int[2];
                            fq[0] = 1;
                            fq[1] = 0;
                            AlgorithmMap.put(charIndex, fq);
                        }
                    }
                }
            }

            for (int i = 0; i < doc2.length(); i++) {
                char d2 = doc2.charAt(i);
                if(isHanZi(d2)){
                    int charIndex = getGB2312Id(d2);
                    if(charIndex != -1){
                        int[] fq = AlgorithmMap.get(charIndex);
                        if(fq != null && fq.length == 2){
                            fq[1]++;
                        }else {
                            fq = new int[2];
                            fq[0] = 0;
                            fq[1] = 1;
                            AlgorithmMap.put(charIndex, fq);
                        }
                    }
                }
            }

            Iterator<Integer> iterator = AlgorithmMap.keySet().iterator();
            double sqDoc1 = 0;
            double sqDoc2 = 0;
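            double dotProduct = 0;
            while(iterator.hasNext()){
                int[] c = AlgorithmMap.get(iterator.next());
                dotProduct += c[0]*c[1];
                sqDoc1 += c[0]*c[0];
                sqDoc2 += c[1]*c[1];
            }

            // Avoid 0/0 (NaN) when either string contains no counted Chinese characters
            if (sqDoc1 == 0 || sqDoc2 == 0) {
                return 0;
            }
            return dotProduct / Math.sqrt(sqDoc1*sqDoc2);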
        } else {
            return 0;
        }
    }

    private static boolean isHanZi(char ch) {
        // Check whether ch is a Chinese character (U+4E00 to U+9FA5)
        return (ch >= 0x4E00 && ch <= 0x9FA5);

    }

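    /**
     * Returns the position of a Unicode character in GB2312.
     *
     * @param ch
     *            a Chinese character covered by GB2312 (ASCII characters encode to one byte and yield -1)
     * @return the position of ch in GB2312, or -1 if the character is not recognized
     */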
    private static short getGB2312Id(char ch) {
        try {
            byte[] buffer = Character.toString(ch).getBytes("GB2312");
            if (buffer.length != 2) {
                // buffer is normally two bytes; otherwise ch is not in GB2312 (it was encoded as '?'), so it is treated as unrecognized
                return -1;
            }
            int b0 = (int) (buffer[0] & 0x0FF) - 161; // codes start at 0xA1, so subtract 0xA1 = 161
            int b1 = (int) (buffer[1] & 0x0FF) - 161; // the first and last positions hold no characters, so each row holds 16*6-2 = 94 characters
            return (short) (b0 * 94 + b1);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        return -1;
    }
}

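The results from your cosine similarity algorithm do not look quite right.

On Windows, the backend reports java.nio.file.InvalidPathException: Illegal char during a browser request

On Windows, java.nio.file.InvalidPathException: Illegal char is reported while a request is being handled.

Error log: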

20:36:11,926 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to INFO
20:36:11,926 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [logfile] to Logger[ROOT]
20:36:11,929 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [stdout] to Logger[ROOT]

开始构造问答系统
模型:models/chineseFactored.ser.gz
Loading parser from serialized file models/chineseFactored.ser.gz ... done [17.6 sec].
模式文件目录:/E:/Java/IDE/sts-bundle/vfabric-tc-server-developer-2.9.6.RELEASE/base-instance/wtpwebapps/deep-qa-web/WEB-INF/classes/questionTypePatterns/
模式匹配策略启用文件:QuestionTypePatternsLevel1_true.txt
模式匹配策略启用文件:QuestionTypePatternsLevel2_true.txt
模式匹配策略启用文件:QuestionTypePatternsLevel3_true.txt
模式文件:QuestionTypePatternsLevel1_true.txt
是否允许多匹配:true
模式文件:QuestionTypePatternsLevel2_true.txt
是否允许多匹配:true
模式文件:QuestionTypePatternsLevel3_true.txt
是否允许多匹配:true
问答系统构造完成
Question:谁是资深Nutch搜索引擎专家? 搜索到Evidence 8 条
将Question:谁是资深Nutch搜索引擎专家? 加入MySQL数据库
使用【模式匹配】的方法判断问题类型: 谁是资深Nutch搜索引擎专家?
问题:谁是资深Nutch搜索引擎专家?
四月 28, 2015 8:36:34 下午 org.apache.catalina.core.StandardWrapperValve invoke
严重: Servlet.service() for servlet [jsp] in context with path [/deep-qa-web] threw exception [javax.servlet.ServletException: java.lang.ExceptionInInitializerError] with root cause
java.nio.file.InvalidPathException: Illegal char <:> at index 2: /E:/Java/IDE/sts-bundle/vfabric-tc-server-developer-2.9.6.RELEASE/base-instance/wtpwebapps/deep-qa-web/WEB-INF/classes/web/dic/word_v_1_3/word.local.conf
at sun.nio.fs.WindowsPathParser.normalize(WindowsPathParser.java:182)
at sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:153)
at sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:77)
at sun.nio.fs.WindowsPath.parse(WindowsPath.java:94)
at sun.nio.fs.WindowsFileSystem.getPath(WindowsFileSystem.java:255)
at java.nio.file.Paths.get(Paths.java:84)
at org.apdplat.qa.parser.WordParser.(WordParser.java:48)
at org.apdplat.qa.questiontypeanalysis.patternbased.PatternBasedMultiLevelQuestionClassifier.extractQuestionPatternFromQuestion(PatternBasedMultiLevelQuestionClassifier.java:301)
at org.apdplat.qa.questiontypeanalysis.patternbased.PatternBasedMultiLevelQuestionClassifier.classify(PatternBasedMultiLevelQuestionClassifier.java:126)
at org.apdplat.qa.system.QuestionAnsweringSystemImpl.answerQuestions(QuestionAnsweringSystemImpl.java:182)
at org.apdplat.qa.system.QuestionAnsweringSystemImpl.answerQuestion(QuestionAnsweringSystemImpl.java:174)
at org.apdplat.qa.system.QuestionAnsweringSystemImpl.answerQuestion(QuestionAnsweringSystemImpl.java:158)
at org.apache.jsp.index_jsp._jspService(index_jsp.java:84)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:432)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:390)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:334)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
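
The exception message points at the ':' at index 2 of a path that starts with "/E:/". A path string taken from a class-path URL has this leading slash, which the Windows path parser rejects. One possible workaround, shown as a sketch under that assumption, is to drop the slash when it is followed by a drive letter; `WindowsPathSketch` and `stripLeadingSlashBeforeDrive` are hypothetical names and are not part of the project.

```java
// Hypothetical sketch: turn "/E:/dir/file" into "E:/dir/file" before calling Paths.get.
class WindowsPathSketch {
    // Removes a leading '/' only when it is directly followed by "<letter>:".
    static String stripLeadingSlashBeforeDrive(String path) {
        boolean hasDrivePrefix = path.length() >= 3
                && path.charAt(0) == '/'
                && Character.isLetter(path.charAt(1))
                && path.charAt(2) == ':';
        return hasDrivePrefix ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingSlashBeforeDrive("/E:/Java/IDE/word.local.conf"));
        System.out.println(stripLeadingSlashBeforeDrive("/usr/local/word.local.conf"));
    }
}
```

Unix-style paths are returned unchanged, so the same helper can be called on every platform.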

Some small questions about sentence dependencies and the parse tree. Thanks!

Parse tree:

(ROOT [81.764]
  (IP [81.634]
    (NP [66.213]
      (DNP [47.088]
        (NP [46.697]
          (NP [14.063] (NR **电视台))
          (ADJP [5.707] (JJ 著名))
          (NP [22.350] (NN 节目) (NN 实话实说)))
        (DEG 的))
      (QP [3.267] (OD 第一任))
      (NP [9.925] (NN 主持人)))
    (VP [11.021] (VC 是)
      (NP [8.042] (PN 谁)))))

Sentence dependencies:

    nn(实话实说-4, **电视台-1)
    amod(实话实说-4, 著名-2)
    nn(实话实说-4, 节目-3)
    assmod(主持人-7, 实话实说-4)
    assm(实话实说-4, 的-5)
    nummod(主持人-7, 第一任-6)
    top(是-8, 主持人-7)
    root(ROOT-0, 是-8)
    attr(是-8, 谁-9)

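I would appreciate some reference material. I do not understand what the letters and numbers mean, for example the numbers inside [ ]. Thanks!

Adding a new question-answer pattern (template)

What is the relationship between the three database tables evidence, question, and rewind? If I need to add a new question-answer pattern (template), how should I use these three tables? Thanks again.

Modifying and compiling the code

If I want to modify the code myself, do I need to remove the deep-qa and deep-qa-web entries from pom.xml? I am new to Java, so please bear with me.

BigramEvidenceScore scoring rule

In Tools.countsForBigram, a match is counted only when index > 0, so a match at the very beginning is not counted.
For example, if the question is 姚明的女儿是谁 (Who is Yao Ming's daughter?) and the evidence title is 姚明的女儿叫姚沁蕾 (Yao Ming's daughter is named Yao Qinlei), only "的女儿" is matched and "姚明的" does not contribute to the score. What is the reasoning behind this?

Question type classification

After deploying your system, no matter what I ask, question type classification finds no match and falls back to the default "PERSON_NAME". What causes this?

When running http://localhost:8088/deep-qa-web-1.2/index.jsp the page displays correctly, but asking a question fails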

type Exception report

message javax.servlet.ServletException: java.lang.OutOfMemoryError: Java heap space

description The server encountered an internal error that prevented it from fulfilling this request.

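The JVM's default heap allocation is probably too small, so I tried editing catalina.bat:
set CATALINA_OPTS=-Xms512M -Xmx512M
set JAVA_OPTS=-Xms512M -Xmx512M
I changed it many times, but the heap still overflows. I do not know what the problem is.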
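
Before raising the values further, it may help to confirm that the options actually reach the JVM that runs Tomcat. A small check, for example placed temporarily in a JSP or run with the same options from the command line, can print the heap limit the JVM is using. `HeapCheckSketch` and `maxHeapMiB` are hypothetical names for this sketch.

```java
// Hypothetical sketch: report the maximum heap the running JVM will use.
class HeapCheckSketch {
    // Runtime.maxMemory() reflects the effective -Xmx value, in bytes.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMiB() + " MiB");
    }
}
```

If the printed value does not change after editing catalina.bat, the settings are not being picked up by the process that serves the request.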
