
chatbot's Introduction

Mianbot

demo

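Mianbot is a chatbot built on template-based and retrieval-based models. It currently generates replies in two ways, and the project is still under development :)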

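  • The first (left image) classifies short phrases with word vectors, then performs feature extraction and reply memory inside the module the phrase was classified into, which enables multi-turn dialogue. For the matching method, see Semantic Graph (still under construction ΣΣΣ (」○ ω○ )/).
  • The second (right image), apart from answering weather queries, uses PTT Gossiping as its knowledge base: it compares text similarity to retrieve the article title most similar to the user's input, then picks the most reliable reply from that article's comments. For the code and experiments, see PTT-Chat_Generator.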
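The retrieval step in the second approach can be illustrated with a small sketch. It is not the project's implementation: it swaps in Jaccard similarity over character bigrams as a simple stand-in for the text-similarity measure, and the sample titles and the helper names (`bigrams`, `jaccard`, `best_title`) are invented for illustration.

```python
def bigrams(text):
    """Character bigrams, a crude stand-in for word segmentation."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_title(query, titles):
    """Return (score, title) for the title most similar to the query."""
    q = bigrams(query)
    return max((jaccard(q, bigrams(t)), t) for t in titles)


# Invented sample titles, standing in for the article-title knowledge base.
TITLES = ["明天會下雨嗎", "推薦好吃的拉麵", "今天股票大跌"]

score, title = best_title("明天會不會下雨", TITLES)
print("%.4f %s" % (score, title))
```

Once the closest title is found, a reply would be chosen from that article's comments; how "most reliable" is scored is not described here, so the sketch stops at title retrieval.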

Matching examples

More samples are available in example/output.txt.

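Input: 明天早上叫我起床。 (Wake me up tomorrow morning.)

Similarity  Concept              Matched term
0.4521      鬧鐘 (alarm)         起床 (get up)
0.3904      天氣 (weather)       早上 (morning)
0.3067      住宿 (lodging)       起床 (get up)
0.1747      病症 (illness)       起床 (get up)
0.1580      購買 (purchase)      早上 (morning)
0.1270      股票 (stocks)        早上 (morning)
0.1096      觀光 (sightseeing)   早上 (morning)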

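Input: 明天上海會不會下雨? (Will it rain in Shanghai tomorrow?)

Similarity  Concept              Matched term
0.5665      天氣 (weather)       下雨 (rain)
0.3918      鬧鐘 (alarm)         下雨 (rain)
0.1807      病症 (illness)       下雨 (rain)
0.1362      住宿 (lodging)       下雨 (rain)
0.0000      股票 (stocks)
0.0000      觀光 (sightseeing)
0.0000      購買 (purchase)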
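A table like the ones above could be produced roughly as follows. This is a minimal sketch under stated assumptions, not the project's matcher: it scores each concept by the highest cosine similarity between any input keyword and any of the concept's terms, and the 2-d vectors, `TOY_RULES`, and `match_rules` are all made up for illustration (a real run would use a trained word-vector model).

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_rules(keywords, rules, vectors):
    """Return (similarity, domain, matched keyword) tuples, best first."""
    results = []
    for domain, concepts in rules.items():
        best_sim, best_word = 0.0, ""
        for word in keywords:
            for concept in concepts:
                if word in vectors and concept in vectors:
                    sim = cosine(vectors[word], vectors[concept])
                    if sim > best_sim:
                        best_sim, best_word = sim, word
        results.append((best_sim, domain, best_word))
    return sorted(results, key=lambda r: r[0], reverse=True)


# Toy 2-d vectors, invented for illustration only.
TOY_VECTORS = {
    "起床": [1.0, 0.2],
    "鬧鐘": [0.9, 0.4],
    "下雨": [0.1, 1.0],
    "天氣": [0.2, 0.9],
}
TOY_RULES = {"鬧鐘": ["鬧鐘"], "天氣": ["天氣"], "股票": ["股票"]}

ranked = match_rules(["起床"], TOY_RULES, TOY_VECTORS)
for sim, domain, word in ranked:
    print("%.4f\t%s\t%s" % (sim, domain, word))
```

A concept with no term in the vocabulary keeps a score of 0 and an empty matched term, which mirrors the 0.0000 rows in the second example.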

Requirements

import Chatbot.console as console
c = console.Console(model_path='your_model')
  • To use the QA module, first set it up as described in the QA test dataset section (問答測試用資料集), or disable the QA module by setting self.github_qa_unupdated to True in chatbot.py.

Usage

Chatbot

import Chatbot.chatbot as chatbot

chatter = chatbot.Chatbot(w2v_model_path='your_model')
chatter.waiting_loop()

Computing match scores

import Chatbot.console as console

c = console.Console(model_path='your_model')
speech = input('Input a sentence:')
res,path = c.rule_match(speech)
c.write_output(speech,res,path)

Rule format

Rules are written in JSON. Template rules are stored in \RuleMatcher\rule:

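    {
        "domain": "the abstract concept this rule represents",
        "response": [
            "after this rule is matched,",
            "these are the replies the bot can give;",
            "the bot picks one response at random"
        ],
        "concepts": [
            "possible ways of expressing this rule"
        ],
        "children": ["sub-rules of this rule", "e.g. 購買 (purchase) -> 購買飲料 (buy drinks), 購買衣服 (buy clothes), ..."]
    }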

Example

    {
        "domain": "購買",
        "response": [
            "正在將您導向購物模組"
        ],
        "concepts": [
            "購買","購物","訂購"
        ],
        "children": [
            "購買生活用品",
            "購買家電",
            "購買食物",
            "購買飲料",
            "購買鞋子",
            "購買衣服",
            "購買電腦產品"
        ]
    },
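To show how a rule in this format might be consumed, here is a small sketch. It assumes a rule file holds a JSON list of rule objects (suggested by the trailing comma in the example, but not confirmed), and `load_rules` and `pick_response` are hypothetical helpers rather than functions from the project.

```python
import json
import random

# A shortened copy of the example rule, wrapped in a list (an assumption).
RULE_JSON = """
[
    {
        "domain": "購買",
        "response": ["正在將您導向購物模組"],
        "concepts": ["購買", "購物", "訂購"],
        "children": ["購買飲料", "購買衣服"]
    }
]
"""


def load_rules(text):
    """Index rules by their domain name."""
    return {rule["domain"]: rule for rule in json.loads(text)}


def pick_response(rules, domain):
    """Return a random response for the domain, or None if it has none."""
    responses = rules.get(domain, {}).get("response", [])
    return random.choice(responses) if responses else None


rules = load_rules(RULE_JSON)
reply = pick_response(rules, "購買")
print(reply)
```

Returning None for a domain without responses lets the caller decide on a fallback reply instead of raising an error.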

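QA test dataset

Click here to download part of the test dataset, which contains about 250,000 non-news Q&A items from the PTT C_Chat and Gossiping boards. After extracting the archive, place the files under the QuestionAnswering/data/ folder, and place the folder extracted from reply.rar under QuestionAnswering/data/processed: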

QuestionAnswering
└── data
   ├── SegTitles.txt
   ├── processed
   │   └── reply
   │       ├── 0.json
   │       ├── .
   │       ├── .
   │       ├── .
   │       └── xxx.json
   └── Titles.txt

Once this is set up, set self.github_qa_unupdated to False in chatbot.py to enable the QA module for testing.
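Before enabling the QA module, it may help to check that the layout above is in place. The following is only a sketch: `missing_qa_files` is a hypothetical helper, and which directory counts as the root (the folder that contains QuestionAnswering) is an assumption you should adapt to your checkout.

```python
from pathlib import Path


def missing_qa_files(root):
    """Return the expected QA paths that are absent under `root`."""
    data = Path(root) / "QuestionAnswering" / "data"
    expected = [
        data / "Titles.txt",
        data / "SegTitles.txt",
        data / "processed" / "reply",
    ]
    return [str(p) for p in expected if not p.exists()]


# "." is a placeholder for the folder that contains QuestionAnswering.
missing = missing_qa_files(".")
if missing:
    print("QA data not configured; missing:", missing)
else:
    print("QA data layout looks complete.")
```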

Development log

Special thanks

  • 網路探勘暨跨語知識系統實驗室
  • 智慧型知識管理實驗室
  • Legoly
  • Every friend who has helped me and exchanged ideas with me

chatbot's People

Contributors

chiachun1127, david30907d, nickbanana, zake7749


chatbot's Issues

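Dialogue

Dear author, I added a model I trained with word2vec to the project and it runs successfully: every sentence I type gets a reply. The problem is that all the replies are limited to a handful of sentences such as "你好" (hello), "我不太明白你的意思" (I don't quite understand what you mean), "原来如此" (I see), and "是吗" (is that so?), repeated over and over with no answer targeted at the input. How can this be improved? Here is my chat session:
你好,我是 MianBot (Hello, I am MianBot)
你好 (Hello)
Handler of '問候' have not implemented
你好 (Hello)
很高兴认识你 (Nice to meet you)
我不太明白你的意思 (I don't quite understand what you mean)
你现在哪 (Where are you now?)
是嗎? (Is that so?)
今天的天气怎么样 (How is the weather today?)
是嗎? (Is that so?)
明天去哪玩 (Where shall we go tomorrow?)
我不太明白你的意思 (I don't quite understand what you mean)
早上好 (Good morning)
是嗎? (Is that so?)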

Runtime error

FileNotFoundError: [Errno 2] No such file or directory: 'model/ch-corpus-3sg.bin'

NameError: name 'exit' is not defined

Running demo_chatbot.py fails: the exit() call on line 216 of rulebase.py, under Chatbot-master/Chatbot/RuleMatcher, raises NameError: name 'exit' is not defined.
Running demo.py gives the same error, because it calls the same rulebase.py.
My environment is Python 3.6.3. What is going wrong?
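One possible cause, offered as a guess rather than a diagnosis: the bare exit() name is added to the builtins by Python's site module, so it is not guaranteed to exist in every environment, whereas sys.exit() is always importable. A sketch of the safer pattern, with `load_or_quit` as a hypothetical helper that is not part of the project:

```python
import sys


def load_or_quit(loader, path):
    """Call loader(path); on a missing file, report it and stop the program."""
    try:
        return loader(path)
    except FileNotFoundError as err:
        print("[Console] model file not found:", err)
        sys.exit(1)  # always available, unlike the site-provided exit()


# Stand-in loader for illustration; a real one would load the word vectors.
model = load_or_quit(lambda p: "loaded " + p, "your_model")
print(model)
```

Replacing the exit() calls in the affected file with sys.exit() (after adding import sys) would follow the same idea.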

[Errno 2] No such file or directory

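I trained a model by following your tutorial "使用 gensim 訓練中文詞向量" (Training Chinese word vectors with gensim) and put the model in that folder.
I also changed the console call as follows:
c = console.Console(model_path='word2vec-tutorial-master/word2vec.model')
But I get [Errno 2] No such file or directory: 'word2vec-tutorial-master/word2vec.model'
Is there a step I have not completed?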
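A likely explanation, judging from the Console code quoted in a later issue on this page: the constructor calls os.chdir() to the package's own folder before loading the model, so a relative model_path is resolved against that folder rather than the directory the script was started from. Passing an absolute path sidesteps this. The helper below is a hypothetical sketch, not part of the project.

```python
import os


def resolve_model_path(model_path, base_dir=None):
    """Return an absolute path, resolved against base_dir (default: current dir)."""
    base_dir = base_dir or os.getcwd()
    # os.path.join leaves model_path untouched if it is already absolute.
    return os.path.normpath(os.path.join(base_dir, model_path))


path = resolve_model_path("word2vec-tutorial-master/word2vec.model")
print(path, "exists" if os.path.exists(path) else "does not exist")
```

The resolved path can then be passed as model_path, and the printed message shows up front whether the file is really where it is expected.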

model

No such file or directory: 'model/ch-corpus-3sg.bin'. How can I obtain this model?

bugs

File "test.py", line 9, in main
model = models.Word2Vec.load_word2vec_format('ch-corpus-3sg.bin',binary=True)

gensim/models/word2vec.py", line 1608, in load_word2vec_format
raise DeprecationWarning("Deprecated. Use gensim.models.KeyedVectors.load_word2vec_format instead.")
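The warning itself names the replacement: gensim.models.KeyedVectors.load_word2vec_format. A minimal sketch of the updated call is shown below. `load_vectors` and `is_binary_model` are hypothetical helpers, the extension check is only a naming convention, and the actual load is left commented out because it needs gensim and the model file.

```python
def load_vectors(path, binary=True):
    """Load word2vec-format vectors with the non-deprecated gensim entry point."""
    # Imported lazily so this sketch can be read and run without gensim installed.
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=binary)


def is_binary_model(path):
    """Guess the format from the extension (a convention, not a guarantee)."""
    return path.lower().endswith(".bin")


MODEL_PATH = "ch-corpus-3sg.bin"  # file name taken from the traceback above
print("binary format expected:", is_binary_model(MODEL_PATH))
# vectors = load_vectors(MODEL_PATH, binary=is_binary_model(MODEL_PATH))
```

Note that a model written with gensim's own save() method is a different format and is read back with the matching load() method, not with load_word2vec_format.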

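[ERROR] Getting >> [Gensim] 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

How can I resolve this error? Any help is appreciated; I feel I am very close to getting it working.
The code of my console file is as follows: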

==========================================================
import random
import os

import jieba
import jieba.analyse

import RuleMatcher.rulebase as rulebase

class Console(object):

    """
    Build some nlp function as an package.
    """

    def __init__(self,model_path="model/ch-corpus-3sg.bin",
                 rule_path="RuleMatcher/rule/",
                 stopword="jieba_dict/stopword.txt",
                 jieba_dic="jieba_dict/dict.txt.big",
                 jieba_user_dic="jieba_dict/userdict.txt"):

        print("[Console] Building a console...")

        cur_dir = os.getcwd()
        curPath = os.path.dirname(__file__)
        os.chdir(curPath)

        # jieba custom setting.
        self.init_jieba(jieba_dic, jieba_user_dic)
        self.stopword = self.load_stopword(stopword)

        # build the rulebase.
        self.rb = rulebase.RuleBase()

        print("[Console] Loading the word embedding model...")

        try:
            self.rb.load_model(model_path)
            # models.Word2Vec.load('word2vec.model')

        except FileNotFoundError as e:
            print("[Console] 請確定詞向量模型有正確配置")
            print(e)
            exit()
        except Exception as e:
            print("[Gensim]")
            print(e)
            exit()

        print("[Console] Loading pre-defined rules.")
        self.rb.load_rules_from_dic(rule_path)

        print("[Console] Initialized successfully :>")

        os.chdir(cur_dir)


    def listen(self):
        #into interactive console
        while True:
            self.show_information()
            choice = input('Your choice is: ')
            choice = choice.lower()
            if choice == 'e':
                res = self.jieba_tf_idf()
                for tag, weight in res:
                    print('%s %s' % (tag, weight))
            elif choice == 'g':
                res = self.jieba_textrank()
                for tag, weight in res:
                    print('%s %s' % (tag, weight))
            elif choice == 'p':
                print(self.rb)
            elif choice == 'r':
                self.rb.load_rules('RuleMatcher/rule/',reload=True)
            elif choice == 'd':
                self.test_speech()
            elif choice == 'm':
                speech = input('Input a sentence:')
                res,path = self.rule_match(speech)
                self.write_output(speech,res,path)
            elif choice == 'b':
                exit()
            elif choice == 's':
                rule_id = input('Input a rule id:')
                res = self.get_response(rule_id)
                if res is not None:
                    print(res)
            elif choice == 'o':
                self.rb.output_as_json()
            else:
                print('[Opps!] No such choice: ' + choice + '.')

    def jieba_textrank(self):

        """
        Use textrank in jieba to extract keywords in a sentence.
        """

        speech = input('Input a sentence: ')
        return jieba.analyse.textrank(speech, withWeight=True, topK=20)

    def jieba_tf_idf(self):

        """
        Use tf/idf in jieba to extract keywords in a sentence
        """

        speech = input('Input a sentence: ')
        return jieba.analyse.extract_tags(speech, topK=20, withWeight=True)

    def show_information(self):
        print('Here is chatbot backend, enter your choice.')
        print('- D)emo the data in speech.txt.')
        print('- E)xtract the name entity.')
        print('- G)ive me the TextRank.')
        print('- M)atch a sentence with rules.')
        print('- P)rint all rules in the rulebase.')
        print('- R)eload the base rule.')
        print('- O)utput all rules to rule.json.')
        print('- S)how me a random response of a rule')
        print('- B)ye.')

    def init_jieba(self, seg_dic, userdic):

        """
        jieba custom setting.
        """

        jieba.load_userdict(userdic)
        jieba.set_dictionary(seg_dic)
        with open(userdic,'r',encoding='utf-8') as input:
            for word in input:
                word = word.strip('\n')
                jieba.suggest_freq(word, True)

    def load_stopword(self, path):

        stopword = set()
        with open(path,'r',encoding='utf-8') as stopword_list:
            for sw in stopword_list:
                sw = sw.strip('\n')
                stopword.add(sw)
        return stopword

    def word_segment(self, sentence):

        words = jieba.cut(sentence, HMM=False)
        #clean up the stopword
        keyword = []
        for word in words:
            if word not in self.stopword:
                keyword.append(word)
        return keyword

    def rule_match(self, sentence, best_only=False, search_from=None, segmented=False):

        """
        Match the sentence with rules.

        Args:
            - sentence  : the string you want to match with rules.
            - best_only : if True, only return the best matched rule.
            - root      : a domain name, then the rule match will start
                          at searching from that domain, not from forest roots.
            - segmented : the sentence is segmented or not.
        Return:
            - a list of candiate rule
            - the travel path of classification tree.
        """
        keyword = []
        if segmented:
            keyword = sentence
        else:
            keyword = self.word_segment(sentence)

        if search_from is None: # use for classification (rule matching).
            result_list,path = self.rb.match(keyword,threshold=0.1)
        else:  # use for reasoning.
            result_list,path = self.rb.match(keyword,threshold=0.1,root=search_from)

        if best_only:
            return [result_list[0], path]
        else:
            return [result_list, path]


    def get_response(self, rule_id):

        """
        Get a random response from the given rule's response'list.
        """
        rule = self.rb.rules[rule_id]
        res_num = rule.has_response()
        if res_num == 0:
            return None
        else:
            return rule.response[random.randrange(0,res_num)]

    def test_speech(self):

        """
        Try matching all sentence in 'example/output.txt'
        """

        output = open('example/output.txt','w',encoding='utf-8')
        # load sample data
        with open('example/speech.txt','r',encoding='utf-8') as input:
            for speech in input:
                speech = speech.strip('\n')
                result,path = self.rule_match(speech)
                self.write_output(speech, result, path, output)

    def write_output(self, org_speech, result, path, output = None):

        """
        Show the matching result.

            Args:
                - org_speech: the original input string.
                - result: a sorted array, refer match() in rulebase.py.
                - path: the travel path in classification tree.
                - output: expect as a file writer, if none, print
                  the result to stdio.
        """
        result_information = ''
        result_information += "Case# " + str(org_speech) + '\n'
        result_information += "------------------\n"
        for similarity,rule,matchee in result:
            str_sim = '%.4f' % similarity
            result_information += str_sim+'\t'+path+rule+'\t\t'+matchee+'\n'
        result_information += "------------------\n"

        if output is None:
            print(result_information)
        else:
            output.write(result_information)

if __name__ == '__main__':
    main()

Rulebase generation

Hello, I would like to ask how the rulebase in this project was generated.

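Question about running demo.py and test.py

Hello, and thank you for making this resource available.
As the title says, I ran into a problem when running demo.py (error attached below).
From questions other users have asked, I learned that the word2vec model has to be trained first, so I followed your other article and trained one successfully.
Testing it with demo.py (the one from the word2vec tutorial, not this chatbot's demo.py) also ran without errors.
However, after plugging that model into the chatbot and updating the model path in demo.py, demo_chatbot.py, and every other file that needs it, I still get an error message.
I hope you can help me with this. Many thanks.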

https://images.plurk.com/3eU8qiGvmUsxseftD1eQ.jpg

Cannot run

Hello, I need to build a chatbot for a class assignment, but I cannot run this project.

Error when running test.py

Hello @zake7749

First, I ran pip install word2vec, which succeeded:
root@ubuntu:~# pip install word2vec
Collecting word2vec
Using cached word2vec-0.9.2.tar.gz
Requirement already satisfied: numpy in /usr/local/lib/python3.5/dist-packages ( from word2vec)
Requirement already satisfied: cython in /usr/local/lib/python3.5/dist-packages (from word2vec)
Building wheels for collected packages: word2vec
Running setup.py bdist_wheel for word2vec ... done
Stored in directory: /root/.cache/pip/wheels/81/d0/9d/93f56c6111d24248341bbe35 5fd7d5ef6243f89260af5e91b3
Successfully built word2vec
Installing collected packages: word2vec
Successfully installed word2vec-0.9.2

I then changed the load call to model = models.Word2Vec.load('/usr/local/bin/word2vec')

Then I ran test.py:

root@ubuntu:/home/liaotian/Chatbot/Chatbot/model# python test.py
2018-02-06 23:20:31,550 : INFO : loading Word2Vec object from /usr/local/bin/word2vec
Traceback (most recent call last):
File "test.py", line 41, in <module>
main()
File "test.py", line 9, in main
model = models.Word2Vec.load('/usr/local/bin/word2vec')
File "/usr/local/lib/python3.5/dist-packages/gensim/models/word2vec.py", line 975, in load
return super(Word2Vec, cls).load(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/gensim/models/base_any2vec.py", line 629, in load
model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/gensim/models/base_any2vec.py", line 278, in load
return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/gensim/utils.py", line 395, in load
obj = unpickle(fname)
File "/usr/local/lib/python3.5/dist-packages/gensim/utils.py", line 1302, in unpickle
return _pickle.load(f, encoding='latin1')
_pickle.UnpicklingError: invalid load key, '.

This is the error I get. How can I fix it?

Thanks

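The test dataset can no longer be downloaded

I found a problem: when I try to download the "test dataset" required by the QA module, the page shows "Oops! There was a problem with the network Download".

Also, how were the xx.json files inside reply.rar generated?
Thanks!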

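Dialogue intent extraction

Hello, how is dialogue intent extraction implemented here? Is there any detailed material explaining it? I am quite interested in this part and would like to understand it in depth.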

Web API

Dear author, how can the chat be presented on the Web? How is the Web API interaction implemented?

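Reply memory feature

Hello, this project has been a great help to me as a chatbot beginner. Thank you for sharing it.
I would like to understand how this chatbot achieves its reply memory feature.
demo
Taking the left image as an example, how does the chatbot record "高雄" (Kaohsiung), an option chosen earlier in the conversation?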

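Traditional Chinese conversion

@zake7749 Dear author, I noticed that nearly all the replies in the QA test dataset are in Traditional Chinese, so input in Simplified Chinese rarely matches. Is there a way to convert all the Traditional Chinese text in the QA test set to Simplified Chinese?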
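One way to approach this is to normalize the text with a character mapping before matching. The sketch below uses a deliberately tiny, hand-written table with Python's str.translate purely to illustrate the idea; a real conversion needs a complete mapping, which dedicated tools such as OpenCC provide. `T2S_TABLE` and `to_simplified` are hypothetical names.

```python
# Tiny illustrative table; a real converter needs a full mapping.
T2S_TABLE = str.maketrans({
    "嗎": "吗",
    "麼": "么",
    "這": "这",
    "樣": "样",
    "氣": "气",
    "歡": "欢",
})


def to_simplified(text):
    """Convert the characters covered by T2S_TABLE; leave the rest unchanged."""
    return text.translate(T2S_TABLE)


print(to_simplified("是嗎?"))
```

The same function could be applied to every title and reply in the dataset once, or to each user input in the opposite direction with a Simplified-to-Traditional table.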

TypeError

TypeError: cannot use a string pattern on a bytes-like object
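Dear author, how can this be resolved? demo_chatbot.py starts successfully, but when I enter the sentence "很高兴认识你" (Nice to meet you), I do not get a proper answer.
The following output appears instead:
Traceback (most recent call last):
File "E:/python_work/pycharm/Chatbot-master/demo_chatbot.py", line 4, in <module>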
chatter.waiting_loop()
File "E:\python_work\pycharm\Chatbot-master\Chatbot\chatbot.py", line 65, in waiting_loop
res = self.listen(speech)
File "E:\python_work\pycharm\Chatbot-master\Chatbot\chatbot.py", line 109, in listen
response,stauts,target,candiates = self.getResponseOnRootDomains(target)
File "E:\python_work\pycharm\Chatbot-master\Chatbot\chatbot.py", line 149, in getResponseOnRootDomains
status,response = handler.get_response(self.speech, self.speech_domain, target)
File "E:\python_work\pycharm\Chatbot-master\Chatbot\task_modules\other\stock.py", line 30, in get_response
stock_no = self.get_stock_no(nm)
File "E:\python_work\pycharm\Chatbot-master\Chatbot\task_modules\other\stock.py", line 69, in get_stock_no
m = re.search('([0-9]{4}[ ]{2}|[0-9]{5}[ LRU]{1}|[0-9]6),([^ ]*)( *),',col)
File "D:\python\python-3.5.4\lib\re.py", line 173, in search
return _compile(pattern, flags).search(string)
TypeError: cannot use a string pattern on a bytes-like object

Process finished with exit code 1
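The message means that re.search was given a str pattern but a bytes subject, which Python 3 rejects. Decoding the bytes before matching (or using a bytes pattern) avoids it. The sketch below assumes the data is UTF-8 and uses a simplified, hypothetical "code,name," line format and pattern; it is not the regular expression from stock.py.

```python
import re


def find_stock_no(line, name):
    """Return the 4-digit code preceding `name` in a 'code,name,' line, or None."""
    if isinstance(line, bytes):
        # Decode first: a str pattern cannot be applied to a bytes object.
        line = line.decode("utf-8", errors="replace")
    m = re.search(r"([0-9]{4}),(" + re.escape(name) + r"),", line)
    return m.group(1) if m else None


# Sample line invented for illustration, passed as bytes on purpose.
print(find_stock_no("2330,台積電,".encode("utf-8"), "台積電"))
```

If the downloaded data uses another encoding, the decode call would need that encoding instead.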
