A search engine built by Qingsong Lv, Shulin Cao and Yifan Wang. Our advisors are Qian Yin and Xin Zheng.
This repository is based on a national undergraduate scientific research and innovation project: a search engine for kids that supports Simplified Chinese.
Sorry, this project is not available yet, but it will be available soon (maybe within a year).
We were sophomores when we worked on the kidsearch project. Over time, we realized that there was more we could do to make it more valuable, so we decided to create this GitHub repository. This project aims to tidy up the kidsearch code we wrote from 2016 to 2017 and open-source part of it. We will try our best to make this project a unified system and provide as many APIs as we can.
We think the best explanation of an API is the comments in the code, but some tutorials will also be available soon. If you want something to read in the meantime, the related work may help.
The initial version of our project is based on Java (Lucene), Python (crawler), PHP (frontend) and sockets (communication). We think the most useful part is the socket layer, because we added multi-threading to it. Since Python is so popular at present, we are also building a second, Python version of the project that replaces Lucene with PyLucene and PHP with Django, which simplifies part of the socket communication. Both versions will be open-sourced.
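To make the multi-threaded socket idea more concrete, here is a minimal sketch in Python using the standard `socketserver` module. It is not the project's actual code: the line-based JSON protocol, the names `search_index`, `QueryHandler`, `ThreadedSearchServer` and `send_query`, and the stubbed search result are all assumptions made for illustration. The real index lookup would go where the stub is.

```python
import json
import socket
import socketserver
import threading


def search_index(query):
    """Hypothetical stand-in for the real Lucene/PyLucene lookup."""
    return [{"title": "result for " + query, "url": "http://example.com/"}]


class QueryHandler(socketserver.StreamRequestHandler):
    """Reads one UTF-8 query line and writes back one JSON line."""

    def handle(self):
        query = self.rfile.readline().decode("utf-8").strip()
        reply = json.dumps(search_index(query), ensure_ascii=False)
        self.wfile.write(reply.encode("utf-8") + b"\n")


class ThreadedSearchServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    """Handles every incoming connection in its own thread."""

    daemon_threads = True        # worker threads do not block program exit
    allow_reuse_address = True


def send_query(host, port, query):
    """Client side, for example called from the web frontend."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(query.encode("utf-8") + b"\n")
        with conn.makefile("rb") as reader:
            return json.loads(reader.readline().decode("utf-8"))


def demo():
    # Port 0 asks the OS for any free port; fine for a local demonstration.
    server = ThreadedSearchServer(("127.0.0.1", 0), QueryHandler)
    host, port = server.server_address
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return send_query(host, port, "儿童故事")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Because each connection gets its own thread, a slow query does not block other clients. A thread pool would be a reasonable alternative if the number of simultaneous connections needs to be bounded.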
Our project is mainly a search engine for Simplified Chinese. We use English in the documents and comments because we think this project may also be helpful for other languages.
This project aims to help with lightweight search engine tasks, so the main running environment is Windows.
Python 3.5 or later, Django (django-rest may also be needed), PyLucene, Apache, MySQL.
Some other Python packages are also needed: requests, ...
The goal is a convenient Python package for search engine tasks. Here is an ideal example:
import kidsearch as ks
webpages = ks.crawler(['http://www.61tom.com', 'http://www.61baobao.com/'], max_page=1000, max_depth=10)
indexes = ks.make_index(webpages)
results = indexes.search(key_words)
print(ks.show(results))
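As a rough illustration of what the `max_page` and `max_depth` arguments in the ideal example could mean, the sketch below implements a small breadth-first crawler with only the standard library. It is not the kidsearch API: the function name `crawl`, the injectable `fetch` parameter, and the `FAKE_SITE` dictionary are assumptions chosen so the example runs offline. In practice `fetch` could wrap an HTTP client such as `requests`.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_page=1000, max_depth=10):
    """Breadth-first crawl that returns a dict of {url: html}.

    `fetch` is any callable mapping a URL to HTML text, or None on failure.
    Seeds have depth 0; links found on a page at depth d have depth d + 1.
    """
    pages = {}
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    while queue and len(pages) < max_page:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        if depth >= max_depth:
            continue  # keep the page but do not follow its links
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href)).url
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages


# Offline stand-in for a real website (hypothetical data).
FAKE_SITE = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/c">C</a>',
    "http://example.com/b": "no links here",
    "http://example.com/c": "deep page",
}

pages = crawl(["http://example.com/"], FAKE_SITE.get, max_page=10, max_depth=1)
print(sorted(pages))
```

With `max_depth=1` the crawler fetches the seed and the pages it links to directly, but not `/c`, which is two links away. Passing `fetch` in as a parameter keeps the traversal logic testable without network access.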
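- An Introduction to Implementing a Vertical Search Engine Based on a Python Crawler and Lucene Retrieval (blog; original title: 一种基于Python爬虫和Lucene检索的垂直搜索引擎的实现方法介绍)
- Current State and Analysis of Search Engines for Children (paper; original title: 儿童搜索引擎的现状与分析)
- Design and Implementation of a Structured Web Page Information Extraction System for Chinese Search Engines (paper; original title: 面向中文搜索引擎的网页结构化信息获取系统的设计与实现)
- Design and Implementation of a Chinese Search Engine Based on Lucene and Socket Communication (paper; original title: 基于Lucene与Socket通信的中文搜索引擎的设计与实现)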
- An Algorithm to Extract and Judge the Main Text Based on the Law of Total Probability (paper)
- KidSE: A Search Engine Designed for Children which Supports Simplified Chinese (paper)