
binarycoder777 / book-search-engine-data


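The Python data processing side of a book search engine. Its main task is to crawl book information from specified sites, then clean and filter the data.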


book-search-engine-data's Introduction

# Book Search Engine: Data Processing Side

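Description: Reading now happens on PCs, phones, and e-readers, and the vast number of e-books floating around the web often leaves online readers at a loss. Finding an e-book quickly has become a real concern for readers. General-purpose search engines such as Google, Baidu, and Zhongsou mostly offer book search as just one of many side features, and incomplete results are common. A dedicated vertical search engine for books is therefore needed. It covers everything from precise e-book search to related-book recommendations, lets users freely browse the books they like, and builds up a book database that gives readers more choices.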

System Architecture

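The overall system consists of a Python data processing side and a Java server side. The Python data processing side crawls book information from specified sites, cleans and filters the data, and stores the resulting records in Elasticsearch. The Java server side uses Spring Boot integrated with Elasticsearch, builds the search features on the APIs that Elasticsearch provides, and exposes its interfaces to the web front end. Users access the search engine through the web front end. The architecture diagram is shown below:

image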
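The hand-off from the Python side to Elasticsearch can be sketched as follows. This is a minimal illustration, not the project's actual storage code: the index name `books`, the record fields, and the use of the book URL as the document ID are all assumptions. It only builds the newline-delimited JSON body that the Elasticsearch `_bulk` endpoint expects and does not send anything over the network.

```python
import json


def to_bulk_payload(books, index="books"):
    """Serialize cleaned book records as an Elasticsearch _bulk request body.

    `index` and the record fields are hypothetical names for illustration.
    """
    lines = []
    for book in books:
        # Action line: which index and document ID to write to.
        lines.append(json.dumps({"index": {"_index": index, "_id": book["url"]}}))
        # Source line: the document itself.
        lines.append(json.dumps(book, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical record, as it might look after cleaning.
books = [
    {"url": "https://example.com/book/1", "title": "Example Title", "author": "Unknown"},
]
payload = to_bulk_payload(books)
```

A body like this would be POSTed to `/_bulk` with the `Content-Type: application/x-ndjson` header. A real project may prefer the official Python client's bulk helpers over hand-built payloads.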

Project Modules

image

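  • Items.py: data transfer entities
  • Middlewares.py: middleware
  • Pipelines.py: data processing pipelines
  • Settings.py: project configuration
  • Spiders: the main crawler code
  • Book.py: crawling and data processing
  • Temp.log: project log output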
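How the item and pipeline modules listed above typically fit together in a Scrapy project can be sketched like this. The class names `BookItem` and `CleanBookPipeline` and the fields are hypothetical, since the repository's actual definitions are not shown here. So that the sketch runs without Scrapy installed, `DropItem` is a local stand-in for `scrapy.exceptions.DropItem`.

```python
import re
from dataclasses import dataclass


class DropItem(Exception):
    """Local stand-in for scrapy.exceptions.DropItem (assumption for this sketch)."""


@dataclass
class BookItem:
    """Hypothetical data transfer entity, as it might appear in the items module."""
    title: str
    author: str = ""
    description: str = ""


class CleanBookPipeline:
    """Hypothetical cleaning stage, as it might appear in the pipelines module."""

    def process_item(self, item, spider):
        # Collapse runs of whitespace left over from the HTML layout.
        item.title = re.sub(r"\s+", " ", item.title).strip()
        item.author = re.sub(r"\s+", " ", item.author).strip()
        # Strip leftover tags from the description, then normalize whitespace.
        text = re.sub(r"<[^>]+>", "", item.description)
        item.description = re.sub(r"\s+", " ", text).strip()
        # Filter out records that are unusable for search.
        if not item.title:
            raise DropItem("missing title")
        return item


raw = BookItem(title="  Dive\n into   Python ", description="<p>A  book.</p>")
cleaned = CleanBookPipeline().process_item(raw, spider=None)
```

In a real Scrapy project the pipeline class would be enabled through the `ITEM_PIPELINES` setting, and Scrapy would call `process_item` for every item a spider yields.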

Tech Stack

Technologies and algorithms used in the project: Scrapy, Goose, TextRank, PageRank
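For readers unfamiliar with the two ranking algorithms, the sketch below shows PageRank by power iteration and a simplified TextRank-style keyword ranking built on top of it. It is an illustration under stated assumptions, not the project's implementation: the function names, the window size, and the ASCII-only tokenizer are all hypothetical, and real TextRank usually also filters stopwords and parts of speech.

```python
import re
from collections import defaultdict


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over an undirected graph {node: set_of_neighbors}.

    Assumes every node has at least one neighbor.
    """
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbor splits its current rank equally among its own neighbors.
            incoming = sum(rank[m] / len(graph[m]) for m in graph[n])
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


def textrank_keywords(text, window=2, top_k=3):
    """Rank words by PageRank over a word co-occurrence graph (simplified TextRank)."""
    words = re.findall(r"[a-z]+", text.lower())
    graph = defaultdict(set)
    for i, word in enumerate(words):
        # Link each word to the words that follow it within the window.
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    scores = pagerank(graph)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# A star graph: "a" is linked to every other node, so it ends up ranked highest.
star = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
star_rank = pagerank(star)

keywords = textrank_keywords(
    "search engine for books a vertical search engine indexes books"
)
```

The damping factor of 0.85 is the value commonly used for PageRank. For Chinese text, the tokenizer would need to be replaced by a word segmenter.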

Feature Implementation

The Python data processing side as a whole implements data crawling, data processing, and data storage.

Project Preview

image

image
