JiGou_PaChong

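A web crawler project that handles dynamically loaded pages as well as PDF and Word files (written while I was learning web scraping in my second year of university).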

Dynamic loading

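This part mainly uses the selenium library to drive a real browser, which requests and loads the data so that the final, fully rendered page source can be read.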
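The idea above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `make_chrome_driver` and `fetch_rendered_html`, the headless flag, and the readiness script are all assumptions made for the example.

```python
import time

# JavaScript run in the page to decide when loading has finished.
# An assumption for this sketch; a real page may need a more specific check,
# for example testing that a particular element exists.
DEFAULT_READY_SCRIPT = "return document.readyState === 'complete';"


def make_chrome_driver(headless=True):
    """Create a Chrome WebDriver (needs selenium and a matching chromedriver)."""
    # Imported lazily so that fetch_rendered_html can be used with any driver object.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    if headless:
        options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def fetch_rendered_html(driver, url, ready_script=DEFAULT_READY_SCRIPT,
                        timeout=10.0, poll=0.5):
    """Open url, wait until ready_script returns a truthy value, return the page source."""
    driver.get(url)
    deadline = time.monotonic() + timeout
    while not driver.execute_script(ready_script):
        if time.monotonic() >= deadline:
            raise TimeoutError("page did not become ready within %.1f s" % timeout)
        time.sleep(poll)
    return driver.page_source
```

A typical call would be `driver = make_chrome_driver()`, then `html = fetch_rendered_html(driver, url)`, followed by `driver.quit()` in a `finally` block. For pages that fill in content after the load event, passing a `ready_script` that checks for the expected element is likely to be more reliable than the default.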

JSON data parsing

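Because the site holds a large amount of data, what we see when browsing a page is JSON data that has been parsed and rendered. Once the URL of the JSON data behind the current page is found, the underlying data can be requested directly, which is a more convenient way to get the data we need.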
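A minimal sketch of requesting such a JSON address directly is shown below, using only the standard library. The function names, the `User-Agent` header, and the field names in the usage note are hypothetical; the real endpoint and its structure have to be found in the browser's network panel for each site.

```python
import json
import urllib.request


def fetch_json(url, timeout=10, opener=urllib.request.urlopen):
    """Request a JSON endpoint and return the decoded Python object.

    opener is injectable so the function can be exercised without network access.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with opener(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def pick_fields(records, fields):
    """Keep only the named fields of each record; missing fields become None."""
    return [{name: record.get(name) for name in fields} for record in records]
```

For example, if an endpoint returned `{"data": [{"title": ..., "url": ...}]}`, then `pick_fields(fetch_json(endpoint)["data"], ["title", "url"])` would give a list of small dictionaries ready to be saved.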

PDF and Word

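For PDF and Word data, the web page itself did not contain the content when I browsed it, only a link to a PDF or Word file, so the data we need can only be read after the file has been downloaded. The code handles downloading and parsing these files in one place.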
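One way to download a linked file and extract its text is sketched below. It is an assumption-laden illustration rather than the project's implementation: the function names are made up, `.docx` files are read with the standard library (a `.docx` file is a zip archive containing `word/document.xml`), and PDF text extraction assumes the third-party `pypdf` package, which the original text does not name. The older binary `.doc` format is not covered.

```python
import io
import urllib.request
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by WordprocessingML elements inside a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def download(url, timeout=30):
    """Download a file and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def extract_docx_text(data):
    """Return the text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        runs = (node.text or "" for node in paragraph.iter(W_NS + "t"))
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def extract_pdf_text(data):
    """Return the text of a PDF file (assumes the third-party pypdf package)."""
    from pypdf import PdfReader  # imported lazily; only needed for PDFs

    reader = PdfReader(io.BytesIO(data))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_text(name, data):
    """Dispatch on the file extension of name (a file name or URL)."""
    lowered = name.lower()
    if lowered.endswith(".pdf"):
        return extract_pdf_text(data)
    if lowered.endswith(".docx"):
        return extract_docx_text(data)
    raise ValueError("unsupported file type: " + name)
```

With these helpers, `extract_text(link, download(link))` handles both file types through a single call, which mirrors the "download and parse in one place" idea described above.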

Usage

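The code in each folder can be run on its own, but the required libraries must be installed first. Download the chromedriver.exe build that matches the version of the browser you use.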

Where to place chromedriver.exe:

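  • the root directory of the Python installation
    • look up the exact install location on your own machine
  • the root directory of the Anaconda installation
    • look up the exact install location on your own machine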
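A quick way to check the placement is sketched below. It assumes the point of copying the driver into the interpreter's root directory is that this directory is on `PATH`, which depends on how Python or Anaconda was installed; the helper names are hypothetical.

```python
import shutil
import sys
from pathlib import Path


def find_chromedriver(name="chromedriver"):
    """Return the full path of the driver found on PATH, or None if it is missing."""
    return shutil.which(name)


def interpreter_root():
    """Directory that contains the running Python executable.

    On a typical Windows install this is the Python or Anaconda root directory;
    inside a virtual environment it is that environment's Scripts or bin folder.
    """
    return Path(sys.executable).resolve().parent
```

If `find_chromedriver()` returns `None`, printing `interpreter_root()` shows one candidate directory for the driver, provided that directory is actually listed in `PATH`.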
