
baidutieba's Introduction

Baidu Tieba Crawler

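This is a Python-based crawler for Baidu Tieba. It fetches post information from the tieba (forum) matching a given keyword and saves it to a CSV file.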

Project Structure

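  • 🚀 tieba.py: the main crawler script, which scrapes post information from the tieba.
  • 🎂 config.py: configuration file that sets the search keyword and the start and end pages of the crawl.
  • 🔗 requirements.txt: the list of Python package dependencies.
  • 📦 data/{forum name}.csv: stores the scraped data.
  • 📩 logs/{forum name}.log: stores log messages produced while crawling.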
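The per-forum output files listed above (one CSV and one log file per forum) can be produced with the standard library alone. The sketch below is a minimal illustration under stated assumptions, not the project's code: the helpers `open_outputs` and `save_posts`, the CSV columns `title`, `author`, and `url`, and the log format are all hypothetical.

```python
import csv
import logging
from pathlib import Path


def open_outputs(forum: str, root: Path = Path(".")) -> tuple[Path, logging.Logger]:
    """Hypothetical helper: prepare data/{forum}.csv and a logger for logs/{forum}.log."""
    data_dir = root / "data"
    log_dir = root / "logs"
    data_dir.mkdir(parents=True, exist_ok=True)
    log_dir.mkdir(parents=True, exist_ok=True)

    # One logger per forum, writing to logs/{forum}.log
    logger = logging.getLogger(f"tieba.{forum}")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.FileHandler(log_dir / f"{forum}.log", encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return data_dir / f"{forum}.csv", logger


def save_posts(csv_path: Path, posts: list[dict]) -> None:
    """Write post records to CSV. The column names are assumptions, not the project's schema."""
    fields = ["title", "author", "url"]
    # utf-8-sig writes a BOM so spreadsheet tools detect UTF-8 and show Chinese text correctly
    with csv_path.open("w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(posts)
```

For example, `csv_path, log = open_outputs("python")` followed by `save_posts(csv_path, rows)` would write data/python.csv and log to logs/python.log under the current directory. The utf-8-sig encoding is a design choice aimed at Excel users; plain utf-8 works for most other tools.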

Usage

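  • 1. ⚡ Install dependencies:

    • Python version used by the project: 3.10.7
    pip install -r requirements.txt
  • 2. 🌊 Configure parameters:

    In config.py, set the tieba keyword to crawl (KW), the start page (ST), and the end page (PN).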

  • 3. 🚄 Run the script:

    python main.py

    The script starts crawling tieba post information and writes the results to a CSV file.

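  • 4. 🌈 Features:

    • ✅ Uses the fake_useragent library to generate random User-Agent strings, making the crawler harder to detect.
    • ✅ Uses the progress bar from the rich library to display crawl progress visually.
    • ✅ Lets you set the start and end pages, giving flexible control over the crawl range.
    • ✅ Builds a cookie pool from multiple accounts' cookies, which helps withstand anti-crawling measures and makes data collection more robust.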
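  • 5. 🚩 Notes:

    • 🚧 Follow the site's rules while crawling and do not send requests too frequently, or your IP may be banned.
    • 🚥 Do not use the scraped data for illegal or commercial purposes; it is for personal study and research only.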
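As a rough illustration of the configuration step, a config.py might look like the example below, followed by one way the page range could be turned into Tieba list-page URLs. Only the names KW, ST, and PN come from this README. The example values, the helper `page_urls`, the inclusive 1-based page numbering, and the mapping to Tieba's `pn` query offset (assumed 50 threads per list page) are assumptions about how such a crawler typically works, not a description of tieba.py.

```python
from urllib.parse import urlencode

# --- config.py (example values; KW/ST/PN are the README's names) ---
KW = "python"  # tieba (forum) keyword to crawl
ST = 1         # first page to crawl (assumed 1-based)
PN = 3         # last page to crawl (assumed inclusive)

# Assumption: each Tieba list page holds 50 threads, addressed by a pn offset
THREADS_PER_PAGE = 50


def page_urls(kw: str, start: int, end: int) -> list[str]:
    """Hypothetical helper: build list-page URLs for pages start..end inclusive."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    urls = []
    for page in range(start, end + 1):
        # urlencode percent-encodes non-ASCII keywords (e.g. Chinese forum names) as UTF-8
        query = urlencode({"kw": kw, "ie": "utf-8", "pn": (page - 1) * THREADS_PER_PAGE})
        urls.append(f"https://tieba.baidu.com/f?{query}")
    return urls
```

With the example values, `page_urls(KW, ST, PN)` yields three URLs with pn=0, 50, and 100. Validating the range up front turns a misconfigured ST/PN pair into an immediate error instead of a silent empty crawl.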
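The cookie-pool and random User-Agent features listed under Features could be combined roughly as follows. This is a hedged sketch, not the project's implementation: the `CookiePool` class, its round-robin rotation, and the `build_headers` helper are hypothetical. In the project itself the User-Agent string comes from fake_useragent (for example `UserAgent().random`); here it is passed in as a parameter so the example runs with the standard library only.

```python
import itertools


class CookiePool:
    """Hypothetical round-robin pool of cookie strings, one per account."""

    def __init__(self, cookies: list[str]) -> None:
        if not cookies:
            raise ValueError("cookie pool is empty")
        # cycle() repeats the accounts in order forever
        self._cycle = itertools.cycle(cookies)

    def next_cookie(self) -> str:
        # Each call hands out the next account's cookie, spreading requests across accounts
        return next(self._cycle)


def build_headers(pool: CookiePool, user_agent: str) -> dict[str, str]:
    """Hypothetical helper: request headers with a rotated cookie and the given User-Agent."""
    return {"User-Agent": user_agent, "Cookie": pool.next_cookie()}
```

With an HTTP client such as requests, each request could then use `headers=build_headers(pool, UserAgent().random)`. Round-robin rotation spreads load evenly across accounts; picking a cookie with `random.choice` is a simple alternative if a less predictable pattern is preferred.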

Contributors

viper373
