crawl-Enpower

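Entry in the Empowerment group of the 2021 University AI Creativity Competition: a network environment maintenance system based on EasyDL text models.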

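Getting and updating cookies:

  • Open https://www.bilibili.com in Chrome and log in. Press F12 to open the developer tools, go to Network, then XHR, open any request, and copy the cookie value from its headers.
  • Open https://www.weibo.cn in Chrome, log in, press F12, and repeat the same steps.
  • Add the cookie to the spider's header.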
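As a minimal sketch of the last step, the copied cookie string can be attached to the headers of each request the spider sends. The names `build_headers` and `build_request`, the `User-Agent` value, and the placeholder cookie are assumptions for illustration, not the project's actual code. Only the standard library is used, and nothing is sent over the network.

```python
import urllib.request


def build_headers(cookie: str) -> dict:
    """Return request headers carrying the cookie copied from the browser."""
    cookie = cookie.strip()
    if not cookie:
        raise ValueError("cookie must not be empty")
    return {
        # Assumed value: any browser-like User-Agent string would do here.
        "User-Agent": "Mozilla/5.0",
        "Cookie": cookie,
    }


def build_request(url: str, cookie: str) -> urllib.request.Request:
    """Build (but do not send) a request that includes the login cookie."""
    return urllib.request.Request(url, headers=build_headers(cookie))


# Placeholder cookie: replace it with the string copied from the developer tools.
request = build_request("https://www.bilibili.com", "key1=value1; key2=value2")
```

Cookies expire, so the same steps need to be repeated whenever requests start coming back as logged out.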

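Getting the Weibo id and the video BV number:

Open the Weibo post or Bilibili video you want to analyze and copy the last segment of its URL.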
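Copying the last URL segment by hand can also be done in code. The sketch below is one possible helper, assuming the id is the last non-empty path segment and that any query string or trailing slash should be ignored. The function name `last_url_segment` and both example URLs are made up for illustration.

```python
from urllib.parse import urlparse


def last_url_segment(url: str) -> str:
    """Return the last non-empty path segment, ignoring query and fragment."""
    path = urlparse(url.strip()).path
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError(f"no path segment in URL: {url!r}")
    return segments[-1]


# Hypothetical URLs; the ids are placeholders, not real posts or videos.
bv_number = last_url_segment("https://www.bilibili.com/video/BV0000000000/?spm=example")
weibo_id = last_url_segment("https://weibo.cn/comment/ExampleId")
print(bv_number, weibo_id)
```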

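Features:

  • Select a community, add a BV number, crawl the comments for the entries in the ListBox, and store them in the database
  • Call the remote EasyDL model API deployed on the public cloud to get an analysis result for each comment
    • Identification of sensitive opinions involving politics, violence, or crime, based on text classification
    • Maintenance of a positive atmosphere, based on sentiment analysis
    • Detection of similar rumors, based on short-text single-label classification
    • User privacy protection, based on short-text single-label classification
    • Advertisement detection, based on short-text single-label classification
    • Rumor debunking, based on short-text single-label classification
    • Detection of sensitive speech, based on short-text single-label classification
    • A per-comment violation index, 'index', computed as a weighted combination of the results above
  • Word-frequency analysis of the comments with a word cloud (in development) for visual statistics
  • Finding violating users according to the user's needs and taking action against them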
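The README does not say how the per-comment violation index is weighted, so the following is only a sketch of one way to combine the model results. It assumes each model yields a score between 0 and 1 and takes a weighted average. The label names, the weights, and the function `violation_index` are all hypothetical.

```python
# Hypothetical weights per model; the actual values are not given in the README.
WEIGHTS = {
    "sensitive_topic": 0.3,
    "negative_sentiment": 0.1,
    "rumor": 0.2,
    "privacy": 0.15,
    "advertisement": 0.1,
    "sensitive_speech": 0.15,
}


def violation_index(scores: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of per-model scores; models without a score count as 0."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {name!r} must be in [0, 1], got {value}")
    weighted = sum(w * scores.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


# Example scores for a single comment (made-up numbers).
index = violation_index({"rumor": 0.9, "advertisement": 0.5})
print(round(index, 3))
```

Dividing by the total weight keeps the index in the range 0 to 1 even if the weights are later changed and no longer sum to one.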

Runtime environment

  • Python 3
  • Windows/Linux/macOS

Dependencies

  • jieba==0.39
  • matplotlib==3.1.1
  • wordcloud==1.5.0
  • pymysql==0.9.3
  • mongoengine==0.23.1
  • tkinter

Notes

  • All rights reserved; commercial use is prohibited

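Future work

  • Handling of violating users, including automatic reporting and submission to the platform and the post author
  • Support for Zhihu, Douban, and other communities is under development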

Contributors

  • jiran214
