Giter Site home page Giter Site logo

yibuuu's Projects

anti-anti-spider icon anti-anti-spider

越来越多的网站具有反爬虫特性,有的用图片隐藏关键数据,有的使用反人类的验证码,建立反反爬虫的代码仓库,通过与不同特性的网站做斗争(无恶意)提高技术。(欢迎提交难以采集的网站)(因工作原因,项目暂停)

cix-extractor-py icon cix-extractor-py

基于行块分布函数的通用网页正文(及图片)抽取 - Python版本

crawlarticle icon crawlarticle

基于文字密度的新闻正文提取模块,兼容python2和python3,传入新闻网址或者网页源码即可返回标题,发布时间和正文内容。

cx-extractor-python icon cx-extractor-python

基于行块分布函数的通用网页正文抽取算法的Python版本实现,添加了英文支持/ Web page content extraction algorithm, support both Chinese and English

htmlcontent icon htmlcontent

extract meaningful text content from html of web page

htmlcontentextractor icon htmlcontentextractor

网页正文及正文图片提取,基于哈工大的《基于行块分布函数的通用网页正文抽取》算法

mergeexcel icon mergeexcel

合并一个excel中的多个sheet到一个新的excel

sixgod icon sixgod

正文提取|extract content from html

textsimilarity icon textsimilarity

基于向量空间模型(VSM)和潜语义索引(LSI)实现的多种文本相似度计算

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.