Decaptcha

An English version of the README is available here.

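This project recognizes CAPTCHAs with simple image-recognition algorithms. The plan is to try out every classification algorithm used in machine learning.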

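Usage

  1. Crawl CAPTCHA images
  2. Preprocess the images and split them into single characters
  3. Label the data by hand
  4. Load the training set
  5. Run the test set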

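Prerequisites

  1. Image (image processing module from PIL/Pillow)

  2. numpy (numerical computing library)

  3. ImageEnhance (image enhancement module from PIL/Pillow)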

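from PIL import ImageEnhance

# img is assumed to be a PIL Image, e.g. one returned by Image.open(...)
enhancer = ImageEnhance.Contrast(img)    # increase contrast
img = enhancer.enhance(2)
enhancer = ImageEnhance.Sharpness(img)   # sharpen
img = enhancer.enhance(2)
enhancer = ImageEnhance.Brightness(img)  # increase brightness
img = enhancer.enhance(2)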

Image processing

Static images

  1. Remove noise pixels
  2. Remove interference lines
  3. Split the image into characters
  4. Output the result
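The noise-removal and splitting steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: it assumes the image has already been binarized into a grid of 0/1 values (1 = ink), and the names `remove_isolated_pixels`, `split_columns`, and `min_neighbors` are hypothetical. Noise is removed with a simple 8-neighbour count, and characters are split wherever a column contains no ink. Interference-line removal is not covered here.

```python
def remove_isolated_pixels(grid, min_neighbors=2):
    """Clear ink pixels with fewer than `min_neighbors` ink pixels among their 8 neighbours."""
    height = len(grid)
    width = len(grid[0]) if height else 0
    cleaned = [row[:] for row in grid]  # work on a copy, count on the original
    for y in range(height):
        for x in range(width):
            if not grid[y][x]:
                continue
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        neighbors += grid[ny][nx]
            if neighbors < min_neighbors:
                cleaned[y][x] = 0
    return cleaned


def split_columns(grid):
    """Return (start, end) column ranges that contain ink; `end` is exclusive."""
    if not grid:
        return []
    width = len(grid[0])
    has_ink = [any(row[x] for row in grid) for x in range(width)]
    segments = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a character begins
        elif not ink and start is not None:
            segments.append((start, x))    # a character ends at an empty column
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


# Made-up example: two blocks of ink plus one stray pixel at row 2, column 4.
grid = [
    [0, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
cleaned = remove_isolated_pixels(grid)
print(split_columns(grid))     # [(1, 3), (4, 5), (6, 8)]
print(split_columns(cleaned))  # [(1, 3), (6, 8)]
```

Splitting on empty columns only works when characters do not touch or overlap; CAPTCHAs with connected glyphs would need a different strategy.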

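Animated images

  1. Save the GIF frame by frame
  2. Read the Duration attribute of each frame
  3. Pick the frame with the longest Duration, then process it like a static image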
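The frame-selection steps above might look like the following sketch. It assumes Pillow is installed and relies on the per-frame `info["duration"]` value (in milliseconds) that Pillow exposes for GIFs. The helper names `index_of_longest` and `longest_gif_frame` are hypothetical and not part of this project.

```python
def index_of_longest(durations):
    """Return the index of the largest duration (the first one wins on ties)."""
    if not durations:
        raise ValueError("no frames")
    return max(range(len(durations)), key=lambda i: durations[i])


def longest_gif_frame(path):
    """Open the GIF at `path` and return a copy of the frame shown for the longest time."""
    # Imported here so that index_of_longest stays usable without Pillow.
    from PIL import Image, ImageSequence

    frames = []
    durations = []
    with Image.open(path) as gif:
        for frame in ImageSequence.Iterator(gif):
            # Frames without a duration entry are treated as 0 ms.
            durations.append(frame.info.get("duration", 0))
            frames.append(frame.copy())
    return frames[index_of_longest(durations)]


print(index_of_longest([100, 500, 200]))  # 1
```

The returned frame is an ordinary PIL image, so it can go through the same denoising and splitting steps as a static CAPTCHA.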

Recognition algorithms

KNN

# kNN algorithm
import operator

from numpy import tile


def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # Euclidean distance from inX to every row of dataSet
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    # Indices of the training rows, nearest first
    sortedDistIndicies = distances.argsort()
    # Count the labels of the k nearest rows
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # Return the label with the most votes
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
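To see the majority vote at work on concrete numbers, here is a small standard-library sketch of the same idea. It is an illustration only: the function name `knn_classify` and the toy data (2x2 glyphs flattened to 4-element vectors) are made up, and it ranks by squared distance, which gives the same ordering as Euclidean distance.

```python
from collections import Counter


def knn_classify(sample, dataset, labels, k):
    """Label `sample` by majority vote among its k nearest rows of `dataset`."""
    # Squared Euclidean distance preserves the ordering, so no square root is needed.
    distances = [
        sum((a - b) ** 2 for a, b in zip(sample, row))
        for row in dataset
    ]
    nearest = sorted(range(len(dataset)), key=lambda i: distances[i])[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Made-up training data: four flattened 2x2 binary glyphs.
dataset = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
labels = ["1", "1", "2", "2"]

print(knn_classify([1, 0, 0, 0], dataset, labels, k=3))  # 1
print(knn_classify([0, 0, 0, 1], dataset, labels, k=3))  # 2
```

An odd `k` avoids ties when there are only two classes; with more classes a tie-breaking rule is still needed.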

SVM

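Given the nature of the algorithm, the problem can be framed as binary classification: telling the digits 1 and 2 apart (any other pair of digits would work as well).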
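Framing the task this way means reducing the labeled data to two classes before training. The sketch below shows one way to do that, using the +1 / -1 label encoding that SVM formulations conventionally use. The helper name `to_binary_problem` and the sample data are hypothetical, and no SVM training is shown.

```python
def to_binary_problem(samples, labels, positive="1", negative="2"):
    """Keep only two classes and map their labels to +1 and -1."""
    binary_samples = []
    binary_labels = []
    for sample, label in zip(samples, labels):
        if label == positive:
            binary_samples.append(sample)
            binary_labels.append(1)
        elif label == negative:
            binary_samples.append(sample)
            binary_labels.append(-1)
        # Samples of any other class are dropped.
    return binary_samples, binary_labels


# Made-up feature vectors with hand-assigned labels.
samples = [[1, 0], [0, 1], [1, 1], [0, 0]]
labels = ["1", "2", "3", "1"]

print(to_binary_problem(samples, labels))
# ([[1, 0], [0, 1], [0, 0]], [1, -1, 1])
```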

References

License

MIT

Contributors

yaoshicn, zghgchao
