llava_captcha_solver's Introduction

Captcha Solver

Disclaimer

This project demonstrates CAPTCHA solving techniques for research/educational purposes only. Please be aware that using this software to bypass CAPTCHAs on websites may violate their Terms of Service and/or have legal consequences.

Description

A very basic proof-of-concept Google reCAPTCHA solver that uses the LLaVA-v1.6-7b model to extract the name of the object to look for and to detect that object in each square. The solver relies solely on vision; it does not read HTML or similar. It takes screenshots and clicks the images at the given locations. It also detects the grid size and whether new images are appearing. In my limited testing it solved the captcha within 2 minutes at most, and often much faster.

Here is a short video demonstrating the solver:

demo_video.mp4

Limitations

  • Requires a GPU with at least 16 GB of VRAM
  • Currently only works on Ubuntu, because:
      1. I detect the captcha window for exactly this OS (and the button border only looks like part1_bottom_2.png on Ubuntu)
      2. LLaVA currently only supports Linux, and running it via Ollama is not accurate enough
  • If images disappear, it has to re-classify all images at the end
  • Only works for this specific reCAPTCHA layout; if the layout changes, the reference images also have to be updated

Installation

  1. Follow the installation instructions in LLaVA's repo
  2. Install gnome-screenshot: sudo apt install gnome-screenshot
  3. Install the Python dependencies: pip install protobuf PyAutoGUI opencv-python pillow
  4. Run the script main.py to solve a captcha; once it's done, it will close (the LLaVA model should be downloaded automatically on first start)
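
The requirements above can be checked before running main.py. The following is a minimal sketch, not part of the repository: the function name check_prerequisites and its parameters are hypothetical, and it only covers the operating system and the gnome-screenshot binary, not the GPU or the Python packages.

```python
import platform
import shutil


def check_prerequisites(system=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means the basics look fine.

    `system` and `which` are injectable so the check can be exercised
    without depending on the machine it runs on (hypothetical design).
    """
    system = system or platform.system()
    problems = []
    # The README states the solver currently only works on Ubuntu (Linux).
    if system != "Linux":
        problems.append(f"unsupported OS: {system} (only Ubuntu is listed as working)")
    # gnome-screenshot is installed in step 2 of the installation.
    if which("gnome-screenshot") is None:
        problems.append("gnome-screenshot not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Running the block as a script prints one warning per missing prerequisite and prints nothing when both checks pass.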

Contributions

Contributions welcome! If you have any issues or improvements, feel free to change the code or let me know by submitting a new issue.

llava_captcha_solver's People

Contributors

notune
