
redhuntlabs / octopii


An AI-powered Personally Identifiable Information (PII) scanner.

Home Page: https://redhuntlabs.com/blog/octopii-an-opensource-pii-scanner-for-images.html

License: Other

Python 100.00%
cybersecurity image-processing machine-learning ocr optical-character-recognition pii pii-detection nlp python blackhat

octopii's People

Contributors

0x4f53, othmanalikhan, owais-redhunt, umair-rhl

octopii's Issues

"WARNING:tensorFlow:No training configuration found..." When running tool

Greetings,

I became aware of this project via Intigriti's Bug Bytes newsletter. I installed it in a virtualenv created with venv, but the warning quoted in the title is printed when I run the tool against the local 'dummy-pii' directory and the 'https://pii-carbonconsole.fra1.digitaloceanspaces.com' URL.

The tool seems to be working as expected, since it returns a confidence value for the sample images containing "PII". I am running it on Kali 2022.1 with Python 3.9.12 inside a virtualenv created with venv. A GitHub issue for another project led me to add ", compile=False" to line 214 of the octopii.py script.

I don't really understand the implications of the change, but the warning is no longer printed. As I mentioned earlier, the tool seems to be working as expected, so to me the warning looks purely "cosmetic".

This is an exciting project. Thank you for the time and effort put into developing it and sharing it with the world!
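
For context on the change described above: Keras prints this warning when a saved model file carries no optimizer or loss configuration, so there is nothing to compile on load. Passing compile=False tells load_model to skip that step. The sketch below assumes the model is only used for prediction; load_for_inference and its argument are hypothetical names, not taken from octopii.py.

```python
def load_for_inference(model_path):
    """Load a saved Keras model without restoring its training configuration.

    Hypothetical helper: `model_path` stands in for whatever file the tool loads.
    """
    # Imported lazily so this module can be imported without TensorFlow installed.
    import tensorflow as tf

    # compile=False skips restoring the optimizer and loss, which is the step
    # that emits "No training configuration found" for inference-only files.
    return tf.keras.models.load_model(model_path, compile=False)
```

An uncompiled model can still be used with predict(). Compiling is only needed before calling fit() or evaluate(), which is consistent with the warning being harmless for a scanner that only runs inference.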

UnboundLocalError: local variable 'contains_faces' referenced before assignment

Describe the bug
When running the tool on a directory without images or PDF files, an UnboundLocalError is raised because the variable contains_faces has not been initialized. I believe that adding contains_faces = 0 at the beginning of the search_pii(file_path) function will solve the issue.

To Reproduce
Steps to reproduce the behavior:

  1. Create a directory dir only with text files
  2. Run python3 octopii.py dir/

Expected behavior
Octopii runs successfully
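
The failure mode and the proposed fix can be reproduced in isolation. This is a simplified stand-in, not the real search_pii: it assumes the variable is only assigned on the image/PDF branch, and the file-extension check is invented for illustration.

```python
def search_pii_buggy(file_path):
    # Simplified stand-in: the variable is bound only for image or PDF inputs.
    if file_path.endswith((".png", ".jpg", ".pdf")):
        contains_faces = 1
    # For any other file type the name was never assigned, so this line raises.
    return {"file": file_path, "faces": contains_faces}


def search_pii_fixed(file_path):
    contains_faces = 0  # default bound before any branching, as proposed above
    if file_path.endswith((".png", ".jpg", ".pdf")):
        contains_faces = 1
    return {"file": file_path, "faces": contains_faces}


try:
    search_pii_buggy("notes.txt")
except UnboundLocalError as exc:
    print("buggy version raised:", exc)

print("fixed version returned:", search_pii_fixed("notes.txt"))
```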

Octopii crashes on empty files

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:
Run octopii against a folder with a 0 byte file in it

Traceback (most recent call last):
  File "/opt/Octopii/octopii.py", line 199, in <module>
    results = search_pii (file_path)
  File "/opt/Octopii/octopii.py", line 80, in search_pii
    addresses = text_utils.regional_pii(text)
  File "/opt/Octopii/text_utils.py", line 80, in regional_pii
    place_entity = locationtagger.find_locations(text = text)
  File "/usr/local/lib/python3.10/dist-packages/locationtagger/__init__.py", line 4, in find_locations
    e = NamedEntityExtractor(url=url, text=text)
  File "/usr/local/lib/python3.10/dist-packages/locationtagger/locationextractor.py", line 25, in __init__
    raise Exception('Please input any text or url')
Exception: Please input any text or url

Expected behavior
Octopii should not crash when a file is 0 bytes.
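
One possible guard, sketched under assumptions: the traceback shows the location extractor raising when it receives no text, so the call could be skipped for empty input. is_scannable, regional_pii_safe and the find_locations parameter are hypothetical names, and returning an empty list for "no addresses" is an assumption about what the caller expects.

```python
import os
import tempfile


def is_scannable(file_path):
    """Return False for files that contain no bytes to scan."""
    return os.path.getsize(file_path) > 0


def regional_pii_safe(text, find_locations):
    """Call the location extractor only when there is text to analyse.

    `find_locations` is a hypothetical callable standing in for the extractor.
    """
    if not text or not text.strip():
        return []  # assumption: an empty result means "no addresses found"
    return find_locations(text)


# Demonstration with a real 0-byte file and a dummy extractor.
with tempfile.TemporaryDirectory() as tmp:
    empty_path = os.path.join(tmp, "empty.txt")
    open(empty_path, "w").close()
    empty_is_scannable = is_scannable(empty_path)

print("0-byte file scannable:", empty_is_scannable)
print("empty text result:", regional_pii_safe("", lambda t: [t]))
```

Checking the extracted text, not only the file size, also covers files that are non-empty on disk but yield no text after OCR.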

ModuleNotFoundError: No module named 'cv2'

Describe the bug
ModuleNotFoundError: No module named 'cv2'

To Reproduce
Steps to reproduce the behavior:

  1. Run python3 octopii.py dummy-pii/ (Windows 11)

Expected behavior
Octopii runs successfully
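
The cv2 module is provided by the OpenCV Python bindings, usually installed from PyPI as opencv-python. A minimal pre-flight check, assuming that package is the intended dependency, can report missing modules before the tool starts. The REQUIRED mapping and missing_modules helper are illustrative and not part of Octopii.

```python
import importlib.util

# Import name -> PyPI distribution that usually provides it (assumption).
REQUIRED = {"cv2": "opencv-python"}


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


for name in missing_modules(REQUIRED):
    print(f"Missing module {name!r}; try: python -m pip install {REQUIRED[name]}")
```

Running the install command with the same interpreter that launches octopii.py (python -m pip rather than a bare pip) avoids installing into a different Python environment, which is a common cause of this error on Windows.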

Feature request: portable app

Would there be an easy way to make this portable so I could toss it on a thumb drive and run it on a random workstation?

Questions about "confidence_score"

Hi, I have been reading your source code and am curious about the confidence scores.
How should the scores be interpreted? What criteria do you use to determine them?

Windows

Is your feature request related to a problem? Please describe.
My use case is scanning backup files on an SMB share with Octopii. This works today, but it takes extra steps: I have to make sure my Linux machine has access to the SMB share or to the backup file in question. If Octopii could also run on Windows, that would help my use case.

Describe the solution you'd like
I am not sure how big a lift this is, but I am more than happy to help where possible. I have added below the errors I see after confirming that the dependencies for Windows are available.

It is not the end of the world, but being able to run this from a Windows box would be better than having a dedicated Linux box for this task.

Additional context
When I run on Windows where I have already installed Tesseract I get the following:

 Octopii  python .\octopii.py .\dummy-pii\
Traceback (most recent call last):
  File "C:\Users\Administrator\Documents\Octopii\octopii.py", line 123, in <module>
    rules=text_utils.get_regexes()
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\Documents\Octopii\text_utils.py", line 52, in get_regexes
    _rules = json.load(json_file)
             ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
                 ^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3062: character maps to <undefined>
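
The traceback shows the JSON file being decoded with cp1252, the default text encoding on many Windows installs, which has no character for byte 0x81. A likely fix, assuming the rules file is stored as UTF-8, is to pass the encoding explicitly when opening it. The sketch below reproduces the error with a throwaway file; load_rules and rules.json are hypothetical names.

```python
import json
import os
import tempfile


def load_rules(path):
    """Read a JSON file as UTF-8 regardless of the platform's default encoding."""
    with open(path, encoding="utf-8") as json_file:
        return json.load(json_file)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rules.json")
    # "Á" is encoded in UTF-8 as the bytes C3 81; 0x81 is undefined in cp1252.
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"label": "Á"}, f, ensure_ascii=False)

    cp1252_failed = False
    try:
        with open(path, encoding="cp1252") as f:
            json.load(f)
    except UnicodeDecodeError as exc:
        cp1252_failed = True
        print("cp1252 read failed:", exc)

    rules = load_rules(path)

print("utf-8 read succeeded:", rules)
```

On Linux the default encoding is usually already UTF-8, which would explain why the same file loads there without an explicit encoding argument.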
