open-sourced-olaf / docverifier

A tool to summarize and report any flaws in a long agreement/text.

Home Page: https://rnrep-5aaaa-aaaab-aa54a-cai.ic0.app/

License: MIT License

Jupyter Notebook 46.49% JavaScript 19.73% HTML 3.64% Python 19.97% CSS 10.16%
scraping-websites flask-api chrome-extension nlp-machine-learning dfinity naive-bayes-text-classification svm-classifier

docverifier's Introduction

DocVerifier

DocVerifier summarizes long agreements and other lengthy texts and reports any flaws in them. It helps users avoid accepting malicious agreements, privacy policies, terms and conditions, and similar documents. It uses Naive Bayes classification to make its predictions.

Deployed Url in DFINITY

Deployed API url

Deployed extension.

How does it work

For Chrome Extension

  1. When the user clicks the button in the Chrome extension, the extension reads the URL of the current tab.
  2. This URL is passed to the backend through the Flask API.
  3. The backend scrapes the page with Beautiful Soup in Python to collect every Privacy Policy and Terms of Service link present on the website.
  4. A second scraper, which uses NLP to get the best result, extracts the policy text from each of those links.
  5. The text is stored in a file, which is passed to the ML model. The model uses Naive Bayes classification to predict which sentences in the policy text, if any, are bad.
  6. All the malicious sentences are then displayed in the Chrome extension itself.
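
Step 3 above can be sketched roughly as follows. This is a minimal illustration, not the project's scraper: it swaps in Python's standard-library html.parser for Beautiful Soup so that it runs without extra packages, and the keyword list, the class name PolicyLinkParser, and the function find_policy_links are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical keyword list; the project's real matching rules are not shown in this README.
POLICY_KEYWORDS = ("privacy", "terms", "conditions")


class PolicyLinkParser(HTMLParser):
    """Collects <a href> targets whose URL or link text mentions a policy keyword."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            haystack = (self._href + " " + "".join(self._text)).lower()
            if any(word in haystack for word in POLICY_KEYWORDS):
                # Resolve relative links against the page URL and skip duplicates.
                absolute = urljoin(self.base_url, self._href)
                if absolute not in self.links:
                    self.links.append(absolute)
            self._href = None


def find_policy_links(html, base_url):
    """Return absolute URLs of links that look like policy or terms pages."""
    parser = PolicyLinkParser(base_url)
    parser.feed(html)
    return parser.links


sample_page = """
<a href="/about">About us</a>
<a href="/legal/privacy">Privacy Policy</a>
<a href="https://example.com/tos">Terms of Service</a>
"""
policy_links = find_policy_links(sample_page, "https://example.com")
print(policy_links)
```

Checking both the href and the link text is a design choice for the sketch: some sites use opaque URLs such as /tos, where only the visible label reveals that the link is a terms page.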

For DFINITY App

  • The user uploads a text document, which is saved to the backend filesystem and passed to the ML model to make the predictions.
  • All the malicious sentences are then displayed in the web app.
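
To make the prediction step more concrete, here is a minimal sketch of Naive Bayes sentence classification written with the standard library only. It is not the project's trained model: the four training sentences are invented for the example, and the names tokenize, NaiveBayes, and flag_bad_sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.label_counts = Counter(labels)
        self.word_counts = {}  # label -> Counter of word frequencies
        for sentence, label in zip(sentences, labels):
            self.word_counts.setdefault(label, Counter()).update(tokenize(sentence))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, sentence):
        total = sum(self.label_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # log P(label) + sum of log P(word | label), smoothed so unseen words never give log(0)
            score = math.log(self.label_counts[label] / total)
            denominator = sum(counts.values()) + len(self.vocab)
            for word in tokenize(sentence):
                score += math.log((counts[word] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy data; the real datasets live in the repository, not in this README.
train_sentences = [
    "We may sell your personal data to third parties",
    "We share your information with advertisers without notice",
    "We never sell your personal data",
    "You can delete your account at any time",
]
train_labels = ["bad", "bad", "good", "good"]
model = NaiveBayes().fit(train_sentences, train_labels)


def flag_bad_sentences(text):
    """Split text into sentences and return those the model labels 'bad'."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if s and model.predict(s) == "bad"]


flagged = flag_bad_sentences("We share data with third parties. You can delete your account.")
print(flagged)
```

With so little training data the labels are only illustrative; the point is the shape of the computation, where each sentence is scored per class and the higher-scoring class wins.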

Future Scopes

  • Improving the ML model by collecting more training data.
  • Predicting flaws in website cookies as well.
  • Integrating file upload into the Chrome extension itself.

Prerequisites

  1. Git.
  2. Node & npm (version 12 or greater).
  3. A fork of the repo.
  4. A Python 3 environment in which to install Flask.
  5. The DFINITY Canister SDK package (requires access to a terminal shell on macOS or Linux).

How to run the project locally

  • Clone this repo to your local machine: git clone https://github.com/Open-Sourced-Olaf/DocVerifier
  • Move into the cloned repository: cd DocVerifier

Steps to run the backend (Flask API)

To install all the packages, follow the steps below:

  1. Move to the flask-api folder: cd flask-api
  2. Install virtualenv: python3 -m pip install --user virtualenv
  3. Create a virtual environment: python3 -m venv env
  4. Activate the virtual environment:
    • On Mac/Linux: source env/bin/activate
    • On Windows: .\env\Scripts\activate
  5. Install the dependencies: pip3 install -r requirements.txt
  6. Start the server: flask run

The model will be served on http://127.0.0.1:5000/
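
Once the server is running, a client could talk to it along these lines. This is only a sketch under assumptions: the README does not name the API routes or the payload format, so the predict route and the JSON body with a url field are hypothetical placeholders. The example builds the request without sending it, so it runs even when the server is down.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:5000/"  # address given above for the local Flask server
PREDICT_ROUTE = "predict"            # hypothetical route name, not confirmed by this README


def build_predict_request(page_url):
    """Build a POST request that carries the page URL as a JSON body."""
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + PREDICT_ROUTE,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request("https://example.com")
print(request.get_method(), request.full_url)

# To actually send it (requires the Flask server to be running):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Check app.py for the real route names and expected fields before relying on this shape.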

Steps to set up custom_greeting (DFINITY app)

  1. Move to the custom_greeting folder.
  2. Install all the npm packages: npm install
  3. Start the Internet Computer network on your local computer: dfx start --background
  4. Deploy the app: dfx deploy
  5. Get the canister ID of the assets: dfx canister id custom_greeting_assets
  6. The deployed URL will look like this: http://127.0.0.1:8000/?canisterId=rwlgt-iiaaa-aaaaa-aaaaa-cai
  7. Whenever you change the code, rebuild and redeploy the website:
    • Run dfx build to rebuild the project.
    • Then run dfx canister install --all --mode reinstall to deploy the changes.

Deploying to the Internet Computer

  • Stop the local Internet Computer network: dfx stop
  • Check whether your current internet connection allows you to connect to the Internet Computer network: dfx ping ic
  • Build and deploy the application to the Internet Computer: dfx deploy --network=ic

Loading the Chrome extension

  • Go to chrome://extensions in the browser.
  • Click "Load unpacked" and choose the chrome-extension folder.
  • To publish the extension to the Chrome Web Store, follow these steps:
    1. Create your item's zip file.
    2. Create a developer account at https://chrome.google.com/webstore/devconsole/
    3. Upload your item.
    4. Add assets for your listing.
    5. Submit your item for publishing.

Directory Structure

The following is a high-level overview of relevant files and folders.

DocVerifier/
├── flask-api/
│   ├── datasets
│   ├── static/uploads
│   ├── model
│   ├── scraper
│   ├── templates
│   ├── .gitignore
│   ├── Procfile
│   ├── nltk.txt
│   ├── requirements.txt
│   ├── runtime.txt
│   ├── output.txt
│   └── app.py
├── custom_greeting/
│   ├── node_modules/
│   ├── src/
│   │   ├── custom_greeting/
│   │   │   └── main.mo
│   │   └── custom_greeting_assets/
│   │       ├── assets
│   │       └── public
│   ├── dfx.json
│   ├── package.json
│   ├── webpack.config.js
│   ├── tsconfig.json
│   ├── canister_ids.json
│   ├── README.md
│   ├── package-lock.json
│   └── .gitignore
├── chrome-extension/
│   ├── background.js
│   ├── icon.png
│   ├── manifest.json
│   ├── window.html
│   ├── icon.svg
│   └── style.css
├── images/
│   └── demo.gif
├── jupyter-notebooks/
│   ├── privacy_policy_predictor.ipynb
│   └── web_Scraping.ipynb
├── .gitignore
├── CODE_OF_CONDUCT.md
├── LICENSE
└── README.md

Challenges we ran into

  • Collecting the good and bad policies for training our model was a time-consuming task.
  • We could not find a way to get a file-upload popup working in a Chrome extension.

How to contribute

  • Fork and clone the repository: git clone https://github.com/Open-Sourced-Olaf/DocVerifier
  • Create a branch: git checkout -b "branch_name"
  • Make your changes in that branch.
  • Add and commit your changes: git add . && git commit -m "your commit message"
  • Push the changes to your branch: git push origin branch_name
  • Create a PR from that branch against our repository.
  • 🎉 You have successfully contributed to this project.

Useful links and references

Contributors ✨

Shoutout goes to these wonderful people:


  • Anjali Soni 💻
  • Steven Tey 💻
  • Shrill Shrestha 💻
  • Rashi Sharma 💻

docverifier's Issues

UI changes for both the Chrome extension and the DFINITY web app

  • Currently, we display only the first bad sentence and then redirect the user to a separate page that lists the other bad sentences.
  • Can we improve on that to make it more user-friendly?
  • Can we modify the DOM of the privacy text, highlight all of those sentences, and direct the user to that page itself?
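
One way to explore the highlighting idea is to wrap each flagged sentence in a mark element before the text is rendered. The sketch below assumes the markup is produced in Python on the backend, which this issue does not specify; highlight_sentences is a hypothetical helper and the sample sentences are invented.

```python
import html


def highlight_sentences(text, flagged):
    """Escape text for HTML and wrap each flagged sentence in <mark> tags."""
    escaped = html.escape(text)
    for sentence in flagged:
        # Escape the sentence the same way so it still matches inside the escaped text.
        target = html.escape(sentence)
        escaped = escaped.replace(target, "<mark>" + target + "</mark>")
    return escaped


policy_text = "We value you. We sell your data. Bye."
highlighted = highlight_sentences(policy_text, ["We sell your data."])
print(highlighted)
```

Plain string replacement is the simplest option, but it would nest tags if one flagged sentence were contained in another, so a real implementation would likely work from character offsets instead.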

Create a Flask API

  • Create a Flask API with an endpoint that accepts a POST request carrying the current tab URL from the Chrome extension and passes it to the scraper to get the terms and conditions data.
  • We will store the terms and conditions data in a .txt file.
  • These .txt files will then be fed to the model to predict the output.

Filter the privacy texts before feeding them to the datasets

  • Currently, we use the nltk library to extract all the text made up of words found in the English dictionary.
  • However, the resulting sentences may not be correct.
  • We need to define our own sentence grammar for that.
  • We can also compare each sentence with the sentences in our existing dataset.
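
The two ideas in this issue, dictionary filtering and comparison against known sentences, could be prototyped as below. This is a sketch under assumptions: a tiny hard-coded word set stands in for the nltk word list, difflib.SequenceMatcher is just one possible similarity measure, and the function names and the 0.8 threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Tiny stand-in vocabulary; the project uses nltk's English word list instead.
ENGLISH_WORDS = {"we", "collect", "your", "data", "and", "share", "it", "with", "partners"}


def english_ratio(sentence, vocabulary):
    """Fraction of the sentence's words that appear in the vocabulary."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if not words:
        return 0.0
    return sum(word in vocabulary for word in words) / len(words)


def keep_sentences(sentences, vocabulary, threshold=0.8):
    """Keep sentences that are mostly made of known words (threshold is an arbitrary choice)."""
    return [s for s in sentences if english_ratio(s, vocabulary) >= threshold]


def closest_match(sentence, dataset):
    """Return the dataset sentence most similar to the given one, by character-level ratio."""
    return max(
        dataset,
        key=lambda known: SequenceMatcher(None, sentence.lower(), known.lower()).ratio(),
    )


scraped = ["We collect your data", "Menu Login xqz blorp", "We share it with partners"]
kept = keep_sentences(scraped, ENGLISH_WORDS)
print(kept)

known_sentences = ["We share your data with partners", "You can delete your account"]
print(closest_match("We share it with partners", known_sentences))
```

A word-ratio filter removes navigation debris and gibberish, but it cannot tell whether a sentence is grammatical, which is why the issue also proposes a custom sentence grammar.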

Add agreement/offer letter datasets

  • The model is currently trained only on bad and good privacy policies.
  • We need to add datasets for offer letters and other forms of long agreements so that it can make predictions for them too.

Predictions for cookies

  • We generally accept all cookies without reading about them.
  • If a user installs and pins this extension, it will be triggered immediately when they visit a website.
  • It will then scan the cookies and signal the result on the extension icon itself: a thumbs-up icon if nothing is wrong, or a thumbs-down icon otherwise.
