swastikgowdal / mod-guard

An open-source, one-stop shop for content moderation. It is technology-agnostic and provides several communication strategies out of the box!

License: MIT License

Languages: JavaScript 99.80%, Dockerfile 0.20%
Topics: malicious-url-detection, nsfw-detection, profanity-detection, profanity-filter


Mod-Guard

One-stop shop for all kinds of content moderation: imagery detection (NSFW, weapons, alcohol, drugs, gore and offensive symbols), profanity/bad-word detection and filtering, spam detection, spam email detection and malicious URL detection.

Mod-Guard is language-agnostic, meaning it doesn't matter which programming language you are using: since it implements different strategies/patterns for communicating with other applications, you can easily integrate it with your application.

Most of the tools I used to build this project are open source, so if you are interested, you can integrate them yourself. Mod-Guard is an abstraction over the complexity that comes with integrating multiple tools for all of these services: it gives you an easy way to add those services to your application and to communicate with them in several ways.

Advantages / Key-features of Mod-Guard

Features of Mod-Guard:

  • NSFW image detection
  • Alcohol image detection
  • Weapons image detection
  • Drugs image detection
  • Gore image detection
  • Offensive symbol image detection
  • Malicious URL detection
  • Profanity text detection
  • Profanity text filtering
  • Spam text detection
  • Spam email detection
  • Ability to provide metadata
  • Different communication strategies
  • NSFW.js for nudity detection (saves Sightengine API points)
  • Rate limiter

Installing

Note: You have to self-host this application; self-hosting is explained below as well.

Prerequisites: Node.js, npm and Git Bash installed on your computer. If the prerequisites are met, you can also run this application locally.

If you prefer a video walkthrough, watch this video.

Step 1: Fork the repo, then clone your fork.

Step 2: Navigate into the cloned repo, open a terminal there, and run the following command to install all the dependencies:

npm install

Step 3: Create a .env file in the root directory of the cloned repo, i.e. Mod-Guard/.env, where Mod-Guard is the cloned repo.

Getting Started / Usage

As mentioned before, Mod-Guard relies on various free, open-source and freemium services, so it needs to connect to those services to function. These services provide free API keys with some usage limits; specify those keys in the .env file and the application will use them.

  • In the .env file, specify the port you want to use:

    PORT=<your-desired-port>
    

    If not specified, it will use port 3001 by default

  • In the .env file, specify the authentication key you want to use, and make sure the client sends that key in the header of every request. Add it to the .env file like this:

    AUTH_KEY=<your-auth-key>
    

    This is a security measure. I have also implemented a rate limiter for basic protection against attacks such as DDoS and brute force, using the express-slow-down module, which slows responses down instead of rejecting them: if more than 10 requests are sent within 1 minute from the same IP address, subsequent requests are delayed by 500 ms. The rate limiter applies only to the API strategy endpoint (which also accepts strategy 2 requests); the dedicated API + Publisher/Consumer endpoint has no rate limiter. You can change the 10-request limit in rateLimiter.js in the middleware folder of this project.

  • For all imagery detection, I use the Sightengine API. Go to their website and sign up for their free service; they will give you an API user key and an API secret key. Specify those keys in the .env file like this:

    SIGHT_ENGINE_API_USER=<your-api-user-key>
    
    SIGHT_ENGINE_API_SECRET=<your-api-secret>
    
    BLOG=<your-blog-website-url>
    

    It's an awesome service, but the free tier limits the number of API requests, so if you plan to do a lot of imagery detection, I recommend buying a paid plan.

  • For malicious URL detection and spam email detection, I use the IPQualityScore service, which also offers a generous free tier with some limitations. Sign up on their website and they will give you an API key; specify it in the .env file like this:

    MALICIOUS_URL_SCANNER_KEY=<your-api-key>
    

    It's also an awesome service, but the free tier limits the number of API requests, so if you plan to do a lot of malicious URL detection, I recommend buying a paid plan.

  • For spam content detection, I use the Akismet API, which also offers a generous free tier with some limitations. Sign up on their website and they will email you an API key; specify it in the .env file like this:

    AKISMET_KEY=<your-api-key>
    

    It's also a good service, but the free tier may only be used for personal projects; for commercial projects, you have to buy a paid plan.

  • For the Publisher/Consumer and API + Publisher/Consumer strategies, I use the RabbitMQ message broker, hosted and maintained by a cloud provider called CloudAMQP. Go to their website, create an account, choose their free tier and create an instance (for more details on creating an instance, watch this video). Once you have created an instance, you will get the URL of that cloud instance/server; specify it in the .env file like this:

    RABBITMQ_SERVER=<your-rabbitmq-instance-url>
    

    This is not necessary if you are going to use only the API strategy, since it doesn't require a message broker.
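The slow-down rule described for AUTH_KEY above can be modelled in a few lines of JavaScript. This is a simplified sketch, not express-slow-down's implementation and not the code in rateLimiter.js: the function name is hypothetical, and it assumes a fixed one-minute window per IP address and a constant 500 ms delay for every request after the tenth.

```javascript
// Simplified model of the slow-down rule (NOT express-slow-down's implementation):
// each IP gets a fixed window; once it has made more than `delayAfter` requests
// in the current window, every further request in that window is delayed.
function createSlowDownModel({ windowMs = 60_000, delayAfter = 10, delayMs = 500 } = {}) {
  const hits = new Map(); // ip -> { windowStart, count }

  return function delayFor(ip, now = Date.now()) {
    let entry = hits.get(ip);
    if (!entry || now - entry.windowStart >= windowMs) {
      entry = { windowStart: now, count: 0 }; // start a fresh window for this IP
      hits.set(ip, entry);
    }
    entry.count += 1;
    return entry.count > delayAfter ? delayMs : 0; // delay in ms before handling
  };
}

module.exports = { createSlowDownModel };
```

In a real server the returned delay would be applied before the request is handled; express-slow-down does this for you, and its exact delay schedule depends on the module version and its configuration.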

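If you have followed the steps above, your .env file should now define PORT, AUTH_KEY, SIGHT_ENGINE_API_USER, SIGHT_ENGINE_API_SECRET, BLOG, MALICIOUS_URL_SCANNER_KEY, AKISMET_KEY and, if you use a message-broker strategy, RABBITMQ_SERVER.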
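If you would rather catch a missing key at startup than on the first request, a check along these lines could help. It is a hedged sketch, not part of Mod-Guard: the function names are hypothetical, it treats every service key listed in this section as required, requires RABBITMQ_SERVER only for the two message-broker strategies, and falls back to port 3001 when PORT is absent. BLOG is left out because its purpose is not described here.

```javascript
// Hypothetical startup check (not part of Mod-Guard) for the .env variables above.
const BASE_VARS = [
  "AUTH_KEY",
  "SIGHT_ENGINE_API_USER",
  "SIGHT_ENGINE_API_SECRET",
  "MALICIOUS_URL_SCANNER_KEY",
  "AKISMET_KEY",
];

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env, { useMessageBroker = false } = {}) {
  const required = useMessageBroker ? [...BASE_VARS, "RABBITMQ_SERVER"] : BASE_VARS;
  return required.filter((name) => !env[name] || env[name].trim() === "");
}

// PORT is optional; the README states that 3001 is the default.
function resolvePort(env) {
  const port = Number.parseInt(env.PORT ?? "", 10);
  return Number.isInteger(port) && port > 0 ? port : 3001;
}

module.exports = { missingEnvVars, resolvePort };
```

Calling missingEnvVars(process.env) after loading the .env file returns an empty array when everything is set.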

There are three strategies for communicating with Mod-Guard:

  1. API Strategy (REST APIs)
  2. API + Publisher/Consumer Strategy
  3. Publisher/Consumer Strategy (asynchronous messaging via a message broker)

Choose the appropriate strategy based on your use-case.

1. API Strategy

Using this strategy is just like communicating with any other REST API.

Request structure (multipart/form-data):

NOTE: Requests must be sent as multipart/form-data.

Parameters:

  • strategy (required): accepts either 1 or 2; 1 means the API strategy, 2 means the API + Publisher/Consumer strategy.
  • image_file: accepts an image file. Don't provide image_link along with it. The image must be smaller than 5 MB; you can change this limit in multer.js in the middleware folder.
  • image_moderation: accepts nudity, gore, wad (weapons, alcohol and drugs) or offensive; you can only choose from these moderation models.
  • image_link: accepts the complete URL/URI of an image, including http or https. Don't provide image_file along with it.
  • isProfane: accepts text and returns whether it is profane.
  • filter_profanity: accepts text with or without profanity/bad words and returns the text with the profanity filtered out.
  • isMalicious: accepts a website URL and returns whether it is malicious.
  • isSpam[content]: accepts text and detects whether it is spam. Detection also needs the sender's IP address and user agent. Perfect use case: comments on a post, sent along with the commenter's IP address and user agent.
  • isSpam[ip]: the IP address of the text's sender; needed to detect whether the text is spam.
  • isSpam[useragent]: the browser user agent of the text's sender; needed to detect whether the text is spam.
  • isSpam[name]: the name of the text's sender; optional, but improves accuracy.
  • isSpam[email]: the email of the text's sender; optional, but improves accuracy.
  • isSpamEmail: accepts any email address and determines whether it is marked as spam.
  • metadata: reference information the client can attach to a request. It can be of any type, is returned with the response, and is not stored by the server. Use case: with the API + Publisher/Consumer strategy, you may later need to know which request a result belongs to, so you can attach metadata to each request; it is propagated through and available at the consumer end.

For more information on using the API strategy, see the API documentation, which also provides implementations in different programming languages, along with explanations and example requests.
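To illustrate the request structure, here is a minimal Node.js 18+ client sketch (fetch, FormData and Blob are global there). It is not taken from the API documentation: the endpoint URL and the "auth" header name are hypothetical placeholders, and sending object values such as metadata as JSON text is an assumption. It does enforce the rules stated in the parameter list: strategy must be 1 or 2, image_file and image_link are mutually exclusive, and images must be under 5 MB.

```javascript
// Sketch of an API-strategy client. Hypothetical: the URL and the "auth" header
// name; take the real ones from the API documentation.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024; // mirrors the 5 MB limit described for multer.js

function buildModerationForm(fields) {
  const strategy = Number(fields.strategy);
  if (strategy !== 1 && strategy !== 2) {
    throw new Error("strategy must be 1 (API) or 2 (API + Publisher/Consumer)");
  }
  if (fields.image_file && fields.image_link) {
    throw new Error("Send either image_file or image_link, not both");
  }
  const form = new FormData();
  for (const [name, value] of Object.entries(fields)) {
    if (value === undefined || value === null) continue;
    if (name === "image_file") {
      // value must be a Blob, e.g. new Blob([fs.readFileSync("photo.jpg")])
      if (value.size >= MAX_IMAGE_BYTES) throw new Error("Image must be smaller than 5 MB");
      form.append(name, value, value.name ?? "image");
    } else if (name === "isSpam") {
      // { content, ip, useragent, name, email } -> isSpam[content], isSpam[ip], ...
      for (const [key, v] of Object.entries(value)) form.append(`isSpam[${key}]`, String(v));
    } else if (typeof value === "object") {
      form.append(name, JSON.stringify(value)); // assumption: objects travel as JSON text
    } else {
      form.append(name, String(value));
    }
  }
  return form;
}

async function moderate(fields, { url, authKey }) {
  const response = await fetch(url, {
    method: "POST",
    headers: { auth: authKey }, // hypothetical header name
    body: buildModerationForm(fields), // fetch sets the multipart boundary itself
  });
  if (!response.ok) throw new Error(`Mod-Guard responded with ${response.status}`);
  return response.json();
}

module.exports = { buildModerationForm, moderate };
```

Building the form separately from sending it keeps the validation testable without a running Mod-Guard server.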

Pros/Cons of using the API strategy:

  1. It is quite simple to use.
  2. If you send a lot of data for moderation, or a large image, the Mod-Guard server can take quite a while to respond, so latency is involved. The other two strategies were implemented to tackle this problem.

2. API + Publisher/Consumer Strategy

This strategy is for those who don't need the moderation result immediately and don't want to learn or integrate a message broker into their existing application. You send a REST API request with the data to Mod-Guard, which enqueues the data in the message queue and immediately responds that the data has been enqueued. The data is processed later, and the moderation result becomes available at the consumer end, where you can do whatever you want with it.

For example, say you have an application like Instagram, where users post images, text, etc. You can send those posts to Mod-Guard for moderation, but with the API strategy moderation may take too long, while the user expects an immediate answer on whether the post was published. With the API + Publisher/Consumer strategy, you send the data through the REST API, but instead of being processed immediately it is enqueued in the message queue. Enqueuing takes only around 20-30 ms, after which you receive a successful "enqueued" response and can tell the user that the post has been published. The consumers then dequeue the data, run the moderation and make the result available, so you can act on it: for example, delete the post and notify the user if it contains an NSFW image, or save it to your database.

Request structure (multipart/form-data):

NOTE: Requests must be sent as multipart/form-data.

Requests can be sent to either of two endpoints; see the API documentation for the endpoints.

  • One endpoint is the same as in the API strategy above, with the same request format, but make sure you set the strategy field to 2. (Important: this endpoint has the rate limiter, although you can change the rate limit value.)
  • The other endpoint is dedicated to this strategy; see the API documentation for it. (This endpoint has no rate limiter, since you will probably send many requests and each response is fast.)

To use this strategy, navigate to the root folder of the project, then to components/modGuard/consumers. By default it contains two instances of the same consumer, modGuardConsumer.js. Write your code at line 70 of these two files; that is where you can access the moderation result.

Mod-Guard/components/modGuard/consumers/modGuardConsumer.js

Pros/Cons of using the API + Publisher/Consumer strategy:

  1. This strategy suits only those who know JavaScript/Node.js, since you have to write JavaScript code to access the moderation result and act on it.
  2. With a few lines of JavaScript, you can also send the moderation result from the consumer back to your own server, which may be written in a different programming language.
  3. Since the data is enqueued in a persistent message queue, no data is lost, which makes this strategy reliable.
  4. You don't have to learn about message brokers.
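Sending the moderation result from the consumer back to your own server might look like the sketch below, placed where modGuardConsumer.js exposes the result. It is hypothetical: the helper name, the MODERATION_CALLBACK_URL variable and the assumption that the result is a JSON-serializable object are not part of Mod-Guard. The fetch function is injectable so the helper can be exercised without a network.

```javascript
// Hypothetical helper for modGuardConsumer.js: forwards a moderation result to your
// own backend (in any language) over HTTP. MODERATION_CALLBACK_URL is an assumed env
// variable, and the shape of `result` depends on what the consumer actually receives.
async function forwardModerationResult(result, {
  callbackUrl = process.env.MODERATION_CALLBACK_URL,
  fetchImpl = globalThis.fetch, // injectable so the helper can be tested offline
} = {}) {
  if (!callbackUrl) throw new Error("No callback URL configured");
  const response = await fetchImpl(callbackUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(result), // includes the request metadata if `result` carries it
  });
  if (!response.ok) throw new Error(`Callback failed with status ${response.status}`);
  return response.status;
}

module.exports = { forwardModerationResult };
```

Because the result travels as plain JSON over HTTP, the receiving server can be written in any language.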

3. Publisher/Consumer Strategy

In this strategy, a publisher is integrated into your existing application and connects to the message queue hosted on CloudAMQP. Mod-Guard itself is hosted on any cloud hosting platform (e.g. Heroku) and acts only as the consumer: it dequeues the data enqueued in the message queue, processes it, and makes the moderation result available at the consumer end. Just as with the API + Publisher/Consumer strategy, you write your own JavaScript code in modGuardConsumer.js to access the moderation result and do whatever you want with it.

If you are using JavaScript/Node.js, you can use the modGuardPublisher.js file in the publishers folder, located at Mod-Guard/components/modGuard/publishers/modGuardPublisher.js.
Copy this file into your code base, import it, and call the moderationDataPublisher() function with the moderation data as its argument. The data will be enqueued in the message queue, and the moderation result will be available at the consumer end. If you use this publisher file, make sure you install the amqplib and dotenv packages.

If you are not using JavaScript, go through this website to implement a publisher in your programming language, and use it to enqueue the moderation data in the "ModGuard" message queue.

Moderation data structure: the moderation data should have the same structure/format as the request in the API strategy. Instead of making an API request, you pass the moderation data, an object, as the argument to the moderationDataPublisher() function.
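For readers who want to see what such a publisher involves, here is a minimal sketch using amqplib, the package this README already requires for modGuardPublisher.js. It is a hypothetical stand-in, not the repo's moderationDataPublisher(): the function names are made up, and it assumes the consumer expects JSON-encoded messages on a durable "ModGuard" queue and that RABBITMQ_SERVER has been loaded into process.env (for example with dotenv).

```javascript
// Hypothetical stand-in for moderationDataPublisher(); not the repo's code.
// Assumptions: messages are JSON-encoded and the "ModGuard" queue is durable.
const QUEUE = "ModGuard";

function encodeModerationData(data) {
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    throw new TypeError("Moderation data must be a plain object");
  }
  if (data.image_file && data.image_link) {
    throw new Error("Provide either image_file or image_link, not both");
  }
  return Buffer.from(JSON.stringify(data));
}

async function publishModerationData(data, url = process.env.RABBITMQ_SERVER) {
  const body = encodeModerationData(data); // validate before opening a connection
  const amqp = require("amqplib"); // loaded lazily so encodeModerationData works without it
  const connection = await amqp.connect(url);
  try {
    const channel = await connection.createChannel();
    // durable must match how the consumer declares the queue, or RabbitMQ rejects it
    await channel.assertQueue(QUEUE, { durable: true });
    // persistent: true asks RabbitMQ to store the message on disk
    channel.sendToQueue(QUEUE, body, { persistent: true });
    await channel.close();
  } finally {
    await connection.close();
  }
}

module.exports = { encodeModerationData, publishModerationData };
```

Declaring the queue durable and marking messages persistent is what lets queued moderation data survive a broker restart, matching the reliability claim made for this strategy.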

Pros/Cons of using the Publisher/Consumer strategy:

  1. You need to know about message brokers, at least the basics of RabbitMQ.
  2. This strategy is highly recommended, since it involves little latency.
  3. This strategy and the second strategy are reliable thanks to the persistent message queue, so no data is lost.

Self-Hosting video

Examples

1. API strategy:

  • Example 1 - API-strategy-eg-1
  • Example 2 - API-strategy-eg-2

2. API + Publisher/Consumer strategy:

  • Example 1 - Api strategy + Publisher/Consumer strategy-eg-1
  • Example 2 - Api strategy + Publisher/Consumer strategy-eg-2

Help Getting Started

If you need any help getting started with Mod-Guard, contact me on Instagram (SwastikGowda) or LinkedIn (SwastikGowda), or email me at [email protected]

Docker Image

Below is the command to pull the Docker image, which is hosted on Docker Hub.

docker pull swastikgowda/mod-guard

NOTE: If you use the Docker image, don't forget to pass the environment variables in the docker-compose file available in this repo.

Contributing

Please read through our contributing guidelines. Included are directions for opening issues, coding standards, and notes on development.

Creator

Swastik Gowda

Acknowledgments 👍

Thanks to these awesome services/tools and technologies!

Copyright and license

Code and documentation copyright 2021 Swastik Gowda. Code released under the MIT License.
