jojoee / leo-profanity

:tiger: Profanity filter, based on "Shutterstock" dictionary

Home Page: https://jojoee.github.io/leo-profanity/

License: MIT License

leo-profanity

Profanity filter, based on "Shutterstock" dictionary. Demo page, API document page

Installation

// npm
npm install leo-profanity
npm install leo-profanity --no-optional # install only English bad word dictionary

// yarn
yarn add leo-profanity
yarn add leo-profanity --ignore-optional # install only English bad word dictionary

// Bower
bower install leo-profanity
// dictionary/default.json

// githack
<script src="https://raw.githack.com/jojoee/bahttext/master/src/index.js"></script>
const filter = LeoProfanity
filter.clearList()
filter.add(["boobs", "butt"])

Example usage for npm

// support languages
// - en
// - fr
// - ru

var filter = require('leo-profanity');

// output: I have ****, etc.
filter.clean('I have boob, etc.');

// replace current dictionary with the french
filter.loadDictionary('fr');

// create new dictionary
filter.addDictionary('th', ['หนึ่ง', 'สอง', 'สาม', 'สี่', 'ห้า'])

See more in the LeoProfanity documentation.

Algorithm

This project splits the problem into two parts, Sanitize and Filter. The attempts considered for each part are described below.

Sanitize

Attempt 1 (1.1): Convert all into lowercase string
Example:
- "SomeThing" to "something"
Advantage:
- Simple to understand
- Simple to implement
Disadvantage or Caution:
- Loses case distinctions, so case-sensitive words can no longer be told apart
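
A minimal sketch of attempt 1.1, assuming a hypothetical helper named `toLowerText` (this is an illustration, not the library's source):

```javascript
// Hypothetical helper: lowercase the input before any dictionary lookup.
function toLowerText(text) {
  return text.toLowerCase();
}

console.log(toLowerText('SomeThing')); // "something"
```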

Attempt 2 (1.2): Turn look-alike symbols into letters
Example:
- "@" to "a"
- "5" or "$" to "s"
- "@ss" to "ass"
- "b00b" to "boob"
- "a$$a$$in" to "assassin"
Advantage:
- Detects some trick words
Disadvantage or Caution:
- False positives
- Subjective: it depends on how each person reads a symbol
- Limits user creativity (users cannot play with words)
  e.g. "[email protected]"
  e.g. a user who wants to write something funny like "a$$a$$in"
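
The mapping in attempt 1.2 could be sketched as below. The table `LOOKALIKES` and the function `replaceLookalikes` are hypothetical names; the table holds only the pairs from the examples above, plus "0" to "o" as implied by "b00b".

```javascript
// Assumed look-alike table, built only from the examples in this section.
const LOOKALIKES = { '@': 'a', '5': 's', '$': 's', '0': 'o' };

// Replace every character that has an entry in the table; keep the rest.
function replaceLookalikes(text) {
  return text
    .split('')
    .map((ch) => LOOKALIKES[ch] || ch)
    .join('');
}

console.log(replaceLookalikes('@ss'));      // "ass"
console.log(replaceLookalikes('b00b'));     // "boob"
console.log(replaceLookalikes('a$$a$$in')); // "assassin"
console.log(replaceLookalikes('Room 505')); // "Room sos"
```

The last call shows the caution listed above: ordinary digits are rewritten too, which is one source of false positives.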

Attempt 3 (1.3): Replace "." and "," with space to separate words
In some sentences, people use "." and "," to connect words or to end the sentence
Example:
- "I like a55,b00b.t1ts" to "I like a55 b00b t1ts"
Advantage:
- Increases the chance of finding profanity words e.g. "I like a55,b00b.t1ts"
Disadvantage or Caution:
- Disconnects some words e.g. "[email protected]"
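
Attempt 1.3 amounts to a single replacement. The sketch below assumes a hypothetical function name, `separatePunctuation`:

```javascript
// Replace every "." and "," with a space so that joined words split apart.
function separatePunctuation(text) {
  return text.replace(/[.,]/g, ' ');
}

console.log(separatePunctuation('I like a55,b00b.t1ts')); // "I like a55 b00b t1ts"
console.log(separatePunctuation('example.com'));          // "example com"
```

The second call illustrates the caution: text that legitimately contains a dot, such as a domain name, is broken into separate words.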

Filter

Attempt 1 (2.1): Split into array (or using regex)
Split the string on spaces into an array of words, then check each word against the profanity word list
Example:
- "I like ass boob" to ["I", "like", "ass", "boob"]
Advantage:
- Simple to implement
Disadvantage:
- Needs a proper list of profanity words
- Some false positives e.g. Great tit (https://en.wikipedia.org/wiki/Great_tit)
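
A sketch of attempt 2.1 follows. `SAMPLE_WORDS` is a tiny made-up list for illustration (not the Shutterstock dictionary), and `findListedWords` is a hypothetical name.

```javascript
// Tiny sample list, assumed for this example only.
const SAMPLE_WORDS = new Set(['ass', 'boob', 'tit']);

// Split on whitespace and keep the words that appear in the list.
function findListedWords(text) {
  return text.split(/\s+/).filter((word) => SAMPLE_WORDS.has(word));
}

console.log(findListedWords('I like ass boob'));    // [ 'ass', 'boob' ]
console.log(findListedWords('a great tit landed')); // [ 'tit' ]
```

The second call reproduces the "Great tit" false positive: a whole-word match cannot tell the bird from the insult.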

Attempt 2 (2.2): Filter word inside (with or without space)
Detect any run of characters that contains a profanity word
Example:
- "thistextisfunnyboobsanda55" which contains suspicious words: "boobs", "a55"
Advantage:
- Can detect "un-spaced" profanity words
Disadvantage:
- Many false positives e.g. http://www.morewords.com/contains/ass/, Clbuttic mistake (filter mistake)
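
Attempt 2.2 can be sketched as a substring search. `SAMPLE_LIST` and `findWordsInside` are hypothetical names, and the list is a made-up sample.

```javascript
// Made-up sample list for illustration.
const SAMPLE_LIST = ['boobs', 'a55', 'ass'];

// Report every listed word that occurs anywhere inside the text.
function findWordsInside(text) {
  const lower = text.toLowerCase();
  return SAMPLE_LIST.filter((word) => lower.includes(word));
}

console.log(findWordsInside('thistextisfunnyboobsanda55')); // [ 'boobs', 'a55' ]
console.log(findWordsInside('classic'));                    // [ 'ass' ]
console.log('classic'.replace('ass', 'butt'));              // "clbuttic"
```

The last two lines show the "Clbuttic mistake": a harmless word matches because it merely contains a listed word, and a naive replacement then mangles it.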

In Summary

  • We don't know every way a profanity word can be written (e.g. how many different ways can you enter a55 ?)
  • There is no algorithm-based approach that fully achieves it (yet)
  • People will always find a way to connect with each other (e.g. Leet)

So this project goes with 1.1, 1.3 and 2.1.

(note - you can find other attempts in the "Reference" section)
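
Combining 1.1, 1.3 and 2.1 could look roughly like the sketch below. It is an illustration under assumptions, not the library's actual implementation: `SAMPLE_DICTIONARY`, `sanitizeText` and `containsListedWord` are hypothetical names, and the word list is a made-up sample.

```javascript
// Made-up sample list for illustration.
const SAMPLE_DICTIONARY = new Set(['boob', 'ass']);

// 1.1 lowercase, then 1.3 replace "." and "," with spaces.
function sanitizeText(text) {
  return text.toLowerCase().replace(/[.,]/g, ' ');
}

// 2.1 split on whitespace and look each word up in the list.
function containsListedWord(text) {
  return sanitizeText(text)
    .split(/\s+/)
    .some((word) => SAMPLE_DICTIONARY.has(word));
}

console.log(containsListedWord('I have BOOB, etc.'));    // true
console.log(containsListedWord('a classic mistake'));    // false
console.log(containsListedWord('I like a55,b00b.t1ts')); // false
```

The last call returns false because 1.2 (symbol replacement) is left out, which is the trade-off this combination accepts in exchange for fewer false positives.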

CMD

npm run test.watch
npm run validate
npm run doc.generate

# test npm publish
npm publish --dry-run

# mutation test
npm install -g stryker-cli
stryker init
export STRYKER_DASHBOARD_API_KEY=<the_project_api_token>
echo $STRYKER_DASHBOARD_API_KEY
npx stryker run

Other languages

Reference

leo-profanity's People

Contributors

adamgammell, d0rianb, dependabot[bot], dzencot, greenkeeper[bot], jojoee, raujimenez, robertbullen, stubbo

leo-profanity's Issues

readme.md typo

it says filter.check() in one of the examples as a title but the following code says filter.clean() which i believe doesn't return a boolean answer.

index.spec.js tests not ran

Hi,

I'm working on a PR to add french dictionary to your module, but can't get the index.spec.js to work. Running npm test only run Utils.spec.js test.

Could you help me ?

Support multiple languages

Please support ability to use custom dictionaries with bad words for different languages or add more languages

How to Handle profanity in Quotes?

It seems our "users" are starting to notice that if they put a quote around a bad word it won't get filtered.
Is there an option to handle those situations?

The automated release is failing 🚨

🚨 The automated release from the master branch failed. 🚨

I recommend you give this issue a high priority, so other packages depending on you could benefit from your bug fixes and new features.

You can find below the list of errors reported by semantic-release. Each one of them has to be resolved in order to automatically publish your package. I’m sure you can resolve this 💪.

Errors are usually caused by a misconfiguration or an authentication problem. With each error reported below you will find explanation and guidance to help you to resolve it.

Once all the errors are resolved, semantic-release will release your package the next time you push a commit to the master branch. You can also manually restart the failed CI job that runs semantic-release.

If you are not sure how to resolve this, here is some links that can help you:

If those don’t help, or if this issue is reporting something you think isn’t right, you can always ask the humans behind semantic-release.


Invalid npm token.

The npm token configured in the NPM_TOKEN environment variable must be a valid token allowing to publish to the registry https://registry.npmjs.org/.

If you are using Two-Factor Authentication, make configure the auth-only level is supported. semantic-release cannot publish with the default auth-and-writes level.

Please make sure to set the NPM_TOKEN environment variable in your CI with the exact value of the npm token.


Good luck with your project ✨

Your semantic-release bot 📦🚀

[QUESTION] Multi-language

Does var filter = require('leo-profanity'); supports multiple language by default or should I create 2 var and load two dictionary?

Webpacked

Is there anyway you could make this for a js online application

Count words filtered

It's possible to have the words was filtered ?
This makes it possible to know if the text contains vulgar words, to be able to tag the content as with a particular state.
