
jo3-l / obscenity


Robust, extensible profanity filter for NodeJS

License: MIT License

TypeScript 99.48% JavaScript 0.52%
profanity profane obscene swearing antiswear swearwords swear-filtering swear-filter bad-words obscenity

obscenity's Introduction

About me

Hi! I'm Joseph Liu, an incoming CS student at the University of Waterloo.

If you're reading this, you probably know me in real life (👀) or through my involvement with the YAGPDB Discord bot, where I am an administrator and longtime contributor.

Current and past projects

Recently, I've been working on a language server and some related static analysis tools for YAGPDB's templating language in Rust. Some of my older projects include:

If any of these happen to align with your interests—or if you'd just like to get in touch—I'm always happy to talk. My email is jliu1602 [at] gmail.com; I am also active on Discord as jo3_l.

A slightly longer version of this introduction is on my website.

Some stats

(courtesy of GitHub Readme Stats)

obscenity's People

Contributors

hatscripts, jo3-l, renovate[bot]

Stargazers


Watchers


Forkers

hatscripts

obscenity's Issues

If the obscene word is at the beginning the censoring is not recovered

Expected behavior

Typing "fu" censors the word into "**" but continuing on to typing "funtastic" yields a "ntastic". It's similar to "as" filtering to "" but having "assembly" yields "**sembly".

Actual behavior

"funtastic" is rendered as "**ntastic", and "assembly" as "**sembly".

Minimal reproducible example

No response

Steps to reproduce

No response

Additional context

No response

Node.js version

v20.11.1

Obscenity version

v0.3.1

Priority

  • Low
  • Medium
  • High

Terms

  • I agree to follow the project's Code of Conduct.
  • I have searched existing issues for similar reports.

bug: Strange input results in false positive

Expected behavior

When I input the following string:

    "" ""
    "" ""
    "" ""
Assamese -> Assam

I expect that there should be no censoring.

Actual behavior

However, Assam becomes A*sam.

Strangely, modifying parts of the string, such as the quotes ("), results in no censoring.

Minimal reproducible example

import {
  RegExpMatcher,
  TextCensor,
  englishDataset,
  englishRecommendedTransformers,
  keepStartCensorStrategy,
  keepEndCensorStrategy,
  asteriskCensorStrategy
} from 'obscenity'

const matcher = new RegExpMatcher({
  ...englishDataset.build(),
  ...englishRecommendedTransformers
})

const strategy = keepStartCensorStrategy(keepEndCensorStrategy(asteriskCensorStrategy()))
const censor = new TextCensor().setStrategy(strategy)

const input = `    "" ""
    "" ""
    "" ""
Assamese -> Assam`

const matches = matcher.getAllMatches(input)
console.log(censor.applyTo(input, matches))

Steps to reproduce

  1. Run the above code
  2. View console

Additional context

No response

Node.js version

N/A

Obscenity version

0.2.0

Priority

  • Low
  • Medium
  • High

Terms

  • I agree to follow the project's Code of Conduct.
  • I have searched existing issues for similar reports.

request: French language support

Description

I'm working on a project that requires some french support. I saw https://github.com/darwiin/french-badwords-list/tree/master being adapted for https://github.com/jojoee/leo-profanity and was thinking of doing the same thing. I like how extensible this library is.

Solution

Similar to english.ts, the idea is to import and extract the array from https://github.com/darwiin/french-badwords-list/tree/master and build a dataset. I can work on a PR for it but can someone point me in the right direction for writing a test for this?

Code of Conduct

  • I agree to follow this project's Code of Conduct.

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Warning

These dependencies are deprecated:

Datasource Name Replacement PR?
npm standard-version Available

Other Branches

These updates are pending. To force PRs open, click the checkbox below.

  • chore(deps): replace dependency standard-version with commit-and-tag-version ^9.5.0

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Ignored or Blocked

These are blocked by an existing closed PR and will not be recreated unless you click a checkbox below.

Detected dependencies

github-actions
.github/workflows/codeql-analysis.yml
  • actions/checkout v4
  • github/codeql-action v3
  • github/codeql-action v3
  • github/codeql-action v3
.github/workflows/continuous-integration.yml
  • actions/checkout v4
  • pnpm/action-setup v4
  • actions/setup-node v4
  • actions/checkout v4
  • pnpm/action-setup v4
  • actions/setup-node v4
  • codecov/codecov-action v4
  • actions/checkout v4
  • pnpm/action-setup v4
  • actions/setup-node v4
npm
package.json
  • @commitlint/cli ^18.0.0
  • @commitlint/config-angular ^18.0.0
  • @jest/types ^29.5.0
  • @types/jest ^29.5.2
  • @typescript-eslint/eslint-plugin ^8.0.0
  • @typescript-eslint/parser ^8.0.0
  • conventional-github-releaser ^3.1.5
  • eslint ^8.57.0
  • eslint-config-prettier ^9.1.0
  • eslint-plugin-jest ^27.9.0
  • eslint-plugin-prettier ^4.2.1
  • fast-check ^2.25.0
  • gen-esm-wrapper ^1.1.3
  • is-ci ^3.0.1
  • jest ^29.7.0
  • jest-circus ^29.5.0
  • prettier ^2.8.8
  • rimraf ^6.0.0
  • standard-version ^9.5.0
  • ts-jest ^29.1.1
  • ts-node ^10.9.1
  • typedoc ^0.25.0
  • typedoc-plugin-markdown ^3.15.3
  • typescript ^5.2.2
  • node >=14.0.0
  • pnpm 9.6.0

  • Check this box to trigger a request for Renovate to run again on this repository

bug: Certain words not being censored

Expected behavior

Inputting all of the EnglishProfaneWord values, I expected all of them to be censored.

Actual behavior

As you can see on this CodePen, not all of the words get censored.

Minimal reproducible example

import {
  RegExpMatcher,
  TextCensor,
  englishDataset,
  englishRecommendedTransformers,
  keepStartCensorStrategy,
  keepEndCensorStrategy,
  asteriskCensorStrategy
} from 'obscenity'

const matcher = new RegExpMatcher({
  ...englishDataset.build(),
  ...englishRecommendedTransformers
})

const strategy = keepStartCensorStrategy(keepEndCensorStrategy(asteriskCensorStrategy()))
const censor = new TextCensor().setStrategy(strategy)

const words = 'abbo abeed africoon anal anus arabush arse ass bastard bestiality bitch blowjob boob boonga buttplug chingchong chink cock cuck cum cunt deepthroat dick dildo doggystyle double penetration ejaculate fag felch fellatio finger bang fisting fuck gangbang handjob hentai hooker incest jerk off jizz lubejob masturbate nigger orgasm orgy penis porn pussy rape retard scat semen sex slut tit tranny vagina whore'

const matches = matcher.getAllMatches(words)
console.log(censor.applyTo(words, matches))

Steps to reproduce

  1. View console
  2. Observe that not all words are censored

Additional context

Here is a less minimal CodePen with an input and output textarea: https://codepen.io/HatScripts/pen/NWJxEKW

Node.js version

N/A

Obscenity version

v0.1.4

Priority

  • Low
  • Medium
  • High

Terms

  • I agree to follow the project's Code of Conduct.
  • I have searched existing issues for similar reports.

Fix Typescript Types when using NodeNext module resolution

Hi! 👋

Firstly, thanks for your work on this project! 🙂

Today I used patch-package to patch [email protected] for the project I'm working on.

Using NodeNext as the typescript moduleResolution causes the types to be unresolved.

Here is the diff that solved my problem:

diff --git a/node_modules/obscenity/package.json b/node_modules/obscenity/package.json
index 899188c..580449a 100644
--- a/node_modules/obscenity/package.json
+++ b/node_modules/obscenity/package.json
@@ -6,8 +6,14 @@
   "module": "./dist/index.mjs",
   "types": "./dist/index.d.ts",
   "exports": {
-    "import": "./dist/index.mjs",
-    "require": "./dist/index.js"
+    "import": {
+      "types": "./dist/index.d.ts",
+      "default":"./dist/index.mjs"
+    },
+    "require": {
+      "types": "./dist/index.d.ts",
+      "default": "./dist/index.js"
+    }
   },
   "repository": {
     "type": "git",

This issue body was partially generated by patch-package.
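For reference, the resolution mode that surfaces the problem is enabled in tsconfig.json like so (minimal excerpt):

```jsonc
{
  "compilerOptions": {
    "module": "NodeNext",
    "moduleResolution": "NodeNext"
  }
}
```

Under NodeNext, TypeScript resolves types through the "exports" map rather than the top-level "types" field, which is why adding explicit "types" conditions in the diff above fixes resolution.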

Question around performance

I'm considering using the library to process a lot of text, so I'm wondering whether performance has been considered in the library's code and testing. It would be interesting to add some information about performance to the readme.
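In the absence of documented benchmarks, a small self-contained timing harness can give a first estimate; this is a generic sketch (the `benchmark` helper is hypothetical, not part of obscenity), into which one could pass e.g. `(text) => matcher.getAllMatches(text)`:

```javascript
// Time a function over repeated runs and report the average ms per call.
function benchmark(fn, input, runs = 1000) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < runs; i++) fn(input);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return elapsedMs / runs;
}

// Demo with a trivial regex as a stand-in for a real matcher:
const avg = benchmark((s) => /badword/.test(s), 'some long chat message', 1000);
console.log(`${avg.toFixed(4)} ms per call`);
```

Average-of-many-runs smooths out JIT warm-up; for serious measurements a dedicated benchmarking tool would be preferable.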

bug: Unable to ban numbers

Expected behavior

Using this pattern with numbers

pattern`|666|`

I expect that to be matched

Actual behavior

It is not matched.

Minimal reproducible example

import {
  DataSet,
  RegExpMatcher,
  englishRecommendedTransformers,
  pattern,
} from "obscenity";

const customDataset = new DataSet();
customDataset.addPhrase((phrase) => phrase.setMetadata({ originalWord: "666" }).addPattern(pattern`|666|`));
const matcher = new RegExpMatcher({
        ...customDataset.build(),
        ...englishRecommendedTransformers
      });

matcher.getAllMatches("666").length //=> 0
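As a point of comparison (this is plain regex, not obscenity's pattern syntax), JavaScript's `\b` word boundary does treat digits as word characters, so a standalone digit sequence can be matched this way:

```javascript
// \b counts digits as word characters, so "666" standing alone matches,
// while "666" embedded in a longer digit run does not.
const banned = /\b666\b/;
console.log(banned.test('666'));        // true
console.log(banned.test('number 666')); // true
console.log(banned.test('16666'));      // false
```

This suggests the report is about how obscenity's `|` boundary assertions handle digits, not about regular expressions in general.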

Steps to reproduce

  1. Use that code
  2. See 0, but expected 1

Additional context

No response

Node.js version

v16.6.1

Obscenity version

0.1.1

Priority

  • Low
  • Medium
  • High

Terms

  • I agree to follow the project's Code of Conduct.
  • I have searched existing issues for similar reports.

Package obscenity has been ignored because it contains invalid configuration. Reason: Package subpath './package.json' is not defined by "exports"

Expected behavior

I installed this package via npm for use in my React Native 0.68.5 app and expected it to import without errors.

Actual behavior

I get this error on my react native app:

Package obscenity has been ignored because it contains invalid configuration. Reason: Package subpath './package.json' is not defined by "exports" in /Users/gabriel/Desktop/proyectos/MyApp/node_modules/obscenity/package.json
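Metro (React Native's bundler) resolves ./package.json through the "exports" map, which the published package does not expose. Until that is fixed upstream, a common workaround is patching node_modules/obscenity/package.json (e.g. via patch-package) to add the subpath; a sketch of the patched excerpt, assuming the exports block shown in the error:

```jsonc
{
  "exports": {
    ".": {
      "import": "./dist/index.mjs",
      "require": "./dist/index.js"
    },
    "./package.json": "./package.json"
  }
}
```

Note that conditions ("import"/"require") must be nested under "." once subpath keys like "./package.json" are introduced, since Node does not allow mixing the two forms at the top level.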

Minimal reproducible example

npm install obscenity

Steps to reproduce

npm install obscenity

Additional context

My react native version:

"react-native": "0.68.5",

Node.js version

16.0.0

Obscenity version

0.2.0

Priority

  • Low
  • Medium
  • High

Terms

  • I agree to follow the project's Code of Conduct.
  • I have searched existing issues for similar reports.

bug: Kung Fu false positive

Expected behavior

matcher.hasMatch('Kung-Fu') returns false

Actual behavior

matcher.hasMatch('Kung-Fu') returns true

Minimal reproducible example

import assert from 'node:assert'

import {
  englishDataset,
  englishRecommendedTransformers,
  RegExpMatcher,
} from 'obscenity'

const matcher = new RegExpMatcher({
  ...englishDataset.build(),
  ...englishRecommendedTransformers,
})

assert.equal(matcher.hasMatch('Kung-Fu'), false)
assert.equal(matcher.hasMatch('Kung Fu'), false)
assert.equal(matcher.hasMatch('Kung Fu Panda'), false)

// This one actually works
assert.equal(matcher.hasMatch('KungFu'), false)

Steps to reproduce

  1. Run the code above
  2. It fails with an assertion error

Additional context

No response

Node.js version

v20.15.0

Obscenity version

0.2.1

Priority

  • Low
  • Medium
  • High

Terms

  • I agree to follow the project's Code of Conduct.
  • I have searched existing issues for similar reports.

bug: Using .addPhrase with Angular script optimization causes error that prevents Angular from bootstrapping

Expected behavior

I expected .addPhrase to include my added obscene term and build properly.

Actual behavior

Using .addPhrase in an Angular component and optimization: scripts = true in angular.json's build config causes the following error:
Cannot read properties of undefined (reading 'Literal');

This error prevented Angular from bootstrapping

Minimal reproducible example

Add this block to angular.json; "scripts": true is the setting that triggers the issue. It seems to be related to minification.

"optimization": {
  "styles": { "minify": true, "inlineCritical": true },
  "scripts": true,
  "fonts": true
},

Import the package to a component.
import { DataSet, RegExpMatcher, englishDataset, englishRecommendedTransformers, pattern } from 'obscenity';

In ngOnInit, add custom phrases to the DataSet

    const customDataSet = new DataSet()
      .addAll(englishDataset)
      .addPhrase((phrase) => phrase.addPattern(pattern`|damn|`))
      .addPhrase((phrase) => phrase.addPattern(pattern`|hell|`).addWhitelistedTerm('hello'));

    this.matcher = new RegExpMatcher({
      ...customDataSet.build(),
      ...englishRecommendedTransformers,
    });

...

Steps to reproduce

NOTE: This works fine in v0.1.4

  • Configure your Angular project to optimize scripts (minify) in the angular.json file
  • Using an Angular component...
  • Import englishDataset and DataSet
  • In ngOnInit, create a custom dataset that starts with the englishDataset
  • Add custom phrases to that DataSet using addPhrase => addPattern
  • Notice error: Cannot read properties of undefined (reading 'Literal');

Additional context

No response

Node.js version

v18.12.1

Obscenity version

v0.2.0

Priority

  • Low
  • Medium
  • High

Terms

  • I agree to follow the project's Code of Conduct.
  • I have searched existing issues for similar reports.

bug: Censoring of the n-word results in more asterisks than expected

Expected behavior

matcher.getAllMatches('nigger') returns an array of length 1, and the censored string is n****r.

Actual behavior

matcher.getAllMatches('nigger') returns an array of length 2, which causes the censored string to be n*********r.

Screenshot 2024-01-05 171832
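The doubled asterisks are consistent with two overlapping matches being censored independently. As a workaround sketch (assuming match objects carry inclusive startIndex/endIndex fields, as obscenity's match payloads do), overlapping intervals can be merged before applying the censor:

```javascript
// Merge overlapping/adjacent match intervals so each character of the
// original text is censored at most once.
function mergeIntervals(matches) {
  const sorted = [...matches].sort((a, b) => a.startIndex - b.startIndex);
  const merged = [];
  for (const m of sorted) {
    const last = merged[merged.length - 1];
    if (last && m.startIndex <= last.endIndex + 1) {
      last.endIndex = Math.max(last.endIndex, m.endIndex);
    } else {
      merged.push({ ...m });
    }
  }
  return merged;
}

// Two overlapping matches over the same 6-letter word (indices inclusive):
const overlapping = [
  { startIndex: 0, endIndex: 5 },
  { startIndex: 0, endIndex: 4 },
];
console.log(mergeIntervals(overlapping)); // one interval covering 0-5
```

Feeding the merged intervals to the censor would yield one replacement character per original character.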

Minimal reproducible example

import {
  RegExpMatcher,
  TextCensor,
  englishDataset,
  englishRecommendedTransformers,
  keepStartCensorStrategy,
  keepEndCensorStrategy,
  asteriskCensorStrategy
} from 'obscenity'

const matcher = new RegExpMatcher({
  ...englishDataset.build(),
  ...englishRecommendedTransformers
})

const strategy = keepStartCensorStrategy(keepEndCensorStrategy(asteriskCensorStrategy()))
const censor = new TextCensor().setStrategy(strategy)

const input = 'nigger'

const matches = matcher.getAllMatches(input)
console.log(matches)
console.log(censor.applyTo(input, matches))

Steps to reproduce

  1. Run above code
  2. View console

Additional context

No response

Node.js version

N/A

Obscenity version

v0.1.4

Priority

  • Low
  • Medium
  • High

Terms

  • I agree to follow the project's Code of Conduct.
  • I have searched existing issues for similar reports.

Emoji is not supported when a profanity is found next to that character

Expected behavior

Using obscenity to censor a string containing an emoji, such as 🤣bummer, with a dataset that contains the word bummer, and removing profanities with this strategy:

const CENSOR_STRATEGY = (censorContext) => ''.repeat(censorContext.matchLength);

the expected output would be 🤣.

Actual behavior

Instead, the output is this: 🤣b. The word bummer is matched correctly, but when the censor applies the matches, the indices are off.
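For context on why indices can drift here: 🤣 is a single code point outside the Basic Multilingual Plane, so JavaScript strings store it as two UTF-16 code units. Any component that counts code points while another slices by code units will be off by one per such character. A quick check:

```javascript
// '🤣' (U+1F923) is encoded as a surrogate pair in JavaScript strings,
// so .length counts 2 code units while spreading counts 1 code point.
const emoji = '🤣';
console.log(emoji.length);      // 2 (UTF-16 code units)
console.log([...emoji].length); // 1 (code points)

// Position of 'b' in '🤣bummer' differs by counting scheme:
const s = '🤣bummer';
console.log(s.indexOf('b'));    // 2 in code units, though it is the 2nd code point
```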

Minimal reproducible example

const {
  englishDataset,
  parseRawPattern,
  DataSet,
  RegExpMatcher,
  TextCensor,
} = require('obscenity');

const data = new DataSet()
    .addAll(englishDataset)
    .addPhrase(phrase =>
      phrase
        .setMetadata({ originalWord: 'bummer' })
        .addPattern(parseRawPattern('bummer'))
    ).build();

const matcher = new RegExpMatcher({
    ...data, // no transformers
});
const textCensor = new TextCensor();

const stringBummer = '🤣bummer';
if (matcher.hasMatch(stringBummer)) {
  const matches = matcher.getAllMatches(stringBummer, true);
  console.log(textCensor.applyTo(stringBummer, matches));
} else {
  console.log(stringBummer);
}

Steps to reproduce

No response

Additional context

No response

Node.js version

18.17.1

Obscenity version

0.3.1

Priority

  • Low
  • Medium
  • High

Terms

  • I agree to follow the project's Code of Conduct.
  • I have searched existing issues for similar reports.

Request: Symbols that can represent multiple letters

Description

I've tried with a few combinations of EnglishTransformers, but I haven't been able to correctly censor words like sh*t or f*ck. In both cases, the words should be censored; however, in the first word * represents an i and in the second a u. Is there a way to create a new transformer that covers multiple letters, or a regex?

Solution

I do not know how this can be implemented. Looking at the L33tspeak transformer, I can see there's a map per character:

	['a', '@4'],
	['c', '('],
	['e', '3'],
	['i', '1|'],
	['o', '0'],
	['s', '$'],

However, I don't know how it would work for multiple characters where for example, we could have

	['*', 'any_letter_or_vowel_etc.'],
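Outside of obscenity's transformer API, one way to treat * as a stand-in for any letter is to build a regular expression from the target word, where each letter may alternatively be a literal asterisk. This is an illustrative sketch (the `wildcardPattern` helper is hypothetical and assumes the word contains only letters):

```javascript
// Build a regex where each letter of the word may instead be '*', so
// "sh*t" and "f*ck" match their uncensored targets.
function wildcardPattern(word) {
  const body = [...word].map((ch) => `(?:${ch}|\\*)`).join('');
  return new RegExp(`\\b${body}\\b`, 'i');
}

console.log(wildcardPattern('shit').test('sh*t')); // true
console.log(wildcardPattern('fuck').test('f*ck')); // true
console.log(wildcardPattern('shit').test('shot')); // false
```

A transformer-based solution inside obscenity would instead have to map one input character to a set of candidate letters, which is why the single-character map shown above does not extend to this case directly.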

Code of Conduct

  • I agree to follow this project's Code of Conduct.

request: Censor the word "shit"

Description

I was surprised to realize that this library doesn't censor the word "shit" by default, given that it's one of the most common English swear words.

Solution

I'm not fully versed with the pattern syntax used by this project, but here's my attempt at implementing it:

.addPhrase((phrase) => 
  phrase
    .setMetadata({ originalWord: 'shit' })
    .addPattern(pattern`shit`)
    .addWhitelistedTerm('s hit')
    .addWhitelistedTerm('sh it')
    .addWhitelistedTerm('shi t')
    .addWhitelistedTerm('shitake')
)

This should cover words where "shit-" is the prefix ("shitty", "shite", "shithead", etc.), as well as words where "-shit" is the suffix ("bullshit", "dipshit", "batshit", etc.)

Code of Conduct

  • I agree to follow this project's Code of Conduct.

bug: Memory leak when using an empty string

Expected behavior

Proper error message

Actual behavior

JavaScript heap out of memory

❯ node index.js 

<--- Last few GCs --->

[79252:0x4d7f380]    21093 ms: Mark-sweep (reduce) 4080.9 (4142.9) -> 4080.7 (4141.9) MB, 1573.5 / 0.0 ms  (+ 1.8 ms in 2 steps since start of marking, biggest step 1.8 ms, walltime since start of marking 1584 ms) (average mu = 0.146, current mu = 0.098) [79252:0x4d7f380]    21095 ms: Scavenge 4082.3 (4142.4) -> 4081.3 (4143.4) MB, 1.2 / 0.0 ms  (average mu = 0.146, current mu = 0.098) allocation failure 


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xafedf0 node::Abort() [/home/jeremy/.asdf/installs/nodejs/16.6.1/bin/node]
 2: 0xa1814d node::FatalError(char const*, char const*) [/home/jeremy/.asdf/installs/nodejs/16.6.1/bin/node]
 3: 0xce795e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/home/jeremy/.asdf/installs/nodejs/16.6.1/bin/node]
 4: 0xce7cd7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/home/jeremy/.asdf/installs/nodejs/16.6.1/bin/node]
 5: 0xeb16b5  [/home/jeremy/.asdf/installs/nodejs/16.6.1/bin/node]
 6: 0xeb21a4  [/home/jeremy/.asdf/installs/nodejs/16.6.1/bin/node]
 7: 0xec0617 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/home/jeremy/.asdf/installs/nodejs/16.6.1/bin/node]
 8: 0xec39cc v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/jeremy/.asdf/installs/nodejs/16.6.1/bin/node]
 9: 0xe862ec v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/home/jeremy/.asdf/installs/nodejs/16.6.1/bin/node]
10: 0x11f3156 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/home/jeremy/.asdf/installs/nodejs/16.6.1/bin/node]
11: 0x15c9ed9  [/home/jeremy/.asdf/installs/nodejs/16.6.1/bin/node]
Aborted (core dumped)

Minimal reproducible example

import {
  DataSet,
  RegExpMatcher,
  englishRecommendedTransformers,
  pattern,
  parseRawPattern,
} from "obscenity";

const customDataset = new DataSet();
const bannedChatWords = [""];
bannedChatWords.forEach((item, _idx) => {
  const word = item.toLowerCase();
  customDataset.addPhrase((phrase) => {
    return phrase
      .setMetadata({ originalWord: word })
      .addPattern(parseRawPattern(word))
      .addPattern(pattern`|${word}`)
      .addPattern(pattern`${word}|`)
  });
});

const customMatcher = new RegExpMatcher({
  ...customDataset.build(),
  ...englishRecommendedTransformers
});

function messageViolation(message) {
  return  customMatcher.getAllMatches(message).length > 0;
}

console.log("test", messageViolation("test"))

Steps to reproduce

  1. Save that code to index.js
  2. run node index.js
  3. ...
  4. Profit?

Additional context

The words come from user generated content. My app was improperly storing an empty string. When the code would try to dynamically generate the banned word list with an empty string in the mix, it would tank the site.
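Until empty patterns are rejected by the library itself, guarding the user-generated list before it reaches the dataset sidesteps the hang; a minimal sketch (the `sanitizeWordList` helper is hypothetical):

```javascript
// Drop empty and whitespace-only entries from a user-generated ban list
// before they are turned into patterns, since an empty string is what
// triggers the out-of-memory behavior in the report.
function sanitizeWordList(words) {
  return words
    .map((w) => w.trim().toLowerCase())
    .filter((w) => w.length > 0);
}

console.log(sanitizeWordList(['Test', '', '  ', 'Word'])); // ['test', 'word']
```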

Node.js version

v16.6.1

Obscenity version

obscenity@^0.1.1:
version "0.1.1"

Priority

  • Low
  • Medium
  • High

Terms

  • I agree to follow the project's Code of Conduct.
  • I have searched existing issues for similar reports.
