text-censor's Introduction

Introduction

A simple, basic filter for Node.js that masks keywords censored by the GFW, implemented with a DFA (deterministic finite automaton).

Usage

var tc = require('text-censor')
tc.filter('Ur so sexy babe!',function(err, censored){
    console.log(censored) // 'Ur so ***y babe!'
})

To add keywords of your own, append them to the end of the 'keywords' file, one word per line.

Performance

Under 1 ms for a sentence of 10-20 words; around 10 ms for a text of about 1000 words.
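
These figures can be checked locally with a small timing harness. The sketch below is only an illustration: timeFilter is a hypothetical helper, and fakeFilter is a stand-in with the same (text, callback) shape as tc.filter in the Usage section, so the snippet runs without the package installed. Swap in tc.filter to measure the real filter.

```javascript
// Hypothetical helper: times one call to a callback-style filter function.
function timeFilter(filterFn, text, done) {
  var start = process.hrtime()
  filterFn(text, function (err, censored) {
    var diff = process.hrtime(start) // [seconds, nanoseconds] since start
    var elapsedMs = diff[0] * 1e3 + diff[1] / 1e6
    done(err, censored, elapsedMs)
  })
}

// Stand-in with the same (text, callback) shape as tc.filter; not the real filter.
function fakeFilter(text, cb) {
  cb(null, text.replace(/sex/g, '***'))
}

var result = null
timeFilter(fakeFilter, 'Ur so sexy babe!', function (err, censored, ms) {
  result = { censored: censored, ms: ms }
  console.log(censored, ms.toFixed(3) + ' ms')
})
```

A single call is noisy, so averaging over many runs gives a more trustworthy number.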

Thanks

Keyword list from https://github.com/observerss/textfilter

License

MIT

text-censor's People

Contributors

aojiaotage

text-censor's Issues

About execution


Already solved. The problem was in the log output; the text had in fact been replaced...
var tc = require('text-censor')
tc.filter('Ur so sexy babe!',function(err, censored){
    console.log(censored) // 'Ur so ***y babe!'
})

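Result when run with node: 'Ur so sexy babe!'
Result when run with mocha: 'Ur so ***y babe!'
npm -v 5.3.0
node -v v8.2.1
I don't understand what is going on. For now I only want to wrap this as an API for sensitive-word filtering: I don't need the replacement, I only need the list of illegal words. I wanted to modify it slightly, but now it no longer filters anything when run...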
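
For the use case described here (collecting the offending words instead of masking them), a trie walk can record each match rather than replace it. This is a minimal sketch, not a feature of text-censor: findKeywords is a hypothetical function, and the keyword list is passed in directly as an assumption instead of being read from the 'keywords' file.

```javascript
// Hypothetical helper, not part of text-censor: returns the keywords found in text.
function findKeywords(text, words) {
  // Build a trie: one nested object per character, isEnd marks a full keyword.
  var root = {}
  words.forEach(function (word) {
    var node = root
    for (var i = 0; i < word.length; i++) {
      if (!node[word[i]]) node[word[i]] = {}
      node = node[word[i]]
    }
    node.isEnd = true
  })

  var found = []
  var i = 0
  while (i < text.length) {
    var node = root
    var end = -1
    for (var j = i; j < text.length; j++) {
      node = node[text[j]]
      if (!node) break
      if (node.isEnd) { end = j; break } // stop at the shortest match
    }
    if (end >= 0) {
      found.push(text.slice(i, end + 1))
      i = end + 1 // continue after the matched keyword
    } else {
      i += 1
    }
  }
  return found
}

console.log(findKeywords('Ur so sexy babe!', ['sex', 'bad'])) // [ 'sex' ]
```

The function returns every occurrence in order, so repeated words appear more than once; deduplicate the result if only the distinct words are needed.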

Modify the filter method

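var fs = require('fs')

// Resolve the keywords file relative to this module, not the working directory.
var path = __dirname + '/keywords'

// Root of the keyword trie: one nested object per character.
var map = {}

var lineReader = require('readline').createInterface({
  input: fs.createReadStream(path, { encoding: 'UTF-8' })
})

// Each non-empty line of the file is one keyword.
lineReader.on('line', function (line) {
  if (!line) return
  addWord(line)
})

function addWord(word) {
  var parent = map

  for (var i = 0; i < word.length; i++) {
    if (!parent[word[i]]) parent[word[i]] = {}
    parent = parent[word[i]]
  }
  // Mark the node where a complete keyword ends.
  parent.isEnd = true
}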

// Resolves once the whole keywords file has been read. It is created at load
// time, so calls made after the 'close' event has fired still resolve.
var ready = new Promise(function (resolve) {
  lineReader.on('close', resolve)
})

function filter(s) {
  return ready.then(function () {
    var out = ''
    var i = 0

    while (i < s.length) {
      // Walk the trie from position i; always restart from the root.
      var parent = map
      var matched = 0

      for (var j = i; j < s.length; j++) {
        parent = parent[s[j]]
        if (!parent) break
        if (parent.isEnd) {
          // Stop at the first (shortest) keyword that ends here.
          matched = j - i + 1
          break
        }
      }

      if (matched > 0) {
        // Replace the keyword with one star per character.
        out += '*'.repeat(matched)
        i += matched
      } else {
        out += s[i]
        i++
      }
    }

    return out
  })
}

module.exports = {
  filter: filter
}

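About filtering and rendering

Hello, I am using your filter to censor sensitive words on my website.
My approach is to take the HTML string in the render callback and pass it through this package, but the problem is that many tags are now being filtered as well. Using it the way your example does works fine, of course, but my requirement is to filter the whole site. Here is the code: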

var express = require('express');
var router = express.Router();
var textCensor = require('text-censor');
console.log(textCensor)



router.get('/', function(req, res, next) {
	res.render('tpl/about', {
		title: '关于我们',
		routerName: 'about'
	},function(cerr,chtml){
		if(cerr){
			res.sendStatus(500);
			return;
		}
		textCensor.filter(chtml,function(tcerr,tcstr){
			if(tcerr){
				res.sendStatus(500);
				return;
			}
			console.log(tcstr)
			res.send(tcstr)
		})
	});
});

module.exports = router;
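
Passing a whole rendered page through the filter treats tag and attribute text as ordinary text, which is why markup gets masked. One possible workaround is to filter only the text between tags. The sketch below assumes a synchronous stand-in, censorText (hypothetical; the package's filter is callback-based), and uses a simple regular-expression split. It is not a full HTML parser and will not handle script bodies, comments, or a '>' inside an attribute value.

```javascript
// Hypothetical helper: applies censorText only to text outside of <...> tags.
function censorHtml(html, censorText) {
  // The capture group keeps the tags in the split result, at odd indexes.
  return html
    .split(/(<[^>]*>)/)
    .map(function (part, index) {
      return index % 2 === 1 ? part : censorText(part)
    })
    .join('')
}

// Stand-in censor for the demo; not the text-censor implementation.
function censorText(text) {
  return text.replace(/sex/g, '***')
}

console.log(censorHtml('<p class="sex">Ur so sexy babe!</p>', censorText))
// '<p class="sex">Ur so ***y babe!</p>'
```

For anything beyond simple pages, running the filter on the template data before rendering, or walking the text nodes with a real HTML parser, is likely to be more robust.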

The results are not good enough....

var tc = require('./index.js');
setTimeout(function () {

	tc.filter('Ur so sexy babe! 摸你鸡巴', function (err, censored) {
		console.error("result", censored);
	});

}, 20000)

result Ur so ***y babe! **鸡巴

Cannot load the keywords file

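In the source code, the line

var path = './keywords'

means that after installing the package via npm and loading it with

var tc = require('text-censor')

the keywords file is looked up relative to the directory the process was started from (usually the project root), so the keywords file inside the module's own directory is never found.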
