importcjj / sensitive

Find, validate, filter and replace sensitive words 🤓 (FindAll, Validate, Filter and Replace).

License: MIT License

Go 100.00%
trie sensitive golang word filter keyword go dirtywords text trie-tree

sensitive's Introduction

Sensitive

Find, validate, filter and replace sensitive words (FindAll, Validate, Filter and Replace).

A new branch, Aho-Corasick, adds support for the Aho-Corasick (AC) automaton.
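For readers unfamiliar with the Aho-Corasick automaton: it extends a trie with failure links so the text is scanned only once while every match, including overlapping ones, is reported. The following is our own minimal sketch of the textbook algorithm with hypothetical names (acNode, buildAC, search); it is not the code on the Aho-Corasick branch.

```go
package main

import "fmt"

// acNode is a trie node extended with a failure link and an output list.
type acNode struct {
	next map[rune]*acNode
	fail *acNode  // node for the longest proper suffix of this path that is also in the trie
	out  []string // words ending here, including those reachable via failure links
}

func newACNode() *acNode { return &acNode{next: map[rune]*acNode{}} }

// buildAC inserts every word into a trie, then sets failure links breadth-first.
func buildAC(words []string) *acNode {
	root := newACNode()
	for _, w := range words {
		cur := root
		for _, r := range w {
			nx, ok := cur.next[r]
			if !ok {
				nx = newACNode()
				cur.next[r] = nx
			}
			cur = nx
		}
		cur.out = append(cur.out, w)
	}
	// Depth-1 nodes always fail back to the root.
	var queue []*acNode
	for _, child := range root.next {
		child.fail = root
		queue = append(queue, child)
	}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for r, child := range cur.next {
			// Follow failure links until some node can be extended by r.
			f := cur.fail
			for f != root && f.next[r] == nil {
				f = f.fail
			}
			if nx, ok := f.next[r]; ok {
				child.fail = nx
			} else {
				child.fail = root
			}
			// BFS order guarantees child.fail is shallower and its out list is complete.
			child.out = append(child.out, child.fail.out...)
			queue = append(queue, child)
		}
	}
	return root
}

// search scans text once and returns every match, including overlapping ones.
func search(root *acNode, text string) []string {
	var found []string
	cur := root
	for _, r := range text {
		for cur != root && cur.next[r] == nil {
			cur = cur.fail
		}
		if nx, ok := cur.next[r]; ok {
			cur = nx
		}
		found = append(found, cur.out...)
	}
	return found
}

func main() {
	root := buildAC([]string{"he", "she", "his", "hers"})
	fmt.Println(search(root, "ushers")) // [she he hers]
}
```

Unlike restarting a trie walk at every position, the failure links let the scan continue after a mismatch without moving backwards in the text.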

Usage

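package main

import (
	"fmt"

	"github.com/importcjj/sensitive"
)

func main() {
	filter := sensitive.New()
	// LoadWordDict returns an error if the dictionary file cannot be read
	if err := filter.LoadWordDict("path/to/dict"); err != nil {
		fmt.Println("failed to load dictionary:", err)
		return
	}
	// Do something
}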

AddWord

Adds a sensitive word.

filter.AddWord("垃圾")
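The repository's topics mention a trie, so one way to picture what AddWord does is the minimal rune-keyed trie below. It is an illustrative sketch with hypothetical names (node, addWord, hasWord), not the library's actual implementation. Keying children on runes rather than bytes keeps multi-byte Chinese characters intact.

```go
package main

import "fmt"

// node is one trie node; children are keyed by rune so multi-byte characters stay whole.
type node struct {
	children map[rune]*node
	end      bool // true if a dictionary word ends at this node
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// addWord walks the word rune by rune, creating missing nodes, and marks the last one.
func (n *node) addWord(word string) {
	cur := n
	for _, r := range word {
		next, ok := cur.children[r]
		if !ok {
			next = newNode()
			cur.children[r] = next
		}
		cur = next
	}
	cur.end = true
}

// hasWord reports whether word was added as a whole word (a mere prefix does not count).
func (n *node) hasWord(word string) bool {
	cur := n
	for _, r := range word {
		next, ok := cur.children[r]
		if !ok {
			return false
		}
		cur = next
	}
	return cur.end
}

func main() {
	root := newNode()
	root.addWord("垃圾")
	fmt.Println(root.hasWord("垃圾"), root.hasWord("垃")) // true false
}
```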

Replace

Replaces every character of a matched word with the given character. "Character" here means a rune; for example, * is passed as '*'.

filter.Replace("这篇文章真的好垃圾", '*')
// output => 这篇文章真的好**
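Assuming a trie like the one sketched for AddWord, replacement can work as follows: at each rune position, find the longest dictionary word starting there and overwrite each of its runes with the replacement rune. The helper names (matchAt, replace) and the longest-match rule are our assumptions, not confirmed library behavior.

```go
package main

import "fmt"

type node struct {
	children map[rune]*node
	end      bool // a dictionary word ends here
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// addWord inserts word into the trie rune by rune.
func (n *node) addWord(word string) {
	cur := n
	for _, r := range word {
		next, ok := cur.children[r]
		if !ok {
			next = newNode()
			cur.children[r] = next
		}
		cur = next
	}
	cur.end = true
}

// matchAt returns the rune length of the longest word starting at runes[i], or 0.
func (n *node) matchAt(runes []rune, i int) int {
	cur, longest := n, 0
	for j := i; j < len(runes); j++ {
		next, ok := cur.children[runes[j]]
		if !ok {
			break
		}
		cur = next
		if cur.end {
			longest = j - i + 1
		}
	}
	return longest
}

// replace overwrites every rune of each matched word with repl; other runes are untouched.
func (n *node) replace(text string, repl rune) string {
	runes := []rune(text)
	for i := 0; i < len(runes); {
		if l := n.matchAt(runes, i); l > 0 {
			for k := i; k < i+l; k++ {
				runes[k] = repl
			}
			i += l
		} else {
			i++
		}
	}
	return string(runes)
}

func main() {
	root := newNode()
	root.addWord("垃圾")
	fmt.Println(root.replace("这篇文章真的好垃圾", '*')) // 这篇文章真的好**
}
```

Working on a []rune rather than the raw string is what makes one '*' correspond to one Chinese character instead of one byte.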

Filter

Removes matched words from the text entirely.

filter.Filter("这篇文章真的好垃圾啊")
// output => 这篇文章真的好啊
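Filtering is the same scan as replacement, except matched runes are dropped instead of overwritten. A minimal sketch under the same assumptions (a rune-keyed trie, longest match at each position, hypothetical helper names):

```go
package main

import "fmt"

type node struct {
	children map[rune]*node
	end      bool // a dictionary word ends here
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// addWord inserts word into the trie rune by rune.
func (n *node) addWord(word string) {
	cur := n
	for _, r := range word {
		next, ok := cur.children[r]
		if !ok {
			next = newNode()
			cur.children[r] = next
		}
		cur = next
	}
	cur.end = true
}

// matchAt returns the rune length of the longest word starting at runes[i], or 0.
func (n *node) matchAt(runes []rune, i int) int {
	cur, longest := n, 0
	for j := i; j < len(runes); j++ {
		next, ok := cur.children[runes[j]]
		if !ok {
			break
		}
		cur = next
		if cur.end {
			longest = j - i + 1
		}
	}
	return longest
}

// filter drops every matched word and keeps all other runes in their original order.
func (n *node) filter(text string) string {
	runes := []rune(text)
	out := make([]rune, 0, len(runes))
	for i := 0; i < len(runes); {
		if l := n.matchAt(runes, i); l > 0 {
			i += l // skip the whole matched word
			continue
		}
		out = append(out, runes[i])
		i++
	}
	return string(out)
}

func main() {
	root := newNode()
	root.addWord("垃圾")
	fmt.Println(root.filter("这篇文章真的好垃圾啊")) // 这篇文章真的好啊
}
```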

FindIn

Finds and returns the first sensitive word; if there is none, returns false.

filter.FindIn("这篇文章真的好垃圾")
// output => true, 垃圾
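A sketch of a FindIn-style lookup on a rune-keyed trie: scan left to right and stop at the first position where some word starts. Returning (bool, string) mirrors the output shown above; the helper names and the longest-match rule are assumptions.

```go
package main

import "fmt"

type node struct {
	children map[rune]*node
	end      bool // a dictionary word ends here
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// addWord inserts word into the trie rune by rune.
func (n *node) addWord(word string) {
	cur := n
	for _, r := range word {
		next, ok := cur.children[r]
		if !ok {
			next = newNode()
			cur.children[r] = next
		}
		cur = next
	}
	cur.end = true
}

// matchAt returns the rune length of the longest word starting at runes[i], or 0.
func (n *node) matchAt(runes []rune, i int) int {
	cur, longest := n, 0
	for j := i; j < len(runes); j++ {
		next, ok := cur.children[runes[j]]
		if !ok {
			break
		}
		cur = next
		if cur.end {
			longest = j - i + 1
		}
	}
	return longest
}

// findIn returns true and the first word found (longest match at that position),
// or false and an empty string when the text contains no dictionary word.
func (n *node) findIn(text string) (bool, string) {
	runes := []rune(text)
	for i := range runes {
		if l := n.matchAt(runes, i); l > 0 {
			return true, string(runes[i : i+l])
		}
	}
	return false, ""
}

func main() {
	root := newNode()
	root.addWord("垃圾")
	fmt.Println(root.findIn("这篇文章真的好垃圾")) // true 垃圾
}
```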

Validate

Checks whether the content is OK. If it contains a sensitive word, returns false along with the first sensitive word.

filter.Validate("这篇文章真的好垃圾")
// output => false, 垃圾
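Validate is essentially FindIn with the boolean inverted: "valid" means no word was found. The sketch below builds it that way on a rune-keyed trie; the names (findIn, validate) are hypothetical and the relationship to the library's FindIn is our reading of the two outputs above.

```go
package main

import "fmt"

type node struct {
	children map[rune]*node
	end      bool // a dictionary word ends here
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// addWord inserts word into the trie rune by rune.
func (n *node) addWord(word string) {
	cur := n
	for _, r := range word {
		next, ok := cur.children[r]
		if !ok {
			next = newNode()
			cur.children[r] = next
		}
		cur = next
	}
	cur.end = true
}

// findIn returns the first word found (longest match at the leftmost matching position).
func (n *node) findIn(text string) (bool, string) {
	runes := []rune(text)
	for i := range runes {
		cur, longest := n, 0
		for j := i; j < len(runes); j++ {
			next, ok := cur.children[runes[j]]
			if !ok {
				break
			}
			cur = next
			if cur.end {
				longest = j - i + 1
			}
		}
		if longest > 0 {
			return true, string(runes[i : i+longest])
		}
	}
	return false, ""
}

// validate reports ok=true only when the text contains no dictionary word;
// otherwise it returns false plus the first offending word.
func (n *node) validate(text string) (ok bool, first string) {
	found, word := n.findIn(text)
	return !found, word
}

func main() {
	root := newNode()
	root.addWord("垃圾")
	fmt.Println(root.validate("这篇文章真的好垃圾")) // false 垃圾
}
```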

FindAll

Finds all sensitive words in the content and returns them as a slice.

filter.FindAll("这篇文章真的好垃圾")
// output => [垃圾]
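Collecting every match is the FindIn scan without the early return. In this sketch each distinct word is reported once, in order of first appearance; that de-duplication is a choice made here and is not confirmed for the library. Names are hypothetical.

```go
package main

import "fmt"

type node struct {
	children map[rune]*node
	end      bool // a dictionary word ends here
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// addWord inserts word into the trie rune by rune.
func (n *node) addWord(word string) {
	cur := n
	for _, r := range word {
		next, ok := cur.children[r]
		if !ok {
			next = newNode()
			cur.children[r] = next
		}
		cur = next
	}
	cur.end = true
}

// matchAt returns the rune length of the longest word starting at runes[i], or 0.
func (n *node) matchAt(runes []rune, i int) int {
	cur, longest := n, 0
	for j := i; j < len(runes); j++ {
		next, ok := cur.children[runes[j]]
		if !ok {
			break
		}
		cur = next
		if cur.end {
			longest = j - i + 1
		}
	}
	return longest
}

// findAll returns each distinct matched word once, in order of first appearance.
func (n *node) findAll(text string) []string {
	runes := []rune(text)
	seen := map[string]bool{}
	var words []string
	for i := 0; i < len(runes); {
		l := n.matchAt(runes, i)
		if l == 0 {
			i++
			continue
		}
		if w := string(runes[i : i+l]); !seen[w] {
			seen[w] = true
			words = append(words, w)
		}
		i += l // continue after the match, so matches do not overlap
	}
	return words
}

func main() {
	root := newNode()
	root.addWord("垃圾")
	fmt.Println(root.findAll("这篇文章真的好垃圾")) // [垃圾]
}
```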

LoadNetWordDict

Loads a word dictionary from the network.

filter.LoadNetWordDict("https://raw.githubusercontent.com/importcjj/sensitive/master/dict/dict.txt")

UpdateNoisePattern

Sets a noise pattern so that noise characters are ignored during matching.

// Before a noise pattern is set, the match fails:
filter.FindIn("这篇文章真的好垃x圾")      // false
filter.UpdateNoisePattern(`x`)
// After setting it, the match succeeds:
filter.FindIn("这篇文章真的好垃x圾")      // true, 垃圾
filter.Validate("这篇文章真的好垃x圾")    // false, 垃圾
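One simple way to get this behavior is to delete every match of the noise pattern from the text before searching the trie. The sketch below takes that approach; whether the library does exactly this is an assumption, and noisyFilter, updateNoisePattern and findIn are hypothetical names. A side effect of stripping is that a pattern like `x` also removes legitimate x characters before matching.

```go
package main

import (
	"fmt"
	"regexp"
)

type node struct {
	children map[rune]*node
	end      bool // a dictionary word ends here
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// addWord inserts word into the trie rune by rune.
func (n *node) addWord(word string) {
	cur := n
	for _, r := range word {
		next, ok := cur.children[r]
		if !ok {
			next = newNode()
			cur.children[r] = next
		}
		cur = next
	}
	cur.end = true
}

// findIn returns the first word found (longest match at the leftmost matching position).
func (n *node) findIn(text string) (bool, string) {
	runes := []rune(text)
	for i := range runes {
		cur, longest := n, 0
		for j := i; j < len(runes); j++ {
			next, ok := cur.children[runes[j]]
			if !ok {
				break
			}
			cur = next
			if cur.end {
				longest = j - i + 1
			}
		}
		if longest > 0 {
			return true, string(runes[i : i+longest])
		}
	}
	return false, ""
}

// noisyFilter pairs the trie with an optional noise pattern.
type noisyFilter struct {
	root  *node
	noise *regexp.Regexp
}

// updateNoisePattern compiles the pattern; regexp.MustCompile panics on invalid syntax.
func (f *noisyFilter) updateNoisePattern(pattern string) {
	f.noise = regexp.MustCompile(pattern)
}

// findIn strips all noise matches from text, then searches the trie.
func (f *noisyFilter) findIn(text string) (bool, string) {
	if f.noise != nil {
		text = f.noise.ReplaceAllString(text, "")
	}
	return f.root.findIn(text)
}

func main() {
	f := &noisyFilter{root: newNode()}
	f.root.addWord("垃圾")
	fmt.Println(f.findIn("这篇文章真的好垃x圾")) // false, with an empty word
	f.updateNoisePattern(`x`)
	fmt.Println(f.findIn("这篇文章真的好垃x圾")) // true 垃圾
}
```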

sensitive's People

Contributors

bbbht, importcjj, nlimpid

sensitive's Issues

Keyword filtering problem

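Keyword filtering with your library works fine in my local environment, but it stops working once deployed to the server.
On the server, a keyword lookup reports a successful match no matter what I enter, but the matched keyword is empty.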

The last character of a sentence is wrongly replaced

Code:
fmt.Println(filter.Replace("hello", 42))
fmt.Println(filter.Replace("hello world", 42))
fmt.Println(filter.Replace("人", 42))
fmt.Println(filter.Replace("不办人事 人", 42))

Output:
hell*
hello worl*
*


How to free a Trie and reload it

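Use case: at startup the program loads the dictionary into a variable named word, and later requests simply use that word pointer. However, the dictionary also has to be refreshed periodically, and calling AddWord again makes memory usage grow. How can I free the old tree first and then call AddWord again?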
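A common Go pattern for this kind of periodic refresh is to never mutate the dictionary in use: build a complete new one, then swap a shared reference atomically. Once nothing references the old one, the garbage collector reclaims it, so no explicit "free" step is needed. In the sketch below a map stands in for the trie to keep it short (wordSet, build, holder are hypothetical names); with the real library the stored value would presumably be a *sensitive.Filter created by sensitive.New(), but that is our assumption.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// wordSet stands in for a fully built filter; a map keeps the sketch short.
type wordSet map[string]struct{}

// build creates a brand-new dictionary instead of adding words to the one in use.
func build(words []string) wordSet {
	s := make(wordSet, len(words))
	for _, w := range words {
		s[w] = struct{}{}
	}
	return s
}

// holder publishes the current dictionary to concurrent readers.
type holder struct{ v atomic.Value }

// current returns the active dictionary; reload must have been called at least once.
func (h *holder) current() wordSet { return h.v.Load().(wordSet) }

// reload swaps in a new dictionary atomically. The holder drops its reference to the
// old one, which the garbage collector reclaims once in-flight readers are done with it.
func (h *holder) reload(words []string) { h.v.Store(build(words)) }

func main() {
	var h holder
	h.reload([]string{"垃圾"})
	_, hit := h.current()["垃圾"]
	fmt.Println(hit) // true
	h.reload([]string{"文章"})
	_, hit = h.current()["垃圾"]
	fmt.Println(hit) // false
}
```

Readers that already called current() keep using the old dictionary until their request finishes, which avoids locking on the hot path.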

Can text containing noise be filtered?

package wordpackage_test

import (
	"github.com/importcjj/sensitive"
	"testing"
)

// Example
func TestFilterExample(t *testing.T) {
	filter := sensitive.New()
	filter.AddWord("这句话应被屏蔽")

	actual := filter.Replace("这 句 话 应 被 屏 蔽", '*')
	expect := "*******"
	if actual != expect {
		t.Errorf("masking failed: got %q, want %q", actual, expect)
	}
}

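Above is the test case. If the text being filtered contains noise, the replacement fails; the usage in this test case is not supported yet. Could support for it be added?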

A more serious readme is required

Hello,

Your project is great; the code is easy to understand and use. However, the README reads as fun but not very professional.

I want to use this library but am a little worried about code review. Are you considering writing a README/docs for users from many other countries?

Thanks for your hard work, it helps a lot.

LoadWordDict cannot load the file

open ../dict/dict.txt: The system cannot find the path specified.

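Strangely, loading fails, and the error comes from ioutil.ReadFile(path). But when I wrote my own file-loading function using an absolute path, the file loaded successfully. I don't know where the problem is.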