nlpchina / nlp-lang
This project is a base package that wraps the utilities most commonly used in NLP projects.
License: Apache License 2.0
The test code is as follows:
TagContent tw = new TagContent("<em>", "</em>");
String content = "abc123";
List<Keyword> keywords = new ArrayList<Keyword>();
keywords.add(new Keyword("abc12", 1.0));
System.out.println(tw.tagContent(keywords, content));
The output is: <em>abc12</em>
svn: Failed to add directory 'E:\Search\eclipse-luna\workspace\nlp-lang\src\main\java\org\nlpcn\commons\lang\finger\util.svn': object of the same name as the administrative directory. The cause is the .svn files under src\main\java\org\nlpcn\commons\lang\finger\util\
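乾坤 => 千坤
This one really burned me.
I found that the simplified/traditional conversion automatically applies a dictionary.
Is there an option to turn off this dictionary step?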
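For example, with code like this:
for (int i = 0; i < 100000; i++) {
    Pinyin.insertPinyin("疾风传", new String[]{"ji", "feng", "zhuan"});
}
Does each later insert overwrite the earlier one, or do the entries keep accumulating until memory runs out?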
It should be Trie, not Tire, right? Looks like a typo.
Is there a Golang version, one that supports polyphonic characters? Thanks!
$ java -version
java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)
$ mvn compile
OK.
$ mvn test: FAILED; see below.
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.nlpcn:nlp-lang:jar:0.2
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-jar-plugin is missing. @ line 78, column 12
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building nlp-lang 0.2
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ nlp-lang ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 3 resources
[INFO]
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ nlp-lang ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ nlp-lang ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /Users/william/Documents/AtWork/github/nlp-lang/src/test/resources
[INFO]
[INFO] --- maven-compiler-plugin:2.3.2:testCompile (default-testCompile) @ nlp-lang ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ nlp-lang ---
[INFO] Surefire report directory: /Users/william/Documents/AtWork/github/nlp-lang/target/surefire-reports
Running org.nlpcn.commons.lang.dat.DATMakerTest
Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.106 sec <<< FAILURE!
test(org.nlpcn.commons.lang.dat.DATMakerTest) Time elapsed: 0.043 sec <<< ERROR!
java.lang.NullPointerException
at org.nlpcn.commons.lang.dat.DATMaker.maker(DATMaker.java:76)
at org.nlpcn.commons.lang.dat.DATMakerTest.test(DATMakerTest.java:14)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
loadTest(org.nlpcn.commons.lang.dat.DATMakerTest) Time elapsed: 0.007 sec <<< ERROR!
java.io.FileNotFoundException: 生成模型的路径 (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:131)
at java.io.FileInputStream.<init>(FileInputStream.java:87)
at org.nlpcn.commons.lang.dat.DoubleArrayTire.load(DoubleArrayTire.java:31)
at org.nlpcn.commons.lang.dat.DATMakerTest.loadTest(DATMakerTest.java:31)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
Running org.nlpcn.commons.lang.dat.DATTest
Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 0.051 sec <<< FAILURE!
loadTextTest(org.nlpcn.commons.lang.dat.DATTest) Time elapsed: 0.012 sec <<< ERROR!
java.lang.NullPointerException
at java.io.Reader.<init>(Reader.java:78)
at java.io.InputStreamReader.<init>(InputStreamReader.java:97)
at org.nlpcn.commons.lang.util.IOUtil.getReader(IOUtil.java:63)
at org.nlpcn.commons.lang.util.FileIterator.<init>(FileIterator.java:24)
at org.nlpcn.commons.lang.util.IOUtil.instanceFileIterator(IOUtil.java:200)
at org.nlpcn.commons.lang.dat.DoubleArrayTire.loadText(DoubleArrayTire.java:70)
at org.nlpcn.commons.lang.dat.DoubleArrayTire.loadText(DoubleArrayTire.java:57)
at org.nlpcn.commons.lang.dat.DoubleArrayTire.loadText(DoubleArrayTire.java:93)
at org.nlpcn.commons.lang.dat.DATTest.loadTextTest(DATTest.java:29)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
loadTest(org.nlpcn.commons.lang.dat.DATTest) Time elapsed: 0.016 sec <<< ERROR!
java.io.FileNotFoundException: /home/ansj/公共的/pinyin.obj (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:131)
at java.io.FileInputStream.<init>(FileInputStream.java:87)
at org.nlpcn.commons.lang.dat.DoubleArrayTire.load(DoubleArrayTire.java:31)
at org.nlpcn.commons.lang.dat.DATTest.loadTest(DATTest.java:21)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
makerTest(org.nlpcn.commons.lang.dat.DATTest) Time elapsed: 0.02 sec <<< ERROR!
java.lang.NullPointerException
at org.nlpcn.commons.lang.dat.DATMaker.maker(DATMaker.java:76)
at org.nlpcn.commons.lang.dat.DATMaker.maker(DATMaker.java:48)
at org.nlpcn.commons.lang.dat.DATTest.makerTest(DATTest.java:11)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
Running org.nlpcn.commons.lang.finger.FingerprintServiceTest
76cebd01faa63f38b45ea9756d26872c
76cebd01faa63f38b45ea9756d26872c
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.558 sec
Running org.nlpcn.commons.lang.index.MemoryIndexTest
[**]
[**]
[**]
[**]
init ok use time 547
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.538 sec
Running org.nlpcn.commons.lang.jianfan.JianFanTest
草莓是红色的
士多啤棃是紅色的
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.015 sec
Running org.nlpcn.commons.lang.pinyin.PinyinTest
[ma3, wan2, dai4, ma3, ,, ta1, qi3, shen1, guan1, shang4, dian4, nao3, ,, yong4, gun3, tang4, de5, kai1, shui3, wei2, zi4, ji3, pao4, zhi4, yi1, wan3, teng2, zhe5, re4, qi4, de5, lao3, tan2, suan1, cai4, mian4, 。, zhong1, guo2, de5, cheng2, xu4, yuan2, geng4, pian1, ai4, la1, shang4, chuang1, lian2, ,, zai4, hei1, an4, zhong1, xiang3, shou4, zhei4, du2, te4, de, mei3, shi2, 。, zhei4, shi4, xian4, dai4, gong1, ye4, ji3, yi1, tian1, xin1, ku3, lao2, zuo4, de5, ren2, zui4, hao3, de5, kui4, zeng4, 。, nan2, fang1, yi1, dai4, sheng1, chang2, de5, cheng2, xu4, yuan2, sui1, ran2, zai4, jing1, cheng2, duo1, nian2, ,, dan4, reng2, kou3, wei4, qing1, dan4, ,, ta1, men2, wang3, wang3, bu4, jia1, liao4, bao1, ,, you2, lian3, jia2, zi4, ran2, tang3, xia4, de, re4, lei4, bu3, chong1, qia4, dang1, de5, yan2, fen1, 。, ta1, men2, xiang1, xin4, ,, yong4, zhei4, zhong3, fang1, shi4, ,, neng2, gou4, mo3, ping2, si1, kao3, zhe5, xian4, zai4, shi4, bu4, shi4, guo4, qu4, xiang3, yao4, de5, wei4, lai2, er2, dai4, lai2, de5, da4, bu4, fen1, you1, shang1, …, xiao3, li3, de5, fu4, qin1, zai4, nian2, qing1, de5, shi2, hou4, ye3, shi4, cong2, ye2, ye2, shou3, li3, jie1, shou1, le5, zu3, chuan2, de5, dai4, ma3, ,, bu4, guo4, ling4, ren2, jing1, ya4, de, shi4, ,, dao4, le, xiao3, li3, zhei4, yi1, dai4, ,, hen3, duo1, dong1, xi1, dou1, yi2, shi1, le5, ,, dan4, shi4, cheng2, xu4, yuan2, ku3, bi1, de5, wei4, dao4, bao3, cun2, de, shi4, ru2, ci3, de5, wan2, zheng3, 。, , jiu4, zai4, 2, 4, xiao3, shi2, zhi1, qian2, ,, zui4, xin1, de5, xu1, qiu2, cong2, P, M, chu3, chuan2, lai2, ,, wei2, le, de2, dao4, zhei4, fen4, zi4, ran2, de5, kui4, zeng4, ,, ma3, nong2, men5, kai1, ji1, 、, xie3, ma3, 、, diao4, shi4, 、, zhong4, gou4, ,, si4, ji4, lun2, hui2, de5, deng3, dai4, huan4, lai2, zhei4, nan2, de2, de5, feng1, shou1, shi2, ke4, 。, ma3, nong2, zhi1, dao4, ,, xu1, qiu2, de, bao3, xian1, qi1, zhi1, you3, duan3, duan3, de5, liang3, tian1, ,, ma3, nong2, men5, yao4, yi3, zui4, kuai4, de5, su4, du4, dui4, dai4, 
ma3, jin4, xing2, jing1, zhi4, de5, jia1, gong1, ,, ren4, he2, yi1, ge4, xu1, qiu2, dou1, ke3, neng2, zai4, 2, 4, xiao3, shi2, zhi1, hou4, shi1, qu4, yuan2, ben3, de5, huo2, li4, ,, bian4, cheng2, yi1, wen2, bu4, zhi2, de5, la1, ji1, chuang4, yi4, 。]
[ma, wan, dai, ma, ,, ta, qi, shen, guan, shang, dian, nao, ,, yong, gun, tang, de, kai, shui, wei, zi, ji, pao, zhi, yi, wan, teng, zhe, re, qi, de, lao, tan, suan, cai, mian, 。, zhong, guo, de, cheng, xu, yuan, geng, pian, ai, la, shang, chuang, lian, ,, zai, hei, an, zhong, xiang, shou, zhei, du, te, de, mei, shi, 。, zhei, shi, xian, dai, gong, ye, ji, yi, tian, xin, ku, lao, zuo, de, ren, zui, hao, de, kui, zeng, 。, nan, fang, yi, dai, sheng, chang, de, cheng, xu, yuan, sui, ran, zai, jing, cheng, duo, nian, ,, dan, reng, kou, wei, qing, dan, ,, ta, men, wang, wang, bu, jia, liao, bao, ,, you, lian, jia, zi, ran, tang, xia, de, re, lei, bu, chong, qia, dang, de, yan, fen, 。, ta, men, xiang, xin, ,, yong, zhei, zhong, fang, shi, ,, neng, gou, mo, ping, si, kao, zhe, xian, zai, shi, bu, shi, guo, qu, xiang, yao, de, wei, lai, er, dai, lai, de, da, bu, fen, you, shang, …, xiao, li, de, fu, qin, zai, nian, qing, de, shi, hou, ye, shi, cong, ye, ye, shou, li, jie, shou, le, zu, chuan, de, dai, ma, ,, bu, guo, ling, ren, jing, ya, de, shi, ,, dao, le, xiao, li, zhei, yi, dai, ,, hen, duo, dong, xi, dou, yi, shi, le, ,, dan, shi, cheng, xu, yuan, ku, bi, de, wei, dao, bao, cun, de, shi, ru, ci, de, wan, zheng, 。, , jiu, zai, 2, 4, xiao, shi, zhi, qian, ,, zui, xin, de, xu, qiu, cong, P, M, chu, chuan, lai, ,, wei, le, de, dao, zhei, fen, zi, ran, de, kui, zeng, ,, ma, nong, men, kai, ji, 、, xie, ma, 、, diao, shi, 、, zhong, gou, ,, si, ji, lun, hui, de, deng, dai, huan, lai, zhei, nan, de, de, feng, shou, shi, ke, 。, ma, nong, zhi, dao, ,, xu, qiu, de, bao, xian, qi, zhi, you, duan, duan, de, liang, tian, ,, ma, nong, men, yao, yi, zui, kuai, de, su, du, dui, dai, ma, jin, xing, jing, zhi, de, jia, gong, ,, ren, he, yi, ge, xu, qiu, dou, ke, neng, zai, 2, 4, xiao, shi, zhi, hou, shi, qu, yuan, ben, de, huo, li, ,, bian, cheng, yi, wen, bu, zhi, de, la, ji, chuang, yi, 。]
[m, w, d, m, ,, t, q, s, g, s, d, n, ,, y, g, t, d, k, s, w, z, j, p, z, y, w, t, z, r, q, d, l, t, s, c, m, 。, z, g, d, c, x, y, g, p, a, l, s, c, l, ,, z, h, a, z, x, s, z, d, t, d, m, s, 。, z, s, x, d, g, y, j, y, t, x, k, l, z, d, r, z, h, d, k, z, 。, n, f, y, d, s, c, d, c, x, y, s, r, z, j, c, d, n, ,, d, r, k, w, q, d, ,, t, m, w, w, b, j, l, b, ,, y, l, j, z, r, t, x, d, r, l, b, c, q, d, d, y, f, 。, t, m, x, x, ,, y, z, z, f, s, ,, n, g, m, p, s, k, z, x, z, s, b, s, g, q, x, y, d, w, l, e, d, l, d, d, b, f, y, s, …, x, l, d, f, q, z, n, q, d, s, h, y, s, c, y, y, s, l, j, s, l, z, c, d, d, m, ,, b, g, l, r, j, y, d, s, ,, d, l, x, l, z, y, d, ,, h, d, d, x, d, y, s, l, ,, d, s, c, x, y, k, b, d, w, d, b, c, d, s, r, c, d, w, z, 。, , j, z, 2, 4, x, s, z, q, ,, z, x, d, x, q, c, P, M, c, c, l, ,, w, l, d, d, z, f, z, r, d, k, z, ,, m, n, m, k, j, 、, x, m, 、, d, s, 、, z, g, ,, s, j, l, h, d, d, d, h, l, z, n, d, d, f, s, s, k, 。, m, n, z, d, ,, x, q, d, b, x, q, z, y, d, d, d, l, t, ,, m, n, m, y, y, z, k, d, s, d, d, d, m, j, x, j, z, d, j, g, ,, r, h, y, g, x, q, d, k, n, z, 2, 4, x, s, z, h, s, q, y, b, d, h, l, ,, b, c, y, w, b, z, d, l, j, c, y, 。]
[zheng4, pin3, xing2, huo4, !]
[zheng4, pin3, hang2, huo4, !]
[ma3, wan2, dai4, ma3, ,, ta1, qi3, shen1, guan1, shang4, dian4, nao3, ,, yong4, gun3, tang4, de5, kai1, shui3, wei2, zi4, ji3, pao4, zhi4, yi1, wan3, teng2, zhe5, re4, qi4, de5, lao3, tan2, suan1, cai4, mian4, 。, zhong1, guo2, de5, cheng2, xu4, yuan2, geng4, pian1, ai4, la1, shang4, chuang1, lian2, ,, zai4, hei1, an4, zhong1, xiang3, shou4, zhei4, du2, te4, de, mei3, shi2, 。, zhei4, shi4, xian4, dai4, gong1, ye4, ji3, yi1, tian1, xin1, ku3, lao2, zuo4, de5, ren2, zui4, hao3, de5, kui4, zeng4, 。, nan2, fang1, yi1, dai4, sheng1, chang2, de5, cheng2, xu4, yuan2, sui1, ran2, zai4, jing1, cheng2, duo1, nian2, ,, dan4, reng2, kou3, wei4, qing1, dan4, ,, ta1, men2, wang3, wang3, bu4, jia1, liao4, bao1, ,, you2, lian3, jia2, zi4, ran2, tang3, xia4, de, re4, lei4, bu3, chong1, qia4, dang1, de5, yan2, fen1, 。, ta1, men2, xiang1, xin4, ,, yong4, zhei4, zhong3, fang1, shi4, ,, neng2, gou4, mo3, ping2, si1, kao3, zhe5, xian4, zai4, shi4, bu4, shi4, guo4, qu4, xiang3, yao4, de5, wei4, lai2, er2, dai4, lai2, de5, da4, bu4, fen1, you1, shang1, …, xiao3, li3, de5, fu4, qin1, zai4, nian2, qing1, de5, shi2, hou4, ye3, shi4, cong2, ye2, ye2, shou3, li3, jie1, shou1, le5, zu3, chuan2, de5, dai4, ma3, ,, bu4, guo4, ling4, ren2, jing1, ya4, de, shi4, ,, dao4, le, xiao3, li3, zhei4, yi1, dai4, ,, hen3, duo1, dong1, xi1, dou1, yi2, shi1, le5, ,, dan4, shi4, cheng2, xu4, yuan2, ku3, bi1, de5, wei4, dao4, bao3, cun2, de, shi4, ru2, ci3, de5, wan2, zheng3, 。, , jiu4, zai4, 2, 4, xiao3, shi2, zhi1, qian2, ,, zui4, xin1, de5, xu1, qiu2, cong2, P, M, chu3, chuan2, lai2, ,, wei2, le, de2, dao4, zhei4, fen4, zi4, ran2, de5, kui4, zeng4, ,, ma3, nong2, men5, kai1, ji1, 、, xie3, ma3, 、, diao4, shi4, 、, zhong4, gou4, ,, si4, ji4, lun2, hui2, de5, deng3, dai4, huan4, lai2, zhei4, nan2, de2, de5, feng1, shou1, shi2, ke4, 。, ma3, nong2, zhi1, dao4, ,, xu1, qiu2, de, bao3, xian1, qi1, zhi1, you3, duan3, duan3, de5, liang3, tian1, ,, ma3, nong2, men5, yao4, yi3, zui4, kuai4, de5, su4, du4, dui4, dai4, 
ma3, jin4, xing2, jing1, zhi4, de5, jia1, gong1, ,, ren4, he2, yi1, ge4, xu1, qiu2, dou1, ke3, neng2, zai4, 2, 4, xiao3, shi2, zhi1, hou4, shi1, qu4, yuan2, ben3, de5, huo2, li4, ,, bian4, cheng2, yi1, wen2, bu4, zhi2, de5, la1, ji1, chuang4, yi4, 。]
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.014 sec
Running org.nlpcn.commons.lang.standardization.SentencesUtilTest
**
123.1
你好。
123
hello
word
.
hello
word
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.017 sec
Running org.nlpcn.commons.lang.tire.splitWord.SmartGetWordTest
android 3
java 3
**人 3
0
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec
Running org.nlpcn.commons.lang.util.StringUtilTest
true
hello ansj
'ansj','2134','123','123','123'
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec
Running org.nlpcn.commons.lang.util.WordAlertTest
az az az az 09·
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.088 sec
Results :
Tests in error:
test(org.nlpcn.commons.lang.dat.DATMakerTest)
loadTest(org.nlpcn.commons.lang.dat.DATMakerTest): 生成模型的路径 (No such file or directory)
loadTextTest(org.nlpcn.commons.lang.dat.DATTest)
loadTest(org.nlpcn.commons.lang.dat.DATTest): /home/ansj/公共的/pinyin.obj (No such file or directory)
makerTest(org.nlpcn.commons.lang.dat.DATTest)
Tests run: 17, Failures: 0, Errors: 5, Skipped: 0
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.831 s
[INFO] Finished at: 2014-07-14T18:04:35+08:00
[INFO] Final Memory: 7M/184M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test (default-test) on project nlp-lang: There are test failures.
[ERROR]
[ERROR] Please refer to /Users/william/Documents/AtWork/github/nlp-lang/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
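The word 010Coffee causes an error when building the DoubleArrayTire.
As the title says; it would be awkward if you ended up unable to tell them apart yourselves.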
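NLPchina/nlp-lang/src/main/resources/simp.txt and NLPchina/nlp-lang/src/main/resources/trad.txt have not been updated since July 2015.
I found some problems. For example, 藴 (labeled T2 below) is a variant form of 蘊 (labeled T1 below), while 蕴 (labeled S below) is the simplified form of both 蘊 (T1) and 藴 (T2). See http://www.cojak.org/index.php?function=code_lookup&term=8574
Therefore, to match the variant and simplified/traditional relationships described above, the entries
14197 蘊 (T1) 14197 蕴 (S)
14325 藴 (T2) 14325 藴 (T2)
14347 蘊 (T1) 14347 蕴 (S)
should be corrected to
14197 蘊 (T1) 14197 蕴 (S)
14325 藴 (T2) 14325 蕴 (S)
14347 蘊 (T1) 14347 藴 (T2)
This problem has been corrected in Unihan 15.1.0 > Unihan_Variants.txt. In addition, I found other corrections between Unihan 15.0.0 > Unihan_Variants.txt and Unihan 15.1.0 > Unihan_Variants.txt that need to be merged into simp.txt and trad.txt.
Would you agree to let me update trad.txt and simp.txt based on Unihan 15.1.0 > Unihan_Variants.txt?
If so, please let me know the rules used to compile trad.txt and simp.txt. Many thanks.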
Raymond Jou 周永瑞
FamilySearch International
[email protected]
(801)240-3871 office
(408)568-8989 mobile & text
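Could you share the accuracy of the fingerprint-based deduplication?
Has the address changed?
Running mvn package produces a pile of errors.
Looking into it, pinyin.dic and pinyin.obj cannot be found.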
The program can potentially fail to release a system resource.
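1. I specified the path of a user-defined dictionary in library.properties, but the custom words are not imported.
2. Using UserDefineLibrary.loadFile does not work either.
Only user words inserted with UserDefineLibrary.insertWord take effect.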
public boolean contains(char c) {
    if (this.branches == null) {
        return false;
    }
    // Should this be changed to:
    // return Arrays.binarySearch(this.branches, new SmartForest<T>(c)) > -1;
    return Arrays.binarySearch(this.branches, c) > -1;
}
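To make the question above concrete: `Arrays.binarySearch(Object[], Object)` compares array elements with the key through `compareTo`, so the key normally has to be something the elements know how to compare against. The sketch below uses a hypothetical `Node` class as a stand-in (it is not the library's `SmartForest`, whose `compareTo` may behave differently) and searches with a probe node built from the character.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a trie node; not the library's SmartForest. */
final class Node implements Comparable<Node> {
    final char c;
    Node[] branches; // assumed to be kept sorted by character

    Node(char c) {
        this.c = c;
    }

    @Override
    public int compareTo(Node other) {
        return Character.compare(this.c, other.c);
    }

    /** Returns true if a child node for character ch exists. */
    boolean contains(char ch) {
        if (branches == null) {
            return false;
        }
        // Search with a probe node so both sides of compareTo are Nodes.
        return Arrays.binarySearch(branches, new Node(ch)) >= 0;
    }
}

final class BranchSearchDemo {
    public static void main(String[] args) {
        Node root = new Node('\0');
        root.branches = new Node[] { new Node('a'), new Node('m'), new Node('z') };
        System.out.println(root.contains('m'));
        System.out.println(root.contains('b'));
    }
}
```

Allocating a probe object per lookup is the simplest form; a hand-written binary search over the `c` field would avoid the allocation.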
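I have a simhash implementation here; would this project have any use for it?
Would you consider supporting polyphonic characters? At the moment, pinyin conversion returns only one reading for a polyphonic character.
In the fingerprint method, the feature words are hashed by taking the hash of the HashSet's toString() output.
The iteration order of a HashSet depends on the implementation.
Different JDK versions may well order the same HashSet differently.
So different environments may produce different fingerprints.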
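One way to avoid the ordering problem described above is to hash the feature words in a fixed order. The sketch below is only an illustration: the class `StableFingerprint`, the newline separator, and the use of MD5 are assumptions, not the library's actual fingerprint code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;

/** Hypothetical helper; not part of nlp-lang. */
final class StableFingerprint {
    /** Returns the MD5 hex digest of the feature words joined in sorted order. */
    static String fingerprint(Collection<String> words) {
        // A TreeSet iterates in natural (sorted) order on every JDK.
        TreeSet<String> sorted = new TreeSet<>(words);
        String joined = String.join("\n", sorted);
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(joined.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the digests every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The same words in a different order give the same fingerprint.
        System.out.println(fingerprint(Arrays.asList("nlp", "java", "trie")));
        System.out.println(fingerprint(Arrays.asList("trie", "nlp", "java")));
    }
}
```

Encoding the joined string as UTF-8 explicitly also keeps the result independent of the platform's default charset.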
Loading a custom dictionary in ansj
#path of userLibrary this is default library
dic=library/default.dic
dic_name=library/name.dic
dic_company=library/company.dic
dic_term=library/term.dic
#redress dic file path
ambiguityLibrary=library/ambiguity.dic
stop_dic1=library/stop.dic
synonymsLibrary=library/synonyms.dic
#set real name
isRealName=true
#isNameRecognition default true
isNameRecognition=true
#isNumRecognition default true
isNumRecognition=true
Segmentation is done with DicAnalysis:
Result terms = DicAnalysis.parse(sent1);
But the segmentation result never changes, and I really cannot find the cause. Any help would be much appreciated.
Environment:
OS: macOS 10.12.5
JDK: 1.8.0_65
ansj-seg: 5.1.2
nlp-lang: 1.7.2
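BTW, adding new words with DictLibrary.insert works fine.
Both source and target are currently set to 1.6, but the code uses try-with-resources syntax (for example in DoubleArrayTire.java), so commands such as mvn install fail.
Changing source and target to 7 or 8, or leaving them unspecified, makes it compile locally.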
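For context, try-with-resources was introduced in Java 7, which is why a compiler running with source 1.6 rejects it. The sketch below shows the construct in isolation; `FirstLine` is a hypothetical class for illustration and is not taken from DoubleArrayTire.java.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical example class; not part of nlp-lang. */
final class FirstLine {
    /** Reads the first line; the reader is closed automatically on exit. */
    static String firstLine(Reader source) {
        // try-with-resources: compiles only with source level 7 or later.
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine(new StringReader("abc\ndef")));
    }
}
```

Under source 1.6 the equivalent has to be written with an explicit finally block that closes the reader.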
http://www.nlpcn.org/docs/7/c0815cc4-4377-4990-a7a8-2325d76f260c
In the "Basic tire tree" section, under "Delete a word", the demo code is
Library.insertWord(forest, "**");
but it should actually be
Library.removeWord(forest, "**");
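Hello,
The following code
Pinyin.pinyinWithoutTone("日往月来")
returns two strings, "ri" and "wang yue lai", rather than the expected four strings: ri wang yue lai
Annotating the same word on your demo site gives the same result:
[ri4, wang yue lai2]
Am I misunderstanding your API, or is this a bug?
Many thanks to everyone for your work on open-source NLP!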
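For the BF implementation, you could refer to the one in Guava; that implementation seems better.
I suggest using Guava as a base library in this project. It has many excellent features that simplify code and improve readability and stability.
Guava is Google's core Java library and has no other dependencies.
Will the more complex features shown on the nlpcn demo, such as keyword extraction and sentiment analysis, be open-sourced? If so, roughly when will they be released?
The getPre method in the DATMaker class fails when the target word contains a repeated character in the middle, as in "一机机床".
Everything runs fine inside the IDE, but after packaging and release, reading the simplified/traditional dictionary fails.
The cause turned out to be the client's encoding: the dictionary needs to be read as UTF-8.
org/nlpcn/commons/lang/dic/DicManager.java
Starting at line 92
Before:
private static Forest init(String dicName, InputStream is) {
    return init(dicName, new BufferedReader(new InputStreamReader(is)));
}
After:
private static Forest init(String dicName, InputStream is) {
    BufferedReader reader = null;
    try {
        reader = IOUtil.getReader(is, IOUtil.UTF8);
        return init(dicName, reader);
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    } finally {
        if (reader != null) {
            try {
                reader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return null;
}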
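The underlying point is that `new InputStreamReader(is)` without a charset argument decodes with the JVM's default charset, which on many JDKs follows the platform settings. A minimal standard-library sketch of reading with an explicit charset follows; `Utf8Lines` is a hypothetical class and does not reproduce the library's `IOUtil`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of nlp-lang. */
final class Utf8Lines {
    /** Reads all lines from the stream, always decoding them as UTF-8. */
    static List<String> readLines(InputStream is) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Two Chinese characters written as escapes, followed by an ASCII line.
        byte[] bytes = "\u4e2d\u6587\nabc".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLines(new ByteArrayInputStream(bytes)));
    }
}
```

Passing `StandardCharsets.UTF_8` also avoids the checked `UnsupportedEncodingException` that name-based charset lookups can throw.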
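The same problem occurs with other words such as 巴基斯坦:
巴基斯坦 => 巴基斯坦斯坦
厄瓜多尔 => 厄瓜多尔 尔
着 and 著 have never been a simplified/traditional pair, past or present; each is an independent character.
You already have the nlpcn domain, so why not publish the jar to the Maven ** repository? Then it could be downloaded from Maven repositories worldwide; otherwise users have to add your extra repository to their pom.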
att。
I have no idea what this thing is. A full-width '.'?
Anyway, I just patched the source on my side.
Add this after line 69:
CHARCOVER['.'] = '.';
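The patch above adds a single table entry. If the goal is to fold full-width punctuation into its ASCII form in general (an assumption about what CHARCOVER is for), the whole Unicode full-width block can be mapped arithmetically: U+FF01 through U+FF5E sit exactly 0xFEE0 above ASCII 0x21 through 0x7E. `WidthNormalizer` below is a hypothetical helper, not the library's code.

```java
/** Hypothetical helper; not part of nlp-lang. */
final class WidthNormalizer {
    /** Maps full-width ASCII variants (U+FF01..U+FF5E) and U+3000 to half-width. */
    static String toHalfWidth(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch >= '\uFF01' && ch <= '\uFF5E') {
                out.append((char) (ch - 0xFEE0)); // e.g. U+FF0E becomes '.'
            } else if (ch == '\u3000') {
                out.append(' '); // ideographic space becomes an ASCII space
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Full-width "A", full-width full stop, full-width "1".
        System.out.println(toHalfWidth("\uFF21\uFF0E\uFF11"));
    }
}
```

A lookup table remains the better fit when only selected characters should be converted.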
As the title says:
安全柜 is converted to the pinyin anquanju; the correct result should be anquangui.