Comments (10)

BestBin commented on May 27, 2024

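I downloaded the source and found the bug. While I was there I also noticed a small issue near it, so I have a suggestion as well.

1. Bug:
JiebaNet.Analyser.TfidfExtractor, lines 34-37:
if (StopWords.IsEmpty())
{
StopWords.UnionWith(DefaultStopWords);
}
Here StopWords can be null even though IsEmpty() returns true. In that case .UnionWith() cannot be called on it, which produces the "Object reference not set to an instance of an object" error I reported.
I suggest changing the SetStopWords method of JiebaNet.Analyser.KeywordExtractor:
move StopWords = new HashSet<string>(); to before the if, so that StopWords is never null.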
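
The change suggested in item 1 can be sketched roughly as follows. This is a minimal stand-alone illustration, not the library's actual source: the class name `StopWordHolder`, the method `EnsureDefaults`, and the three default words are hypothetical and exist only to show the set being created before the file check, so that it is empty rather than null when the file is missing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the extractor classes; names are illustrative only.
public class StopWordHolder
{
    // Assumed placeholder defaults, not the library's real default list.
    private static readonly string[] DefaultStopWords = { "the", "of", "is" };

    public ISet<string> StopWords { get; private set; }

    public StopWordHolder()
    {
        // Start with an empty set so StopWords is never null.
        StopWords = new HashSet<string>();
    }

    public void SetStopWords(string stopWordsFile)
    {
        // Create the set BEFORE checking the file: a missing file then
        // leaves an empty set behind instead of a null reference.
        StopWords = new HashSet<string>();

        var path = Path.GetFullPath(stopWordsFile);
        if (File.Exists(path))
        {
            foreach (var line in File.ReadAllLines(path))
            {
                StopWords.Add(line.Trim());
            }
        }
    }

    // Mirrors the check quoted above: fall back to defaults when empty.
    public void EnsureDefaults()
    {
        if (StopWords.Count == 0)
        {
            StopWords.UnionWith(DefaultStopWords);
        }
    }
}
```

With this shape, calling `SetStopWords` with a path that does not exist followed by `EnsureDefaults()` fills the set with the defaults instead of throwing.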

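2. Suggestion:
I noticed that JiebaNet.Analyser.ConfigManager uses this line to choose the directory:
ConfigurationManager.AppSettings["JiebaConfigFileDir"] ?? "Resources";
Is it really appropriate to fall back to a bare Resources when the config setting is missing?

You do call Path.GetFullPath() at the end to get an absolute path, but the result may well not be under the directory of the project I actually start (I put Resources under the startup project).
I tested this on a real setup: the jieba-related code lives in a class library, the Resources folder is in an MVC project that calls into the class library, and I started it under the VS2013 debugger. At a breakpoint, the path produced by Path.GetFullPath() was an IIS path on the C: drive.

So I suggest replacing it with: HttpContext.Current.Server.MapPath("/Resources/")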
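
To illustrate why the relative fallback goes wrong, here is a small sketch. Path.GetFullPath() resolves a relative path against the process's current working directory, which under IIS is usually not the web application's folder. The sketch resolves against an explicit base directory instead. Note that this is a different technique from the Server.MapPath() call suggested above: it uses AppDomain.CurrentDomain.BaseDirectory (typically the application root in an ASP.NET app) so that it does not depend on System.Web. The class `ConfigDirResolver` and its `Resolve` methods are hypothetical names, not part of the library.

```csharp
using System;
using System.IO;

// Hypothetical helper; not part of JiebaNet.
public static class ConfigDirResolver
{
    // configuredDir stands in for the value of the "JiebaConfigFileDir"
    // app setting and may be null or empty when the setting is absent.
    public static string Resolve(string configuredDir, string baseDir)
    {
        var dir = string.IsNullOrEmpty(configuredDir) ? "Resources" : configuredDir;

        // An absolute path is used as given.
        if (Path.IsPathRooted(dir))
        {
            return Path.GetFullPath(dir);
        }

        // A relative path is anchored to baseDir rather than to the
        // current working directory.
        return Path.GetFullPath(Path.Combine(baseDir, dir));
    }

    // Convenience overload using the application's base directory.
    public static string Resolve(string configuredDir)
    {
        return Resolve(configuredDir, AppDomain.CurrentDomain.BaseDirectory);
    }
}
```

Passing the base directory as a parameter keeps the helper easy to test and lets a web project supply whatever root it prefers.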

The office had no internet all day today; I came to reply as soon as it was back~~

from jieba.net.

anderscui commented on May 27, 2024

@BestBin It works on my side, also with 0.38.3 installed via NuGet, in an ASP.NET MVC project.

BestBin commented on May 27, 2024

@anderscui

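anderscui commented on May 27, 2024

@BestBin Thanks a lot for debugging this :) I'll change it to an empty HashSet as you suggest and include the fix in the next release. By default, though, there should be a stopwords file. Is it missing from your config folder?

About your suggestion: my intent is still for this to be configured with an absolute path. The bare Resources fallback is there for compatibility with the original configuration style, which was designed mainly with console applications in mind. From now on, whatever the application type, it is best to just use an absolute path :)

One more question: are you familiar with .NET Core? Would porting the existing code to .NET Core be a lot of work?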

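BestBin commented on May 27, 2024

@anderscui Haha, it was exactly that second issue, the file path, that kept me from reading the stopwords file, so the set was always empty~~

As for .NET Core, I've only played with it on my own and have no deep practical experience. My guess is that porting to .NET Core wouldn't be too much trouble, but that is purely my personal opinion, for reference only, haha!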

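zuiyuewentian commented on May 27, 2024

@BestBin Thanks a lot, I just ran into this problem and fixed it following your approach. The version in the maintainer's git repository doesn't seem to have been changed yet.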

anderscui commented on May 27, 2024

@zuiyuewentian I'd still recommend configuring it with an absolute path :)

@BestBin @zuiyuewentian May I ask what kinds of projects or problems you mainly use the segmenter for?

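BestBin commented on May 27, 2024

@anderscui Lots of places. I used Pangu segmentation (盘古分词) for a document library system before, and lately I've been working on a news product. Pangu hasn't seen any activity for a long time, so I searched around and found Jieba...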

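zuiyuewentian commented on May 27, 2024

@anderscui In my config file I point it at a folder under the main directory, using a relative path.
I used Pangu segmentation (盘古分词) before; it felt cumbersome and it is quite old now. When I searched online for segmenters I found that jieba was pretty good, then happened to find the .NET port and wanted to try it.
Having it on NuGet really is convenient. Thanks for sharing it.
I'll probably use it most for search, and possibly in focused crawlers too.
What made me think of this was the word segmentation feature on Luo Yonghao's phone.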

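anderscui commented on May 27, 2024

@BestBin @zuiyuewentian So search does seem to be the most common use. My current work is mostly in Python, so I pay less attention to .NET these days. If you have new needs or ideas around segmentation or related areas, you're welcome to discuss them here, and we can gradually add more features :)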
