
sunspot_chinese_example's Introduction

This is an install script that builds a solr/mmseg4j distribution compatible with sunspot, for full-text indexing of Chinese text.

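Running the install.sh script generates a sunspot_solr_mmseg4j directory under the current directory. It contains jetty, the solr war file, and the mmseg4j integration; you only need to run java -jar start.jar there to start a solr server.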
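
If the server is started from a script, it can help to wait until it accepts connections before indexing. The following is a minimal sketch, not part of this project: `wait_for_port` is a hypothetical helper, and the default port 8983 is taken from the URL used in this README. An open port only shows that jetty is listening, not that solr has finished loading.

```python
import socket
import time


def wait_for_port(host="localhost", port=8983, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; 8983 is the port used in this README.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable): pause briefly and retry.
            time.sleep(interval)
    return False
```

A script could call `wait_for_port()` right after launching `java -jar start.jar` in the background and stop if it returns False.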

Once it has started, open http://localhost:8983/solr/admin/analysis.jsp in a browser.

Set Field to type and enter text. In Field value (Index), enter “研究生研究生命科学”. Click Analyze; if the output is “研究|生|研究|生命|科学”, mmseg4j + sunspot has been built successfully.
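
The same check could be scripted. The sketch below rests on an assumption this README does not confirm: that solrconfig.xml registers Solr's FieldAnalysisRequestHandler at `/analysis/field`. The helper names `field_analysis_url` and `segmentation_ok` are hypothetical. The code only builds the request URL and compares a token list against the expected output; it does not parse the handler's response.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # base URL used in this README
SAMPLE = "研究生研究生命科学"               # sample text from this README
EXPECTED = "研究|生|研究|生命|科学"         # expected segmentation from this README


def field_analysis_url(value, field_type="text", base=SOLR_BASE):
    """Build a field-analysis URL (assumes the handler is mounted at /analysis/field)."""
    params = {
        "analysis.fieldtype": field_type,  # analyze by field type, like Field = type
        "analysis.fieldvalue": value,      # text to run through the index analyzer
        "wt": "json",
    }
    return base + "/analysis/field?" + urlencode(params)


def segmentation_ok(tokens, expected=EXPECTED):
    """Compare a list of tokens against the pipe-separated expected output."""
    return "|".join(tokens) == expected


print(field_analysis_url(SAMPLE))
print(segmentation_ok(["研究", "生", "研究", "生命", "科学"]))
```

If the handler is not registered, the analysis.jsp page remains the way to verify the build.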

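The sunspot_solr_mmseg4j directory can be copied anywhere and used as the server that provides full-text indexing.

For advanced configuration, edit the sunspot_solr_mmseg4j/solr/conf/schema.xml file.

For example, you can define both index and query analyzers so that queries also go through the tokenizer, or change the mmseg4j segmentation mode (the build defaults to max-word):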

    <fieldType name="text" class="solr.TextField" omitNorms="false">
      <analyzer type="index">
        <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" dicPath="mmseg4j_dict"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" dicPath="mmseg4j_dict"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PositionFilterFactory" />
      </analyzer>
    </fieldType>
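
After editing schema.xml, a quick check that the index and query analyzers use the mode you intended can catch mismatches. This is a minimal sketch using Python's standard library; `mmseg_modes` is a hypothetical helper, and the inline `SAMPLE` is a trimmed stand-in for the real file rather than its actual contents.

```python
import xml.etree.ElementTree as ET

MMSEG_TOKENIZER = "com.chenlb.mmseg4j.solr.MMSegTokenizerFactory"


def mmseg_modes(schema_xml, field_type="text"):
    """Return {analyzer type: mode} for mmseg4j tokenizers in one fieldType."""
    root = ET.fromstring(schema_xml)
    modes = {}
    for ft in root.iter("fieldType"):
        if ft.get("name") != field_type:
            continue
        for analyzer in ft.findall("analyzer"):
            for tok in analyzer.findall("tokenizer"):
                if tok.get("class") == MMSEG_TOKENIZER:
                    # An analyzer without a type attribute applies to both phases.
                    modes[analyzer.get("type", "default")] = tok.get("mode")
    return modes


# Trimmed stand-in for schema.xml, for illustration only.
SAMPLE = """<schema><types>
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" dicPath="mmseg4j_dict"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" dicPath="mmseg4j_dict"/>
  </analyzer>
</fieldType>
</types></schema>"""

print(mmseg_modes(SAMPLE))  # {'index': 'max-word', 'query': 'max-word'}
```

To check the real configuration, read the contents of sunspot_solr_mmseg4j/solr/conf/schema.xml and pass them to `mmseg_modes`.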

Alternatively, add more custom dictionaries under the solr/mmseg4j_dict directory.
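
A custom dictionary could be generated with a short script. The sketch below assumes the mmseg4j convention of UTF-8 files named `words*.dic` with one word per line; this README does not state the format, so check it against the mmseg4j version that the install script builds. `write_custom_dict` and the file name `words-custom.dic` are hypothetical.

```python
import tempfile
from pathlib import Path


def write_custom_dict(dict_dir, words, name="words-custom.dic"):
    """Write words to dict_dir/name, one per line, UTF-8 (assumed mmseg4j format)."""
    path = Path(dict_dir) / name
    path.parent.mkdir(parents=True, exist_ok=True)
    # Strip whitespace, drop empty entries, and de-duplicate while keeping order.
    unique = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    path.write_text("\n".join(unique) + "\n", encoding="utf-8")
    return path


# Demo in a temporary directory; in practice dict_dir would be solr/mmseg4j_dict.
with tempfile.TemporaryDirectory() as tmp:
    created = write_custom_dict(tmp, ["生命科学", "研究生", "生命科学"])
    print(created.name, created.read_text(encoding="utf-8").split())
```

The solr server may need a restart before new words take effect.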

sunspot_chinese_example's People

Contributors

quake, huacnlee, gbammc
