
mod_elasticsearch's Introduction

mod_elasticsearch


This Zotonic module gives you more relevant search results by making resources searchable through
Elasticsearch.

Note

This module targets Elasticsearch 5.x, which still has support for mapping types.

For Elasticsearch 7 and later, use https://github.com/driebit/mod_elasticsearch2 instead.

Installation

mod_elasticsearch acts as a bridge between Zotonic and the tsloughter/erlastic_search Erlang library, so install that and its dependencies first by adding them to deps in zotonic.config:

{deps, [
    %% ...
    {erlastic_search, ".*", {git, "https://github.com/tsloughter/erlastic_search.git", {branch, "master"}}},
    {hackney, ".*", {git, "https://github.com/benoitc/hackney.git", {tag, "1.6.1"}}},
    {jsx, ".*", {git, "https://github.com/talentdeficit/jsx.git", {tag, "2.8.0"}}}      
]}

Configuration

To configure the Elasticsearch host and port, edit your erlang.config file:

[
    %% ...
    {erlastic_search, [
        {host, <<"elasticsearch">>}, %% Defaults to 127.0.0.1
        {port, 9200}                 %% Defaults to 9200
    ]},
    %% ...
].

Search queries

When mod_elasticsearch is enabled, it will direct all search operations of the ‘query’ type to Elasticsearch:

z_search:search({query, Args}, Context).

For Args, you can pass all regular Zotonic query arguments, such as:

z_search:search({query, [{hasobject, 507}]}, Context).

Query context filters

The filter search argument that you know from Zotonic is applied in Elasticsearch’s filter context, where it does not influence the score. To add filters that do influence the score (ranking), use query_context_filter instead. The syntax is identical to that of filter:

z_search:search({query, [{query_context_filter, [["some_field", "value"]]}]}, Context).

Extra query arguments

This module adds some extra query arguments on top of Zotonic’s default ones.

To find documents that have a field, whatever its value (make sure to pass exists as an atom):

{filter, [<<"some_field">>, exists]}

To find documents that do not have a field (make sure to pass missing as an atom):

{filter, [<<"some_field">>, missing]}

For a match phrase prefix query, use the prefix argument:

z_search:search({query, [{prefix, <<"Match this pref">>}]}, Context).

To exclude a document:

{exclude_document, [Type, Id]}
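
For example, to keep the current resource out of its own ‘related items’ search (the <<"resource">> document type and the Id variable are assumptions; use the type your documents are actually indexed under):

z_search:search(
    {query, [
        {text, "related articles"},
        {exclude_document, [<<"resource">>, Id]}
    ]},
    Context
).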

To supply a custom function_score clause, pass one or more score_function arguments. For instance, to rank recent articles above older ones:

z_search:search(
    {query, [
        {text, "Search this"},
        {score_function, #{
            <<"filter">> => [{cat, "article"}],
            <<"exp">> => #{
                <<"publication_start">> => #{
                    <<"scale">> => <<"30d">>
                }
            }
        }}
    ]},
    Context
).

Notifications

elasticsearch_fields

Observe this foldr notification to change the document fields that are queried. You can use Elasticsearch multi_match syntax for boosting fields:

%% your_site.erl

-export([
    % ...
    observe_elasticsearch_fields/3
]).

observe_elasticsearch_fields({elasticsearch_fields, QueryText}, Fields, Context) ->
    %% QueryText is the search query text

    %% Add or remove fields: 
    [<<"some_field">>, <<"boosted_field^2">>|Fields].   

elasticsearch_put

Observe this notification to change the resource properties before they are stored in Elasticsearch. For instance, to store their zodiac sign alongside person resources:

%% your_site.erl

-include_lib("mod_elasticsearch/include/elasticsearch.hrl").

-export([
    % ...
    observe_elasticsearch_put/3
]).

-spec observe_elasticsearch_put(#elasticsearch_put{}, map(), z:context()) -> map().
observe_elasticsearch_put(#elasticsearch_put{id = Id}, Props, Context) ->
    case m_rsc:is_a(Id, person, Context) of
        true ->
            Props#{zodiac => calculate_zodiac(Id, Context)};
        false ->
            Props
    end.

Logging

By default, mod_elasticsearch logs outgoing queries at the debug log level. To see them in your Zotonic console, change the minimum log level to debug:

lager:set_loglevel(lager_console_backend, debug).

How resources are indexed

Content in all languages is stored in the index, following the one language per field strategy:

Each translation is stored in a separate field, which is analyzed according to the language it contains. At query time, the user’s language is used to boost fields in that particular language.
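
As a sketch only (the field names below are illustrative, not the literal mapping), a translated title ends up in the index roughly like this, with each field analyzed for its own language:

#{
    <<"title_en">> => <<"The title in English">>,
    <<"title_nl">> => <<"De titel in het Nederlands">>
}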

mod_elasticsearch's People

Contributors

ddeboer, dirklectisch, doriend, githubrow, linuss, loetie, mworrell, rl-king, robvandenbogaard, row-b

mod_elasticsearch's Issues

Support cat_exact

We currently store all categories a resource belongs to. Elasticsearch does not support searching in only the last value (which in our case is the most specific category, which we need for cat_exact), so we need to store the most specific category separately.

/cc @DorienD
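
Until then, a rough workaround is possible with the existing elasticsearch_put notification; the cat_exact field name and the use of the category_id property are assumptions, not part of the module:

%% your_site.erl (sketch)
observe_elasticsearch_put(#elasticsearch_put{id = Id}, Props, Context) ->
    %% category_id is the resource's direct, most specific category
    Props#{cat_exact => m_rsc:p(Id, category_id, Context)}.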

Convert all is_* properties to boolean

As long as booleans come in, that’s fine. Unfortunately, devs sometimes use 0/1 for false/true, which will (correctly) be interpreted as a numeric field by Elasticsearch. Just do a z_convert:to_bool to enforce boolean.
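
A minimal sketch of that coercion in an elasticsearch_put observer, assuming the property keys are binaries (adapt the match if they are atoms):

observe_elasticsearch_put(#elasticsearch_put{}, Props, _Context) ->
    maps:map(
        fun(<<"is_", _/binary>>, Value) -> z_convert:to_bool(Value);
           (_Key, Value) -> Value
        end,
        Props
    ).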

Mapping upgrades

In general, existing field mappings cannot be updated.

To be able to change mappings (both dynamic mappings and explicit mappings) the index needs to be recreated. Offer a way to ‘upgrade’ an index when manage_schema contains a mapping change:

  1. version the index
  2. change the current index name to an alias that points to the latest version
  3. when a mapping change occurs, apply it to index{current_version+1}, reindex the data using the reindex API
  4. switch the alias when reindexing has finished (reindex is synchronous).

Fall back to PostgreSQL if Elasticsearch is unreachable

If Elasticsearch is unreachable, fall back to the PostgreSQL data source. Let’s try to differentiate between:

  • unreachable Elasticsearch: perhaps retry once, then fall back to PostgreSQL
  • error in Elasticsearch query: do not fall back but show error page (so devs/editors know they made an error in the template or search query resource).

Add status button to re-install the Elastic index

If a document is inserted without first creating the index and installing the mappings, then the default mappings are used.

These default mappings are wrong, especially for related objects.

If such a wrong index exists, or if mappings are changed, then we need a simple method to re-install the index with the new mappings.

The proposal is to add a button to /admin/status that drops the index, re-creates it, and starts a re-pivot of all resources.

Denormalise related resources

Currently this module leaves out related resources (Zotonic includes their titles to find resources by their relations). Using the pivot templates in Zotonic 1.0, we can include them again. Remember that we already have the elasticsearch_put notification for really customised inclusion of related resource terms.

Optimise mappings

This module already has some good mappings, but we may be able to improve upon them. For inspiration, cf. the mappings used in mod_search_solr. A sketch of some of these tweaks follows the list below.

  • postal codes not_analyzed
  • don’t index summary_html, body_html etc.
  • store less data, as we only need to return the ids (and some facets). Fields are not stored by default, so just add to ignored_props to get rid of them in _source.
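
As a sketch of the first two points, expressed as explicit field mappings in an Erlang map (field names are illustrative; in Elasticsearch 5.x the keyword type takes the place of not_analyzed strings):

#{
    <<"address_postcode">> => #{<<"type">> => <<"keyword">>},
    <<"summary_html">> => #{<<"type">> => <<"text">>, <<"index">> => false},
    <<"body_html">> => #{<<"type">> => <<"text">>, <<"index">> => false}
}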

Use separate Hackney pools for reads and writes

A large number of writes (e.g. when indexing a large dataset) can cause the Hackney pool used by erlastic_search to become congested. Raising the maximum number of connections for that pool helps but is no definitive solution, because writes still contend with reads.

Therefore, it’s better to use two separate pools: one for all reads (GET/HEAD requests) and another for all indexing operations (PUT/POST/DELETE).
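
A minimal sketch of what that could look like (pool names and sizes are made up, and erlastic_search/hackney would still need to be told which pool to use per request):

%% Somewhere during site start-up (sketch):
hackney_pool:start_pool(elastic_read, [{max_connections, 100}]),
hackney_pool:start_pool(elastic_write, [{max_connections, 25}]),
%% reads would then pass [{pool, elastic_read}] in the hackney options,
%% writes [{pool, elastic_write}].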

Boost specific categories

Elasticsearch has an elegant way to prefer resources in specific categories (using should with an optional boost):

GET yoursite/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "category": {
              "query": "person",
              "boost": 2
            }
          }
        }
      ],
      "must": [
        {
          "multi_match": {
            "query": "james bond",
            "fields": [
              "title_*"
            ]
          }
        }
      ]
    }
  }
}

Zotonic only supports must match for a category. So we need to either:

  • add custom m.search properties that only mod_elasticsearch supports (e.g., prefer_cat[s]). Disadvantage: the extended query breaks when mod_elasticsearch is disabled and Zotonic falls back to the default PostgreSQL search.
  • or add a custom m.elasticsearch model to make it very explicit that the search query depends on mod_elasticsearch. Disadvantage: requires close coupling between templates and the search engine.

Fix Erlang 18 compatibility

ERROR: OTP release 18 does not match required regex R15|R16|17
ERROR: compile failed while processing /opt/zotonic/deps/idna: rebar_abort
GNUmakefile:56: recipe for target 'compile' failed
make: *** [compile] Error 1

tsloughter/erlastic_search depends on hackney, which depends on idna.
