
filiphanes / fts-elastic


ElasticSearch FTS implementation for the Dovecot mail server

License: Other

Makefile 0.40% Shell 42.40% M4 4.41% C 52.80%
dovecot fulltextsearch elasticsearch fts-elastic dovecot-fts

Contributors: alpianon, atkinsj, bubu, filiphanes, harryyoud, hashworks, infernix, jantlwoomy

fts-elastic's Issues

ngram issue

Hello, I want to use fts-elastic in our environment, but I have a small problem with the ngram filter.
I modified the elastic7 schema to use ngrams; the full schema is below.

I added my_analyzer (see https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html) and used it for the body field, and this works fine.

When I have an e-mail whose body contains:

"Hello, my name is Jared"

and I search for words such as:

Hello, Hell, my, name, nam, etc.

I see this e-mail in the output.

{
    "settings": {
      "number_of_shards": 10,
      "number_of_replicas": 1,
      "analysis": {
        "normalizer": {
          "lowercase": {
            "type": "custom",
            "char_filter": [],
            "filter": ["lowercase"]
          }
        },
        "char_filter": {
          "remove_email_address": {
            "type": "pattern_replace",
            "pattern": "(<[^>]*>)",
            "replacement": " "
          }
        },
        "filter": {
          "email": {
            "type": "pattern_capture",
            "preserve_original": true,
            "max_token_length": 64,
            "patterns": [
              "([^@]+)",
              "(\\p{L}+)",
              "(\\d+)",
              "@(.+)",
              "([^-@]+)"
            ]
          },
          "max50char": {
            "type": "length",
            "min": 2,
            "max": 50
          },
          "limit1000": {
            "type": "limit",
            "max_token_count": 1000
          },
          "pl_PL" : {
            "type" : "hunspell",
            "locale" : "pl_PL",
            "dedup" : true
          }
        },
        "tokenizer": {
          "email_address": {
            "type": "pattern",
            "pattern": "<([^>]+)>",
            "group": 1
          },
          "my_tokenizer": {
            "type": "edge_ngram",
            "min_gram": 2,
            "max_gram": 10,
            "token_chars": [
              "letter",
              "digit"
            ]
          }
        },
        "analyzer": {
          "body": {
            "tokenizer": "standard",
            "filter": ["asciifolding", "lowercase", "unique", "max50char", "limit1000"]
          },
          "email": {
            "tokenizer": "uax_url_email",
            "filter": ["asciifolding", "email", "lowercase", "unique"]
          },
          "email_name": {
            "tokenizer": "standard",
            "filter": ["asciifolding", "lowercase", "unique"],
            "char_filter": ["remove_email_address"]
          },
          "email_address": {
            "tokenizer": "email_address",
            "filter": ["email", "lowercase", "unique"]
          },
          "my_analyzer": {
            "tokenizer": "my_tokenizer",
            "filter": ["asciifolding", "lowercase", "unique", "max50char", "limit1000"]
          }
        }
      }
    },
    "mappings": {
      "_source":  {
        "enabled": true
      },
      "properties": {
        "user":   {"type": "keyword", "normalizer": "lowercase"},
        "box":    {"type": "keyword", "normalizer": "lowercase"},
        "uid":    {"type": "integer"},
        "subject":{"type": "text"},
        "body":   {"type": "text", "analyzer": "my_analyzer"},
        "message-id":{"type": "keyword"},
        "date": {
          "type": "date",
          "format": "EEE, d LLL yyyy HH:mm:ss Z||yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
        },
        "from": {
          "type": "text", "analyzer": "email",
          "fields": {
            "name":    {"type": "text", "analyzer": "email_name"},
            "address": {"type": "text", "analyzer": "email_address"}
          }
        },
        "sender": {
          "type": "text", "analyzer": "email",
          "fields": {
            "name":    {"type": "text", "analyzer": "email_name"},
            "address": {"type": "text", "analyzer": "email_address"}
          }
        },
        "to":     {
          "type": "text", "analyzer": "email",
          "fields": {
            "name":    {"type": "text", "analyzer": "email_name"},
            "address": {"type": "text", "analyzer": "email_address"}
          }
        },
        "cc":     {
          "type": "text", "analyzer": "email",
          "fields": {
            "name":    {"type": "text", "analyzer": "email_name"},
            "address": {"type": "text", "analyzer": "email_address"}
          }
        },
        "bcc":    {
          "type": "text", "analyzer": "email",
          "fields": {
            "name":    {"type": "text", "analyzer": "email_name"},
            "address": {"type": "text", "analyzer": "email_address"}
          }
        }
      }
    }
  }

But the problems started when I also added my_analyzer to the subject field:

      "properties": {
        "user":   {"type": "keyword", "normalizer": "lowercase"},
        "box":    {"type": "keyword", "normalizer": "lowercase"},
        "uid":    {"type": "integer"},
        "subject":{"type": "text", "analyzer": "my_analyzer"},
        "body":   {"type": "text", "analyzer": "my_analyzer"},
        "message-id":{"type": "keyword"},
        "date": {
          "type": "date",
          "format": "EEE, d LLL yyyy HH:mm:ss Z||yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
        },

Then the search engine returns far too many e-mails, including ones that don't match.

For example, when I have an e-mail with

subject: hello
Body: hello

and I search for the phrase 'dupka6', this e-mail shows up among the matches.

In Solr this works fine; here is the debug output from Solr:

1582286181.431311 GET /solr/storage-s36/select?wt=xml&fl=uid,score&rows=6&sort=uid+asc&q=%7b!lucene+q.op%3dAND%7dsubject:dupka6+OR+from:dupka6&fq=%2Bbox:e0546f0f70b14f5e5c2b000068139654+%2Buser:szukajka@jakub

And here is the debug output from Elasticsearch:

1582282575.211705 {"query":{"bool":{"filter":[{"term":{"user":"szukajka@jakub36"}},{"term":{"box": "e0546f0f70b14f5e5c2b000068139654"}}],"must":[{"multi_match":{"query":"dupka7","operator":"and","fields":["subject","from"]}}]}}, "size":10000, "_source":false}POST /s36/_search?routing=szukajka@jakubHTTP/1.1

The difference is in the OR and AND operators.

Solr does:
subject:dupka6+OR+from:dupka6
Elasticsearch does:
{"query":"dupka7","operator":"and","fields":["subject","from"]}}]}}, "size":10000,

Do you know how to use ngrams in both body and subject?
With the schema above it works fine for body, but I have a problem with subject.
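For reference, the prefix matches described above ("Hell", "nam") come from the edge_ngram tokenizer, which indexes each word as its prefixes from min_gram to max_gram characters long. The C sketch below mimics that for a single lowercased ASCII word, with the same min_gram=2 / max_gram=10 as my_tokenizer. edge_ngrams is a hypothetical helper, not code from fts-elastic or Elasticsearch, and it ignores token_chars, Unicode and the analyzer's filters. Keep in mind that unless a field mapping also sets search_analyzer, Elasticsearch runs the same analyzer over the query text, so a query such as 'dupka6' is itself split into du, dup, dupk, dupka and dupka6 before matching.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GRAM_BUF 16 /* room for grams up to 15 bytes plus NUL */

/* Hypothetical helper: stores the edge n-grams (prefixes) of `word` with
 * lengths min_gram..max_gram into `out`, one per row, and returns how many
 * were written. Assumes `word` is a single lowercased ASCII token, so the
 * byte length equals the character length. */
size_t edge_ngrams(const char *word, size_t min_gram, size_t max_gram,
                   char out[][GRAM_BUF], size_t max_out)
{
	size_t len, count = 0;

	assert(word != NULL && out != NULL);
	len = strlen(word);
	for (size_t n = min_gram; n <= max_gram && n <= len; n++) {
		if (count == max_out || n >= GRAM_BUF)
			break; /* no room left for this gram */
		memcpy(out[count], word, n); /* prefix of length n */
		out[count][n] = '\0';
		count++;
	}
	return count;
}
```

For "hello" this yields he, hel, hell and hello, so a query like "hell", whose own n-grams are he, hel and hell, finds all of its terms in the index.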

Dovecot doesn't search the index

Hey!

Thanks for this project.

I'm having some trouble, though, and I can't figure out how to proceed.

I've compiled fts-elastic, installed it, and set up Dovecot as per the instructions.

When I run

doveadm fts rescan -u [email protected]
doveadm index -u user@domain -q '*'

the messages get indexed.

I can see the messages in the ES index, curl http://localhost:9200/m/_search gives me the most recently indexed messages.

My 90-fts.conf looks like:

# cat /etc/dovecot/conf.d/90-fts.conf 
mail_plugins = $mail_plugins fts fts_elastic

plugin {
    fts = elastic
    fts_elastic = debug url=http://localhost:9200/m/ bulk_size=5000000 refresh=fts rawlog_dir=/var/log/fts-elastic/
    fts_autoindex = yes
    fts_enforced = yes
    fts_autoindex_exclude = \Trash
}

service indexer-worker {
    # Increase vsz_limit to 2GB or above.
    # Or 0 if you have rather large memory usable on your server, which is preferred for performance.
    vsz_limit = 2G
}

However, when doing IMAP searches, I don't see any log messages indicating that the request is sent to ES.

With mail_debug enabled in Dovecot, I can see it doing a sequential scan of all the messages on disk (4 SEARCH (BODY 'test')).

I don't see any new log files in /var/log/fts-elastic when searching, only when indexing.

If I query ES directly:

$ curl -X GET "localhost:9200/m/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "user": {
        "query": "[email protected]"
      }
    }
  }
}
'

I get the proper response, within a second or so.

Do you have an idea why Dovecot won't use this plugin?

ES 7.16, Dovecot 2.3.4.1, fts-elastic d6291f1

Thanks!

wildcard search support

https://github.com/filiphanes/fts-elastic/blob/master/src/fts-backend-elastic.c
I'm trying to figure out what to modify there to enable wildcard searches like this:
curl -X GET "http://127.0.0.1:9200/m/_search?pretty" -H 'Content-Type: application/json' -d ' { "query": { "bool": { "filter": [ {"term": {"user": "[email protected]"}} ], "must": [{ "wildcard": { "body": "searchstrin*" } }] } }, "size": 100 }'
Or is it a bad idea? What do you think? (ES7)
The wildcard query doesn't support multiple fields; this example only searches "body", so the fields (subject, from, to, etc.) would probably have to be iterated somehow.
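One way around the single-field limitation is to put one wildcard clause per field into the bool query's "should" array. Because the query also has a "filter" clause, Elasticsearch's default minimum_should_match for the should clauses is 0, so it has to be set to 1 explicitly. The C sketch below builds such a body; build_wildcard_query and its signature are hypothetical, not the plugin's API. It assumes the inputs are already JSON-escaped, keeps the "size":100 from the curl example, and since wildcard queries are not analyzed, the pattern should be lowercased to match the lowercased terms of analyzed text fields.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes an Elasticsearch query that restricts hits to
 * `user` and matches documents where ANY of `fields` matches the wildcard
 * `pattern`. Inputs are assumed to be JSON-safe already (escaped, no quotes
 * or control characters). Returns the query length, or -1 if `buf` is too
 * small. */
int build_wildcard_query(char *buf, size_t size, const char *user,
                         const char *pattern, const char *const fields[],
                         size_t nfields)
{
	size_t used;
	int n;

	assert(buf != NULL && user != NULL && pattern != NULL);
	n = snprintf(buf, size,
	             "{\"query\":{\"bool\":{\"filter\":[{\"term\":{\"user\":\"%s\"}}],"
	             "\"should\":[", user);
	if (n < 0 || (size_t)n >= size)
		return -1;
	used = (size_t)n;

	/* one wildcard clause per field, comma-separated */
	for (size_t i = 0; i < nfields; i++) {
		n = snprintf(buf + used, size - used, "%s{\"wildcard\":{\"%s\":\"%s\"}}",
		             i > 0 ? "," : "", fields[i], pattern);
		if (n < 0 || (size_t)n >= size - used)
			return -1;
		used += (size_t)n;
	}

	/* a filter clause makes the default 0, so require one should match */
	n = snprintf(buf + used, size - used,
	             "],\"minimum_should_match\":1}},\"size\":100}");
	if (n < 0 || (size_t)n >= size - used)
		return -1;
	used += (size_t)n;
	return (int)used;
}
```

Called with the fields subject and body, this produces one wildcard clause for each, and a document is returned if either field matches.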

log messages

Hi,

I'm currently on commit 6c79520.

The plugin creates log messages that don't always include the port, which is a bit confusing:

2020-05-07 09:48:13 imap(rbendig)<4039><k5dYGgql4ql/AAAB>: Debug: http-client[1]: request [Req2: POST http://localhost/m/_search?routing=rbendig]: Send more (sent 264, buffered=0)
2020-05-07 09:48:13 imap(rbendig)<4039><k5dYGgql4ql/AAAB>: Debug: http-client[1]: request [Req2: POST http://localhost/m/_search?routing=rbendig]: Finished sending payload
2020-05-07 09:48:13 imap(rbendig)<4039><k5dYGgql4ql/AAAB>: Debug: http-client: conn [::1]:9200 [1]: Got 200 response for request [Req2: POST http://localhost:9200/m/_search?routing=rbendig]: OK (took 5 ms + 0 ms in queue)

IMHO it should include the port :)

Can't create log file, permission denied

Hi, I have installed the plugin and Elasticsearch, but when I run doveadm fts rescan -u [email protected] I get this error: Error: creat(/var/log/fts-elastic/20191227-031719.17197.1.in) failed: Permission denied
I have changed the ownership of the fts-elastic folder to user dovecot, group root, and even set its mode to 0666, but it still doesn't work. Any help is appreciated.
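A hedged guess at the cause: for a directory, mode 0666 sets no execute (search) bits, and creating a file inside a directory requires both write and search permission on it. Also, doveadm typically switches to the mail user's UID before touching the mailbox, so that user (not necessarily dovecot) is the one that needs those permissions on rawlog_dir. The hypothetical C helper below, not part of fts-elastic, checks exactly those two conditions for the calling process. access() checks the real user and group IDs, so run it as the user doveadm uses; root normally passes such checks regardless of the mode bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical diagnostic: returns true if the calling process (by real
 * UID/GID, as access() checks) may create files in `dir`. That needs `dir`
 * to be a directory with both write (W_OK) and search (X_OK) permission;
 * a directory with mode 0666 lacks the search bit. */
bool can_create_in_dir(const char *dir)
{
	struct stat st;

	assert(dir != NULL);
	if (stat(dir, &st) != 0 || !S_ISDIR(st.st_mode))
		return false; /* missing, or not a directory */
	return access(dir, W_OK | X_OK) == 0;
}
```

If it returns false for /var/log/fts-elastic when run as the mail user, giving that user ownership of the directory with a mode that includes the search bit (for example 0755) would be the first thing to try.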

Version Tags / Releases

Could you publish version tags or releases? I would like to push this to the Arch Linux community repository, but I'll need versions for that.

Query does not deliver any match

Hey,

I installed this plugin following the given guide. It works in the sense that it does not throw an error ;)
But it also does not deliver any search results...
In the logs I can see the actual search query, and after fiddling around with it I could identify an issue:


....
"bool": {
      "filter": [
        {"term": {"user": "[email protected]"}},
        {"term": {"box": "f40efa2f8f44ad54424000006e8130ae"}}
      ],
....

This part of the query always fails, especially the {"term": {"user": "[email protected]"}}, part.
If I delete this part of the query, I see what I think are the matching results...

I can see that all my mails are in my Elasticsearch instance...
... does anyone have an idea what I'm doing wrong?

Cheers!

Improve String escaping when indexing

In my real-life scenario, when I try to index my whole mailbox (> 100k mails), I hit lots of different characters that are invalid unescaped in JSON.
All "Illegal unquoted character" messages from my ElasticSearch logs:

Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 11)): has to be escaped using backslash to be included in string value
Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 14)): has to be escaped using backslash to be included in string value
Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 16)): has to be escaped using backslash to be included in string value
Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 19)): has to be escaped using backslash to be included in string value
Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 1)): has to be escaped using backslash to be included in string value
Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 21)): has to be escaped using backslash to be included in string value
Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 25)): has to be escaped using backslash to be included in string value
Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 26)): has to be escaped using backslash to be included in string value
Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 27)): has to be escaped using backslash to be included in string value
Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 2)): has to be escaped using backslash to be included in string value
Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 3)): has to be escaped using backslash to be included in string value
Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 7)): has to be escaped using backslash to be included in string value

Of course, all of these codes could be mapped by hand in

static const char elastic_escape_chars[]

but that just screams "error-prone" :)
Maybe some kind of library could help here. Unfortunately, I'm not very experienced with C...
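A table of individual codes shouldn't be needed: JSON (RFC 8259) requires every control character from U+0000 to U+001F inside a string to be escaped, and the generic \u00XX form works for all of them, which covers every code in the logs above. A minimal C sketch of such an escaper follows. json_escape is a hypothetical name, not the plugin's existing code around elastic_escape_chars, and it assumes the input is already valid UTF-8 (bytes >= 0x80 are copied through unchanged).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical escaper: writes `in` into `out` as the body of a JSON string
 * (RFC 8259, section 7). '"' and '\\' get a backslash, \b \f \n \r \t use
 * their short forms, and every other byte below 0x20 becomes \u00XX.
 * Bytes >= 0x20 (including UTF-8 sequences) are copied unchanged.
 * Returns the output length, or -1 if `out` is too small. */
int json_escape(char *out, size_t size, const char *in)
{
	static const char hex[] = "0123456789abcdef";
	size_t used = 0;

	assert(out != NULL && in != NULL);
	if (size == 0)
		return -1;
	for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
		char tmp[7]; /* longest piece: "\u001f" plus NUL */
		const char *piece = tmp;
		size_t len;

		switch (*p) {
		case '"':  piece = "\\\""; break;
		case '\\': piece = "\\\\"; break;
		case '\b': piece = "\\b";  break;
		case '\f': piece = "\\f";  break;
		case '\n': piece = "\\n";  break;
		case '\r': piece = "\\r";  break;
		case '\t': piece = "\\t";  break;
		default:
			if (*p < 0x20) {
				/* generic form for every other control byte */
				tmp[0] = '\\'; tmp[1] = 'u'; tmp[2] = '0'; tmp[3] = '0';
				tmp[4] = hex[*p >> 4];
				tmp[5] = hex[*p & 0x0f];
				tmp[6] = '\0';
			} else {
				tmp[0] = (char)*p;
				tmp[1] = '\0';
			}
			break;
		}
		len = strlen(piece);
		if (used + len >= size) /* keep room for the final NUL */
			return -1;
		memcpy(out + used, piece, len);
		used += len;
	}
	out[used] = '\0';
	return (int)used;
}
```

Code 11 from the first log line, for example, becomes \u000b, and the same branch handles codes 1, 2, 3, 7, 14 and the rest without listing them individually.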

Date format errors in elasticsearch: failed to parse field [date] of type [date]

Can you please help with this issue I see in the logs:

{"index":
{"_index":"m","_id":"23510/4f8ae61ae1902151d6120000bb28466e/[email protected]","status":400,
"error":{
    "type":"document_parsing_exception",
    "reason":"[1:338] failed to parse field [date] of type [date] in document with id '23510/4f8ae61ae1902151d6120000bb28466e/[email protected]'. Preview of field's value: 'Wed, 18 Jan 2023 13:00:20 +0500 (+05)'",
    "caused_by":{
        "type":"illegal_argument_exception",
        "reason":"failed to parse date field [Wed, 18 Jan 2023 13:00:20 +0500 (+05)] with format [[EEE, ][ ]d MMM yyyy HH:mm:ss[ Z][ (z)]]",
        "caused_by":{
            "type":"date_time_parse_exception","reason":"Text 'Wed, 18 Jan 2023 13:00:20 +0500 (+05)' could not be parsed, unparsed text found at index 31"
        }
    }
}
}},

"Index 31" is "Thu, 2 Sep 2021 16:48:39 +0500 (+05)"

I use elasticsearch-8.8.1 and elastic7-schema.json from the repo.

References: #7 #8
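The parser stops at index 31, which is exactly where the trailing " (+05)" starts in 'Wed, 18 Jan 2023 13:00:20 +0500 (+05)'. RFC 5322 allows a comment after the zone, but the optional "[ (z)]" part of the format apparently cannot read "+05" as a zone name. One possible workaround is to drop a trailing parenthesised comment before the value is sent to Elasticsearch; the C sketch below does that in place. strip_trailing_date_comment is a hypothetical pre-processing step, not part of the plugin, and it only handles a single comment at the very end of the value.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes trailing whitespace and one trailing
 * comment such as " (+05)" or " (UTC)" from a Date: header value, in
 * place, so that a pattern ending in "HH:mm:ss Z" sees no unparsed text. */
void strip_trailing_date_comment(char *date)
{
	size_t len, open;

	assert(date != NULL);
	len = strlen(date);
	while (len > 0 && isspace((unsigned char)date[len - 1]))
		len--;
	date[len] = '\0';

	/* nothing to do unless the value ends with ')' */
	if (len == 0 || date[len - 1] != ')')
		return;

	/* scan backwards for the opening parenthesis */
	open = len - 1;
	while (open > 0 && date[open] != '(')
		open--;
	if (date[open] != '(')
		return;

	/* cut the comment and the whitespace in front of it */
	while (open > 0 && isspace((unsigned char)date[open - 1]))
		open--;
	date[open] = '\0';
}
```

Values without a comment, such as 'Mon, 20 Apr 2020 15:00:07 -0400', pass through unchanged.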

date format errors in elasticsearch

I wanted to set this up using Elasticsearch 7.6.1, following your README.

Indexing worked fine, but whenever I search via IMAP I get no results, and Elasticsearch logs a lot of Java exceptions amounting to

Caused by: java.lang.IllegalArgumentException: failed to parse date field [Mon, 20 Apr 2020 15:00:07 -0400] with format [EEE, d LLL yyyy HH:mm:ss Z||yyyy-MM-dd HH:mm:ss||yyyy-MM-dd]

in the log.

Any idea what might be going wrong, or how I'd fix that?

(The format looks correct to me?)

Searching the wrong box

Hey,

It's me again. First, thanks for the help last time. I hope it's OK to ask another question. It might be that I'm doing something wrong again...

I noticed that not all mails are listed when I search for a particular keyword. I tested this by searching for a particular mail in a subfolder... but the mail was not displayed even when I searched for its exact subject...

I ran the search by hand directly against my Elasticsearch system, with the search string copied from the logs... the mail was not there (same behavior as on my phone)...
It turned out that the box field of the mail did not match the one in the search query. As soon as I deleted the box field from the search query, the mail was shown...
And indeed the mail's box field contains a different box string than the one in the search query from the plugin/Dovecot...

What is the box parameter used for? Can you give me some pointers?

Thanks again...

Build fails with dovecot 2.3.18

It seems that DOVECOT_PREREQ now requires major, minor and patch versions.

/bin/sh ../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I..    -I/usr/lib/dovecot/src/plugins/fts -I/usr/include/dovecot   -std=gnu99 -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fstack-protector-strong -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -mfunction-return=keep -mindirect-branch=keep -Wall -W -Wmissing-prototypes -Wmissing-declarations -Wpointer-arith -Wchar-subscripts -Wformat=2 -Wbad-function-cast -fno-builtin-strftime -Wstrict-aliasing=2  -I.. -MT fts-elastic-plugin.lo -MD -MP -MF .deps/fts-elastic-plugin.Tpo -c -o fts-elastic-plugin.lo fts-elastic-plugin.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I.. -I/usr/lib/dovecot/src/plugins/fts -I/usr/include/dovecot -std=gnu99 -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fstack-protector-strong -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -mfunction-return=keep -mindirect-branch=keep -Wall -W -Wmissing-prototypes -Wmissing-declarations -Wpointer-arith -Wchar-subscripts -Wformat=2 -Wbad-function-cast -fno-builtin-strftime -Wstrict-aliasing=2 -I.. -MT fts-elastic-plugin.lo -MD -MP -MF .deps/fts-elastic-plugin.Tpo -c fts-elastic-plugin.c  -fPIC -DPIC -o .libs/fts-elastic-plugin.o
In file included from fts-elastic-plugin.c:10:
fts-elastic-plugin.h:40:50: error: macro "DOVECOT_PREREQ" requires 3 arguments, but only 2 given
   40 | #if defined(DOVECOT_PREREQ) && DOVECOT_PREREQ(2,3)
      |                                                  ^
In file included from /usr/include/dovecot/lib.h:27,
                 from fts-elastic-plugin.c:5:
/usr/include/dovecot/macros.h:237: note: macro "DOVECOT_PREREQ" defined here
  237 | #define DOVECOT_PREREQ(maj, min, micro) \
      |
make[2]: *** [Makefile:476: fts-elastic-plugin.lo] Error 1
make[2]: Leaving directory '/build/dovecot-fts-elastic/src/fts-elastic-1.0.0/src'
make[1]: *** [Makefile:434: all-recursive] Error 1
make[1]: Leaving directory '/build/dovecot-fts-elastic/src/fts-elastic-1.0.0'
make: *** [Makefile:366: all] Error 2

Failed to parse HTTP url: HTTP URL does not allow `userinfo@' part

Hi there,

First of all thanks for this plugin. I would love to make it work, but unfortunately I'm experiencing a connection issue between the plugin and ElasticSearch.

$ doveadm fts rescan -u [email protected]
Error: fts: Failed to initialize backend 'elastic': fts_elastic: Failed to parse HTTP url: HTTP URL does not allow `userinfo@' part

This is my /etc/dovecot/conf.d/90-fts.conf file:

plugin {
  fts = elastic
  fts_elastic = debug url=https://user:[email protected]:9200/m/ bulk_size=5000000 refresh=fts rawlog_dir=/var/log/elasticsearch/

# no = new emails are indexed when the user searches
# yes = every email is indexed when it is delivered
  fts_autoindex = yes
fts_autoindex_exclude = \Junk
fts_autoindex_exclude2 = \Trash
}

I managed to successfully upload the schema:

$ curl -X PUT "https://user:[email protected]:9200/m?pretty" -H 'Content-Type: application/json' -d "@elastic7-schema.json"  
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "m"
}

But unfortunately the plugin cannot connect... As far as I can see, it doesn't like the user:password@ part. Elasticsearch is running in production, secured with Basic Auth.

I'm running ElasticSearch 7.10.1 with Dovecot 2.3.11.3 on 5.9.6 x86_64 GNU/Linux.

Am I doing something wrong? Am I missing a setting?
Thanks in advance for your time and keep up the good work.
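For context: the URL is rejected while the backend is being initialized, before any request is made. HTTP Basic Auth credentials are normally sent not in the URL but in an "Authorization: Basic <base64(user:password)>" request header (RFC 7617). Whether a given fts-elastic version can add such a header is not covered here; the C sketch below only shows how that header value is built. base64_encode and basic_auth_value are hypothetical helpers, and the 256-byte credential limit is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Encodes `len` bytes of `in` as standard Base64 (RFC 4648, with '='
 * padding) into `out`, which must hold at least 4 * ((len + 2) / 3) + 1
 * bytes. Returns the number of characters written, excluding the NUL. */
size_t base64_encode(char *out, const unsigned char *in, size_t len)
{
	static const char alphabet[] =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
	size_t o = 0;

	assert(out != NULL && (in != NULL || len == 0));
	for (size_t i = 0; i < len; i += 3) {
		/* pack up to three input bytes into 24 bits */
		unsigned long v = (unsigned long)in[i] << 16;
		if (i + 1 < len)
			v |= (unsigned long)in[i + 1] << 8;
		if (i + 2 < len)
			v |= in[i + 2];

		/* emit four 6-bit groups, padding missing bytes with '=' */
		out[o++] = alphabet[(v >> 18) & 0x3f];
		out[o++] = alphabet[(v >> 12) & 0x3f];
		out[o++] = i + 1 < len ? alphabet[(v >> 6) & 0x3f] : '=';
		out[o++] = i + 2 < len ? alphabet[v & 0x3f] : '=';
	}
	out[o] = '\0';
	return o;
}

/* Hypothetical helper: formats the value of an HTTP Basic Authorization
 * header, "Basic " + base64("user:password") (RFC 7617), into `out`.
 * Returns its length, or -1 if the credentials or `out` are too long. */
int basic_auth_value(char *out, size_t size, const char *user,
                     const char *password)
{
	char creds[256]; /* arbitrary limit for this sketch */
	int n = snprintf(creds, sizeof(creds), "%s:%s", user, password);

	if (n < 0 || (size_t)n >= sizeof(creds))
		return -1;
	if (6 + 4 * (((size_t)n + 2) / 3) + 1 > size)
		return -1;
	memcpy(out, "Basic ", 6);
	return 6 + (int)base64_encode(out + 6, (const unsigned char *)creds,
	                              (size_t)n);
}
```

With RFC 7617's own example credentials, "Aladdin" and "open sesame", this yields "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==".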

No "date" field in pushed document

There is no "date" field in the pushed documents, and it's not in _source either. Elasticsearch version 7.6.0, and index mapping is created with the included elastic7-schema.json file. Is this supposed to be the case or have I missed something? Thanks.

Dovecot crashes when searching virtual folders

OS: RedHat 8
Dovecot: 2.3.8

I compiled and installed the plugin, and searching normal folders works fine. However, if I try to search a virtual folder, Dovecot crashes.

Same issue on OpenSUSE 15.2 with Dovecot 2.3.16

Everything works fine when using Solr, virtual folders are searchable and Solr returns results.


2021-08-28 02:47:21 imap(***)<2208381><pDnUpZrKiONEknGC>: Fatal: master: service(imap): child 2208381 killed with signal 6 (core not dumped - https://dovecot.org/bugreport.html#coredumps - set /proc/sys/fs/suid_dumpable to 2)
2021-08-28 02:47:21 imap(***)<2208381><pDnUpZrKiONEknGC>: Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0xf8cab) [0x7fa9da261cab] -> /usr/lib64/dovecot/libdovecot.so.0(+0xf8d47) [0x7fa9da261d47] -> /usr/lib64/dovecot/libdovecot.so.0(+0x55c1d) [0x7fa9da1bec1d] -> /usr/lib64/dovecot/libdovecot-storage.so.0(+0x4655c) [0x7fa9da56555c] -> /usr/lib64/dovecot/lib21_fts_elastic_plugin.so(+0x40a5) [0x7fa9d77030a5] -> /usr/lib64/dovecot/lib20_fts_plugin.so(fts_backend_lookup_multi+0x15c) [0x7fa9d87955dc] -> /usr/lib64/dovecot/lib20_fts_plugin.so(+0x10306) [0x7fa9d879a306] -> /usr/lib64/dovecot/lib20_fts_plugin.so(fts_search_lookup+0xd8) [0x7fa9d879a758] -> /usr/lib64/dovecot/lib20_fts_plugin.so(+0x13308) [0x7fa9d879d308] -> dovecot/imap(imap_search_start+0x6e) [0x55781594069e] -> dovecot/imap(cmd_search+0xdd) [0x557815930ccd] -> dovecot/imap(command_exec+0x78) [0x557815938eb8] -> dovecot/imap(+0x2000f) [0x55781593700f] -> dovecot/imap(+0x200c3) [0x5578159370c3] -> dovecot/imap(+0x20282) [0x557815937282] -> dovecot/imap(client_handle_input+0x1b5) [0x557815937485] -> dovecot/imap(client_input+0x82) [0x5578159379c2] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x73) [0x7fa9da279413] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run_internal+0x135) [0x7fa9da27aac5] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0x50) [0x7fa9da2794c0] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x48) [0x7fa9da279628] -> /usr/lib64/dovecot/libdovecot.so.0(master_service_run+0x17) [0x7fa9da1f0fd7] -> dovecot/imap(main+0x335) [0x5578159293e5] -> /lib64/libc.so.6(__libc_start_main+0xf3) [0x7fa9d98e1493] -> dovecot/imap(_start+0x2e) [0x55781592959e]
2021-08-28 02:47:21 imap(***)<2208381><pDnUpZrKiONEknGC>: Panic: file mail-storage.c: line 1990 (mailbox_get_open_status): assertion failed: (box->opened)

#25 removed compatibility with older dovecot versions

I am running dovecot 2.3.7, as delivered with Ubuntu 20.04 LTS.
This would also hit Debian users on still-supported versions. Debian 10 provides dovecot 2.3.4, Debian 11 provides dovecot 2.3.13.

The fix for enabling compilation against newer dovecot versions broke backwards compatibility:

In file included from fts-elastic-plugin.c:11:
fts-elastic-plugin.h:42:52: error: macro "DOVECOT_PREREQ" passed 3 arguments, but takes just 2
   42 | #if defined(DOVECOT_PREREQ) && DOVECOT_PREREQ(2,3,0)
      |                                                    ^
In file included from /usr/include/dovecot/lib.h:27,
                 from fts-elastic-plugin.c:5:
/usr/include/dovecot/macros.h:226: note: macro "DOVECOT_PREREQ" defined here
  226 | #  define DOVECOT_PREREQ(maj, min) \

I don't know enough about C to know whether the macro signature could also be checked, to fix it in a way that works for both worlds.
It feels as if the Dovecot team shouldn't have made that change the way they did, at least not in a micro increment.
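The preprocessor cannot test how many arguments a macro takes, but it can compare version numbers directly. As far as I know, both variants of DOVECOT_PREREQ are computed from the DOVECOT_VERSION_MAJOR and DOVECOT_VERSION_MINOR macros that Dovecot's headers define; assuming that holds for the versions you target, a check built on those macros works regardless of DOVECOT_PREREQ's arity. In the sketch below, FTS_ELASTIC_DOVECOT_PREREQ and fts_elastic_built_for_dovecot_2_3 are hypothetical names, and the block at the top only stands in for Dovecot's headers so the example compiles on its own.

```c
/* Stand-ins so this sketch compiles on its own; when building the plugin,
 * Dovecot's headers (pulled in via lib.h) are assumed to define these. */
#ifndef DOVECOT_VERSION_MAJOR
#  define DOVECOT_VERSION_MAJOR 2
#  define DOVECOT_VERSION_MINOR 3
#endif

/* Hypothetical replacement for DOVECOT_PREREQ(2,3) / DOVECOT_PREREQ(2,3,0):
 * true when building against Dovecot >= maj.min, independent of how many
 * arguments the installed DOVECOT_PREREQ takes. */
#define FTS_ELASTIC_DOVECOT_PREREQ(maj, min) \
	(DOVECOT_VERSION_MAJOR > (maj) || \
	 (DOVECOT_VERSION_MAJOR == (maj) && DOVECOT_VERSION_MINOR >= (min)))

/* Example use, mirroring the check in fts-elastic-plugin.h. */
int fts_elastic_built_for_dovecot_2_3(void)
{
#if FTS_ELASTIC_DOVECOT_PREREQ(2, 3)
	return 1;
#else
	return 0;
#endif
}
```

The same expression works both in #if lines and in ordinary C code, so the existing `#if defined(DOVECOT_PREREQ) && DOVECOT_PREREQ(...)` line could be swapped for it without touching anything else.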

Index gets stale

I'm not exactly sure what I'm doing wrong, but I keep running into situations where newer mails don't seem to be searchable unless I explicitly run

doveadm fts rescan -u [email protected]
doveadm index -u user@domain -q '*'

for that user.

Here's my config:

mail_plugins = $mail_plugins fts fts_elastic

plugin {
  fts = elastic
  fts_elastic = url=http://localhost:9200/m/ bulk_size=5000000 refresh=fts

# no = new emails are indexed when the user searches
# yes = every email is indexed when it is delivered
fts_autoindex = no
fts_autoindex_exclude = \Junk
fts_autoindex_exclude2 = \Trash
}

Any ideas on how to debug this?

segmentation fault with Dovecot 2.3.x

I installed fts-elastic on Debian 10 with Dovecot 2.3.4.1 and Elasticsearch 7.5.0 with the following procedure:

apt install libjson-c-dev libjson-c3 dovecot-dev
apt install default-libmysqlclient-dev dh-exec krb5-multidev \
   libapparmor-dev libbz2-dev libclucene-dev libdb-dev libicu-dev \
   libexttextcat-dev libldap2-dev liblz4-dev liblzma-dev liblua5.3-dev \
   libpam0g-dev libpq-dev libsasl2-dev libsodium-dev libsqlite3-dev \
   libstemmer-dev libwrap0-dev zlib1g-dev
apt-get --build source dovecot
rm dovecot-*.deb
rm dovecot_*
git clone https://github.com/filiphanes/fts-elastic.git 
cd fts-elastic/
./autogen.sh
./configure --with-dovecot=/root/dovecot-2.3.4.1/
make
make install
ln -s /usr/lib/dovecot/lib21_fts_elastic_plugin.so \
   /usr/lib/dovecot/modules/lib21_fts_elastic_plugin.so

I enabled fts and fts_elastic plugins in dovecot, and I successfully setup index mapping as suggested in the README.

However, fts_elastic does not work correctly: IMAP search returns only a few (one or two out of, say, tens of) email messages, and if I try to reindex a mailbox with doveadm fts rescan -u [email protected] I get a segmentation fault.

If I run it with gdb from within the Dovecot source folder, I get the following output:

root@server:~# cd ~/dovecot-2.3.4.1/
root@server:~/dovecot-2.3.4.1# gdb --args doveadm fts rescan -u "[email protected]"
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
[...]
Reading symbols from doveadm...(no debugging symbols found)...done.
(gdb) run
Starting program: /usr/bin/doveadm fts rescan -u [email protected]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7b51f10 in buffer_free () from /usr/lib/dovecot/libdovecot.so.0
(gdb) backtrace
#0  0x00007ffff7b51f10 in buffer_free () from /usr/lib/dovecot/libdovecot.so.0
#1  0x00007ffff51e246c in array_free_i (array=0x555555654650) at ./src/lib/array.h:138
#2  fts_backend_elastic_rescan (_backend=0x555555639de0) at fts-backend-elastic.c:778
#3  0x00007ffff768163d in fts_backend_rescan () from /usr/lib/dovecot/modules/lib20_fts_plugin.so
#4  0x00007ffff7fc8d77 in ?? () from /usr/lib/dovecot/modules/doveadm/lib20_doveadm_fts_plugin.so
#5  0x000055555558493f in ?? ()
#6  0x000055555558558d in ?? ()
#7  0x0000555555586389 in doveadm_cmd_ver2_to_mail_cmd_wrapper ()
#8  0x0000555555596afd in doveadm_cmd_run_ver2 ()
#9  0x0000555555596b57 in doveadm_cmd_try_run_ver2 ()
#10 0x0000555555575b21 in main ()
(gdb) frame 2
#2  fts_backend_elastic_rescan (_backend=0x555555639de0) at fts-backend-elastic.c:778
778		array_free(&result->maybe_uids);
(gdb) frame 1
#1  0x00007ffff51e246c in array_free_i (array=0x555555654650) at ./src/lib/array.h:138
138		buffer_free(&array->buffer);
(gdb) frame 0
#0  0x00007ffff7b51f10 in buffer_free () from /usr/lib/dovecot/libdovecot.so.0
(gdb) 
