Giter Site home page Giter Site logo

go-mysql-elasticsearch's Introduction

go-mysql-elasticsearch is a service syncing your MySQL data into Elasticsearch automatically.

It uses mysqldump to fetch the origin data at first, then syncs data incrementally with binlog.

Install

  • Install Go (1.6+) and set your GOPATH
  • go get github.com/siddontang/go-mysql-elasticsearch, it will print some messages in console, skip it. :-)
  • cd $GOPATH/src/github.com/siddontang/go-mysql-elasticsearch
  • make

How to use?

  • Create table in MySQL.
  • Create the associated Elasticsearch index, document type and mappings if possible, if not, Elasticsearch will create these automatically.
  • Config base, see the example config river.toml.
  • Set MySQL source in config file, see Source below.
  • Customize MySQL and Elasticsearch mapping rule in config file, see Rule below.
  • Start ./bin/go-mysql-elasticsearch -config=./etc/river.toml and enjoy it.

Notice

  • binlog format must be row.
  • binlog row image must be full for MySQL, you may lost some field data if you update PK data in MySQL with minimal or noblob binlog row image. MariaDB only supports full row image.
  • Can not alter table format at runtime.
  • MySQL table which will be synced must have a PK(primary key), multi columns PK is allowed now, e,g, if the PKs is (a, b), we will use "a:b" as the key. The PK data will be used as "id" in Elasticsearch.
  • You should create the associated mappings in Elasticsearch first, I don't think using the default mapping is a wise decision, you must know how to search accurately.
  • mysqldump must exist in the same node with go-mysql-elasticsearch, if not, go-mysql-elasticsearch will try to sync binlog only.
  • Don't change too many rows at same time in one SQL.

Source

In go-mysql-elasticsearch, you must decide which tables you want to sync into elasticsearch in the source config.

The format in config file is below:

[[source]]
schema = "test"
tables = ["t1", t2]

[[source]]
schema = "test_1"
tables = ["t3", t4]

schema is the database name, and tables includes the table need to be synced.

Rule

By default, go-mysql-elasticsearch will use MySQL table name as the Elasticserach's index and type name, use MySQL table field name as the Elasticserach's field name. e.g, if a table named blog, the default index and type in Elasticserach are both named blog, if the table field named title, the default field name is also named title.

In addition, one-to-many join ( parent-child relationship in Elasticsearch ) is supported. Simply specify the field name for parent property.

Rule can let you change this name mapping. Rule format in config file is below:

[[rule]]
schema = "test"
table = "t1"
index = "t"
type = "t"
parent = "parent_id"

    [[rule.fields]]
    mysql = "title"
    elastic = "my_title"

In the example above, we will use a new index and type both named "t" instead of default "t1", and use "my_title" instead of field name "title".

Rule field types

In order to map a mysql column on different elasticsearch types you can define the field type as follows:

[[rule]]
schema = "test"
table = "t1"
index = "t"
type = "t"
parent = "parent_id"

    [rule.field]
    // This will map column title to elastic search my_title
    title="my_title"

    // This will map column title to elastic search my_title and use array type
    title="my_title,list"

    // This will map column title to elastic search title and use array type
    title=",list"

Modifier "list" will translates a mysql string field like "a,b,c" on an elastic array type '{"a", "b", "c"}' this is specially useful if you need to use those fields on filtering on elasticsearch.

Wildcard table

go-mysql-elasticsearch only allows you determind which table to be synced, but sometimes, if you split a big table into multi sub tables, like 1024, table_0000, table_0001, ... table_1023, it is very hard to write rules for every table.

go-mysql-elasticserach supports using wildcard table, e.g:

[[source]]
schema = "test"
tables = ["test_river_[0-9]{4}"]

[[rule]]
schema = "test"
table = "test_river_[0-9]{4}"
index = "river"
type = "river"

"test_river_[0-9]{4}" is a wildcard table definition, which represents "test_river_0000" to "test_river_9999", at the same time, the table in the rule must be same as it.

At the above example, if you have 1024 sub tables, all tables will be synced into Elasticsearch with index "river" and type "river".

Why not other rivers?

Although there are some other MySQL rivers for Elasticsearch, like elasticsearch-river-jdbc, elasticsearch-river-mysql, I still want to build a new one with Go, why?

  • Customization, I want to decide which table to be synced, the associated index and type name, or even the field name in Elasticsearch.
  • Incremental update with binlog, and can resume from the last sync position when the service starts again.
  • A common sync framework not only for Elasticsearch but also for others, like memcached, redis, etc...
  • Wildcard tables support, we have many sub tables like table_0000 - table_1023, but want use a unique Elasticsearch index and type.

Todo

  • Filtering table field support, only fields in filter config will be synced.
  • Statistic.
  • Docker

Feedback

go-mysql-elasticsearch is still in development, and we will try to use it in production later. Any feedback is very welcome.

Email: [email protected]

go-mysql-elasticsearch's People

Contributors

siddontang avatar eaglechen avatar adriacidre avatar sputnik13 avatar

Watchers

James Cloos avatar Cuc Corwin avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.