importer's Introduction

TiKV is an open-source, distributed, and transactional key-value database. Unlike other traditional NoSQL systems, TiKV not only provides classical key-value APIs, but also transactional APIs with ACID compliance. Built in Rust and powered by Raft, TiKV was originally created by PingCAP to complement TiDB, a distributed HTAP database compatible with the MySQL protocol.

The design of TiKV ('Ti' stands for titanium) is inspired by some great distributed systems from Google, such as BigTable, Spanner, and Percolator, and some of the latest achievements in academia in recent years, such as the Raft consensus algorithm.

If you're interested in contributing to TiKV, or want to build it from source, see CONTRIBUTING.md.

TiKV is a graduated project of the Cloud Native Computing Foundation (CNCF). If you are an organization that wants to help shape the evolution of technologies that are container-packaged, dynamically-scheduled and microservices-oriented, consider joining the CNCF. For details about who's involved and how TiKV plays a role, read the CNCF announcement.


With the implementation of the Raft consensus algorithm in Rust and consensus state stored in RocksDB, TiKV guarantees data consistency. Placement Driver (PD), which is introduced to implement auto-sharding, enables automatic data migration. The transaction model is similar to Google's Percolator with some performance improvements. TiKV also provides snapshot isolation (SI), snapshot isolation with lock (SQL: SELECT ... FOR UPDATE), and externally consistent reads and writes in distributed transactions.
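
For a flavor of the transactional API, here is a minimal sketch using the tikv-client Python package that the quick start below also installs. The TransactionClient calls follow that package's documented interface; treat the exact signatures as assumptions.

from tikv_client import TransactionClient

# Transactions are coordinated through PD, which hands out timestamps.
client = TransactionClient.connect("127.0.0.1:2379")

# An optimistic transaction with snapshot-isolation semantics: reads see a
# consistent snapshot, and conflicting writes are detected at commit time.
txn = client.begin()
txn.put(b"k1", b"v1")
print(txn.get(b"k1"))  # b'v1'
txn.commit()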

TiKV has the following key features:

  • Geo-Replication

    TiKV uses Raft and the Placement Driver to support Geo-Replication.

  • Horizontal scalability

    With PD and carefully designed Raft groups, TiKV excels in horizontal scalability and can easily scale to 100+ TBs of data.

  • Consistent distributed transactions

    Similar to Google's Spanner, TiKV supports externally-consistent distributed transactions.

  • Coprocessor support

    Similar to HBase, TiKV implements a coprocessor framework to support distributed computing.

  • Cooperates with TiDB

    Thanks to internal optimizations, TiKV and TiDB can work together as a compelling database solution with high horizontal scalability, externally-consistent transactions, and support for both RDBMS and NoSQL design patterns.

Governance

See Governance.

Documentation

For instructions on deployment, configuration, and maintenance of TiKV, see the TiKV documentation on our website. For more details on the concepts and designs behind TiKV, see Deep Dive TiKV.

Note:

We have migrated our documentation from TiKV's wiki page to the official website. The original wiki page is discontinued. If you have any suggestions or issues regarding the documentation, offer your feedback here.

TiKV adopters

You can view the list of TiKV Adopters.

TiKV software stack

The TiKV software stack

  • Placement Driver: PD is the cluster manager of TiKV, which periodically checks replication constraints to balance load and data automatically.
  • Store: Each Store contains a RocksDB instance, which stores data on the local disk.
  • Region: Region is the basic unit of Key-Value data movement. Each Region is replicated to multiple Nodes. These multiple replicas form a Raft group.
  • Node: A physical node in the cluster. Within each node, there are one or more Stores. Within each Store, there are many Regions.

When a node starts, the metadata of the Node, Store, and Region is recorded in PD. The status of each Region and Store is reported to PD regularly.
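
To make the relationships concrete, here is a toy model of that hierarchy (illustrative only; these dataclasses are hypothetical, not TiKV's actual types):

from dataclasses import dataclass, field

@dataclass
class Region:
    # The basic unit of Key-Value data movement; the replicas of one
    # Region across several Stores form a single Raft group.
    region_id: int
    start_key: bytes
    end_key: bytes

@dataclass
class Store:
    # Backed by one RocksDB instance; holds many Region replicas.
    store_id: int
    regions: list = field(default_factory=list)

@dataclass
class Node:
    # A physical node hosting one or more Stores; reports the status of
    # its Stores and Regions to PD regularly.
    node_id: int
    stores: list = field(default_factory=list)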

Quick start

Deploy a playground with TiUP

The quickest way to try out TiKV with TiDB is to use TiUP, a component manager for TiDB.

See this page for a step-by-step tutorial.

Deploy a playground with binary

TiKV can run on its own with just PD, which is the minimal deployment required.

  1. Download and extract binaries.
$ export TIKV_VERSION=v7.5.0
$ export GOOS=darwin  # only {darwin, linux} are supported
$ export GOARCH=amd64 # only {amd64, arm64} are supported
$ curl -O  https://tiup-mirrors.pingcap.com/tikv-$TIKV_VERSION-$GOOS-$GOARCH.tar.gz
$ curl -O  https://tiup-mirrors.pingcap.com/pd-$TIKV_VERSION-$GOOS-$GOARCH.tar.gz
$ tar -xzf tikv-$TIKV_VERSION-$GOOS-$GOARCH.tar.gz
$ tar -xzf pd-$TIKV_VERSION-$GOOS-$GOARCH.tar.gz
  2. Start a PD instance.
$ ./pd-server --name=pd --data-dir=/tmp/pd/data --client-urls="http://127.0.0.1:2379" --peer-urls="http://127.0.0.1:2380" --initial-cluster="pd=http://127.0.0.1:2380" --log-file=/tmp/pd/log/pd.log
  3. Start a TiKV instance.
$ ./tikv-server --pd-endpoints="127.0.0.1:2379" --addr="127.0.0.1:20160" --data-dir=/tmp/tikv/data --log-file=/tmp/tikv/log/tikv.log
  4. Install the TiKV Python client and verify the deployment (requires Python 3.5+).
$ pip3 install -i https://test.pypi.org/simple/ tikv-client
from tikv_client import RawClient

client = RawClient.connect("127.0.0.1:2379")

client.put(b'foo', b'bar')
print(client.get(b'foo')) # b'bar'

client.put(b'foo', b'baz')
print(client.get(b'foo')) # b'baz'

Deploy a cluster with TiUP

See this manual for a production-like cluster deployment, presented by @c4pt0r.

Build from source

See CONTRIBUTING.md.

Client drivers

If you want to try the Go client, see Go Client.

Security

Security audit

A third-party security audit was performed by Cure53. See the full report here.

Reporting Security Vulnerabilities

To report a security vulnerability, please send an email to TiKV-security group.

See Security for the process and policy followed by the TiKV project.

Communication

Communication within the TiKV community abides by TiKV Code of Conduct. Here is an excerpt:

In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

Social Media

Slack

Join the TiKV community on Slack - Sign up and join channels on TiKV topics that interest you.

License

TiKV is under the Apache 2.0 license. See the LICENSE file for details.

Acknowledgments

  • Thanks etcd for providing some great open source tools.
  • Thanks RocksDB for their powerful storage engines.
  • Thanks rust-clippy. We do love the great project.

importer's Issues

tikv-importer configuration options

Feature Request

Is your feature request related to a problem? Please describe:

I have a script that imports an example database with tidb-lightning / tikv-importer. It works like this:

#!/bin/sh
# Drop any previous copy of the example database.
mysql -e 'drop database if exists ontime'

# Kill leftover processes from a previous run.
killall -9 tikv-importer
killall -9 tidb-lightning

BASE=/mnt/evo860

cd $BASE
# Recreate the importer's working directory, then start the import.
rm -rf $BASE/tmp && mkdir -p $BASE/tmp
./bin/tikv-importer --import-dir $BASE/tmp &
sleep 5
./bin/tidb-lightning -d $BASE/data-sets/ontime-data -importer localhost:20160

Describe the feature you'd like:

Besides pingcap/tidb-lightning#264, I have two critiques of tikv-importer's configuration:

  1. The default port of 20160 is opaque. It is not well documented, and running tikv-importer --help does not show what the default values for options are (tidb-lightning --help for example will show defaults). I had to run lsof to discover the port :-)

  2. The option called --import-dir appears to be purely for ephemeral storage, since it ends up being stored in tikv, and sourced from tidb-lightning. This would more correctly be called --tmp-dir. The name is important because it communicates how to provision its storage (i.e. in cloud it should be on the local ephemeral drive, not network attached, and backup/redundancy etc is not important).

v3.0 KV Protocol Proposal: Add a `GetVersion` and `GetMetrics` gRPC endpoint

service ImportKV {
    rpc GetVersion(GetVersionRequest) returns (GetVersionResponse) {}
    rpc GetMetrics(GetMetricsRequest) returns (GetMetricsResponse) {}
}

// Empty request messages, implied by the RPC definitions above.
message GetVersionRequest {}
message GetMetricsRequest {}

message GetVersionResponse {
    string version = 1;
}

message GetMetricsResponse {
    // To be determined
}

The GetVersion API will be used to determine whether new APIs will be available.

The GetMetrics API will be used by Lightning to report import progress to the log and web interface (refer to debugpb implementation in TiKV).
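
A sketch of how Lightning might probe these endpoints, assuming Python stubs generated from the proposed proto with grpcio-tools (the module names import_kvpb_pb2 / import_kvpb_pb2_grpc are hypothetical, not part of any released package):

import grpc
import import_kvpb_pb2 as pb            # hypothetical generated module
import import_kvpb_pb2_grpc as pb_grpc  # hypothetical generated module

channel = grpc.insecure_channel("127.0.0.1:20160")  # importer address; adjust as needed
stub = pb_grpc.ImportKVStub(channel)

# Probe the server version first, then decide whether newer APIs such as
# GetMetrics can be called, as the proposal describes.
version = stub.GetVersion(pb.GetVersionRequest()).version
print("importer version:", version)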

Legworks

  • Set up Jenkins CI with proper caching (currently it takes 30 minutes to compile and test the entire importer in debug mode on Travis CI)
  • Upload importer to the internal staging server, so that the Lightning / TiKV / TiDB-Toolkit releases can find this release tarball.
  • Add README.md
  • Add Makefile (support cargo features, rustfmt, clippy) (#1)
  • Add code coverage
  • Add GitHub issue / PR templates (#1)
  • Review GOVERNANCE.md (#1)

Consider changing region-split-size to 96M by default

Question

The default value of region-split-size is 500M,

but the default region-max-size of TiKV is 144MiB:

{"split-region-on-table":false,"batch-split-limit":10,"region-max-size":"144MiB","region-split-size":"96MiB","region-max-keys":1440000,"region-split-keys":960000}

So after importing into TiKV, TiKV will check and split the regions again, which takes time. We may also consider splitting on key count.
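
Rough arithmetic behind the suggestion (a sketch; the exact behavior depends on TiKV's split checker):

import math

imported_region_mib = 500  # the importer's current region-split-size default
tikv_split_size_mib = 96   # TiKV's region-split-size default

# A region imported at 500M exceeds TiKV's region-max-size (144MiB), so
# TiKV re-splits it into roughly region-split-size pieces after import.
print(math.ceil(imported_region_mib / tikv_split_size_mib))  # ~6 regions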

Failed to start importer

Bug Report

What version of TiKV Importer are you using?

TODO

What operating system and CPU are you using?

What did you do?

What did you expect to see?

What did you see instead?

[2020/05/18 14:40:16.552 +08:00] [FATAL] [setup.rs:118] ["failed to start memory monitor: Error: descriptor Desc { fq_name: \"process_virtual_memory_bytes\", help: \"Virtual memory size in bytes.\", const_label_pairs: [], variable_labels: [], id: 1166094170200363441, dim_hash: 18077976125237009095 } already exists with the same fully-qualified name and const label values"]
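
The crash is a duplicate metric registration: two collectors registered the same fully-qualified descriptor name. The same failure mode can be reproduced with Python's prometheus_client for illustration (the name is suffixed with _demo so it does not clash with the default process collector):

from prometheus_client import Gauge

Gauge("process_virtual_memory_bytes_demo", "Virtual memory size in bytes.")
# Registering the same fully-qualified name again raises
# ValueError: Duplicated timeseries in CollectorRegistry.
Gauge("process_virtual_memory_bytes_demo", "Virtual memory size in bytes.")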

tikv-importer fails with no error message when using domain name as listening address

Bug Report

I was running tidb-lightning in Kubernetes with a domain name as the listening address, but tikv-importer failed with no logs. After some debugging, I found that tikv-importer can start when using the Pod IP as the listening address.

To reproduce this, simply add the tikv-importer binary to the pingcap/alpine-glibc Docker image and run the following commands in the new image. To make reproduction easier, I've built an image: uhub.ucloud.cn/pingcap/tidb-lightning:latest.

$ docker run -it --rm uhub.ucloud.cn/pingcap/tidb-lightning:latest
/ # cat /etc/hosts
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.2      318b01eaf0c1
/ # /tikv-importer -h
TiKV Importer 3.0.2
The TiKV Authors
The importer server for TiKV

USAGE:
    tikv-importer [OPTIONS]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -A, --addr <IP:PORT>       Set the listening address
    -C, --config <FILE>        Set the configuration
        --import-dir <PATH>    Set the directory to store importing kv data
        --log-file <FILE>      Set the log file
        --log-level <LEVEL>    Set the log level [possible values: trace, debug, info, warn, error, off]
/ # /tikv-importer --addr=318b01eaf0c1:20170
/ # echo $?
1
/ # /tikv-importer --addr=172.17.0.2:20170
^C[2019/08/11 05:48:33.494 +00:00] [INFO] [signal_handler.rs:21] ["receive signal 2, stopping server..."]
/ #

To run tidb-lightning in Kubernetes, domain name support is needed, since domain names are required for stateful applications.
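
Until then, the transcript above suggests a workaround: resolve the name first and hand the resulting IP to --addr. A sketch (the hostname is the one from the transcript; how the importer is launched is illustrative):

import socket
import subprocess

host = "318b01eaf0c1"            # container hostname from /etc/hosts above
ip = socket.gethostbyname(host)  # "172.17.0.2" in the example
subprocess.run(["/tikv-importer", f"--addr={ip}:20170"])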

Support TLS for security

Feature Request

Is your feature request related to a problem? Please describe:

When PD/TiKV/TiDB is configured with TLS for its connections, tikv-importer may not work, and its connections are not secure.

Describe the feature you'd like:

Support TLS for connecting with PD/TiKV/TiDB.
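
On the wire, TLS support would look roughly like the following gRPC client setup in Python (an illustrative sketch; the certificate paths are hypothetical, and the importer's actual configuration surface is exactly what this request asks for):

import grpc

# Load the cluster CA plus this client's certificate/key pair.
with open("/etc/tls/ca.crt", "rb") as f:
    ca = f.read()
with open("/etc/tls/client.crt", "rb") as f:
    cert = f.read()
with open("/etc/tls/client.key", "rb") as f:
    key = f.read()

creds = grpc.ssl_channel_credentials(
    root_certificates=ca, private_key=key, certificate_chain=cert
)
channel = grpc.secure_channel("127.0.0.1:20160", creds)  # importer address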

v3.0 KV Protocol Proposal: Strip the first 11 bytes from key in WriteEngine, specify it in ImportEngine

message ImportEngineRequest {
    bytes uuid = 1;
    string pd_addr = 2;
    bytes key_prefix = 3;
}

Rationale:

An engine currently targets a single table and contains either only data or only indices. This means the first 11 bytes of all keys in an engine are always the same (t«tableID»_r or t«tableID»_i). If a table has 1 billion entries, this wastes 11 GB of Lightning → Importer traffic. Stripping the prefix also allows the engine to be agnostic of the table ID, so that the engine file can be reused multiple times.
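
The 11 bytes break down as a 1-byte 't' marker, an 8-byte table ID, and a 2-byte record/index marker. A sketch of the accounting (hypothetical helpers, not the actual TiDB codec):

import struct

def make_key_prefix(table_id, index=False):
    # 't' + 8-byte big-endian table ID + "_r" (row data) or "_i" (index).
    marker = b"_i" if index else b"_r"
    return b"t" + struct.pack(">q", table_id) + marker

assert len(make_key_prefix(42)) == 11

def strip_key_prefix(key):
    # Under this proposal, the sender drops these 11 bytes from every key
    # and sends them once as ImportEngineRequest.key_prefix instead.
    return key[11:]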

v3.0 KV Protocol Proposal: Change `WriteEngine` from a stream to a single message

service ImportKV {
    rpc WriteEngineV3(WriteEngineV3Request) returns (WriteEngineResponse) {}
}

message KVPair {
    bytes key = 1;
    bytes value = 2;
}

message WriteEngineV3Request {
    bytes uuid = 1;
    uint64 commit_ts = 2;
    repeated KVPair pairs = 3; 
}

Rationale:

  • In the stream version, unrecoverable errors are detected only when the stream is closed (where the server can reply with one message). Lightning therefore often needs to write only 2 messages to the stream and then close it, which is a waste of resources.
  • We don't need to guarantee ordering between the streamed requests.
  • Properly sending the write request takes 4 gRPC steps: open the stream, write the UUID, write the KV pairs, close the stream. This is very annoying to mock.

Here we just combine everything into one message.
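
With hypothetical stubs generated from the messages above, the whole exchange collapses into one unary call (a sketch; the stub and message constructors are illustrative):

def write_engine_v3(stub, pb, uuid, commit_ts, pairs):
    # One request carries the UUID, the commit timestamp, and all KV
    # pairs, replacing the open/write/write/close stream sequence.
    request = pb.WriteEngineV3Request(
        uuid=uuid,
        commit_ts=commit_ts,
        pairs=[pb.KVPair(key=k, value=v) for k, v in pairs],
    )
    return stub.WriteEngineV3(request)  # single message, single response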
