horoscope

horoscope is an optimizer inspector for DBMS.

Get Started

Install

Install using script (recommended)

curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/chaos-mesh/horoscope/master/install.sh | sh

Then open a new terminal and try horo -h.

NAME:
   horoscope - An optimizer inspector for DBMS

USAGE:
   horo [global options] command [command options] [arguments...]

COMMANDS:
   init, i     initialize workload
   test        test the optimizer
   gen, g      Generate a dynamic bench scheme
   query, q    Execute a query
   hint, H     Explain hint of a query
   explain, e  Explain analyze a query
   info        Show database information
   index       Add indexes for tables
   card        test the cardinality estimations
   split, s    Split data into several slices
   load        Load data in a directory
   help, h     Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --dsn DSN, -d DSN          set DSN of target db (default: "root:@tcp(localhost:4000)/test?charset=utf8")
   --workload DIR, -w DIR     workload DIR of horo (default: "workload")
   --json, -j                 format log with json formatter (default: false)
   --file FILE, -f FILE       set FILE to store log
   --verbose LEVEL, -v LEVEL  set LEVEL of log: trace|debug|info|warn|error|fatal|panic (default: "info")
   --max-open-conns numbers   the max numbers of connections (default: 100)
   --max-idle-conns numbers   the max numbers of idle connections (default: 20)
   --max-lifetime seconds     the max seconds of connections lifetime (default: 10)
   --not-save                 do not save options (default: false)
   --help, -h                 show help (default: false)

Build from source

git clone https://github.com/chaos-mesh/horoscope.git
make

Then try bin/horo -h.

Initialize Workload

Enter a clean directory and execute:

horo init

This may fail if no TiDB or MySQL server is listening on localhost:4000. You can also specify a custom data source name (DSN) or workload directory:

horo -d "root@tcp(172.20.1.1:4000)/test" -w . init

All options are saved to horo.json in the current directory:

{
    "main": {
        "workload": ".",
        "dsn": "root@tcp(172.20.1.1:4000)/test",
        "json_formatter": false,
        "log_file": "",
        "verbose": "trace",
        "pool": {
            "max_open_conns": 100,
            "max_idle_conns": 20,
            "max_life_seconds": 10
        }
    },
    "bench": {
        "round": 1,
        "need_prepare": false,
        "disable_collect_card_error": false,
        "no_verify": false,
        "report_fmt": "table"
    },
    "card": {
        "columns": "",
        "type": "emq",
        "timeout": 0
    },
    "query": {
        "plan_id": 0
    },
    "generate": {
        "queries": 20,
        "and_op_weight": 3,
        "mode": "op-compose",
        "generator": {
            "max_tables": 3,
            "min_duration_threshold": 10000000,
            "limit": 100,
            "key_only": false,
            "unstable_order_by": false,
            "max_by_items": 3,
            "enable_key_map": false,
            "aggregate_weight": 0.5
        }
    },
    "index": {
        "max_indexes": 10,
        "compound_level": 1,
        "reserve_indexes": false
    },
    "info": {
        "table": ""
    },
    "load": {
        "data_source": ""
    },
    "split": {
        "group": "",
        "slices": 100,
        "batch_size": 100,
        "use_bit_array": false
    }
}
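
If you want to check which server a saved configuration points at before running a command, the DSN can be read back from horo.json. The following is a minimal sketch, not part of horoscope: `parse_dsn` and `DSN_RE` are hypothetical helpers, and the regular expression assumes a simplified DSN of the form `user[:password]@tcp(host[:port])/dbname[?params]`.

```python
import json
import re

# Fragment of the horo.json shown above (only the fields used here).
HORO_JSON = """
{
    "main": {
        "workload": ".",
        "dsn": "root@tcp(172.20.1.1:4000)/test",
        "pool": {"max_open_conns": 100, "max_idle_conns": 20, "max_life_seconds": 10}
    }
}
"""

# Simplified pattern for user[:password]@tcp(host[:port])/dbname[?params].
# This is an illustrative assumption, not a complete DSN parser.
DSN_RE = re.compile(
    r"^(?P<user>[^:@]*)(?::(?P<password>[^@]*))?"
    r"@tcp\((?P<host>[^:)]+)(?::(?P<port>\d+))?\)"
    r"/(?P<db>[^?]*)"
)


def parse_dsn(dsn):
    """Split a DSN into user, host, port and database name."""
    m = DSN_RE.match(dsn)
    if m is None:
        raise ValueError(f"unsupported DSN: {dsn!r}")
    port = int(m.group("port")) if m.group("port") else None
    return {
        "user": m.group("user"),
        "host": m.group("host"),
        "port": port,  # None when the DSN does not name a port
        "db": m.group("db"),
    }


config = json.loads(HORO_JSON)
target = parse_dsn(config["main"]["dsn"])
print(target["host"], target["port"], target["db"])
```

A missing port shows up as `None` here, which is a quick way to spot a DSN that does not explicitly name the port your server listens on.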

Generate Queries
```
horo gen -c 3
```
Start Benching
```
horo test -p
```

Bench effectiveness

horo -w benchmark/tpch test -p -r 4

Bench cardinality estimation

For example, the following command measures the row count estimation error of EMQ (exact match queries) on customer.C_NAME for a total of 100 seconds:

horo card -columns 'customer.C_NAME' -type emq -timeout 100s

Summary report

A summary report is generated after a bench sub-command finishes.

+-----+-------------+------------------------+--------------------------+---------------+---------------------------------+--------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID  | #PLAN SPACE | DEFAULT EXECUTION TIME | BEST PLAN EXECUTION TIME | EFFECTIVENESS | BETTER OPTIMAL PLANS            | ESTROW Q-ERROR                                                     | QUERY                                                                                                                                                                                                                                                                                                                                                                        |
+-----+-------------+------------------------+--------------------------+---------------+---------------------------------+--------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| q3  |          11 | 12061.0ms ±11%         | 5401.8ms ±19%            | 72.7%         | #6(44.8%),#10(66.4%),#11(47.3%) | count:3, median:1.0, 90th:7173270.0, 95th:7173270.0, max:7173270.0 | SELECT l_orderkey,sum(l_extendedprice*(1-l_discount)) AS revenue,o_orderdate,o_shippriority FROM ((customer) JOIN orders) JOIN lineitem WHERE c_mktsegment="AUTOMOBILE" AND c_custkey=o_custkey AND l_orderkey=o_orderkey AND o_orderdate<"1995-03-13" AND l_shipdate>"1995-03-13" GROUP BY l_orderkey,o_orderdate,o_shippriority ORDER BY revenue DESC,o_orderdate LIMIT 10 |
+-----+-------------+------------------------+--------------------------+---------------+---------------------------------+--------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

ID: the query ID
#PLAN SPACE: the size of the query's plan space
DEFAULT EXECUTION TIME: the execution time of the default plan, given in the format "Mean ±Diff", where "Mean" is the mean value over the configured number of rounds (the round option) and "Diff" is the lower/upper bound of the mean value
BEST PLAN EXECUTION TIME: the execution time of the best plan
EFFECTIVENESS: the percentage of plans in the plan space that are not better than the default plan
- Pd denotes the default plan generated for the query, and Pi denotes one of the plans in the plan space
- If execution time(Pi) < 0.9 * execution time(Pd), Pi is a better plan
BETTER OPTIMAL PLANS: the plans that are better than the default plan, each given in the format "#plan id(execution time / default execution time)"
ESTROW Q-ERROR: the q-error of the base table row count estimations for the query
QUERY: the query text
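
The column definitions above can be checked against the sample row for q3. The sketch below is an illustration, not horoscope's implementation. It assumes that effectiveness is the share of plans that are not "better" under the 0.9 threshold, which is consistent with the sample (3 better plans out of 11 gives 72.7%). It also uses the usual q-error definition, max(estimated/actual, actual/estimated); the source does not spell out the definition horoscope uses. The times for plans 6, 10 and 11 are derived from the report, and the other plan times are made up for the example.

```python
def better_plans(default_ms, plan_ms, threshold=0.9):
    """Return {plan_id: time / default time} for plans beating the default plan."""
    return {
        pid: t / default_ms
        for pid, t in sorted(plan_ms.items())
        if t < threshold * default_ms
    }


def effectiveness(default_ms, plan_ms, threshold=0.9):
    """Share of the plan space that is not better than the default plan (assumed)."""
    better = better_plans(default_ms, plan_ms, threshold)
    return 1 - len(better) / len(plan_ms)


def q_error(estimated_rows, actual_rows):
    """Usual q-error definition; assumes both row counts are positive."""
    return max(estimated_rows / actual_rows, actual_rows / estimated_rows)


# Default plan time for q3, taken from the sample report.
default_ms = 12061.0

# Plans 6, 10 and 11 follow the report; all other times are hypothetical.
plan_ms = {
    1: 12061.0, 2: 12500.0, 3: 11900.0, 4: 13020.0, 5: 12200.0,
    6: 5401.8, 7: 11000.0, 8: 14100.0, 9: 12061.0,
    10: 8008.5, 11: 5704.9,
}

better = better_plans(default_ms, plan_ms)
better_column = ",".join(f"#{pid}({ratio:.1%})" for pid, ratio in better.items())
effectiveness_column = f"{effectiveness(default_ms, plan_ms):.1%}"

print(effectiveness_column)  # 72.7%
print(better_column)         # #6(44.8%),#10(66.4%),#11(47.3%)
print(q_error(100, 10))      # 10.0
```

Note that plan 7 (11000.0 ms) is faster than the default plan but does not count as better, because it is not below 0.9 times the default execution time.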

Dataset

We integrate the SQL queries of the TPC-H, TPC-DS, SSB, and JOB benchmarks into the repo. You can use go-tpc and tidb-bench to import the datasets.

For the JOB benchmark, join-order-benchmark is helpful.

Index selection fuzz

Refer to index selection fuzz
