
vitessio / arewefastyet


Automated Benchmarking System for Vitess

Home Page: https://benchmark.vitess.io

License: Apache License 2.0

CSS 0.86% Makefile 0.29% Shell 0.75% Go 48.33% HCL 0.64% Jinja 2.08% JavaScript 0.20% HTML 0.29% Dockerfile 0.39% TypeScript 46.17%
benchmark cncf vitess

arewefastyet's Introduction

Pull Request after Pull Request, the Vitess codebase changes a lot. We must ensure that the performance of the codebase does not degrade over time. Arewefastyet automatically tests the performance of Vitess by benchmarking it with several workloads. The performance is compared against the main branch, release branches, and recent git tags, along with custom git SHAs.

Pull Requests needing benchmarks

When someone wants to know whether a Pull Request will affect the performance of Vitess, they might wish to benchmark it before merging. This can be done by adding the Benchmark me label to the Pull Request. Arewefastyet will then start benchmarking the head commit of the Pull Request and compare it against the Pull Request's base.

How to run

Arewefastyet uses Docker and Docker Compose so it can run easily in any environment. You will need to install both tools before running arewefastyet.

Moreover, some secrets are required to run arewefastyet correctly; they can be provided by a maintainer of Vitess. These secrets allow you to connect to the arewefastyet database, to the remote benchmarking server, and so on.

Locally

docker compose build
docker compose up

Production

docker compose -f docker-compose.prod.yml build
docker compose -f docker-compose.prod.yml up

arewefastyet's People

Contributors

akilan1999, anon-artist, ashutosh-rath02, camillemtd, deepthi, dependabot[bot], dhairyamajmudar, frouioui, gmin2, guptamanan100, harshit-gangal, hetvisoni, jad31, kirtanchandak, maniktherana, marsian83, mscoutermarsh, systay


arewefastyet's Issues

Specify the instance type on which to run benchmarks

Description

Equinix Metal offers a large fleet of bare-metal servers, listed here. As of today, we only use m2.xlarge.x86 for every task. A more dynamic configuration would be preferable so we can execute tasks on a broader variety of hardware, thus increasing our overall confidence.

A concrete example of this is #62: microbenchmarks are lightweight processes that can run on a single-core machine.

Initial thoughts on implementation

This issue can be resolved by adding a field to the configuration file (and corresponding CLI flags); the new field's value would be the instance type string we give to the Equinix API.
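
A minimal sketch of what that could look like, using the standard flag package (the field and flag names below are illustrative, not the actual arewefastyet configuration keys):

package main

import (
	"flag"
	"fmt"
)

// EquinixConfig holds the Equinix Metal settings for a task.
// InstanceType is handed verbatim to the Equinix API,
// e.g. "m2.xlarge.x86" (today's hard-coded value) or "c3.small.x86".
type EquinixConfig struct {
	InstanceType string
}

func main() {
	var cfg EquinixConfig
	flag.StringVar(&cfg.InstanceType, "equinix-instance-type", "m2.xlarge.x86",
		"Equinix Metal instance type on which to run the task")
	flag.Parse()
	fmt.Println("running task on", cfg.InstanceType)
}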

Cleaner CLI flag names

Set up cleaner, easier-to-understand CLI flag names.

For instance, the following flags:

  • --commit
  • --source
  • --inventory-file

Can be grouped under a --tasks category, turning them into --tasks-commit, --tasks-source, and --tasks-inventory-file (see the sketch below).
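
A minimal sketch of the grouped flags, assuming the CLI is built with spf13/cobra (the wiring shown is illustrative; only the flag names come from this issue):

package main

import (
	"fmt"
	"log"

	"github.com/spf13/cobra"
)

func main() {
	cmd := &cobra.Command{
		Use: "arewefastyet",
		Run: func(cmd *cobra.Command, args []string) {
			commit, _ := cmd.Flags().GetString("tasks-commit")
			fmt.Println("benchmarking commit", commit)
		},
	}

	// All task-related flags share the "tasks-" prefix so they read as one
	// category on the command line and in --help output.
	cmd.Flags().String("tasks-commit", "HEAD", "commit SHA to benchmark")
	cmd.Flags().String("tasks-source", "", "source that triggered the task")
	cmd.Flags().String("tasks-inventory-file", "", "Ansible inventory file to use")

	if err := cmd.Execute(); err != nil {
		log.Fatal(err)
	}
}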

UI bugs

Ensure search and compare display TPCC results

Benchmark compatible with latest version of Ansible and Python 3.8

Errors when running with the latest version of Ansible:

  1. GPG key for YUM
  2. Extra CNF file on VTTablet not being called

Code to change for Python 3.8 compatibility:

(initialize_benchmark.py)

def recursive_dict(data, ip):
    # Walk the inventory dict and rewrite the keys of every "hosts" section
    # with the target IP address.
    for k in list(data.keys()):
        if isinstance(data[k], dict) and k == "hosts":
            data[k] = recursive_dict_ip(data[k], ip)
        elif isinstance(data[k], dict):
            data[k] = recursive_dict(data[k], ip)
    return data


def recursive_dict_ip(data, ip):
    # Re-key every host entry with the given IP. Iterate over a copy of the
    # keys so the dict can be mutated safely under Python 3.8.
    for old_key in list(data.keys()):
        data[ip] = data.pop(old_key)
    return data

Exceeding GH API rates

Using the GitHub API without being authenticated exposes us to API rate limit issues. This issue initially showed up during local development when repeatedly running unit tests, specifically the TestBuildAnsibleInventoryFile test suite.

The following may appear:

API rate limit exceeded for xx.xx.xx.xx. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)

Link to their documentation page here.
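
One way around this is to authenticate every request with a personal access token; a minimal sketch using only the standard library (the GITHUB_TOKEN environment variable and the endpoint queried are illustrative assumptions):

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	req, err := http.NewRequest("GET", "https://api.github.com/repos/vitessio/vitess/commits", nil)
	if err != nil {
		panic(err)
	}
	// Authenticated requests get a much higher rate limit than anonymous ones.
	if token := os.Getenv("GITHUB_TOKEN"); token != "" {
		req.Header.Set("Authorization", "token "+token)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, len(body), "bytes")
}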

Nightly test

Task:

  1. Create a VPS using Packet

    • Deploy Spot Market Servers
      • Location: NY5 - Secaucus, NJ
      • Instance type: n2large.xlarge.x86
      • OS: Ubuntu 20.04 or CentOS 8
  2. Run @dkhenry's Ansible playbooks, or build Vitess and run Sysbench against Vitess MySQL

  3. Create a scheduler that runs every night and stores the results in a database

Upload profiles to AWS S3

Profile dumps located in the report directory should be uploaded to S3. This option can be toggled using CLI flags.
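
A minimal sketch of the upload step, assuming the aws-sdk-go v1 s3manager uploader (the bucket name, key layout, region, and --upload-profiles flag are illustrative assumptions):

package main

import (
	"flag"
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	upload := flag.Bool("upload-profiles", false, "upload profile dumps from the report directory to S3")
	profile := flag.String("profile", "report/cpu.pprof", "profile dump to upload")
	flag.Parse()
	if !*upload {
		return
	}

	f, err := os.Open(*profile)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	uploader := s3manager.NewUploader(sess)
	out, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("arewefastyet-profiles"), // hypothetical bucket name
		Key:    aws.String(*profile),
		Body:   f,
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("uploaded to", out.Location)
}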

Single command to provision infra and start a microbenchmark

The CLI should expose a single command allowing us to run a microbenchmark. The interface should be general enough that, in the future, we can use the same command and the same piece of code to execute other tasks such as TPCC and OLTP.

Render microbenchmarks in the UI through a table

Microbenchmark data is currently stored across multiple MySQL tables; however, we want it to be accessible and available to anyone. For that, we need to add a table allowing us to visualize all the benchmarks that were run on Vitess's latest master commit.

The table will be pretty simple at first; from there, we can iterate.

Columns: package name | benchmark name | results | last commit diff | last release diff

To keep it simple for now, results can be represented by the ns/op metric; diffs will also be based on that metric.

Enable tasks on pull requests and commits other than HEAD

Description

As of today, launching a task on a commit other than HEAD will fail for multiple reasons:

  • If the commit is not within the repo's origin, Ansible will fail to find the ref.
  • Commits other than HEAD are set to null in the built Ansible inventory file, making the role fail.

Use cases

Besides fixing the fact that we can't run on commits other than HEAD, and with the upcoming features of arewefastyet in mind, we want to be able to benchmark incoming changes to Vitess's master branch. Thus, we need to enable checking out pull request refs.

Add Makefile

Add a simple Makefile allowing us to run generic tasks. The Makefile should support:

  • Dependency installation
  • Run tests
  • Run a benchmark

Configure infrastructure with Ansible for Microbenchmarks

Description

Newly provisioned infrastructure needs to be configured with Ansible to host Microbenchmark runs. As of today, our Ansible configuration allows us to configure the bare metal (disk, network, ...), download and build Vitess, and set up a Vitess cluster based on the configuration given through an Ansible Playbook.

This issue is part of the MVP Microbenchmarks push.

Link the new Go CLI to the Go WebUI

Description

Now that we have switched both the CLI and the WebUI from Python to Golang, we need to link them together. The server has its own main, which needs to be transformed into a func server(config ServerConfig) function or similar. That function will then be called from the cmd package.

This issue is part of the MVP Microbenchmarks sprint.
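
A minimal sketch of that refactor, assuming an HTTP server built on the standard library (the package layout, ServerConfig fields, and route are illustrative, not the actual arewefastyet code):

// Package server: the WebUI's former main() becomes a function the CLI can call.
package server

import (
	"fmt"
	"net/http"
)

// ServerConfig carries whatever the WebUI needs to start (port, static dir, ...).
type ServerConfig struct {
	Port int
}

// Run replaces the WebUI's former main(): the cmd package builds a
// ServerConfig from flags and calls Run instead of launching a separate binary.
func Run(config ServerConfig) error {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "arewefastyet web UI")
	})
	return http.ListenAndServe(fmt.Sprintf(":%d", config.Port), nil)
}

The cmd package's web command would then simply call server.Run(cfg) from its own Run function.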

Slack module

Create a Slack module to send benchmark results to a Slack channel (a minimal sketch follows the list below).

  • Read the Slack API token from the configuration file
  • Send Slack messages
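
A minimal sketch of the sending side, assuming an incoming-webhook URL taken from configuration (the webhook approach, the environment variable, and the message text are illustrative assumptions; the real module may use the chat.postMessage API with the token instead):

package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"os"
)

func main() {
	// In arewefastyet this would come from the configuration file; the
	// environment variable here is only for the sketch.
	webhook := os.Getenv("SLACK_WEBHOOK_URL")

	payload, _ := json.Marshal(map[string]string{
		"text": "benchmark finished for commit abcdef0", // example message
	})
	resp, err := http.Post(webhook, "application/json", bytes.NewReader(payload))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("slack response:", resp.Status)
}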

Large profiles are not recorded properly

After running large benchmarks (approximately 30 minutes), profiles are no longer readable by go tool pprof and return an unexpected EOF error.
To solve this, we can use SIGUSR1 to tell Vitess to stop profiling and start writing the profiles.

cc vitessio/vitess#7594
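
On the arewefastyet side, flushing the profile could be as simple as sending the signal to the target process; a minimal sketch (the PID-as-argument interface is an assumption, and the SIGUSR1 behavior itself would be implemented on the Vitess side, see vitessio/vitess#7594):

package main

import (
	"log"
	"os"
	"strconv"
	"syscall"
)

func main() {
	// PID of the vtgate/vttablet process whose profile should be flushed.
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	// Ask the process to stop profiling and write its profile to disk
	// before we collect the report directory.
	if err := syscall.Kill(pid, syscall.SIGUSR1); err != nil {
		log.Fatal(err)
	}
}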

Execute tasks on different Golang versions

Allowing the Golang version to be changed for each run would give us more confidence and more metrics on how well we perform. Similar strategies can be adopted for unit and E2E testing; comparing Vitess across at least Golang's last three minor versions would be a nice addition.

  • Add the possibility to change the Go version in the inventory file
  • Add it to the REST API and CLI
  • CLI and REST API commands to display which Go versions can be run, verified via their md5sum hashes
  • Store the Go version in the MySQL database

System stats where benchmarks are running

Description

All system stats of the hosts where benchmarks are running must be recorded and kept for deeper analysis of benchmark results. System stats include the system itself, Golang stats, and all of Vitess's processes.

Implementation

Use Prometheus to scrape data from our processes and systems, and write this data to an InfluxDB instance.
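
A minimal sketch of a scrape endpoint using the Prometheus Go client (the port and the idea of exposing arewefastyet's own process metrics are assumptions; node and Vitess exporters, plus the write path to InfluxDB, are separate pieces):

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// The default registry already includes Go runtime and process collectors,
	// so Prometheus can scrape this endpoint alongside system and Vitess stats.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":2112", nil))
}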

Start vitess servers with pprof option

Context

This issue starts a list of upcoming features and enhancements enabling profiling of Vitess throughout benchmarks.

Luckily, @vmg has implemented vitessio/vitess#7496, which eases the profiling of Vitess.

Description

This initial issue requests the addition of the -pprof flag to Vitess servers. These changes will have to take place within our Ansible configuration and also the CLI.

Let's start with the CLI: we want the activation of the -pprof flag to be dynamic, so a new CLI flag will be required. Moreover, additional configuration regarding profiling (path, etc.) could be used.

Finally, in Ansible, the flags we pass down to (for instance) vtgate will need to be changed. As we can see in the snippet below, flags are listed one by one using Jinja, and ultimately we will need to make -pprof dynamic as well:

ExecStart=/bin/bash -c 'vtgate \
-gateway_implementation discoverygateway \
-service_map "grpc-vtgateservice" \
-alsologtostderr \
-enable_buffer \
-buffer_size=${VTGATE_BUFFER_SIZE} \
-buffer_max_failover_duration ${VTGATE_BUFFER_DURATION} \
-cell ${CELL} \
-cells_to_watch ${CELL}${ADDITIONAL_CELLS} \
-mysql_server_port ${MYSQL_PORT} \
-mysql_server_socket_path ${VTROOT}/socket/gateway-%i.sock \
-grpc_port ${GRPC_PORT} \
-port ${VTGATE_PORT} \
-mysql_auth_server_impl none \
-topo_global_root ${TOPO_GLOBAL_ROOT} \
-topo_implementation ${TOPO_IMPLEMENTATION} \
-topo_global_server_address ${TOPO_GLOBAL_SERVER_ADDRESS} \
${EXTRA_VTGATE_FLAGS}'

Specify the MySQL version on which to run the benchmarks

Users should be able to specify the MySQL version they want to use for a benchmarking task.
To Do:

  • Add the option to the CLI
  • Integrate it into the Ansible inventory file
  • Get the MySQL GPG key or use a Galaxy library for MySQL
  • Store the MySQL version in the MySQL database

Create sub-configuration managers

As the number of CLI flags increases, so does the number of parameters for our functions.

This issue proposes to fix this by implementing sub-configuration classes/managers. The Config class will remain the "default" configuration holder, but will now contain sub-configurations that each focus on a single topic (tasks conf, Slack conf, MySQL conf, Web conf, etc.).
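
A minimal sketch of the shape, shown in Go to match the ongoing rewrite even though the issue refers to the Python Config class (struct and field names are illustrative):

package config

// Each sub-configuration focuses on a single topic, so functions can take
// only the piece they need instead of an ever-growing parameter list.
type TasksConfig struct {
	Commit        string
	Source        string
	InventoryFile string
}

type SlackConfig struct {
	APIToken string
	Channel  string
}

type MySQLConfig struct {
	DSN string
}

type WebConfig struct {
	Port int
}

// Config remains the "default" holder and simply aggregates the sub-configs.
type Config struct {
	Tasks TasksConfig
	Slack SlackConfig
	MySQL MySQLConfig
	Web   WebConfig
}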

Profiling runs (Implement it in Go)

1. Profiling information for each run

  • Vitess broadcasts profiling information
  • Storing profiling information
  • Automatically retrieving information from the pprof files
  • Capture the profiles and display a flamegraph

MVP Microbenchmarks Tracking

This issue tracks all known tasks needed to accomplish a feasible MVP of the Microbenchmarks:

  • Rolling update of the server #42 #97
  • Microbenchmarks: use existing tests #70
  • Specify the instance type on which to run benchmarks #75 #85
  • Addition of sample Equinix Terraform create node #77
  • Render microbenchmarks in the UI through a table #79 #95
  • Add the creation of Equinix packet devices to the Go CLI #82 #83
  • Link the new Go CLI to the Go WebUI #86 #90
  • Configure infrastructure with Ansible for Microbenchmarks #87 #89
  • Execute microbenchmarks on an infra with ansible in a single command #92 #93

Implement Go linter

Implement Go linter as used in the Vitess repository:

  • Linter git hook
  • Linter in GitHub action

Reuse the same packet device across different run

We are currently creating a new packet device for each individual run. Creating a new packet device takes a lot of time, monetary resources, etc.

In development, the time taken to create a new device significantly increases lead time. For instance, it takes about 10 minutes to spin up an OLTP test, where the test itself runs for less than a minute and the creation of the device takes approximately 9 minutes.

Instead, we can reuse the same device for a batch of runs/tasks. The Ansible codebase already integrates this feature. Current Ansible playbooks and roles allow us to "clean" the previous state. Among them, the following snippet enables us to stop previous services and remove all previously used Vitess data:

- name: Turn off everything
  shell: |
    systemctl stop mysqlctld@* || /bin/true
    systemctl stop vttablet@* || /bin/true
    systemctl stop vtctld@* || /bin/true
    systemctl stop cell@* || /bin/true
    systemctl stop vtgate@* || /bin/true
  ignore_errors: yes
  changed_when: false

- name: Find Elements to remove
  find:
    path: /vt
    file_type: directory
  register: directories

- debug:
    msg: '{{ directories }}'

- name: Remove Elements
  file:
    state: absent
    path: '{{ item.path }}'
  loop: '{{ directories.files }}'
  when: clean is defined and ((directories.files | length) > 0)

Moreover, the full.yml file, which is the "main" playbook we use, is "customized" as follows:

- import_playbook: provision.yml
  when: provision is defined

When running ansible-playbook as shown in the following snippet, we can omit provision as explained in this doc; that way we can reuse the previously provisioned device through the built Ansible inventory file in ./ansible/build/.

ANSIBLE_HOST_KEY_CHECKING=False ../benchmark/bin/ansible-playbook --private-key=~/.ssh/id_rsa -i build/$1 full.yml -u root -e provision=True -e clean=True

In order to integrate this with the current run-benchmark, changes are required in how we execute ansible-playbook.

Add go unit test to GitHub Actions

Description

Recently, a fair amount of Go code was added along with unit tests. These tests must run through GitHub Actions to validate or invalidate commits.

Microbenchmarks: Use existing benchmark tests

The existing performance benchmarks are end-to-end. This is really great.

Just like in testing, end-to-end benchmarking is well complemented by benchmarking of smaller units.

Today, we have a few func BenchmarkFoo(b *testing.B) functions sprinkled throughout the code base.

It would be awesome if we could pick up these benchmarks and just start running them, collecting numbers, and then creating pretty charts for them.
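
For reference, these are standard Go benchmarks runnable with go test -bench; a generic example (BenchmarkFoo and the work it measures are placeholders, not an actual Vitess benchmark):

package example

import "testing"

// Run with: go test -bench=BenchmarkFoo -benchmem
func BenchmarkFoo(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = make([]byte, 1024) // placeholder for the unit of work being measured
	}
}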

Add GitHub Actions workflow for run-benchmark

AreWeFastYet should have a way to automatically check and validate PR status. We need to ensure that future changes do not affect the way Ansible and our Python scripts work.

We must be able to verify that we can run a smaller set of benchmarks with Ansible using run-benchmark.py.

Use Packet Terraform Provider

Description

Packet has a Terraform provider we can use to manage instances. Where we used the Packet library before, we can now use the Terraform provider. Using it will increase task reproducibility as well as our overall confidence in the underlying infrastructure used to run tasks.

My thoughts on how the complete setup should look:

  • Having a "main" Terraform directory hosting the required configuration to run a single node on Packet. #77
  • For each task we run, copy that "main" Terraform directory to the task's build directory. #93
  • Proceed to initialize Terraform in the task's build directory. #93
  • Plan and Apply the configuration. #83
  • Later on, destroy the plan. #99 #101

Because these steps are numerous and broad, this issue will focus on the first step, but it will be used as a reference for future issues.

Rolling update of the server

Set up and automate the deployment of the benchmark server on new releases and on commits to master. This should be achieved through GitHub Actions.

Arewefastyet HEAD = deployed server HEAD
