Distributed Systems Infrastructure 2.0

Quick Start (Ubuntu)

sudo apt install awscli  # or: pip install awscli / brew install awscli
aws configure  # enter your AWS API credentials

ssh-keygen -m PEM -t rsa -b 2048 -C $(whoami)-dsikey \
    -f ~/.ssh/$(whoami)-dsikey  # no passphrase

ssh-agent bash # initialize ssh-agent, assuming you are using bash
ssh-add ~/.ssh/$(whoami)-dsikey

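# Import the public key into every AWS region.
# fileb:// passes the key as raw bytes, which works with both AWS CLI v1 and v2
# (v2 rejects file:// here unless the content is base64-encoded).
for a in $(aws ec2 describe-regions --query 'Regions[].{Name:RegionName}' --output text); do
    aws ec2 import-key-pair --key-name $(whoami)-dsikey \
        --public-key-material fileb://~/.ssh/$(whoami)-dsikey.pub --region $a
done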

git clone git@github.com:10gen/dsi.git; cd dsi; git checkout stable

# Optionally activate a (python3) virtualenv here, e.g. with workon
pip3 install --user -r requirements.txt

curl -o terraform.zip https://releases.hashicorp.com/terraform/0.12.16/terraform_0.12.16_linux_amd64.zip 
# mac: curl -o terraform.zip https://releases.hashicorp.com/terraform/0.12.16/terraform_0.12.16_darwin_amd64.zip 
sudo unzip terraform.zip -d /usr/local/bin

WORK=any-path
$EDITOR configurations/bootstrap/bootstrap.example.yml
./bin/bootstrap.py --directory $WORK --bootstrap-file configurations/bootstrap/bootstrap.example.yml
cd $WORK

# You can put the following line in .bashrc if you don't mind adding a relative path to PATH
export PATH=./.bin:$PATH
infrastructure_provisioning.py
workload_setup.py
mongodb_setup.py
test_control.py
analysis.py
infrastructure_teardown.py
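
If you prefer to drive the stages above from a single Python script, the following is a minimal sketch. It assumes the scripts are reachable on PATH from $WORK (as arranged by the export line). The function name run_stages is hypothetical and is not part of DSI.

```python
import subprocess

# The DSI stages, in the order the Quick Start runs them. Assumes they are
# on PATH, e.g. after `export PATH=./.bin:$PATH` inside $WORK.
STAGES = [
    ["infrastructure_provisioning.py"],
    ["workload_setup.py"],
    ["mongodb_setup.py"],
    ["test_control.py"],
    ["analysis.py"],
    ["infrastructure_teardown.py"],
]


def run_stages(stages, cwd="."):
    """Run each command in order and stop at the first non-zero exit code.

    Returns the list of commands that completed successfully. A failing stage
    raises subprocess.CalledProcessError, so later stages never run.
    """
    completed = []
    for cmd in stages:
        subprocess.run(cmd, cwd=cwd, check=True)
        completed.append(cmd)
    return completed
```

Call it as run_stages(STAGES) from inside $WORK. Stopping at the first failure also skips infrastructure_teardown.py. In that case, run the teardown by hand afterwards so that no EC2 instances are left running.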

More docs to get started

  • The above steps in long form: Getting Started
  • Frequently Asked Questions
  • DSI is a complex system with hundreds of configuration options. All of them are documented under docs/config-specs/.
  • Our paper from DBTest.io 2020 describes how we developed and used DSI to test MongoDB performance.
    • The branch mongodb-2020 is a DSI version frozen in time to reflect the state of this project as described in that paper. (I left MongoDB shortly afterwards and have since removed some hard dependencies on infrastructure that is only available to MongoDB employees.)

Navigating and using this repo

DSI = Distributed Systems Infrastructure. At MongoDB we use it for system-level performance tests in which we deploy real MongoDB clusters in AWS.

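DSI is the orchestrator that drives every stage of a test: infrastructure provisioning, workload setup, MongoDB setup, test control, analysis and infrastructure teardown.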

A key principle in developing DSI was that DSI owns and has access to all configuration. For example, we use vanilla AMI images, and all system setup is in terraform/remote-scripts/system-setup.sh. If you look at a file called mongodb_setup.yml, you will see that it embeds a mongod.conf file (among other things). Similarly, infrastructure_provisioning.yml embeds input parameters for the terraform *.tf files. All DSI config is in YAML. Terraform does not read YAML (it accepts HCL or JSON), so DSI converts the YAML to JSON when executing terraform.
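
To illustrate the YAML-to-JSON step, here is a minimal sketch. It is not DSI's actual code. It assumes the YAML has already been parsed into a plain dict (for example with PyYAML's yaml.safe_load) and writes that dict as a Terraform JSON variables file. Terraform loads any file ending in .auto.tfvars.json automatically. The file name dsi.auto.tfvars.json and the function name are hypothetical, and the keys must match variables declared in the *.tf files.

```python
import json
from pathlib import Path


def write_tfvars_json(params, work_dir):
    """Write already-parsed YAML parameters as a Terraform JSON variables file.

    `params` is a plain dict, e.g. the parsed terraform parameters from
    infrastructure_provisioning.yml (hypothetical layout). Returns the path.
    """
    path = Path(work_dir) / "dsi.auto.tfvars.json"  # hypothetical file name
    # sort_keys makes the output deterministic, so diffs between runs stay small.
    path.write_text(json.dumps(params, indent=2, sort_keys=True) + "\n")
    return path
```

Generating JSON is convenient here because every YAML mapping, list and scalar that DSI's configs use has a direct JSON equivalent. There is no need to emit HCL syntax.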

The reasons for having all configuration in DSI are:

  • Consistency: All configuration is in the same syntax (YAML) and in a limited set of files, which always have the same names, whether you use YCSB or LinkBench.
  • Tracking: All configuration changes are committed to this repo. This avoids situations where performance changes are due to changes to a specially crafted AMI, generated by scripts in another repo, by a person on a different team.
  • Globally shared, "normalized" config: All DSI binaries always read the entire set of config files. For example, mongodb_setup.py will use the same SSH key as terraform used in infrastructure_provisioning.py.

You use DSI by creating a work directory and putting some configuration files into it. (At least once upon a time it was even possible to run all DSI commands using just defaults, without any configuration files.) This directory will also hold your terraform tfstate files, benchmark output, logs, etc.

A helper script, bin/bootstrap.py, is a convenient way to create a directory and copy some canned configuration files into it. In fact, we almost always use files available under configurations/. You list the combination of configs you want to use in a simple bootstrap.yml file. See configurations/bootstrap/bootstrap.example.yml to get started!

All configuration is in files; command line options aren't supported. This way there's a permanent record of all config that was used to create a specific benchmark result. (In CI we tar and store the entire work directory, which contains both all your configuration and the result files.) It's also simple to rerun the exact same test without having to copy-paste CLI options from a log file or from a friend.
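
You can keep the same kind of permanent record locally by archiving the work directory with Python's standard tarfile module. This is a minimal sketch, and the function name archive_work_dir is hypothetical. It is not the script used in CI.

```python
import tarfile
from pathlib import Path


def archive_work_dir(work_dir, archive_path):
    """Pack the whole work directory (configs, tfstate, results, logs) into a
    gzipped tarball. archive_path should live outside work_dir, otherwise the
    archive would try to include itself."""
    work_dir = Path(work_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores entries relative to the directory's own name,
        # e.g. "work/mongodb_setup.yml" rather than an absolute path.
        tar.add(work_dir, arcname=work_dir.name)
    return archive_path
```

Because configuration and results are archived together, unpacking the tarball is enough to see exactly which settings produced a given result.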

The effective runtime configuration is a blend of three levels of configuration:

  1. configurations/defaults.yml
  2. infrastructure_provisioning.yml, workload_setup.yml, mongodb_setup.yml, test_control.yml
  3. overrides.yml

...where each later level overrides the levels before it.
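
The layering above can be sketched as a recursive dictionary merge. This is a minimal illustration under stated assumptions, not DSI's own implementation. It assumes that all three levels are already parsed into dicts and that defaults.yml and overrides.yml are keyed by section name (infrastructure_provisioning, mongodb_setup, ...). It also assumes that lists and scalars are replaced rather than merged. The keys timeout and jobs in the example are hypothetical.

```python
import copy


def deep_merge(base, override):
    """Return a new dict with `override` merged into `base`, recursively.

    Nested mappings are merged key by key; any other value in `override`
    (scalars, lists) replaces the value in `base` outright.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def effective_config(defaults, sections, overrides):
    """Blend the three levels: defaults, then per-section files, then overrides.

    `sections` maps a section name (e.g. "test_control") to the parsed
    contents of that section's file, so level 2 acts as one logical config.
    """
    return deep_merge(deep_merge(defaults, sections), overrides)


cfg = effective_config(
    {"test_control": {"timeout": 10, "jobs": 1}},  # level 1: defaults
    {"test_control": {"jobs": 4}},                 # level 2: test_control.yml
    {"test_control": {"timeout": 60}},             # level 3: overrides.yml
)
print(cfg)  # {'test_control': {'timeout': 60, 'jobs': 4}}
```

Each level only needs to mention the keys it changes. This is why a small overrides.yml is enough for manual tweaks.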

The second level is split into one file per section, but the files are logically a single configuration. The reason for the split is modularity: whether you want to deploy a 1-node or a 3-node replica set, you can use the same test_control.yml with both.

The file overrides.yml is a small config file where you can conveniently add manual changes if you don't want to edit the files in level 2, as they tend to be bigger. However, editing those files is perfectly allowed too. It's up to you!

Development & Testing

Run all validations, linters and tests:

testscripts/runtests.sh

Run all the unit tests:

testscripts/run-nosetest.sh

Run a specific test:

testscripts/run-nosetest.sh bin/tests/test_bootstrap.py
