
easy-cass-lab's Introduction

easy-cass-lab

This is a tool to create lab environments with Apache Cassandra in AWS.

We use Packer to create a single AMI containing multiple Cassandra versions along with supporting tooling (async-profiler, bcc-tools, fio, and more).

easy-cass-lab provides tooling to create the AMI and provision the environments.

Pre-requisites

Before you start, you'll need an AWS account with credentials configured locally, plus Docker and Packer installed; both are used in the steps below.

Usage

Either clone and build the repo or grab a release.

Download A Release

Download the latest release and add the project's bin directory to your PATH.

export PATH="$PATH:/Users/username/path/to/easy-cass-lab/bin"

You can skip to Build the Universal AMI.

Optional: Build the Project

If you've downloaded a pre-built release, you can skip this step.

If you're working from the project repo, you'll need to build the project. Fortunately, it's straightforward.

Clone the repo, then run the build from the project's root directory. Docker will need to be running for this step.

git clone https://github.com/rustyrazorblade/easy-cass-lab.git
cd easy-cass-lab
./gradlew shadowJar installdist

Build the Universal AMI

cd packer
packer init cassandra.pkr.hcl # only needs to be run the first time you set up the project
packer build base.pkr.hcl # build the base image 
packer build cassandra.pkr.hcl # extends the base image

You'll get a bunch of output from Packer. At the end, you'll see something like this:

==> Builds finished. The artifacts of successful builds are:
--> cassandra.amazon-ebs.ubuntu: AMIs were created:
us-west-2: ami-abcdeabcde1231231

That means you're ready!

Optional: Set the AMI

When you create a cluster, you can optionally pass an --ami flag, or set EASY_CASS_LAB_AMI.

If you don't specify an AMI, it'll use the latest AMI you've built.
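
For example (a sketch; the AMI ID is the sample value from the Packer output above, and passing --ami to init is an assumption based on the description):

export EASY_CASS_LAB_AMI=ami-abcdeabcde1231231   # default for every cluster you create
easy-cass-lab init --ami ami-abcdeabcde1231231 -c 3 myclustername  # or per cluster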

Read the Help

Run easy-cass-lab without any parameters to view all the commands and options.

Create The Environment

First create a directory for the environment, then initialize it, and start the instances.

This directory is your working space for the cluster.

mkdir cluster
cd cluster
easy-cass-lab init -c 3 -s 1 myclustername # 3 node cluster with 1 stress instance

You can start your instances now.

easy-cass-lab up 

To access the cluster, follow the instructions at the end of the output of the up command:

source env.sh # to set up the local environment with commands to access the cluster

# ssh to a node
ssh cassandra0
ssh cassandra1 # number corresponds to an instance
c0 # shortcut to ssh cassandra0

Select The Cassandra Version

The nodes in the cluster are up, but no Cassandra version has been selected yet. Since the AMI contains multiple versions, you'll need to pick one.

To see what versions are supported, you can do the following:

easy-cass-lab list

You'll see 3.0, 3.11, 4.0, 4.1, and others.

Choose your Cassandra version.

easy-cass-lab use 4.1

easy-cass-lab will automatically configure the right Python and Java versions on the instances for you.

Optional: Modify the Configuration

You'll see a file called cassandra.patch.yaml in your directory. You can add any valid cassandra.yaml parameters there, and the changes will be applied to your cluster. The listen_address is handled for you; you do not need to supply it. The data directories are also set up for you.
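
For example, a minimal sketch that appends a couple of illustrative settings (both are standard cassandra.yaml parameters; pick values appropriate for your test):

cat >> cassandra.patch.yaml <<'EOF'
# illustrative values only
num_tokens: 4
concurrent_compactors: 4
EOF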

You can also edit jvm.options. Different versions of Cassandra use different names for jvm.options. easy-cass-lab handles this for you as well.

easy-cass-lab update-config # uc for short

Start The Cluster

Start the cluster. It will take about a minute to go through the startup process.

easy-cass-lab start

Log In and Have Fun!

Important Directories:

# The ephemeral or EBS disk is automatically formatted as XFS and mounted here.
/mnt/cassandra 

# data files
/mnt/cassandra/data

# hints
/mnt/cassandra/hints

# commitlogs
/mnt/cassandra/commitlog

# flame graphs
/mnt/cassandra/artifacts

Multiple cassandra versions are installed at /usr/local/cassandra.

The current version is symlinked as /usr/local/cassandra/current:

ubuntu@cassandra0:~$ readlink /usr/local/cassandra/current
/usr/local/cassandra/4.1

This allows us to support upgrades, mixed-version clusters, A/B version testing, and so on.

Profiling with Flame Graphs

https://rustyrazorblade.com/post/2023/2023-11-07-async-profiler/

Using the environment set up by env.sh, you can run a profile and generate a flame graph, which is automatically downloaded once it completes:

c-flame cassandra0

The data will be saved in artifacts/cassandra0.

Or on a node, generate flame graphs with flamegraph.

There are several convenient aliases defined in env.sh. You may substitute any Cassandra host, and any extra parameters you pass are forwarded automatically (see the example after the table).

Command              Description
c-flame              CPU flame graph
c-flame-wall         Wall-clock profiling; picks up I/O, with parked threads filtered out
c-flame-compaction   Wall-clock profiling scoped to compaction
c-flame-offcpu       Tracks only time spent while the CPU is unscheduled, mostly I/O
c-flame-sepworker    Request handling; CPU time by default. Add -e wall for wall-clock time.
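
For example (a sketch; -d and -e are standard async-profiler options forwarded to the profiler, and the argument order shown is an assumption):

c-flame -d 60 cassandra0              # 60-second CPU profile of cassandra0
c-flame-sepworker -e wall cassandra1  # request handling in wall-clock time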

Aliases

On each node there are several aliases for commonly run commands:

Command   Action
c         cqlsh (automatically uses the correct hostname)
ts        tail the Cassandra system log
nt        nodetool
d         cd to the /mnt/cassandra/data directory
l         list the /mnt/cassandra/logs directory
v         ls -lahG (friendly output)

Shut it Down

To tear down the entire environment, simply run the following and confirm:

easy-cass-lab down

Tools

bcc-tools is a useful collection of eBPF-based tracing tools:

https://rustyrazorblade.com/post/2023/2023-11-14-bcc-tools/

Interested in contributing? Check out the good first issue tag first! Please read the development documentation before getting started.

easy-cass-lab's People

Contributors

rustyrazorblade, jrwest, ossarga, arodrime, adejanovski, velo, busbey, paliwalashish


easy-cass-lab's Issues

Consider breaking up build into steps

I was looking for a way of caching build artifacts with Terraform and came across this issue: hashicorp/packer#9164. It doesn't look like there's anything built in, but we could split the build into multiple stages of a build pipeline.

We also recommend breaking up long-running builds into smaller chunks to create your own "save points", and tying builds together in a pipeline. The output artifact from one can become the input artifact to another.

We could probably break up the build into multiple phases. There are a few components that dominate the setup time:

  • compiling bcc
  • pyenv
  • compiling fio

I'm not sure how much time we'll spend in the long run rebuilding these images so it might not be worth it - hard to say.

Update `start` command

easy-cass-lab start should start whatever is in /usr/local/cassandra via the service.

Depends on #8

on startup check for key pair in AWS

If you've manually cleaned up the keypair, you will see something along these lines from Terraform:

Error: creating EC2 Instance: InvalidKeyPair.NotFound: The key pair 'tlp-cluster-fe2e42de-5c65-4c01-bbff-fc49a56f9927' does not exist

sudo needs $PATH

ubuntu@ip-172-31-44-65:~$ /usr/local/async-profiler/bin/aprof
-bash: /usr/local/async-profiler/bin/aprof: No such file or directory
ubuntu@ip-172-31-44-65:~$ ls /usr/local/async-profiler/bin/
asprof
ubuntu@ip-172-31-44-65:~$ sudo asprof
sudo: asprof: command not found
ubuntu@ip-172-31-44-65:~$ sudo /usr/local/async-profiler/bin/asprof   # works with the full path
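
Until the image's sudoers secure_path includes the async-profiler bin directory, a standard workaround is to carry the caller's PATH through sudo explicitly:

sudo env "PATH=$PATH" asprof   # preserves the current PATH inside the sudo environment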

Intermittent Error During `up`

Every once in a while when running up I get this error:

│ Error: creating EC2 Instance: InvalidParameterValue: Value () for parameter groupId is invalid. The value cannot be empty
│ 	status code: 400, request id: 80f73e95-199c-4b49-9f28-759aeef56bb4
│
│   with aws_instance.cassandra[1],
│   on terraform.tf.json line 48, in resource.aws_instance.cassandra:
│   48:       },
│
╵
╷
│ Error: creating EC2 Instance: InvalidParameterValue: Value () for parameter groupId is invalid. The value cannot be empty
│ 	status code: 400, request id: 0973946d-3cb1-40f6-9883-dbdf8be622ab
│
│   with aws_instance.cassandra[0],
│   on terraform.tf.json line 48, in resource.aws_instance.cassandra:
│   48:       },
│
╵
╷
│ Error: creating EC2 Instance: InvalidParameterValue: Value () for parameter groupId is invalid. The value cannot be empty
│ 	status code: 400, request id: c00615b2-fd07-469f-af5f-1a1f848cf530
│
│   with aws_instance.cassandra[2],
│   on terraform.tf.json line 48, in resource.aws_instance.cassandra:
│   48:       },
│
╵

Running up again as directed seems to fix it.

Fix docs

There's asciidoctor documentation which should build to docs/, but I can't remember how it's regenerated. Need to figure that out and document it.

easy-cass-stress relies on a docker container to build the docs, probably best to just be consistent with that.

version: '3'

services:
    pandoc:
        image: jagregory/pandoc
        volumes:
          - ./:/source
        command: -s --toc manual/MANUAL.md -o docs/index.html

    docs:
        image: asciidoctor/docker-asciidoctor

        volumes:
        - ./manual:/documents
        - ./docs:/html

        command: asciidoctor -o /html/index.html MANUAL.adoc -a EASY_CASS_STRESS_VERSION=${EASY_CASS_STRESS_VERSION}

Configure non-root storage volumes

We need to be able to pass certain flags to --init to control how disks are configured. Here are some examples of configurations I'd like to support:

  • single drive as XFS, ZFS, ext4
  • Multiple drives using Linux software RAID using mdadm
  • Multiple drives using LVM with weird configurations like an NVMe local cache + GP2 EBS

We also want control over:

  • Scheduler
  • Readahead

Probably enough there to get the idea.

I think we could simply start with a directory of small scripts, uploaded during packer provisioning, that do some basic stuff (single drive) and maybe take optional parameters to pass to mkfs, so I can do all my fun tests without having to pre-bake every idea.

The options can be written to the terraform config and added to user_data to be executed on first startup, or run ad-hoc.

Still need to work out some details here; I'm probably missing something important.
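
A minimal sketch of what a single-drive profile script could look like (the device name, defaults, and tunables are all assumptions):

#!/usr/bin/env bash
# xfs-single: format one drive as XFS, mount it, and apply basic tunables
set -euo pipefail
DEVICE="${1:-/dev/nvme1n1}"   # hypothetical default device
READAHEAD="${2:-16}"          # readahead, in 512-byte sectors
mkfs.xfs -f "$DEVICE"
mkdir -p /mnt/cassandra
mount -o noatime "$DEVICE" /mnt/cassandra
blockdev --setra "$READAHEAD" "$DEVICE"
echo none > "/sys/block/$(basename "$DEVICE")/queue/scheduler"   # scheduler choice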

Add script to initialize data dir

I haven't thought this all the way through yet..

We need a mechanism to initialize the instance or EBS storage so we can point the data dir there. My initial thought was a script that runs at startup, but we need to make sure it only runs once and doesn't nuke the disks if we restart a node.

Long term I want to be able to run multiple disk configurations, such as EBS + instance w/ LVM cache pools, different filesystems (ext4 / xfs / zfs), or other options. I also want to be able to set read ahead on the devices.

It might make sense to support this as part of the init command, where a disk profile is selected, and then we can add as many disk profiles as we want which could just be shell scripts. I lean in this direction right now because it's easy to set up the first profile.

I think this is the workflow I'd like to have:

easy-cass-lab init --disk-profile xfs --readahead 16
easy-cass-lab up # probably where disks get configured
easy-cass-lab use 4.0.5  # points the symlink from /usr/local/cassandra to the right version-specific dir
easy-cass-lab start

Depends on #8


I put this in a different issue b/c I forgot it was here: see "Configure non-root storage volumes" above for the full list of disk configurations and tunables to support.

allow user to override java version

C* works on multiple JVMs, and we might want to test the effectiveness of GC, for example. The use-cassandra script should allow the user to pass a JVM version.

set kernel.perf_event_paranoid on first boot

Cloud init user data needs to include this:

ubuntu@ip-172-31-44-65:~$ sudo  sysctl kernel.perf_event_paranoid=1
kernel.perf_event_paranoid = 1
ubuntu@ip-172-31-44-65:~$ sudo   sysctl kernel.kptr_restrict=0
kernel.kptr_restrict = 0
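
To make those settings persist across reboots, the user data could write a sysctl drop-in instead of calling sysctl directly (a sketch, assuming user data is a shell script):

#!/bin/bash
# persist profiling-friendly kernel settings, then apply them immediately
cat > /etc/sysctl.d/99-profiling.conf <<'EOF'
kernel.perf_event_paranoid = 1
kernel.kptr_restrict = 0
EOF
sysctl --system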

cqlsh not working

Just ran into this when trying to use cqlsh after sudo use-cassandra 4.1

ubuntu@ip-172-31-44-65:~$ cqlsh
Traceback (most recent call last):
  File "/usr/local/cassandra/4.1/bin/cqlsh.py", line 134, in <module>
    from cassandra.cluster import Cluster
  File "/usr/local/cassandra/4.1/bin/../lib/cassandra-driver-internal-only-3.25.0.zip/cassandra-driver-3.25.0/cassandra/cluster.py", line 33, in <module>
ModuleNotFoundError: No module named 'six.moves'

Add support for AxonOps agent

We need two things for this.

1. The AxonOps agent downloads on build w/ Packer. I think it's reasonable to pass that information through to Packer if we assume an AMI is tied to the account it was created on. The only issue would be if you shared your AMI. The alternative would be to pass it during --init, and allow an environment variable to be the default. I think on init is more work.
2. Every version of C* we support automatically gets the agent.

I'd like to do this without requiring the user to change their behavior aside from the additional flag on either the packer image creation or the init call.

Integrate packer the same way we do terraform

Simply for uniformity: we have our own way of tracking creds that's different from Packer's, so if we use the Packer Docker image we can make it consistent, and also support multiple regions, b/c right now the region is hard-coded in Packer.

bin/easy_cass_lab create-ami should create the AMI, and it would be pretty awesome if it could automatically set the default AMI on init. I haven't looked at the code that proxies the Docker output to standard out in a while, but if we could look for the AMI output at the end, we could parse it and store it locally.

==> Wait completed after 17 minutes 15 seconds

==> Builds finished. The artifacts of successful builds are:
--> cassandra.amazon-ebs.ubuntu: AMIs were created:
us-west-2: ami-0123127f33dd3f
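
If we proxy the build output as described, scraping that final block is straightforward (a sketch; the log file and the .default-ami store are hypothetical):

packer build cassandra.pkr.hcl | tee packer-build.log
AMI_ID=$(grep -Eo 'ami-[0-9a-f]+' packer-build.log | tail -1)   # last AMI in the output
echo "$AMI_ID" > .default-ami   # hypothetical place init could read the default from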

add command to list versions

Now that we have the ability to do custom builds that use arbitrary names, we should have a way of listing them. The yaml doc at /etc/cassandra_versions is the source of truth for this, and can even tell us additional details about it. I'm thinking something like list-versions gives a simple list and the -v flag tells us all the details. It would be nice if this was in a tabular format.

I wrote something for easy-cass-stress we could probably take and repurpose: https://github.com/rustyrazorblade/easy-cass-stress/blob/main/src/main/kotlin/com/rustyrazorblade/easycassstress/SingleLineConsoleReporter.kt#L12

Add sidecar to AMI

The sidecar should be made available in the AMI so we can start doing bulk analytics.

We'll want a systemd service to manage it as well.
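
A sketch of such a unit, written during the Packer build (the service name and install path are assumptions):

sudo tee /etc/systemd/system/cassandra-sidecar.service <<'EOF'
[Unit]
Description=Cassandra sidecar
After=network.target

[Service]
# hypothetical install path
ExecStart=/usr/local/sidecar/bin/sidecar
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload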

support Azure

While we're currently doing things only with AWS, there aren't that many places where we're tightly coupled.

Init generates terraform. This could be init-aws or init-azure easily, and we could even allow the user to pick what init does by default.

We'd need to rename the current terraform config classes to AWS Terraform, then make an Azure-specific one.

Startup checks should maybe ask which cloud provider you want to use, or ask for credentials for all of them.

I'm guessing the terraform state reading code might have to change a little, but it might not.

Support editable configuration

I think as part of the use command we should download the entire conf directory to the local directory so it can be edited.

Seeds should be populated automatically.

What's a reasonable way to have the config be edited, then pushed up with the node-specific variables substituted? Maybe when it's uploaded we pull the yaml into easy-cass-lab, set the listen address, and then send the modified version to the node.

In the working directory for the cluster, I'm thinking we have conf/<version> directories that get populated via the easy-cass-lab pull-configs command.

easy-cass-lab push-configs to push everything up.
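
Putting it together, the proposed workflow might look like this (pull-configs and push-configs are proposals here, not implemented commands):

easy-cass-lab pull-configs    # populate conf/<version>/ in the working directory
vim conf/4.1/cassandra.yaml   # edit; node-specific values get substituted on push
easy-cass-lab push-configs    # upload the modified configs to the nodes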

Rethink custom builds

There's some leftover stuff for doing custom C* builds. I think this should be moved into the packer side of things, but I'm not sure. Need to figure out what this should look like from a user perspective.

support more than us-west-2

The packer config is hard coded to us-west-2.

I think this should come after #22, but it might not have to. I don't know packer well enough yet to be confident about implementation.
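
In the meantime, parameterizing the region would at least let callers override it at build time (a sketch; this assumes a region variable is added to cassandra.pkr.hcl):

packer build -var 'region=us-east-1' cassandra.pkr.hcl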

improve async profiler

Async profiler is added to the path. It would be nice if there was a simple wrapper that grabbed the C* pid.

Depends on #15 for the start / stop for the pid.
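
A minimal sketch of such a wrapper (the flags are standard async-profiler options; matching the process by CassandraDaemon is an assumption):

#!/usr/bin/env bash
# profile the running Cassandra process and emit a flame graph
PID=$(pgrep -f CassandraDaemon)
asprof -d "${1:-30}" -f /tmp/flamegraph.html "$PID"   # default 30-second profile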

select the right java version (allowing override)

cassandra_versions should have the java version that matches up with the C* version, and that should be set when using the use command. User should be able to override with --java. update-java-alternatives can pick the right version.
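
On Ubuntu that could be as simple as the following (the exact alternative name varies by release; this one is an example):

sudo update-java-alternatives -s java-1.11.0-openjdk-amd64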

investigate flamescope

I'm not sure what the status is for using async-profiler w/ flamescope. If it's supported, it could be pretty awesome.

Support for sharing the cluster with folks not using `easy-cass-lab`

Support multiple people logging in by sharing env.sh with them. env.sh will need to be modified to prompt for a private SSH key location if the sshConfig is not generated and there is no easy-cass-lab private key in its default location (secret.pem). It will output an sshConfig.

If there's going to be a bunch of people logging in, it would be nice to have a single command that takes a bunch of public keys (maybe in a directory) and handles authorized_keys.

There's already a plan to have a thing to update configs in #17 , so I think it makes sense to put this as part of update-configs.

Improve directory layout

The cassandra tarballs are currently put in

/usr/local/

But I think we should use this directory structure:

/usr/local/cassandra_versions/[version]

For example:

/usr/local/cassandra_versions/3.0/

This will make it a bit easier for the command easy-cass-lab use 3.0 to work, since then we can just fiddle with symlinks and make /usr/local/cassandra point to /usr/local/cassandra_versions/3.0.
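
Then use 3.0 reduces to repointing one symlink, something like:

ln -sfn /usr/local/cassandra_versions/3.0 /usr/local/cassandra   # -n replaces the existing link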

I put a placeholder cassandra_versions.yaml

install python

Looks like python isn't installed; I think the packages are python2 and python3. I'm not sure what happens if both are installed and someone types cqlsh. Might need to do the same thing I did for java, picking a current version, but we'll see.

Set up CI

This is really not my thing. I'm not sure if there's a reason to use CircleCI anymore, or if we can do this using GitHub Actions.

Remove required init fields

The init fields are currently hardcoded (ticket, client, jira), but I think this should actually be moved to an optional flag where we can specify key-value pairs to tag all the resources with.

Something like this might work:

@Parameter(names = "--tag", variableArity = true)  // collects repeated --tag arguments
public List<String> tags = new ArrayList<>();

Then init could look like this:

bin/easy-cass-lab init --tag key=value

add rolling restart command

easy-cass-lab rolling-restart needs to restart the service, then wait for cassandra to be UP before moving to the next node.
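
A sketch of the loop (host names follow the env.sh convention above; the systemd service name and the UP check via nodetool are assumptions):

#!/usr/bin/env bash
# restart each node in turn, waiting for it to report Up/Normal before moving on
for host in cassandra0 cassandra1 cassandra2; do
  ssh "$host" 'sudo systemctl restart cassandra'
  until ssh "$host" 'nodetool status' | grep -q '^UN'; do   # a fuller check would match this node's own IP
    sleep 5
  done
done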
