
easy-cass-lab's Introduction

easy-cass-lab

This is a tool to create lab environments with Apache Cassandra in AWS.

We use Packer to create a single AMI containing multiple Cassandra versions along with supporting tooling (async-profiler, bcc-tools, fio, and more).

easy-cass-lab provides tooling to create the AMI and provision the environments.

Pre-requisites

Before you start, you'll need an AWS account with credentials configured locally, plus Docker and Packer installed; both are used in the steps below.

Usage

Either clone and build the repo or grab a release.

Download A Release

Download the latest release and add the project's bin directory to your PATH.

export PATH="$PATH:/Users/username/path/to/easy-cass-lab/bin"

You can skip to Build the Universal AMI.

Optional: Build the Project

If you've downloaded a pre-built release, you can skip this step.

If you're working from the project repo, you'll need to build the project. Fortunately, it's straightforward.

Clone the repo, then run the build from the project's root directory. Docker will need to be running for this step.

git clone https://github.com/rustyrazorblade/easy-cass-lab.git
cd easy-cass-lab
./gradlew shadowJar installdist

Build the Universal AMI

cd packer
packer init cassandra.pkr.hcl # only needs to be run the first time you set up the project
packer build base.pkr.hcl # build the base image 
packer build cassandra.pkr.hcl # extends the base image

You'll get a bunch of output from Packer. At the end, you'll see something like this:

==> Builds finished. The artifacts of successful builds are:
--> cassandra.amazon-ebs.ubuntu: AMIs were created:
us-west-2: ami-abcdeabcde1231231

That means you're ready!

Optional: Set the AMI

When you create a cluster, you can optionally pass an --ami flag, or set EASY_CASS_LAB_AMI.

If you don't specify an AMI, it'll use the latest AMI you've built.
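
For example (a sketch; the AMI ID is the sample value from the Packer output above, and passing --ami to init is an assumption based on the description):

export EASY_CASS_LAB_AMI=ami-abcdeabcde1231231   # default for every cluster you create
easy-cass-lab init --ami ami-abcdeabcde1231231 -c 3 myclustername  # or per cluster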

Read the Help

Run easy-cass-lab without any parameters to view all the commands and options.

Create The Environment

First create a directory for the environment, then initialize it, and start the instances.

This directory is your working space for the cluster.

mkdir cluster
cd cluster
easy-cass-lab init -c 3 -s 1 myclustername # 3 node cluster with 1 stress instance

You can start your instances now.

easy-cass-lab up 

To access the cluster, follow the instructions at the end of the output of the up command:

source env.sh # to set up the local environment with commands to access the cluster

# ssh to a node
ssh cassandra0
ssh cassandra1 # number corresponds to an instance
c0 # shortcut to ssh cassandra0

Select The Cassandra Version

The nodes in the cluster are up, but no Cassandra version has been selected yet. Since the AMI contains multiple versions, you'll need to pick one.

To see what versions are supported, you can do the following:

easy-cass-lab list

You'll see 3.0, 3.11, 4.0, 4.1, and others.

Choose your Cassandra version.

easy-cass-lab use 4.1

easy-cass-lab will automatically configure the right Python and Java versions on the instances for you.

Optional: Modify the Configuration

You'll see a file called cassandra.patch.yaml in your directory. You can add any valid cassandra.yaml parameters there, and the changes will be applied to your cluster. The listen_address is handled for you; you do not need to supply it. The data directories are also set up for you.
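
For example, a minimal sketch that appends a couple of illustrative settings (both are standard cassandra.yaml parameters; pick values appropriate for your test):

cat >> cassandra.patch.yaml <<'EOF'
# illustrative values only
num_tokens: 4
concurrent_compactors: 4
EOF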

You can also edit jvm.options. Different versions of Cassandra use different names for jvm.options. easy-cass-lab handles this for you as well.

easy-cass-lab update-config # uc for short

Start The Cluster

Start the cluster. It will take about a minute to go through the startup process.

easy-cass-lab start

Log In and Have Fun!

Important Directories:

# The ephemeral or EBS disk is automatically formatted as XFS and mounted here.
/mnt/cassandra 

# data files
/mnt/cassandra/data

# hints
/mnt/cassandra/hints

# commitlogs
/mnt/cassandra/commitlog

# flame graphs
/mnt/cassandra/artifacts

Multiple cassandra versions are installed at /usr/local/cassandra.

The current version is symlinked as /usr/local/cassandra/current:

ubuntu@cassandra0:~$ readlink /usr/local/cassandra/current
/usr/local/cassandra/4.1

This allows us to support upgrades, mixed-version clusters, A/B version testing, and so on.

Profiling with Flame Graphs

https://rustyrazorblade.com/post/2023/2023-11-07-async-profiler/

Using the environment set up by env.sh, you can run a profile and generate a flame graph, which is automatically downloaded once it completes:

c-flame cassandra0

The data will be saved in artifacts/cassandra0.

Or on a node, generate flame graphs with flamegraph.

There are several convenient aliases defined in env.sh. You may substitute any Cassandra host, and any extra parameters you pass are forwarded automatically (see the example after the table).

Command              Description
c-flame              CPU flame graph
c-flame-wall         Wall-clock profiling; picks up I/O, with parked threads filtered out
c-flame-compaction   Wall-clock profiling scoped to compaction
c-flame-offcpu       Tracks only time spent while the CPU is unscheduled, mostly I/O
c-flame-sepworker    Request handling; CPU time by default. Add -e wall for wall-clock time.
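
For example (a sketch; -d and -e are standard async-profiler options forwarded to the profiler, and the argument order shown is an assumption):

c-flame -d 60 cassandra0              # 60-second CPU profile of cassandra0
c-flame-sepworker -e wall cassandra1  # request handling in wall-clock time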

Aliases

On each node there are several aliases for commonly run commands:

Command   Action
c         cqlsh (automatically uses the correct hostname)
ts        tail the Cassandra system log
nt        nodetool
d         cd to the /mnt/cassandra/data directory
l         list the /mnt/cassandra/logs directory
v         ls -lahG (friendly output)

Shut it Down

To tear down the entire environment, simply run the following and confirm:

easy-cass-lab down

Tools

bcc-tools is a useful collection of eBPF-based tracing tools:

https://rustyrazorblade.com/post/2023/2023-11-14-bcc-tools/

Interested in contributing? Check out the good first issue tag first! Please read the development documentation before getting started.

easy-cass-lab's People

Contributors

rustyrazorblade, jrwest, ossarga, arodrime, adejanovski, velo, busbey, paliwalashish


easy-cass-lab's Issues

Consider breaking up build into steps

I was looking for a way of caching build artifacts with Terraform and came across this issue: hashicorp/packer#9164. It doesn't look like there's anything built in, but we could split the build into multiple stages of a build pipeline.

We also recommend breaking up long-running builds into smaller chunks to create your own "save points", and tying builds together in a pipeline. The output artifact from one can become the input artifact to another.

We could probably break up the build into multiple phases. There are a few components that dominate the setup time:

  • compiling bcc
  • pyenv
  • compiling fio

I'm not sure how much time we'll spend in the long run rebuilding these images so it might not be worth it - hard to say.

Update `start` command

easy-cass-lab start should start whatever is in /usr/local/cassandra via the service.

Depends on #8

on startup check for key pair in AWS

If you've manually cleaned up the keypair, you will see something along these lines from Terraform:

Error: creating EC2 Instance: InvalidKeyPair.NotFound: The key pair 'tlp-cluster-fe2e42de-5c65-4c01-bbff-fc49a56f9927' does not exist

sudo needs $PATH

ubuntu@ip-172-31-44-65:~$ /usr/local/async-profiler/bin/aprof
-bash: /usr/local/async-profiler/bin/aprof: No such file or directory
ubuntu@ip-172-31-44-65:~$ ls /usr/local/async-profiler/bin/
asprof
ubuntu@ip-172-31-44-65:~$ sudo asprof
sudo: asprof: command not found
ubuntu@ip-172-31-44-65:~$ sudo /usr/local/async-profiler/bin/asprof   # works with the full path
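
Until the image's sudoers secure_path includes the async-profiler bin directory, a standard workaround is to carry the caller's PATH through sudo explicitly:

sudo env "PATH=$PATH" asprof   # preserves the current PATH inside the sudo environment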

Intermittent Error During `up`

Every once in a while when running up I get this error:

│ Error: creating EC2 Instance: InvalidParameterValue: Value () for parameter groupId is invalid. The value cannot be empty
│ 	status code: 400, request id: 80f73e95-199c-4b49-9f28-759aeef56bb4
│
│   with aws_instance.cassandra[1],
│   on terraform.tf.json line 48, in resource.aws_instance.cassandra:
│   48:       },
│
╵
╷
│ Error: creating EC2 Instance: InvalidParameterValue: Value () for parameter groupId is invalid. The value cannot be empty
│ 	status code: 400, request id: 0973946d-3cb1-40f6-9883-dbdf8be622ab
│
│   with aws_instance.cassandra[0],
│   on terraform.tf.json line 48, in resource.aws_instance.cassandra:
│   48:       },
│
╵
╷
│ Error: creating EC2 Instance: InvalidParameterValue: Value () for parameter groupId is invalid. The value cannot be empty
│ 	status code: 400, request id: c00615b2-fd07-469f-af5f-1a1f848cf530
│
│   with aws_instance.cassandra[2],
│   on terraform.tf.json line 48, in resource.aws_instance.cassandra:
│   48:       },
│
╵

Running up again as directed seems to fix it.

Fix docs

There's asciidoctor documentation which should build to docs/, but I can't remember how it's regenerated. Need to figure that out and document it.

easy-cass-stress relies on a docker container to build the docs, probably best to just be consistent with that.

version: '3'

services:
    pandoc:
        image: jagregory/pandoc
        volumes:
          - ./:/source
        command: -s --toc manual/MANUAL.md -o docs/index.html

    docs:
        image: asciidoctor/docker-asciidoctor

        volumes:
        - ./manual:/documents
        - ./docs:/html

        command: asciidoctor -o /html/index.html MANUAL.adoc -a EASY_CASS_STRESS_VERSION=${EASY_CASS_STRESS_VERSION}

Configure non-root storage volumes

We need to be able to pass certain flags to --init to control how disks are configured. Here are some examples of configurations I'd like to support:

  • single drive as XFS, ZFS, ext4
  • Multiple drives using Linux software RAID using mdadm
  • Multiple drives using LVM with weird configurations like an NVMe local cache + GP2 EBS

We also want control over:

  • Scheduler
  • Readahead

Probably enough there to get the idea.

I think we could simply start with a directory of small scripts, uploaded during packer provisioning, that do some basic stuff (single drive) and maybe take optional parameters to pass to mkfs, so I can do all my fun tests without having to pre-bake every idea.

The options can be written to the terraform config and added to user_data to be executed on first startup, or run ad-hoc.

Still need to work out some details here; I'm probably missing something important.
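
A minimal sketch of what a single-drive profile script could look like (the device name, defaults, and tunables are all assumptions):

#!/usr/bin/env bash
# xfs-single: format one drive as XFS, mount it, and apply basic tunables
set -euo pipefail
DEVICE="${1:-/dev/nvme1n1}"   # hypothetical default device
READAHEAD="${2:-16}"          # readahead, in 512-byte sectors
mkfs.xfs -f "$DEVICE"
mkdir -p /mnt/cassandra
mount -o noatime "$DEVICE" /mnt/cassandra
blockdev --setra "$READAHEAD" "$DEVICE"
echo none > "/sys/block/$(basename "$DEVICE")/queue/scheduler"   # scheduler choice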

Add script to initialize data dir

I haven't thought this all the way through yet..

We need a mechanism to initialize the instance or EBS storage so we can point the data dir there. My initial thought was a script that runs at startup, but we need to make sure it only runs once and doesn't nuke the disks if we restart a node.

Long term I want to be able to run multiple disk configurations, such as EBS + instance w/ LVM cache pools, different filesystems (ext4 / xfs / zfs), or other options. I also want to be able to set read ahead on the devices.

It might make sense to support this as part of the init command, where a disk profile is selected, and then we can add as many disk profiles as we want which could just be shell scripts. I lean in this direction right now because it's easy to set up the first profile.

I think this is the workflow I'd like to have:

easy-cass-lab init --disk-profile xfs --readahead 16
easy-cass-lab up # probably where disks get configured
easy-cass-lab use 4.0.5  # points the symlink from /usr/local/cassandra to the right version-specific dir
easy-cass-lab start

Depends on #8


I put this in a different issue b/c I forgot it was here: see "Configure non-root storage volumes" above for the full list of disk configurations and tunables to support.

allow user to override java version

C* works on multiple JVMs, and we might want to test the effectiveness of GC, for example. The use-cassandra script should allow the user to pass a JVM version.

set kernel.perf_event_paranoid on first boot

Cloud init user data needs to include this:

ubuntu@ip-172-31-44-65:~$ sudo  sysctl kernel.perf_event_paranoid=1
kernel.perf_event_paranoid = 1
ubuntu@ip-172-31-44-65:~$ sudo   sysctl kernel.kptr_restrict=0
kernel.kptr_restrict = 0
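
To make those settings persist across reboots, the user data could write a sysctl drop-in instead of calling sysctl directly (a sketch, assuming user data is a shell script):

#!/bin/bash
# persist profiling-friendly kernel settings, then apply them immediately
cat > /etc/sysctl.d/99-profiling.conf <<'EOF'
kernel.perf_event_paranoid = 1
kernel.kptr_restrict = 0
EOF
sysctl --system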

cqlsh not working

Just ran into this when trying to use cqlsh after sudo use-cassandra 4.1

ubuntu@ip-172-31-44-65:~$ cqlsh
Traceback (most recent call last):
  File "/usr/local/cassandra/4.1/bin/cqlsh.py", line 134, in <module>
    from cassandra.cluster import Cluster
  File "/usr/local/cassandra/4.1/bin/../lib/cassandra-driver-internal-only-3.25.0.zip/cassandra-driver-3.25.0/cassandra/cluster.py", line 33, in <module>
ModuleNotFoundError: No module named 'six.moves'

Add support for AxonOps agent

We need two things for this.

1. The AxonOps agent downloads on build w/ Packer. I think it's reasonable to pass that information through to Packer if we assume an AMI is tied to the account it was created on. The only issue would be if you shared your AMI. The alternative would be to pass it during --init, and allow an environment variable to be the default. I think on init is more work.
2. Every version of C* we support automatically gets the agent.

I'd like to do this without requiring the user to change their behavior aside from the additional flag on either the packer image creation or the init call.

Integrate packer the same way we do terraform

Simply for uniformity: we have our own way of tracking creds that's different from Packer's, so if we use the Packer Docker image we can make it consistent, and also support multiple regions, b/c right now the region is hard-coded in Packer.

bin/easy_cass_lab create-ami should create the AMI, and it would be pretty awesome if it could automatically set the default AMI on init. I haven't looked at the code that proxies the Docker output to standard out in a while, but if we could look for the AMI output at the end, we could parse it and store it locally.

==> Wait completed after 17 minutes 15 seconds

==> Builds finished. The artifacts of successful builds are:
--> cassandra.amazon-ebs.ubuntu: AMIs were created:
us-west-2: ami-0123127f33dd3f
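
If we proxy the build output as described, scraping that final block is straightforward (a sketch; the log file and the .default-ami store are hypothetical):

packer build cassandra.pkr.hcl | tee packer-build.log
AMI_ID=$(grep -Eo 'ami-[0-9a-f]+' packer-build.log | tail -1)   # last AMI in the output
echo "$AMI_ID" > .default-ami   # hypothetical place init could read the default from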

add command to list versions

Now that we have the ability to do custom builds that use arbitrary names, we should have a way of listing them. The yaml doc at /etc/cassandra_versions is the source of truth for this, and can even tell us additional details about it. I'm thinking something like list-versions gives a simple list and the -v flag tells us all the details. It would be nice if this was in a tabular format.

I wrote something for easy-cass-stress we could probably take and repurpose: https://github.com/rustyrazorblade/easy-cass-stress/blob/main/src/main/kotlin/com/rustyrazorblade/easycassstress/SingleLineConsoleReporter.kt#L12

Add sidecar to AMI

The sidecar should be made available in the AMI so we can start doing bulk analytics.

We'll want a systemd service to manage it as well.
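
A sketch of such a unit, written during the Packer build (the service name and install path are assumptions):

sudo tee /etc/systemd/system/cassandra-sidecar.service <<'EOF'
[Unit]
Description=Cassandra sidecar
After=network.target

[Service]
# hypothetical install path
ExecStart=/usr/local/sidecar/bin/sidecar
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload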

support Azure

While we're currently doing things only with AWS, there aren't that many places where we're tightly coupled.

Init generates terraform. This could be init-aws or init-azure easily, and we could even allow the user to pick what init does by default.

We'd need to rename the current terraform config classes to AWS Terraform, then make an Azure-specific one.

Startup checks should maybe ask which cloud provider you want to use, or ask for credentials for all of them.

I'm guessing the terraform state reading code might have to change a little, but it might not.

Support editable configuration

I think as part of the use command we should download the entire conf directory to the local directory so it can be edited.

Seeds should be populated automatically.

What's a reasonable way to have the config be edited, then pushed up with the node-specific variables substituted? Maybe when it's uploaded we pull the yaml into easy-cass-lab, set the listen address, and then send the modified version to the node.

In the working directory for the cluster, I'm thinking we have conf/<version> directories that get populated via the easy-cass-lab pull-configs command.

easy-cass-lab push-configs to push everything up.
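
Putting it together, the proposed workflow might look like this (pull-configs and push-configs are proposals here, not implemented commands):

easy-cass-lab pull-configs    # populate conf/<version>/ in the working directory
vim conf/4.1/cassandra.yaml   # edit; node-specific values get substituted on push
easy-cass-lab push-configs    # upload the modified configs to the nodes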

Rethink custom builds

There's some leftover stuff for doing custom C* builds. I think this should be moved into the packer side of things, but I'm not sure. Need to figure out what this should look like from a user perspective.

support more than us-west-2

The packer config is hard coded to us-west-2.

I think this should come after #22, but it might not have to. I don't know packer well enough yet to be confident about implementation.
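
In the meantime, parameterizing the region would at least let callers override it at build time (a sketch; this assumes a region variable is added to cassandra.pkr.hcl):

packer build -var 'region=us-east-1' cassandra.pkr.hcl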

improve async profiler

Async profiler is added to the path. It would be nice if there was a simple wrapper that grabbed the C* pid.

Depends on #15 for the start / stop for the pid.
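
A minimal sketch of such a wrapper (the flags are standard async-profiler options; matching the process by CassandraDaemon is an assumption):

#!/usr/bin/env bash
# profile the running Cassandra process and emit a flame graph
PID=$(pgrep -f CassandraDaemon)
asprof -d "${1:-30}" -f /tmp/flamegraph.html "$PID"   # default 30-second profile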

select the right java version (allowing override)

cassandra_versions should have the java version that matches up with the C* version, and that should be set when using the use command. User should be able to override with --java. update-java-alternatives can pick the right version.
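
On Ubuntu that could be as simple as the following (the exact alternative name varies by release; this one is an example):

sudo update-java-alternatives -s java-1.11.0-openjdk-amd64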

investigate flamescope

I'm not sure what the status is for using async-profiler w/ flamescope. If it's supported, it could be pretty awesome.

Support for sharing the cluster with folks not using `easy-cass-lab`

Support multiple people logging in by sharing env.sh with them. env.sh will need to be modified to prompt for a private SSH key location if the sshConfig is not generated and there is no easy-cass-lab private key in its default location (secret.pem). It will output an sshConfig.

If there's going to be a bunch of people logging in, it would be nice to have a single command that takes a bunch of public keys (maybe in a directory) and handles authorized_keys.

There's already a plan to have a thing to update configs in #17 , so I think it makes sense to put this as part of update-configs.

Improve directory layout

The cassandra tarballs are currently put in

/usr/local/

But I think we should use this directory structure:

/usr/local/cassandra_versions/[version]

For example:

/usr/local/cassandra_versions/3.0/

This will make it a bit easier for the command easy-cass-lab use 3.0 to work, since then we can just fiddle with symlinks and make /usr/local/cassandra point to /usr/local/cassandra_versions/3.0.
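
Then use 3.0 reduces to repointing one symlink, something like:

ln -sfn /usr/local/cassandra_versions/3.0 /usr/local/cassandra   # -n replaces the existing link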

I put a placeholder cassandra_versions.yaml

install python

Looks like python isn't installed; I think the packages are python2 and python3. I'm not sure what happens if both are installed and someone types cqlsh. Might need to do the same thing I did for java, picking a current version, but we'll see.

Set up CI

This is really not my thing. I'm not sure if there's a reason to use CircleCI anymore, or if we can do this using GitHub Actions.

Remove required init fields

The init fields are currently hardcoded (ticket, client, jira), but I think this should actually be moved to an optional flag where we can specify key-value pairs to tag all the resources with.

Something like this might work:

@Parameter(names = "--tag", variableArity = true)  // collects repeated --tag arguments
public List<String> tags = new ArrayList<>();

Then init could look like this:

bin/easy-cass-lab init --tag key=value

add rolling restart command

easy-cass-lab rolling-restart needs to restart the service, then wait for cassandra to be UP before moving to the next node.
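
A sketch of the loop (host names follow the env.sh convention above; the systemd service name and the UP check via nodetool are assumptions):

#!/usr/bin/env bash
# restart each node in turn, waiting for it to report Up/Normal before moving on
for host in cassandra0 cassandra1 cassandra2; do
  ssh "$host" 'sudo systemctl restart cassandra'
  until ssh "$host" 'nodetool status' | grep -q '^UN'; do   # a fuller check would match this node's own IP
    sleep 5
  done
done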
