Maximizing Storage Throughput and Performance

Errors or corrections? Contact [email protected].

Workshop Summary

In this workshop you learn how to obtain higher levels of performance with EBS, S3 and EFS.

1. Deploy AWS resources using CloudFormation

Click one of the launch links in the table below to deploy the resources using CloudFormation. To avoid errors during deployment, select a region in which you have previously created AWS resources.

Region Code	Region Name	Launch
us-west-1	US West (N. California)	Launch in us-west-1
us-west-2	US West (Oregon)	Launch in us-west-2
us-east-1	US East (N. Virginia)	Launch in us-east-1
us-east-2	US East (Ohio)	Launch in us-east-2
ca-central-1	Canada (Central)	Launch in ca-central-1
eu-central-1	EU (Frankfurt)	Launch in eu-central-1
eu-west-1	EU (Ireland)	Launch in eu-west-1
eu-west-2	EU (London)	Launch in eu-west-2
eu-west-3	EU (Paris)	Launch in eu-west-3
eu-north-1	EU (Stockholm)	Launch in eu-north-1
ap-east-1	Asia Pacific (Hong Kong)	Launch in ap-east-1
ap-northeast-1	Asia Pacific (Tokyo)	Launch in ap-northeast-1
ap-northeast-2	Asia Pacific (Seoul)	Launch in ap-northeast-2
ap-northeast-3	Asia Pacific (Osaka-Local)	Launch in ap-northeast-3
ap-southeast-1	Asia Pacific (Singapore)	Launch in ap-southeast-1
ap-southeast-2	Asia Pacific (Sydney)	Launch in ap-southeast-2
ap-south-1	Asia Pacific (Mumbai)	Launch in ap-south-1
me-south-1	Middle East (Bahrain)	Launch in me-south-1
sa-east-1	South America (São Paulo)	Launch in sa-east-1

Click Next on the Create Stack page.
Click Next.
Click Next Again. (skipping the Options and Advanced options sections)
On the Review page, scroll to the bottom and check the boxes to acknowledge that CloudFormation will create IAM resources, then click Create stack.

6. Click Events. Events will not auto refresh. You will need to manually refresh the page using the refresh button on the right side of the page.
7. Watch for StoragePerformanceWorkshop and a status of CREATE_COMPLETE

Note: Instances that are launched as part of this CloudFormation template may be in the initializing state for few minutes.

2. Connect to the EC2 Instance using EC2 Instance Connect

From the AWS console, click Services and select EC2.
Select Instances from the menu on the left.
Wait until the state of the Storage_Performance_Workshop instance shows as running and all Status Checks have completed (i.e. not in Initializing state).
Right-click on the Storage_Performance_Workshop instance and select Connect from the menu.
From the dialog box, select the EC2 Instance Connect option, as shown below:

For the User name field, enter "ec2-user", then click Connect. A new dialog box or tab on your browser should appear, providing you with a command line interface (CLI). Keep this open - you will use the command line on the instance throughout this workshop.
Verify your login prompt contains storage-workshop. If it does not the initialization task for the workshop have not yet completed. Close the SSH window and repeat the EC2 Instance Connect steps in a couple minutes.

Note: The SSH session will disconnect after a period of inactivity. If your session becomes unresponsive, close the window and repeat the steps above to reconnect.

EBS Performance Part 1

FIO was automatically launched by the CloudFormation template. We will use FIO to exhaust the burst credits on the 1 GB GP2 volume.

Run the following command to ensure FIO is running on the instance. Output should be similar to below.

  ps -ef | grep fio

2. From the AWS console, click Services and select EC2.
3. Select Instances from the menu on the left.
4. Select the instance named Storage_Performance_Workshop.

5. Click on /dev/sdb in the lower pane by Block devices.

6. In the pop up window, click the EBS ID.

7. Click on the Monitoring tab.
8. There are two graphs of interest: Read Throughput and Burst Balance. You may need to click on the graph to see any data as there is likely only one data point so far.

Read Throughput should be at 3000 OPS until the Burst Balance is not exhausted.
Burst Balance will start out at 100% and will decrease steadily over the next 30 minutes.
We will come back to EBS performance later, giving the Burst Balance a chance to be exhausted.

S3 Performance- Optimize Throughput of Large Files

In the section we will demonstrate parallelization of a large object by breaking it into smaller chunks and increasing the number of threads used to transfer the object.

In the CLI for the instance, run the following commands to setup the AWS CLI

aws configure

Leave Access Key and Secret Key blank, set the region to the region you deployed your CloudFormation template in , output format leave default.

Configure AWS CLI S3 settings, by running the following commands.

aws configure set default.s3.max_concurrent_requests 1  
aws configure set default.s3.multipart_threshold 64MB  
aws configure set default.s3.multipart_chunksize 16MB

Note
Commands starting with aws s3 will use the settings above. Any commands starting with aws s3api will not. aws s3 utilizes Transfer Manager which can optimize transfers. aws s3api simply makes the specific api call you specified.

Verify settings match screenshot below.

cat ~/.aws/config

Run the following command to create a 5 GB file.

dd if=/dev/urandom of=5GB.file bs=1 count=0 seek=5G

Upload a 5 GB file to the S3 bucket using 1 thread. Record time to complete.

time aws s3 cp 5GB.file s3://${bucket}/upload1.test

Upload a 5 GB file to the S3 bucket using 2 threads. Record time to complete.

aws configure set default.s3.max_concurrent_requests 2  
time aws s3 cp 5GB.file s3://${bucket}/upload2.test

Upload a 5 GB file to the S3 bucket using 10 threads. Record time to complete.

aws configure set default.s3.max_concurrent_requests 10  
time aws s3 cp 5GB.file s3://${bucket}/upload3.test

Upload a 5 GB file to the S3 bucket using 20 threads. Record time to complete.

aws configure set default.s3.max_concurrent_requests 20   
time aws s3 cp 5GB.file s3://${bucket}/upload4.test

At some point the AWS CLI will limit the performance that can be achieved. This is likely the case if you didn't see any performance increase between 10 and 20 threads. This is a limitation of the CLI, increasing thread count to 100's using other software will continue to increase performance.

Run the following command to create a 1 GB file.

dd if=/dev/urandom of=1GB.file bs=1 count=0 seek=1G

Data can also be segmented into multiple pieces. The next step will demonstrate moving 5 GB of data using multiple source files.

Upload 5 GB of data to S3 by uploading five 1 GB files in parallel. The -j flag is the number of concurrent jobs to run. This results in 100 total threads (20 from aws configure times 5 jobs). Record time to complete.

time seq 1 5 | parallel --will-cite -j 5 aws s3 cp 1GB.file s3://${bucket}/parallel/object{}.test

Note

These exercises showed that workloads can parallelized by breaking up a large object into chunks or by having smaller files.
One trade off to keep in mind is each PUT is billed at $0.05 per 1,000 requests. When you break up a 5GB file into 16MB chunks, that results in 313 PUTs instead of 1 PUT. Essentially you can view it as paying a toll to go faster.

S3 Performance- Optimize the Sync command

This exercise will use the aws s3 sync command to move 2,000 files totaling 2 GB of data.

In the CLI for the instance, perform the sync using 1 thread. Record time to complete.

aws configure set default.s3.max_concurrent_requests 1  
time aws s3 sync /ebs/tutorial/data-1m/ s3://${bucket}/sync1/

Perform the sync using 10 threads. Record time to complete.

aws configure set default.s3.max_concurrent_requests 10  
time aws s3 sync /ebs/tutorial/data-1m/ s3://${bucket}/sync2/

S3 Performance- Optimize Small File Operations

This exercise will demonstrate how to increase the transactions per second(TPS) while moving small objects.

In the CLI for the instance, create a text file that represents a list of object ids.

seq 1 500 > object_ids  
cat object_ids

Create a 1 KB file.

dd if=/dev/urandom of=1KB.file bs=1 count=0 seek=1K

Upload 500 1KB files to S3 using 1 thread. Record time to complete.

time parallel --will-cite -a object_ids -j 1 aws s3 cp 1KB.file s3://${bucket}/run1/{}

Upload 500 1KB files to S3 using 10 threads. Record time to complete.

time parallel --will-cite -a object_ids -j 10 aws s3 cp 1KB.file s3://${bucket}/run2/{}

Upload 500 1KB files to S3 using 50 threads. Record time to complete.

time parallel --will-cite -a object_ids -j 50 aws s3 cp 1KB.file s3://${bucket}/run3/{}

Upload 500 1KB files to S3 using 100 threads. Record time to complete.

time parallel --will-cite -a object_ids -j 100 aws s3 cp 1KB.file s3://${bucket}/run4/{}

Note
Going from 50 to 100 threads likely didn't result in higher performance. For ease of demonstration we are using multiple instances of the AWS CLI to show a concept. In the real world developers would create thread pools that are much more efficient than our demonstration method. It is reasonable to assume that added threads should continue to add performance until another bottleneck like running out of CPU occurs.

S3 Performance- Optimize Copying Objects to Different Locations

In this exercise we will demonstrate how to copy files from one location in S3 to another more efficiently.

In the CLI for the instance, run this command to download a 5 GB file from the S3 bucket that was uploaded in an earlier test and the upload to a different prefix. Record time to complete.

time (aws s3 cp s3://$bucket/upload1.test 5GB.file; aws s3 cp 5GB.file s3://$bucket/copy/5GB.file)

Use PUT COPY(copy-object) to move the file. Record time to complete.

time aws s3api copy-object --copy-source $bucket/upload1.test --bucket $bucket --key copy/5GB-2.file

Copy the file between S3 using a single command between locations. Record time to complete.

time aws s3 cp s3://$bucket/upload1.test s3://$bucket/copy/5GB-3.file

Note
The first command required to GET data from S3 back to the EC2 instance and then PUT the data back to S3 from the EC2 instance. The second command uses a PUT COPY but is only single threaded. The third command uses a PUT COPY as well, but also uses Transfer Manager which is multi-threaded depending on the AWS CLI configurations. Both the second and third command perform the copy between S3 locations internal to S3. This results in only API calls being made from the EC2 host and the data transfer bandwidth is done inside of S3 only.

EFS Performance- Optimize IOPS

In this exercise we will demonstrate different methods of creating 1,024 files and compare the performance of each method.

In the CLI for the instance, run this command to generate 1,024 zero byte files. Record time to complete.

directory=$(echo $(uuidgen)| grep -o ".\\{6\\}$")    
mkdir -p /efs/tutorial/touch/${directory}  
time for i in {1..1024}; do  
touch /efs/tutorial/touch/${directory}/test-1.3-$i;  
done;

Run this command to generate 1,024 zero by files using multiple threads. Record time to complete.

directory=$(echo $(uuidgen)| grep -o ".\\{6\\}$")    
mkdir -p /efs/tutorial/touch/${directory}    
time seq 1 1024 | parallel --will-cite -j 128 touch /efs/tutorial/touch/${directory}/test-1.4-{}

Run this command to generate 1,024 zero by files in multiple directories using multiple threads. Record time to complete.

directory=$(echo $(uuidgen)| grep -o ".\\{6\\}$")   
mkdir -p /efs/tutorial/touch/${directory}/{1..32}  
time seq 1 32 | parallel --will-cite -j 32 touch /efs/tutorial/touch/${directory}/{}/test1.5{1..32}

Note
The best way to leverage the distributed data storage design of Amazon EFS is to use multiple threads and inodes in parallel.

EFS Performance- I/O Sizes and Sync Frequency

In this exercise we will demonstrate how different I/O sizes and sync frequencies affects throughput to EFS.

In the CLI for the instance, Write a 2GB file to EFS using 1MB block size and sync once after each file. Record time to complete.

time dd if=/dev/zero of=/efs/tutorial/dd/2G-dd-$(date +%Y%m%d%H%M%S.%3N) bs=1M count=2048 status=progress conv=fsync

Write a 2 GB file to EFS using 16MB block size and sync once after each file. Record time to complete.

time dd if=/dev/zero of=/efs/tutorial/dd/2G-dd-$(date +%Y%m%d%H%M%S.%3N) bs=16M count=128 status=progress conv=fsync

Write a 2GB file to EFS using 1MB block size and sync after each block. Record time to complete.

time dd if=/dev/zero of=/efs/tutorial/dd/2G-dd-$(date +%Y%m%d%H%M%S.%3N) bs=1M count=2048 status=progress oflag=sync

Write a 2 GB file to EFS using 16MB block size and sync after each block. Record time to complete.

time dd if=/dev/zero of=/efs/tutorial/dd/2G-dd-$(date +%Y%m%d%H%M%S.%3N) bs=16M count=128 status=progress oflag=sync

Note
Syncing after each block will dramatically decrease performance of the filesystem. Best performance will be obtained by syncing after each file. Block size has little impact to performance.

EFS Performance- Multi-threaded Performance

This exercise will demonstrate how multi-threaded access improves throughput and IOPS.

Each command will write 2 GB of data to EFS using 1 MB block size.
Write to EFS using 4 threads and sync after each block. Record time to complete.

time seq 0 3 | parallel --will-cite -j 4 dd if=/dev/zero of=/efs/tutorial/dd/2G-dd-$(date +%Y%m%d%H%M%S.%3N)-{} bs=1M count=512 oflag=sync

Write to EFS using 16 threads and sync after each block. Record time to complete.

time seq 0 15 | parallel --will-cite -j 16 dd if=/dev/zero of=/efs/tutorial/dd/2G-dd-$(date +%Y%m%d%H%M%S.%3N)-{} bs=1M count=128 oflag=sync

Note The distributed data storage design of EFS means that multi-threaded applications can drive substantial levels of aggregate throughput and IOPS. By parallelizing your writes to EFS by increasing the number of threads, you can increase the overall throughput and IOPS to EFS.

EFS Performance- Compare File Transfer Tools

In this section we will compare the performance of different file transfer utilities and EFS.

Review the data to transfer. 2,000 files and 2 GB of data. Record time to complete.

du -csh /ebs/tutorial/data-1m/  
find /ebs/tutorial/data-1m/. -type f | wc -l

Transfer files from EBS to EFS using rsync

sudo su  
sync && echo 3 > /proc/sys/vm/drop_caches  
exit  
time rsync -r /ebs/tutorial/data-1m/ /efs/tutorial/rsync/

Transfer files from EBS to EFS using cp

sudo su  
sync && echo 3 > /proc/sys/vm/drop_caches  
exit  
time cp -r /ebs/tutorial/data-1m/* /efs/tutorial/cp/

Set the $threads variable to 4 threads per CPU

threads=$(($(nproc --all) * 4))  
echo $threads

Transfer files from EBS to EFS using fpsync

sudo su  
sync && echo 3 > /proc/sys/vm/drop_caches  
exit  
time fpsync -n ${threads} -v /ebs/tutorial/data-1m/ /efs/tutorial/fpsync/

Transfer files from EBS to EFS using cp + GNU Parallel

sudo su  
sync && echo 3 > /proc/sys/vm/drop_caches  
exit  
time find /ebs/tutorial/data-1m/. -type f | parallel --will-cite -j ${threads} cp {} /efs/tutorial/parallelcp

Note
Not all file transfer utilities are created equal. File systems are distributed across an unconstrained number of storage servers and this distributed data storage design means that multithreaded applications like fpsync, mcp, and GNU parallel can drive substantial levels of throughput and IOPS to EFS when compared to single-threaded applications.

EBS Performance Part 2

From the AWS console, click Services and select EC2.
Select Instances from the menu on the left.
Select the instance named Storage_Performance_Workshop.

4. Click on /dev/sdb in the lower pane by Block devices.

5. In the pop up window, click the EBS ID.

Click on the Monitoring tab.
There are two graphs of interest: Read Throughput and Burst Balance.
1. Read Throughput should be at 100 OPS now.
2. Burst Balance should be at 0%.
In the CLI for the instance, Look at output of fio job to see current IOPs.

sudo screen -r

In the AWS console, click Actions, then select Modify Volume
In the pop up window configure the following values:

Volume Type: Provisioned IOPS SSD (IO1)
Size: 100GB
Iops: 5000
Click Modify.
Click Yes.
Click Close.
Go back to your SSH session. Check output of fio, you should see the increase in IOPS. ctrl-a d to exit screen.
Verify larger volume is seen by instance

lsblk

Resize the filesystem. The increase in IOPs is available right away, but you would need to resize the filesystem to use the added capacity.

sudo umount /ebsperftest  
sudo resize2fs /dev/nvme1n1  
sudo mount /ebsperftest

Verify filesystem is using 100 GB volume.

df -h

Run the following command to ensure FIO is running on the instance. Output should be similar to below.

ps -ef | grep fio

Allow fio to continue to run for several more minutes to allow graphs to update.
Click on the Monitoring tab.
Refresh graphs. Burst balance will no longer report any values due to the volume being IO1 now. Read Throughput should be at 5,000 IOPS

Clean Up Resources

To ensure you don't continue to be billed for services in your account from this workshop follow the steps below to remove all resources created during the workshop.

In the CLI for the instance, remove objects from the S3 bucket.

aws configure set default.s3.max_concurrent_requests 20  
aws s3 rm s3://${bucket} --recursive

From the AWS console, click Services and select CloudFormation.
Select StoragePerformanceWorkshop.
Click Delete.
Click Delete stack.
It will take a few minutes to delete everything. Refresh the page to see an updated status. StoragePerformanceWorkshop will be removed from the list if everything has been deleted correctly.

aws-samples / maximizing-storage-throughput-and-performance Goto Github PK