Cleaning AWS (python tool)

Alexey Horey
3 min read · Apr 25, 2021

Table of contents

  • Foreword
  • This article’s audience
  • Use case examples
  • How-to (TL;DR: jump here)
  • Code

Foreword

The objective of cleaning is not just to clean, but to feel happiness living within that environment. (c) Marie Kondo

To achieve scalability and process automation one must have a well-defined and tidy environment. Even while managing infrastructure with automation tools (Terraform, CloudFormation…) there remains a wide field for suboptimal or dangerous configurations.

Think of “infrastructure as code” as code (pardon the tautology); in the coding world vs. in the infrastructure management world:

  • Functions -> modules’ specifications (instances, LBs, Lambdas, Route53 records, etc.)
  • Arguments -> values provided to these modules
  • Compiler -> running the automation code on the modules and values
  • Compiler optimizations -> oops, nothing thorough out of the box here.

Sometimes things break (Terraform apply, delete, etc.), people make manual changes, and POCs and “quick wins” are left behind in AWS services as configuration appendicitis.

After running these infrastructure optimization scripts manually for a while, I organized them into this ecosystem.

This article’s audience

You must remember what the Dev in DevOps stands for!

These tools were written for private use, so I performed only the minimal (inevitable) steps needed to deliver them to a wider audience. However, I would be glad to help you set things up if you get stuck, or to answer any questions via Gmail: alexey.beley

Use case examples

Taken from horey/aws_api/docs/README_CLEANUP.md

Some of them (or maybe all of them, I haven’t checked) already exist as ready-made solutions. My intention was to create a scalable, proprietary reporting system, so that product- and company-specific tasks can be merged into an existing generic system.

Currently I generate the following reports:

  • Unused EC2 Network Interfaces (a minimal standalone sketch of this report appears right after this list).
  • S3: large and old objects. Yes, you can see large buckets in the AWS console, but not the dynamics: how many objects (by size and count) were written monthly and yearly. This data helps to estimate your growth rate and the expected rise in expenses. (!) Please see the warnings section before using this feature.
  • CloudWatch: large and old streams. Since CloudWatch Logs storage is very expensive, this data can shed some light on the black holes in your budget and point towards better retention periods or alternative log storage systems. (!) Please see the warnings section before using this feature.
  • Lambdas: large size (>100 MB), security groups with open ingress ports, missing Lambda logs in CloudWatch log groups.
  • EC2 Load Balancers (Classic and ALB): no instances, listeners or target groups associated; target groups with bad health.
  • EC2 Security Groups: LB-associated groups with a port mismatch (listening on a closed port, or closed for a listening port); unused groups or groups that are too open (permit any/any).
  • IAM policies and roles: unused or dangerous.
  • Route53: generating a mapping of redirections. The end report provides a list of unknown destinations. E.g. known destinations: EC2 private DNS addresses, CloudFront DNS addresses. Unknown destinations: raw IP addresses, external DNS name redirections (DDoS, CDN, analytics, etc.).
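
To give a sense of what one of these reports boils down to, here is a minimal standalone sketch for the first item, unused EC2 network interfaces. It uses plain boto3 rather than the horey API, and the default region name is just a placeholder:

```python
import boto3


def find_unused_network_interfaces(region_name="us-east-1"):
    """Return EC2 network interfaces that are not attached to anything."""
    ec2 = boto3.client("ec2", region_name=region_name)
    unused = []
    paginator = ec2.get_paginator("describe_network_interfaces")
    # "available" status means the ENI exists but is not attached to an instance.
    for page in paginator.paginate(
        Filters=[{"Name": "status", "Values": ["available"]}]
    ):
        unused.extend(page["NetworkInterfaces"])
    return unused


if __name__ == "__main__":
    for eni in find_unused_network_interfaces():
        print(eni["NetworkInterfaceId"], eni.get("Description", ""))
```

The real tool works from cached metadata instead of calling AWS directly on every run; see the how-to below.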

How-to

Setting up the environment

For a step-by-step explanation, go to https://github.com/AlexeyBeley/horey/blob/main/aws_api/README.md

Connecting to AWS: we must list the accounts we want to manage and the steps to be performed to access each account. The next step is choosing a single account to perform the tasks on.
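
The actual per-account configuration format is described in the repository README; purely as an illustration of the idea, here is a hedged sketch with plain boto3, assuming one assume-role access step per account. The account IDs and role ARNs are made up:

```python
import boto3

# Hypothetical account registry; the real tool reads this from its own configuration.
ACCOUNTS = {
    "dev": {"role_arn": "arn:aws:iam::111111111111:role/audit"},
    "prod": {"role_arn": "arn:aws:iam::222222222222:role/audit"},
}


def session_for_account(name):
    """Assume the account's audit role and return a boto3 session for it."""
    sts = boto3.client("sts")
    credentials = sts.assume_role(
        RoleArn=ACCOUNTS[name]["role_arn"],
        RoleSessionName="aws-cleanup",
    )["Credentials"]
    return boto3.Session(
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"],
    )


# Choose a single account to perform the tasks on, as described above.
session = session_for_account("dev")
```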

Cached objects: I save the source metadata once, prior to running multiple cleanups on it, to save time and avoid AWS throttling.
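
A minimal sketch of that caching idea (the file names and layout here are my own, not the tool’s):

```python
import json
import os

CACHE_DIR = "cache"  # hypothetical location; the real tool manages its own cache layout


def cache_objects(name, objects):
    """Serialize fetched metadata once so later cleanups reuse it instead of re-calling AWS."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(os.path.join(CACHE_DIR, f"{name}.json"), "w") as file_handler:
        json.dump(objects, file_handler)


def load_cached_objects(name):
    """Load previously cached metadata from disk."""
    with open(os.path.join(CACHE_DIR, f"{name}.json")) as file_handler:
        return json.load(file_handler)
```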

Running cleanups: each cleanup method needs different objects initiated before being triggered. You can see exactly which objects to initiate in the “horey/aws_api/tests/test_aws_api_cleanup.py” file. See examples here: https://github.com/AlexeyBeley/horey/blob/main/aws_api/docs/README_CLEANUP.md
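
Without guessing the tool’s internal API, the flow can be pictured by wiring the earlier standalone sketches together: initiate (fetch or load) the source objects first, then run the report on them. Assume the earlier functions live in a local module, here hypothetically called cleanup_sketches.py:

```python
# Hypothetical wiring of the earlier sketches; the real entry points are in
# horey/aws_api/tests/test_aws_api_cleanup.py.
from cleanup_sketches import (
    cache_objects,
    find_unused_network_interfaces,
    load_cached_objects,
)

# Step 1: initiate the source objects once and cache a JSON-safe representation.
enis = find_unused_network_interfaces()
cache_objects(
    "network_interfaces",
    [
        {"NetworkInterfaceId": e["NetworkInterfaceId"], "Description": e.get("Description", "")}
        for e in enis
    ],
)

# Step 2: run the "cleanup report" against the cached data, not against AWS.
for eni in load_cached_objects("network_interfaces"):
    print(eni["NetworkInterfaceId"], eni["Description"])
```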

(!)Warnings(!)

Warranty: this code is alive; I refactor it from time to time. Unit test coverage is very limited. Still, the ideas and techniques can easily be reused in standalone projects.

Time and space considerations: currently two cleanups face large amounts of data, S3 objects and CloudWatch streams.

For example:

CloudWatch: metadata for 700,000,000 streams took 30 days to download.

S3:

  • Input: 600 TB of stored data, 2,000,000,000 object keys, average object size 150 KB.
  • Processing: metadata downloaded over 10 days, generated cache size of 856 GB across 23,000 files, average cached object representation of 500 bytes, cleanup running time over 1.5 days.
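
As a rough consistency check on these numbers (my own back-of-envelope calculation, not part of the report): 2,000,000,000 objects times roughly 500 bytes per cached representation comes to about 1 TB of raw cached data, which is in the same ballpark as the 856 GB cache that was actually generated.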

Code

The main logic behind this article resides in the horey.aws_api package: https://github.com/AlexeyBeley/horey/tree/main/aws_api

Please use the README.md from this package to get started.

The global horey repository’s Makefile is used to install the horey.aws_api package: https://github.com/AlexeyBeley/horey

Several minor packages used by horey.aws_api are installed recursively by the Makefile.

Enjoy! And again, I will be glad to help via Gmail: alexey.beley.
