Find Hidden AWS Resources With Effective Wordlists

You’ve got yourself a target AWS account to hack and you want to map all the stuff in it. At least pretend you do so I can tell my boss people actually read my content.

It’s not like you can ask the account owner to give an IAM access key so you can rummage around in there and map it from the inside out… Well maybe you can because you are probably lawful-good. Regardless, role play a bounty hunter or a red teamer who doesn’t have that option. What now?

You could just put the account number into Awseye and see what comes up but that has some downsides:

The Awseye mapping probably ran some time in the past so it’s somewhat stale
Awseye only returns up to 100 resources
In order to run the mapping, Awseye needs to save the account ID—maybe you don’t want to hand it over
If you want to write your own automation, you need to DIY

If those downsides are unacceptable, you’ll need a plan. This blog post is your plan.

Preconditions for resource enumeration

Most people haven’t considered that AWS resources can be enumerated. It’s a strange concept. Why should knowledge of an account ID entitle you to figure out what’s in that account?

It wasn’t intentional. It just so happens that certain AWS services have quirks that make it possible. The preconditions for resources to be enumerable are:

Resource names are not globally unique (i.e. not generic S3 buckets) - If resources are in a global namespace it’s difficult to attribute them to an account. Even if a bucket has an account ID in its name, anyone could have created that bucket.
Resources are addressed by user supplied name (e.g. EventBus) - This is not strictly true; the names just need to be predictable in some way. In practice that means they are set by the user because automatically generated names are typically random and therefore unpredictable.
A method exists to check for existence from outside the account (e.g. SQS queue) - These methods are the quirks. Newer services are less likely to have these quirks because error messages and API responses have been standardised and don’t unintentionally reveal whether a resource exists.

Candidates for enumeration

I don’t know all the service-resource combinations that meet these conditions but here are the ones I do know. I’m always looking for more, so please get in touch if you can extend this list.

IAM principals (roles, users, groups)
SQS queues

Resources that are enumerable but not attributable to an account ID:

S3 buckets (auto generated with an account ID in the name)

Examples of resources that have predictable names but don’t appear to be enumerable outside the account:

SNS topics
Lambda functions

Process for building a wordlist

There are many approaches to building AWS wordlists. Each has its pros and cons. In the past we have relied heavily on scraping documentation, which has been really successful but very time consuming. The one we use now is optimised to take advantage of Awseye data. The general process is described below and applied specifically to IAM users as a worked example.

1. Export service mentions

“Mentions” are references to AWS resources Awseye found in various places on the internet, like GitHub and PyPI. Exporting them generates a large pool of candidate resource names.

<snip>
arn:aws:iam::294599468799:user/christophe
arn:aws:iam::832344679060:user/shaktimaan
arn:aws:iam::444455556666:user/user1-22222a31-17
arn:aws:iam::053121068929:user/users/test-iam-us
arn:aws:iam::000000000000:user/hashicorp
</snip>

2. Normalize account IDs and regions

Replace any references to an account ID or region with some kind of token, like [ACCOUNT] and [REGION].

<snip>
arn:aws:iam::[ACCOUNT]:user/christophe
arn:aws:iam::[ACCOUNT]:user/shaktimaan
arn:aws:iam::[ACCOUNT]:user/user1-22222a31-17
arn:aws:iam::[ACCOUNT]:user/users/test-iam-us
arn:aws:iam::[ACCOUNT]:user/hashicorp
</snip>

IAM ARNs don’t have a region so it’s just account IDs that are normalized.
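A minimal sketch of this step in Python, assuming an account ID is any standalone run of 12 digits and that regions follow the usual us-east-1 shape. The function name and both patterns are illustrative assumptions, not Awseye's actual implementation, and the region pattern will occasionally catch a word that merely looks like a region.

```python
import re

# Assumption: an account ID is a standalone run of exactly 12 digits.
ACCOUNT_RE = re.compile(r"(?<!\d)\d{12}(?!\d)")

# Assumption: regions look like "us-east-1", "ap-southeast-2" or "us-gov-west-1".
REGION_RE = re.compile(
    r"(?<![a-z])[a-z]{2}(?:-gov)?-"
    r"(?:(?:north|south)?(?:east|west)|north|south|central)-\d(?!\d)"
)


def normalize(identifier: str) -> str:
    """Replace account IDs and region names with placeholder tokens."""
    identifier = ACCOUNT_RE.sub("[ACCOUNT]", identifier)
    return REGION_RE.sub("[REGION]", identifier)


if __name__ == "__main__":
    print(normalize("arn:aws:iam::294599468799:user/christophe"))
    print(normalize("arn:aws:sqs:us-east-1:123456789012:orders"))
```

Regions embedded inside a resource name are replaced too, which helps names that differ only by region collapse into a single candidate.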

3. Count and sort

Count the number of times each normalized identifier appears and sort the list from most to least. It’s not perfect but this provides an indication of how common a resource name might be. If it appears hundreds of times, it’s likely a common naming convention. If it appears once, it might be unique to a project.

<snip>
800 arn:aws:iam::[ACCOUNT]:root
157 arn:aws:iam::[ACCOUNT]:user/mystack
139 arn:aws:iam::[ACCOUNT]:user/TestAcc
111 arn:aws:iam::[ACCOUNT]:user/Example
111 arn:aws:iam::[ACCOUNT]:user/hashicorp
</snip>

After counting and sorting we see the most commonly mentioned IAM users.
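Counting and sorting is a few lines with Python's standard library. This is a sketch only: the function name is hypothetical and the alphabetical tie-break is an assumption added so the output is stable between runs.

```python
from collections import Counter


def count_and_sort(normalized_ids):
    """Return (identifier, count) pairs, most frequently mentioned first."""
    counts = Counter(normalized_ids)
    # Sort by count descending, then alphabetically so ties come out in a stable order.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    mentions = [
        "arn:aws:iam::[ACCOUNT]:root",
        "arn:aws:iam::[ACCOUNT]:user/hashicorp",
        "arn:aws:iam::[ACCOUNT]:root",
    ]
    for identifier, count in count_and_sort(mentions):
        print(count, identifier)
```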

4. Test accounts sample

Awseye has data on well over 300,000 AWS accounts. To test whether a particular resource name is a good wordlist candidate, run it across a big diverse set of accounts (5,000?) and popular regions (5?). If there are hits across two or more accounts, the word makes the cut and moves into the permanent wordlist. A single account hit might suggest the word is not generally applicable, while setting the threshold higher than two might cause uncommon but valid words to be missed.
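To run a candidate across a sample, each normalized template has to be turned back into concrete identifiers. The sketch below assumes the [ACCOUNT] and [REGION] tokens from step 2. The function name is hypothetical, and the existence check itself is deliberately left out because it depends on the quirk of each service.

```python
from itertools import product


def expand(template, accounts, regions):
    """Yield one concrete identifier per account (and per region, if relevant)."""
    # Templates without a [REGION] token (e.g. IAM) only vary by account.
    region_values = regions if "[REGION]" in template else [None]
    for account, region in product(accounts, region_values):
        candidate = template.replace("[ACCOUNT]", account)
        if region is not None:
            candidate = candidate.replace("[REGION]", region)
        yield candidate


if __name__ == "__main__":
    sample_accounts = ["111111111111", "222222222222"]
    sample_regions = ["us-east-1", "eu-west-1"]
    for arn in expand("arn:aws:sqs:[REGION]:[ACCOUNT]:orders", sample_accounts, sample_regions):
        print(arn)
```

With 5,000 accounts and 5 regions, a single regional template expands to 25,000 checks, so it is worth counting and sorting first to decide which candidates deserve the traffic.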

<snip>
Found arn:aws:iam::111111111111:root
Found arn:aws:iam::222222222222:root
Found arn:aws:iam::222222222222:user/Example
Found arn:aws:iam::333333333333:root
Found arn:aws:iam::333333333333:user/Example
Found arn:aws:iam::444444444444:root
Found arn:aws:iam::444444444444:user/hashicorp
</snip>

The “root” user appears in every account so naturally it makes it into the wordlist. “Example” is less common but appears in two accounts so it also makes it. “hashicorp” appears to be unique to account 444444444444 so it doesn’t make it into the wordlist.
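The two-account threshold can be sketched as a small filter over the hits. It assumes each hit is a concrete ARN containing a 12-digit account ID, and it only normalizes the account, which is enough for IAM. Regional resources would need the region normalized as well. The function and parameter names are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: an account ID is a standalone run of exactly 12 digits.
ACCOUNT_RE = re.compile(r"(?<!\d)\d{12}(?!\d)")


def select_words(found_arns, min_accounts=2):
    """Keep templates that were found in at least `min_accounts` distinct accounts."""
    accounts_by_template = defaultdict(set)
    for arn in found_arns:
        match = ACCOUNT_RE.search(arn)
        if match is None:
            continue  # No account ID, so the hit cannot be attributed.
        template = ACCOUNT_RE.sub("[ACCOUNT]", arn)
        accounts_by_template[template].add(match.group())
    return sorted(
        template
        for template, accounts in accounts_by_template.items()
        if len(accounts) >= min_accounts
    )


if __name__ == "__main__":
    hits = [
        "arn:aws:iam::111111111111:root",
        "arn:aws:iam::222222222222:root",
        "arn:aws:iam::222222222222:user/Example",
        "arn:aws:iam::333333333333:user/Example",
        "arn:aws:iam::444444444444:user/hashicorp",
    ]
    print(select_words(hits))
```

Counting distinct accounts rather than raw hits means a name that appears many times in one account still counts once.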

5. Manual review

Mostly we’re looking for weird stuff and trying to understand why it’s there. For example, is there a random ID in a word that got lots of hits? Where did it come from and why is it random-looking but not random?

The user list makes sense but given we only tested four accounts, and “hashicorp” is a very popular software company, maybe it’s a good candidate to run across a larger number of accounts.
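Part of the review can be pre-sorted with a rough heuristic that flags words containing random-looking IDs so a human looks at them first. The sketch below treats any segment of six or more hex characters that includes a digit as suspicious. The threshold and the function name are assumptions, and the result is a prompt for review rather than a reason to exclude a word.

```python
import re

# Assumption: a "random-looking" segment is 6+ hex characters with at least one digit.
HEXISH_RE = re.compile(r"^(?=.*\d)[0-9a-f]{6,}$", re.IGNORECASE)


def looks_random(word):
    """Return True if any alphanumeric segment of the word looks like a generated ID."""
    segments = re.split(r"[^0-9A-Za-z]+", word)
    return any(HEXISH_RE.match(segment) is not None for segment in segments)


if __name__ == "__main__":
    for word in ["user1-22222a31-17", "hashicorp", "mystack"]:
        print(word, looks_random(word))
```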

Pros and cons

This approach is really powerful at finding resource names that end up in code. It finds weird quirky stuff a human might eyeball and exclude as random or unrealistic. It’s data driven and can be automated. However, it also misses things that don’t end up in public code or are pseudo-private. An example of this is resources used by vendors of closed source software and services, which might only live in fancy PDFs.

Alternative approaches to building wordlists

If you enjoy geeking out over wordlist generation, here are some alternative approaches worth exploring:

AWS documentation parsing - Many (most?) of the most common resource names are described in AWS documentation. Jonathan Walker created a tool to download AWS docs and used it to find dodgy S3 buckets.
Sourcegraph queries - The public Sourcegraph instance provides a search interface and search CLI to the most popular public GitHub repos. It’s the most common method used by security researchers because it’s easy, lightning fast, and produces generally good results. Here is an example query that will produce a list of IAM roles after some parsing and here it is being used in anger to find vulnerable GitHub integrations.
AI generation - ChatGippity and others have been trained on more data than any one researcher could hope to scrape, so why not use them? Whatever resource type you are building a wordlist for, you could ask one to generate a list of resource names that fall into a bucket like (common, vendor, well known, open source, etc.). The list will likely have a lot more junk but you can test it on a sample of accounts to filter it.
Web searches - People and companies often put resource names in PDFs, presentations, web pages etc. in an effort to document how their stuff works. These might not turn up in GitHub and AI may not accurately remember them, so manually searching might prove a high-accuracy low-speed way to generate words. If you’ve got some spare dollars, you could also attempt to query the giant Common Crawl dataset.
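Whichever source you pick, the raw text has to be boiled down to candidate ARNs before it can enter the process above. Here is a minimal sketch, assuming you only care about ARNs that carry a 12-digit account ID. The pattern and function name are illustrative and will miss ARNs that are split across lines or built from variables.

```python
import re

# Assumption: arn:<partition>:<service>:<region, may be empty>:<12-digit account>:<resource>
ARN_RE = re.compile(
    r"arn:aws[a-z-]*:[a-z0-9-]+:[a-z0-9-]*:\d{12}:[A-Za-z0-9_+=,.@/:-]+"
)


def extract_arns(text):
    """Return the unique ARNs found in free text, in order of first appearance."""
    # Trailing punctuation usually belongs to the sentence, not the ARN.
    matches = (match.rstrip(".,:") for match in ARN_RE.findall(text))
    return list(dict.fromkeys(matches))


if __name__ == "__main__":
    sample = (
        "Allow arn:aws:iam::294599468799:user/christophe to read from "
        "arn:aws:sqs:us-east-1:123456789012:orders."
    )
    for arn in extract_arns(sample):
        print(arn)
```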

Plerion open source wordlists

As a reward for reading this far, we’ve got a gift for you. We’re releasing our wordlists generated using the process described in this article, completely free for any use, including commercial use, as long as you attribute them to Plerion with a link to our website.

Download them here:

They will be automatically generated and updated once a week, so they will never go out of style, like Crocs.

Enjoy.
