2023: A Space Odyssey — How I broke the build (without touching a line of code)

Prepare for Launch

At Pearl Health, we use CircleCI to manage our builds. We like the extensible, YAML-based configuration approach, and the pricing is startup-friendly.

However, no technology is perfect. On January 4th, CircleCI issued a security alert to all of their customers. They didn’t include a lot of detail (at first), but they did recommend immediate action. Here’s the relevant section:

Out of an abundance of caution, we strongly recommend that all customers … immediately rotate any and all secrets stored in CircleCI.

Since we do store secrets in CircleCI, this impacted us. We immediately sprang into action. Our incident policy helped us quickly identify this as a Severity 1 incident, the highest level of urgency. Our team jumped onto a Slack huddle, and over the next few hours, we completed the task of rotating every secret that we store in CircleCI.

A Stowaway in a Secret Space

The final secret we modified is our API key for the World Health Organization’s ICD API. We use it to translate machine-readable ICD-10 codes into human-friendly descriptions. For example, J00 is the ICD-10 code for acute nasopharyngitis (a.k.a. the common cold).

This API key seemed pretty harmless. In fact, if an attacker got their hands on it, there’s not much damage they could do. The ICD API doesn’t provide any sensitive information, and the WHO gives out API keys for free. The worst case would be someone launching a DoS attack and making it look like we were the attackers.[1]

Just the same, we rotated this key too. Here’s a screenshot of the Client Keys management page. You can see the blue highlight, right before I copied the text to the clipboard:

Don’t worry, this isn’t our real API key; it’s a temporary one that I deleted right after I grabbed the screenshot.

This is where things went wrong.

See the problem? No? Let’s zoom in a bit:

When I copied the ClientSecret value, I accidentally included a leading space character. And that set a chain of events in motion that impacted the team in surprising ways.

Voyage of a Space Character

We store most of our secrets in AWS Secrets Manager, including API keys. But some of them are stored in a shared 1Password vault,[2] and this is one of them. So the next step was to open up 1Password and paste in the new value:

If I’d been looking really closely, I probably would have seen the extra space. And if I’m feeling generous, I could probably say that I was rushing because of the incident. Either way, the damage was done.

At this point, I handed the reins to one of our engineers. He logged into our shared vault and copied the new API key into his own clipboard. But he never saw the actual value because the UI masked it (as a good password manager should):


When he clicked Copy, the leading space came along for the ride. And when he saved the new environment variable in CircleCI, the leading space was saved too. From then on, the value was completely obfuscated, because CircleCI masks all environment variables except for the last four characters:


At this point, our fate was sealed.

Houston, we have a problem…

The next time someone merged in a pull request (about an hour later), the build failed with a very confusing error:

"docker build" requires exactly 1 argument.

This was surprising for several reasons:

The only inputs that aren’t version-controlled are — you guessed it — the environment variables. So even though something had changed, there was nothing obvious in CircleCI to point our team in the right direction.

The Vastness of Space

But why did this break our build?

Was the extra space really such a big deal?

The answer has to do with Docker build-time variables and the --build-arg parameter.

Try this at home. You can use any Dockerfile you like:

$ docker build --build-arg foo=bar ./my-docker-project

Normally, it’ll succeed. But then, add a leading space, like this:

$ docker build --build-arg foo= bar ./my-docker-project

It’ll fail, with a familiar-looking error:

"docker build" requires exactly 1 argument.

Here’s what’s going on: when Docker says it wants “1 argument”, what it really means is that it wants 1 argument after the build options: the build context, which is the directory that (by default) holds your Dockerfile. In this fictional example, that one argument is ./my-docker-project. But when we add the leading space, the shell splits bar off into its own word, so Docker sees it as a separate argument and not as part of the --build-arg option. So instead of one argument, we get two: bar and ./my-docker-project.
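
You can watch the argument count change without involving Docker at all. Here is a minimal sketch, assuming a made-up helper named count_args that just reports how many arguments it was handed:

```shell
# count_args is a hypothetical helper: it prints the number of arguments it received.
count_args() {
  echo "$#"
}

# No space: "foo=bar" stays in one word, so there are 3 arguments in total.
count_args --build-arg foo=bar ./my-docker-project    # prints 3

# Leading space: the shell splits "bar" into its own word, so there are 4.
count_args --build-arg foo= bar ./my-docker-project   # prints 4
```

The extra word is created by the shell before the command ever runs, which is why the program on the receiving end has no way to tell that bar was meant to be part of the option value.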

This is exactly what happened with our CircleCI build. The command that was failing looked like:

docker build --build-arg WHO_ICD_CLIENT_SECRET=${WHO_ICD_CLIENT_SECRET}

But with the leading space, that evaluated to:

docker build --build-arg WHO_ICD_CLIENT_SECRET= YJMfALrAyiTfYAoT...
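
The same splitting happens when the space arrives inside a variable. The sketch below is an illustration only: DEMO_SECRET and show_args are made-up names, and the value is a stand-in for a real key. It also shows, as an assumption about what would have happened and not something we ran in our build, how quoting changes the result:

```shell
# show_args is a hypothetical helper: it prints each argument it receives in brackets.
show_args() {
  printf '[%s]' "$@"
  printf '\n'
}

# Made-up stand-in for the real secret, with the accidental leading space.
DEMO_SECRET=" abc123"

# Unquoted: the shell splits the expanded value at the space.
show_args --build-arg DEMO_SECRET=${DEMO_SECRET} .
# prints [--build-arg][DEMO_SECRET=][abc123][.]

# Quoted: the value stays in one word, leading space included.
show_args --build-arg "DEMO_SECRET=${DEMO_SECRET}" .
# prints [--build-arg][DEMO_SECRET= abc123][.]
```

Quoting would likely have avoided the argument-count error, but the key would still carry the stray space with it, so the problem would simply surface somewhere else.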

Unfortunately for us, CircleCI environment variables are expanded “under the covers”. The real value of ${WHO_ICD_CLIENT_SECRET} is never shown in CircleCI’s build output. So even with the detailed build logs, we still couldn’t spot the problem.

So, how did our team find the root cause? It turns out that you can configure Bash to print the fully-expanded version of each command it runs, using set -x. Once they ran this, it was easy to spot the extra whitespace.
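
Here is a small sketch of what that tracing looks like, using a made-up variable (DEMO_SECRET) and a do-nothing command in place of the real build step:

```shell
# Made-up stand-in for the real secret, with the accidental leading space.
DEMO_SECRET=" abc123"

# Trace a harmless command (":" does nothing) and capture what the shell reports.
# The subshell keeps "set -x" from staying on for the rest of the script.
trace=$( (set -x; : docker build --build-arg KEY=${DEMO_SECRET} .) 2>&1 )

# The trace shows the expanded command, with a visible gap after "KEY=".
echo "$trace"
```

Keep in mind that the trace prints expanded values, so anything it reveals in a build log should be treated as exposed.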

Returning to Mission Control

There are a few steps we can take to prevent this from happening again.

  1. We’re planning to change our code to look up the ICD API key at runtime (from AWS Secrets Manager) instead of injecting it into the Docker image. Extra spaces could still cause problems, but they’ll be easy to identify because they’ll specifically affect ICD API requests.
  2. If we want to be extra cautious, then while we’re in there, we might also add code to remove leading and trailing whitespace.
  3. More importantly, this event reminded me that it’s always helpful to pair with another engineer, even for the small stuff. If I’d been sharing my screen, then maybe a colleague would have spotted the copy-and-paste mistake.
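
The whitespace cleanup in step 2 could look something like the sketch below. It is written in shell only for illustration (it is not our application code), and trim, has_outer_whitespace, and DEMO_SECRET are hypothetical names:

```shell
# trim is a hypothetical helper: it prints its argument without leading or trailing whitespace.
trim() {
  local s=$1
  s="${s#"${s%%[![:space:]]*}"}"   # drop leading whitespace
  s="${s%"${s##*[![:space:]]}"}"   # drop trailing whitespace
  printf '%s' "$s"
}

# has_outer_whitespace succeeds when trimming would change the value.
has_outer_whitespace() {
  [ "$1" != "$(trim "$1")" ]
}

# Made-up stand-in for the real secret, with the accidental leading space.
DEMO_SECRET=" abc123"

# Warn without printing the value itself.
if has_outer_whitespace "$DEMO_SECRET"; then
  echo "DEMO_SECRET has leading or trailing whitespace" >&2
fi
```

A check like this reports the problem by name without revealing the secret, which is the piece of information our masked build output could not give us.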

  1. The DoS would be serious, of course. But any attacker can get their own API key for free, and launch an attack within minutes. They don’t need to steal a key from someone else. The worst case would be either (a) the attack is successful, and the World Health Organization’s response is delayed a little because of the misdirection, or (b) the WHO shuts off our API key, and some of our app functionality is disrupted.
  2. Technically, we could have caused the exact same problem using AWS Secrets Manager. I know because I tried it — the leading space is preserved.
Matt Solnit


Chief Technology Officer, Pearl Health