Prepare for Launch
At Pearl Health, we use CircleCI to manage our builds. We like the extensible, YAML-based configuration approach, and the pricing is startup-friendly.
However, no technology is perfect. On January 4th, CircleCI issued a security alert to all of their customers. They didn’t include a lot of detail (at first), but they did recommend immediate action. Here’s the relevant section:
Out of an abundance of caution, we strongly recommend that all customers … immediately rotate any and all secrets stored in CircleCI.
Since we do store secrets in CircleCI, this impacted us. We immediately sprang into action. Our incident policy helped us quickly identify this as a Severity 1 incident, the highest level of urgency. Our team jumped onto a Slack huddle, and over the next few hours, we completed the task of rotating every secret that we store in CircleCI.
A Stowaway in a Secret Space
The final secret we modified is our API key for the World Health Organization’s ICD API. We use it to translate machine-readable ICD-10 codes into human-friendly descriptions. For example, J00 is the ICD-10 code for acute nasopharyngitis (a.k.a. the common cold).
This API key seemed pretty harmless. In fact, if an attacker got their hands on it, there’s not much damage they could do. The ICD API doesn’t provide any sensitive information, and the WHO gives out API keys for free. The worst case would be someone launching a DoS attack and making it look like we were the attackers.[1]
Just the same, we rotated this key too. Here’s a screenshot of the Client Keys management page. You can see the blue highlight, right before I copied the text to the clipboard:
This is where things went wrong.
See the problem? No? Let’s zoom in a bit:
Voyage of a Space Character
If I’d been looking really closely, I probably would have seen the extra space. And if I’m feeling generous, I could probably say that I was rushing because of the incident. Either way, the damage was done.
At this point, I handed the reins to one of our engineers. He logged into our shared vault and copied the new API key into his own clipboard. But he never saw the actual value because the UI masked it (as a good password manager should):
When he clicked Copy, the leading space came along for the ride. And when he saved the new environment variable in CircleCI, the leading space was saved too. From then on, the value was effectively hidden. CircleCI masks all environment variables, except for the last 4 characters:
At this point, our fate was sealed.
Houston, we have a problem…
The next time someone merged in a pull request (about an hour later), the build failed with a very confusing error:
"docker build" requires exactly 1 argument.
This was surprising for several reasons:
- The Docker build parameters looked fine.
- Our GitHub commit history didn’t show any relevant changes. In fact, even the CircleCI YAML file is stored in GitHub, and we could clearly see it was the same as before the error.
- Our build doesn’t even invoke Docker directly. Instead, we use CircleCI’s “orb” (plugin) for interfacing with Amazon Elastic Container Registry.
The only inputs that aren’t version-controlled are — you guessed it — the environment variables. So even though something had changed, there was nothing obvious in CircleCI to point our team in the right direction.
The Vastness of Space
But why did this break our build?
Was the extra space really such a big deal?
The answer has to do with Docker build-time variables and the --build-arg parameter.
Try this at home. You can use any Dockerfile you like:
$ docker build --build-arg foo=bar ./my-docker-project
Normally, it’ll succeed. But then, add a leading space, like this:
$ docker build --build-arg foo= bar ./my-docker-project
It’ll fail, with a familiar-looking error:
"docker build" requires exactly 1 argument.
Here’s what’s going on: when Docker says it wants “1 argument”, what it really means is that it wants 1 argument after the build options: the path to your build context (typically the directory that contains your Dockerfile). In this fictional example, that one argument is ./my-docker-project. But when we add the leading space, the shell splits the command at that space, so Docker treats bar as its own argument, and not part of the --build-arg option. So instead of one argument, we get two: bar and ./my-docker-project.
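To see the splitting without running Docker at all, here is a minimal sketch. The count_args helper is hypothetical (it is not part of Docker or CircleCI) and simply reports how many arguments the shell handed to it:

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
  echo "$#"
}

# No stray space: --build-arg, foo=bar, and the build context.
count_args --build-arg foo=bar ./my-docker-project

# Stray space: the shell splits "foo=" and "bar" into separate words.
count_args --build-arg foo= bar ./my-docker-project
```

The first call prints 3 and the second prints 4. The extra word is what Docker ends up counting as a second positional argument.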
This is exactly what happened with our CircleCI build. The command that was failing looked like:
docker build --build-arg WHO_ICD_CLIENT_SECRET=${WHO_ICD_CLIENT_SECRET}
But with the leading space, that evaluated to:
docker build --build-arg WHO_ICD_CLIENT_SECRET= YJMfALrAyiTfYAoT...
Unfortunately for us, CircleCI environment variables are expanded “under the covers”. The real value of ${WHO_ICD_CLIENT_SECRET} is never shown in CircleCI’s build output. So even with the detailed build logs, we still couldn’t spot the problem.
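The same splitting happens when the space arrives inside a variable. The sketch below uses a made-up variable (EXAMPLE_SECRET) and a hypothetical show_args helper that prints each argument in brackets. It assumes the command is assembled by a plain shell with an unquoted expansion, which may not be exactly how the orb builds it:

```shell
# show_args is a hypothetical helper: it prints each argument it receives in brackets.
show_args() {
  printf '[%s]\n' "$@"
}

# A made-up value with a leading space, standing in for the real key.
EXAMPLE_SECRET=" abc123"

# Unquoted: the shell splits the expanded value at the space.
show_args --build-arg EXAMPLE_SECRET=${EXAMPLE_SECRET}

# Quoted: the value stays attached to the option, space included.
show_args --build-arg "EXAMPLE_SECRET=${EXAMPLE_SECRET}"
```

The unquoted call prints three bracketed lines and the quoted call prints two. Quoting would likely have kept the build from failing this way, but the key itself would still carry the stray space, so the problem would only have moved somewhere else.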
So, how did our team find the root cause? It turns out that you can configure Bash to print the fully-expanded version of each command it runs, using set -x. Once they ran this, it was easy to spot the extra whitespace.
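Here is a rough sketch of what that tracing looks like. The variable and the fake_build stand-in are made up for illustration; the real build command and key are not shown here:

```shell
# A made-up value with a leading space, standing in for the real key.
EXAMPLE_SECRET=" abc123"

# fake_build is a hypothetical stand-in for the real build command. It does nothing.
fake_build() {
  :
}

# Turn tracing on inside a subshell, so it switches off again when the subshell exits.
# The trace goes to stderr, so it is redirected to stdout to make it easy to capture.
trace_demo() {
  (
    set -x
    fake_build --build-arg EXAMPLE_SECRET=${EXAMPLE_SECRET}
  ) 2>&1
}

trace_demo
```

The trace shows the command after expansion, with a visible gap between EXAMPLE_SECRET= and the value. Tracing also writes expanded values into the log, so it is worth limiting it to the step under investigation.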
Returning to Mission Control
There are a few steps we can take to prevent this from happening again.
- We’re planning to change our code to look up the ICD API key at runtime (from AWS Secrets Manager) instead of injecting it into the Docker image. Extra spaces could still cause problems, but they’ll be easy to identify because they’ll specifically affect ICD API requests.
- If we want to be extra cautious, then while we’re in there, we might also add code to remove leading and trailing whitespace.
- More importantly, this event reminded me that it’s always helpful to pair with another engineer, even for the small stuff. If I’d been sharing my screen, then maybe a colleague would have spotted the copy-and-paste mistake.
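As a sketch of the whitespace cleanup mentioned above: the helpers below are hypothetical, and they are written in shell only to match the other examples here, not because that is how our application code would do it. They trim leading and trailing whitespace, and flag a value that needed trimming:

```shell
# trim is a hypothetical helper: it strips leading and trailing whitespace from its argument.
trim() {
  local value="$1"
  # Drop the leading run of whitespace.
  value="${value#"${value%%[![:space:]]*}"}"
  # Drop the trailing run of whitespace.
  value="${value%"${value##*[![:space:]]}"}"
  printf '%s' "$value"
}

# has_outer_whitespace succeeds when the value changes after trimming.
has_outer_whitespace() {
  [ "$1" != "$(trim "$1")" ]
}

# Example check against a made-up value with a leading space.
if has_outer_whitespace " abc123"; then
  echo "value has leading or trailing whitespace"
fi
```

A check like this could run before a secret is used, so a stray space fails loudly with a clear message instead of surfacing as an unrelated error.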
Interested in learning more about Engineering at Pearl? Check out opportunities on our Careers Page.
1. The DoS would be serious, of course. But any attacker can get their own API key for free, and launch an attack within minutes. They don’t need to steal a key from someone else. The worst case would be either (a) the attack is successful, and the World Health Organization’s response is delayed a little because of the misdirection, or (b) the WHO shuts off our API key, and some of our app functionality is disrupted.
2. Technically, we could have caused the exact same problem using AWS Secrets Manager. I know because I tried it: the leading space is preserved.