St. Peter holding a key and a printed book, Public Domain, https://commons.wikimedia.org/w/index.php?curid=1043265

Blessing your SSH at Lyft

Chris Steipp
Lyft Engineering


Like many organizations, Lyft continually looks for ways to address critical risks in our environment. Last summer, Lyft’s Security Team identified the lack of two-factor authentication on our SSH logins as an area of concern. Today, we’re sharing our experience implementing two-factor authentication for SSH at Lyft, and announcing our open-source client for interacting with Netflix’s BLESS.

Background

Over the past few years, Lyft has developed a system for quickly giving newly hired engineers access to our infrastructure. Engineers are often able to push changes to Lyft services on their first day of employment, which involves provisioning or linking accounts in AWS, GitHub, PagerDuty, and other services on day one. For SSH, this previously required provisioning a private SSH key for the user and pushing the corresponding public key to every server in our fleet.

Lyft has a strong DevOps philosophy. We practice it by giving teams full control of, and responsibility for, any service they write. While everyone in the organization agrees that protecting our data and infrastructure is extremely important, teams have to balance security with operational considerations. They need to quickly bring up and manage new services, debug those services in production, and respond efficiently during outages. This meant that any 2FA control the Security Team put in place needed to be unobtrusive and work reliably, potentially while other systems in our environment were down. It also had to “Just Work” while requiring no more development and maintenance than a fairly small security team could provide (we currently have five members, and are hiring, of course).

The team considered a few solutions:

  • YubiKeys are often used for two-factor authentication. They have many benefits, but they are costly and require a compatible authentication service to verify codes. This means we would either need to rely on an external company (Yubico’s authentication service) for authentication, or run an authentication service internally with near 100% uptime. Using YubiKeys for SSH would also require installing a PAM module, maintained by the security team, on every server.
  • The team also considered Duo, since its login flow is a great two-factor experience. But this again would have required installing a PAM module on every server, and having every server contact an external service.

Both of these solutions had undesirable operational tradeoffs. We also had a use case where some engineers need to SSH into every server in an Auto Scaling group (ASG), or across our entire fleet, in response to an emergency; this is often thousands of servers at a time. We needed a solution where this remained feasible, and where performance was no worse than SSH-ing with a fixed private key.

Netflix’s BLESS

When looking for other options, the team came across Netflix’s “Bastion’s Lambda Ephemeral SSH Service” (BLESS), which had recently been open sourced. BLESS uses SSH certificates, which were added in OpenSSH 5.4. SSH certificates allow a certificate authority (CA) to sign a user’s public key along with a list of constraints; the user presents this certificate to the server during authentication. The server only needs to trust the CA, and does not need prior knowledge of the user’s public key. BLESS is designed to run on AWS Lambda, Amazon’s serverless compute service, and uses Amazon’s Key Management Service (KMS) to encrypt all secrets.
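To make the trust model concrete, here is a minimal Python sketch of the checks a server applies when a user presents a certificate. It is a simplified model under stated assumptions: `UserCertificate` and `server_accepts` are hypothetical names, the fields only loosely mirror what an OpenSSH certificate carries, and the CA’s signature is assumed to have been verified already (real sshd verifies it itself; on the server, trust in the CA is typically configured with sshd’s `TrustedUserCAKeys` option).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class UserCertificate:
    """Simplified stand-in for an OpenSSH user certificate (fields only, no crypto)."""
    key_id: str                  # identity string the CA records in the certificate
    principals: Tuple[str, ...]  # account names the certificate may log in as
    valid_after: datetime        # start of the validity window
    valid_before: datetime       # end of the validity window (exclusive)
    signing_ca: str              # fingerprint of the CA key that signed it


def server_accepts(cert: UserCertificate, trusted_cas: FrozenSet[str],
                   login_user: str, now: datetime) -> bool:
    """Decide whether a server would accept `cert`, assuming its signature already verified."""
    if cert.signing_ca not in trusted_cas:
        return False  # the server trusts only its configured CA(s), not individual user keys
    if login_user not in cert.principals:
        return False  # the certificate must name the account being logged into
    return cert.valid_after <= now < cert.valid_before  # short-lived by construction


# Usage: a one-hour certificate for "alice", signed by a hypothetical CA fingerprint.
issued = datetime(2017, 1, 1, 12, 0, tzinfo=timezone.utc)
cert = UserCertificate("alice@laptop", ("alice",), issued - timedelta(minutes=5),
                       issued + timedelta(hours=1), "SHA256:example-ca")
trusted = frozenset({"SHA256:example-ca"})
```

Note that the server never consults a per-user key list, so a user can rotate their key at will and only needs a fresh signature from the CA.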

This approach appealed to the team because:

  • It was a simple configuration change to servers, instead of installing new code
  • BLESS leveraged existing AWS infrastructure (KMS and Lambda), which minimized responsibility for the Security Team
  • The cryptographic work the user’s laptop performs to authenticate is the same as with a plain SSH key, ensuring that the performance impact would be negligible
  • Certificates can expire quickly, so a stolen SSH key and certificate become useless after a short time

Additionally, having our servers trust a CA instead of individual user keys improved our key-management process by allowing users to manage their own keys. Users could regenerate their key as often as they liked, with no coordination from Security or other infrastructure teams.

BLESS on the Endpoints

One drawback of Netflix’s BLESS is that Netflix relies on users first SSH-ing to a bastion server where the user is strongly authenticated. Lyft’s goal was to use SSH certificates for all SSH connections, including the connection to our bastion servers. So we explored moving the SSH certificate to our engineers’ laptops and using strong cryptographic assertions to prove to the BLESS Lambda that the user had previously authenticated with an approved second factor. To this end, Lyft developed blessclient, a simple Python client that runs on an engineer’s laptop. It allows the user to authenticate to AWS, prove their identity to the BLESS Lambda, receive an SSH certificate, and set up the user’s normal OpenSSH client to use the certificate for authentication.

kmsauth

Since the user would be telling the Lambda that they had been appropriately authenticated, we knew we needed a cryptographic token to prove the user’s identity and make an assertion about how the user was authenticated. Lyft already uses a similar process to prove a user’s or service’s identity to our secret management system, Confidant. Confidant uses Amazon’s KMS and IAM to generate and validate kmsauth tokens, which are short tokens encrypted by KMS with an encryption context that includes the user’s identity. The KMS key’s policy allows an IAM user to encrypt with the key only if their username is in the encryption context’s “from” field and they have authenticated to AWS using multi-factor authentication:

```json
{
  "Action": "kms:Encrypt",
  "Effect": "Allow",
  "Resource": [
    "arn:aws:kms:us-east-1:123456789011:key/12345678-abab-cdcd-efef-123456789011"
  ],
  "Condition": {
    "StringEquals": {
      "kms:EncryptionContext:to": [
        "bless-production"
      ],
      "kms:EncryptionContext:user_type": "user",
      "kms:EncryptionContext:from": "${aws:username}"
    },
    "Bool": {
      "aws:MultiFactorAuthPresent": "true"
    }
  }
}
```
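A toy Python restatement of the statement’s conditions may help when reading it. In IAM, every key listed under a condition operator must match and every operator must hold, while a list of values for a single key matches if any one value does. `encrypt_allowed` is a hypothetical helper for illustration only; AWS evaluates the real policy, including the `${aws:username}` substitution.

```python
from typing import Dict, List


def encrypt_allowed(caller_username: str, mfa_present: bool,
                    encryption_context: Dict[str, str]) -> bool:
    """Toy re-statement of the policy statement's conditions (not real IAM evaluation)."""
    # StringEquals: every listed key must match; a list of values means "any one of these".
    required: Dict[str, List[str]] = {
        "to": ["bless-production"],
        "user_type": ["user"],
        "from": [caller_username],  # IAM substitutes ${aws:username} with the caller's name
    }
    for key, allowed in required.items():
        if encryption_context.get(key) not in allowed:
            return False
    # Bool: the caller's credentials must have been obtained with MFA.
    return mfa_present
```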

If KMS can decrypt the token with the expected key and encryption context, the validating service knows that only the named user could have generated the token, and only if they had recently authenticated to AWS with MFA.
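The round trip can be sketched in Python against an injected KMS client whose `encrypt` and `decrypt` calls use boto3’s keyword arguments (`KeyId`, `Plaintext`, `EncryptionContext`, `CiphertextBlob`). This is not Lyft’s kmsauth library: the function names, the JSON validity-window payload, the 60-second backdating, and the one-hour default lifetime are assumptions made for the sketch.

```python
import base64
import json
import time
from typing import Any, Optional


def _context(username: str, to: str) -> dict:
    # The same encryption context the key policy checks: who the token is from and for.
    return {"to": to, "from": username, "user_type": "user"}


def generate_token(kms: Any, key_id: str, username: str, to: str = "bless-production",
                   lifetime: int = 3600, now: Optional[float] = None) -> str:
    """Encrypt a small validity window; KMS only allows this if the key policy is satisfied."""
    now = time.time() if now is None else now
    # Assumed payload: a not_before/not_after window, backdated slightly for clock skew.
    payload = json.dumps({"not_before": int(now) - 60, "not_after": int(now) + lifetime})
    resp = kms.encrypt(KeyId=key_id, Plaintext=payload.encode(),
                       EncryptionContext=_context(username, to))
    return base64.b64encode(resp["CiphertextBlob"]).decode()


def validate_token(kms: Any, token: str, expected_key_arn: str, username: str,
                   to: str = "bless-production", now: Optional[float] = None) -> bool:
    """Accept the token only if KMS decrypts it with the expected key, context, and time."""
    now = time.time() if now is None else now
    try:
        # KMS refuses to decrypt when the context differs from the one used to encrypt.
        resp = kms.decrypt(CiphertextBlob=base64.b64decode(token),
                           EncryptionContext=_context(username, to))
    except Exception:  # with boto3 this is botocore's ClientError; kept generic here
        return False
    if resp["KeyId"] != expected_key_arn:
        return False  # decrypted, but under some other key
    window = json.loads(resp["Plaintext"])
    return window["not_before"] <= now <= window["not_after"]
```

With boto3, `kms` would be `boto3.client("kms")`, and the generating side must hold MFA-backed credentials for its encrypt call to pass the key policy.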

Using Blessclient

When the user establishes an SSH connection, blessclient prompts for an MFA code and then uses KMS to generate the kmsauth token. Blessclient then assumes a role in a separate AWS account (we use a separate account to prevent admins in our main account from compromising our CA) and invokes the BLESS Lambda, passing the kmsauth token to prove the user’s identity along with the public key the user wants signed. The Lambda returns a certificate, which blessclient saves alongside the user’s private key and loads into the user’s SSH agent. The user’s SSH client can then establish connections until the certificate expires.
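Saving the certificate and loading it into the agent can lean on OpenSSH conventions: both `ssh` and `ssh-add` look for a certificate named after the private key with `-cert.pub` appended. The Python below is a hypothetical sketch of that last step; `certificate_path` and `install_certificate` are illustrative names, not blessclient’s API.

```python
import os
import subprocess
from pathlib import Path


def certificate_path(private_key: Path) -> Path:
    """OpenSSH convention: the certificate for ~/.ssh/id_rsa lives at ~/.ssh/id_rsa-cert.pub."""
    return private_key.with_name(private_key.name + "-cert.pub")


def install_certificate(private_key: Path, certificate: str, load_into_agent: bool = True) -> Path:
    """Write the signed certificate next to the key and optionally add both to ssh-agent."""
    cert_path = certificate_path(private_key)
    cert_path.write_text(certificate.strip() + "\n")
    os.chmod(cert_path, 0o644)  # certificates are public data, like ordinary *.pub files
    if load_into_agent:
        # ssh-add also loads <key>-cert.pub when it sits alongside the key it is given.
        subprocess.run(["ssh-add", str(private_key)], check=True)
    return cert_path
```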

In most cases this all happens in less than a second; however, the latency of three calls to AWS from the user’s laptop is perceptible. We mitigate this by caching enough information that blessclient makes roughly one call to AWS every 30 minutes, and prompts for an MFA code only once every 18 hours. These parameters are configurable in blessclient, so users can adjust them to their own risk tolerance.
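The caching policy can be pictured as two timers. The sketch below is hypothetical (`ClientCache` and its methods are not blessclient’s code); it only encodes the 18-hour MFA interval and the 30-minute refresh interval described above, with times as plain epoch seconds.

```python
from dataclasses import dataclass
from typing import List, Optional

MFA_INTERVAL_SECONDS = 18 * 60 * 60   # prompt for an MFA code at most every 18 hours
REFRESH_INTERVAL_SECONDS = 30 * 60    # contact AWS roughly once every 30 minutes


@dataclass
class ClientCache:
    mfa_session_started: Optional[float] = None  # when MFA-backed AWS credentials were obtained
    cert_issued: Optional[float] = None          # when the current SSH certificate was issued

    def needs_mfa_prompt(self, now: float) -> bool:
        return (self.mfa_session_started is None
                or now - self.mfa_session_started >= MFA_INTERVAL_SECONDS)

    def needs_new_certificate(self, now: float) -> bool:
        return (self.cert_issued is None
                or now - self.cert_issued >= REFRESH_INTERVAL_SECONDS)

    def plan(self, now: float) -> List[str]:
        """Steps a connection attempt at time `now` would trigger under this policy."""
        steps = []
        if self.needs_mfa_prompt(now):
            steps.append("prompt for MFA code and obtain new AWS session credentials")
        if self.needs_new_certificate(now):
            steps.append("create kmsauth token and request a new certificate")
        return steps
```

Most connections fall into the empty plan, which is why the per-connection cost is usually just the normal SSH handshake.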

To install blessclient on our users’ laptops, we again wanted to reuse existing infrastructure as much as possible. Users clone a git repo, then run a set of Bash scripts via make. Every seven days, blessclient pulls the git repo and reinstalls itself, so users are always running an up-to-date version of the client.
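A self-update check of this kind might look like the following sketch. The timestamp file, the function names, and running `make` with its default target are assumptions for illustration; only the seven-day interval, the git repo, and the use of make come from the description above.

```python
import subprocess
import time
from pathlib import Path
from typing import Optional

UPDATE_INTERVAL_SECONDS = 7 * 24 * 60 * 60  # reinstall once the install is seven days old


def update_due(stamp_file: str, now: Optional[float] = None) -> bool:
    """True if the (hypothetical) last-update stamp is missing or at least seven days old."""
    now = time.time() if now is None else now
    stamp = Path(stamp_file)
    if not stamp.exists():
        return True
    return now - stamp.stat().st_mtime >= UPDATE_INTERVAL_SECONDS


def self_update(repo_dir: str, stamp_file: str) -> None:
    """Pull the client's git repo, rerun its make-based install, and record the time."""
    subprocess.run(["git", "-C", repo_dir, "pull"], check=True)
    subprocess.run(["make", "-C", repo_dir], check=True)  # default target assumed
    Path(stamp_file).touch()
```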

Server Authentication

Building on our experience with BLESS and blessclient, we thought it would be a good time to address server authentication at Lyft. Typically, when users first SSH to a new server, the SSH client displays the server’s public key fingerprint to the user, then asks the user to accept it. If an attacker later tries to impersonate the server, the public key will not match the known fingerprint and the client will warn the user that something may be wrong.
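Trust on first use can be modeled in a few lines. `check_host_key` is a hypothetical helper that keeps an in-memory map from host to fingerprint, standing in for the client’s known_hosts file, and assumes the user accepts the first-connection prompt.

```python
from typing import Dict


def check_host_key(known_hosts: Dict[str, str], host: str, fingerprint: str) -> str:
    """Trust on first use: record the first key seen for a host, flag any later change."""
    recorded = known_hosts.get(host)
    if recorded is None:
        # First contact: the client shows the fingerprint; we assume the user accepts it.
        known_hosts[host] = fingerprint
        return "prompt"
    if recorded == fingerprint:
        return "ok"
    return "mismatch"  # possible impersonation: the client warns the user
```

In an autoscaled fleet, every newly launched instance lands in the "prompt" branch, and a recycled IP address that now belongs to a different instance lands in "mismatch" even though nothing is wrong.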

In an AWS environment where we use Auto Scaling groups that launch and terminate hundreds of servers each day, this can lead to security-warning fatigue. Worse, users may simply tell their SSH client to accept every server fingerprint without question. To improve this, we implemented a Lambda that acts as a host certificate authority, along with a server-side agent that obtains a certificate for each server when it starts. We then configure the SSH clients on our users’ laptops to trust this CA. This eliminates nearly all prompts about server key fingerprints, leaving only those where a legitimate issue may exist.
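On the client side, trusting a host CA is a single known_hosts line using OpenSSH’s `@cert-authority` marker; on the server side, sshd’s `HostCertificate` directive points at the certificate issued for the host key. The helpers below are hypothetical and only build those lines; the host pattern and key material passed to them are placeholders.

```python
def known_hosts_ca_line(host_pattern: str, ca_public_key: str) -> str:
    """known_hosts entry that makes ssh accept host certificates signed by this CA."""
    key_type, key_data = ca_public_key.split()[:2]  # drop any trailing comment
    return f"@cert-authority {host_pattern} {key_type} {key_data}"


def sshd_host_certificate_line(host_key_path: str) -> str:
    """sshd_config directive pointing sshd at the certificate issued for its host key."""
    return f"HostCertificate {host_key_path}-cert.pub"
```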

# soblessed

Lyft has been using blessclient in production since September 2016. See the talk Lyft engineers gave at BSides SF for more details. Blessclient is open source and ready for other organizations to use!

Interested in open source work and having a big impact? Lyft is hiring! Apply through our application system, or drop me a note at csteipp@lyft.com.
