Monday, August 31, 2009

How to Keep Your AWS Credentials on an EC2 Instance Securely

If you've been using EC2 for anything serious then you have some code on your instances that requires your AWS credentials. I'm talking about code that does things like this:
  • Attach an EBS volume (requires your X.509 certificate and private key)
  • Download your application from a non-public location in S3 (requires your secret access key)
  • Send and receive SQS messages (requires your secret access key)
  • Query or update SimpleDB (requires your secret access key)
How do you get the credentials onto the instance in the first place? How can you store them securely once they're there? First let's examine the issues involved in securing your keys, and then we'll explore the available options for doing so.

Potential Vulnerabilities in Transferring and Storing Your Credentials

There are a number of vulnerabilities that should be considered when trying to protect a secret. I'm going to ignore the ones that result from obviously foolish practice, such as transferring secrets unencrypted.
  1. Root: root can get at any file on an instance and can see into any process's memory. If an attacker gains root access to your instance, and your instance can somehow know the secret, your secret is as good as compromised.
  2. Privilege escalation: User accounts can exploit vulnerabilities in installed applications or in the kernel (whose latest privilege escalation vulnerability was patched in new Amazon Kernel Images on 28 August 2009) to gain root access.
  3. User-data: Any user account able to open a socket on an EC2 instance can see the user-data by getting the URL http://169.254.169.254/latest/user-data . This is exploitable if a web application running in EC2 does not validate input before visiting a user-supplied URL. Accessing the user-data URL is particularly problematic if you use the user-data to pass in the secret unencrypted into the instance - one quick wget (or curl) command by any user and your secret is compromised. And, there is no way to clear the user-data - once it is set at launch time, it is visible for the entire life of the instance.
  4. Repeatability: HTTPS URLs transport their content securely, but anyone who has the URL can get the content. In other words, there is no authentication on HTTPS URLs. If you specify an HTTPS URL pointing to your secret it is safe in transit but not safe from anyone who discovers the URL.
Benefits Offered by Transfer and Storage Methods

Each transfer and storage method offers a different set of benefits. Here are the benefits against which I evaluate the various methods presented below:
  1. Easy to do. It's easy to create a file in an AMI, or in S3. It's slightly more complicated to encrypt it. But, you should have a script to automate the provision of new credentials, so all of the methods are graded as "easy to do".
  2. Possible to change (now). Once an instance has launched, can the credentials it uses be changed?
  3. Possible to change (future). Is it possible to change the credentials that will be used by instances launched in the future? All methods provide this benefit but some make it more difficult to achieve than others, for example instances launched via Auto Scaling may require the Launch Configuration to be updated.

How to Put AWS Credentials on an EC2 Instance


With the above vulnerabilities and benefits in mind let's look at different ways of getting your credentials onto the instance and the consequences of each approach.

Mitch Garnaat has a great set of articles about the AWS credentials. Part 1 explores what each credential is used for, and part 2 presents some methods of getting them onto an instance, the risks involved in leaving them there, and a strategy to mitigate the risk of them being compromised. A summary of part 1: keep all your credentials secret, like you keep your bank account info secret, because they are - literally - the keys to your AWS kingdom.

As discussed in part 2 of Mitch's article, there are a number of methods to get the credentials (or indeed, any secret) onto an instance:

1. Burn the secret into the AMI

Pros:
  • Easy to do.
Cons:
  • Not possible to change (now) easily. Requires SSHing into the instance, updating the secret, and forcing all applications to re-read it.
  • Not possible to change (future) easily. Requires bundling a new AMI.
  • The secret can be mistakenly bundled into the image when making derived AMIs.
Vulnerabilities:
  • root, privilege escalation.
2. Pass the secret in the user-data

Pros:
  • Easy to do. Putting the secret into the user-data must be integrated into the launch procedure.
  • Possible to change (future). Simply launch new instances with updated user-data. With Auto Scaling, create a new Launch Configuration with the updated user-data.
Cons:
  • Not possible to change (now). User-data cannot be changed once an instance is launched.
Vulnerabilities:
  • user-data, root, privilege escalation.

Here are some additional methods to transfer a secret to an instance, not mentioned in the article:

3. Put the secret in a public URL
The URL can be on a website you control or in S3. It's insecure and foolish to keep secrets in a publicly accessible URL. Please don't do this, I had to mention it just to be comprehensive.

Pros:
  • Easy to do.
  • Possible to change (now). Simply update the content at that URL. Any processes on the instance that read the secret each time will see the new value once it is updated.
  • Possible to change (future).
Cons:
  • Completely insecure. Any attacker between the endpoint and the EC2 boundary can see the packets and discover the URL, revealing the secret.
Vulnerabilities:
  • repeatability, root, privilege escalation.
4. Put the secret in a private S3 object and provide the object's path
To get content from a private S3 object you need the secret access key in order to authenticate with S3. The question then becomes "how to put the secret access key on the instance", which you need to do via one of the other methods.

Pros:
  • Easy to do.
  • Possible to change (now). Simply update the content at that URL. Any processes on the instance that read the secret each time will see the new value once it is updated.
  • Possible to change (future).
Cons:
  • Inherits the cons of the method used to transfer the secret access key.
Vulnerabilities:
  • root, privilege escalation.

5. Put the secret in a private S3 object and provide a signed HTTPS S3 URL
The signed URL must be created before launching the instance and specified somewhere that the instance can access - typically in the user-data. The signed URL expires after some time, limiting the window of opportunity for an attacker to access the URL. The URL should be HTTPS so that the secret cannot be sniffed in transit.

Pros:
  • Easy to do. The S3 URL signing must be integrated into the launch procedure.
  • Possible to change (now). Simply update the content at that URL. Any processes on the instance that read the secret each time will see the new value once it is updated.
  • Possible to change (future). In order to integrate with Auto Scaling you would need to (automatically) update the Auto Scaling Group's Launch Configuration to provide an updated signed URL for the user-data before the previously specified signed URL expires.
Cons:
  • The secret must be cached on the instance. Once the signed URL expires the secret cannot be fetched from S3 anymore, so it must be stored on the instance somewhere. This may make the secret liable to be burned into derived AMIs.
Vulnerabilities:
  • repeatability (until the signed URL expires), root, privilege escalation.
6. Put the secret on the instance from an outside source, via SCP or SSH
This method involves an outside client - perhaps your local computer, or a management node - whose job it is to put the secret onto the newly-launched instance. The management node must have the private key with which the instance was launched, and must know the secret in order to transfer it. This approach can also be automated, by having a process on the management node poll every minute or so for newly-launched instances.

Pros:
  • Easy to do. OK, not "easy" because it requires an outside management node, but it's doable.
  • Possible to change (now). Have the management node put the updated secret onto the instance.
  • Possible to change (future). Simply put a new secret onto the management node.
Cons:
  • The secret must be cached somewhere on the instance because it cannot be "pulled" from the management node when needed. This may make the secret liable to be burned into derived AMIs.
Vulnerabilities:
  • root, privilege escalation.

The above methods can be used to transfer the credentials - or any secret - to an EC2 instance.

Instead of transferring the secret directly, you can transfer an encrypted secret. In that case, you'd need to provide a decryption key also - and you'd use one of the above methods to do that. The overall security of the secret would be influenced by the combination of methods used to transfer the encrypted secret and the decryption key. For example, if you encrypt the secret and pass it in the user-data, providing the decryption key in a file burned into the AMI, the secret is vulnerable to anyone with access to both user-data and the file containing the decryption key. Also, if you encrypt your credentials then changing the encryption key requires changing two items: the encryption key and the encrypted credentials. Therefore, changing the encryption key can only be as easy as changing the credentials themselves.

How to Keep AWS Credentials on an EC2 Instance

Once your credentials are on the instance, how do you keep them there securely?

First off, let's remember that in an environment out of your control, such as EC2, you have no guarantees of security. Anything processed by the CPU or put into memory is vulnerable to bugs in the hypervisor (the virtualization provider) or to malicious AWS personnel (though the AWS Security White Paper goes to great lengths to explain the internal procedures and controls they have implemented to mitigate that possibility) or to legal search and seizure. What this means is that you should only run applications in EC2 for which the risk of secrets being exposed via these vulnerabilities is acceptable. This is true of all applications and data that you allow to leave your premises. But this article is about the security of the AWS credentials, which control the access to your AWS resources. It is perfectly acceptable to ignore the risk of abuse by AWS personnel exposing your credentials because AWS folks can manipulate your account resources without needing your credentials! In short, if you are willing to use AWS then you trust Amazon with your credentials.

There are three ways to store information on a running machine: on disk, in memory, and not at all.

1. Keeping a secret on disk
The secret is stored in a file on disk, with the appropriate permissions set on the file. The secret survives a reboot intact, which can be a pro or a con: it's a good thing if you want the instance to be able to remain in service through a reboot; it's a bad thing if you're trying to hide the location of the secret from an attacker, because the reboot process contains the script to retrieve and cache the secret, revealing its cached location. You can work around this by altering the script that retrieves the secret, after it does its work, to remove traces of the secret's location. But applications will still need to access the secret somehow, so it remains vulnerable.

Pros:
  • Easily accessible by applications on the instance.
Cons:
  • Visible to any process with the proper permissions.
  • Easy to forget when bundling an AMI of the instance.
Vulnerabilities:
  • root, privilege escalation.
2. Keeping the secret in memory
The secret is stored as a file on a ramdisk. (There are other memory-based methods, too.) The main difference between storing the secret in memory and on the filesystem is that memory does not survive a reboot. If you remove the traces of retrieving the secret and storing it from the startup scripts after they run during the first boot, the secret will only exist in memory. This can make it more difficult for an attacker to discover the secret, but it does not add any additional security.

Pros:
  • Easily accessible by applications on the instance.
Cons:
  • Visible to any process with the proper permissions.
Vulnerabilities:
  • root, privilege escalation.
3. Do not store the secret; retrieve it each time it is needed
This method requires your applications to support the chosen transfer method.

Pros:
  • Secret is never stored on the instance.
Cons:
  • Requires more time because the secret must be fetched each time it is needed.
  • Cannot be used with signed S3 URLs. These URLs expire after some time and the secret will no longer be accessible. If the URL does not expire in a reasonable amount of time then it is as insecure as a public URL.
  • Cannot be used with externally-transferred (via SSH or SCP) secrets because the secret cannot be pulled from the management node. Any protocol that tries to pull the secret from the management node can be also be used by an attacker to request the secret.
Vulnerabilities:
  • root, privilege escalation.
Choosing a Method to Transfer and Store Your Credentials

The above two sections explore some options for transferring and storing a secret on an EC2 instance. If the secret is guarded by another key - such as an encryption key or an S3 secret access key - then this key must also be kept secret and transferred and stored using one of those same methods. Let's put all this together into some tables presenting the viable options.

Unencrypted Credentials

Here is a summary table evaluating the transfer and storage of unencrypted credentials using different combinations of methods:

Transferring and Keeping Unencrypted Credentials

Some notes on the above table:
  • Methods making it "hard" to change credentials are highlighted in yellow because, through scripting, the difficulty can be minimized. Similarly, the risk of forgetting credentials in an AMI can be minimized by scripting the AMI creation process and choosing a location for the credential file that is excluded from the AMI by the script.
  • While you can transfer credentials using a private S3 URL, you still need to provide the secret access key in order to access that private S3 URL. This secret access key must also be transferred and stored on the instance, so the private S3 URL is not by itself usable. See below for an analysis of using a private S3 URL to transfer credentials. Therefore the Private S3 URL entries are marked as N/A.
  • You can burn credentials into an AMI and store them in memory. The startup process can remove them from the filesystem and place them in memory. The startup process should then remove all traces from the startup scripts mentioning the key's location in memory, in order to make discovery more difficult for an attacker with access to the startup scripts.
  • Credentials burned into the AMI cannot be "not stored". They can be erased from the filesystem, but must be stored somewhere in order to be usable by applications. Therefore these entries are marked as N/A.
  • Credentials transferred via a signed S3 URL cannot be "not stored" because the URL expires and, once that happens, is no longer able to provide the credentials. Thus, these entries are marked N/A.
  • Credentials "pushed" onto the instance from an outside source, such as SSH, cannot be "not stored" because they must be accessible to applications on the instance. These entries are marked N/A.
A glance at the above table shows that it is, overall, not difficult to manage unencrypted credentials via any of the methods. Remember: don't use the Public URL method, it's completely unsecure.

Bottom line: If you don't care about keeping your credentials encrypted then pass a signed S3 HTTPS URL in the user-data. The startup scripts of the instance should retrieve the credentials from this URL and store them in a file with appropriate permissions (or in a ramdisk if you don't want them to remain through a reboot), then the startup scripts should remove their own commands for getting and storing the credentials. Applications should read the credentials from the file (or directly from the signed URL is you don't care that it will stop working after it expires).

Encrypted Credentials

We discussed 6 different ways of transferring credentials and 3 different ways of storing them. A transfer method and a storage method must be used for the encrypted credentials and for the decryption key. That gives us 36 combinations of transfer methods, and 9 combinations of storage methods, for a grand total of 324 choices.

Here are the first 54, summarizing the options when you choose to burn the encrypted credentials into the AMI:



As (I hope!) you can see, all combinations that involve burning encrypted credentials into the AMI make it hard (or impossible) to change the credentials or the encryption key, both on running instances and for future ones.

Here are the next set, summarizing the options when you choose to pass encrypted credentials via the user-data:



Passing encrypted credentials in the user-data requires the decryption key to be transferred also. It's pointless from a security perspective to pass the decryption key together with the encrypted credentials in the user-data. The most flexible option in the above table is to pass the decryption key via a signed S3 HTTPS URL (specified in the user-data, or specified at a public URL burned into the AMI) with a relatively short expiry time (say, 4 minutes) allowing enough time for the instance to boot and retrieve it.

Here is a summary of the combinations when the encrypted credentials are passed via a public URL:



It might be surprising, but passing encrypted credentials via a public URL is actually a viable option. You just need to make sure you send and store the decryption key securely, so send that key via a signed S3 HTTPS URL (specified in the user-data on specified at a public URL burned into the AMI) for maximum flexibility.

The combinations with passing the encrypted credentials via a private S3 URL are summarized in this table:



As explained earlier, the private S3 URL is not usable by itself because it requires the AWS secret access key. (The access key id is not a secret). The secret access key can be transferred and stored using the combinations of methods shown in the above table.

The most flexible of the options shown in the above table is to pass in the secret access key inside a signed S3 HTTPS URL (which is itself provided in the user-data or at a public URL burned into the AMI).

Almost there.... This next table summarizes the combinations with encrypted credentials passed via a signed S3 HTTPS URL:



The signed S3 HTTPS URL containing the encrypted credentials can be specified in the user-data or specified behind a public URL which is burned into the AMI. The best options for providing the decryption key are via another signed URL or from an external management node via SSH or SCP.

And, the final section of the table summarizing the combinations of using encrypted credentials passed in via SSH or SCP from an outside management node:



The above table summarizing the use of an external management node to place encrypted credentials on the instance shows the same exact results as the previous table (for a signed S3 HTTPS URL). The same flexibility is achieved using either method.

The Bottom Line

Here's a practical recommendation: if you have code that generates signed S3 HTTPS URLs then pass in two signed URLs into the user-data, one containing the encrypted credentials and the other containing the decryption key. The startup sequence of the AMI should read these two items from their URLs, decrypt the credentials, and store the credentials in a ramdisk file with the minimum permissions necessary to run the applications. The start scripts should then remove all traces of the procedure (beginning with "read the user-data URL" and ending with "remove all traces of the procedure") from themselves.

If you don't have code to generate signed S3 URLs then burn the encrypted credentials into the AMI and pass the decryption key via the user-data. As above, the startup sequence should decrypt the credentials, store them in a ramdisk, and destroy all traces of the raw ingredients and the process itself.

This article is an informal review of the benefits and vulnerabilities offered by different methods of transferring credentials to and storing credentials on an EC2 instance. In a future article I will present scripts to automate the procedures described. In the meantime, please leave your feedback in the comments.

Tuesday, August 18, 2009

Amazon S3 Gotcha: Using Virtual Host URLs with HTTPS

Amazon S3 is a great place to store static content for your web site. If the content is sensitive you'll want to prevent the content from being visible while in transit from the S3 servers to the client. The standard way to secure the content during transfer is by https - simply request the content via an https URL. However, this approach has a problem: it does not work for content in S3 buckets that are accessed via a virtual host URL. Here is an examination of the problem and a workaround.

Accessing S3 Buckets via Virtual Host URLs


S3 provides two ways to access your content. One way uses s3.amazonaws.com host name URLs, such as this:

http://s3.amazonaws.com/mybucket.mydomain.com/myObjectKey

The other way to access your S3 content uses a virtual host name in the URL:

http://mybucket.mydomain.com.s3.amazonaws.com/myObjectKey

Both of these URLs map to the same object in S3.

You can make the virtual host name URL shorter by setting up a DNS CNAME that maps mybucket.mydomain.com to mybucket.mydomain.com.s3.amazonaws.com. With this DNS CNAME alias in place, the above URL can also be written as follows:

http://mybucket.mydomain.com/myObjectKey

This shorter virtual host name URL works only if you setup the DNS CNAME alias for the bucket.

Virtual host names in S3 is a convenient feature because it allows you to hide the actual location of the content from the end-user: you can provide the URL http://mybucket.mydomain.com/myObjectKey and then freely change the DNS entry for mybucket.mydomain.com (to point to an actual server, perhaps) without changing the application. With the CNAME alias pointing to mybucket.mydomain.com.s3.amazonaws.com, end-users do not know that the content is actually being served from S3. Without the DNS CNAME alias you'll need to explicitly use one of the URLs that contain s3.amazonaws.com in the host name.

The Problem with Accessing S3 via https URLs

https encrypts the transferred data and prevents it from being recovered by anyone other than the client and the server. Thus, it is the natural choice for applications where protecting the content in transit is important. However, https relies on internet host names for verifying the identity certificate of the server, and so it is very sensitive to the host name specified in the URL.

To illustrate this more clearly, consider the servers at s3.amazonaws.com. They all have a certificate issued to *.s3.amazonaws.com. ["Huh?" you say. Yes, the SSL certificate for a site specifies the host name that the certificate represents. Part of the handshaking that sets up the secure connection ensures that the host name of the certificate matches the host name in the request. The * indicates a wildcard certificate, and means that the certificate is valid for any subdomain also.] If you request the https URL https://s3.amazonaws.com/mybucket.mydomain.com/myObjectKey, then the certificate's host name matches the requested URL's host name component, and the secure connection can be established.

If you request an object in a bucket without any periods in its name via a virtual host https URL, things also work fine. The requested URL can be https://aSimpleBucketName.s3.amazonaws.com/myObjectKey. This request will arrive at an S3 server (whose certificate was issued to *.s3.amazonaws.com), which will notice that the URL's host name is indeed a subdomain of s3.amazonaws.com, and the secure connection will succeed.

However, if you request the virtual host URL https://mybucket.mydomain.com.s3.amazonaws.com/myObjectKey, what happens? The host name component of the URL is mybucket.mydomain.com.s3.amazonaws.com, but the actual server that gets the request is an S3 server whose certificate was issued to *.s3.amazonaws.com. Is mybucket.mydomain.com.s3.amazonaws.com a subdomain of s3.amazonaws.com? It depends who you ask, but most up-to-date browsers and SSL implementations will say "no." A multi-level subdomain - that is, a subdomain that has more than one period in it - is not considered to be a proper subdomain by recent Firefox, Internet Explorer, Java, and wget clients. So the client will report that the server's SSL certificate, issued to *.s3.amazonaws.com, does not match the host name of the request, mybucket.mydomain.com.s3.amazonaws.com, and refuse to establish a secure connection.

The same problem occurs when you request the virtual host https URL https://mybucket.mydomain.com/myObjectKey. The request arrives - after the client discovers that mybucket.mydomain.com is a DNS CNAME alias for mybucket.mydomain.com.s3.amazonaws.com - at an S3 server with an SSL certificate issued to *.s3.amazonaws.com. In this case the host name mybucket.mydomain.com clearly does not match the host name on the certificate, so the secure connection again fails.

Here is what a failed certificate check looks like in Firefox 3.5, when requesting https://images.mydrifts.com.s3.amazonaws.com/someContent.txt:


Here is what happens in Java:
javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No subject alternative DNS name matching media.mydrifts.com.s3.amazonaws.com found.
at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:174)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1591)
at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:187)
at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:181)
at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:975)
at com.sun.net.ssl.internal.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:123)
at com.sun.net.ssl.internal.ssl.Handshaker.processLoop(Handshaker.java:516)
at com.sun.net.ssl.internal.ssl.Handshaker.process_record(Handshaker.java:454)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:884)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1096)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1123)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1107)
at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:405)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:166)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:977)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:373)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:318)
Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching media.mydrifts.com.s3.amazonaws.com found.
at sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:193)
at sun.security.util.HostnameChecker.match(HostnameChecker.java:77)
at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:264)
at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:250)
at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:954)

And here is what happens in wget:
$ wget -nv https://media.mydrifts.com.s3.amazonaws.com/someContent.txt
ERROR: Certificate verification error for media.mydrifts.com.s3.amazonaws.com: unable to get local issuer certificate
ERROR: certificate common name `*.s3.amazonaws.com' doesn't match requested host name `media.mydrifts.com.s3.amazonaws.com'.
To connect to media.mydrifts.com.s3.amazonaws.com insecurely, use `--no-check-certificate'.
Unable to establish SSL connection.

Requesting the https URL using the DNS CNAME images.mydrifts.com results in the same errors, with the messages saying that the certificate *.s3.amazonaws.com does not match the requested host name images.mydrifts.com.

Notice that the browsers and wget clients offer a way to circumvent the mis-matched SSL certificate. You could, theoretically, ask your users to add an exception to the browser's security settings. However, most web users are scared off by a "This Connection is Untrusted" message, and will turn away when confronted with that screen.

How to Access S3 via https URLs

As pointed out above, there are two forms of S3 URLs that work with https:

https://s3.amazonaws.com/mybucket.mydomain.com/myObjectKey

and this:

https://simplebucketname.s3.amazonaws.com/myObjectKey

So, in order to get https to work seamlessly with your S3 buckets, you need to either:
  • choose a bucket whose name contains no periods and use the virtual host URL, such as https://simplebucketname.s3.amazonaws.com/myObjectKey or
  • use the URL form that specifies the bucket name separately, after the host name, like this: https://s3.amazonaws.com/mybucket.mydomain.com/myObjectKey.
Update 25 Aug 2009: For buckets created via the CreateBucketConfiguration API call, the only option is to use the virtual host URL. This is documented in the S3 docs here.

Friday, August 14, 2009

Elastic Load Balancer: An Elasticity Gotcha

If you use AWS's Elastic Load Balancer to allow your EC2 application to scale, like I do, then you'll want to know about this gotcha recently reported in the AWS forums. By all appearances, it looks like something that should be fixed by Amazon. Until it is, you can reduce (but not eliminate) your exposure to this problem by keeping a small TTL for your ELB's DNS CNAME entry. Read on for details.

The Gotcha

As your ELB-balanced application experiences an increasing load, some of the traffic received by your back-end instances may be traffic that does not belong to your application. And, after your application experiences a sustained heavy load and then traffic subsides, some of your application's traffic may be lost or misdirected to other EC2 instances that are not yours.

Why it Happens

In my article about how ELB works, I describe how ELB resolves its DNS name to a pool of IP addresses, and that this pool increases and decreases in size according to the load placed on the service. Each of the IP addresses in the pool is a "virtual appliance", acting as a load balancer to distribute the connections among your back-end instances. This gives ELB two levels of elasticity: the pool of virtual appliance IP addresses, and your pool of back-end instances.

Before they are assigned to a specific ELB, the virtual appliance IP addresses are available for use by any ELB, waiting in a global pool. When an ELB needs to increase its pool of virtual appliances due to load, it gets a new IP address from the global pool and begins resolving the ELB DNS name to that IP address in addition to the ones it already uses. And when an ELB notices decreasing load, it releases one of its virtual appliance IP addresses back to the global pool, and no longer returns that IP address when resolving the ELB DNS name. According to testing performed by AWS forum user wizardofcrowds, ELB scales up under sustained load by increasing its pool of IP addresses at the rate of one additional address every 5 minutes. And, ELB scales down by relinquishing an IP address at the rate of one every 2 hours. Thus it is possible that a single ELB virtual appliance IP address can be in service to a number of different ELBs over the course of a few hours.

The problem is that DNS resolution is cached at many layers across the internet. When the ELB scales up and gets a new virtual appliance IP address from the global pool, some client somewhere might still be using that IP address as the resolution of a different ELB's DNS name. This other ELB might not even belong to you. A few hours ago, another ELB with a different DNS name returned that IP address from a DNS lookup. Now, that IP address is serving your ELB. But some client somewhere may still be using that IP address to attempt to reach an application that is not yours.

The flip side occurs when the ELB scales down and releases a virtual appliance IP address back to the global pool. Some client somewhere might continue resolving your ELB's DNS name to the now-relinquished IP address. When the address is returned to the pool, that client's attempts to connect to your service will fail. If that same virtual appliance IP is then put into service for another ELB, then the client working with the cached but no-longer-current DNS resolution for your ELB DNS name will be directed to the other ELB virtual appliance, and then onward to back-end instances that are not yours.

So your application served by ELB may receive traffic destined for other ELBs during increasing load, and may experience lost traffic during decreasing load.

What is the Solution?


Fundamentally, this issue is caused by badly-configured DNS implementations. Some DNS servers (including those of some major ISPs) ignore the TTL ("time to live") setting of the original DNS record, and thus end up resolving DNS names to an expired IP address. Some DNS clients (browsers such as IE7, and Java programs by default) also ignore DNS TTLs, causing the same problem. Short of fixing all the misconfigured DNS servers and patching all the IE and Java VMs, however, the issue cannot be solved. So a workaround is the best we can hope for.

You, the EC2 user, can minimize the risk that a well-behaved client will experience this issue. Set up your DNS CNAME entries for the ELB to have a small TTL - 120 seconds is good. This will help for clients whose DNS honors the TTL, but not for clients that ignore TTLs or for clients using DNS servers that ignore TTLs.

Amazon can work around the problem on their end. When an ELB needs to scale up and use a new virtual appliance IP address, that address could remain "reserved" for its use for a longer time. When the ELB scales down and releases the virtual appliance IP address, that address would not be reused by another ELB until the reservation period has expired. This would prevent "recent" ELB virtual appliance IP address from being reused by other ELBs, and reduce the risk of misdirecting traffic.

It should be noted that DNS caching and TTLs influence all load balancing solutions that rely on DNS (such as round-robin DNS), so this issue is not unique to ELB. Caching DNS entries is a good thing for the internet in general, but not all implementations honor the TTL of the cached DNS records. Services relying on DNS for scalability need to be designed with this in mind.

Friday, August 7, 2009

Mount an EBS Volume Created from Snapshot at Startup

There are many posts about how to mount an EBS volume to your EC2 instance during the startup process. But requiring an instance to use a specific EBS volume has limitations that make the technique unsuitable for large-scale use. In this article I present a more flexible technique that uses an EBS snapshot instead.

Limitations of Mounting an EBS Volume on Instance Startup


Mounting an EBS volume at startup is relatively straightforward (see the above-referenced posts for details). The main features of the procedure are:
  • The instance uses the EC2 API tools to attach the specified EBS volume. These tools, in turn, require Java and your EC2 credentials - your certificate and private key.
  • Ideally, the AMI contains a hook to allow the EBS volume ID to be specified dynamically at startup time, either as a parameter in the user-data or retrieved from S3 or SimpleDB.
  • The AMI should already contain (or its startup scripts should create) the appropriate references to the necessary locations on the mounted EBS volume. For example, if the volume is mounted at /vol, then /var/lib/mysql (the default MySQL data directory) might be soft-linked (ln -s) or mount --binded to /vol/var/lib/mysql. Alternatively, the applications can be configured to use the locations on the mounted volume directly.
There are many benefits to mounting an EBS volume at instance startup:
  • Avoid the need to burn a new AMI when the content on the instance's disks changes.
  • Gain the redundancy provided by an EBS volume.
  • Gain the point-in-time backups provided by EBS snapshots.
  • Avoid the need to store the updated content into S3 before instance shutdown.
  • Avoid the need to copy and reconstitute the content from S3 to the instance during startup.
  • Avoid paying for an instance to be always-on.
But mounting an EBS volume at startup also has important limitations:
  • Instances must be launched in the same availability zone as the EBS volume. EBS volumes are availability-zone specific, and are only usable by instances running in the same availability zone. Large-scale deployments use instances in multiple availability zones to mitigate risk, so limiting the deployment to a single availability zone is not reasonable.
  • There is no way to create multiple instances that each have the same EBS volume attached. When you need multiple instances that each have the same data, one EBS volume will not do the trick.
  • As a corollary to the previous point, it is difficult to create Auto Scaling groups of AMIs that mount an EBS volume automatically because each instance needs its own EBS volume.
  • It is difficult to automate the startup of a replacement instance when the original instance still has the EBS volume attached. Large-scale deployments need to be able to handle failure automatically because instances will fail. Sometimes instances will fail in a not-nice way, leaving the EBS volume attached. Detaching the EBS volume may require manual intervention, which is something that should be avoided if at all possible for large-scale deployments.
These limitations make the technique of mounting an EBS volume at startup unsuitable for large-scale deployments.

The Alternative: Mount an EBS Volume Created from a Snapshot at Startup

Instead of specifying an EBS volume to mount at startup, we can specify an EBS snapshot. At startup the instance creates a new EBS volume from the given snapshot and attaches the new volume to itself. The basic startup flow looks like this:
  1. If there is a volume already attached at the target location, do nothing - this is a reboot. Otherwise, continue to step 2.
  2. Create a new EBS volume from the specified snapshot. This requires the following:
    • Java
    • The EC2 API tools
    • The EC2 account's certificate and private key
    • The EBS snapshot ID
  3. Attach the newly-created EBS volume and mount it to the mount point.
  4. Restore any filesystem pointers, if necessary, to point to the proper locations beneath the EBS volume's mount point.
Like the technique of mounting an EBS volume, this technique should ideally support specifying the snapshot ID dynamically at instance startup time, perhaps via the user-data or retrieved from S3 or SimpleDB.

Why Mount an EBS Volume Created from a Snapshot at Startup?

As outlined above, the procedure is simple and it offers the following benefits:
  • Instances need not be launched in the same availability zone as the EBS volume. However, instances are limited to using EBS snapshots that are in the same region (US or EU).
  • Instances no longer need to rely on a specific EBS volume being available.
  • Multiple instances can be launched easily, each of which will automatically configure itself with its own EBS volume made from the snapshot.
  • Costs can be reduced by allowing "duplicate" EBS volumes to be provisioned only when they are needed by an instance. "Duplicate" EBS volumes are created on demand, and can also (optionally) be deleted during instance termination. Previously, you needed to keep around as many EBS volumes as the maximum number of simultaneous instances you would use.
  • Large-scale deployments requiring content on an EBS volume are easy to build.
Here are some cool things that are made possible by this technique:
  • MySQL replication slave (or cluster member) launching can be made more efficient. By specifying a recent snapshot of the master database's EBS volume, the new MySQL slave instance will begin its life already containing most of the data. This slave will demand fewer resources from the master instance and will take less time to catch-up to the master. If you do plan to use this technique for launching MySQL slaves, see Eric Hammond's article on EBS snapshots of a MySQL slave database in EC2 for some sage words of advice.
  • Auto Scaling can launch instances that mount content stored in EBS at startup. If the auto-scaled instances all need to access common content that is stored in EBS, this technique allows you to duplicate that content onto each auto-scaled instance automatically. And, if the instance gets the snapshot ID from its user-data at startup, you can easily change the snapshot ID for auto-scaled instances by updating the launch configuration.
I am currently exploring how to combine this technique with the one discussed in my article about how to boot the entire instance from an EBS volume. Combining these approaches could provide the ability to "boot from a snapshot", allowing you relate to bootable snapshots the same way you think about AMIs. Stay tuned to this blog for an article on this approach.

Caveats

Sounds great, huh? Despite these benefits, this technique can introduce a new problem: too many EBS volumes. As you may know, AWS limits the number of EBS volumes you can create to 20 (initially, and you can request a higher limit). This technique creates a new EBS volume each time an instance starts up, so your account will accumulate many EBS volumes. Plus, each volume will be almost indistinguishable from the others, making them difficult to track.

One potential way to distinguish the EBS volumes would be the ability to tag them via the API: Each instance would tag the volume upon creation, and these tags would be visible in the management interface to provide information about the origin of the volume. Unfortunately the EC2 API does not offer a way to tag EBS volumes. Until that feature is supported, use the ElasticFox Firefox extension to tag EBS volumes manually. I find it helpful to tag volumes with the creating instance's ID and the instance's "tag" security groups (see my article on using security groups to tag instances). ElasticFox displays the snapshot ID from which the volume was created and its creation timestamp, which are also useful to know.

As already hinted at, you will still need to think about what to do when the newly-created EBS volumes are no longer in use by the instance that created them. If you know you won't need them, have a script to detach and delete the volume during instance shutdown (but not shutdown-before-reboot). Be aware that if an instance fails to terminate nicely the attached EBS volume may still exist and you will be charged for it.

In any case, make sure you keep track of your EBS volumes because the cost of keeping them around can add up quickly.

How to Mount an EBS Volume Created from a Snapshot on Startup

Now for the detailed instructions. Please note that the instructions below have been tested on Ubuntu 8.04, and should work on Debian or Ubuntu systems. If you are running a Red Hat-based system such as CentOS then some of the procedure may need to be adjusted accordingly.

There are four parts of getting set up:
  • Setting up the Original Instance with an EBS volume
  • Creating the EBS Snapshot
  • Preparing the AMI
  • Launching a New Instance
In the last step the new instance will create a new volume from the specified snapshot and mount it during startup.

Setting Up the Original Instance with an EBS volume

[Note: this section is based on the fine article about runing MySQL with EBS by Eric Hammond.]

Start out with an EC2 instance booted from an AMI that you like. I recommend one of the Alestic Ubuntu Hardy 8.04 Server AMIs. The instance will be assigned an instance ID (in this example i-11111111) and a public IP address (in this example 1.1.1.1).

ec2-run-instances -z us-east-1a --key MyKeypair ami-0772946e

ec2-describe-instances i-11111111

Once the ec2-describe-instances output shows that the instance is running, continue by creating an EBS volume. This command creates a 1 GB volume in the us-east-1a availability zone, which is the same zone in which the instance was launched. The volume will be assigned a volume ID (in this example vol-00000000).

ec2-create-volume -z us-east-1a -s 1

ec2-describe-volumes vol-0000000

Once the ec2-describe-volumes output shows that the volume is available, attach it to the instance:

ec2-attach-volume -d /dev/sdh -i i-11111111 vol-0000000

Next we can log into the instance and set it up. The following will install MySQL and the XFS filesystem drivers, and then mount and format the EBS volume. When prompted, specify a MySQL root password. If you are running a Canonical Ubuntu AMI you need to change the ssh username from root to ubuntu in these commands.

ssh -i id_rsa-MyKeypair root@1.1.1.1

sudo apt-get update && sudo apt-get upgrade -y
sudo apt-get install -y xfsprogs mysql-server
sudo modprobe xfs
sudo mkfs.xfs /dev/sdh
sudo mount -t xfs -o noatime /dev/sdh /vol
sudo mkdir /vol
sudo mount /vol

The EBS volume is now attached and formatted and MySQL is installed, so now we configure MySQL to use the EBS volume for its data, configuration, and logs:

sudo /etc/init.d/mysql stop
# Due to a minor MySQL bug this may be necessary - does not hurt
sudo killall mysqld_safe
export EBS_MOUNT_DIR=/vol
export EBS_EXPORTS="/etc/mysql /var/lib/mysql /var/log/mysql"
for i in $EBS_EXPORTS
do
EBS_MOUNTED_EXPORT_DIR="$EBS_MOUNT_DIR""$i"
sudo mkdir -p `dirname "$EBS_MOUNTED_EXPORT_DIR"`
sudo mv $i `dirname "$EBS_MOUNTED_EXPORT_DIR"`
sudo mkdir $i
sudo mount --bind
"$EBS_MOUNTED_EXPORT_DIR" "$i"
done
sudo /etc/init.d/mysql start
# Needed later to hold our credentials for bundling an AMI
sudo -H mkdir ~/.ec2

Before we go on, we'll make sure the EBS volume is being used by MySQL. The data directory on the EBS volume is /vol/var/lib/mysql so we should expect new databases to be created there.

mysql -u root -p -e create database db_on_ebs"
ls -l /vol/var/lib/mysql/

The listing should show that the new directory db_on_ebs was created. This proves that MySQL is using the EBS volume for its data store.

Creating the EBS Snapshot

All the above steps prepare the original instance and the EBS volume for being snapshotted. The following procedure can be used to snapshot the volume.

On the instance perform the following to stop MySQL and unmount the EBS volume:

sudo /etc/init.d/mysql stop
sudo umount /etc/mysql
sudo umount /var/log/mysql
sudo umount /var/lib/mysql
sudo umount /vol

Then, from your local machine, create a snapshot as follows. Remember the snapshot ID for later (snap-00000000 in this example).

ec2-create-snapshot vol-00000000

The snapshot is in progress and you can check its status with the ec2-describe-snapshots command.

Preparing the AMI

At this point in the procedure we have the following set up already:
  • an instance that uses an EBS volume for MySQL files.
  • an EBS volume attached to that instance, having the MySQL files on it.
  • an EBS snapshot of that volume.
Now we are ready to prepare the instance for becoming an AMI. This AMI, when launched, will be able to create a new EBS volume from the snapshot and mount it at startup time.

First, from your local machine, copy your credentials to the EC2 instance:

scp -i id_rsa-MyKeypair pk-whatever1234567890.pem \
cert-whatever1234567890.pem root@1.1.1.1:~/.ec2/

Back on the EC2 instance install Java (skipping the annoying interactive license agreement) and the EC2 API tools:

export DEBIAN_FRONTEND=noninteractive
echo sun-java6-jdk shared/accepted-sun-dlj-v1-1 select true \
| sudo /usr/bin/debconf-set-selections
echo sun-java6-jre shared/accepted-sun-dlj-v1-1 select true \
| sudo /usr/bin/debconf-set-selections
sudo -E apt-get install -y unzip sun-java6-jdk
sudo -H wget -O ~/ec2-api-tools.zip \
http://s3.amazonaws.com/ec2-downloads/ec2-api-tools.zip \
&& cd ~ && unzip ec2-api-tools.zip && \
ln -s ec2-api-tools-1.3-36506 ec2-api-tools

Note: Future versions of the EC2 API tools will have a different version number, and the above command will need to change accordingly.

Next, set up the script that does the create-volume-and-mount-it magic at startup. Download it from here with the following command:

sudo curl -Lo /etc/init.d/create-ebs-vol-from-snapshot-and-mount \
https://sites.google.com/site/shlomosfiles/clouddevelopertips/\
create-ebs-vol-from-snapshot-and-mount?attredirects=0

The script has a number of items to customize:
  • The EC2 account credentials: Put a pointer to your private key and certificate file into the script in the appropriate place. If you followed the above instructions these will be in /root/.ec2. Make sure the credentials are located on the instance's root partition in order to ensure the keys are bundled into the AMI.
  • The snapshot ID. This too can either be hard-coded into the script or, even better, provided as part of the user-data. It is controlled by the EBS_VOL_FROM_SNAPSHOT_ID setting. See below for an example of how to specify and override this value via the user-data.
  • The JAVA_HOME directory. This is the location of the Java installation. On most linux distributions this should point to /usr/lib/jvm/java-6-sun .
  • The EC2_HOME directory. This is the location where the EC2 API tools are installed. If you followed the procedure above this will be /root/ec2-api-tools .
  • The device attach point for the EBS volume. This is controlled by the EBS_ATTACH_DEVICE setting, and is /dev/sdh in these instructions.
  • The filesystem mount directory for the EBS volume. This is controlled by the EBS_MOUNT_DIR setting, and is /vol in these instructions.
  • The directories to be exported from the EBS volume. These are the directories that will be "mapped" to the root filesystem via mount --bind. These are specified in the EBS_EXPORTS setting.
  • If you are creating an AMI for the EU region, uncomment the line export EC2_URL=https://eu-west-1.ec2.amazonaws.com by removing the leading #.
Once you customize the script, set it up to run upon startup as follows:

sudo chmod +x /etc/init.d/create-ebs-vol-from-snapshot-and-mount
sudo update-rc.d create-ebs-vol-from-snapshot-and-mount start S 89 .

As mentioned above, if you do not want the newly-created EBS volume to persist after the instance terminates you can configure the script to be run on shutdown, allowing it to delete the volume. One way of doing this is to create the AMI with a shutdown hook already in place. To do this:

sudo ln -s /etc/init.d/create-ebs-vol-from-snapshot-and-mount /etc/rc0.d/K32create-ebs-vol-from-snapshot-and-mount

Alternatively, you can defer this decision to instance launch time, by passing in the above command via a user-data script - see below for more on this.

Remember: Running this script as part of the shutdown process as described above will delete the EBS volume. If you do not want this to happen automatically, don't execute the above command. If you mistakenly ran the above command you can fix things as follows:

sudo rm /etc/rc0.d/K32create-ebs-vol-from-snapshot-and-mount

Next is a little cleanup before bundling:

# New instances need their own host keys generated at first boot
chmod +x ec2-ssh-host-key-gen
# New instances should not contain leftovers from this instance
sudo rm -f /root/.*hist*
sudo rm -f /var/log/*.gz
sudo find /var/log -name mysql -prune -o -type f -print | \
while read i; do sudo cp /dev/null $i; done

The instance is now ready to be bundled into an AMI, uploaded, and registered. The commands below show this process. For more about the considerations when bundling an AMI see the this article by Eric Hammond.

bucket=com.mybucket.images
prefix=ubuntu-hardy-32bit-attach-snapshot-at-startup-for-mysql-20090805
export AWS_USER_ID=MY-USER-ID
export AWS_ACCESS_KEY_ID=MY-ACCESS-KEY
export AWS_SECRET_ACCESS_KEY=MY-SECRET-ACCESS-KEY
arch=i386


sudo -E ec2-bundle-vol \
-r $arch \
-d /mnt \
-p $prefix \
-u $AWS_USER_ID \
-k ~/.ec2/pk-*.pem \
-c ~/.ec2/cert-*.pem \
-s 10240 \
-e /mnt,/tmp,/root/.ssh
ec2-upload-bundle \
-b $bucket \
-m /mnt/$prefix.manifest.xml \
-a $AWS_ACCESS_KEY_ID \
-s $AWS_SECRET_ACCESS_KEY

Once the bundle has uploaded successfully, register it from your local machine as follows:

bucket=com.mybucket.images
prefix=ubuntu-hardy-32bit-attach-snapshot-at-startup-for-mysql-20090805
ec2-register $bucket/$prefix.manifest.xml

The ec2-register command displays an AMI ID (ami-012345678 in this example).

We are finally ready to test it out!

Launching a New Instance

Now we are ready to launch a new instance that creates and mounts an EBS volume from the snapshot. The snapshot ID is configurable via the user-data payload specified at instance launch time. Here is an example user-data payload showing how to specify the snapshot ID:
EBS_VOL_FROM_SNAPSHOT_ID=snap-00000000
Note that the format of the user-data payload is compatible with the Running User-Data Scripts technique - just make sure the first line of the user-data payload begins with a hashbang #! and that the EBS_VOL_FROM_SNAPSHOT_ID setting is located somewhere in the payload, at the beginning of a line.

Launch an instance of the AMI with the user-data specifying the snapshot ID, in a different availability zone. The instance will be assigned an instance ID (in this example i-22222222) and a public IP address (in this example 2.2.2.2).

ec2-run-instances -z us-east-1c --key MyKeypair \
-d "EBS_VOL_FROM_SNAPSHOT_ID=snap-00000000" ami-012345678

ec2-describe-instances i-22222222

Once the ec2-describe-instances output shows that the instance is running, check for the new EBS volume that should have been created from the snapshot (in this example, vol-22222222) in the new availability zone.

ec2-describe-volumes

Finally, ssh into the instance and verify that it is now working from the new EBS volume:

ssh -i id_rsa-MyKeypair root@2.2.2.2

mysql -u root -p -e "show databases"

You should see the db_on_ebs database in the results. This demonstrates that the startup sequence successfully created a new EBS volume, attached and mounted it, and set itself up to use the MySQL data on the EBS volume.

Cleaning Up

Don't forget to clean up the pieces of this procedure when you no longer need them:

# the original instance
ec2-terminate-instances i-11111111
# the original EBS volume
ec2-delete-volumes vol-00000000
# the instance that created a new volume from the snapshot
ec2-terminate instances i-22222222

If you set up the shutdown hook to delete the EBS volume then you can verify that this works by checking that the ec2-describe-volumes output no longer contains the new EBS volume. Otherwise, delete it manually:

# the new volume created from the snapshot
ec2-delete-volumes vol-22222222

And don't forget to un-register the AMI and delete the files from S3 when you are done. These steps are not shown.

Making Changes to the Configuration

Now that you have a configuration using EBS snapshots which is easily scalable to any availability zone, how do you make changes to it?

Let's say you want to add a web server to the AMI and your web server's static content to the EBS volume. (I generally don't recommend storing your web-layer data in the same place as your database storage, but this example serves as a useful illustration.) You would need to do the following:
  1. Launch an instance of the AMI specifying the snapshot ID in the user-data.
  2. Install the web server on the instance.
  3. Put your web server's static content onto the instance (perhaps from S3) and test that the web server works.
  4. Stop the web server.
  5. Move the web server's static content to the EBS volume.
  6. "mount --bind" the EBS locations to the original directories without adding entries to /etc/fstab.
  7. Restart the web server and test that the web server still works.
  8. Edit the startup script, adding entries for the web server's directories to EBS_EXPORTS.
  9. Stop the web server and unmount (umount) all the mount bind directories and the EBS volume.
  10. Remove the mount bind and /vol entries for the EBS exported directories from /etc/fstab.
  11. Perform the cleanup prior to bundling.
  12. Bundle and upload the new AMI.
  13. Create a new snapshot of the EBS volume.
  14. Change your deployment configurations to start using the new AMI and the new snapshot ID.
If you decide that you would like the automatically-created EBS volumes to be deleted when the instances terminate, you have two ways to do this:
  • Execute this command
    sudo ln -s /etc/init.d/create-ebs-vol-from-snapshot-and-mount \
    /etc/rc0.d/K32create-ebs-vol-from-snapshot-and-mount
    and rebundle the AMI.
  • Pass the above command to the instance via a user-data script. The user-data could also specify the snapshot ID, and might look like this:
    #! /bin/bash
    EBS_VOL_FROM_SNAPSHOT_ID=snap-00000000
    ln -s /etc/init.d/create-ebs-vol-from-snapshot-and-mount \
    /etc/rc0.d/K32create-ebs-vol-from-snapshot-and-mount
The technique of mounting an EBS volume created from a snapshot at startup was born of necessity: I needed a way to allow many instances across availability zones to share the same content which lives on an EBS drive. This article shows how you can apply the technique to your deployments. If you also find this technique useful, please share it in the comments!

Thanks

Eric Hammond reviewed early drafts of this article and provided valuable feedback. Thanks!