Whoa! My back-end instances work but not the ELB!
Hey! I can't get the ELB to work!
These are among the most common Elastic Load Balancer problems raised on the Amazon EC2 Discussion Forums. Inspired by Eric Hammond's indispensable article Solving "I can't connect to my server on Amazon EC2", here is a guide to debugging these common ELB issues, as well as a utility to perform sanity tests on your own ELBs.
Questions to Answer
You're trying to figure out what's wrong and you need to know where to start looking. Or, you're posting your problem on the AWS forums and you want help as quickly as possible. The best way to help yourself or to get help quickly is to examine the basic facts of your situation. Here are some questions to answer for yourself and in your forum post:
- What is the output of elb-describe-lbs elbName --show-xml? This gives the basic details of the ELB, which are critical to diagnosing any problem. If you are posting to the forums and want to keep the DNS name of the ELB private, obscure it in the output. One reason to obscure the DNS name is to prevent readers from accessing your ELB-based service. However, this precaution does not add any real security, because DNS information is public and, presumably, you are using a DNS CNAME entry to integrate the ELB into your domain's DNS.
- What is the output of elb-describe-instance-health elbName? This provides crucial information about the health of the instances.
- What resource are you trying to access via the ELB, what tool are you using to access it, and from what location? The resource will likely be a URL of the form https://ELB-DNS-Name/index.html, or it might be "I'm running a POP server on port 1234". The tool is most likely a browser or HTTP client (Firefox or wget), or possibly "Microsoft Outlook version 5.4". The location is either "my local machine" or "an EC2 instance". Also, can you access the same resource when you connect directly to a back-end instance via its public IP address or host name from a client outside EC2? A public-facing URL pointing directly to a back-end instance looks like this: http://ec2-123-213-123-31.compute-1.amazonaws.com/index.html. And can you access the same resource when you connect directly to a back-end instance via its private IP address or host name from another instance within EC2? Such a URL has the same form but uses the instance's private host name, for example domU-12-31-34-00-69-B9.compute-1.internal.
- Can you access the health check resource directly via the ELB DNS name, via the back-end instance's public IP address, and via the back-end instance's private IP address? If your health check is configured with target=HTTP:8080/check.html then try to access http://ELB-DNS-Name:8080/check.html (which is via the ELB, assuming a listener on ELB port 8080 forwards to instance port 8080), http://ec2-123-213-123-31.compute-1.amazonaws.com:8080/check.html (which is via the instance's public IP address), and http://domU-12-31-34-00-69-B9.compute-1.internal:8080/check.html (which is via the instance's private IP address, and only accessible from within EC2).
- What are the security groups and availability zones for each instance in the ELB? This is visible in the output of ec2-describe-instances i-11111111 i-22222222 ... As above, you might want to obscure the public and private DNS names of these instances in the output.
- Can all the back-end instances receive traffic on the instance ports of the ELB listeners and the health check? This can be checked from the output of ec2-describe-group groupName1 groupName2 ... for all the groups shown in question 5's output.
- Do logs on your back-end instances show any connections from ELB?
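For the health check question above, the URLs to try can be derived mechanically from the health check target. The following is a minimal shell sketch, not part of the ELB API Tools: health_check_url is a hypothetical helper name, and it assumes an HTTP or HTTPS target of the form PROTOCOL:PORT/PATH. When used with the ELB DNS name it further assumes the ELB has a listener on that same port.

```shell
#!/bin/sh
# Hypothetical helper (not part of the ELB API Tools): build the URL to try
# for a health check target such as HTTP:8080/check.html on a given host.
# Assumes an HTTP or HTTPS target; TCP targets have no path to fetch.
health_check_url() {
    hc_target=$1
    hc_host=$2
    hc_proto=${hc_target%%:*}      # text before the first colon, e.g. HTTP
    hc_rest=${hc_target#*:}        # text after it, e.g. 8080/check.html
    case $hc_proto in
        HTTP|HTTPS) ;;
        *) echo "unsupported health check target: $hc_target" >&2; return 1 ;;
    esac
    case $hc_rest in
        */*) ;;
        *) echo "health check target has no path: $hc_target" >&2; return 1 ;;
    esac
    hc_port=${hc_rest%%/*}         # e.g. 8080
    hc_path=/${hc_rest#*/}         # e.g. /check.html
    hc_scheme=$(printf '%s' "$hc_proto" | tr '[:upper:]' '[:lower:]')
    printf '%s://%s:%s%s\n' "$hc_scheme" "$hc_host" "$hc_port" "$hc_path"
}
```

Calling the helper once per host (the ELB DNS name, the instance's public host name, and its private host name) gives the three URLs to test. Passing each result to curl -sS -o /dev/null -w '%{http_code}\n' prints only the HTTP status code, which makes the three results easy to compare in a forum post.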
Common ELB Problems
Okay, now that you know what information is important to diagnosing the problem, here is a look at some of the common gotchas, how to detect them, and how to fix them. These descriptions refer to the above questions by number.
Common problems and solutions include:
- Security groups on back-end instances don't allow access to the instance ports and health check port. Back-end instances must have all ports on which they receive traffic from the ELB (#1) open to CIDR 0.0.0.0/0 in one of their associated security groups (#6). Fix this by changing the permissions on the security groups associated with the instances. Note: this fix takes effect within a few seconds and does not require launching new instances or rebooting existing instances.
- Back-end instances are not healthy (that is, not InService). When an instance fails the health check (#1) it is marked as OutOfService (#2) and the ELB no longer routes traffic to it. To fix this you need to determine why the ELB cannot access the health check resource. Note: there is currently a bug in ELB where instances are initially marked as InService when added to the ELB, until they fail the health check. So make sure you've given the ELB enough time to detect a failed health check.
- An availability zone is enabled on the ELB but has no healthy back-end instances. If you have an availability zone enabled for your ELB (#1) but no healthy instances in that availability zone (#5 and #2), you'll get 503 Service Unavailable or other errors. Fix this by adding an instance in that availability zone to the ELB or disabling that availability zone for the ELB.
- You cannot see a requested resource (#3) or the health check URL (#4) using the ELB DNS name. In this case, check that the URL exists on the back-end instances and look at the back-end instance's logs (#7) to see if the ELB forwarded your connection or not. If you can see the requested resource using the public address of a back-end instance then check the instance's security groups (#6) to see that they grant access to the instance's port.
- The health check port is not the same as listener target port (#1). While this does not necessarily indicate a problem, for most ELBs the health check should use the same port as one of the listeners. Setting up your ELB to have a health check performed on a different port than the load-balanced traffic is perfectly valid, but you likely want the health check to use the same path that the load-balanced traffic takes to reach your app (and also to exercise a representative set of features used by your app).
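The availability zone problem above is easy to check by hand once the zone and health state of each instance are side by side. Here is a minimal shell sketch of that check. The helper name zones_without_healthy and the three-column input format (instance ID, availability zone, state) are assumptions made for this illustration; they are not the output format of any ELB or EC2 tool, so you would assemble the input yourself from the answers to questions 2 and 5.

```shell
#!/bin/sh
# Hypothetical helper: list enabled availability zones that have no
# InService instance.
#   $1    space-separated list of zones enabled on the ELB
#   stdin one line per instance: INSTANCE_ID ZONE STATE
#         (an assumed format, assembled by hand from the tool output)
zones_without_healthy() {
    awk -v zones="$1" '
        $3 == "InService" { healthy[$2] = 1 }   # remember zones with a healthy instance
        END {
            n = split(zones, z, " ")
            for (i = 1; i <= n; i++)
                if (!(z[i] in healthy))
                    print z[i]                  # enabled zone with nothing healthy in it
        }'
}
```

Any zone the function prints needs either a healthy instance added to it or to be disabled on the ELB.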
An ELB Sanity Test Utility
If you have your thinking cap on you'll notice that detecting three of the common ELB problems above (security group access, the health check port, and availability zones without healthy instances) can be automated. Here is an ELB sanity test utility for Linux which automates these tests. Save it or download it as follows:
curl -o elb-sanity-test.tar.gz -L 'https://sites.google.com/site/shlomosfiles/clouddevelopertips/elb-sanity-test.tar.gz?attredirects=0'
Next, unpack it:
tar xzf elb-sanity-test.tar.gz
Next, set up the utility with your credentials. Edit the elb-sanity-test script file, setting AWS_CREDENTIAL_FILE to point to a file containing your AWS credentials in the following format:
AWSAccessKeyId=<your AWS access key ID>
AWSSecretKey=<your AWS secret key>
The above is the same format that can be used to specify your AWS credentials for the ELB API Tools (see the credential-file-path.template file in the ELB API Tools bundle).
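Because this file holds your secret key, it is worth creating it so that only your own user can read it. The following is a minimal sketch under stated assumptions: write_credential_file is a hypothetical helper name, and the two key names are the ones used in the ELB API Tools credential file template.

```shell
#!/bin/sh
# Hypothetical helper: write an AWS credential file readable only by its owner.
#   $1 path of the file to write
#   $2 AWS access key ID
#   $3 AWS secret key
write_credential_file() {
    cf_path=$1
    (
        umask 077    # a newly created file gets owner-only permissions
        printf 'AWSAccessKeyId=%s\nAWSSecretKey=%s\n' "$2" "$3" > "$cf_path"
    ) || return 1
    chmod 600 "$cf_path"    # also tighten the mode if the file already existed
}
```

Point AWS_CREDENTIAL_FILE in the elb-sanity-test script at the path you passed as the first argument. Note that arguments given on a command line can be visible to other users of the machine in the process list, so on a shared host you may prefer to edit the file by hand.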
To run the ELB sanity test:
./elb-sanity-test
Here is sample output showing an ELB that passes the sanity test:
$ ./elb-sanity-test
JUnit version 4.5
Test: all instances have their Security Groups defined to allow access to the ELB listener port
Load Balancer: someLB
ELB someLB has a listener that uses instance-port 8080 and instance i-360ef05e has that TCP port open to the world.
ELB someLB has a listener that uses instance-port 8081 and instance i-360ef05e has that TCP port open to the world.
Test: all ELBs have a HealthCheck on a port that the listener directs traffic to
Load Balancer: someLB
ELB someLB has a configured HealthCheck on listener port 8080
Test: all ELBs have InService instances in each configured availability zone
Load Balancer: someLB
ELB someLB has InService instances in each configured availability zone
Tests run: 3, Failures: 0
The elb-sanity-test utility performs the following sanity tests on every ELB defined in your account:
- All instances have their security groups defined to allow access to the ELB listener port.
- All ELBs have a health check on a port that the listener directs traffic to.
- All ELBs have healthy instances in each configured availability zone.
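To make the second test concrete, here is a minimal shell sketch of the same idea: it compares the port in a health check target against the instance ports of the listeners. It is an illustration only and is not taken from the elb-sanity-test source, which is written in Java. The helper name health_check_on_listener_port is hypothetical, and the listener instance ports are assumed to be passed in as a space-separated list read off the elb-describe-lbs output.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the elb-sanity-test source.
# Succeeds if the port in a health check target (e.g. HTTP:8080/check.html
# or TCP:8080) is one of the listener instance ports.
#   $1 health check target
#   $2 space-separated listener instance ports, e.g. "8080 8081"
health_check_on_listener_port() {
    hp_rest=${1#*:}           # drop the protocol, leaving 8080/check.html
    hp_port=${hp_rest%%/*}    # drop any path, leaving 8080
    for hp_listener_port in $2; do
        [ "$hp_port" = "$hp_listener_port" ] && return 0
    done
    return 1
}
```

With the listeners from the sample output, health_check_on_listener_port HTTP:8080/check.html "8080 8081" succeeds, while a target on port 9090 would fail the check.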
Some notes about the elb-sanity-test bundle:
- The utility is written in Java, which is also required for the ELB tools. If you can run the ELB API Tools, you already have all the prerequisites to run this sanity test.
- The bundle includes source code and is licensed under the Apache License, Version 2.0.
- The bundle includes all dependency jars necessary to run the script. It uses the JUnit framework and the Typica library.
Getting Further Help
If you still have an ELB issue after trying the above advice and the elb-sanity-test utility, please post in the AWS EC2 forum. Questions about the elb-sanity-test utility specifically or about this article are welcome in the comments below.
Update 15 September 2009: Ylastic integrated my elb-sanity-test script into their EC2 management dashboard.
Update 11 October 2009: elb-sanity-test has been released as part of the open-source ec2-elb-tests project hosted on Google Code. And, if you use this utility, please subscribe to the ec2-elb-tests Google Group.