Sunday, July 12, 2009

The "Elastic" in "Elastic Load Balancing": ELB Elasticity and How to Test it

Elastic Load Balancing is a long-anticipated AWS feature that aims to ease the deployment of highly-scalable web applications. Let's take a look at how it achieves elasticity, based on experience and on the information available in the AWS forums (mainly this thread). The goal is to understand how to design and test ELB deployments properly.

ELB: Two Levels of Elasticity

ELB is a distributed system. An ELB virtual appliance does not have a single public IP address. Instead, when you create an ELB appliance, you are given a DNS name. Amazon encourages you to set up a DNS CNAME entry in your own domain that points to the ELB-supplied DNS name.

Why does Amazon use a DNS name? Why not provide an IP address? Why encourage using a CNAME? Why can't we use an Elastic IP for an ELB? In order to understand these issues, let's take a look at what happens when clients interact with the ELB.

Here is the step-by-step flow of what happens when a client requests a URL served by your application:
  1. The client looks in DNS for the resolution of your web server's name. Because you have set up your DNS to have a CNAME alias pointing to the ELB name, DNS responds with the ELB DNS name.
  2. The client looks in DNS for the resolution of the ELB DNS name. This DNS entry is controlled by Amazon since it is under an Amazon-owned domain. Amazon's DNS server returns an IP address.
  3. The client opens a connection with the machine at the provided IP address. The machine at this address is really an ELB virtual appliance.
  4. The ELB virtual appliance at that address passes through the communications from the client to one of the EC2 instances in the load balancing pool. At this point the client is communicating with one of your EC2 application instances.
As you can see, there are two levels of elasticity in the above protocol. The first scalable point is in Step 2, when Amazon's DNS resolves the ELB name to an actual IP address. In this step, Amazon can vary the actual IP addresses served to clients in order to distribute traffic among multiple ELB machines. The second point of scalability is in Step 4, where the ELB machine actually passes the client communications through to one of the EC2 instances in the ELB pool. By varying the size of this pool you can control the scalability of the application.
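The lookup chain in Steps 1 and 2 can be sketched with a small in-memory model. This is only an illustration: the hostnames and addresses below are hypothetical placeholders from reserved example ranges, and the random choice stands in for whatever policy Amazon's DNS actually uses to pick an ELB address.

```python
import random

# Hypothetical records: placeholder names and addresses, not real ELB values.
CNAME_RECORDS = {"www.example.com": "my-elb-1234.elb.example.net"}
A_RECORDS = {"my-elb-1234.elb.example.net": ["192.0.2.10", "192.0.2.11"]}


def resolve(name, rng=random):
    """Follow CNAME aliases (Step 1), then pick one ELB address (Step 2)."""
    chain = [name]
    while name in CNAME_RECORDS:
        name = CNAME_RECORDS[name]
        chain.append(name)
    # Assumption: a uniform random pick; the real selection policy is not documented.
    return chain, rng.choice(A_RECORDS[name])


chain, elb_ip = resolve("www.example.com")
print(" -> ".join(chain), "->", elb_ip)
```

Because the client only ever holds the alias, the set of addresses behind the ELB name can change without any change on the client's side.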

Both levels of scalability, Step 2 and Step 4, are necessary in order to load-balance very high traffic loads. The Step 4 scalability allows your application to exceed the maximum connections per second capacity of a single EC2 instance: connections are distributed among a pool of application instances, each instance handling only some of the connections. Step 2 scalability allows the application to exceed the maximum inbound network traffic capacity of a single network connection: an ELB machine's network connection can only handle a certain rate of inbound network traffic. Step 2 allows the network traffic from all clients to be distributed among a pool of ELB machines, each appliance handling only some fraction of the network traffic.

If you only had Step 4 scalability (which is what you have if you run your own software load balancer on an EC2 instance) then the maximum capacity of your application is still limited by the inbound network traffic capacity of the front-end load balancer: no matter how many back-end application serving instances you add to the pool, the front-end load balancer will still present a network bottleneck. This bottleneck is eliminated by the addition of Step 2: the ability to use more than one load balancer's inbound network connection.
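The bottleneck argument is simple arithmetic: the system's ceiling is the smaller of the front-end total and the back-end total. A minimal sketch, with invented figures in arbitrary units (the function and its parameter names are illustrative, not anything ELB exposes):

```python
def max_throughput(lb_count, lb_bandwidth, instance_count, instance_capacity):
    """Return the end-to-end ceiling: the smaller of front-end and back-end totals."""
    front_end = lb_count * lb_bandwidth
    back_end = instance_count * instance_capacity
    return min(front_end, back_end)


# Invented figures, for illustration only (think of them as Mbit/s).
single_lb = max_throughput(1, 1000, 10, 250)       # front end caps the system
single_lb_more = max_throughput(1, 1000, 40, 250)  # more instances do not help
four_lbs = max_throughput(4, 1000, 10, 250)        # now the back end is the cap
print(single_lb, single_lb_more, four_lbs)
```

With one load balancer, quadrupling the back-end pool leaves the ceiling unchanged; only adding front-end capacity (Step 2) raises it.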

[By the way, Step 2 can be replicated to a limited degree by using Round-Robin DNS to serve a pool of IP addresses, each of which is a load balancer. With such a setup you could have multiple load-balanced clusters of EC2 instances, each cluster sitting behind its own software load balancer on an EC2 instance. But Round Robin DNS has its own limitations (such as the inability to take into account the load on each load-balanced unit, and the difficulty of dynamically adjusting the pool of IP addresses to distribute), from which ELB does not suffer.]

Behind the scenes of Step 2, Amazon maps an ELB DNS name to a pool of IP addresses. Initially, this pool is small (see below for more details on the size of the ELB IP address pool). As the traffic to the application and to the ELB IP addresses in the pool increases, Amazon adds more IP addresses to the pool. Remember, these are IP addresses of ELB machines, not your application instances. This is why Amazon wants you to use a CNAME alias for the front-end of the ELB appliance: Amazon can vary the ELB IP address served in response to the DNS lookup of the ELB DNS name.

It is technically possible to implement an equivalent Step 2 scalability feature without relying on DNS CNAMEs to provide "delayed binding" to an ELB IP address. However, doing so requires implementing many features that DNS already provides, such as cached lookups and backup lookup servers. I expect that Amazon will implement something along these lines when it removes the limitation that ELBs must use CNAMEs - to allow, for example, an Elastic IP to be associated with an ELB. Now that would be cool.

How ELB Distributes Traffic

As explained above, ELB uses two pools: the pool of IP addresses of ELB virtual appliances (to which the ELB DNS name resolves), and the pool of your application instances (to which the ELBs pass through the client connection). How is traffic distributed among these pools?

ELB IP Address Distribution

The pool of ELB IP addresses initially contains one IP address. More precisely, this pool initially consists of one ELB machine per availability zone that your ELB is configured to serve. This can be inferred from the following pages in the ELB documentation:
  • This page, which states that the ELB will "route traffic equally amongst all the enabled Availability Zones".
  • This page, which says that ELB "evenly distributes requests across all its registered Availability Zones that contain instances. As a result, you must ensure that your LoadBalancer is appropriately scaled for each registered Availability Zone."
I posit that the above behavior is implemented by having each ELB machine serve the EC2 instances within a single availability zone. Then, multiple availability zones are supported by the ELB having in its pool at least one ELB machine per enabled availability zone.
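The documentation's advice to scale each availability zone appropriately follows from the even split across zones. A small worked example, assuming traffic is divided equally among zones that contain instances and then equally among the instances in each zone (the zone names and the within-zone equal split are assumptions for illustration):

```python
def per_instance_share(instances_per_zone):
    """Split traffic equally across zones with instances, then within each zone."""
    active = {zone: n for zone, n in instances_per_zone.items() if n > 0}
    zone_share = 1.0 / len(active)
    return {zone: zone_share / n for zone, n in active.items()}


# One instance in zone-a, three in zone-b (hypothetical layout).
shares = per_instance_share({"zone-a": 1, "zone-b": 3})
for zone, share in sorted(shares.items()):
    print(zone, round(share, 3))
```

Under these assumptions the lone instance in zone-a carries half of all traffic, three times the load of each instance in zone-b.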

Back to the scalability of the ELB machine pool. According to AWS folks in the forums, this pool is grown in response to increased traffic reaching the ELB IP addresses already in the pool. No precise numbers are provided, but a stream of gradually-increasing traffic over the course of a few hours should cause ELB to grow the pool of IP addresses behind the ELB DNS name. ELB grows the pool proactively in order to stay ahead of increasing traffic loads.

How does ELB decide which IP address to serve to a given client? ELB varies the chosen address "from time to time". No more specifics are given. However, see below for more information on making sure you are actually using the full pool of available ELB IP addresses when you test ELB.

Back-End Instance Connection Distribution

Each ELB machine can pass through client connections to any of the EC2 instances in the ELB pool within a single availability zone. According to user reports in other forum posts, clients from a single IP address will tend to be connected to the same back-end instance.

AWS folks say that ELB does round-robin among the least-busy back-end instances, keeping track of approximately how many connections (or requests) are active at each instance but without monitoring CPU or anything else on the instances. I'm not sure how to reconcile this with the user reports mentioned above.
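To make "round-robin among the least-busy back-end instances" concrete, here is one possible reading of that description. It is a sketch under assumptions, not Amazon's implementation: the class, its method names, and the tie-breaking rule are all hypothetical.

```python
class LeastBusyBalancer:
    """Pick the instance with the fewest active connections; break ties round-robin."""

    def __init__(self, instances):
        self._order = list(instances)
        self.active = {name: 0 for name in self._order}
        self._next = 0  # rotating start position used for tie-breaking

    def acquire(self):
        n = len(self._order)
        # Scan from the rotating position so equally busy instances take turns.
        candidates = [self._order[(self._next + i) % n] for i in range(n)]
        chosen = min(candidates, key=lambda name: self.active[name])
        self.active[chosen] += 1
        self._next = (self._order.index(chosen) + 1) % n
        return chosen

    def release(self, name):
        self.active[name] -= 1


lb = LeastBusyBalancer(["i-a", "i-b", "i-c"])  # placeholder instance names
picks = [lb.acquire() for _ in range(3)]
lb.release("i-b")
after_release = lb.acquire()
print(picks, after_release)
```

Note that nothing here keys on the client's IP address, so a scheme like this would not by itself produce the per-IP affinity that users report.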

How much variety is necessary in order to cause the connections to be fairly distributed among your back-end instances? AWS says that "a dozen clients per configured availability zone should more than do the trick". Additionally, in order for the full range of ELB machine IP addresses to be utilized, "make sure each [client] refreshes their DNS resolution results every few minutes."
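A test client can honor the "refresh DNS every few minutes" advice with a small wrapper around its lookups. The sketch below is an assumption-laden illustration: the class name is hypothetical, 120 seconds is just one value in the "few minutes" range, and the operating system or an upstream resolver may still cache results on its own.

```python
import socket
import time


class RefreshingResolver:
    """Cache a hostname's addresses, but look them up again after max_age seconds."""

    def __init__(self, hostname, max_age=120.0, lookup=None, clock=time.monotonic):
        self.hostname = hostname
        self.max_age = max_age
        # Default lookup uses the system resolver; a fake can be injected for testing.
        self._lookup = lookup or (lambda host: socket.gethostbyname_ex(host)[2])
        self._clock = clock
        self._addresses = []
        self._fetched_at = None

    def addresses(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self.max_age:
            self._addresses = list(self._lookup(self.hostname))
            self._fetched_at = now
        return self._addresses
```

A client would call `addresses()` before opening each new connection and choose among the returned addresses, so that newly added ELB machines start receiving its traffic within a couple of minutes.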

How to Test ELB

Let's synthesize all the above into guidelines for testing ELB:
  1. Test clients should use the ELB DNS name, ideally via a CNAME alias in your domain. Make sure to perform a DNS lookup for the ELB DNS name every few minutes. If you are using Sun's Java VM you will need to change the JVM's DNS cache TTL setting from its default value of -1, which causes DNS lookups to be cached until the JVM exits. 120 (seconds) is a good value for this setting for ELB test clients. If you're using IE 7 clients, which cache DNS lookups for 30 minutes by default, see the Microsoft-provided workaround to set that cache timeout to a much lower value.
    Update 14 August 2008: If your test clients cache DNS lookups (or use a DNS provider that does this) beyond the defined TTL, traffic may be misdirected during ramp-up or ramp-down. See my article for a detailed explanation.
  2. One test client equals one public IP address. ELB machines seem to route all traffic from a single IP address to the same back-end instance, so if you run more than one test client process behind a single public IP address, ELB regards these as a single client.
  3. Use 12 test clients for every availability zone you have enabled on the ELB. These test clients do not need to be in different availability zones - they do not even need to be in EC2 (although it is quite attractive to use EC2 instances for test clients). If you have configured your ELB to balance among two availability zones then you should use 24 test clients.
  4. Each test client should gradually ramp up its load over the course of a few hours. Each client can begin at (for example) one connection per second, and increase its rate of connections-per-second every X minutes until the load reaches your desired target after a few hours.
  5. The mix of requests and connections that each test client performs should represent a real-world traffic profile for your application. Paul@AWS recommends the following:
    In general, I'd suggest that you collect historical data or estimate a traffic profile and how it progresses over a typical (or perhaps extreme, but still real-world) day for your scenario. Then, add enough headroom to the numbers to make you feel comfortable and confident that, if ELB can handle the test gracefully, it can handle your actual workload.

    The elements of your traffic profile that you may want to consider in your test strategy may include, for example:
    • connections / second
    • concurrent connections
    • requests / second
    • concurrent requests
    • mix of request sizes
    • mix of response sizes
Use the above guidelines as a starting point for setting up a testing environment that exercises your application behind an ELB. This testing environment should validate that the ELB, and your application instances, can handle the desired load.
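Guidelines 3 and 4 reduce to a little planning arithmetic. The helper names, the linear ramp shape, and the example rates below are assumptions chosen for illustration; only the figure of 12 clients per availability zone comes from the guidance above.

```python
CLIENTS_PER_ZONE = 12  # "a dozen clients per configured availability zone"


def client_count(enabled_zones):
    """Number of test clients (distinct public IP addresses) to provision."""
    return CLIENTS_PER_ZONE * enabled_zones


def ramp_schedule(start_rate, target_rate, duration_minutes, step_minutes):
    """Return (minute, connections_per_second) pairs rising linearly to target_rate."""
    steps = duration_minutes // step_minutes
    increment = (target_rate - start_rate) / steps
    return [(i * step_minutes, start_rate + i * increment) for i in range(steps + 1)]


clients = client_count(2)
schedule = ramp_schedule(start_rate=1, target_rate=25, duration_minutes=180, step_minutes=30)
print(clients)
for minute, rate in schedule:
    print(minute, rate)
```

Each client would follow the same schedule, so the aggregate load at any point is the per-client rate multiplied by the number of clients.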

A thorough understanding of how ELB works can go a long way to helping you make the best use of ELB in your architecture and toward properly testing your ELB deployment. I hope this article helps you design and test your ELB deployments.


  1. Nice, saves me a lot of digging around!

  2. Very informational post about Elastic Load Balancing.

    Specific Setup Question:
    When I'm trying to use this load balancer with a medium-sized Sharepoint farm (2 web servers, 1 sql server), the end-user is not able to be authenticated by the domain controller. If only one of the web servers is connected to the load balancer, then the user is authenticated and it doesn't matter which server is the one connected. But as soon as the second server is added, authentication fails.

    Does this have anything to do with using a CNAME vs a regular A entry?

  3. @Bruhaha

    I don't believe your issue has anything to do with DNS CNAME or A records. If it did, the problem would happen even when only one back-end server is in service.

    I'm not too familiar with Sharepoint, so I'm sorry I can't help on that.

  4. @Shlomo
    OK, that sounds right.

    Thanks for the help.

  5. Awesome description of working of ELB.

  6. I was bit this morning by a "feature" of elastic load balancing. If you run in multiple availability zones, be sure to have at least 1 working instance in each, otherwise you'll get some failing requests because the round robin DNS check doesn't eliminate zones that have no instances running.

  7. @Egg

    Yes - as I mentioned in my comment on your blog, that is actually a documented characteristic of ELB.

  8. A lookup table for client IP addresses in the ELB might explain why a client connection could appear to have an affinity/stickiness to a backend server. The ELB might only incur the expense of determining to which server to route a request, if it had not recently seen the connecting IP address. This would allow higher throughput for the ELB.

    It's a reasonable shortcut if all backend servers process requests equally quickly and connections/session, when averaged, last roughly the same amount of time on each server. You might get interesting results by mixing small and extra large instances behind the same ELB.

  9. Hi Shlomo,

    Thank you for this wonderful article!!! I am a newbie to ELB. I would like to clarify certain things with you.

    1. According to amazon each ELB will have a DNS name instead of an elastic ip which we should configure by adding a CNAME entry. could you please explain how to make it technically possible of attaching an elastic ip to ELB?

    2. Regarding the step 2 scalability, once the inbound network traffic capacity has been reached, will a new load balancer be automatically created? If so, will those instances that were registered with the first load balancer be automatically registered with the new one?

    3. could you let me know any kind of testing tool that will check HOW MANY REQUESTS can an instance handle at a time? ie at what point will the load balancer share the load from one instance to the next one?

    Thanks in advance!!!!!



  10. @vinod

    There is no way to associate an Elastic IP with an ELB today.

    Regarding question #2 about inbound network capacity and the back-end instances: An ELB has a pool of back-end instances. The ELB automatically scales itself to handle the incoming traffic and distribute it to the back-end instances. Once you have created an ELB and assigned it a pool of back-end instances, you do not have to create any more ELBs for those back-end instances - the ELB will automatically scale by itself to handle the incoming traffic. This does not involve a new ELB, just some new behind-the-scenes stuff inside Amazon. The behind-the-scenes stuff is what I described in this article.

    Regarding question #3: There is no general answer to "how many requests can an instance handle at a time". This depends on the type of instance (because each instance type has different CPU and network characteristics) and on the applications that you are running on the instance. You need to test this yourself to find the answer for your application and instance type. There are many many tools available to help you measure this. Google is your friend.

    Regarding "share the load from one instance to the next": ELB spreads out the incoming requests approximately evenly among the back-end instances, not in a "cascading" style that you describe. The only thing you need to consider is how many back-end instances to give to the load balancer in order to handle the incoming load. One approach is to simply do the math (max desired traffic / max traffic per instance = # back-end instances to put into the ELB). The more cost-effective approach is to use Auto Scaling to automatically launch (and terminate) back-end instances for the ELB according to demand. For more details about Auto Scaling please consult the docs here:

  11. Does anyone have the problem that a domain name root cannot be a CNAME (e.g., to the load balancer)?

    so, how do you get to proxy the AWS loadbalancer which requires a CNAME? Point to an IP address and redirect that traffic to WWW ?

    I'm no network engineer, but 1 static IP representing a single instance taking millions of hits in a few seconds, even if only redirects, sounds like a bottleneck!

    Is there something I'm missing?

  12. @visualplant,

    The only solutions available today for using ELB with the root domain are workarounds.

    You can spread the traffic hitting the root domain (for redirection to the www name) by using round-robin DNS pointing to multiple machines (each of which performs the redirect).

    There is another intriguing workaround suggested by M. David Peterson here:
    I haven't tried it, though.

  13. hello Shlomo,
    A very informative and well written post, I am very thankful for that.

    I just need to know your views on the following

    We are trying to achieve something of this order... 1 Million Requests per sec (1 KB size per request) == 1 gByte/sec.
    * Can Amazon allow such a limit, or more specifically, is it possible to achieve that with step 2 (of course with a proper backend on step 4)?
    * Can this be achieved with a single availability zone?
    * Also we want to check on the concurrent connection limits. What is the limit of concurrency that Amazon supports?
    * Can step 2 help us scale that as well.

    Hoping to hear your comments on this.. will surely be of much help to me.

    with regards


  14. @Ramchandra,

    1 GB/sec is not a lot and it is definitely possible to do with ELB. I've seen ELB handle more. I don't know what the limit is - if there is one, because there might not be a limit on the ELB side. The limit might be the number of back-end instances you can launch.

    There's no problem with putting all the back-end instances in the same availability zone.

    Concurrent connections are also not really limited by the ELB, but by the back-end instances.

    If your traffic patterns match those that ELB was designed for (i.e. gradually ramping-up) then it should be able to scale to handle that traffic.

  15. Excellent blog, very informative.

    I however, being an EC2 newbie, would like to ask a quick question regarding bandwidth.

    Your explanation on how requests are handed off to instances (ie the client is eventually connected directly with an instance behind the ELB) suggests that max bandwidth should be increased when increasing the number of instances behind the ELB. Does this sound correct?

    What concerns me is that I've read several forum posts etc that seem to suggest this is not the case. And that bandwidth is limited to the bandwidth of the ELB itself and the number of instances behind is irrelevant. This doesn't sound right to me, hopefully they're wrong.

    Bandwidth is going to be very important for me, as my app will have a very heavy streaming element. So I am trying to determine whether or not I can increase bandwidth by simply increasing the number of instances I have behind my ELB.

  16. @Matthew,

    I did not mean to imply that connections are "handed off" to back-end instances - they're not, the traffic is "passed through" to the back-end instances via the ELB. Please let me know where I might clarify the article.

    The theoretical bandwidth of ELB is unlimited - it will keep on scaling as long as the traffic keeps ramping up. So the overall bandwidth that your system will be able to handle is a direct result of the number of instances you put behind the ELB. As I described above, you'd need to test ELB carefully to make sure you're actually reproducing the conditions under which ELB was designed to scale. It's possible (and I've suggested as much in the forum) that the forum posts describing bandwidth limitations did not test the ELB properly and therefore hit a "faux" limit.

    Unlike ELB, software load balancers running on an EC2 instance (and hardware load balancers in a data center) cannot scale beyond the bandwidth of the network connection feeding into them.

  17. "4.The ELB virtual appliance at address passes through the communications from the client to one of the EC2 instances in the load balancing pool. At this point the client is connected with one of your EC2 application instances"

    This is the part which I originally took to mean connections are handed off. But on reading it again after your reply, it makes a little more sense.

    The main thing is that it sounds like as long as I configure things correctly, the ELB will suit my requirements. Thanks again for the article!

  18. @Matthew,

    Thanks for pointing that out - I've edited it to say "at this point the client is communicating with one of your EC2 application instances". I hope that makes it clearer.

  19. Hello Shlomo,
    This might be an unrelated question to this post, but since you are an EC2 expert, I hope you can shed light on this.

    I don't know whether it is only my observation or whether people in general have seen this.

    I was configuring some clusters on EC2 and, most importantly, was putting up a typical mysql master-slave replication based cluster. As I was doing it, I saw that the instances created were given private IPs from different subnets. When I did some basic network tests including a ping round-trip test, I saw the network was fluctuating a lot and at times was miserable. This also reflected on the performance of the cluster, because the instances were made to communicate over private IPs.

    So wanted to know:
    1. I guess there is no way you can ask for private IPs to be from the same subnet (except on VPC).
    2. Is what I am seeing here a normal phenomenon on Amazon?
    3. If it is normal, how does one mitigate these issues. Cause with these basic network issues, performance at the backend is not at all good.

    Hoping that you will shed some light on this.

  20. @Ramchandra,

    You ask interesting questions.

    1. There is no way to request a specific subnet for your instance's private IP.

    2. It's normal for traffic between instances to experience fluctuation, especially when measuring ping time: the EC2 network de-prioritizes ICMP communications (as per the comment from Cindy@AWS in this thread). Note that the SLA does not guarantee a minimum bandwidth or network latency.

    3. First we need to determine if network latency is improved or not between instances on the same subnet. I did a little experiment and launched 20 instances in a single request. The private IPs I was assigned were scattered across three different subnets.

    We can run tests to determine the average network latency & speed across instances on the same and on different subnets. If these tests indicate that network is better within the same subnet then you can launch four or five times as many instances as you need, and terminate those that are not in the same subnet.

    Please share the results of any tests you do to determine the common subnet's effect on network latency.