Microsoft: How to fix Load Balancer not working in Round-robin fashion for your Cloud Service
I recently came across a case where customer has an issue that the Load balancer of his cloud service is not working in round robin basis. He confirmed that by seeing more number of requests going to few instances whereas the other instances are getting very less number of requests. To analyse the issue, we have collected the IIS logs from all the instances( Customer has 7 instances) and when we parsed the logs, we could be able to confirm the same that 2 instances are getting very less number of requests than the other 5 instances.
How Load Balancer works?
Load balancer uses 5 tuple algorithm to distribute client requests by default. This algorithm is a 5 tuple (source IP, source port, destination IP, destination port, protocol type) hash to map traffic to available servers. It provides stickiness only within a transport session. Packets in the same TCP or UDP session will be directed to the same datacenter IP (DIP) instance behind the load balanced endpoint. When the client closes and re-opens the connection or starts a new session from the same source IP, the source port changes and causes the traffic to go to a different DIP endpoint.
If most of the load goes to a single instance, the number one reason is due to the testing client creating and reusing the same TCP connections. The Azure loadbalancer does round robin load balancing for new incoming TCP connections, not for new incoming HTTP requests. So when a client makes the first request to the cloudapp.net URL, the LB sees an incoming TCP connection and routes it to the next instance in the LB rotation, and then the TCP connection is established between the client and the server. Depending on the client app, all future HTTP traffic from that client will may go over the same TCP connection or a new TCP connection.
via the fine folks at Microsoft