What is Load Balancing?
In today’s digital era, effective management of high-volume traffic to websites and web servers has become crucial. Every web server has processing and memory limits, and if a website or web application receives too many requests, it becomes overloaded.
The standard remedy is to add more web servers and ensure that the traffic is distributed evenly among them. Such uniform distribution of web traffic across multiple servers eliminates performance bottlenecks. Load balancers, or load balancing servers, perform this traffic distribution.
A modern high-traffic website processes millions of concurrent requests from users or clients and must return the correct images, video, text or application data rapidly and reliably. A load balancing architecture primarily consists of a group of backend servers, referred to as the server pool, across which a load balancer distributes the incoming network traffic.
A load balancer sits in front of the servers. It routes incoming requests among the servers capable of processing them, maximizing speed and capacity utilization, and ensures that no single server is overworked, which would degrade performance. If any server goes down, the load balancer automatically redirects traffic to the remaining online servers; if a new server is added to the group, the load balancer automatically starts sending requests to it.
In this manner, load balancing servers perform the following functions:
- Distribute incoming requests or network load efficiently across multiple servers, improving performance.
- Ensure that websites remain available to accept requests, and that requests are forwarded only to servers that are online.
- Provide the flexibility to add or remove servers on demand.
Load Balancing Techniques
There are numerous load balancing techniques and algorithms for distributing client traffic effectively among multiple servers. The technique employed depends on the type of application being served and the status of the network at the time of the request.
- The Least Connection Method: A new request is sent to the server with the fewest active client connections. Because connection counts change as clients come and go, the load balancer tracks the current connections on each server to identify the least-loaded one.
- The Round Robin Method: The load balancer forwards requests to the servers in the pool in sequential order: it sends a request to one server, then moves on to the next, repeating the cycle indefinitely. This method works best when all the servers have similar processors and memory.
- The Least Response Time Method: Both the number of active connections and the average response time are taken into account: the server with the fewest connections and the lowest average response time gets priority. This method is useful when the processing power and memory of the servers are uneven, since the most responsive servers receive new requests first while slower servers are drawn on only under high request volume.
- IP Hash Method: The load balancer selects the server based on a hash of the visitor’s IP address. It combines the source and destination IP addresses of the client and server to generate a unique hash key, and uses this key to assign the request to a particular server. Even if the session is broken, the client’s requests keep going to the same server it was using previously. This is useful when clients need to reconnect to the same session after a disconnection, for example to retain the items in a shopping cart between sessions.
- The Custom Load Method: In this method, the load balancer selects a server that is not processing any active transactions. If all the servers in the load balancing setup are processing active transactions, the server with the least load is selected.
- Weighted Round-Robin Method: An enhanced version of the simple round-robin method in which a static numerical weight is assigned to every server in the pool. Servers with higher weights receive proportionally more requests.
- Weighted Least Connection: A modified version of the least connection method. As in the weighted round-robin method, each server is assigned a numerical weight, which the load balancer factors into its assignments: if two servers have a similar number of active connections, the server with the higher weight receives the new request.
- Agent-Based Adaptive Load Balancing: Every server in the server pool has an agent assigned which reports the server’s respective load to the load balancer. This real-time input is used to decide the best possible server to transfer a request. This is used in conjunction with other balancing methods like the weighted round-robin and weighted least connection methods.
- Chained Failover: In this method, the servers are configured in a chain in a predetermined order. All requests are initially transferred to the first server in the chain. When the first server can no longer accept more requests, then the second server is chosen. When the second server becomes full, the third one is selected, and so on.
- Weighted Response Time: The response times gathered by health checks are used to determine which server is responding fastest at a given moment, and subsequent client requests are sent to that server. This keeps new requests away from heavily loaded or slow-responding servers, so the load evens out across the pool over time.
- Layer 7 Content Switching: Also known as URL Rewriting, this method uses application-layer data (such as the URL or HTTP headers) to drive lower-level network switching decisions. The load balancer inspects this application-layer information and routes each request in real time to the server best suited to process it.
- Global Server Load Balancing (GSLB): Rather than balancing traffic within a single site, GSLB distributes requests across geographically dispersed data centres, using algorithms such as round-robin, weighted round-robin, fixed weighting, real server load and location-based proximity. The presence of multiple data centres ensures high availability: if the primary site goes down, traffic is routed to the disaster recovery site, and clients can connect to the fastest-performing, geographically closest data centre. Application health checks ensure that offline services or unavailable data centres are never visible to clients.
- AD Group-Based Traffic Steering: Client traffic can be steered to individual Real Servers in a Virtual Service (VS) based on the user’s Active Directory (AD) group membership. For example, a virtual service might have four Real Servers, two associated with AD Group 1 and two with AD Group 2. When a user accesses the VS, their group membership is checked and the request is forwarded to the matching servers. If the real servers chosen on the basis of group membership are unavailable, the default scheduling method for the VS is used instead.
- Software Defined Networking (SDN) Adaptive: Knowledge from the upper networking layers (the data in layers 4 and 7) is combined with information about the state of the network at the lower layers (layers 2 and 3), and requests are routed based on the combined picture. Server status, the health of the applications running on the servers, the condition of the network and its congestion level all feed into the load balancing decision.
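Several of the strategies above can be sketched in a few lines of Python. The classes below are an illustrative toy, not a production implementation; the server addresses and weights are invented for the example.

```python
import itertools

# Hypothetical pool of backend addresses, used for illustration only.
SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

class RoundRobin:
    """Hand out servers in a fixed rotation."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class WeightedRoundRobin:
    """Round robin over a pool expanded by static server weights.

    A simple expansion: a server with weight 3 appears three times in
    the rotation, so it receives three times as many requests.
    """
    def __init__(self, weights):  # e.g. {"10.0.0.1": 3, "10.0.0.2": 1}
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Send each request to the server with the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1   # a connection was opened
        return server

    def release(self, server):
        self.active[server] -= 1   # a connection was closed
```

Real load balancers refine these ideas considerably (smooth weighted round-robin interleaves weighted servers rather than grouping them, for instance), but the selection logic follows the same shape.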
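Session persistence via IP hashing can likewise be sketched briefly. Real implementations (for example NGINX’s ip_hash) use their own hash details; the function below is only a minimal illustration with invented addresses.

```python
import hashlib

# Hypothetical pool of backend addresses, used for illustration only.
SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def ip_hash(client_ip, servers):
    # Hash the client address and map it onto the pool, so the same
    # client always lands on the same backend (session stickiness).
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

Because the mapping depends only on the client address and the pool, a reconnecting client is routed back to the server holding its session state, as long as the pool itself is unchanged.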
Load Balancer NGINX
NGINX is a popular web server that is often deployed to improve a server’s resource availability and efficiency. When used as a load balancer, NGINX acts as a single entry point to a distributed web application running on several separate servers.
To configure NGINX as a load balancer, you define a set of backend servers in an upstream block within the http context, and then proxy incoming requests to that group.
NGINX supports several strategies for selecting a server. By default it uses the round-robin algorithm to decide which server receives the next request; other strategies such as least_conn and ip_hash can be enabled simply by naming them inside the upstream block.
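A minimal configuration along these lines might look as follows; the upstream name and server addresses are placeholders, not values from any real deployment:

```nginx
http {
    # Hypothetical backend pool; addresses are placeholders.
    upstream backend {
        least_conn;              # replace the default round-robin strategy
        server 10.0.0.1:8080;
        server 10.0.0.2:8080;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://backend;   # forward requests to the pool
        }
    }
}
```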
NGINX also performs passive health checks as it forwards requests to the servers. If a server fails to deliver a response, NGINX automatically stops sending requests to it for a certain time. The number of consecutive failed attempts allowed within that period is set with the max_fails parameter; the default is 1. Setting it to 0 disables the health check for that server, and for values greater than 1 the failures must occur within a specific time frame to be counted.
That time frame is controlled by the fail_timeout parameter, which also determines how long the server is considered failed; the default is 10 seconds. Once a server is marked as failed and the fail_timeout has elapsed, NGINX automatically starts sending it requests again. If the server returns a positive response, it is marked live and once again takes part in load balancing as a normal server.
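Both parameters are set per server inside the upstream block. A minimal sketch, again with placeholder addresses:

```nginx
upstream backend {
    # Mark this server failed after 3 consecutive errors within 30 s,
    # and keep it out of rotation for the same 30 s window.
    server 10.0.0.1:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:8080;   # defaults: max_fails=1, fail_timeout=10s
}
```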
These health checks let the backend adapt easily to current demand by powering hosts up or down as required: servers added during heavy traffic improve application performance, and the new resources are automatically incorporated into the load balancer’s pool.
Load Balancing Router
A load balancing router balances and shares load across a network with several internet connections or network link resources. It aggregates the bandwidth of multiple connections into a unified internet connection, minimizing latency when transferring and sharing network bandwidth.
The load balancing router optimizes overall performance and network bandwidth through methods such as bandwidth aggregation, which combines the capacity of DSL, cable, T1 or any other internet connections.
Traffic can either be distributed dynamically across each connection or each connection can be configured manually in the router interface. The load balancing router also provides redundancy by diverting traffic to another network connection when one internet connection fails, and certain load balancing routers can learn, track, use and switch between the best available network paths.
AWS Load Balancer
Amazon Web Services offers Elastic Load Balancing (ELB), which distributes incoming traffic requests across multiple targets such as Amazon EC2 instances, containers, IP addresses and Lambda functions. ELB can handle varying traffic loads in a single Availability Zone or across multiple Availability Zones. Three types of load balancers are available, all offering the high availability, automatic scaling and security features that help make applications fault tolerant.
- Application Load Balancer: Best suited to load balancing HTTP and HTTPS traffic, the Application Load Balancer operates at the individual request level (Layer 7) and routes traffic to targets within Amazon Virtual Private Cloud (Amazon VPC) based on the content of the request.
- Network Load Balancer: Suited to load balancing Transmission Control Protocol (TCP), User Datagram Protocol (UDP) and Transport Layer Security (TLS) traffic where maximum performance is required. It operates at the connection level (Layer 4) and routes traffic to targets within Amazon Virtual Private Cloud (Amazon VPC). It can handle millions of requests per second while maintaining ultra-low latencies, and copes well with sudden and volatile traffic patterns.
- Classic Load Balancer: Provides basic load balancing across multiple Amazon EC2 instances and operates at both the request level and the connection level. It is intended mainly for applications that were built within the EC2-Classic network.