Sunday, February 9, 2014

Demystify HLB and DNS Load Balancing - Lync 2013 Topology with High Availability (POOLs, DNS LB vs HLB)



© 2014 - Thomas Pött, MVP Lync
(modified 2015.08.18)


High availability is the most misunderstood topic in Lync, regardless of server count, server placement, or the choice between DNS and hardware load balancing.

Note:
This document is neither a sizing nor a configuration guide. Use it only for planning your environment.

In this first chapter I need to dive into the different understandings of high availability.


Client-to-Server and Server-to-Server traffic high availability:


First I need to explain some generic settings about load balancing.

One word before you read this article:
My personal recommendation is the 1-armed DNS+HLB solution. It simplifies troubleshooting considerably and does not require as much network knowledge from the Lync admins. On the Edge servers I recommend 1-armed and purely HLB.

Client-to-Server

Client-to-server traffic is the ONLY place where the two options, DNS+HLB and HLB, come into play.

 

Server-to-Server

Lync server-to-server traffic is very different. Because we define the servers in the topology first, every Lync server knows the entire topology, which is stored in the CMS (XDS) database. This source is queried to load balance server-to-server traffic fully automatically. We have to take this into account later for the 2-armed LB solution.
 

Lync Server Roles Eligible for High Availability:

Lync Server 2013 can be made highly available by using redundant servers, which means we have to deploy multiple servers in an array. In the Lync topology this array is called a POOL. A pool can contain from 1 up to 12 servers, regardless of the server role: Directors, Frontends, Mediation, Group Chat or Edge servers. Besides the Lync servers, on which the Lync application is installed, there are several other components, which I describe later in this section.

Note:
You should also understand the difference between High Availability and Resiliency. Please do not confuse the two.
 

POOL Servers

All four server roles described next are eligible for POOL-based high availability.

Director Server

Incoming general SIP requests from external and internal clients are redirected to the responsible Frontend system, most likely a pool. Authentication is handled by this server too.

The Director server hosts both the Lync application and IIS for Web Services.
 

Frontend Server

The core component of Lync. It hosts the users and every function Lync offers them. The Conferencing part is always collocated on it. In addition, Archiving and Monitoring are hosted here as well.

Optionally, if it makes sense, the Mediation feature can be added to this system too.
 

Edge Server:

The Lync component for external SIP-related traffic. This is one of the few servers that are not responsible for any of the Web Services Lync offers.

So what is this server about? The Lync Edge Server is a standalone, non-domain-joined system, designed to act as a true application proxy and application reverse proxy.
 

Let me repeat a basic requirement:
You MUST have 2x NICs on 2x dedicated, separated, isolated networks, and the internal DNS suffix of your Active Directory domain must be present, otherwise the Edge server cannot work!
 

Mediation Server:

As described, the Mediation server can also be separated from the Frontend server. It is responsible for translating SIP to ISDN (or to a SIP trunk), and it also transcodes between the Lync codecs and G.711.
 

Other Servers:

All of the servers named above are pool servers, which can be configured under a single addressable namespace. All other systems Lync needs are different and have different requirements for operating in high availability mode.
 

SQL

The SQL server, also called the “Lync Backend”, can be operated in two high availability modes. One is the well-known cluster, where two SQL servers run in active-passive mode. The other option, and the better choice, is SQL Server mirroring. Mirroring uses two SQL servers: the primary server hosts the active database and ships all of its transactions to the second, independent SQL server. Therefore a logically independent database copy exists.

Why is this even better than a cluster? First, we have a second database copy that is independent of the first. Second, if you operate the mirror server only for Lync, you do not need to pay for its SQL Server license.
 

File Share

Lync also needs file storage for several features. Here we have the most options. We can simply cluster the file storage, or we can use the Distributed File System (DFS) and its replication technology.

Third-party solutions such as NetApp or EMC can be used too, but you need to ensure that the Lync requirements are fulfilled.
 

WAC

Office Web Apps Server (WAC, formerly Office Web Companion) is the server that renders all PowerPoint presentations into HTML5 in Lync 2013. For high availability we make use of FARMS: a farm consists of multiple WAC servers that are installed and joined to the farm and receive an identical configuration. Since it is a web-based service, we can load balance these servers. The two methods are hardware load balancing and Windows load balancing. (Note: this is the only system that is eligible for Windows Load Balancing.)
 

Gateways

Last, we have the audio transcoding and protocol conversion to ISDN or other networks. For some, these are the most complex systems when it comes to high availability.

Since a gateway cannot be deployed in a Lync pool, we need to create another kind of system collection.

In Lync 2013 several components are involved in the IP flow. Starting with the Mediation server: it has two sides, and one of them is associated with the ISDN gateway. This does not mean we need two dedicated NICs, although in some circumstances that is possible or even necessary. Along this path, we associate both sides with a Lync 2013 TRUNK. These are the three components involved.

Now we have three options for high availability: a backup (alternative) gateway; the option native to Lync, which is deploying multiple paths; and a load-balanced solution between the Mediation servers and the gateways with a bidirectional communication flow.

 

Lync High Availability:


POOL:

All Lync pool servers are capable of both load balancing solutions, as long as a supported client is used. The supported clients are Lync 2010, Lync 2013 and the App Store client, as well as the mobile clients.

Before I dig into both scenarios, let's have a quick discussion about the two relevant components in Lync. Besides the Lync application, we have web-related services. Those services are, for example, Address Book distribution, Distribution Group resolution, the Meet and Dial-in pages, Lync mobile client Autodiscover and others. Remember the setup of our external-facing system, in our case the reverse proxy. It needs a special mapping of HTTP ports 80 and 443 in the redirection process: the port mapping is 80 -> 8080 and 443 -> 4443. Why is this so?

The Lync Web Services run inside IIS. For security reasons, Lync segregates all web traffic into an internal and an external web site. Since we have only a single IP address on the Lync servers, the only way to do this is to use different ports. Therefore the external connections arrive at the Lync external site on ports 8080 and 4443.
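The port mapping above can be written down as a small lookup. This is only a sketch of the idea in Python: the function names and the host name extFEPOOL01.example.com are made up for illustration, and a real reverse proxy is configured through its own rules, not through code like this.

```python
# Mapping from the article: public port on the reverse proxy ->
# port of the external IIS web site on the Lync server.
EXTERNAL_PORT_MAP = {80: 8080, 443: 4443}


def backend_port(public_port: int) -> int:
    """Return the Lync external web site port for a published public port."""
    if public_port not in EXTERNAL_PORT_MAP:
        raise ValueError(f"port {public_port} is not published by the reverse proxy")
    return EXTERNAL_PORT_MAP[public_port]


def forward_target(server: str, public_port: int) -> str:
    """Build the URL prefix the reverse proxy would forward a request to."""
    scheme = "https" if public_port == 443 else "http"
    return f"{scheme}://{server}:{backend_port(public_port)}"


# Hypothetical external web services name, used only as an example.
print(forward_target("extFEPOOL01.example.com", 443))
```

Internal clients never go through this mapping; they reach the internal web site directly on 80 and 443.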

After this excursion let's come back to the main topic. In other words, we are talking about SIP, server functions and web services in Lync. All of them have to be highly available. SIP can be considered a so-called “persistent” protocol: even if we use a simple load balancing mechanism like DNS round robin, the TCP connection of a “supported” Lync client always stays with the Lync server it was initiated with. This is a very simple and rudimentary explanation, but it should be sufficient for this article. I mainly want to highlight the word “persistent”, which means a continuously established IP session.

Web services in general are different. A web service is not persistent, because the IP session is re-established after each HTTP request has finished. In Lync we need to submit web requests on a session basis, but how can this be done if we use, for example, a round robin process? Each time a client starts a new request within the same session, it could hit another server in the pool that is not aware of that session, and the information stream within this session would be lost. Therefore web services must stick to the same server. To enable the client to hit the same server again, a supporting technology must be in place. We have several options here: cookie-based sessions or source IP address affinity.

Now that we understand these requirements, the recommendation for hardware load balancing is clearer. In the same way we can understand why Windows Load Balancing cannot be used with a Lync pool. Windows load balancing is based on information shared within the WLB cluster and communicated via multicast or unicast. This is never sufficient for the requirements of Lync.
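To make the difference between round robin and source IP affinity concrete, here is a minimal Python sketch. The pool member names and the client address are invented, and the hash-based mapping is just one simple way to get affinity; it is not how any particular load balancer implements it.

```python
import hashlib
from itertools import cycle

POOL = ["fe01", "fe02", "fe03"]  # hypothetical pool member names


def round_robin(pool):
    """Hand out pool members in turn, ignoring who is asking."""
    return cycle(pool)


def source_ip_affinity(pool, client_ip: str) -> str:
    """Map a client IP to one fixed pool member via a stable hash."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    return pool[int.from_bytes(digest[:4], "big") % len(pool)]


# Three web requests of the same session under round robin:
rr = round_robin(POOL)
rr_targets = [next(rr) for _ in range(3)]

# The same three requests with source IP affinity:
sticky_targets = [source_ip_affinity(POOL, "10.0.0.25") for _ in range(3)]

print(rr_targets)      # every request lands on a different server
print(sticky_targets)  # every request lands on the same server
```

With round robin the session state created on the first server is unknown to the second one; with affinity the client keeps returning to the server that already knows the session.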
 

NOTE:
Windows Load Balancing can be used neither for the web services nor for the Lync application services, also because these services exist on the same server and cannot be separated.
 

Count of servers in a pool: 1, 2 or 3+ servers

Very often I am asked by other consultants why Microsoft recommends 3+ servers in a pool.

Let's look at what happens to the pool with different server counts.

This discussion only relates to Lync Frontend server pools. Other pools follow the simple principle of at least 2 servers, plus 1 for the high availability of the given user count. Say you have 10,000 users: you need 2+1 Director servers, since 1 server is allowed to fail before the load can no longer be handled. If you add more than +1 you increase the high availability, but that relates more to SLAs, since more servers can fail.
In addition, the pool servers make use of Windows Fabric, and Lync has its Fabric groups combined into groups of THREE (3) servers. Windows Fabric therefore also brings its own requirement of 3 Lync Frontend servers in a single pool.
 

Lync Frontend servers are different because of user load and the associated quorum. The server-to-user ratio, however, is quite similar. If you have 30,000 users in a single pool, you need at least 3 servers, plus 1 or more for high availability.
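The "servers for the load, plus spares" rule can be expressed as a one-line calculation. The sketch below is in Python and assumes 10,000 users per Frontend server, a figure read off the 30,000-user example above and not an official sizing number; the function name is made up for illustration.

```python
import math


def servers_needed(users: int, users_per_server: int, spare: int = 1) -> int:
    """Servers required to carry the load, plus spare servers that may fail."""
    if users <= 0 or users_per_server <= 0 or spare < 0:
        raise ValueError("users and users_per_server must be positive, spare >= 0")
    return math.ceil(users / users_per_server) + spare


# Assumption: 10,000 users per Frontend server (inferred from the example above).
print(servers_needed(30_000, 10_000))          # load of 3 servers plus 1 spare
print(servers_needed(7_000, 10_000, spare=0))  # a single server carries 7k users
```

Raising `spare` beyond 1 does not change the capacity, only the number of servers that may fail, which is the SLA argument made above.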

What if we have only 7,000 users in a single pool?

1.) Single Frontend server in a pool
A single server is fully sufficient for 7k users and will run with this load. The necessary quorum is also simple: it is a single-node system, equivalent to what we know from single-node clusters.

2.) Dual Frontend server
This is a working configuration, even if it is not recommended. That does not mean it is unsupported. BUT:
If we choose this solution, we need to keep in mind that under some circumstances we might run into issues. Those issues are likely to happen if one server fails and the remaining one is restarted. In this scenario the restarted server is in an inconsistent state and will need up to 1 hour before its services have started. The Lync Frontend quorum then needs to be reset.
Command: Reset-CsPoolRegistrarState -PoolFqdn <pool fqdn> -ResetType QuorumLossRecovery
As we can see, if we install a Frontend pool with only 2 nodes, we need to be prepared to handle cases like the one described.

3.) 3 or more Frontend servers
This is the recommended setup for high availability. If you plan this for customers you get the best-in-class solution, with all necessary automated processes and less impact during system failures.
The disadvantage is the cost factor, since the server license count is higher. On the other hand, this solution is Windows Fabric compliant.

I will now look into the high availability technology required to support those setups.


Hardware Load Balancing


We identified three components: the two IIS web sites and the Lync application. A full hardware load balancing solution covers all of these components for Lync. Admittedly, the setup of HLB is a little more complex and requires more ports to be load balanced. The load balancer will also carry more load compared with a setup for web services only. Since the Lync application does not need to offload TLS encryption to the HLB, the impact is fairly small.

I personally prefer this setup because it is clearer, and in case of load balancing issues we know where to look. The support process is more straightforward.

Another point: if we set up a fully load-balanced solution, we are able to implement a 1-armed or 2-armed setup of the load balancers.

Note:
Another advantage of hardware load balancing: if you need to take a pool server offline, this is done purely on the HLB and does not require DNS changes, which also need time to replicate.
 

DNS Load Balancing:

If you decide on DNS-based load balancing, only the Lync application makes use of it. As discussed earlier, even with DNS-based load balancing the web services still require a hardware load balancer.
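The client side of DNS load balancing can be pictured as follows: the pool name resolves to several A records, and a supported client works through that list until one server accepts the connection. The Python sketch below is a simplified illustration under that assumption. The function name, the shuffled order and the addresses are made up, and `try_connect` stands in for the real connection attempt.

```python
import random
from typing import Callable, Iterable, Optional


def connect_with_dns_lb(
    addresses: Iterable[str],
    try_connect: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Try each resolved address in turn; return the first one that accepts."""
    candidates = list(addresses)
    # Assumption: the order is shuffled so clients spread over the pool.
    (rng or random).shuffle(candidates)
    for ip in candidates:
        if try_connect(ip):
            return ip
    return None  # no pool member reachable


# Hypothetical A records of a pool; the first server is down.
a_records = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
down = {"10.0.0.11"}
chosen = connect_with_dns_lb(a_records, lambda ip: ip not in down, random.Random(0))
print(chosen)
```

The failover logic lives in the client, which is why this method only works for the Lync application with supported clients and not for ordinary web traffic.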

For session-based load balancing we must look into the HTTPS data stream, which then requires SSL offloading to be configured.

If we do so, the shared pool server certificate must be copied to and used on the hardware load balancer.

NOTE:
Web Services must be hardware load balanced, so it is necessary to change the INTERNAL and EXTERNAL Web Services to dedicated names. Example: PoolName: FEPOOL01.<ad fqdn>, internal Web Services: intFEPOOL01.<sip domain>, external Web Services: extFEPOOL01.<sip domain>

  • Use TCP idle timeout of 1800 seconds

Web Services External
  • Ports 8080 and 4443 for external web traffic
  • set cookie-based persistence on a per port basis
    • Cookies must not be marked httpOnly.
    • Cookies must not have an expiration time.
    • Cookies must be named MS-WSMAN.
Web Services Internal
  • Ports 80 and 443 for internal web traffic
  • set source_addr persistence
  • Exception for internal Lync Mobile clients » use cookie persistence instead
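The three cookie rules listed for the external web services can be checked mechanically. The following Python sketch shows one way to do that; `CookieProfile` and `check_cookie_profile` are hypothetical names for illustration and do not correspond to the configuration model of any specific load balancer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CookieProfile:
    name: str
    http_only: bool
    expires_seconds: Optional[int]  # None means no expiration time is set


def check_cookie_profile(profile: CookieProfile) -> List[str]:
    """Return the list of rule violations; an empty list means compliant."""
    problems = []
    if profile.name != "MS-WSMAN":
        problems.append("cookie must be named MS-WSMAN")
    if profile.http_only:
        problems.append("cookie must not be marked httpOnly")
    if profile.expires_seconds is not None:
        problems.append("cookie must not have an expiration time")
    return problems


good = CookieProfile(name="MS-WSMAN", http_only=False, expires_seconds=None)
bad = CookieProfile(name="JSESSIONID", http_only=True, expires_seconds=3600)
print(check_cookie_profile(good))
print(check_cookie_profile(bad))
```

A check like this is useful after a load balancer firmware update or template change, when default cookie settings tend to come back unnoticed.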
 

Specialties

Regardless of which load balancing solution we choose, there are more considerations to take into account. I have separated them into three chapters.
 

Generic Information Lync Mobility Services
Cookie-based persistence is required for Lync Mobility Services
  • Not marked httpOnly, named MS-WSMAN and no expiration


Edge server

An Edge pool can also be load balanced either in hardware or via DNS. Three points are important here:

  1. Whichever of the two possibilities you choose, you MUST use the same setup on both sides of the Edge pool. This means the external and the internal side (NICs) must use the identical method.
  2. DNAT load balancing (half-NAT) for A/V Edge (external) transparency
  3. If you decide on DNS-based load balancing you have some restrictions:

These restrictions apply if the “core” server of the pool fails.
  • Exchange Unified Messaging:
If EUM is deployed and an Exchange 2010 version prior to SP1 is used, users will have no access to their voice mail
  • Office 365 and Public Instant Messaging Connectivity (PIC)
  • Federation with partners running Office Communications Server 2007 / 2007 R2
Communication is broken, since these systems do not support a DNS load-balanced Edge pool. DNS load balancing was not available with OCS
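The first rule for the Edge pool, the same method on both sides, is easy to get wrong when different teams own the internal and external networks. A tiny planning check in Python could look like this; the function name and the method labels are assumptions made for this sketch.

```python
VALID_METHODS = {"DNS", "HLB"}


def edge_pool_is_consistent(external: str, internal: str) -> bool:
    """True if both sides of the Edge pool use the same, known LB method."""
    if external not in VALID_METHODS or internal not in VALID_METHODS:
        raise ValueError(f"method must be one of {sorted(VALID_METHODS)}")
    return external == internal


print(edge_pool_is_consistent("HLB", "HLB"))  # supported combination
print(edge_pool_is_consistent("DNS", "HLB"))  # mixed sides, not allowed
```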
 

Mix of DNS+HLB and HLB

Apart from the restriction for Edge servers and their internal and external sides, you are free to use both solutions within your Lync topology.

As a simple example:
If you have a topology with 2x Edge, 2x Director and 3x Frontend servers, you can set up DNS LB for the Edge servers, DNS LB+HLB for the Directors and HLB for the Frontend servers.

These scenarios are supported as well.
 

1-Armed or 2-Armed solution:

A load balancer setup can follow one of two deployment models, either 1-armed or 2-armed. The difference between them is how the IP flow runs.

In a 1-armed solution the IP traffic first hits the LB and needs to be NATed, so that the pool server replies to the load balancer (the source IP is hidden, non-transparent).
IP traffic is NEVER a direct communication between the source and the target.


In a 2-armed setup the IP flow is different: the source always communicates with the LB, and the LB then starts the communication, again based on its algorithm, with the target pool server. Therefore the load balancer must be the gateway assigned to the load-balanced servers.
Here we can keep the source IP address, a transparent configuration.
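What the pool server actually sees in each model can be shown with a toy packet rewrite. The Python sketch below is a simplified illustration only: the addresses are invented, and `Packet` and `forward` are hypothetical helpers, not part of any load balancer API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


def forward(packet: Packet, server_ip: str, lb_ip: str, source_nat: bool) -> Packet:
    """Rewrite a client packet the way the LB hands it to a pool server."""
    # 1-armed: the source is replaced by the LB address (non-transparent).
    # 2-armed: the client source address is kept (transparent).
    src = lb_ip if source_nat else packet.src
    return Packet(src=src, dst=server_ip)


client = Packet(src="192.168.1.50", dst="10.0.0.100")  # client -> VIP
one_armed = forward(client, "10.0.0.11", "10.0.0.100", source_nat=True)
two_armed = forward(client, "10.0.0.11", "10.0.0.100", source_nat=False)

print(one_armed.src)  # the server replies to the LB address
print(two_armed.src)  # the server replies to the client address via its gateway
```

In the 2-armed case the reply only passes the LB again because the LB is the default gateway of the server, which is exactly the requirement stated above.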

A specialized setup is an LB between the PSTN side of a Mediation server and its associated gateways. I will describe this later.
 

Routing vs redirect
What is the difference between a redirected and a routed setup? If we decide on the 1-armed solution, we enable the source for a direct traffic exchange with the target and we do not need an isolated network. The setup is simpler compared with the 2-armed solution, where the IP flow needs to be routed to its target.

This has a direct impact on the high availability setup we can choose. If 2-armed is the choice, we can still use DNS LB: multiple A records in DNS can point to the different pool servers, because we can address them directly through the LB, which acts as a router.

 

 

If 2-armed only, then you can make use of server load analysis
Say we decided on this option. We get a huge advantage from the LB: we are now able to fully control the traffic flowing through it. Since the LB continuously knows the amount of data routed to each target pool server, it is aware of the server's load.

In scenarios where we have to expect higher load on the pool, this might be the best choice, since we are able to redirect traffic much more efficiently.

Say one pool server hosts a huge conference and its load is now quite high. The LB is aware of this traffic and able to redirect other traffic to member servers in this pool that have less load.
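The conference example boils down to "send new traffic to the member with the least load". A minimal Python sketch of that selection follows. The class name, the server names and the load numbers are made up, and a real load balancer measures load in its own way.

```python
from typing import Dict, Iterable


class LoadAwareBalancer:
    """Tracks traffic per pool member and picks the least loaded one."""

    def __init__(self, servers: Iterable[str]) -> None:
        self.load: Dict[str, int] = {name: 0 for name in servers}

    def record(self, server: str, amount: int) -> None:
        """Account for traffic routed to a pool member."""
        self.load[server] += amount

    def pick(self) -> str:
        """Return the member with the lowest recorded load."""
        return min(self.load, key=self.load.get)


lb = LoadAwareBalancer(["fe01", "fe02", "fe03"])
lb.record("fe01", 900)  # fe01 hosts the huge conference
lb.record("fe02", 100)
lb.record("fe03", 300)
print(lb.pick())  # new traffic goes to the least loaded member
```

This only works because all traffic passes through the LB in both directions, which is the property of the 2-armed setup described above.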

Admittedly, this is not described in the Microsoft deployment guides, since it comes from the networking perspective.
 

NOTE:
If you are using a 2-armed solution, you can use DNS load balancing, but you need to ensure that server-to-server communication is possible and that the HLB is fully transparent for it.


 

Mediation server and gateways

The most complex setup is a load balancer between the Mediation server and the gateway. We use only a few ports here, mostly a single port 506x. But we need to plan the positioning of the Mediation server, load balancer and gateway in conjunction with the Lync TRUNK configuration. So it is a more advanced configuration, which provides the highest availability and routing of calls to ISDN.
 
 
  • SNAT Load Balancing (Full-NAT) for gateway/PBX side of Mediation Server Pool
  • Use if Gateway doesn’t support DNS LB to simplify Gateway/PBX configuration 

Resiliency:

I keep this chapter short, since it is meant only as general background. Designing for resiliency requires another, longer article.
 

Backup Pool:

Some consultants try to avoid hardware load balancers at the customer and recommend a deployment of two Standard Edition servers, where the second Standard Edition server serves as a backup registrar. You can certainly consider this, but it is not a good choice, and it requires much more knowledge from the customer than the little knowledge needed for HLB.


SBS/SBA:

It is really the most recommended solution if you have branch offices with no admins and you primarily need resiliency for VOICE.
 

Pool:

Assume you have two datacenters, possibly far away from each other. Here the best option is to set up two dedicated pools in a relationship, each acting as backup for the other. Combined with GEO-LB this is a solution with remarkably high availability.


Technical requirements and ports for load balancing

Earlier I wrote an article about load balancing; you might want to include it in your learning path too.


 

Certificates:

Certificate requirements can become complex with security in mind. If you need to separate the certificates assigned in pools, you have many choices in how to define the certificates for all services. What I want to highlight here is the certificate for the Web services.

You are able to assign each Web service an individual certificate, meaning one each for the internal and the external site in IIS. Whether you do so or not, you MUST be able to export the certificate to the HLB for SSL OFFLOADING. This also requires you to have the SAME certificate with the same private key assigned on all pool servers.

Please take note of this, otherwise you will run into serious issues.

The OAuth protocol is not mentioned here, since it is not something to be load balanced. Just keep in mind that you should have the same OAuth certificate assigned on all pool servers as well.
 

Ports:


External Port Settings Required for Scaled Consolidated Edge, Hardware Load Balanced: External Interface Virtual IPs

Role/Protocol/TCP or UDP/Port | Source IP address | Destination IP address | Notes
XMPP/TCP/5269 | Any | XMPP Proxy service (shares IP address with Access Edge service) | XMPP Proxy service accepts traffic from XMPP contacts in defined XMPP federations
XMPP/TCP/5269 | XMPP Proxy service (shares IP address with Access Edge service) | Any | XMPP Proxy service sends traffic to XMPP contacts in defined XMPP federations
Access/SIP(TLS)/TCP/443 | Any | Access Edge service public VIP address | Client-to-server SIP traffic for external user access
Access/SIP(MTLS)/TCP/5061 | Any | Access Edge service public VIP address | SIP signaling, federated and public IM connectivity using SIP
Access/SIP(MTLS)/TCP/5061 | Access Edge service public VIP address | Federated partner | SIP signaling, federated and public IM connectivity using SIP
Web Conferencing/PSOM(TLS)/TCP/443 | Any | Edge Server Web Conferencing Edge service public VIP address | Web Conferencing media
A/V/STUN,MSTURN/UDP/3478 | Any | Edge Server A/V Edge service public VIP address | STUN/TURN negotiation of candidates over UDP/3478
A/V/STUN,MSTURN/TCP/443 | Any | Edge Server A/V Edge service public VIP address | STUN/TURN negotiation of candidates over TCP/443

Firewall Summary for Scaled Consolidated Edge, Hardware Load Balanced: Internal Interface Virtual IPs

Role/Protocol/TCP or UDP/Port | Source IP address | Destination IP address | Notes
Access/SIP(MTLS)/TCP/5061 | Any (can be defined as Director, Director pool virtual IP address, Front End Server or Front End pool virtual IP address) | Edge Server Internal VIP interface | Outbound SIP traffic (from Director, Director pool virtual IP address, Front End Server or Front End pool virtual IP address) to Internal Edge VIP
Access/SIP(MTLS)/TCP/5061 | Edge Server Internal VIP interface | Any (can be defined as Director, Director pool virtual IP address, Front End Server or Front End pool virtual IP address) | Inbound SIP traffic (to Director, Director pool virtual IP address, Front End Server or Front End pool virtual IP address) from Edge Server internal interface
SIP/MTLS/TCP/5062 | Any (can be defined as Front End Server IP address, or Front End pool IP address or any Survivable Branch Appliance or Survivable Branch Server using this Edge Server) | Edge Server Internal VIP interface | Authentication of A/V users (A/V authentication service) from Front End Server or Front End pool IP address or any Survivable Branch Appliance or Survivable Branch Server using this Edge Server
STUN/MSTURN/UDP/3478 | Any | Edge Server Internal VIP interface | Preferred path for A/V media transfer between internal and external users
STUN/MSTURN/TCP/443 | Any | Edge Server Internal VIP interface | Fallback path for A/V media transfer between internal and external users if UDP communication cannot be established, TCP is used for file transfer and desktop sharing
STUN/MSTURN/TCP/443 | Edge Server Internal VIP interface | Any | Fallback path for A/V media transfer between internal and external users if UDP communication cannot be established, TCP is used for file transfer and desktop sharing

Director Ports and Protocols for Firewall Definitions (DNS-HLB)

Role/Protocol/TCP or UDP/Port | Source IP address | Destination IP address | Notes
HTTP/TCP 8080 | Reverse proxy internal interface | Director Hardware Load Balancer VIP | Initially received by the external side of the reverse proxy, the communication is sent on to the Director HLB VIP and Front End Server web services.
HTTPS/TCP 4443 | Reverse proxy internal interface | Director Hardware Load Balancer VIP | Initially received by the external side of the reverse proxy, the communication is sent on to the Director HLB VIP and Front End Server web services.
HTTPS/TCP 444 | Director | Front End pool or Front End Server | Inter-server communication between the Director HLB VIP and the Front End Server or Front End Servers.
HTTP/TCP 80 | Internal Clients | Director Hardware Load Balancer VIP | The Director provides web services to internal as well as external clients.
HTTPS/TCP 443 | Internal Clients | Director Hardware Load Balancer VIP | The Director provides web services to internal as well as external clients.
SIP/MTLS/TCP 5061 | Edge Server internal interface | Director | SIP communication from the Edge Server to the Director, as well as the Front End Servers.
MTLS/TCP/50001 | Any | Director | Centralized Logging Service controller (ClsController.exe) or agent (ClsAgent.exe) commands and log collection
MTLS/TCP/50002 | Any | Director | Centralized Logging Service controller (ClsController.exe) or agent (ClsAgent.exe) commands and log collection
MTLS/TCP/50003 | Any | Director | Centralized Logging Service controller (ClsController.exe) or agent (ClsAgent.exe) commands and log collection
 

Director Ports and Protocols for Firewall Definitions (HLB)

Role/Protocol/TCP or UDP/Port | Source IP address | Destination IP address | Notes
HTTP/TCP 8080 | Reverse proxy internal interface | Director Hardware Load Balancer VIP | Initially received by the external side of the reverse proxy, the communication is sent on to the Director HLB VIP and Front End Servers web services
HTTPS/TCP 4443 | Reverse proxy internal interface | Director Hardware Load Balancer VIP | Initially received by the external side of the reverse proxy, the communication is sent on to the Director HLB VIP and Front End Servers web services
HTTPS/TCP 444 | Director | Front End Server or Front End pool | Inter-server communication between the Director HLB VIP and the Front End Servers
HTTP/TCP 80 | Internal Clients | Director Hardware Load Balancer VIP | The Director provides web services to internal as well as external clients.
HTTPS/TCP 443 | Internal Clients | Director Hardware Load Balancer VIP | The Director provides web services to internal as well as external clients.
SIP/MTLS/TCP 5061 | Edge Server internal interface | Director Hardware Load Balancer VIP | SIP communication from the Edge Server to the Director, and Front End Servers.
MTLS/TCP/50001 | Any | Director | Centralized Logging Service controller (ClsController.exe) or agent (ClsAgent.exe) commands and log collection
MTLS/TCP/50002 | Any | Director | Centralized Logging Service controller (ClsController.exe) or agent (ClsAgent.exe) commands and log collection
MTLS/TCP/50003 | Any | Director | Centralized Logging Service controller (ClsController.exe) or agent (ClsAgent.exe) commands and log collection

 

Frontend Hardware Load Balancer Ports if Using Only Hardware Load Balancing (HLB)

Load Balancer | Port | Protocol
Front End Server load balancer | 5061 | TCP (TLS)
Front End Server load balancer | 444 | HTTPS
Front End Server load balancer | 135 | DCOM and remote procedure call (RPC)
Front End Server load balancer | 80 | HTTP
Front End Server load balancer | 8080 | TCP - Client and device retrieval of root certificate from Front End Server - clients and devices authenticated by NTLM
Front End Server load balancer | 443 | HTTPS
Front End Server load balancer | 4443 | HTTPS (from reverse proxy)
Front End Server load balancer | 5072 | TCP
Front End Server load balancer | 5073 | TCP
Front End Server load balancer | 5075 | TCP
Front End Server load balancer | 5076 | TCP
Front End Server load balancer | 5071 | TCP
Front End Server load balancer | 5080 | TCP
Front End Server load balancer | 448 | TCP
Mediation Server load balancer | 5070 | TCP
Front End Server load balancer (if the pool also runs Mediation Server) | 5070 | TCP
Director load balancer | 443 | HTTPS
Director load balancer | 444 | HTTPS
Director load balancer | 5061 | TCP
Director load balancer | 4443 | HTTPS (from reverse proxy)

Frontend Hardware Load Balancer Ports if Using DNS Load Balancing (DNS+HLB)

Load Balancer | Port | Protocol
Front End Server load balancer | 80 | HTTP
Front End Server load balancer | 443 | HTTPS
Front End Server load balancer | 8080 | TCP - Client and device retrieval of root certificate from Front End Server - clients and devices authenticated by NTLM
Front End Server load balancer | 4443 | HTTPS (from reverse proxy)
Director load balancer | 443 | HTTPS
Director load balancer | 444 | HTTPS
Director load balancer | 4443 | HTTPS (from reverse proxy)
