Tuesday, August 26, 2014

Lync Edge Server or Pool Server: unusual call drops (30 sec) and other IM issues

I came across a funny problem.
A customer complained that whenever a federated call was initiated, the call always dropped after exactly 31 seconds.
All we could find at first was this error message:

SIP/2.0 504 Server time-out
ms-client-diagnostics: 52085;reason="Dialog does not exist"
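When going through many traces, it can help to pull the diagnostic code and reason out of such a header automatically. The following is only a minimal sketch: the function name `parse_diagnostics` is made up for illustration, and it assumes the header has the shape shown above (a numeric code followed by `;key="value"` parameters with quoted values).

```python
import re

def parse_diagnostics(header_line):
    """Split a header like 'ms-client-diagnostics: 52085;reason="..."'
    into (header name, numeric code, dict of quoted parameters).
    Assumption: parameter values are always double-quoted."""
    name, _, value = header_line.partition(":")
    # The numeric diagnostic code is everything before the first ';'
    code = int(value.split(";", 1)[0].strip())
    # Collect every ;key="value" pair that follows the code
    params = {
        m.group(1): m.group(2)
        for m in re.finditer(r';\s*([\w-]+)="([^"]*)"', value)
    }
    return name.strip(), code, params

header = 'ms-client-diagnostics: 52085;reason="Dialog does not exist"'
print(parse_diagnostics(header))
```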

So I continued the analysis and traced the calls. With the wonderful tool Snooper, I was able to see the "Call Flow" window, which is a very helpful visualization of the exact packet flow.
I could now see that the SIP session was initiated correctly.

BUT
I figured out that the PRACK message was not acknowledged, so the 200 OK response to it was missing, even though the message had been sent correctly to the target host. "This is a kind of early media." The voice stream is established, but it needs to be reconfirmed in case some port or parameter should be changed or optimized.
Since the acknowledgement is missing, the Lync Server apparently concludes that the call has ended and actively drops it.

All of this happened on the Edge Server of the affected customer.
I really struggled, even though I had also analyzed the HOSTS file on the Edge Server.

Eventually I ran the IPCONFIG /DISPLAYDNS command and saw something strange. After some DNS entries I saw weird characters, especially behind the Frontend FQDNs.

SOLUTION:
The HOSTS file was not written properly and contained invisible characters behind the FQDNs. So the Lync Edge Server was not able to find the Frontend Servers via DNS, and therefore the PRACK request from the Frontend could not be acknowledged.
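Since such characters cannot be seen in a normal text editor, a small script can make them visible. This is only a sketch under assumptions, not the method used in this case: the function names and the `contoso.local` host names are hypothetical, and it simply flags every character outside printable ASCII (ignoring spaces, tabs and anything after a `#` comment marker).

```python
import unicodedata

def find_suspicious_chars(line):
    """Return (column, codepoint, name) for every character in the line
    that is neither printable ASCII nor a plain space or tab."""
    hits = []
    for col, ch in enumerate(line, start=1):
        if ch in (" ", "\t"):
            continue
        if ord(ch) < 0x20 or ord(ch) > 0x7E:
            hits.append((col, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")))
    return hits

def scan_hosts_text(text):
    """Map line number -> suspicious characters for the non-comment part
    of each line. Lines without findings are left out."""
    report = {}
    # Split on '\n' only, so unusual Unicode line separators stay visible
    for lineno, line in enumerate(text.split("\n"), start=1):
        if line.endswith("\r"):
            line = line[:-1]            # drop the CR of a CRLF line ending
        entry = line.split("#", 1)[0]   # ignore comments
        hits = find_suspicious_chars(entry)
        if hits:
            report[lineno] = hits
    return report

def scan_hosts_file(path):
    """Read a hosts file as bytes; undecodable bytes become U+FFFD and are flagged."""
    with open(path, "rb") as f:
        return scan_hosts_text(f.read().decode("utf-8", errors="replace"))

# Hypothetical sample: a no-break space hides behind the first FQDN
sample = "10.0.0.5\tfe01.contoso.local\u00a0\n10.0.0.6 fe02.contoso.local\n"
print(scan_hosts_text(sample))
```

On a Windows server the file would normally be checked with `scan_hosts_file(r"C:\Windows\System32\drivers\etc\hosts")`. Any line that shows up in the report should be retyped by hand rather than copied and pasted.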



2 comments:

  1. Interesting article. By the way, the correct commandline switch for IPCONFIG is: "/DISPLAYDNS" and not "showdns"

    1. Hi Ricardo,
      Thanks for finding the typo. I have corrected it.
      Thomas
