Advanced Troubleshooting of Hyper-V Replica – Part 1

Troubleshooting Hyper-V Replica issues requires solid knowledge of the technology. Knowledge alone is not enough, though: you also need a troubleshooting approach that helps you fix the issue and find its root cause. That is what I intend to explain throughout the articles in the “Advanced Troubleshooting of Hyper-V Replica” series.

Before you read the first part of this series, please familiarize yourself with Hyper-V Replica by reading a few of the Hyper-V Replica articles published on this blog. You should also know how to perform a Hyper-V replication health check; I have written a series of articles on that topic.

I’ve also written an in-depth series entitled Hyper-V Replica Explained.

You can encounter a number of issues with Hyper-V Replica. For example, a virtual machine might fail to replicate its contents to the Replica Server because of a network connectivity problem. Similarly, replication might not occur if a firewall or a network device is blocking the required network ports. This article explains what you should do when a particular condition occurs.

Connection or Authentication Issue between Primary and Replica Server

Hyper-V Replica supports two types of authentication: “Certificate-based” and “Active Directory-based”. The Primary and Replica Servers might fail to communicate because of authentication issues. Your first task is to test the replication connection between the Primary and Replica Servers by using the Test-VMReplicationConnection PowerShell cmdlet.

Test-VMReplicationConnection offers a few parameters that you can use to perform other checks, but if you need to quickly test the connection between a Primary and a Replica Server, execute the command below, as shown in the screenshot:

  • Test-VMReplicationConnection <Replica Server Name with FQDN> 80 Kerberos

The above command tests the connection between the local server (acting as the Primary Server) and the Replica Server (HV-2012R2-B.Hyperv.Local) over network port 80 using the Kerberos authentication protocol (the Active Directory-based authentication mechanism for Hyper-V Replica).

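If you are using Certificate-based authentication, change the above command so that the “AuthenticationType” parameter is set to “Certificate” instead of “Kerberos”, and specify the port that the Replica Server uses for certificate-based replication (443 by default). The cmdlet then tests the connection with the Hyper-V Replica Server using certificates; for this authentication type it also needs the thumbprint of the certificate, passed through the “CertificateThumbprint” parameter.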
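The two variants of the test can also be written with named parameters. The following is only a sketch: Get-ReplicaTestParameters is a hypothetical helper (not a Hyper-V cmdlet), the ports are the assumed defaults of 80 and 443, and the thumbprint is something you must supply from your own certificate store.

```powershell
# Sketch: build the parameter set for Test-VMReplicationConnection.
# Get-ReplicaTestParameters is a hypothetical helper name used for illustration.
function Get-ReplicaTestParameters {
    param(
        [Parameter(Mandatory)] [string] $ReplicaServer,
        [ValidateSet('Kerberos', 'Certificate')] [string] $AuthenticationType = 'Kerberos',
        [string] $CertificateThumbprint
    )

    # Assumption: the Replica Server uses the default ports (80 / 443).
    $port = if ($AuthenticationType -eq 'Certificate') { 443 } else { 80 }

    $params = @{
        ReplicaServerName  = $ReplicaServer
        ReplicaServerPort  = $port
        AuthenticationType = $AuthenticationType
    }

    if ($AuthenticationType -eq 'Certificate') {
        if (-not $CertificateThumbprint) {
            throw 'Certificate authentication needs a certificate thumbprint.'
        }
        $params.CertificateThumbprint = $CertificateThumbprint
    }

    return $params
}

# Usage on the Primary Server (requires the Hyper-V PowerShell module):
#   $p = Get-ReplicaTestParameters -ReplicaServer 'HV-2012R2-B.Hyperv.Local'
#   Test-VMReplicationConnection @p
```

Splatting the hashtable (@p) passes each key as a named parameter, which makes it harder to mix up the port and the authentication type than with the positional form.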

In the screenshot for this example, the error returned for the Replica Server is “The operation timed out”.

The Test-VMReplicationConnection cmdlet does not provide specific error messages; it returns a generic one, which is “The operation timed out” in this case. The same error is also reported in the Event Viewer. If you open the Event Viewer (expand Application and Services Logs | Microsoft | Windows | Hyper-V-VMMS, and click Admin), you will see an event message similar to the one reported by the Test-VMReplicationConnection cmdlet, as shown in the screenshot below:

Your first task is to fix any connection issue between the Primary and Replica Servers. Consider the following possibilities, and then troubleshoot further:
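The same log can be read from PowerShell instead of the Event Viewer. This is a minimal sketch, assuming the Admin channel is named Microsoft-Windows-Hyper-V-VMMS-Admin on your host; Get-ReplicaVmmsEvent is a hypothetical helper name.

```powershell
# Sketch: list recent errors and warnings from the Hyper-V-VMMS Admin log.
# Get-ReplicaVmmsEvent is a hypothetical helper name used for illustration.
function Get-ReplicaVmmsEvent {
    param([int] $MaxEvents = 20)

    Get-WinEvent -FilterHashtable @{
        LogName = 'Microsoft-Windows-Hyper-V-VMMS-Admin'
        Level   = 2, 3      # 2 = Error, 3 = Warning
    } -MaxEvents $MaxEvents |
        Select-Object TimeCreated, Id, LevelDisplayName, Message
}

# Usage (run on the Primary Server in an elevated session):
#   Get-ReplicaVmmsEvent -MaxEvents 10 | Format-List
```

If the log contains no matching entries, Get-WinEvent reports that no events were found rather than returning an empty list.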

1. The Replica Server or Primary Server cannot communicate with Active Directory, or the trust relationship for one of these two computer accounts has been broken. To verify the computer's trust with Active Directory, you can use NLTEST with the /SC_VERIFY switch followed by the domain name (/SC_VERIFY:<domain>). Run this command on the Primary and the Replica Server individually, and make sure the output is similar to that shown in the command window below:
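As a sketch of the check in item 1, the secure channel can be verified with NLTEST and, in Windows PowerShell, with Test-ComputerSecureChannel. Test-ReplicaSecureChannel is a hypothetical wrapper, and the domain name in the usage line is only inferred from the FQDN used earlier in this article.

```powershell
# Sketch: verify the computer account's secure channel to the domain.
# Test-ReplicaSecureChannel is a hypothetical helper name used for illustration.
function Test-ReplicaSecureChannel {
    param([Parameter(Mandatory)] [string] $Domain)

    # nltest expects the domain name directly after the colon.
    nltest "/sc_verify:$Domain"

    # Windows PowerShell equivalent; returns $true when the channel is healthy.
    Test-ComputerSecureChannel
}

# Usage (run on the Primary AND on the Replica Server, elevated session):
#   Test-ReplicaSecureChannel -Domain 'Hyperv.Local'
```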

2. The Replica Server and Primary Server are not authorized to access each other over the network. Both servers communicate over the network, and they might fail to do so if a Group Policy setting that prevents network access has been applied. Check the “Access this computer from the network” Group Policy setting, shown in the screenshot below, which is taken from a Group Policy Object:

The Group Policy setting should contain the computer accounts of the Primary and Replica Servers, or at least the Authenticated Users security group, which includes both computer accounts.
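One way to see which accounts currently hold this right on a server is to export the user rights with secedit and read the SeNetworkLogonRight entry. The sketch below assumes the exported file reflects the policy currently applied to the machine; Get-NetworkLogonRight is a hypothetical helper that only parses the exported lines.

```powershell
# Sketch: extract the "Access this computer from the network" entries
# (SeNetworkLogonRight) from a secedit export.
# Get-NetworkLogonRight is a hypothetical helper name used for illustration.
function Get-NetworkLogonRight {
    param([string[]] $InfLines)

    # secedit writes a line such as: SeNetworkLogonRight = *S-1-1-0,*S-1-5-11
    $line = $InfLines |
        Where-Object { $_ -match '^\s*SeNetworkLogonRight\s*=' } |
        Select-Object -First 1
    if (-not $line) { return @() }

    # Split off the value, then strip spaces and the leading '*' of each SID.
    ($line -split '=', 2)[1].Split(',') |
        ForEach-Object { $_.Trim().TrimStart('*') }
}

# Usage (elevated session):
#   secedit /export /areas USER_RIGHTS /cfg "$env:TEMP\rights.inf" | Out-Null
#   Get-NetworkLogonRight -InfLines (Get-Content "$env:TEMP\rights.inf")
#   # S-1-5-11 is Authenticated Users, S-1-1-0 is Everyone
```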

3. Misconfigured TCP/IP settings on the Primary or Replica Server. The Primary Server communicates with the Replica Server using its FQDN, which is resolved by querying the DNS server. If the DNS server is misconfigured in the TCP/IP settings of the Primary Server, the Primary Server might not be able to resolve the FQDN to an IP address. When name resolution fails, an event is generated in the Event Viewer, as shown in the screenshot below:

Event ID 32022 indicates that the Hyper-V Replica server name could not be resolved. You can use the Ping and NSLookup commands to test connectivity and name resolution.

It is important to understand that when replication is one-way (that is, from the Primary to the Replica Server), it is the Primary Server that needs to resolve the name of the Replica Server to an IP address. So for any name resolution issue in a Hyper-V Replica environment, make sure that the Primary Server in particular is configured with the correct DNS servers. The Replica Server runs into name resolution issues only when reverse replication is used.
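The Ping and NSLookup checks from item 3 have PowerShell counterparts. The sketch below gathers them into one result object; Test-ReplicaNameResolution is a hypothetical helper, and it assumes the DnsClient cmdlets (Windows Server 2012 and later) are available on the Primary Server.

```powershell
# Sketch: check DNS configuration, name resolution and reachability
# for the Replica Server, as seen from the Primary Server.
# Test-ReplicaNameResolution is a hypothetical helper name used for illustration.
function Test-ReplicaNameResolution {
    param([Parameter(Mandatory)] [string] $ReplicaFqdn)

    # DNS servers configured in this host's TCP/IP settings.
    $dnsServers = Get-DnsClientServerAddress -AddressFamily IPv4 |
        ForEach-Object { $_.ServerAddresses } |
        Select-Object -Unique

    # A records for the Replica Server; $null when resolution fails.
    $records = Resolve-DnsName -Name $ReplicaFqdn -Type A -ErrorAction SilentlyContinue

    [pscustomobject]@{
        ReplicaFqdn    = $ReplicaFqdn
        DnsServers     = $dnsServers
        Resolved       = [bool]$records
        Addresses      = $records | Where-Object { $_.IPAddress } | ForEach-Object { $_.IPAddress }
        RespondsToPing = Test-Connection -ComputerName $ReplicaFqdn -Count 2 -Quiet
    }
}

# Usage (run on the Primary Server):
#   Test-ReplicaNameResolution -ReplicaFqdn 'HV-2012R2-B.Hyperv.Local'
```

A result with Resolved set to False points at the DNS settings; Resolved set to True with RespondsToPing set to False points at routing or a firewall that blocks ICMP.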

4. The Replica Server is not listening on the required network ports. Which ports are required depends on the authentication mechanism you have selected for Hyper-V replication. By default, the Replica Server must listen on port 80 (HTTP) for Active Directory-based authentication, and on port 443 (HTTPS) if the Replica and Primary Servers communicate using a certificate.

To make sure the Replica Server is listening on the required port, run the “netstat -ano” command on it. The output should list the Replica Server IP address along with the port it listens on, as shown in the screenshot below:

As you can see in the above screenshot, the Replica Server (0.0.0.0:80) is “LISTENING” on network port 80, which is correct. This indicates that the Replica Server is listening for replication traffic from the Primary Server.
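The netstat check can also be done with the networking cmdlets. This is a sketch that assumes the default ports of 80 and 443; Get-ReplicaListener is a hypothetical helper name, and the commented lines show related checks you could run alongside it.

```powershell
# Sketch: show listeners on the replication ports of the Replica Server.
# Get-ReplicaListener is a hypothetical helper name used for illustration.
function Get-ReplicaListener {
    param([int[]] $Port = @(80, 443))   # assumed default replication ports

    Get-NetTCPConnection -State Listen -LocalPort $Port -ErrorAction SilentlyContinue |
        Select-Object LocalAddress, LocalPort, OwningProcess
}

# Usage on the Replica Server:
#   Get-ReplicaListener
#   Get-VMReplicationServer | Format-List *    # shows the configured ports
#
# Usage on the Primary Server (Windows Server 2012 R2 or later):
#   Test-NetConnection -ComputerName 'HV-2012R2-B.Hyperv.Local' -Port 80
```

An empty result from Get-ReplicaListener means nothing is listening on those ports, in which case it is worth checking whether the server is enabled as a Replica Server and which ports it is configured to use.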

Once you fix the connection issue between the Primary and Replica Servers, run the “Test-VMReplicationConnection” cmdlet again to test the connection. If there is no error, the cmdlet reports the message “The connection to the specified Replica Server with the specified parameters was successful.”, as shown in the PowerShell window below:

5. Time difference. If the time difference between the Replica and Primary Servers is too great, replication will fail. Configure the Primary and Replica Servers to sync time from a reliable time source, such as a domain controller running in your production environment or an external NTP server.
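For the time difference in item 5, w32tm can measure the offset against another server, and a small function can compare two timestamps against a tolerance. This is a sketch: Test-ClockSkew is a hypothetical helper, and the five-minute default is an assumption based on the default Kerberos clock-skew tolerance, not a Hyper-V Replica setting.

```powershell
# Sketch: decide whether two clocks are close enough to each other.
# Test-ClockSkew is a hypothetical helper name used for illustration.
function Test-ClockSkew {
    param(
        [Parameter(Mandatory)] [datetime] $Local,
        [Parameter(Mandatory)] [datetime] $Remote,
        [int] $MaxSkewMinutes = 5    # assumption: default Kerberos tolerance
    )

    # Compare in UTC and ignore the sign of the difference.
    $skew = ($Local.ToUniversalTime() - $Remote.ToUniversalTime()).Duration()
    return ($skew.TotalMinutes -le $MaxSkewMinutes)
}

# Measuring the offset against the Replica Server from the Primary Server:
#   w32tm /stripchart /computer:HV-2012R2-B.Hyperv.Local /samples:3 /dataonly
# Checking the configured time source on either server:
#   w32tm /query /status
```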

Conclusion

This article focused on how to troubleshoot Hyper-V Replica issues related to authentication and connectivity. We learned that Hyper-V replication can break when the Primary Server is unable to communicate with the Replica Server, which can happen for several reasons.

There are several other reasons why communication between the Primary and Replica Servers may fail. In the next part, we are going to learn how the Primary Server establishes a successful connection with the Replica Server, so that you can rule out any issues with the network ports and firewall.

 


19 thoughts on "Advanced Troubleshooting of Hyper-V Replica – Part 1"

  • Helmut Hauser says:

    Just brilliant Nirmal.

    In addition, I recommend the “Swiss Army knife” Wireshark, and if you are in troubleshooting mode, you might check all NTP settings.
    Just a little clock skew can have a rather bad impact.

    +1 on pointing towards the certificates.
    Have a quick look at Azure.
    Some breakdowns have been caused by – guess what – cert issues.

    Cheers,

    Helmut

  • Nirmal says:

    Thanks for commenting Helmut!

    Yes, a time difference might cause authentication issues as well! I will add it as a bullet point in the article!

    Thanks!
    Nirmal

  • Subhash says:

    Hi,

    It would be helpful if you had explained the workgroup configuration and how to configure self-signed certificates for it.

    I’m currently stuck at Hyper-V failed to enable replication for virtual machine ‘test’: The connection with the server was terminated abnormally (0x00002EFE). (Virtual machine ID 16D3D782-92E2-4944-9E3E-B3237568B427)

    • Zoran says:

      Hi Subhash,
      There are a few reasons why you could be seeing this… most likely your replication certificate has expired and needs to be renewed. Sometimes this could be due to a network change which causes the network to become misconfigured, so check your DNS routing. If you can afford some downtime, try rebooting the server; this has been reported to help others with the same error.

      Thanks,
      Symon Perriman
      Altaro Editor
