A critical, but often neglected, component of data protection is disaster recovery (DR). DR is confusing, can be expensive, and, in the absence of a real or perceived threat, can be disregarded as needless overhead. Many businesses that suffer complete data loss do not recover, but very few businesses ever suffer complete data loss in the first place. Hyper-V Replica addresses the complexity and expense components of DR; the remaining challenge for IT is to overcome any perceptual barriers.
The Scope and Purpose of Hyper-V Replica
Unfortunately, many do not completely understand what Hyper-V Replica (HVR) is best suited for and, consequently, use it inappropriately or skip it entirely when it could be useful. Take a few moments to think about what it can really do and which issues would best be solved with another tool.
Remember that Hyper-V Replica is not a universally-supported technology. For instance, Microsoft does not support HVR for any Microsoft Exchange role. Before using HVR with any technology, check its documentation or ask its support staff.
Replica is for Disaster Recovery
A replica virtual machine is a complete, ready-to-run duplicate of the source virtual machine. It can be switched on and be up and running in a few moments. When this happens, HVR assumes that the source virtual machine is lost and unrecoverable. When the replica is activated, it, in a sense, becomes the “official” copy of the virtual machine.
Replica is Not a Replacement for Clustering or Automated Failover
The primary difference between HVR and a failover cluster is that HVR always maintains two distinct copies of the same virtual machine whereas clustering uses only one. Failover Clustering is designed to rapidly relocate a protected resource – in this case, virtual machines – from an unavailable host to an available host. The reason that the source host became unavailable could be something as benign as a reboot operation. Clustering requires a steady, available underlying shared storage mechanism. Replica is designed to allow for rapid recovery of a virtual machine when one or more of the source components are lost with the assumption that a rapid recovery of those components is not possible. It does not require a common storage platform with the source; doing so would largely defeat the purpose of HVR.
Failover clustering is designed to be automated. HVR should not be automated. Unlike Failover Clustering, there is no quorum or file lock or other protection in place to prevent the source and replica from being brought online simultaneously whenever the source and replica hosts cannot contact each other. Replica operates on assumptions: if the replica is active, it must have been activated intentionally. If the source and replica virtual machines are active simultaneously, you have a “split-brain” scenario. Clients could be confused as to which to connect to. Data could be updated in two places simultaneously. This could create a situation from which you cannot recover without data loss.
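The split-brain risk described above can be illustrated with a toy simulation. This is only a sketch, not anything HVR provides: the Site class and the naive_auto_failover function are hypothetical names invented for the example. It assumes an automation rule that starts the replica whenever the source host cannot be reached, and shows what happens when only the link between the sites fails.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical stand-in for one side of a replica relationship."""
    name: str
    vm_running: bool


def naive_auto_failover(replica: Site, can_reach_source: bool) -> None:
    """Assumed automation rule: start the replica if the source is unreachable.

    Nothing here checks quorum or a file lock, which mirrors the lack of
    such protection described in the text.
    """
    if not can_reach_source:
        replica.vm_running = True


def is_split_brain(source: Site, replica: Site) -> bool:
    """Both copies running at once is the split-brain condition."""
    return source.vm_running and replica.vm_running


source = Site("primary", vm_running=True)
replica = Site("replica", vm_running=False)

# The link between sites drops, but the source host is perfectly healthy.
naive_auto_failover(replica, can_reach_source=False)

print(is_split_brain(source, replica))  # True
```

The replica host has no way to tell a dead source from a dead link, which is why the decision to fail over is best left to a person.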
Replica is not a Replacement for Backup
For maximum efficacy, backups should be placed on storage that is mobile or remote and can be taken offline. Replica hosts must be connected at all times for HVR to provide its maximum value. Backup must have some rotation capability so that it can maintain multiple copies. HVR does have some multi-copy functionality for its data, but not for the virtual machine components. Backup works best with a fairly long historical chain. Replica works best with a fairly short historical chain.
The primary difference between HVR and backup is that HVR data is potentially live at any time. Backup should be in a read-only state unless it is being overwritten by a new backup. Backup should, ideally, have some rotation methodology in which at least two distinct, unrelated copies exist. Replica can do something similar by specifying a second target for the same replica – this is called Extended Replica. However, this secondary replica is also always ready to be brought online at any time and is therefore susceptible to the same sorts of occurrences as the primary. These include storage failure, corruption transmitted from the source, data tampering, and other issues that do not impact offline media.
Replica does not Replace Checkpoints
Checkpoints are a useful feature that was initially intended to allow for rapid rollback of a virtual machine’s state. After some evolution of the technology, it also allows a virtual machine to be exported even while it is running, so that complete duplicates can be made on demand.
There are several distinctions to be made between checkpoints and replica.
- You have complete control over when a checkpoint is made whereas replicas occur automatically on a set schedule.
- A checkpoint cannot exist independently of the base virtual machine, whereas a replica is a distinct copy.
- A checkpoint can be recovered far more quickly than a replica.
- The exports of a checkpoint are, like any export, inherently one-way orphans and are completely unrelated to any other exports.
- Checkpoints are unusable for disaster recovery whereas HVR is primarily intended for DR. HVR cannot quickly restore a virtual machine at all and it cannot return it to a defined point in time.
- HVR allows for VSS to be triggered to flush I/O buffers. The checkpoint system currently does not (it is to be added to the 2016 version).
These two technologies are radically different in function and purpose and cannot be used interchangeably.
Hyper-V Replica is not Universally Applicable
Some technologies are incompatible or nonsensical to use with Hyper-V Replica. Hyper-V Replica is like Failover Clustering in one way: Microsoft designed a generic service that can be used with technologies that have no such service built-in. Almost every service that has its own replication capabilities is better off not using Hyper-V Replica. Three major examples:
- Active Directory Domain Services. It is possible to use Hyper-V Replica with Active Directory. Unless you’re replicating to a third-party provider, it doesn’t make very much sense to do so. You’re much better off installing a full domain controller virtual machine on the Replica host. Active Directory’s replication technology does a much better job at much lower overhead, and you have the added benefit of a live domain controller running in your replica data center.
- Microsoft Exchange. Microsoft flat-out does not support Exchange with Hyper-V Replica and there are known problems with using it. As with Active Directory, Exchange has its own replication technologies that are far superior to what Hyper-V Replica can offer it.
- Microsoft SQL Server. Under some conditions, Microsoft will support SQL Server being protected by Hyper-V Replica. However, like AD and Exchange, SQL can replicate itself better than Hyper-V.
In general, I would say that any application with its own replication abilities will not benefit from Hyper-V Replica.
Understanding the Hyper-V Replica Operation
To effectively work with HVR, it’s necessary to have an understanding of its basic flow. The following high-level overviews cover the basic processes.
Normal Replica Operations
The following steps illustrate how replica is configured and operates when all is well.
- A Hyper-V node or its cluster must first be enabled as a Replica Server. This allows it to participate in replication. There is no such thing as a “Replica Client”. In order for a server to be able to send replicas to a target host, it must be able to accept incoming replicas – otherwise, failback would be impossible. While you may not choose to configure it this way, there is no technological reason that a single system could not host live virtual machines and replicas simultaneously. While you cannot, and do not want to, prevent a host from allowing incoming replicas, you can restrict which other hosts or clusters it will allow replication with.
- If desired, SSL certificates are assigned to the hosts to encrypt the replicas as they travel over the wire. This is recommended when the replica will traverse any unsecured media such as the public Internet.
- At least one initial replica is made from one server to another. This can be an offline replica in which the data is first stored on another media type. For example, you can create a replica on an external USB disk (which could happen at several megabytes per second) and then physically transport it to the remote destination where the replica host is.
- Incremental changes are sent from the source host to the target replica server. The amount of data sent will correspond to the amount of change that occurred in the source virtual machine since the previous successful transmission. These changes are packaged in files with the extension .HRL (Hyper-V Replica Log). One HRL file will exist for each recovery point that you specify when you configure the replica.
- When the recovery point represented by an HRL expires, it is combined into the base virtual hard disk file for the virtual machine in a similar fashion to the way that a differencing hard disk is rolled back into its parent.
- The previous two steps (sending incremental changes and merging expired recovery points) repeat for that virtual machine until replication is halted.
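The incremental cycle in the steps above can be sketched as a toy model. This is an illustration under stated assumptions, not a description of HVR's on-disk format: ReplicaDisk and its methods are hypothetical names, each change log is a plain dictionary standing in for an HRL file, and the base dictionary stands in for the base virtual hard disk.

```python
from collections import deque


class ReplicaDisk:
    """Toy model: a base disk plus a bounded queue of change logs."""

    def __init__(self, max_recovery_points: int):
        self.base = {}       # block number -> data (stands in for the base disk)
        self.logs = deque()  # retained change logs, oldest first
        self.max_recovery_points = max_recovery_points

    def receive_changes(self, changes: dict) -> None:
        """Store a new change log, then merge any expired logs into the base."""
        self.logs.append(dict(changes))
        while len(self.logs) > self.max_recovery_points:
            # The oldest recovery point expired: fold it into the base.
            self.base.update(self.logs.popleft())

    def state_at(self, points_back: int = 0) -> dict:
        """Disk contents at a recovery point; 0 means the most recent one."""
        if points_back < 0 or points_back >= max(1, len(self.logs)):
            raise ValueError("no such recovery point")
        keep = len(self.logs) - points_back
        state = dict(self.base)
        for log in list(self.logs)[:keep]:
            state.update(log)
        return state


replica = ReplicaDisk(max_recovery_points=2)
replica.receive_changes({0: "a"})
replica.receive_changes({0: "b", 1: "x"})
replica.receive_changes({1: "y"})  # the first log expires and is merged

print(replica.base)         # {0: 'a'}
print(replica.state_at(0))  # {0: 'b', 1: 'y'}
print(replica.state_at(1))  # {0: 'b', 1: 'x'}
```

Once a log has been merged, the state before it can no longer be reached, which is why the number of recovery points bounds how far back a replica can go.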
The Failover Process
The following steps provide a simplified description of a failover event.
- A failover event is triggered on the target host. If possible, it will notify the source host.
- The failover event is completed on the target host. This is a distinct step from the failover start with the purpose of allowing you to cancel a failover operation.
- The replica virtual machine is now the active copy.
- When the primary site is recovered or rebuilt, replica is re-established in reverse mode.
- A planned failover event makes the rebuilt replica in the primary site into the active copy again.
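The failover sequence above reads naturally as a small state machine. The sketch below is a simplified model, and the state and action names are invented for illustration; they are not Hyper-V's own state names or commands.

```python
from enum import Enum, auto


class State(Enum):
    REPLICATING = auto()          # normal operation, source is active
    FAILOVER_STARTED = auto()     # failover triggered, can still be cancelled
    REPLICA_ACTIVE = auto()       # failover completed, replica is the active copy
    REVERSE_REPLICATING = auto()  # replica site now replicates back to primary


# Allowed transitions, keyed by (current state, action).
TRANSITIONS = {
    (State.REPLICATING, "start_failover"): State.FAILOVER_STARTED,
    (State.FAILOVER_STARTED, "cancel_failover"): State.REPLICATING,
    (State.FAILOVER_STARTED, "complete_failover"): State.REPLICA_ACTIVE,
    (State.REPLICA_ACTIVE, "reverse_replication"): State.REVERSE_REPLICATING,
    (State.REVERSE_REPLICATING, "planned_failover"): State.REPLICATING,
}


def step(state: State, action: str) -> State:
    """Apply one action, rejecting anything the model does not allow."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not valid in state {state.name}") from None


def run(actions, state: State = State.REPLICATING) -> State:
    """Apply a sequence of actions and return the final state."""
    for action in actions:
        state = step(state, action)
    return state


full_cycle = ["start_failover", "complete_failover",
              "reverse_replication", "planned_failover"]
print(run(full_cycle).name)                             # REPLICATING
print(run(["start_failover", "cancel_failover"]).name)  # REPLICATING
```

Keeping the start and complete actions separate in the model reflects the point made above: a failover that has only been started can still be cancelled.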
Best Practices for Using Hyper-V Replica
Hyper-V Replica is a very adaptable technology with many possible configurations. You can use it in a low security one-to-one layout or in a complicated many-to-many cluster setup. Microsoft provides very clear documentation on building up an HVR environment from the planning phase through to the maintenance phase, including testing. It includes advanced coverage such as clustering and certificates. Read more on this TechNet article.
Some topics are not fully covered by Microsoft’s documentation. A few of these items:
- Licensing in a Replica environment is tricky. The replica is a virtual machine, separate from the source. By default, Microsoft requires that you have a separate OSE virtualization right for the instance on each host. This cost can be avoided if you have Software Assurance on the source host’s license. Some server applications might have their own licensing particulars.
- The replica environment does not need the same hardware as the source environment.
- HVR does not require that you replicate every virtual machine. This can help you save on hardware and licensing expenses if some virtual machines are expendable.
- Replicas can be configured to automatically work with alternate IP address schemes in the event that you cannot duplicate TCP/IP networks between sites.
- For inbound replicas, you must specify a single default storage location. All new replicas will be placed there. However, once a replica is created, it can be manipulated like any other virtual machine. You can use Storage Migration to relocate its component files anywhere you wish.
- You can change the default storage location at any time. It will not affect existing replicas.
- It is not necessary to include every VHDX in the replica. For instance, you may have created a VHDX for the sole purpose of hosting a page file, or you might have a VHDX that only holds temporary data. To exclude these from replica, consult this blog post.
- Some third-party providers grant access to their datacenters as targets for your replicas. They can help you with configuration and licensing. This might be a more cost effective approach than building your own environment, especially if you don’t have your own secondary site.
More Information about Hyper-V Replica
Despite its apparent complexity, replica is a fairly straightforward technology. Once you have it going, it should be more or less maintenance free. You do need to remember to periodically test a failover and failback as demonstrated in the Microsoft article, but otherwise it should largely take care of itself.
Ready to dive deeper into Hyper-V Replica? Here’s what’s available around the topic on our blog:
- Facts about Hyper-V Replica you need to know
- How to enable and configure Hyper-V Replica
- How Hyper-V Replica is different to Backup
- How do failover clusters work in Hyper-V Replica
- How to Check Hyper-V Replica Health
- Advanced Troubleshooting of Hyper-V Replica (3-Part series)
If you need some help designing your environment, Microsoft provides a capacity planning tool for HVR.