Multi-Site Resilience

Hyper-V Replica, Storage Replica, Campus Clusters, and SAN Replication

Post 13 protects your data with backups. This post protects your services with replication.

Backups recover data: you restore a VM from yesterday’s backup and accept the data loss between the backup and the failure. Replication recovers services: your VMs are already running (or can start within minutes) at a secondary site with near-zero data loss. Production environments need both, and the architecture decisions you make here determine whether a site failure is a business disruption or a page in the runbook.

Windows Server 2025 and the Hyper-V ecosystem give you multiple replication technologies, each with different RPO/RTO characteristics, complexity, and cost. The right choice depends on your requirements, not on which technology sounds most impressive.

That is especially important if your broader strategy is to leave VMware licensing behind without immediately replacing it with a new platform bill. The practical win for many organizations is not “buy everything new.” It is pairing Hyper-V with the replication approach that matches the business requirement and the storage they already trust.

In this fourteenth post of the Hyper-V Renaissance series, we’ll cover every major multi-site resilience strategy available for Hyper-V, from built-in Windows Server features to SAN-level replication, with a decision framework that maps your requirements to the right technology.

Repository: DR runbook templates, Hyper-V Replica configuration scripts, and Storage Replica deployment guides are in the series repository.


The Decision Framework: Start Here

Before diving into technology details, match your requirements to the right solution:

Multi-Site Resilience Decision Framework

| Technology | RPO | RTO | Distance | Complexity | Cost | Best For |
|---|---|---|---|---|---|---|
| Hyper-V Replica | 30 s – 15 min | Minutes (manual failover) | Unlimited (async only) | Low | Free (built into Windows Server) | VM-level DR without SAN replication |
| Storage Replica (sync) | 0 (zero data loss) | Seconds – minutes | <5 ms RTT (~35 km) | Medium | Datacenter edition | Metro-distance zero-loss protection |
| Storage Replica (async) | Seconds – minutes | Minutes | Unlimited | Medium | Datacenter edition | Long-distance volume replication |
| Campus Clusters | 0 (zero data loss) | Automatic failover | Same campus (<1 ms) | Medium | Datacenter + KB5072033 | Single-campus rack-level protection |
| SAN Replication | Vendor-dependent (near-zero to minutes) | Vendor-dependent | Vendor-dependent | Medium–High | Array licensing | Organizations with enterprise SAN investment |

Hyper-V Replica: Built-In, VM-Level DR

Hyper-V Replica is the simplest path to VM-level disaster recovery. It’s built into every edition of Windows Server, requires no shared storage between sites, and works across any network connection.

How It Works

Hyper-V Replica tracks block-level changes to a VM’s virtual hard disks in a log file (the Hyper-V Replica Log, or HRL). Changed blocks are compressed and sent asynchronously to a replica server at a secondary site, where a copy of the VM is maintained in an offline state. The replica VM is a point-in-time copy that can be brought online during a failover event.

Architecture

| Component | Primary Site | Secondary Site |
|---|---|---|
| VMs | Running, production workloads | Offline replicas (not consuming compute) |
| Storage | Production CSVs / local storage | Replica storage (can be a different type/vendor) |
| Network | HTTP or HTTPS (port 80/443) | Same |
| Shared storage | Not required between sites | Not required |
| Cluster integration | Replica Broker role for clustered VMs | Replica Broker on target cluster |

Replication Frequencies

| Frequency | RPO | Bandwidth Impact | Use Case |
|---|---|---|---|
| 30 seconds | ~30 seconds | Highest: continuous stream of changes | Critical workloads requiring near-zero RPO |
| 5 minutes (default) | ~5 minutes | Moderate: batched changes | Most production workloads |
| 15 minutes | ~15 minutes | Lowest: larger, less frequent batches | Non-critical workloads, limited bandwidth |
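As a sketch of how a frequency is applied in practice, the following assumes a replica host named hv-dr-01.contoso.local, a VM named SQL01, and a D:\Replicas storage path — all placeholder names, and Kerberos authentication inside a single AD forest:

```powershell
# On the replica host: accept inbound replication (path is a placeholder)
Set-VMReplicationServer -ReplicationEnabled $true `
    -AllowedAuthenticationType Kerberos `
    -ReplicationAllowedFromAnyServer $true `
    -DefaultStorageLocation 'D:\Replicas'

# On the primary host: enable replication at the 30-second frequency
Enable-VMReplication -VMName 'SQL01' `
    -ReplicaServerName 'hv-dr-01.contoso.local' `
    -ReplicaServerPort 80 `
    -AuthenticationType Kerberos `
    -ReplicationFrequencySec 30

# Seed the initial copy over the network, then check state and health
Start-VMInitialReplication -VMName 'SQL01'
Get-VMReplication -VMName 'SQL01'
```

Note that `-ReplicationFrequencySec` accepts only the three values from the table above: 30, 300, or 900.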

The Replica Broker: Cluster Integration

When your Hyper-V hosts are clustered (and they should be in production), the Replica Broker is essential. It’s a cluster role that:

  • Provides a stable endpoint for replication regardless of which node owns the VM
  • Redirects incoming replication traffic to the correct node when VMs move between cluster nodes via live migration

Without the broker, replication breaks every time a VM migrates to a different node.

The Replica Broker must be configured on both the primary and secondary clusters.
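Adding the broker is scriptable; a minimal sketch (the role name and static IP are placeholders), run once on any node of each cluster:

```powershell
# Create a clustered role to host the broker, with its own network name and IP
Add-ClusterServerRole -Name 'HVR-Broker' -StaticAddress 10.0.1.50

# Add the Replica Broker resource to that role, tie it to the role's
# network name, and bring the group online
Add-ClusterResource -Name 'Virtual Machine Replication Broker' `
    -Type 'Virtual Machine Replication Broker' -Group 'HVR-Broker'
Add-ClusterResourceDependency 'Virtual Machine Replication Broker' 'HVR-Broker'
Start-ClusterGroup 'HVR-Broker'
```

When replication is then enabled, point `-ReplicaServerName` at the broker’s network name rather than an individual node.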

Failover Types

| Failover Type | Initiated From | Data Loss | When to Use |
|---|---|---|---|
| Test Failover | Secondary site | None; primary keeps running | DR validation testing. Creates an isolated copy on the secondary. Replication continues unaffected. |
| Planned Failover | Primary site | Zero; final sync before switchover | Planned site maintenance, datacenter migration. Primary shuts down gracefully, remaining changes sync, then roles switch. |
| Unplanned Failover | Secondary site | Up to last replication interval | Primary site is down. Replica VM starts immediately on secondary. Potential data loss equal to the replication frequency. |
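All three failover types map to the same small cmdlet family; a condensed sketch using a placeholder VM name 'SQL01':

```powershell
# --- Test failover (run at the secondary site) ---
Start-VMFailover -VMName 'SQL01' -AsTest   # creates an isolated test copy
Stop-VMFailover  -VMName 'SQL01'           # removes the test VM when done

# --- Planned failover (zero data loss) ---
# At the primary: stop the VM and send the final delta
Stop-VM -Name 'SQL01'
Start-VMFailover -VMName 'SQL01' -Prepare
# At the secondary: complete the switchover and reverse direction
Start-VMFailover  -VMName 'SQL01'
Set-VMReplication -VMName 'SQL01' -Reverse  # secondary now replicates back
Start-VM -Name 'SQL01'

# --- Unplanned failover (primary is down; run at the secondary) ---
Start-VMFailover    -VMName 'SQL01'
Complete-VMFailover -VMName 'SQL01'         # commit; discards other recovery points
Start-VM -Name 'SQL01'
```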

Failback

After the primary site recovers:

  1. Reverse replication: synchronize changes made on the former replica back to the original primary
  2. Planned failover: switch roles back to the original primary with zero data loss
  3. Resume normal replication: primary produces, secondary receives
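Failback reuses the planned-failover cmdlets; the key addition is `Set-VMReplication -Reverse`, which flips the replication direction. A compressed sketch of the three steps (VM and host names are placeholders):

```powershell
# 1. At the DR site (currently primary), once the original site is
#    reachable again: reverse, so the original primary becomes the replica
Set-VMReplication -VMName 'SQL01' -Reverse

# 2. In a maintenance window, planned failover back with zero data loss
Stop-VM -Name 'SQL01'
Start-VMFailover  -VMName 'SQL01' -Prepare   # final delta to original site
Start-VMFailover  -VMName 'SQL01' -ComputerName 'hv-pri-01'
Set-VMReplication -VMName 'SQL01' -Reverse -ComputerName 'hv-pri-01'

# 3. Resume the normal direction: primary produces, secondary receives
Start-VM -Name 'SQL01' -ComputerName 'hv-pri-01'
```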

Extended Replication

Hyper-V Replica supports chained replication: the secondary site can replicate to a third site. This provides an additional layer of redundancy (primary → secondary → extended replica). The extended replica can use a different replication frequency than the primary-to-secondary link.

Limitation: Fan-out replication (primary to two separate secondaries simultaneously) is not supported. Only chain topology.

Limitations: Be Honest

  • Asynchronous only: there is no synchronous mode, so some data loss is always possible during unplanned failover.
  • Replica VMs are offline: they consume storage but not compute at the secondary site until failover.
  • No automatic failover: failover must be initiated manually or scripted; there’s no built-in heartbeat-triggered automatic failover.
  • Cannot live migrate replicas: replica VMs can’t be moved between hosts at the secondary site while they’re receiving replication.
  • Bandwidth proportional to change rate: high-churn workloads (databases under heavy write load) generate significant replication traffic.
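Replication health and the actual per-VM change rate are visible through the built-in cmdlets, which helps quantify the bandwidth point above for your own workloads ('SQL01' is a placeholder):

```powershell
# State of every replicated VM on this host at a glance
Get-VMReplication | Format-Table Name, State, Health, FrequencySec, ReplicaServer

# Per-VM replication statistics accumulated since the last reset
Measure-VMReplication -VMName 'SQL01'

# Zero the counters before a measurement window (e.g., a busy business day)
Reset-VMReplicationStatistics -VMName 'SQL01'
```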

When Hyper-V Replica Is the Right Choice

  • You need VM-level DR without shared storage between sites
  • Your RPO tolerance is 30 seconds to 15 minutes
  • You don’t have SAN-level replication capabilities
  • Budget is constrained (Hyper-V Replica is free)
  • You need to replicate to a remote/cloud-hosted site over WAN

Storage Replica: Volume-Level, Zero-Loss Capable

Storage Replica is a Windows Server feature that provides block-level, volume-level replication between servers, clusters, or sites. Unlike Hyper-V Replica (which operates at the VM level), Storage Replica replicates entire volumes: everything on the volume is replicated, including all VMs, files, and metadata.

Synchronous vs. Asynchronous

| Mode | RPO | Latency Requirement | Data Loss | Use Case |
|---|---|---|---|---|
| Synchronous | 0 (zero data loss) | <5 ms round-trip time (~35 km) | None; writes committed at both ends before acknowledgment | Metro-distance protection where zero data loss is mandatory |
| Asynchronous | Seconds to minutes | No limit | Potential loss of in-flight writes | Long-distance protection where some data loss is acceptable |

Synchronous mode deep-dive: When an application writes data, Storage Replica sends the write to both the source and destination volumes simultaneously. The write is only acknowledged to the application after both copies are committed. This guarantees zero data loss but adds write latency equal to the network round-trip time. For this reason, synchronous mode is practical only at metro distances where RTT is under 5ms.

Architecture

Storage Replica uses SMB 3.0 as its transport and requires a dedicated log volume on both source and destination (SSD recommended; for optimal performance the log volume should be faster than the data volume).

Deployment topologies:

| Topology | Description | Failover |
|---|---|---|
| Stretched cluster | Single WSFC spanning two sites; Storage Replica syncs data | Automatic; the cluster handles failover between sites |
| Cluster-to-cluster | Two independent clusters replicating between them | Manual; an administrator initiates failover to the secondary cluster |
| Server-to-server | Two standalone servers | Manual |

Stretched cluster is the most powerful topology: combined with synchronous Storage Replica, it provides automatic failover with zero data loss between sites. The cluster treats both sites as fault domains and uses site-aware policies to control VM placement and failover behavior.

Edition Requirements

| Edition | Capability |
|---|---|
| Datacenter | Unlimited volumes, unlimited size |
| Standard | Single volume, maximum 2 TB |

For production multi-site resilience with Storage Replica, Datacenter edition is required.
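A cluster-to-cluster deployment sketch, assuming data volumes mounted at the same path on both sides and dedicated L: log volumes (cluster names, paths, and replication group names are all placeholders). Run Test-SRTopology first to confirm the link can sustain the write workload:

```powershell
# Validate bandwidth, IOPS, and log sizing before committing
# (produces an HTML report in the ResultPath folder)
Test-SRTopology -SourceComputerName 'hv-a-01' `
    -SourceVolumeName 'C:\ClusterStorage\Volume1' -SourceLogVolumeName 'L:' `
    -DestinationComputerName 'hv-b-01' `
    -DestinationVolumeName 'C:\ClusterStorage\Volume1' -DestinationLogVolumeName 'L:' `
    -DurationInMinutes 30 -ResultPath 'C:\Temp'

# Create the synchronous partnership between the two clusters
New-SRPartnership -SourceComputerName 'ClusterA' -SourceRGName 'RG-A' `
    -SourceVolumeName 'C:\ClusterStorage\Volume1' -SourceLogVolumeName 'L:' `
    -DestinationComputerName 'ClusterB' -DestinationRGName 'RG-B' `
    -DestinationVolumeName 'C:\ClusterStorage\Volume1' -DestinationLogVolumeName 'L:' `
    -ReplicationMode Synchronous
```

Switching `-ReplicationMode` to `Asynchronous` gives the long-distance variant from the table above.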

Key Behaviors

  • Destination volume is inaccessible during replication: it’s dismounted, so you can’t use it for reads or backups.
  • Test-Failover cmdlet (WS2019+): mounts a read-write snapshot of the destination for testing or backup without breaking replication.
  • Encryption: AES-128-GCM with Kerberos authentication; Intel AES-NI acceleration is supported.
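The test-failover behavior above is driven by a cmdlet pair introduced in WS2019; a sketch assuming a destination replication group named 'RG-B', a destination node 'hv-b-01', and spare capacity at T: (all placeholders):

```powershell
# Mount a writable snapshot of the destination volume;
# replication to the real destination continues underneath
Mount-SRDestination -Name 'RG-B' -ComputerName 'hv-b-01' -TemporaryPath 'T:\'

# ...run validation or point a backup job at the mounted copy...

# Discard the snapshot and return to normal
Dismount-SRDestination -Name 'RG-B' -ComputerName 'hv-b-01'
```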

When Storage Replica Is the Right Choice

  • You need zero data loss (synchronous mode) between sites at metro distance
  • You want volume-level replication that protects everything on the volume without per-VM configuration
  • You’re building a stretched cluster with automatic site-level failover
  • You have Datacenter edition licensing

Campus Clusters: Rack-Level Protection (New in WS2025)

Campus Clusters are a new capability in Windows Server 2025 that provides rack-level fault tolerance within a single physical location. This is relevant for organizations with a single datacenter or campus that want protection against an entire rack failure: power distribution failure, top-of-rack switch failure, or physical damage to a rack.

What Campus Clusters Are

A Campus Cluster is a Storage Spaces Direct (S2D) cluster configured across exactly two rack fault domains using Rack Level Nested Mirror (RLNM). Data is mirrored between racks so that losing an entire rack doesn’t lose data or availability.

Required: Windows Server 2025 with the December 2024 cumulative update KB5072033.

How RLNM Works

| Volume Type | Data Copies | Rack Survivability |
|---|---|---|
| Two-copy | One copy in each rack | Survives loss of one rack |
| Four-copy | Two copies in each rack | Survives loss of one rack plus one node in the surviving rack |

A 2+2 configuration (2 nodes per rack) with four-copy volumes provides the strongest resilience: an entire rack plus a node can fail simultaneously.

Requirements and Constraints

| Requirement | Details |
|---|---|
| Rack fault domains | Exactly two (no more, no fewer) |
| Network latency | <1 ms between racks (LAN; same building/campus) |
| Node distribution | Symmetric: 1+1, 2+2, 3+3, 4+4, or 5+5 (max 10 nodes) |
| Storage | All capacity drives the same type (flash SSD/NVMe recommended); HDDs and caching tiers not recommended |
| NICs | RDMA recommended |
| Quorum witness | Must be in a third location separate from both racks |
| Edition | Datacenter |
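The two rack fault domains are declared before S2D is enabled; a 2+2 sketch using the standard fault-domain cmdlets (rack and node names are placeholders, and the RLNM-specific volume parameters themselves are covered by the KB5072033 guidance rather than here):

```powershell
# Declare the two racks and assign nodes symmetrically (2+2)
New-ClusterFaultDomain -Name 'Rack1' -Type Rack
New-ClusterFaultDomain -Name 'Rack2' -Type Rack
Set-ClusterFaultDomain -Name 'Node1' -Parent 'Rack1'
Set-ClusterFaultDomain -Name 'Node2' -Parent 'Rack1'
Set-ClusterFaultDomain -Name 'Node3' -Parent 'Rack2'
Set-ClusterFaultDomain -Name 'Node4' -Parent 'Rack2'

Get-ClusterFaultDomain               # verify the layout first
Enable-ClusterStorageSpacesDirect    # S2D picks up the rack topology
```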

Campus Clusters vs. Stretched Clusters

| Factor | Campus Cluster | Stretched Cluster |
|---|---|---|
| Distance | Same campus (<1 ms) | Geographically separated (metro or WAN) |
| Storage | Single S2D pool with RLNM across racks | Two separate S2D pools or SANs with Storage Replica |
| Data sync | S2D handles replication natively | Storage Replica over SMB 3.0 |
| Failover | Automatic (within the cluster) | Automatic (stretched cluster) or manual (cluster-to-cluster) |
| Use case | Rack-level protection within a datacenter | Site-level protection across datacenters |

When Campus Clusters Make Sense

  • Single datacenter with two-rack infrastructure
  • Need protection against rack failure (power, network, physical)
  • Don’t have a secondary site for geographic DR
  • Starting fresh without existing SAN (S2D is the storage layer)

Note: Campus Clusters use S2D, which is a hyperconverged model, not a three-tier SAN. If your strategy is three-tier with external storage, Campus Clusters aren’t the right fit. Use Storage Replica or SAN replication for multi-site protection with external storage.


SAN-Level Replication: Vendor-Native Protection

For organizations with enterprise SAN infrastructure (the core audience of this series), SAN-level replication provides the most transparent and performant multi-site protection. The replication happens at the array level, below the hypervisor, with no impact on host CPU or cluster network bandwidth.

How SAN Replication Complements Hyper-V

SAN replication protects the storage volumes that your CSVs reside on. The Hyper-V cluster at the DR site is pre-configured with the replicated volumes. During failover:

  1. SAN replication promotes the DR volumes to read-write
  2. The DR Hyper-V cluster imports and starts VMs from the replicated CSVs
  3. VMs come online at the DR site

This approach is transparent to the Hyper-V layer: the VMs don’t know they’re being replicated, and there’s no per-VM replication configuration required.
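The Hyper-V side of that failover is scriptable. A hedged sketch, assuming the array has already promoted the DR copies and the volumes are online as CSVs on the DR cluster (the volume promotion step itself is vendor-specific, and the path is a placeholder):

```powershell
# Register every VM found on the promoted CSV in place (no file copy),
# add it to the DR cluster as a clustered role, and start it
Get-ChildItem -Path 'C:\ClusterStorage\Volume1' -Recurse -Filter '*.vmcx' |
    ForEach-Object {
        $vm = Import-VM -Path $_.FullName            # in-place registration
        Add-ClusterVirtualMachineRole -VMName $vm.Name | Out-Null
        Start-VM -VM $vm
    }
```

In a real runbook this loop would be wrapped with application-level start ordering and health checks; the point is that the Hyper-V steps are a handful of cmdlets once the array has done its part.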

Pure Storage ActiveDR (Detailed Example)

Pure Storage ActiveDR provides continuous, asynchronous, bidirectional replication built into Purity//FA 6.0+ on FlashArray systems.

How it works: When a write lands on a source volume protected by ActiveDR, it’s acknowledged to the host immediately, then forwarded continuously to the target array. Unlike traditional periodic async replication (which batches snapshots at intervals), ActiveDR streams writes continuously, minimizing replication lag.

| Attribute | Details |
|---|---|
| RPO | Near-zero; typically measured in seconds, not minutes |
| Mechanism | Continuous write streaming (not snapshot-based batching) |
| Configuration | Pod-based; volumes, protection groups, and snapshots all replicate together |
| Direction | Bidirectional; either site can be primary |
| Licensing | Included with Purity 6.0+ at no additional cost |
| Host impact | None; replication is entirely array-to-array, with no host CPU or network cost |

Integration with Hyper-V: ActiveDR operates at the storage layer. VMs on CSVs backed by ActiveDR-protected volumes are replicated automatically. The DR site needs a pre-configured Hyper-V cluster (or standalone hosts) ready to import VMs from the replicated volumes. Failover involves promoting the DR volumes and starting VMs; it is not a one-click operation, but it is scriptable and well documented in Pure’s Microsoft Platform Guide.

Other SAN Vendors

The replication architecture is similar across vendors; the differences are in features, RPO capabilities, and automation:

| Vendor | Technology | Sync Mode | Async Mode | Key Feature |
|---|---|---|---|---|
| Pure Storage | ActiveDR | ActiveCluster (sync) | ActiveDR (continuous async) | Near-zero RPO with continuous streaming |
| Dell | PowerStore Metro Volume | Synchronous (active/active) | Asynchronous | Bidirectional active/active metro replication (PowerStoreOS 3.0+) |
| NetApp | SnapMirror Active Sync | Symmetric active/active (ONTAP 9.15.1+) | SnapMirror async | Integration with Windows stretch clusters for zero RPO/RTO |
| HPE | Peer Persistence / Remote Copy | Synchronous metro | Asynchronous long-distance | Automatic Transparent Failover (ATF) redirects host I/O on failure |

Each vendor provides a Hyper-V integration guide with specific configuration steps for their array. Consult your vendor’s documentation for deployment procedures.

When SAN Replication Is the Right Choice

  • You have enterprise SAN infrastructure at both sites
  • You want transparent, below-the-hypervisor replication with no host CPU impact
  • Your SAN vendor provides replication at no additional cost (many do)
  • You need near-zero RPO without the overhead of per-VM replication configuration
  • You want to leverage array-level data services (snapshots, clones) at the DR site

DR Testing: Prove It Works

A DR strategy you haven’t tested is hope, not a plan. Schedule regular tests for every replication technology you deploy:

| Test | Frequency | What You Verify |
|---|---|---|
| Hyper-V Replica test failover | Monthly | Replica VM boots at secondary, applications function, network connectivity works |
| Storage Replica test failover | Quarterly | Test-SRTopology for bandwidth verification; Test-Failover for volume mount validation |
| SAN replication failover | Semi-annually | Full site failover: promote DR volumes, start VMs, verify application functionality |
| Failback procedure | Semi-annually | Reverse replication and return to the primary site (this is where most DR plans fail) |
| Runbook walkthrough | Annually | Full tabletop exercise: walk through the entire DR runbook with the operations team |

DR runbook templates and testing checklists are in the companion repository.


Combining Technologies

These technologies aren’t mutually exclusive. A comprehensive resilience strategy often layers them:

| Layer | Technology | What It Protects Against |
|---|---|---|
| Backup (Post 13) | Veeam / Commvault / Rubrik | Data corruption, accidental deletion, ransomware |
| VM-level replication | Hyper-V Replica | Site failure (async; for VMs not on SAN replication) |
| Volume-level replication | Storage Replica | Site failure (sync or async; stretched cluster scenarios) |
| SAN replication | ActiveDR / SnapMirror / Metro Volume | Site failure (transparent, near-zero RPO; SAN-connected workloads) |

The typical three-tier Hyper-V deployment with SAN would use:

  • SAN replication as the primary DR mechanism (transparent, near-zero RPO)
  • Hyper-V Replica for VMs that aren’t on SAN-replicated volumes (if any)
  • Backup (Veeam or equivalent) for data protection against non-site-failure scenarios

Next Steps

With multi-site resilience in place, your Hyper-V environment is protected against site-level failures. In the next post, Post 15: Live Migration Internals and Optimization, we’ll go behind the scenes on how live migration actually works: the memory pre-copy algorithm, dirty page tracking, WS2025 improvements, and what affects migration time.

Your data survives site failure. Let’s understand how your VMs move between hosts.




Series Navigation: ← Previous: Post 13: Backup Strategies for Hyper-V | → Next: Post 15: Live Migration Internals and Optimization