Choosing A Windows Clustering Strategy in 2025

Azure Local vs. Traditional SAN Clustering vs. Storage Spaces Direct

Introduction

In modern Windows infrastructure, there are multiple strategies for building highly available clusters. This content, originally published as a blog post, was adapted into a presentation for the MMS MOA 2025 session To Windows Server or Not: The Clustering Question, for which the PowerPoint version is available. The session compares three key solutions side-by-side: Azure Local (Azure Stack HCI), Windows Server Failover Clustering with External Storage (traditional 3-tier architecture using SAN/NAS), and Windows Server Failover Clustering with Storage Spaces Direct (S2D). We explore the technical architecture of each, their pros and cons, and strategic considerations. Key factors such as cost, scalability, performance, hardware needs, manageability, cloud integration, licensing, and best-fit use cases are analyzed with comparison tables for clarity. A dedicated section on demo scenarios is included to showcase strengths and differences in a lab environment. Finally, we provide a decision framework to guide choosing the right approach based on an organization’s needs.

Azure Local (formerly Azure Stack HCI)

Microsoft’s Azure Local is a leading hyperconverged infrastructure platform designed for seamless on-premises virtualization with deep Azure integration through Azure Arc. It supports clusters ranging from 1 to 16 nodes, running on the Azure Stack HCI operating system—a subscription-based Azure service. Built on Windows Server technologies, it leverages Hyper-V for virtualization, Storage Spaces Direct (S2D) for software-defined storage, and Windows Server Failover Clustering for high availability. By pooling locally attached drives (NVMe, SSD, HDD) across nodes using S2D, it ensures highly available Cluster Shared Volumes for virtual machines.

Key characteristics and features

  • Azure Integration: Azure Local is connected to Azure for management and hybrid capabilities. Clusters are registered in Azure and can be monitored and managed through the Azure Portal alongside Azure resources. This enables native use of Azure services such as Azure Monitor, Azure Backup, Azure Site Recovery, and Azure Arc for the on-prem cluster. It provides a single control plane for both cloud and on-prem, which simplifies operations and extends cloud benefits to local infrastructure.
  • Subscription Licensing Model: Unlike Windows Server, Azure Local is licensed via a monthly subscription per physical core (approximately €9 per core per month), billed to an Azure subscription. This OPEX model means you pay for the software on an ongoing basis rather than a large upfront license. (Hardware is purchased separately through a vendor, and workload VMs still require their own OS licenses, e.g. Windows Server guest licenses.)
  • Validated Hardware and Performance: Azure Local runs on approved, validated hardware from various vendors (over 200 solutions in the Azure Local Catalog). These are industry-standard x64 servers (often sold as integrated systems or “HCI appliances”) with requirements like high-speed RDMA networking and specific storage configurations. This ensures reliability and industry-leading performance, leveraging features like persistent NVMe caching, storage acceleration (e.g. mirror-accelerated parity), and RDMA for low-latency, high-throughput storage communication. Microsoft notes that this design can result in significant cost and performance advantages over traditional SAN-based architectures.
  • Continuous Updates and New Features: Azure Local follows a cloud cadence for updates – new feature releases roughly annually (and quality updates frequently), with the requirement to install updates within 6 months. This means the platform is always up-to-date with the latest features (for example, recent versions introduced stretched clustering for multi-site DR, improved network encryption, faster storage resync, etc.). Notably, Azure Local has exclusive features like one-click stretched clusters for DR (since version 22H2), support for single-node HCI deployments (for edge scenarios), Azure Kubernetes Service (AKS) integration, and others that are not available in a standard Windows Server deployment.
  • Hypervisor-Only Workload: Azure Local is designed only to run VMs/containers on the host; you do not run other roles or third-party apps directly on the host OS. In fact, the host is not a general-purpose Windows Server and does not allow client connections or use of traditional server roles (no Active Directory, no IIS, etc., without running those inside VMs). This is a key difference in positioning – Azure Local is essentially a specialized virtualization host environment, not a multipurpose server OS. All workloads (Windows or Linux) run as guests on the hyperconverged cluster, and if those guests are Windows Server, they require standard Windows licenses (or Windows Server Subscription Add-on via Azure).
  • Cloud-Managed, Hybrid by Design: Administration can be done through familiar tools like Windows Admin Center (WAC) and PowerShell, but also through the Azure Portal for a unified view. For example, an admin can use the Azure Portal to monitor cluster health across sites, deploy Azure Arc-enabled services on the cluster (like AKS or SQL Managed Instance), or use Azure Policy/Security Center on the on-prem resources. Azure Local truly shines in hybrid scenarios where on-prem infrastructure needs to integrate with cloud strategy – it’s positioned by Microsoft as the best way to modernize and extend your datacenter with Azure. Many choose it for branch offices and edge deployments as well, thanks to support for small 2-node switchless clusters (which reduce hardware costs) and even single-node deployments for very small sites.
  • Hardware and Networking Requirements: Typical Azure Local nodes have two or more high-speed NICs (25/50 GbE with RDMA) for storage replication traffic, plus additional NICs for VM traffic and management. The storage in each node can be a mix of SSD and HDD (with SSD or NVMe used as cache tier) or all-flash NVMe setups for maximum performance. All nodes in the cluster should be nearly identical in configuration (CPU, memory, disks) for best results. A cluster can start with as few as 2 nodes (for full redundancy) and scale out by adding nodes up to 16. Clusters must pass strict validation and use supported components/drivers.
  • Connectivity Requirements: Because it is an Azure service, internet connectivity is required (at least intermittently). The cluster nodes must phone home to Azure at least once every 30 days for billing and status – if they cannot, the cluster will continue running but new VMs cannot be created until connectivity is restored. This means completely air-gapped environments may face challenges using Azure Local (though Azure offers an offline billing key process in some cases). (A registration and connectivity-check sketch follows this list.)
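
As referenced in the Connectivity Requirements bullet above, here is a minimal, hedged PowerShell sketch for checking and establishing that Azure registration. It assumes the older 21H2/22H2-style flow with the built-in AzureStackHCI module on a node and the Az.StackHCI module on a management machine; cmdlet names, output properties, and parameters may differ on newer Azure Local releases, and all IDs and names are placeholders.

```powershell
# Check the node's Azure registration and connection status (run on a cluster node).
# Property names such as RegistrationStatus/ConnectionStatus may vary by OS version.
Get-AzureStackHCI

# Register the cluster with Azure (older 21H2/22H2-style flow).
# Assumes: Install-Module Az.StackHCI and an Azure login (Connect-AzAccount) first.
Register-AzStackHCI `
    -SubscriptionId "<subscription-guid>" `
    -Region "westeurope" `
    -ComputerName "HCI-NODE01"
```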

Azure Local Use cases

Azure Local shines in distributed or edge scenarios and for organizations looking for Azure hybrid integration. In the past, Microsoft recommended Azure Local for remote offices, branch offices, retail stores, manufacturing sites, and other distributed locations that may have no more than a handful of clusters. It provides a consistent Azure experience in scenarios with latency or regulatory constraints that prevent using Azure public cloud exclusively. For example, a retailer can deploy 2-node or 4-node Azure Local clusters in dozens of stores to run local point-of-sale and inventory VMs, while centrally monitoring them through Azure. Manufacturing plants with automation systems can use Azure Local to run low-latency control workloads on-premises, with Azure Arc ensuring security policies are uniformly applied. Another key scenario is VDI (Virtual Desktop Infrastructure) and Azure Virtual Desktop on Azure Local, giving users local performance with cloud management. Because Azure Local has cloud-based licensing and requires an Azure subscription, organizations that are already heavily invested in Azure or planning a hybrid cloud strategy are the primary audience. In contrast, if an organization wants to avoid cloud dependency or already has extensive on-prem investments, they might consider the other approaches below. Azure Local can operate in a disconnected mode (no Internet) in preview as of 2025, but this is mainly for special cases – normally it expects at least intermittent connectivity to Azure (by default, clusters must check in within 30 days or they show as out-of-compliance). This is an important consideration for completely air-gapped environments. Overall, Azure Local delivers the newest technology (hyperconverged storage, hybrid integration) and is Microsoft’s strategic direction for on-premises infrastructure, but it introduces a new operational model and subscription-based cost.

In summary, Azure Local is a cutting-edge solution that blends on-premises virtualization with cloud advantages. It offers a simplified, hyperconverged architecture with guaranteed hardware compatibility and a cloud-like consumption model. It’s best suited for organizations looking to refresh or consolidate legacy infrastructure (especially aging SAN-based clusters) with something more cloud-connected, and for those who value Azure’s hybrid capabilities and continuous innovation stream.

Traditional WSFC with External SAN/NAS Storage

This is the traditional 3-tier architecture for Windows Server clusters. In this design, two or more Windows servers form a failover cluster and rely on an external shared storage subsystem (such as a Storage Area Network – SAN, or Network-Attached Storage – NAS) to hold the data. All cluster nodes connect to the shared storage via a storage network (e.g. Fibre Channel, iSCSI, or SMB3 over Ethernet). The cluster can host roles like file servers, Hyper-V virtual machines, or database instances, and the shared disk ensures that whichever node is active has access to the same underlying data. This approach separates the compute layer (the cluster nodes running workloads) from the storage layer (the SAN/NAS providing shared disks or LUNs).
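
To make the moving parts concrete, here is a minimal, hedged PowerShell sketch of forming such a cluster once the SAN has presented its LUNs to every node. Node names, the cluster IP address, and the disk name are placeholders, and a real deployment would also configure quorum, networks, and multipath I/O.

```powershell
# Hedged sketch: two-node failover cluster on SAN-backed shared disks.
# Run on a node that can already see the SAN LUNs; names and IPs are placeholders.
Install-WindowsFeature Failover-Clustering -IncludeManagementTools

# Validate first - shared LUNs must be visible to both nodes and pass the storage tests.
Test-Cluster -Node "NODE1","NODE2"

New-Cluster -Name "HV-CLU01" -Node "NODE1","NODE2" -StaticAddress "10.0.0.50"

# Bring the SAN LUNs into the cluster and convert one to a Cluster Shared Volume for Hyper-V.
Get-ClusterAvailableDisk | Add-ClusterDisk
Add-ClusterSharedVolume -Name "Cluster Disk 1"
```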

Key characteristics and features

  • Separate Storage Array: The hallmark of this design is a dedicated storage solution. Commonly this is a SAN array with dual controllers, shelves of disks, and its own RAID/data protection. The SAN presents logical disks (LUNs) to the Windows cluster nodes. Alternatively, a NAS device might present an SMB3 share or NFS export that all nodes use (in Hyper-V’s case, VM storage can even be placed on a Scale-Out File Server or NAS share). In all cases, the storage is physically separate from the servers that run the applications. This allows scaling or managing storage independently of the compute nodes. For example, you can add more disks or upgrade the SAN for capacity without touching the Hyper-V servers, or conversely add more cluster nodes for compute without altering storage (assuming the SAN can support the additional workload).
  • Windows Server as Host OS: Each node in the cluster runs a full Windows Server (Standard or Datacenter edition) installation. Failover Clustering is a feature of Windows Server (available in Standard and Datacenter), so no additional software purchase is needed beyond the Windows Server licenses. The cluster nodes can run any server roles or applications as needed. For instance, in a Hyper-V cluster scenario, each node would have the Hyper-V role and connect to the shared storage for VM files; in a SQL Server Failover Cluster Instance (FCI), each node runs SQL Server and uses a shared disk for the database; in a file server cluster, each node runs the File Server role serving the same SAN-backed volume. This architecture has been supported and common since the early 2000s with Windows Server.
  • Cluster Size and Scalability: Windows Server Failover Clustering supports up to 64 nodes in a single cluster when using shared storage. This is a much higher node count than the 16-node limit of S2D-based clusters. In practice, the cluster size might be limited by the SAN’s ability to handle concurrent connections or the specific workload, but the platform itself allows very large clusters. The scalability of storage is governed by the SAN/NAS capabilities – high-end SANs can offer multi-petabyte capacity, advanced tiering, deduplication, etc., often scaling far beyond what internal disks of a few servers could. This makes traditional clustering suitable for very large-scale needs or when extremely granular scaling of storage and compute is required (e.g., adding just storage capacity without adding compute).
  • Performance Characteristics: Performance in this model depends heavily on the SAN/NAS and the storage network. A well-designed SAN with a high-performance controller (or controllers) and fast disks can deliver excellent I/O throughput to all connected nodes. Technologies like Fibre Channel provide high bandwidth and low latency, and enterprise SANs often have large caches, all-flash tiers, or NVMe support. However, compared to hyperconverged storage, there is additional latency from network hops and potentially a bottleneck at the SAN controller for all IOs. For instance, all writes might funnel through a RAID controller on the SAN, whereas in an S2D cluster the write might be handled by the node’s own NVMe and distributed to a couple of partner nodes. In some cases, hyperconverged systems tout lower latency since they use direct-attached NVMe and RDMA between nodes, avoiding shared fabric contention. On the other hand, a single high-end SAN can sometimes outperform a smaller HCI cluster in raw IOPS due to specialized hardware. The key point is performance will be as strong as the investment in the external storage and network — if an organization already has a robust 16 Gb Fibre Channel SAN or a high-speed iSCSI network, a traditional cluster can leverage that effectively.
  • Hardware and Complexity: A traditional cluster involves more components to manage. You have the cluster nodes (standard servers, which could be any hardware certified for Windows Server) and separately the storage subsystem (which could be from Dell, HPE, NetApp, etc., each with its own management interface and firmware). The network fabric connecting them (Ethernet or Fibre Channel switches) is another piece. All these must be configured correctly (zoning, LUN masking, multipath IO, etc. for SAN). This adds complexity and usually requires storage expertise. Many enterprises historically had distinct storage administrators and server/VM administrators. In small/mid-size setups, an IT generalist can manage both, but there is a learning curve to SAN management. The hardware flexibility is high: any server that passes Windows Server certification and any SAN/NAS that supports the required protocols can potentially be combined. This can be advantageous if you want to repurpose existing hardware or mix newer/older servers (though mixing is not recommended for cluster stability). (An iSCSI/MPIO connectivity sketch follows this list.)
  • Manageability: Each layer of the 3-tier design is managed separately. Admins use Failover Cluster Manager or Windows Admin Center to manage the cluster roles and nodes, and use the SAN’s management software (or web interface) to manage disks, RAID sets, snapshots, etc. This segregation means maintenance tasks like updating firmware on the SAN or expanding a LUN must be coordinated with cluster maintenance (e.g., taking cluster disks offline). In contrast, an HCI solution tends to unify these operations. On the positive side, the separation also means issues in one layer (storage or compute) can be handled without directly affecting the other – for example, you might upgrade SAN firmware independently of the Hyper-V OS updates.
  • Cloud Integration: Traditional SAN-based clusters have no native cloud integration – they are a purely on-prem solution. That said, organizations can still leverage cloud services at a higher level: for example, backing up VMs to Azure using Azure Backup agent, or replicating VMs to Azure using Azure Site Recovery. Those are add-on services and require installing agents or configuring Azure Arc, etc., but are not built into the clustering solution. Microsoft positions Windows Server (with or without SAN) as suitable for “traditional infrastructure” or as the OS running in VMs on Azure Local, rather than something that itself connects to Azure. This means if cloud connectivity and hybrid features are a priority, this traditional approach may feel lacking compared to Azure Local.
  • Licensing and Cost: In the traditional model, Windows Server licensing is required for each cluster node. Typically, if you are running Hyper-V or multiple workloads, you’d use Datacenter Edition on each node, which allows unlimited Windows Server VMs on that node. This can actually be cost-efficient for virtualization: once you license the host servers, you don’t pay extra for each Windows VM’s OS (the VM OS is covered by the host’s Datacenter license). There is no additional license fee for the clustering feature – it’s included in Windows. The major cost is the SAN/NAS hardware and support. Enterprise SANs can be expensive (often costing as much or more than the servers). They also usually come with ongoing support contracts. Thus, while you avoid the Azure Local subscription fee, you might pay significant capital expense for a SAN and its maintenance. For organizations that already own a SAN (sunk cost) or have a preferred storage vendor, this can be a sensible use of existing investment.
  • Use Cases and Longevity: This approach has been the standard for many years and is well-proven. It’s ideal for scenarios where data services need to be centralized – for example, a large shared storage that multiple clusters or servers access. Some use cases where it excels:
    • Legacy applications requiring shared disks: e.g., older clustering technologies (SQL Server FCI, older file server clusters) that rely on a shared disk resource work naturally with this design.
    • Mix of heterogeneous workloads: If you want one cluster node to run a SQL DB and another to run a different workload, and both use the same SAN storage, Windows Server allows that flexibility (though generally clusters run a single role like Hyper-V).
    • Independent scaling: If an org’s storage growth is outpacing compute growth (or vice versa), the decoupled nature is beneficial. For instance, you can expand storage capacity in the SAN without purchasing new compute nodes, or you can add more Hyper-V nodes if you need more CPU/RAM for VMs without necessarily adding storage (provided the SAN can handle more VMs).
    • High node count or geo-cluster: If you need a cluster larger than 16 nodes (for whatever reason), or are implementing a cluster set architecture (multiple clusters in one management domain), the traditional approach is necessary because S2D clusters have smaller node limits. Also, multi-site clusters with SAN replication (using Storage Replica or SAN-level replication) can be built on this model (though Azure Local now supports stretched clusters natively, SAN-based clusters have done stretch clustering via replication for years).
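
As referenced in the Hardware and Complexity bullet above, here is a hedged sketch of the storage-connectivity side for an iSCSI design; the target portal address is a placeholder, a Fibre Channel design would rely on fabric zoning instead, and installing MPIO may require a reboot before the automatic claim takes effect.

```powershell
# Hedged sketch: connect a cluster node to an iSCSI SAN with multipath I/O.
# Repeat on every node; the portal address is a placeholder.
Install-WindowsFeature Multipath-IO          # may require a reboot
Enable-MSDSMAutomaticClaim -BusType iSCSI    # let MPIO claim iSCSI paths

Set-Service -Name MSiSCSI -StartupType Automatic
Start-Service -Name MSiSCSI

New-IscsiTargetPortal -TargetPortalAddress "10.0.1.10"
Get-IscsiTarget | Connect-IscsiTarget -IsPersistent $true
```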

Traditional WSFC Use cases

This traditional model is common in financial services and large enterprises that built their IT on SAN storage. For example, a bank might use a Fibre Channel SAN to host databases and core banking applications in a cluster for high availability, leveraging the reliability and familiarity of their storage arrays. Healthcare organizations sometimes use SAN-based clusters for hospital information systems or EMR/EHR applications – the data can reside on highly redundant SAN storage with proven replication to a secondary site. Workloads like file services can be implemented as a clustered file server where the file shares live on a shared disk (or alternatively on a dedicated NAS appliance). In scenarios where direct Azure integration is not a priority – for instance, an isolated government network or a company with strict data residency that doesn’t allow cloud connectivity – a SAN cluster provides high availability completely on-prem. It’s also worth noting that Hyper-V virtualization can run on a SAN cluster: prior to HCI solutions, many organizations ran Hyper-V or VMware clusters using a SAN. Those who prefer the separation of storage (perhaps to allow non-stop data access even if one cluster is down, or to snap/backup data at the array level) might stick with this approach. On the downside, if an organization is starting fresh without existing storage, the high initial costs and complexity might make this less attractive in 2025 compared to newer HCI options.

In summary, WSFC with external SAN/NAS is a tried-and-true architecture that offers great flexibility in hardware and a clear separation of storage and compute. It may involve more complexity and higher upfront costs (for the SAN), but it’s a solid choice for organizations with existing storage infrastructure or those who need the absolute maximum in compatibility and independent scaling. It represents the “traditional” path where cloud integration is minimal and the pace of change is slower (aligned with Windows Server’s long-term release cycle). Microsoft still fully supports this model, although strategic focus has shifted toward HCI solutions for new deployments.

Windows Server Failover Clustering with Storage Spaces Direct (S2D)

Windows Server Failover Clustering with Storage Spaces Direct (S2D) is Microsoft’s software-defined hyperconverged solution built into Windows Server (available in Datacenter edition since Windows Server 2016). It allows a cluster of Windows Server nodes to use internal drives on those servers to create a shared storage pool, eliminating the need for an external SAN. In essence, it provides similar capabilities to Azure Local (since Azure Local’s storage is actually powered by S2D under the hood) but within the standard Windows Server product and licensing model.
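
To show how little is involved at the command line, here is a minimal, hedged PowerShell sketch of turning four Windows Server Datacenter nodes with local drives into an S2D cluster with one mirrored CSV volume; all names are placeholders, and a production build would first run cluster validation and configure the RDMA storage network.

```powershell
# Hedged sketch: build an S2D cluster from nodes with local drives and carve out one CSV volume.
# Names are placeholders; run Test-Cluster and configure networking beforehand.
New-Cluster -Name "S2D-CLU01" -Node "NODE1","NODE2","NODE3","NODE4" -NoStorage

# Claim all eligible local drives into the software-defined storage pool.
Enable-ClusterStorageSpacesDirect -CimSession "S2D-CLU01"

# Create a mirrored, ReFS-formatted Cluster Shared Volume from the pool.
New-Volume -CimSession "S2D-CLU01" `
    -StoragePoolFriendlyName "S2D on S2D-CLU01" `
    -FriendlyName "Volume01" `
    -FileSystem CSVFS_ReFS `
    -Size 2TB
```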

Key characteristics and features

  • Hyperconverged Architecture: In an S2D cluster, each node contributes its local disks (SSDs, HDDs, NVMe, etc.) to the cluster’s storage pool. The cluster will automatically create volumes that are spanned and replicated across nodes, providing fault tolerance. This means the compute and storage are converged on the same servers – each node both runs workloads (e.g., Hyper-V VMs) and participates in storage for the cluster. There is no external SAN; the cluster’s storage is entirely distributed. Microsoft’s implementation uses a Software Storage Bus to tie the nodes’ storage together, and features like a built-in cache (fast disks acting as cache for slower disks), mirror or parity layouts for resiliency, and Cluster Shared Volumes (CSV) to allow all nodes to access the volumes concurrently. The result is a single unified storage pool that all VMs see, similar to a SAN LUN, but it’s backed by the aggregated performance of all local drives.
  • Hardware Requirements: While not as strict as Azure Local’s curated hardware list, a successful S2D deployment still requires careful hardware consideration. Windows Server Datacenter Edition is required on each node (S2D is a Datacenter-only feature). Microsoft maintains a list of S2D validated components and vendor solutions (often called “WSSD” or S2D Ready Nodes). It’s strongly recommended to use matched or very similar servers and drives. Each node needs some mix of storage devices – commonly a tiered setup (e.g., NVMe or SSD for cache, plus SSD/HDD for capacity). A high-bandwidth, low-latency network between nodes is critical: S2D relies heavily on east-west traffic for data mirroring. Typically, at least 10 Gbps Ethernet with RDMA (RoCE or iWARP) is used to connect nodes for storage sync, with SMB3 and SMB Direct as the transport. Without RDMA, S2D will work but with higher CPU usage and latency. The cluster nodes should have redundant network paths and ideally use switchless direct connections or quality switches for the storage network. In smaller (2-node) S2D clusters, you can even do switchless cross-connect with two cables, which saves cost and is supported.
  • Cluster Size and Scaling: S2D clusters support 2 to 16 nodes (minimum 2 for resiliency, up to 16 max). This is the same node count limit as Azure Local (since both leverage S2D). You can scale capacity in two ways: Scale-up by adding more drives to existing servers (if they have free slots), or Scale-out by adding more nodes (each bringing its CPU/RAM and more disks). This coupled scaling means when you add a node, you boost both compute and storage pool capacity. Conversely, one drawback is if you only need more storage but not compute, you still add a whole server (unless you can just swap in larger disks). Microsoft does support a variant called “disaggregated” or converged S2D, where one cluster provides storage and another set of servers use it over network (via SOFS), but this is more complex and rarely used in favor of true hyperconverged mode. In general, S2D clusters are very flexible for mid-sized deployments – a typical deployment might be 2 to 8 nodes. Each node can host many VMs (with Datacenter license covering them). The storage capacity and IOPS scale out as you add nodes, roughly linearly for many workloads.
  • Performance: Storage Spaces Direct was built to take advantage of modern hardware – it has support for NVMe drives, persistent memory, and uses RDMA networking to reduce CPU overhead. It also integrates with ReFS (Resilient File System) to enable fast clone and tiered storage features beneficial for VMs. Performance in a well-configured S2D cluster can be excellent: you can get very high IOPS and throughput by using multiple nodes and drives in parallel, effectively leveraging aggregate bandwidth of all devices. The cluster automatically uses the fastest media as cache, so if you have, say, NVMe and HDD, the NVMe will cache hot data to accelerate the slower disks. S2D introduced features like mirror-accelerated parity to balance performance and capacity (combining some mirroring and some erasure coding on volumes). A 2-node S2D cluster typically mirrors data (2-way or 3-way mirror), whereas larger clusters can use parity for efficiency. Real-world benchmarks have shown S2D clusters rivaling or exceeding the performance of equivalently priced SAN solutions, especially for virtualization workloads that benefit from distributed, cached storage. One must note that part of this performance comes at the cost of using some of the cluster nodes’ CPU and memory for storage tasks (since it’s software-defined storage). In most designs, a certain overhead is factored in (e.g., leaving ~10-20% host CPU for storage operations).
  • Features and Limitations: Because S2D is essentially the same core tech in Azure Local, it shares many capabilities, but not all the cloud-related ones:
    • What it has: Full failover clustering for VMs or other roles, CSV support, the ability to handle disk and node failures gracefully (volumes remain online if copies exist on surviving nodes), integration with Cluster-Aware Updating for rolling upgrades, etc. Windows Admin Center offers a nice dashboard for S2D clusters, showing drive health, IOPS, latency per node, etc., similar to Azure Local management (minus Azure cloud-specific info). S2D in Windows Server 2022 also inherited some improvements (faster repair times, adjustable repair speed, etc.). Admins can use PowerShell or WAC to create volumes, add nodes, and so on. S2D supports Storage Replica if you want to replicate volumes to another cluster (for DR), but stretch clustering (one cluster spanning two sites) was not available on Windows Server S2D (that is an Azure Local exclusive as of 22H2). Windows Server would require setting up two separate clusters and replicating between them for a similar effect.
    • What it lacks vs. Azure Local: S2D on Windows Server does not include native Azure integration – you won’t have an Azure Portal view unless you onboard the cluster to Azure Arc manually. There are also a few new features Microsoft kept only for Azure Local: for example, Azure Edition of Windows Server (with hot-patching) is supported on VMs in Azure Local but not on vanilla Windows Server clusters; certain performance tweaks (like the mentioned single-node cluster option, thin provisioning of S2D volumes, and other enhancements) came to Azure Local first. The release cadence is different: Windows Server is on a 2-3 year major release cycle (Windows Server 2016, 2019, 2022, etc.), whereas Azure Local gets new features annually or faster. So, if you deploy S2D on WS2022, you’ll get a stable feature set that won’t change until possibly the next Windows Server release (or maybe never, since Microsoft might not backport some features). This can be a pro or con: conservative stability vs. rapid innovation.
  • Management: An S2D cluster is managed like any Windows Server cluster. Windows Admin Center is the preferred tool now – it provides a unified web GUI to manage Hyper-V, storage, networking, etc., on the cluster. WAC can show the health of drives (and even initiate RMA for failed drives with some integrations), create volumes, set up tiering, and so on. Traditional tools like Failover Cluster Manager also work (you’ll see the cluster with “Storage Spaces Direct” enabled and see the CSV volumes). PowerShell cmdlets (like Enable-ClusterStorageSpacesDirect, New-Volume, etc.) are available for automation. The experience is very similar to Azure Local management, except without the Azure-specific pieces. You do need to monitor the hardware more on your own (disks, network) or use System Center if available. There’s no included cloud monitoring unless you hook into Azure Arc or another service. Overall, manageability is fairly streamlined for S2D clusters, especially compared to a separate SAN – you manage everything in one place with one set of tools. (A basic health-check sketch follows this list.)
  • Licensing and Cost: The big appeal of using S2D on Windows Server is licensing efficiency for Windows-centric shops. Since you must use Datacenter Edition on the hosts (to get S2D), you automatically get the benefit that all Windows Server guest VMs on those hosts are licensed. You don’t pay a separate subscription for S2D itself – it’s included in Windows. So if a company already has Windows Datacenter licenses (perhaps via Software Assurance or an Enterprise Agreement), they can use those to build an S2D HCI cluster with no additional software cost. In contrast, Azure Local would be an additional subscription cost on top of any guest OS licenses. This can make S2D clusters very cost-effective. The hardware cost is similar to Azure Local – you need multiple servers with good drives and networking – but you avoid purchasing an expensive SAN. Often, commodity storage in servers is cheaper per TB than proprietary SAN disks. That said, one must still factor the operational cost of managing an HCI – e.g., ensuring proper networking, dealing with drive replacements, etc., which in a SAN might be more automated. Bottom line: S2D enables a “build your own” hyperconverged system using Windows Server licenses. For a given set of hardware, if you already own the Windows licenses, it can be much cheaper than buying a vendor’s SAN or an Azure Local subscription.
  • Use Cases: S2D-based clustering is ideal for organizations that want the benefits of hyperconverged infrastructure (simpler architecture, no SAN, high performance) but either do not want or need the Azure integration or have constraints that require offline capability. Some scenarios where it excels:
    • Disconnected or Secure Environments: If the cluster will run in an environment with limited or no internet (e.g., a secure datacenter, tactical military environments, etc.), standard Windows Server is the way to go because it can operate fully offline indefinitely. Azure Local’s 30-day check-in requirement would be problematic there.
    • Existing Windows Server investments: Organizations with spare Windows Datacenter licenses or expertise in Windows Server might prefer S2D to leverage what they have. It’s a natural evolution for a Hyper-V shop that wants to move off SANs but isn’t ready to adopt a cloud service model.
    • Mid-size deployments: S2D clusters in the range of 2-8 nodes are very common and suitable for many mid-sized enterprise or departmental workloads. They provide a nice balance of performance and cost. Very large enterprise deployments might lean towards Azure Local for the extra bells and whistles, while very small (1-2 server) environments might stick to simpler setups or Azure Local for single-node. But the middle ground is S2D’s sweet spot.
    • Flexibility in hardware choice: Some users want to mix & match hardware or DIY their cluster (with caution). Windows Server gives a bit more flexibility to try different hardware not explicitly in an HCI catalog (though still must pass cluster validation). This is useful for test labs or repurposing existing servers into a new cluster, which might not be “officially” Azure Local nodes but can still be used for S2D.
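
As referenced in the Management bullet above, here is a hedged sketch of routine health checks on an S2D cluster, run from any node; property names in the output can vary slightly between Windows Server versions.

```powershell
# Hedged sketch: routine S2D health checks (run on any cluster node).
Get-StorageSubSystem Cluster* | Get-StorageHealthReport        # pool capacity, IOPS, latency
Get-PhysicalDisk | Select-Object FriendlyName, MediaType, HealthStatus, Usage
Get-VirtualDisk  | Select-Object FriendlyName, ResiliencySettingName, HealthStatus
Get-StorageJob                                                 # repair/resync progress after a failure
```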

WSFC with S2D Use cases

S2D on Windows Server is often chosen by organizations that want hyperconverged infrastructure without the cloud subscription model. It appeals to those who are comfortable with Windows Server and maybe already have Datacenter licenses. For instance, a mid-market company that has a small IT team might deploy a 4-node S2D cluster to run all their VMs (domain controllers, Exchange, web servers, etc.) and enjoy high availability and performance without needing to buy a SAN. They get a lot of the benefits of Azure Local (HCI, high performance) but can keep everything self-managed. S2D clusters are also found in remote office/branch office (ROBO) scenarios where budgets didn’t allow a SAN – e.g. a law firm with a main office and a branch might put a 2-node S2D cluster in the branch for local redundancy, using existing Windows licenses. Edge and disconnected environments that cannot phone home to Azure are a natural fit for Windows Server S2D clusters, since they impose no cloud requirement. For example, a mining site or a ship at sea could run an S2D cluster completely offline for long periods. Industries like manufacturing sometimes use S2D clusters for on-premises data ingestion and analysis of IoT devices on the factory floor, especially if they don’t need the full Azure integration of Azure Local. On the other hand, organizations that require the absolute latest features or a cloud-managed approach might prefer Azure Local over vanilla S2D, as Microsoft is focusing new development there. It’s worth noting too that Microsoft’s roadmap indicates continued support for S2D in Windows Server (Windows Server vNext will presumably include it), but a lot of the buzz is around Azure Local. So some companies adopt Azure Local for primary sites and use Windows S2D for secondary sites due to licensing flexibility.

In summary, Failover Clustering with Storage Spaces Direct is essentially Microsoft’s on-premises hyperconverged solution that mirrors Azure Local’s technical approach but without the Azure dependency. It offers a path to modern HCI using familiar Windows Server licensing and tools. While it may not have every latest feature that Azure’s offering gets, it is powerful and mature, having been deployed in production since 2016. It strikes a balance between the old-world (independent, self-managed, one-time purchase) and the new-world (cloud-managed, subscription) approaches, giving customers choice based on their comfort level and requirements.

Comparative Analysis of the Three Solutions

Comparison Matrix

| Factor | Azure Stack HCI (Azure Local) | WSFC w/ External Storage (SAN/NAS) | WSFC w/ Storage Spaces Direct (S2D) |
| --- | --- | --- | --- |
| Cost Model | Subscription (per core, per month, OPEX). Separate OS licenses for guest VMs. Hardware CapEx. | One-time Windows Server licenses (CapEx). High SAN/NAS costs. Datacenter edition covers unlimited VMs. | One-time Windows Server Datacenter licenses (CapEx). Commodity hardware. No subscription fees. |
| Scalability | 1-16 nodes. Scale by adding nodes or drives. Limited to 16 nodes. | Up to 64 nodes. Independent compute/storage scaling. SAN limits may apply. | 2-16 nodes. Coupled compute/storage scaling. Max 16 nodes. |
| Performance | Excellent (S2D, NVMe/SSD, RDMA). Low latency, high throughput. Hardware-dependent. | Variable (SAN-dependent). High-end SANs perform well; latency higher. Bottlenecks possible. | Excellent (local flash, RDMA). High throughput, low latency. CPU overhead for storage. |
| Hardware Requirements | Validated Azure Local hardware. RDMA NICs, SSD/NVMe. Internet for Azure. | SAN/NAS supporting clustering. Standard servers. Dedicated storage network. | Standard x86_64 servers, local disks. RDMA NICs recommended. Validated designs preferred. |
| Manageability | Unified via Windows Admin Center/Azure Portal. Automated patching. Azure subscription needed. | Two-tier (Windows tools + SAN interface). No unified console. Familiar to storage teams. | Windows Admin Center. No Azure integration. Cluster-Aware Updating. Vendor support for validated nodes. |
| Cloud Integration | Deep Azure integration (Arc, backup, monitoring). Periodic Azure connectivity required. | Minimal. Azure integration via add-ons (Site Recovery, Arc). Not automatic. | Optional (Arc for monitoring). Standalone by default. Custom Azure setups possible. |
| Licensing | Azure subscription for host OS. Guest VMs need licenses. No CALs for host. | Windows Server (Standard/Datacenter). Datacenter for unlimited VMs. SAN may have fees. CALs if serving clients. | Datacenter Edition mandatory. Covers unlimited VMs. CALs if serving clients. No extra S2D licensing. |
| Best-Fit Use Cases | Hybrid cloud, modernizing infrastructure, branch/edge, Azure Arc services, stretched clustering. | Existing SAN investment, classic shared disks, independent scaling, proven stability, multi-vendor flexibility. | On-prem HCI, cost-optimized virtualization, gradual SAN migration, remote sites, one-time purchase model. |

Table: Comparison of Azure Local vs Traditional SAN-Based Cluster vs Storage Spaces Direct in Windows Server. Each approach has distinct advantages and ideal scenarios, as outlined above.

As shown, Azure Local shines in cloud-integrated scenarios, offering great performance and simplification at the cost of a subscription and the requirement of Azure connectivity. Traditional SAN-based clustering offers maximum familiarity and separate scaling of storage, best when a SAN is already in place or specific storage capabilities are needed, though it can be costly and complex to manage. Windows Server S2D clustering offers a middle ground – hyperconvergence and modern performance without mandatory cloud ties, leveraging existing Windows licensing, but with slightly fewer features and a limit of 16 nodes.

To further clarify the trade-offs, let’s summarize the pros and cons of each approach:

Azure Local – Pros and Cons

Pros:

  • Integrated Hybrid Cloud Features: Seamless use of Azure services (monitoring, backup, etc.) and unified management through Azure. Great for hybrid cloud and cloud bursting scenarios.
  • High Performance HCI: Excellent virtualization performance with built-in SSD/NVMe caching, RDMA support, and distributed storage. Optimized for running VMs and modern workloads with low latency.
  • Simplified Infrastructure: Eliminates the need for a separate SAN – all storage is local, reducing footprint and complexity. One vendor (Microsoft) solution with validated hardware leads to a streamlined experience and single point of support.
  • Continuous Innovation: Regular feature updates (annual or faster) deliver new capabilities (e.g., stretched clusters, new security features) without waiting for a new Windows Server release.
  • Flexible Deployment Sizes: Supports small edge deployments (even one-node or two-node clusters) up to larger 16-node clusters, covering a range of needs. Two-node switchless configuration reduces hardware needed for small deployments.

Cons:

  • Ongoing Subscription Cost: Operational expense model can be costly over time – paying per core per month indefinitely. For environments with many cores and many years of use, this could exceed traditional licensing cost. Also requires an Azure subscription setup and credit considerations.
  • Requires Internet Connectivity: Not suitable for completely isolated environments – cluster must check in to Azure at least every 30 days. This can be a challenge for secure networks and adds dependency on Azure’s availability for provisioning new VMs.
  • Host OS Use Limited to Hypervisor: Cannot run additional roles or user services directly on the host OS (no general file server or domain controller on the host, for example). All workloads must be virtualized, which might be a con if someone wanted to use the physical machine for something else (most won’t, but it’s a design consideration).
  • Hardware Lock-In (Validated Nodes): Must use supported hardware from the catalog. Less flexibility to repurpose random existing servers unless they meet HCI validation. Also, hardware must be fairly uniform – adding a different type of node later might not be supported.
  • Update Frequency: The need to apply feature updates within 6 months means a more aggressive update schedule. Some orgs prefer the slower cadence of Windows Server LTSC for stability. This requires discipline in cluster updating to stay supported.

Traditional WSFC + SAN – Pros and Cons

Pros:

  • Separation of Concerns: Compute and storage are separate, allowing specialized tuning of each. Storage administrators can manage the SAN for optimal performance and reliability, while server admins manage the cluster nodes. Each component (servers or SAN) can be maintained/upgraded independently.
  • Independent Scaling: Can scale up storage or compute independently (add disks/shelves to SAN without buying new servers, or add more compute nodes if CPU/RAM is the bottleneck) – very flexible for dynamic enterprise needs.
  • Mature and Stable: This model has been around for decades; both the technology and staff experience are often well-established. There are clear best practices and many tools for SAN management. Windows Server Failover Clustering on its own is stable and well-understood.
  • High Maximum Capacity: Can potentially handle very large clusters (up to 64 nodes) and very large storage pools (depending on SAN). Good for scenarios that need a lot of nodes or huge central storage.
  • Leverage Existing Investment: Many organizations already own SAN hardware or fiber channel networks. Continuing to use them can maximize ROI. Also, some proprietary SAN features (deduplication, replication, snapshots, etc.) can complement the cluster in ways that pure Windows-based solutions might not match.

Cons:

  • Higher Capital Cost: Good SANs are expensive. The upfront cost for a redundant SAN with sufficient performance can be significant, not to mention maintenance contracts. This can make the solution cost-prohibitive for smaller setups compared to using internal disks.
  • Complexity and Management Overhead: Two layers to manage means higher complexity. Firmware, multipath settings, network zoning issues – there are more things to configure and more points of failure (SAN, fabric, servers, all must work together). Troubleshooting can be harder when an issue could lie in either the storage array or the Windows cluster.
  • Single Storage Backend Risk: While SANs are usually redundant internally (dual controllers, etc.), the storage remains a centralized component. If it fails catastrophically or has a firmware bug, it could impact all nodes. HCI distributes this risk across nodes. SAN-based clusters mitigate this with robust hardware, but it’s still a consideration (SAN as a critical component).
  • Less Agile for New Workloads: Spinning up new storage for a project might require SAN reconfiguration, waiting on storage team, or purchasing new disks. HCI by contrast might just allocate some existing free space from the pool. In fast-paced environments, the slower provisioning of traditional storage can be a bottleneck.
  • Limited Azure Integration: As noted, this approach doesn’t inherently connect to cloud management or services. In a world trending towards hybrid IT, a pure SAN + WSFC has to bolt on those integrations, which might not be as smooth or might be skipped entirely, missing out on some benefits.

WSFC + Storage Spaces Direct – Pros and Cons

Pros:

  • No External SAN Needed: All storage is internal, so you avoid the cost and complexity of a SAN. This simplifies the infrastructure (just a set of servers) and can significantly reduce costs while still providing high availability.
  • High Performance & Resiliency: Achieves SAN-like (or better) performance using commodity hardware, thanks to features like SSD caching and RDMA. It also has no single point of storage failure – data is mirrored across nodes, so each server is a part of the redundancy. Failure of one node or one drive doesn’t bring down the storage pool.
  • Cost Efficient for Windows Workloads: If you already license Windows Datacenter on hosts, you get both the hypervisor and storage features in one – effectively “free” HCI software. And that license covers unlimited Windows VM instances on those hosts. Over a multi-year period, this can be far cheaper than paying a monthly fee or buying a third-party HCI solution, especially for Windows-centric environments.
  • Autonomous Operation: Does not rely on internet or cloud connectivity – can run in a dark site indefinitely. You have full control over when to update (can stick to an OS version for many years if it’s stable for you). This autonomy is important for certain regulated industries or offline environments.
  • Unified Management Stack: Because everything (compute + storage) is Windows, you manage it all with one set of tools (WAC, PowerShell, FCM). There’s a consistency in how you deploy, monitor, and update the environment. And you still have the option to integrate with existing Windows Server ecosystem tools (SCVMM, SCOM, etc.). (A rolling-update sketch follows this list.)
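
As referenced in the last item above, here is a hedged sketch of a Cluster-Aware Updating run that patches and reboots nodes one at a time while roles live-migrate; the cluster name is a placeholder, and a self-updating schedule could alternatively be configured with Add-CauClusterRole.

```powershell
# Hedged sketch: one rolling Cluster-Aware Updating pass across all nodes.
# Run from a management host with the failover clustering tools installed.
Invoke-CauRun -ClusterName "S2D-CLU01" `
    -MaxFailedNodes 0 `
    -RequireAllNodesOnline `
    -Force
```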

Cons:

  • Node Limit (16) and Scaling Coupling: You cannot have clusters larger than 16 nodes with S2D, which in extremely large environments could be limiting (though 16 nodes with today’s hardware is already very high capacity). Also, scaling out requires adding full nodes – there’s less granularity than adding just storage to a SAN.
  • Fewer Cutting-Edge Features: Some new enhancements are exclusive to Azure Local (e.g., one-click stretch clusters, certain performance optimizations, Azure integration features). Windows Server S2D might not get these until a new Windows release, if ever. So you might lag behind the state-of-the-art that Microsoft offers on Azure Local.
  • Hardware Compatibility and Support: While you can use a variety of hardware, not all configurations are equally tested. There’s a risk if you stray from validated designs. Also, support might be split (Microsoft supports Windows, but for hardware issues you’d go to the OEM). With Azure Local, the lines are a bit more blurred and vendors often work closely with Microsoft to provide solution-level support. Ensure you have a good vendor who knows S2D if building your own.
  • Management lacks Cloud Conveniences: You don’t get the Azure Portal view or Azure Monitor alerts natively. If you want centralized multi-cluster visibility, you might need to deploy System Center or other tools. Some admins find Azure’s integrated view in HCI very handy, and that’s missing here unless you do extra work to onboard to Arc.
  • Learning Curve & Mindset: For those used to SANs, moving to S2D requires adopting an HCI mindset – thinking in terms of nodes and software-defined storage, dealing with things like rebalancing volumes after adding a disk, etc. The cluster will do a lot automatically, but it’s a different operational model (more DevOps-y, some say). If an organization is not prepared to manage an HCI, they might struggle initially (though Azure Local would have the same challenge – it’s about the concept, not the product).

Having weighed these pros and cons, it becomes clear that the “best” solution varies by context. Below, we provide some example guidance by industry and workload to illustrate which approach might be a good fit:

Industry and Workload Considerations

Industry Examples

  • Healthcare: Hospitals and clinics value reliability and often have limited IT staff on-site. A large hospital data center might stick with a SAN-based cluster for core systems like electronic health records – it’s stable, and many healthcare apps have long support cycles that favor traditional setups. However, for smaller clinics or departmental needs, an S2D two-node cluster could provide HA without requiring a dedicated storage unit. If the healthcare provider is embracing cloud for analytics or patient apps, Azure Local could be trialed in non-critical workloads to get hybrid benefits (e.g., archival of medical images to Azure while processing them locally). Also, regulatory compliance is key – Azure Local’s frequent updates might be challenging in validated medical systems, so a Windows Server cluster (with its 5+5 year support) might align better in some cases.
  • Retail: Retailers with many branch stores or point-of-sale systems lean towards solutions that are easy to manage centrally. Azure Local is attractive here – Microsoft explicitly targets scenarios like retail stores and branch offices for Azure Local because it can run with minimal IT onsite and report to the cloud. A 2-node Azure Local instance in each store could run the POS VMs and be monitored from HQ via Azure Arc. Conversely, some retailers with very minimal IT infrastructure might opt for a single server with failover to cloud rather than any cluster. Traditional SAN doesn’t fit well in dozens of small retail outlets due to cost and complexity. So, Azure Local (or S2D if cloud is not an option) clearly wins in distributed retail scenarios.
  • Financial Services: Banks and financial institutions often have a mix of needs. They might have big legacy systems on mainframes or SAN-based SQL Server clusters for core banking – those could remain on SAN clustering because they are already invested and may require proven setups. But for new initiatives (like private cloud for DevOps, or risk modeling platforms), they might deploy Azure Local clusters to integrate with Azure services (for example, bursting heavy calculations to Azure, or using Azure Security Center via Arc). Some financial orgs are wary of recurring costs, so they may prefer using Windows Server S2D clusters under existing licenses for test/dev or less critical workloads, while saving SAN for the crown jewels. Additionally, edge cases like trading floor analytics appliances could use a small S2D cluster where ultra-low latency on local SSDs is needed.
  • Manufacturing: Factories and industrial sites often have edge computing needs (running control systems, IoT gateways, etc. on-premises due to latency). Azure Local is a strong contender here because it can run IoT Edge or AKS locally and then sync data to Azure when possible. If the factory has intermittent connectivity, Azure Local can still function and now with disconnected mode preview it can even be managed offline. However, many manufacturing sites have been slow to adopt cloud – for an isolated factory network, a straightforward Windows Server S2D cluster or even a traditional cluster with a small NAS might be simpler for local engineers to maintain. If the environment involves older equipment or protocols that need a Windows interface (like running a legacy OPC server), those might just run on a Windows Failover Cluster with shared storage. For new smart-factory deployments, we see Azure Local used to bring cloud compute patterns on-prem (especially with OEMs offering ruggedized Azure Local systems for the edge).
  • Public Sector and Defense: These users often operate in disconnected, secure environments. They might prefer WSFC with S2D (for forward-looking agencies wanting HCI) or even stick to SAN for absolute tried-and-true reliability, because Azure Local’s normal operation requires internet connectivity and telemetry which might not be allowed. That said, Azure Stack (Hub) and now Azure Local disconnected mode exist to cater to these scenarios. It’s a matter of policy – if they can’t use any cloud dependency, Windows Server clustering is the safe route.

Workload Examples

  • General virtualization (private cloud): Azure Local and S2D are usually more efficient and easier to expand than a SAN-based cluster, unless the scale is very large or an existing SAN is free to use.
  • Database workloads: If using a SQL Server Failover Cluster Instance (which needs shared storage), a SAN or a Scale-Out File Server (SOFS) on top of an S2D cluster would be required – Azure Local can’t directly present a shared disk to two VMs, so you’d either use SQL Always On Availability Groups (which don’t need shared storage) or still rely on some shared storage for FCI. Thus, very database-centric shops might keep a SAN around.
  • Big data or file shares: If you have a giant file repository for a media company, a dedicated NAS appliance might be the simplest scaling solution (petabytes in a single namespace, accessible by many clients). Windows S2D clusters can also serve file shares (with the Scale-Out File Server role), but Azure Local is not licensed to directly serve file shares to end users (you’d need a VM to do that). So for pure file service, one might use a traditional Windows file cluster with a SAN, or a Windows S2D cluster in file-server mode, depending on skill set (a SOFS share sketch follows this list).
  • Virtual Desktop Infrastructure (VDI): Azure Local has a big advantage if using Azure Virtual Desktop on-prem – it’s basically designed for that hybrid scenario. S2D can technically support VDI VMs too, but you won’t get the integration with Azure AD and cloud VDI management that Azure Local (with Azure Arc) can provide.
  • Disaster Recovery: If DR to cloud is a priority, all approaches can use Azure Site Recovery, but having your primary on Azure Local might also open the door to Azure Stack Hub or other hybrid DR strategies.
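
For the file-share scenario above, here is a hedged sketch of exposing S2D storage as a continuously available SMB share through a Scale-Out File Server role; the role name, CSV path, share name, and access group are placeholders.

```powershell
# Hedged sketch: continuously available SMB share on an S2D cluster via Scale-Out File Server.
# Role name, path, share name, and access group are placeholders.
Add-ClusterScaleOutFileServerRole -Name "SOFS01" -Cluster "S2D-CLU01"

New-Item -Path "C:\ClusterStorage\Volume01\Shares\VMStore" -ItemType Directory
New-SmbShare -Name "VMStore" `
    -Path "C:\ClusterStorage\Volume01\Shares\VMStore" `
    -ScopeName "SOFS01" `
    -ContinuouslyAvailable $true `
    -FullAccess "CONTOSO\Hyper-V-Hosts"
```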

In summary, each industry or workload might favor a different approach, and often enterprises will end up using a mix: for example, a company might run Azure Local in branch offices, maintain a SAN-based cluster for a specific legacy application, and use an S2D cluster for a development lab – all under the umbrella of their hybrid cloud strategy.

Optional Azure Integration for WSFC (SAN or S2D) Clusters

Even if you choose traditional WSFC or S2D without adopting Azure Local, you can still take advantage of Azure hybrid integration on your own terms. Microsoft has developed services like Azure Arc to project on-premises resources into Azure, and others that extend Azure management and backup to on-premises servers.

  • Azure Arc for Servers: Azure Arc allows you to onboard your Windows Servers (physical or VMs) into Azure as connected resources. By installing the Arc agent on each node of your cluster, you can see those servers in the Azure Portal, organize them with tags, apply Azure Policy for configuration compliance, and use services like Azure Monitor and Microsoft Defender for Cloud on them. For example, you could enforce via Arc that certain security settings or updates are applied on all cluster nodes. Arc doesn’t replace your on-prem management tools, but augments them with cloud-based oversight. In the context of a SAN or S2D cluster, each node is Arc-enabled, so collectively you get a single pane in Azure showing “Cluster1-NodeA, Cluster1-NodeB…” etc., and you could even attach the Failover Cluster to Azure Monitor for metrics and alerts. Note that Arc is agent-based and requires outbound internet connectivity from the servers to Azure. It’s a free service; you pay only if you consume related Azure services (like Log Analytics storage, etc.). (An onboarding sketch follows this list.)
  • Azure Backup: Azure offers cloud backup for on-premises servers through the Microsoft Azure Recovery Services (MARS) agent. You can install this agent on cluster nodes or within VMs to back up files, folders, system state, or application data directly to an Azure Backup Vault. For a failover cluster, you would typically install the MARS agent on each node and configure backup of the data (ensuring only the active node’s data is backed up at any time). Alternatively, you can use Azure Backup Server (MABS) or System Center Data Protection Manager to centralize backups and send them to Azure. For example, if you have a Hyper-V cluster (SAN or S2D), you could use MABS to back up VMs from the cluster and store recovery points in Azure. Azure Backup is especially useful for long-term offsite retention without managing tapes or secondary sites. It natively supports Windows Server workloads. Keep in mind cluster integration isn’t entirely seamless – e.g., the MARS agent isn’t cluster-aware and doesn’t fail over, so you manage it per node – but it’s a viable solution for cloud-based backups of clustered data.
  • Azure Site Recovery (ASR): ASR can provide disaster recovery to Azure for on-prem VMs. It supports replicating VMs from a Hyper-V cluster to Azure and orchestrating failover. In a Hyper-V WSFC (either using SAN or S2D storage), you would register each host with ASR. The service coordinates replication of VM disk data to Azure storage in the background. If your datacenter goes down, you can “fail over” to Azure, where those VMs spin up and run. This is a powerful integration for clusters: it means even a traditional cluster that typically would require a secondary datacenter for DR could instead use Azure as the DR site. Microsoft docs confirm that Site Recovery supports clustered Hyper-V hosts – you just need to install the ASR agent on every node and add them to the same recovery vault. Thereafter, ASR handles the fact that VMs might move between hosts (live migration) by ensuring all nodes report into the same vault. This provides an Azure-based DR with relatively little overhead on the cluster side. Similarly, for a file server cluster, you could potentially use Azure File Sync or DFS replication to a VM in Azure for DR, though ASR is more straightforward for VMs.
  • Other Services: There are other Azure services that can integrate with on-prem clusters. Azure Monitor can collect performance metrics and logs from Windows Servers (via the Azure Monitor agent, which replaced the older Log Analytics agent) and give you cloud-based dashboards and alerting; the second sketch after this list shows a simple log query against such a workspace. Azure Update Manager (the successor to Automation Update Management) can schedule Windows updates on your on-prem servers through Arc, acting as an alternative to WSUS. Microsoft Defender for Cloud can assess the security configuration of on-prem servers (cluster nodes) and even deploy anti-malware or file integrity monitoring. And for data, Azure File Sync can be used with a traditional file server cluster to tier cold files to Azure Files cloud storage, saving space on-prem. None of these require using Azure Local; they are all available à la carte for Windows Servers. The idea is that choosing WSFC with SAN or S2D doesn’t lock you out of Azure benefits – you can still incrementally adopt cloud management, backup, and DR components as it makes sense. It’s a hybrid approach where you modernize the management and DR of your traditional infrastructure without fully moving to a cloud-operated model.
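
As a concrete illustration of the Arc-for-Servers point above, here is a minimal Python sketch that enumerates the Arc-enabled machines in a resource group so you can confirm every cluster node has been onboarded. It assumes the azure-identity and azure-mgmt-hybridcompute packages; the subscription ID and resource group name are placeholders, and this is a sketch rather than a definitive implementation.

```python
# Minimal sketch: list Arc-enabled servers (e.g. WSFC nodes) registered in Azure.
# Assumed packages: azure-identity, azure-mgmt-hybridcompute (pip install both).
from azure.identity import DefaultAzureCredential
from azure.mgmt.hybridcompute import HybridComputeManagementClient

SUBSCRIPTION_ID = "<subscription-id>"      # placeholder
RESOURCE_GROUP = "rg-onprem-clusters"      # hypothetical resource group name

client = HybridComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Each Arc-enabled cluster node appears as a "machine" resource in Azure.
for machine in client.machines.list_by_resource_group(RESOURCE_GROUP):
    print(f"{machine.name:30} location={machine.location} tags={machine.tags}")
```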
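
Similarly, for the Azure Monitor point, the second sketch below runs a simple Kusto query against a Log Analytics workspace to show when each node last sent a heartbeat. It assumes the azure-monitor-query package, that the nodes already report into the workspace, and a hypothetical “Cluster1-” naming prefix.

```python
# Minimal sketch: check when each cluster node last reported into Log Analytics.
# Assumed packages: azure-identity, azure-monitor-query.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-id>"  # placeholder

client = LogsQueryClient(DefaultAzureCredential())

# Heartbeat rows are written by the monitoring agent on each node;
# the "Cluster1-" prefix is an illustrative naming convention.
query = """
Heartbeat
| where Computer startswith "Cluster1-"
| summarize LastSeen = max(TimeGenerated) by Computer
"""

response = client.query_workspace(WORKSPACE_ID, query, timespan=timedelta(hours=24))
for table in response.tables:
    for computer, last_seen in table.rows:
        print(f"{computer}: last heartbeat at {last_seen}")
```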

Decision Framework – Choosing the Right Approach

To help choose the right clustering strategy for the environment, consider a decision framework based on key questions and requirements. Here are some guiding considerations:

  • Do you need tight Azure integration and hybrid cloud capabilities?
    • Yes: Choose Azure Local. It’s designed for hybrid scenarios, with native Azure management and easy extension to cloud services . If using Azure Arc, Azure Backup, etc., is a priority, HCI will give the best experience.
    • No or minimal: If cloud integration is not a priority or not allowed (disconnected environment), lean towards Windows Server (SAN or S2D). Azure Local’s benefits would be underutilized or impossible without connectivity . Windows Server clusters can still use cloud services, but in a decoupled way.
  • What is your existing infrastructure and investment?
    • Existing SAN/NAS in place: If you have a fairly new or underutilized SAN and fibre channel or iSCSI network, leveraging that with a traditional cluster might be cost-effective. You’ve already paid for the storage, and your team knows how to manage it. Windows Server Failover Clustering with external storage could be the right fit to maximize that investment (Microsoft explicitly suggests Windows Server for “VMs connected to Fibre Channel SAN storage” in traditional setups ).
    • No significant storage infrastructure: If you’re essentially building from scratch or your current storage is aging/insufficient, it might make sense to skip buying a new SAN and go straight to an HCI approach. Azure Local or S2D can reduce hardware footprint and future-proof the design . The choice between those two may then hinge on other factors (cloud vs no-cloud, licensing, etc.).
  • What are your scalability and performance needs?
    • Massive scale (lots of nodes, independent scaling): If you anticipate needing more than 16 nodes in a cluster or want to scale storage and compute on very different trajectories, a traditional SAN cluster is more suitable. It can scale to dozens of nodes , and you can beef up storage without adding servers and vice versa. For example, a large VDI deployment with very high IOPS and, say, 32 hosts might be better on a SAN that can handle that many connections, whereas S2D would force splitting into two clusters.
    • Moderate scale (up to 16 nodes) and desire for incremental growth: HCI (either Azure Local or S2D) works well here. You can start small and grow node by node, which is ideal for a lot of mid-sized use cases. Performance-wise, if you need ultra-low latency (NVMe, etc.), HCI has an edge by keeping storage local and parallel . For example, if you’re deploying an all-flash solution for a private cloud, S2D/Azure Local can deliver extremely high performance using commodity NVMe drives.
    • Special performance considerations: If you rely on specific SAN capabilities (like a storage array with hardware-assisted replication or very large caches for bursty workloads), and that’s critical, staying with a SAN cluster might be safer. If, however, your performance needs are general purpose virtualization, HCI will likely meet or exceed them with proper hardware.
  • What is your cost model preference (CapEx vs OpEx) and licensing situation?
    • CapEx and existing licenses favored: If you prefer to make one-time purchases and own software outright, and especially if you already have Windows Server Datacenter licenses, then building a Windows Server S2D cluster lets you leverage that without new recurring costs . Similarly, a SAN cluster has no ongoing software fees beyond support contracts – you pay up front for the SAN and Windows licenses.
    • OpEx / subscription model acceptable or desired: If your organization is aligning costs with consumption or if you want the benefit of continuous updates and support bundled in, Azure Local’s subscription might align well. Some businesses like the idea that Azure Local is treated as an Azure service in terms of billing, support, and upgrades – turning what used to be a capital purchase into a service model.
    • Licensing Windows VMs: Consider how many Windows VMs you’ll run. If it’s a large number, note that Azure Local will require you to license each of those VMs (or buy an add-on subscription for unlimited VM rights) , whereas Windows Server Datacenter covers unlimited VMs on that host. For example, an environment planning to run dozens of Windows Server VMs per host might find the licensing cost significantly lower with S2D on Windows Server (since one Datacenter license per host covers them all) than with Azure Local (where one might pay per VM or per core for guest licensing in addition to the HCI subscription); a rough back-of-the-envelope cost sketch follows this framework. This was noted by experts: if running many Windows VMs, Azure Local can be more expensive unless its unique advantages are needed .
  • How important are the latest features and future roadmap?
    • Need latest and greatest: Azure Local gets new capabilities faster (e.g., new security integrations, management features, stretch clustering, etc.) . If you want to be on the cutting edge of Microsoft’s infrastructure tech and are comfortable updating regularly, Azure Local ensures you’re never far behind the cloud’s state of the art.
    • Proven stability over new features: If you prefer a long-term stable platform with infrequent changes (maybe to meet compliance or just minimize change risk), then a Windows Server LTSC release (either SAN or S2D) might be preferable. You know exactly what you’ll get for 5+ years. The trade-off is you might miss out on some new features that Azure Local users enjoy. It’s a philosophy choice: fast innovation vs. slow and steady.
  • Operational model and team expertise:
    • Small IT team or one-person shop: Azure Local’s promise of unified management and cloud monitoring could simplify life, but the initial setup complexity and the need to manage the Azure side of the platform can be a learning curve. If the team is already familiar with Windows Server, deploying an S2D cluster can feel more straightforward (no Azure knowledge needed).
    • Existing divide between server and storage teams: If your org is structured with separate teams, introducing HCI might blur roles (storage now managed by server team). This can be positive (breaking silos) or negative (steep learning for storage admins). A SAN cluster fits the traditional model of separation. So consider the human factor: which approach can your team adopt readily? Perhaps start with a pilot or training if moving to HCI to get everyone comfortable.
    • Support and Vendor relationships: Some companies prefer to have a single throat to choke. Azure Local with integrated systems can be purchased as an appliance with full vendor support for hardware and Microsoft support for software . If you value that kind of support model, HCI might edge out a DIY Windows cluster. On the other hand, if you have a strong relationship with a storage vendor who provides excellent support for their SAN, that’s a factor in favor of the traditional route.
  • Specific Workload Requirements:
    • Running non-virtualized roles on cluster nodes: Only possible on a traditional Windows Server cluster. For example, a two-node file server cluster using SAN storage where the nodes directly serve files to users – that’s not something Azure Local can do because Azure Local nodes can’t serve clients except through VMs . If you have such needs (for example, a highly available file server cluster or another clustered role running directly on the nodes), you’ll be using Windows Server clustering, not Azure Local.
    • Hyper-V vs. others: All three support Hyper-V VMs. But if you plan to use other hypervisors or container platforms, note that Azure Local is essentially a Hyper-V/Azure Arc platform only. A Windows Server cluster could, in theory, run other workloads such as Windows containers directly on the nodes (though that’s uncommon), while other hypervisors like VMware bring their own clustering and storage stack entirely outside WSFC. Generally, if you are standardized on Hyper-V, all are fine. If not, Azure Local might not be for you (since it’s Hyper-V only).
    • Disaster Recovery and multi-site: Do you need built-in DR? Azure Local (with stretch clustering) now offers an elegant multi-site sync replication solution . With Windows Server, you can achieve multi-site clustering either with SAN replication or with Storage Replica, but it’s a bit more involved. If DR is top priority and you want a turnkey solution, Azure Local might have an edge. Alternatively, if you plan to use Azure as your DR target (cloud DR), any Hyper-V solution can potentially use Azure Site Recovery – an area where Azure Local doesn’t have a unique advantage except perhaps easier config.
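
To make the licensing question above more tangible, here is a rough back-of-the-envelope sketch (referenced in the cost-model bullet) comparing the recurring Azure Local host subscription against a one-time Windows Server Datacenter purchase over a planning horizon. Every figure is an illustrative assumption – the per-core prices, core counts, and horizon are placeholders, the Azure Local side excludes guest Windows Server licensing, and the Datacenter side excludes Software Assurance – so substitute your own quotes before drawing conclusions.

```python
# Rough, illustrative host-licensing comparison. ALL prices are placeholder
# assumptions, not quoted list prices; plug in your own negotiated figures.

def azure_local_subscription(cores_per_host: int, hosts: int, months: int,
                             eur_per_core_month: float = 9.0) -> float:
    """Recurring Azure Local host fee only; guest Windows Server licensing is extra."""
    return cores_per_host * hosts * months * eur_per_core_month

def datacenter_one_time(cores_per_host: int, hosts: int,
                        eur_per_core: float = 400.0) -> float:
    """One-time Windows Server Datacenter estimate (unlimited VMs on a licensed host)."""
    return cores_per_host * hosts * eur_per_core

cores, hosts, years = 32, 4, 5
print(f"Azure Local subscription over {years} years: "
      f"EUR {azure_local_subscription(cores, hosts, years * 12):,.0f}")
print(f"Windows Server Datacenter, one-time:        "
      f"EUR {datacenter_one_time(cores, hosts):,.0f}")
```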

In essence, match the solution to your environment’s priorities:

  • If you’re a forward-leaning organization embracing cloud and wanting a modern, streamlined stack, Azure Local could be the top choice.
  • If you have constraints that require self-sufficiency, or you want to maximize existing licenses and keep costs low while still modernizing storage, Storage Spaces Direct on Windows Server is very attractive.
  • If you are entrenched in a traditional model or have very specific needs that only a SAN can fulfill (or simply trust that approach’s reliability), a Failover Cluster with external storage remains a viable and perhaps the safest evolutionary path.

Often, the decision is not purely one vs. another; some large organizations might use all three in different scenarios. For example, branch offices might use Azure Local, while the main datacenter runs a large SAN cluster for certain workloads and an S2D cluster for others. Microsoft acknowledges that many customers deploy both Azure Local and Windows Server side by side for different purposes . The goal is to choose the right tool for each job.

Conclusion

All three clustering strategies – Azure Local, traditional Windows clustering with a SAN, and Windows Server S2D – achieve the fundamental goal of high availability and scalability for workloads, but they do so with different philosophies and trade-offs. In this deep dive, we explored each approach’s architecture and compared them across technical and strategic dimensions.

Azure Local brings cloud benefits on-premises, offering a cutting-edge, fully integrated HCI experience at the expense of requiring an Azure-connected, subscription-based model. It’s ideal for those looking to closely integrate with Azure and modernize their datacenter with the latest features.

Traditional SAN-based clustering remains a robust choice for those with existing storage infrastructure or specific needs – it separates storage and compute, potentially offering greater flexibility and proven stability, albeit with higher complexity and cost for the storage component.

Storage Spaces Direct on Windows Server offers a compelling middle path, blending the simplicity and performance of HCI with the licensing and offline advantages of traditional Windows Server. It can drastically reduce hardware costs by using internal disks and is a great use of Windows Datacenter licenses, though it doesn’t have the same level of Azure integration or rapid feature rollout as Azure Local.

In the end, the “best” solution depends on the context: budget, skill set, cloud strategy, existing assets, and workload requirements. The comparison tables and demos provided should equip the audience to evaluate these factors. Armed with this knowledge and the decision framework, attendees can make an informed choice about which clustering approach aligns with their environment and business goals. Each approach has its pros and cons, but all are powerful – it’s about leveraging the one that fits your scenario to achieve a reliable, efficient, and scalable infrastructure . The key takeaway is that Microsoft’s ecosystem gives us options, and understanding those options deeply (as we did in this session) is the first step toward architecting the right solution for any given datacenter.