Designing VMware Cloud Foundation 9.1: The 31 Decisions You Need to Make

Every VCF deployment starts the same way: someone hands you a blank whiteboard and says "design it." The problem is that VCF 9.1 is a broad platform, and without a structured approach it is easy to make decisions out of order, miss dependencies, or find out three phases in that an early choice locked you into something you did not intend.

Broadcom organizes the VCF 9.1 design process into nine phases covering 31 distinct decisions. This post walks through each phase, what the decisions are, and why they matter in practice. If you are using the VCF Designer tool, this maps directly to the decision schema it uses.

Phase 1: Starting Point and Profile

Before touching any configuration, you need two things nailed down: the design blueprint and the scope.

The Design Blueprint is your baseline deployment profile. Broadcom defines several: single site minimal, single site, multi-site single region, multi-region, and others covering application and security modernization. This is less a technical decision than a business one. It defines the complexity ceiling for everything that follows.

Scope and Use Cases is where you gate the rest of the design. VCF 9.1 can cover private cloud IaaS, Kubernetes via Supervisor, Private AI Foundation, vDefend lateral security, VCF Edge, and disaster recovery. What you check here enables or disables options in later phases. Do not mark something in scope unless there is a real requirement behind it.
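The gating described above can be sketched as a simple lookup. This is a minimal illustration, not the VCF Designer schema: the decision names come from the index in this post, while the scope keys and the `active_decisions` helper are hypothetical labels made up for the example. Only the gates this post states explicitly are included.

```python
# Hypothetical mapping from a conditional decision to the Phase 1 scope
# item that enables it. The scope keys are illustrative labels only.
GATED_BY_SCOPE = {
    "Lateral Security with vDefend": "vdefend_lateral_security",
    "VCF Edge Model": "vcf_edge",
    "Private AI Foundation Platform Model": "private_ai_foundation",
    "Private AI Foundation Compute Model": "private_ai_foundation",
}


def active_decisions(in_scope: set) -> list:
    """Return the conditional decisions enabled by the selected scope items."""
    return [name for name, gate in GATED_BY_SCOPE.items() if gate in in_scope]


# A design with only Private AI Foundation in scope.
enabled = active_decisions({"private_ai_foundation"})
```

Anything that is not in scope simply never shows up as a decision to make, which is the point of being strict in Phase 1.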

Phase 2: Fleet-Level Decisions

The VCF Fleet Deployment Model defines how the fleet is laid out. A single VCF instance is the most common for customers starting out or running a standalone private cloud. A connected fleet with multiple instances comes into play when you have multiple sites or organizational boundaries that require separate management planes.

The VCF Fleet Sizing Model covers appliance sizing: Small, Medium, HA Medium, Large, and HA Large. Sizing here is not about your workload VMs. It is about the management plane itself. Undersizing the fleet appliances is one of the most common mistakes in early VCF deployments.

Phase 3: Consumption Decisions

This phase covers how cloud consumers interact with the platform. Five decisions, and they are tightly interconnected.

The VCF Automation Model decides whether VCF Automation is deployed and in what topology. If your organization needs self-service provisioning or catalog-driven deployments, you need this. If not, skip it. Running it just because it is available adds operational overhead without benefit.

The Network Consumption Model is one of the most consequential decisions in the entire design. The options are VLAN, NSX Overlay Segments, VPC, or Transit Gateway. This choice drives downstream decisions on edge clusters, load balancers, and how workloads connect. Get this wrong and you are rearchitecting the network mid-project.

The Workload Connectivity Model and the Load Balancer Model follow from the network consumption choice. For load balancing, NSX Native covers most use cases. Avi (VCF Advanced LB) is needed when you require full L7 with advanced policies, SSL offload, or WAF capabilities.
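The load balancer rule of thumb above can be written down as a small decision function. This is a sketch under the assumption that requirements are captured as a set of feature labels; the labels and the `choose_load_balancer` name are hypothetical, and real designs will weigh more factors than these three.

```python
# Features that, per the rule of thumb above, call for Avi.
# The label strings are hypothetical names chosen for this example.
ADVANCED_LB_FEATURES = {"advanced_l7_policies", "ssl_offload", "waf"}


def choose_load_balancer(required_features: set) -> str:
    """Pick Avi only when at least one advanced feature is required."""
    if required_features & ADVANCED_LB_FEATURES:
        return "Avi (VCF Advanced LB)"
    return "NSX Native"


basic = choose_load_balancer({"l4_vip"})
advanced = choose_load_balancer({"l4_vip", "waf"})
```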

Phase 4: Operations Decisions

Six decisions covering management services, management networking, operations tooling, logging, network observability, and recovery.

The VCF Management Services Model defines availability for SDDC Manager, vCenter, and NSX Manager. The choice is Standard or Highly Available. For production environments, the answer is almost always HA. The cost of an HA management plane is small compared to the cost of a failed SDDC Manager during a critical operation.

The VCF Management Network Model determines whether management components share a VLAN, use isolated VLANs per component, or run on NSX segments. NSX segments require NSX to be up before management components can communicate, which creates a chicken-and-egg risk during recovery scenarios. Plan this carefully.
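One way to reason about the chicken-and-egg risk is to write the start-up dependencies down and check them for a cycle. The sketch below uses Python's standard `graphlib` module. The two dependency maps are deliberately simplified assumptions for illustration, not an accurate model of VCF component dependencies.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set


def recovery_order(depends_on: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return a valid start-up order, or None if the dependencies are circular."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


# Simplified assumption: on a VLAN-backed management network, the
# management components only need the physical VLAN to be reachable.
vlan_backed = {
    "management components": {"management VLAN"},
    "management VLAN": set(),
}

# Simplified assumption: on NSX segments, the components need the segment,
# the segment needs NSX, and NSX needs its management components.
nsx_backed = {
    "management components": {"NSX segment"},
    "NSX segment": {"NSX"},
    "NSX": {"management components"},
}

vlan_order = recovery_order(vlan_backed)
nsx_order = recovery_order(nsx_backed)
```

A `None` result means no clean recovery order exists for that model as drawn, which is the signal to add a break-glass path or pick a different model.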

The VCF Recovery Option aligns to your RPO and RTO requirements. Backup and restore, component-level recovery, and instance-level recovery each have different complexity and cost profiles. Define your recovery requirements before choosing this, not after.

Phase 5: Security and Compliance

Identity Broker and SSO decisions define how users authenticate to VCF components. Most enterprise environments will federate to Active Directory or an external IdP. Plan this early since it affects every component that needs authentication.

vDefend Lateral Security only applies if it was included in scope in Phase 1. If deployed, the Security Services Platform adds distributed IDS/IPS and east-west traffic inspection.

Phase 6: Virtual Infrastructure

Seven decisions covering domains, clusters, networking, and storage. This is where the design gets concrete.

The VCF Domain Model defines your management and workload domain topology. Single-AZ with one management plus one workload domain is the most common starting point. Stretched (multi-AZ) adds complexity but is required for metro HA.

The Storage Model is one of the decisions with the most downstream impact. VCF 9.1 supports vSAN OSA, vSAN ESA, NFS, VMFS on Fibre Channel, iSCSI, and NVMe variants. vSAN ESA is the recommended path for new deployments using compatible hardware. If you are connecting to an existing SAN or NAS, the external storage options apply.

NSX Manager topology and NSX Edge Cluster decisions define the control plane and data plane for your overlay network. Edge cluster sizing depends on the volume and type of north-south traffic. A shared NSX Manager cluster across domains reduces overhead. Dedicated per domain gives you blast radius isolation.

Phase 7: Physical Infrastructure

One decision: the Network Fabric Model. The options are Routed VLAN fabric, Leaf-Spine VXLAN underlay, or EVPN-VXLAN fabric. This decision needs to be made in coordination with the network team. The fabric model affects how VLANs are extended across the environment and how the NSX overlay integrates with the underlay. EVPN-VXLAN provides the most flexibility for multi-site and stretched cluster scenarios.

Phase 8: Optional Workload Capabilities

Three decisions covering VCF Edge and Private AI Foundation, all conditional on Phase 1 scope. For VCF Edge, single-host is suitable for small remote sites where HA is not required. Three-host provides local HA at the edge.

For Private AI Foundation, the compute model selection depends heavily on the type of workloads. Training workloads typically want full GPU passthrough or MIG. Inference workloads can often share via vGPU.

Phase 9: Closeout

Two workflow tasks, not configuration decisions. First, reconcile every decision made in Phases 1 through 8 against the Broadcom VCF Design Library to confirm alignment with supported patterns. Second, translate the finalized design into the VCF Planning and Preparation Workbook, which is the actual input consumed by the VCF Installer during bring-up. A clean design that does not translate into a properly completed workbook will cause bring-up failures. Budget time for this step.
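Before starting the workbook, it helps to confirm that no decision was left blank. The following is a minimal sketch assuming the design is tracked as a plain mapping from decision name to chosen option. The `unresolved` helper and the sample values are hypothetical, and this does not represent the workbook format itself.

```python
def unresolved(required: list, design: dict) -> list:
    """Return required decisions with no recorded choice, in the given order."""
    return [name for name in required if not design.get(name, "").strip()]


# Three decision names from the index, with one left unanswered.
required = ["Design Blueprint", "VCF Domain Model", "Storage Model"]
design = {"Design Blueprint": "single site", "Storage Model": "vSAN ESA"}

missing = unresolved(required, design)
```

Anything returned here is a gap to close before the design is translated into the workbook.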

The Full Decision Index

| Step | Phase | Decision |
| --- | --- | --- |
| 1 | Phase 1 | Design Blueprint |
| 2 | Phase 1 | Scope and Use Cases |
| 3 | Phase 2 | VCF Fleet Deployment Model |
| 4 | Phase 2 | VCF Fleet Sizing Model |
| 5 | Phase 3 | VCF Automation Model |
| 6 | Phase 3 | vSphere Supervisor Model |
| 7 | Phase 3 | Network Consumption Model |
| 8 | Phase 3 | Workload Connectivity Model |
| 9 | Phase 3 | Load Balancer Model |
| 10 | Phase 4 | VCF Management Services Model |
| 11 | Phase 4 | VCF Management Network Model |
| 12 | Phase 4 | VCF Operations Model |
| 13 | Phase 4 | Log Management Model |
| 14 | Phase 4 | VCF Operations for Networks Model |
| 15 | Phase 4 | VCF Recovery Option |
| 16 | Phase 5 | Identity Broker Model |
| 17 | Phase 5 | VCF Single Sign-On Model |
| 18 | Phase 5 | Lateral Security with vDefend |
| 19 | Phase 6 | VCF Domain Model |
| 20 | Phase 6 | vSphere Cluster Model |
| 21 | Phase 6 | Distributed Switch Model |
| 22 | Phase 6 | Storage Model |
| 23 | Phase 6 | NSX Manager and Control Plane Model |
| 24 | Phase 6 | NSX Edge Cluster Model |
| 25 | Phase 6 | Virtual Network Appliance Cluster Model |
| 26 | Phase 7 | Network Fabric Model |
| 27 | Phase 8 | VCF Edge Model |
| 28 | Phase 8 | Private AI Foundation Platform Model |
| 29 | Phase 8 | Private AI Foundation Compute Model |
| 30 | Phase 9 | Reconcile Against Broadcom Design Library |
| 31 | Phase 9 | Produce the Planning and Preparation Workbook |
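If you want to track the index in a script, it can be held as a plain dictionary keyed by phase. The sketch below copies the decision names from the index above and tallies them, which also confirms the per-phase counts quoted earlier in this post. The `PHASES` structure is an illustrative choice, not a format used by any VCF tool.

```python
# Decision names per phase, copied from the index above.
PHASES = {
    1: ["Design Blueprint", "Scope and Use Cases"],
    2: ["VCF Fleet Deployment Model", "VCF Fleet Sizing Model"],
    3: ["VCF Automation Model", "vSphere Supervisor Model",
        "Network Consumption Model", "Workload Connectivity Model",
        "Load Balancer Model"],
    4: ["VCF Management Services Model", "VCF Management Network Model",
        "VCF Operations Model", "Log Management Model",
        "VCF Operations for Networks Model", "VCF Recovery Option"],
    5: ["Identity Broker Model", "VCF Single Sign-On Model",
        "Lateral Security with vDefend"],
    6: ["VCF Domain Model", "vSphere Cluster Model",
        "Distributed Switch Model", "Storage Model",
        "NSX Manager and Control Plane Model", "NSX Edge Cluster Model",
        "Virtual Network Appliance Cluster Model"],
    7: ["Network Fabric Model"],
    8: ["VCF Edge Model", "Private AI Foundation Platform Model",
        "Private AI Foundation Compute Model"],
    9: ["Reconcile Against Broadcom Design Library",
        "Produce the Planning and Preparation Workbook"],
}

# Number of decisions in each phase, and the overall total.
counts = {phase: len(names) for phase, names in PHASES.items()}
total = sum(counts.values())
```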
