Cloud & Infrastructure
Virtualization and Cloud
Private vs. public cloud, VMware vs. Azure/AWS/GCP, and where each workload belongs.
The Hypervisor
A hypervisor is the software layer that makes virtualization work. It sits above physical hardware and allocates CPU, RAM, storage, and network resources to virtual machines (VMs). From each VM's perspective, it's running on dedicated hardware — the abstraction is complete.
Two types:
Type 1 (bare metal): Runs directly on hardware, with no underlying OS. VMware ESXi, Microsoft Hyper-V, Xen. More efficient than a hosted hypervisor and the standard choice for production environments.
Type 2 (hosted): Runs as an application on top of an existing OS. VMware Fusion, VirtualBox, Parallels. Convenient for development and testing; not typically used for production infrastructure.
VMware: On-Premises Virtualization
VMware vSphere (ESXi + vCenter) is the dominant enterprise on-premises virtualization platform. It provides the full stack: hypervisor, management console, live VM migration between hosts (vMotion), high availability, distributed resource scheduling, and storage virtualization.
Where VMware does best:
- Workloads that must remain on-premises for compliance, latency, or licensing reasons
- Predictable performance with full hardware control
- Environments with significant Dell/HPE/Cisco infrastructure already in place
- Workloads where software licensing prohibits or penalizes cloud deployment
The post-Broadcom reality: Broadcom's acquisition of VMware, completed in November 2023, changed the economics significantly. Perpetual licenses are no longer sold, and subscription costs are considerably higher for many organizations. This is accelerating cloud migration conversations for customers who previously had no financial incentive to move.
AWS, Azure, and GCP: Public Cloud
The hyperscalers offer virtualization as a service. You rent compute rather than own hardware — no capital expense, no hardware refresh cycles, elastic scale.
AWS: Most mature platform, broadest service portfolio. Right for complex application architectures and teams with AWS expertise.
Azure: Best integration with Microsoft-centric environments — Active Directory, SQL Server, .NET. Microsoft licensing incentives often favor running Windows workloads in Azure.
GCP: Best for data analytics, machine learning, and containerized workloads. Kubernetes originated at Google; GKE remains the benchmark managed Kubernetes offering.
The Real Decision: Workload by Workload
The on-premises vs. cloud debate is often framed as binary. It isn't. The right question is: where does this specific workload run best?
On-premises is right when:
- The workload is latency-sensitive (direct-attached storage for CAD/video, high-frequency queries)
- Compliance requires data residency in a specific location
- The workload is stable and predictable: owned on-premises capacity is often cheaper than cloud for steady-state compute
- You already have capital invested in hardware, and software licensed to run on it
Public cloud is right when:
- You need to scale rapidly and unpredictably
- Geographic redundancy and disaster recovery are requirements
- The workload is variable — cloud pay-per-use makes sense
- You want to eliminate hardware refresh cycles and capital expense
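The steady-versus-variable distinction in the two lists comes down to simple break-even arithmetic. The sketch below is illustrative only: the dollar figures are hypothetical placeholders, not quotes from any vendor, and the function names are made up for this example. Substitute your own amortized on-premises cost and cloud hourly rate.

```python
def monthly_cloud_cost(hourly_rate: float, hours_used: float) -> float:
    """Pay-per-use: cost scales with the hours the workload actually runs."""
    return hourly_rate * hours_used


def break_even_hours(on_prem_monthly: float, hourly_rate: float) -> float:
    """Hours per month above which the fixed on-premises cost is the cheaper option."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return on_prem_monthly / hourly_rate


# Hypothetical figures, for illustration only.
ON_PREM_MONTHLY = 150.0  # amortized hardware, power, and licensing per server
CLOUD_HOURLY = 0.40      # on-demand rate for a comparable cloud instance
HOURS_IN_MONTH = 730     # approximate hours in an average month

threshold = break_even_hours(ON_PREM_MONTHLY, CLOUD_HOURLY)
print(f"Break-even at about {threshold:.0f} hours per month")

for label, hours in [("steady, always on", HOURS_IN_MONTH), ("bursty, 60 hours", 60)]:
    cloud = monthly_cloud_cost(CLOUD_HOURLY, hours)
    cheaper = "on-premises" if hours > threshold else "cloud"
    print(f"{label}: cloud ${cloud:.2f} vs on-premises ${ON_PREM_MONTHLY:.2f} -> {cheaper}")
```

With these placeholder numbers, an always-on workload favors owned capacity and a workload that runs a few dozen hours a month favors pay-per-use. Real comparisons also need staff time, facilities, and egress charges, which this sketch leaves out.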
Many organizations end up hybrid: on-premises for core infrastructure, latency-sensitive workloads, and predictable capacity — cloud for burst capacity, DR, and applications that benefit from managed services.
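As a rough illustration of deciding workload by workload, the criteria above can be written down as a first-pass heuristic. This is a sketch, not a decision engine: the `Workload` fields, the `place` function, and the rule ordering are assumptions made for the example, and a real assessment would also weigh cost, licensing, and staffing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a workload, using the criteria from the lists above."""
    name: str
    latency_sensitive: bool = False
    data_residency_required: bool = False
    steady_state: bool = False
    needs_rapid_scaling: bool = False
    needs_geo_redundancy: bool = False


def place(w: Workload) -> str:
    """Return 'on-premises' or 'cloud' as a first-pass suggestion."""
    # Assumption: latency and residency are treated as hard constraints and win first.
    if w.latency_sensitive or w.data_residency_required:
        return "on-premises"
    # Elastic scale and geographic redundancy are the strongest pulls toward cloud.
    if w.needs_rapid_scaling or w.needs_geo_redundancy:
        return "cloud"
    # Otherwise fall back on the cost argument: steady stays, variable goes.
    return "on-premises" if w.steady_state else "cloud"


workloads = [
    Workload("cad-file-server", latency_sensitive=True, steady_state=True),
    Workload("campaign-website", needs_rapid_scaling=True),
    Workload("dr-replicas", needs_geo_redundancy=True),
    Workload("internal-erp", steady_state=True),
]

for w in workloads:
    print(f"{w.name}: {place(w)}")
```

Run against a whole inventory, a heuristic like this tends to produce exactly the hybrid split described above, which is the point: the answer is per workload, not per organization.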
Azure Site Recovery is worth mentioning specifically: it replicates on-premises VMs to Azure, providing geographically redundant disaster recovery for roughly $25/VM/month plus storage costs. For organizations that run VMware on-premises and want to reduce their footprint, it provides DR with failover to Azure and failback to on-premises, geo-redundancy, and a credible exit ramp from on-premises, at a cost that's difficult to match with dedicated hardware.
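To turn that per-VM figure into a budget line, a back-of-the-envelope estimate is enough. The sketch below uses the rough $25/VM/month figure quoted above; the storage rate is a hypothetical placeholder passed in as a parameter, since actual storage pricing depends on region, tier, and redundancy. Check current Azure pricing before relying on any of these numbers.

```python
ASR_PER_VM_MONTHLY = 25.0  # rough per-VM figure quoted in the text above


def asr_monthly_cost(vm_count: int, replicated_gb: float, storage_per_gb_monthly: float) -> float:
    """Estimate monthly DR cost: per-VM protection fee plus replicated storage."""
    if vm_count < 0 or replicated_gb < 0 or storage_per_gb_monthly < 0:
        raise ValueError("inputs must be non-negative")
    protection = vm_count * ASR_PER_VM_MONTHLY
    storage = replicated_gb * storage_per_gb_monthly
    return protection + storage


# Hypothetical environment: 20 VMs, 4 TB replicated, assumed $0.05 per GB per month.
estimate = asr_monthly_cost(vm_count=20, replicated_gb=4000, storage_per_gb_monthly=0.05)
print(f"Estimated DR cost: ${estimate:.2f} per month")
```

The estimate covers steady-state replication only. Compute charges incurred during an actual failover or a DR test are not included.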
GPU VMs, Local AI, and Agentic Workloads
One of the most interesting emerging use cases for both on-premises and cloud VMs is AI inference. GPU-equipped VMs, available from all three major cloud providers and deployable on-premises with NVIDIA hardware, can run large language models inside infrastructure you control, without sending data to external APIs. For organizations with data privacy requirements, or that want to run automated AI workflows at scale, self-hosted inference can change the economics by trading per-request API fees for a fixed compute cost.
Separately: tools like Claude Code make VM capacity significantly more useful for development and automation. Spinning up a sandboxed VM, having an agent build and test something inside it, then discarding the VM is a workflow that requires available compute capacity — and rewards organizations that have it.
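The create, use, discard pattern described above can be sketched as a context manager that guarantees cleanup. Everything here is hypothetical: `VmProvider` is an invented interface, not any vendor's API, and `FakeProvider` is an in-memory stand-in so the example runs without real infrastructure. In practice the three methods would wrap whatever your virtualization platform or cloud SDK actually exposes.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol


class VmProvider(Protocol):
    """Hypothetical interface; real platforms expose their own APIs."""
    def create(self, image: str) -> str: ...
    def run(self, vm_id: str, command: str) -> int: ...
    def destroy(self, vm_id: str) -> None: ...


class FakeProvider:
    """In-memory stand-in that only tracks which sandbox IDs are active."""
    def __init__(self) -> None:
        self.active: set[str] = set()
        self._counter = 0

    def create(self, image: str) -> str:
        self._counter += 1
        vm_id = f"{image}-{self._counter}"
        self.active.add(vm_id)
        return vm_id

    def run(self, vm_id: str, command: str) -> int:
        if vm_id not in self.active:
            raise RuntimeError(f"{vm_id} is not running")
        return 0  # the fake pretends every command succeeds

    def destroy(self, vm_id: str) -> None:
        self.active.discard(vm_id)


@contextmanager
def sandbox(provider: VmProvider, image: str) -> Iterator[str]:
    """Create a throwaway VM and always destroy it, even if the work inside fails."""
    vm_id = provider.create(image)
    try:
        yield vm_id
    finally:
        provider.destroy(vm_id)


provider = FakeProvider()
with sandbox(provider, "build-image") as vm:
    exit_code = provider.run(vm, "make test")

print(f"exit code: {exit_code}, sandboxes still running: {len(provider.active)}")
```

The `finally` clause is the important design choice: an agent's build or test step can fail in arbitrary ways, and the sandbox should be discarded regardless so that capacity is returned to the pool.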
For real-world examples of infrastructure decisions, see our case studies.
