Most Azure environments don’t fail because of a bad cloud service choice. They fail because nobody built a foundation first. One team creates a resource group called test-rg that becomes production 18 months later. Another ships without tags and cannot attribute 40% of the monthly bill. A third grants Owner-level access to everyone “just to get things working.” These aren’t isolated mistakes — they are the predictable outcome of deploying workloads before designing the platform that hosts them.
As organizations move serious workloads to Azure, ad-hoc environments become a liability: inconsistent security posture, untraceable costs, and no repeatable process for onboarding new teams. An Azure Landing Zone solves this by establishing a governed, scalable platform before workloads arrive, not after the damage is done.
This guide details what an Azure Landing Zone is and the specific problems it prevents. You will learn how Microsoft’s Cloud Adoption Framework (CAF) structures the eight design areas every landing zone must address, the difference between platform and application landing zones, how to choose between Terraform AVM, Bicep AVM, and the ALZ Accelerator, and how this series maps each design area to a hands-on, portfolio-grade project.
This article is the pillar of a 10-part series. It covers the architecture and the core decisions. Each linked cluster article covers one design area end-to-end with working IaC code in both Terraform and Bicep.
What Is an Azure Landing Zone?
The Problem Without One
Azure sprawl follows a consistent pattern. A proof-of-concept subscription gets extended. A temporary resource group outlives its project. An exception to the naming convention becomes the naming convention. Within 18 months, the environment is ungovernable: multiple teams have invented their own security approaches, naming conventions conflict, and cost attribution has collapsed because tagging was optional at the start and nobody enforced it retroactively.
A cloud audit in this state typically surfaces the same findings: storage accounts with public access enabled, virtual machines without diagnostic logging, resource groups owned by employees who left the company, and a monthly bill that no one can break down by team or application. Fixing these problems after the fact requires negotiating with every team that owns affected resources — a process that takes quarters, not weeks.
What a Landing Zone Provides
A landing zone is not a product you install — it is a set of design decisions, encoded in IaC and policy, applied before workloads arrive. The CAF defines it as the output of a multi-subscription Azure environment that accounts for scale, security governance, networking, and identity.
In practice, it means every team that deploys into a landing zone inherits the same baseline automatically: hub networking with centralized egress, identity constraints enforced by policy, mandatory tagging with budget alerts, and centralized logging without per-resource configuration. The platform team defines the baseline once. Workload teams receive it by default.
The separation is explicit: platform concerns (shared services, guardrails, centralized tooling) live in platform subscriptions managed by the platform team. Workload concerns (application code, databases, APIs) live in application subscriptions managed by product teams. Neither group needs to understand the other’s implementation details to perform their tasks.
Platform Landing Zones vs. Application Landing Zones
The Two-Layer Model
The standard ALZ architecture uses two distinct layers.
The platform landing zone is owned by the platform team. It contains shared services: the hub virtual network, Azure Firewall, Azure Bastion, private DNS zones, the centralized Log Analytics Workspace, and the policy definitions and assignments that govern the entire environment. Product teams do not deploy into these subscriptions; they use services made available through VNet peering and policy inheritance.
Application landing zones are spoke subscriptions. Each workload team owns one or more. They receive networking connectivity from the hub through VNet peering, inherit security and tagging policies from the management group hierarchy, and access centralized monitoring through diagnostic settings enforced by policy. Within those guardrails, application teams own their resources and operate at their own deployment pace.
The platform team acts as a product team. Their product is the application landing zone — the pre-configured subscription environment that product teams receive when they deploy a new workload. The platform team defines its configuration, owns the provisioning workflow, and maintains its guardrails over time.
| Responsibility | Platform Team | Application Team |
|---|---|---|
| Hub VNet, Firewall, Bastion | ✓ | |
| Private DNS Zones | ✓ | |
| Policy definitions and assignments | ✓ | |
| Management group hierarchy | ✓ | |
| Log Analytics Workspace | ✓ | |
| Spoke VNet (within IP plan) | ✓ provision | ✓ operate |
| Application resources | | ✓ |
| Application-level RBAC | | ✓ |
| Workload monitoring dashboards | | ✓ |
Why the Separation Matters
When platform changes — firewall rule updates, new policy assignments, DNS zone additions — are confined to platform subscriptions, workload teams remain unaffected. The platform team can update Azure Firewall rules or reconfigure DNS without coordinating with dozens of product teams.
Product teams can deploy application resources, update spoke networks within allocated IP ranges, and manage their own RBAC without platform team approval. They operate fast within guardrails they never had to configure.
Security and compliance posture is enforced at the platform layer. A product team cannot disable diagnostic logging — the policy that enables it has a DeployIfNotExists effect and runs remediation tasks automatically. They cannot create a storage account with public access unless an explicit policy exemption exists for their subscription.
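As a concrete illustration of a platform-layer guardrail, the sketch below assigns a built-in "disallow public storage access" policy with a Deny effect at a management group using Terraform. The management group ID and the built-in policy's display name are assumptions — verify them against your tenant before use.

```hcl
# Look up the built-in definition by display name (display name is an
# assumption; confirm it against your tenant's built-in policy catalog).
data "azurerm_policy_definition" "deny_public_storage" {
  display_name = "Storage account public access should be disallowed"
}

resource "azurerm_management_group_policy_assignment" "deny_public_storage" {
  name                 = "deny-public-storage"
  management_group_id  = "/providers/Microsoft.Management/managementGroups/contoso-landingzones" # illustrative
  policy_definition_id = data.azurerm_policy_definition.deny_public_storage.id

  # Deny rejects non-compliant requests at the ARM API before any resource
  # is created; no cleanup or remediation is ever needed.
  parameters = jsonencode({
    effect = { value = "Deny" }
  })
}
```

Because the assignment lives at the management group, every current and future subscription beneath it inherits the rule with no per-team action.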
The Eight CAF Design Areas
Microsoft’s Cloud Adoption Framework organizes landing zone decisions into eight design areas. Each one addresses a specific failure mode. Skipping one creates a gap that is expensive to fix later.
Azure Billing and Entra Tenant
Everything in Azure sits inside an Entra tenant. Before designing management groups or writing IaC, you must identify which Azure billing agreement your organization uses — Enterprise Agreement (EA), Microsoft Customer Agreement (MCA), or Cloud Solution Provider (CSP) — as this determines the APIs available for programmatic subscription creation. Subscription vending automation, covered in Post 5, requires the correct billing scope and enrollment account.
Identity and Access Management
Azure identity uses two separate systems. Entra ID roles govern tenant-level services (application creation, directory objects), while Azure RBAC governs Azure resources (VMs, storage, networking). Applying the principle of least privilege at the management group level ensures permissions flow down to every subscription in that group automatically.
Service principals with static client secrets are a liability. Workload Identity Federation with OIDC eliminates the credential rotation problem: GitHub Actions and Azure DevOps pipelines authenticate through short-lived tokens issued by the identity provider, with no secrets stored in the pipeline.
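A minimal Terraform sketch of that federation, using the `azuread` provider: an application plus a federated credential trusting GitHub's OIDC issuer for one repository branch. The repo path and names are illustrative, and attribute names follow azuread provider v3 — check them against your pinned provider version.

```hcl
# Illustrative app registration for pipeline deployments.
resource "azuread_application" "github_deploy" {
  display_name = "sp-github-platform-deploy"
}

resource "azuread_service_principal" "github_deploy" {
  client_id = azuread_application.github_deploy.client_id
}

# Federated credential: GitHub's OIDC issuer exchanges a short-lived token
# at run time, so no client secret is ever stored in the pipeline.
resource "azuread_application_federated_identity_credential" "github_main" {
  application_id = azuread_application.github_deploy.id
  display_name   = "github-main-branch"
  audiences      = ["api://AzureADTokenExchange"]
  issuer         = "https://token.actions.githubusercontent.com"
  subject        = "repo:contoso/platform-iac:ref:refs/heads/main" # illustrative repo
}
```

RBAC roles are then assigned to this service principal like any other identity; only the credential mechanism changes.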
Post 3: Identity and Access Architecture covers RBAC role assignments, Privileged Identity Management (PIM) for just-in-time access, and OIDC configuration.
Management Group and Subscription Organization
The management group hierarchy determines where policies land and what every subscription inherits. The recommended ALZ hierarchy is:
```
Tenant Root Group
├── Platform
│   ├── Management
│   ├── Connectivity
│   └── Identity
├── Landing Zones
│   ├── Corp
│   └── Online
├── Sandbox
└── Decommissioned
```
Policies assigned at Landing Zones apply to every subscription under Corp and Online. The Sandbox group is less restrictive, allowing developers to experiment without triggering unnecessary compliance alerts. The Decommissioned group isolates subscriptions during workload shutdown to prevent accidental resource creation.
Management group depth has a hard limit of six levels below the tenant root. Most organizations fit comfortably in three to four levels.
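A minimal slice of that hierarchy in native Terraform shows the pattern — in practice the ALZ pattern module builds the whole tree for you, so treat this as a sketch with illustrative names.

```hcl
# Omitting parent_management_group_id places a group under the tenant root.
resource "azurerm_management_group" "platform" {
  display_name = "Platform"
}

resource "azurerm_management_group" "connectivity" {
  display_name               = "Connectivity"
  parent_management_group_id = azurerm_management_group.platform.id
}

resource "azurerm_management_group" "landing_zones" {
  display_name = "Landing Zones"
}

resource "azurerm_management_group" "corp" {
  display_name               = "Corp"
  parent_management_group_id = azurerm_management_group.landing_zones.id
}
```

Policies assigned to `landing_zones` are inherited by `corp` and any sibling groups added later.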
Post 1: Management Group Hierarchy covers the full hierarchy design and AVM implementations.
Network Topology and Connectivity
Hub-and-spoke is the standard topology for enterprise Azure environments. The platform team owns one hub VNet containing Azure Firewall, Azure Bastion, VPN or ExpressRoute gateways, and private DNS zones. Each product team’s spoke VNet peers to the hub and routes traffic through Azure Firewall for inspection.
IP address space planning is difficult to reverse. Overlapping address spaces can never be peered, and re-addressing a VNet that already hosts workloads is disruptive. Plan for three to five years of growth before assigning ranges.
- Hub: `10.0.0.0/16`
- Corp spokes: `10.1.0.0/16` through `10.19.0.0/16`
- Online spokes: `10.20.0.0/16` through `10.39.0.0/16`
Private DNS zones in the hub handle name resolution for private endpoints across all spoke VNets. Each VNet supports up to 500 peerings by default, and private endpoints scale to 1,000 per VNet.
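The IP plan above can be encoded once in Terraform so spoke ranges are derived rather than hand-typed. This is a hypothetical helper, not part of any AVM module; it uses the built-in `cidrsubnet` function.

```hcl
# Derive each spoke /16 from the 10.0.0.0/8 supernet:
# cidrsubnet("10.0.0.0/8", 8, n) yields 10.n.0.0/16.
locals {
  hub_cidr = "10.0.0.0/16"

  # corp spokes occupy 10.1.0.0/16 .. 10.19.0.0/16
  corp_spoke_cidrs = { for i in range(1, 20) : "corp-${i}" => cidrsubnet("10.0.0.0/8", 8, i) }

  # online spokes occupy 10.20.0.0/16 .. 10.39.0.0/16
  online_spoke_cidrs = { for i in range(20, 40) : "online-${i}" => cidrsubnet("10.0.0.0/8", 8, i) }
}

output "corp_spoke_1" {
  value = local.corp_spoke_cidrs["corp-1"] # 10.1.0.0/16
}
```

Feeding these locals into the spoke VNet modules keeps allocated ranges from ever drifting out of the plan.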
Post 2: Hub-and-Spoke Networking covers hub VNet deployment, firewall policy, and spoke provisioning.
Security
Defender for Cloud provides Cloud Security Posture Management (CSPM) across all subscriptions. The Secure Score provides a baseline for security posture before workloads arrive. Enable the foundational CSPM tier on every subscription on day one.
Microsoft Sentinel acts as the SIEM/SOAR layer, ingesting logs from the platform Log Analytics Workspace. Security baselines — such as CIS, NIST, and the Microsoft Cloud Security Benchmark — are applied as Azure Policy initiatives at the management group level.
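Day-one CSPM enablement can also live in IaC. A sketch, assuming the `azurerm_security_center_subscription_pricing` resource and its `CloudPosture` resource type as documented by the azurerm provider:

```hcl
# Enables the free foundational CSPM tier on the current subscription;
# in a vending pipeline this would run once per new subscription.
resource "azurerm_security_center_subscription_pricing" "cspm" {
  tier          = "Free"
  resource_type = "CloudPosture"
}
```

Upgrading `tier` to `"Standard"` later enables the paid Defender CSPM capabilities without changing the deployment pattern.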
Post 7: Security Baseline covers multi-subscription enablement and compliance framework assignment.
Governance
Azure Policy is the primary enforcement layer. It supports three critical effects:
- Audit: Flags non-compliant resources without blocking them.
- Deny: Blocks non-compliant resource creation at the API level.
- DeployIfNotExists: Automatically deploys companion resources, such as diagnostic settings or tags.
Policy-as-code ensures every definition, assignment, and exemption lives in version control and passes through a pull request before application.
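A small policy-as-code example: a custom Audit definition, kept in version control, that flags any resource missing a cost-center tag. The names and management group ID are illustrative.

```hcl
resource "azurerm_policy_definition" "audit_cost_center_tag" {
  name                = "audit-cost-center-tag"
  display_name        = "Audit resources missing the cost-center tag"
  policy_type         = "Custom"
  mode                = "Indexed" # only evaluates resource types that support tags
  management_group_id = "/providers/Microsoft.Management/managementGroups/contoso" # illustrative

  # The rule fires when the tag does not exist; Audit flags without blocking.
  policy_rule = jsonencode({
    if = {
      field  = "tags['cost-center']"
      exists = "false"
    }
    then = {
      effect = "Audit"
    }
  })
}
```

Promoting the effect from `Audit` to `Deny` later is a one-line pull request, which is exactly the review trail policy-as-code is meant to provide.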
Post 4: Governance at Scale covers policy definition structure and remediation automation.
Management and Monitoring
A centralized Log Analytics Workspace collects logs from across the environment. A DeployIfNotExists policy automatically configures diagnostic settings on new resources, ensuring no manual intervention is required.
Log flow for a production landing zone includes:
- Azure Activity Log
- Azure Firewall Logs
- Entra ID Sign-in Logs
- Defender for Cloud Alerts
Retention of 90 days satisfies most compliance frameworks; 180 days covers more demanding standards.
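The platform workspace itself is a few lines of Terraform. Resource names and the referenced resource group are illustrative.

```hcl
# Centralized platform workspace; raise retention_in_days to 180 for
# stricter compliance frameworks.
resource "azurerm_log_analytics_workspace" "platform" {
  name                = "law-platform-prod-001"
  resource_group_name = azurerm_resource_group.management.name # assumed to exist
  location            = var.location
  sku                 = "PerGB2018"
  retention_in_days   = 90
}
```

Workload resources then point at this workspace through the policy-deployed diagnostic settings rather than per-team configuration.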
Post 6: Centralized Monitoring covers workspace deployment and Monitor Workbook configuration.
Platform Automation and DevOps
Landing zone IaC lives in a Git repository with a CI/CD pipeline. Subscription vending is the highest-impact automation a platform team can build. Instead of manual configuration, product teams open a pull request against a subscription request file. The pipeline creates the subscription, places it in the correct management group, provisions the spoke VNet, and assigns RBAC roles.
Azure Verified Modules (AVM) provide pre-built, tested Terraform and Bicep modules for major Azure resource types. Using AVM reduces boilerplate and ensures resources follow Azure’s recommended practices.
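A vending request can be expressed as a single module call. This sketch uses the `Azure/lz-vending/azurerm` module; the input names below are taken from that module's documented interface and should be checked against the version you pin, and all values are illustrative.

```hcl
module "lz_vending" {
  source  = "Azure/lz-vending/azurerm"
  version = "~> 4.0"

  location = "westeurope"

  # Create the subscription via a subscription alias at the billing scope.
  subscription_alias_enabled = true
  subscription_alias_name    = "sub-corp-payments-prod"
  subscription_display_name  = "sub-corp-payments-prod"
  subscription_billing_scope = var.billing_scope # EA/MCA billing scope ID
  subscription_workload      = "Production"

  # Land the new subscription under the Corp management group.
  subscription_management_group_association_enabled = true
  subscription_management_group_id                  = "corp"
}
```

In a vending pipeline, the pull request edits a small request file that feeds these inputs, and the pipeline applies the module.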
Post 5: Subscription Vending and Post 8: CI/CD Pipeline cover the automation patterns.
IaC Tooling: Terraform AVM vs. Bicep AVM vs. ALZ Accelerator
Azure Verified Modules
AVM modules are Microsoft-owned, publicly maintained Terraform and Bicep modules. Microsoft retired legacy approaches in 2026, including the terraform-azurerm-caf-enterprise-scale and classic ALZ-Bicep modules. All new implementations should use AVM modules directly.
Terraform AVM example (virtual network):
```hcl
module "spoke_vnet" {
  source  = "Azure/avm-res-network-virtualnetwork/azurerm"
  version = "~> 1.0"

  name                = "vnet-spoke-prod-001"
  resource_group_name = azurerm_resource_group.spoke.name
  location            = var.location
  address_space       = ["10.1.0.0/16"]

  subnets = {
    app = {
      name             = "snet-app"
      address_prefixes = ["10.1.1.0/24"]
    }
  }

  diagnostic_settings = {
    to_law = {
      name                  = "diag-to-platform-law"
      workspace_resource_id = var.log_analytics_workspace_id
    }
  }
}
```
Bicep AVM equivalent:
```bicep
module spokeVnet 'br/public:avm/res/network/virtual-network:0.8.0' = {
  name: 'spokeVnetDeploy'
  params: {
    name: 'vnet-spoke-prod-001'
    location: location
    addressPrefixes: ['10.1.0.0/16']
    subnets: [
      {
        name: 'snet-app'
        addressPrefix: '10.1.1.0/24'
      }
    ]
    diagnosticSettings: [
      {
        name: 'diag-to-platform-law'
        workspaceResourceId: logAnalyticsWorkspaceId
      }
    ]
  }
}
```
Tooling Comparison
Terraform requires a state backend, typically an Azure Storage account, which adds minor operational overhead but offers precise control and multi-cloud portability.
Bicep has no state file, using Azure’s deployment history as the source of truth. Deployment Stacks (GA in 2025) add full lifecycle management, allowing the stack to delete resources removed from a template.
| Factor | Terraform AVM | Bicep AVM |
|---|---|---|
| State management | Required — Azure Storage | None — ARM history |
| Native ARM integration | Indirect (azurerm provider) | Direct (native) |
| Ecosystem tooling | Terratest, Infracost | Azure CLI, Deployment Stacks |
| Learning curve | Medium | Low |
Hands-On: Deploying the Landing Zone Scaffold
This example deploys the management group hierarchy and platform subscription structure using the ALZ Accelerator.
Prerequisites:
- Azure CLI ≥ 2.80
- PowerShell 7.4+
- Terraform ≥ 1.10 or Bicep CLI ≥ 0.40
- Global Administrator access for initial management group creation
Step 1: Install the ALZ Accelerator module
```powershell
# Install from PowerShell Gallery
Install-Module -Name ALZ -Force -Scope CurrentUser

# Verify version (7.0.1+ recommended for 2026)
Get-Module -Name ALZ -ListAvailable | Select-Object Name, Version
```
Step 2: Run the interactive bootstrap
```powershell
# Generates a starter repository in C:\Source\MyALZ
New-ALZEnvironment -Path "C:\Source\MyALZ" -DeploymentStrategy "GitHubActions"
```
Step 3: Review the generated management group module (Terraform)
```hcl
# Generated: modules/management_groups/main.tf
# NOTE: Replace 'sttfstateplatform001' with your pre-created storage account name.
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }

  backend "azurerm" {
    resource_group_name  = "rg-tfstate-platform-001"
    storage_account_name = "sttfstateplatform001"
    container_name       = "tfstate"
    key                  = "management-groups.tfstate"
  }
}

module "management_groups" {
  source  = "Azure/avm-ptn-alz/azurerm"
  version = "~> 1.0"

  management_group_name = "contoso"
  enable_telemetry      = false
}
```
Best Practices and Common Mistakes
Design the management group hierarchy before writing IaC. Changing it later requires moving subscriptions, which triggers immediate compliance scans and requires updating policy references across all teams.
Assign Azure Policy at the management group level, not the subscription level. Subscription-scoped assignments do not scale. Management group assignments flow down automatically to present and future subscriptions.
Use OIDC federated identity for all pipeline service connections. Static secrets expire and risk accidental exposure. OIDC tokens are short-lived and issued per workflow run.
Enable Defender for Cloud Foundational CSPM immediately. The free tier costs nothing and establishes a Secure Score baseline before workloads arrive.
Troubleshooting Common Issues
“Management group creation fails with AuthorizationFailed”
The deploying identity needs Management Group Contributor on the Tenant Root Management Group (/). Note that even a Global Administrator cannot assign roles at the / scope until they elevate access in Entra ID (Properties → Access management for Azure resources).
```shell
# Run as Global Administrator (with elevated access)
az role assignment create \
  --role "Management Group Contributor" \
  --scope "/" \
  --assignee "<service-principal-object-id>"
```
“Policy DeployIfNotExists remediation task is stuck in ‘Evaluating’”
The managed identity for the policy assignment often lacks sufficient RBAC to deploy resources in target subscriptions.
```powershell
# Get the managed identity principal ID for the assignment
$principalId = az policy assignment show `
  --name "diag-settings-initiative" `
  --scope "/providers/Microsoft.Management/managementGroups/contoso-landingzones" `
  --query "identity.principalId" -o tsv

# Grant Contributor to the policy managed identity at the management group scope
az role assignment create `
  --role "Contributor" `
  --scope "/providers/Microsoft.Management/managementGroups/contoso-landingzones" `
  --assignee $principalId
```
Key Takeaways
An Azure Landing Zone is a set of governed design decisions encoded in IaC. The cost of building it up front is lower than retrofitting governance onto a sprawling environment later.
Start with Post 1: Design Your Azure Management Group and Subscription Hierarchy — every subsequent design area depends on this hierarchy. Before proceeding, run the ALZ Accelerator bootstrap to generate a production-quality repository.
