Azure Landing Zone Day-2 Ops: Maintenance and Evolution


Deploying your landing zone was the easy part. Now you must operate it.

A common failure in platform engineering is treating the landing zone as a finished project rather than an ongoing product. Azure releases new services, requirements change, and teams drift from standards. Without an operational strategy that anticipates these pressures, your foundation becomes a collection of special cases and undocumented changes.

By the end of this guide, you will:

  • Automate drift detection with scheduled GitHub Actions.
  • Remediate non-compliance via Policy without manual intervention.
  • Migrate from legacy modules to Azure Verified Modules (AVM) safely.
  • Implement quarterly identity and networking reviews.

This is Post 10 in the Azure Platform Engineering series.


Managing Configuration Drift

  graph TD
    subgraph Continuous_Cycle [Day-2 Operations Lifecycle]
        Deploy[Initial Deployment] --> Audit[Automated Drift Audit]
        Audit --> Detect{Drift Detected?}
        Detect -- Yes --> Remediate[Policy Remediation / IaC Apply]
        Detect -- No --> Evolve[Review & Evolve]
        Remediate --> Audit
        Evolve --> Update[Migrate to AVM / New Services]
        Update --> Audit
    end

    subgraph Tools [Operations Stack]
        GHA[GitHub Actions: Scheduled Scans]
        AzPolicy[Azure Policy: Compliance Store]
        AzPIM[Entra ID: Access Reviews]
    end

    Audit --- GHA
    Detect --- AzPolicy
    Evolve --- AzPIM

    style Continuous_Cycle fill:#f5f5f5,stroke:#9e9e9e
    style Remediate fill:#e8f5e9,stroke:#2e7d32
    style Detect fill:#fff9c4,stroke:#fbc02d

Visual Notes:

  • Continuous Auditing ensures that manual “out-of-band” changes are detected and documented.
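  • Policy Remediation lets the platform self-heal without manual engineering effort. Remediation tasks apply only to policies with deployIfNotExists or modify effects; audit and deny policies report or block, but never fix.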
  • Periodic Evolution (Review/Evolve) incorporates new Azure features and module updates (like AVM) into the established foundation.

Detecting State and Policy Drift

Drift takes two forms: IaC state drift (the gap between code and reality) and Policy drift (compliance violations).

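Terraform Refresh-Only: Detect changes made outside Terraform (for example, manual portal edits) without proposing any infrastructure changes. With -detailed-exitcode, the plan exits with 0 when nothing has drifted, 1 on error, and 2 when drift is found. Set pipefail so that tee does not mask that exit code:

set -o pipefail
terraform plan -refresh-only -detailed-exitcode -no-color 2>&1 | tee drift-output.txt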
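For a scheduled job, the exit code is more useful than the log text. Below is a minimal Bash sketch that wraps the refresh-only plan and turns Terraform's -detailed-exitcode status into a label. The function names check_drift and drift_status are hypothetical, and the sketch assumes terraform init has already run in the working directory.

```shell
#!/usr/bin/env bash
# Sketch: wrap a refresh-only plan for scheduled drift detection.
# Terraform's -detailed-exitcode returns 0 (no changes), 1 (error), or 2 (changes).
set -uo pipefail

# Translate a plan exit status into a short label.
drift_status() {
  case "$1" in
    0) echo "clean" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Run the refresh-only plan, write the full log to $1 (default drift-output.txt),
# print the label, and return 0 (clean), 2 (drift), or 1 (error).
check_drift() {
  local log_file="${1:-drift-output.txt}"
  local rc=0
  terraform plan -refresh-only -detailed-exitcode -no-color >"$log_file" 2>&1 || rc=$?
  drift_status "$rc"
  if [ "$rc" -eq 0 ] || [ "$rc" -eq 2 ]; then
    return "$rc"
  fi
  return 1
}
```

For example, `check_drift drift-output.txt; rc=$?` leaves the log on disk and a status (0, 1, or 2) that a later step can branch on.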

PowerShell Policy Audit: Identify non-compliant resources across your hierarchy:

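# Latest non-compliant policy states for every resource below the management group
$nonCompliant = Get-AzPolicyState -ManagementGroupName "mg-intermediate" -Filter "complianceState eq 'NonCompliant'"
# Count violations per policy definition, worst first (built-in definitions appear as GUIDs)
$nonCompliant | Group-Object -Property PolicyDefinitionName | Sort-Object Count -Descending | Select-Object Name, Count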
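On GitHub-hosted Linux runners, run steps use Bash by default, so a CLI equivalent of the audit can be handy there. This sketch assumes the az CLI is logged in with read access to Policy Insights; list_noncompliant_definitions and summarize_by_definition are hypothetical helper names, and the JMESPath --query avoids a dependency on jq.

```shell
set -uo pipefail

# Print the policy definition name of every non-compliant resource state under
# a management group, one per line. JMESPath (--query) replaces jq here.
list_noncompliant_definitions() {
  local mg="$1"
  az policy state list \
    --management-group "$mg" \
    --filter "complianceState eq 'NonCompliant'" \
    --query "[].policyDefinitionName" \
    --output tsv
}

# Count names read from stdin and print "<count> <name>", most frequent first.
summarize_by_definition() {
  sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

Usage: `list_noncompliant_definitions mg-intermediate | summarize_by_definition`.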

Scheduling Scans in GitHub Actions

Run drift scans weekly and automatically open a GitHub Issue when deviations are found. This ensures drift is triaged during sprint planning rather than accumulating indefinitely.
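A step like the following could run after the scan in a workflow triggered by on: schedule (for example, cron '0 6 * * 1' runs every Monday at 06:00 UTC). It is a sketch that assumes the GitHub CLI is authenticated through GH_TOKEN, the job has issues: write permission, and a label named drift exists in the repository; report_drift is a hypothetical name. To avoid piling up duplicate issues, it comments on an open drift issue when one already exists.

```shell
set -uo pipefail

# Report a drift scan result to GitHub Issues.
# $1: exit status from `terraform plan -detailed-exitcode` (2 means drift)
# $2: path to the plan log captured by the scan step
report_drift() {
  local rc="$1" log_file="$2"
  local label="drift" body existing
  case "$rc" in
    0) echo "no drift to report"; return 0 ;;
    2) ;;
    *) echo "drift scan failed (exit $rc); not filing an issue" >&2; return 1 ;;
  esac

  # Build the issue body; GitHub caps bodies at 65,536 characters, so truncate the log.
  body="$(mktemp)"
  {
    echo "Scheduled drift scan found changes made outside Terraform."
    echo
    echo "~~~"
    head -c 60000 "$log_file"
    echo
    echo "~~~"
  } >"$body"

  # Reuse an open drift issue instead of opening a duplicate every week.
  existing="$(gh issue list --label "$label" --state open --json number --jq '.[0].number // empty')"
  if [ -n "$existing" ]; then
    gh issue comment "$existing" --body-file "$body"
    echo "commented on #$existing"
  else
    gh issue create --title "Landing zone drift detected ($(date -u +%Y-%m-%d))" \
      --label "$label" --body-file "$body"
    echo "opened new issue"
  fi
  rm -f "$body"
}
```

In the workflow step this might be called as `rc=0; terraform plan -refresh-only -detailed-exitcode -no-color > drift-output.txt 2>&1 || rc=$?; report_drift "$rc" drift-output.txt`.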


The Migration Path: Moving to AVM

Safe Refactoring with moved Blocks

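The legacy CAF Terraform module is archived as of August 2026, so migrating to AVM modules is required for continued support. Use a moved block for each resource to remap its state address, so that Terraform records a move instead of destroying and recreating the resource. The from address must match your existing state exactly (list it with terraform state list), and unless the provider supports cross-type moves, both addresses must be the same resource type: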

moved {
  from = module.enterprise_scale.azurerm_management_group.level_1["mg-platform"]
  to   = module.management_groups_avm["mg-platform"].azurerm_management_group.this
}
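Before applying a refactor, it is worth failing fast if the plan still wants to destroy anything, which usually means a moved block points at the wrong address. This Bash sketch scans plain-text plan output for the "N to destroy" count. assert_no_destroy is a hypothetical helper, and parsing human-readable output is a shortcut; running terraform show -json on a saved plan is the sturdier option.

```shell
set -uo pipefail

# Read `terraform plan -no-color` output on stdin and fail if anything would be destroyed.
# "No changes." output has no "Plan:" summary line, which is treated as zero destroys.
assert_no_destroy() {
  local plan_text destroy_count
  plan_text="$(cat)"
  destroy_count="$(grep -oE '[0-9]+ to destroy' <<< "$plan_text" | tail -n1 | awk '{ print $1 }')"
  destroy_count="${destroy_count:-0}"
  if [ "$destroy_count" -gt 0 ]; then
    echo "refusing to apply: plan destroys $destroy_count resource(s); check your moved blocks" >&2
    return 1
  fi
  echo "ok: no resources will be destroyed"
}
```

Usage: `terraform plan -no-color | assert_no_destroy && terraform apply`.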

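Bicep AVM Subnets

When moving to the Bicep AVM virtual network module, match its schema: subnets are passed as an array of objects, and the VNet address space must be supplied through addressPrefixes:

module hubVnet 'br/public:avm/res/network/virtual-network:0.10.0' = {
  name: 'hub-vnet-avm'
  params: {
    name: 'conn-hub-vnet'
    // Required: the VNet address space (10.0.0.0/16 is an example value)
    addressPrefixes: [
      '10.0.0.0/16'
    ]
    // AVM uses an array of objects for subnets
    subnets: [
      { name: 'AzureFirewallSubnet', addressPrefix: '10.0.0.0/26' }
    ]
  }
}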

Governance and Identity Lifecycle

Quarterly RBAC and PIM Reviews

Landing zones accumulate orphaned role assignments for deleted service principals. Run a quarterly audit to find and remove these entries.
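One way to run that audit is to compare role assignment principals against the service principals that still exist. This is a dry-run sketch: it only prints the delete commands for review. It assumes the two input files were exported with the az commands shown in the comments (run once per subscription, since --all covers the current subscription), and find_orphaned_assignments is a hypothetical name.

```shell
set -uo pipefail

# Export inputs (per subscription) before calling the function:
#   az role assignment list --all \
#     --query "[?principalType=='ServicePrincipal'].[id, principalId]" -o tsv > assignments.tsv
#   az ad sp list --all --query "[].id" -o tsv > live-sp-ids.txt
#
# $1: TSV of "<assignmentId>\t<principalId>"
# $2: file of live service principal object IDs, one per line
# Prints a delete command for every assignment whose principal is gone.
find_orphaned_assignments() {
  local assignments_tsv="$1" live_ids="$2"
  # FILENAME check (not NR == FNR) keeps this correct when the live-ID file is empty.
  awk -F '\t' '
    FILENAME == ARGV[1] { live[$1] = 1; next }
    !($2 in live) { print "az role assignment delete --ids " $1 }
  ' "$live_ids" "$assignments_tsv"
}
```

Usage: `find_orphaned_assignments assignments.tsv live-sp-ids.txt > cleanup.sh`, then review cleanup.sh before running it.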

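Automated Access Reviews: Configure an Entra ID access review that automatically removes access a lead does not explicitly approve. Microsoft Graph access reviews target Entra ID objects such as group memberships, and the Graph /subscriptions path refers to change-notification subscriptions rather than Azure subscriptions, so a practical pattern is to grant platform RBAC roles to an Entra ID group and review that group's members. The Deny default only applies to unanswered reviews when defaultDecisionEnabled is true; add a settings.recurrence block if the review should repeat each quarter. Replace the {platform-group-id} and {lead-user-id} placeholders before running:

az rest --method POST \
  --uri "https://graph.microsoft.com/v1.0/identityGovernance/accessReviews/definitions" \
  --body "{
    \"displayName\": \"Quarterly Platform Role Review\",
    \"scope\": {
      \"@odata.type\": \"#microsoft.graph.accessReviewQueryScope\",
      \"query\": \"/groups/{platform-group-id}/transitiveMembers\",
      \"queryType\": \"MicrosoftGraph\"
    },
    \"reviewers\": [
      { \"query\": \"/users/{lead-user-id}\", \"queryType\": \"MicrosoftGraph\" }
    ],
    \"settings\": {
      \"defaultDecisionEnabled\": true,
      \"defaultDecision\": \"Deny\",
      \"autoApplyDecisionsEnabled\": true
    }
  }"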

Networking Evolution

Upgrading Azure Firewall from Standard to Premium is a zero-downtime operation using the “Easy SKU change” method. Premium is required for TLS inspection and IDPS (intrusion detection and prevention system).


Best Practices

  • Audit Before Deny: Set new policies to Audit for 7 days before switching to Deny to avoid blocking active workloads.
  • Batch Remediation: When fixing thousands of resources, batch your remediation tasks by resource group to avoid ARM API throttling.
  • Canary Subscriptions: Test every AVM module upgrade in a Sandbox subscription before applying it to the production management groups.
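The batching advice can be sketched in Bash as one remediation task per resource group, with a pause after each full batch. The batch size of 10 and the 60-second pause are illustrative defaults, not documented limits; remediate_in_batches, BATCH_SIZE, and BATCH_PAUSE_SECONDS are hypothetical names, and assignments of policy initiatives would additionally need --definition-reference-id on each az policy remediation create call.

```shell
set -uo pipefail

# Create one remediation task per resource group, pausing between batches
# to stay clear of ARM request throttling.
# $1: policy assignment ID; remaining args: resource group names.
# BATCH_SIZE and BATCH_PAUSE_SECONDS are hypothetical tuning knobs.
remediate_in_batches() {
  local assignment_id="$1"; shift
  local batch_size="${BATCH_SIZE:-10}" pause="${BATCH_PAUSE_SECONDS:-60}"
  local count=0 rg
  for rg in "$@"; do
    az policy remediation create \
      --name "remediate-${rg}" \
      --policy-assignment "$assignment_id" \
      --resource-group "$rg" \
      --output none || echo "failed to create remediation for $rg" >&2
    count=$((count + 1))
    # Pause after each full batch, but not after the last resource group.
    if [ $((count % batch_size)) -eq 0 ] && [ "$count" -lt "$#" ]; then
      echo "batch of $batch_size submitted; sleeping ${pause}s"
      sleep "$pause"
    fi
  done
  echo "submitted $count remediation task(s)"
}
```

Usage: `remediate_in_batches "<policy-assignment-id>" rg-app1 rg-app2 rg-app3`. Keeping the loop sequential with explicit pauses trades speed for predictability, which is usually the right call when the target is thousands of resources.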


You have completed the core series. Use the Day-2 Ops patterns established here to ensure your landing zone remains a reliable foundation for your application teams.