Deploying your landing zone was the easy part. Now you must operate it.
The most common failure in platform engineering is treating the landing zone as a finished project rather than an ongoing product. Azure releases new services, requirements change, and teams drift from standards. Without an operational strategy that anticipates these pressures, your foundation becomes a collection of special cases and undocumented changes.
By the end of this guide, you will:
- Automate drift detection with scheduled GitHub Actions.
- Remediate non-compliance via Policy without manual intervention.
- Migrate from legacy modules to Azure Verified Modules (AVM) safely.
- Implement quarterly identity and networking reviews.
This is Post 10 in the Azure Platform Engineering series.
Managing Configuration Drift
graph TD
    subgraph Continuous_Cycle [Day-2 Operations Lifecycle]
        Deploy[Initial Deployment] --> Audit[Automated Drift Audit]
        Audit --> Detect{Drift Detected?}
        Detect -- Yes --> Remediate[Policy Remediation / IaC Apply]
        Detect -- No --> Evolve[Review & Evolve]
        Remediate --> Audit
        Evolve --> Update[Migrate to AVM / New Services]
        Update --> Audit
    end
    subgraph Tools [Operations Stack]
        GHA[GitHub Actions: Scheduled Scans]
        AzPolicy[Azure Policy: Compliance Store]
        AzPIM[Entra ID: Access Reviews]
    end
    Audit --- GHA
    Detect --- AzPolicy
    Evolve --- AzPIM
    style Continuous_Cycle fill:#f5f5f5,stroke:#9e9e9e
    style Remediate fill:#e8f5e9,stroke:#2e7d32
    style Detect fill:#fff9c4,stroke:#fbc02d
Visual Notes:
- Continuous Auditing ensures that manual “out-of-band” changes are detected and documented.
- Policy Remediation allows the platform to self-heal without manual engineering effort.
- Periodic Evolution (Review/Evolve) incorporates new Azure features and module updates (like AVM) into the established foundation.
Detecting State and Policy Drift
Drift takes two forms: IaC state drift (the gap between code and reality) and Policy drift (compliance violations).
Terraform Refresh-Only: Detect manual portal changes without modifying resources:
terraform plan -refresh-only -no-color 2>&1 | tee drift-output.txt
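For automation, the plan's exit status is easier to act on than its text. The sketch below assumes Terraform's -detailed-exitcode flag (0 for no changes, 1 for an error, 2 for changes present) is combined with -refresh-only. The helper name classify_plan_exit is hypothetical and not part of Terraform.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a `terraform plan -detailed-exitcode`
# status into a label that later pipeline steps can branch on.
classify_plan_exit() {
  case "$1" in
    0) echo "clean" ;;   # state matches the real infrastructure
    2) echo "drift" ;;   # Terraform found changes made outside of code
    *) echo "error" ;;   # 1 (or anything unexpected): the plan itself failed
  esac
}

# Example invocation (assumes an initialized working directory):
#   terraform plan -refresh-only -detailed-exitcode -no-color > drift-output.txt 2>&1
#   classify_plan_exit "$?"
```

If the plan is piped through tee, $? reports tee's status rather than Terraform's. In bash, read ${PIPESTATUS[0]} instead.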
PowerShell Policy Audit: Identify non-compliant resources across your hierarchy:
$nonCompliant = Get-AzPolicyState -ManagementGroupName "mg-intermediate" -Filter "complianceState eq 'NonCompliant'"
$nonCompliant | Group-Object -Property policyDefinitionName | Select-Object Name, Count
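On a Linux runner without PowerShell, a similar summary could be produced with the Azure CLI. This is a sketch that assumes az policy state list, called with --management-group and --filter, returns the same records as the cmdlet above. The helper count_by_policy is a hypothetical name and only does the grouping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one policy definition name per line on stdin
# and print "<name> <count>" pairs, most frequent first.
count_by_policy() {
  sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}

# Example invocation (assumes a signed-in Azure CLI session):
#   az policy state list \
#     --management-group "mg-intermediate" \
#     --filter "complianceState eq 'NonCompliant'" \
#     --query "[].policyDefinitionName" --output tsv | count_by_policy
```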
Scheduling Scans in GitHub Actions
Run drift scans weekly and automatically open a GitHub Issue when deviations are found. This ensures drift is triaged during sprint planning rather than accumulating indefinitely.
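The issue-opening step could look like the sketch below, run as a script step inside the scheduled workflow. It assumes the GitHub CLI (gh) is available and authenticated on the runner, that a label named drift already exists in the repository, and that the plan ran with -detailed-exitcode so that exit code 2 means drift. The function name report_drift is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report_drift <plan-exit-code> <plan-output-file>
# Opens a GitHub Issue only when the plan exit code signals drift (2).
report_drift() {
  local exit_code="$1"
  local output_file="$2"
  if [ "$exit_code" -eq 2 ]; then
    # Drift found: attach the saved plan output as the issue body.
    gh issue create \
      --title "Drift detected $(date -u +%Y-%m-%d)" \
      --body-file "$output_file" \
      --label "drift"
  elif [ "$exit_code" -ne 0 ]; then
    # Any other non-zero status means the scan itself failed.
    echo "plan failed with exit code $exit_code" >&2
    return 1
  fi
}

# Example invocation:
#   report_drift "$plan_status" drift-output.txt
```

Returning non-zero on a failed plan keeps a broken scan visible in the workflow run instead of letting it pass as "no drift".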
The Migration Path: Moving to AVM
Safe Refactoring with moved Blocks
The legacy CAF Terraform module is archived as of August 2026, so migrating to AVM modules is required for continued support. Use moved blocks to remap each resource address in state, so that Terraform renames the existing resource instead of planning a destroy and recreate:
moved {
  from = module.enterprise_scale.azurerm_management_group.level_1["mg-platform"]
  to   = module.management_groups_avm["mg-platform"].azurerm_management_group.this
}
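Before applying a refactor like this, it helps to confirm that the plan destroys nothing. The sketch below parses the summary line of terraform plan -no-color output. It assumes the usual "Plan: X to add, Y to change, Z to destroy." wording, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `terraform plan -no-color` output on stdin and
# print the number from the "N to destroy" part of the summary line.
destroy_count() {
  sed -n 's/.*Plan: .* \([0-9][0-9]*\) to destroy.*/\1/p' | tail -n 1
}

# Succeeds only when the plan destroys nothing. A plan with no summary
# line (for example "No changes.") is treated as zero destroys.
plan_is_non_destructive() {
  local count
  count=$(destroy_count)
  [ "${count:-0}" -eq 0 ]
}

# Example invocation:
#   terraform plan -no-color | plan_is_non_destructive \
#     || echo "plan still destroys resources; check the moved blocks"
```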
Bicep AVM subnets
When moving to Bicep AVM, ensure your template matches the new schema:
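module hubVnet 'br/public:avm/res/network/virtual-network:0.10.0' = {
  name: 'hub-vnet-avm'
  params: {
    name: 'conn-hub-vnet'
    // The module also expects the VNet address space.
    // 10.0.0.0/16 is an assumed value; use your own range.
    addressPrefixes: [
      '10.0.0.0/16'
    ]
    // AVM uses an array of objects for subnets
    subnets: [
      { name: 'AzureFirewallSubnet', addressPrefix: '10.0.0.0/26' }
    ]
  }
}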
Governance and Identity Lifecycle
Quarterly RBAC and PIM Reviews
Landing zones accumulate orphaned role assignments for deleted service principals. Run a quarterly audit to find and remove these entries.
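One way to start that audit from the Azure CLI is sketched below. It relies on a common heuristic rather than a documented guarantee: role assignments whose principal no longer exists tend to come back with an empty principalName. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IDs of role assignments in the current
# subscription whose principal name is empty (assumed to mean the
# principal was deleted).
list_orphaned_assignments() {
  az role assignment list --all \
    --query "[?principalName==''].id" \
    --output tsv
}

# Example invocation (assumes a signed-in Azure CLI session):
#   list_orphaned_assignments > orphaned-assignments.txt
```

Review the list before removing anything. Confirmed entries can then be passed to az role assignment delete --ids.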
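Automated Access Reviews: Configure Entra ID to revoke access automatically unless a lead explicitly approves it. The request below is abbreviated; a review that repeats every quarter also needs a recurrence pattern under settings: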
az rest --method POST \
  --uri "https://graph.microsoft.com/v1.0/identityGovernance/accessReviews/definitions" \
  --body "{
    \"displayName\": \"Quarterly Platform Role Review\",
    \"scope\": {
      \"query\": \"/subscriptions?\$filter=startsWith(displayName, 'lz-')\",
      \"queryType\": \"MicrosoftGraph\"
    },
    \"settings\": {
      \"defaultDecision\": \"Deny\",
      \"autoApplyDecisionsEnabled\": true
    }
  }"
Networking Evolution
Upgrading Azure Firewall from Standard to Premium is a zero-downtime operation using the “Easy SKU change” method. Premium is required for TLS inspection and IDPS (Intrusion Detection and Prevention System).
Best Practices
- Audit Before Deny: Set new policies to Audit for 7 days before switching to Deny to avoid blocking active workloads.
- Batch Remediation: When fixing thousands of resources, batch your remediation tasks by resource group to avoid ARM API throttling.
- Canary Subscriptions: Test every AVM module upgrade in a Sandbox subscription before applying it to the production management groups.
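The batch remediation advice can be sketched as a loop that creates one remediation task per resource group and pauses between calls. It assumes az policy remediation create with --name, --policy-assignment and --resource-group. The function name, the task naming scheme, and the 30-second default pause are illustrative choices, not recommended values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remediate_by_group <policy-assignment> [pause-seconds]
# Reads resource group names from stdin, one per line, and starts one
# remediation task per group with a pause in between.
remediate_by_group() {
  local assignment="$1"
  local pause="${2:-30}"
  local rg
  while IFS= read -r rg; do
    [ -n "$rg" ] || continue          # skip blank lines
    az policy remediation create \
      --name "remediate-$rg" \
      --policy-assignment "$assignment" \
      --resource-group "$rg" </dev/null
    sleep "$pause"                    # spread requests out over time
  done
}

# Example invocation (assumes a signed-in Azure CLI session):
#   az group list --query "[].name" --output tsv \
#     | remediate_by_group "<policy-assignment-name-or-id>" 60
```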
You have completed the core series. Use the Day-2 Ops patterns established here to ensure your landing zone remains a reliable foundation for your application teams.
