The incident ticket lands on a Tuesday morning. Production degraded last Friday. You open the Azure portal, pull up the subscription’s Log Analytics Workspace, and realize the diagnostics on the App Gateway were never enabled. The VNet flow logs are going to a different workspace some application team created six months ago. The Key Vault audit logs aren’t going anywhere at all.
You have logs. You just can’t see any of them from the same place at the same time.
This is what “Siloed Observability” costs you in practice: a three-day gap between when something breaks and when you can reconstruct what happened. The fix is a Centralized Logging Architecture: all telemetry from every spoke subscription flows into a single “Hub” workspace owned by the platform team, with Azure Policy making sure nothing slips through. This article covers how to deploy that hub using Azure Verified Modules (AVM), trim ingestion with Data Collection Rules (DCRs), and enforce a self-healing diagnostics baseline with Azure Policy.
1. Centralized Logging Architecture
The core design principle is the Hub-and-Spoke Telemetry Flow. While applications run in spokes, their diagnostic “signals” are streamed back to the management subscription.
[Figure: Centralized Logging Architecture]
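As a minimal sketch of what that flow looks like for one spoke resource, the Terraform below sends a Key Vault's audit logs to the hub workspace. The variable names and the setting name `diag-to-hub` are hypothetical, and it assumes the spoke and the hub sit in the same tenant.

```hcl
# Hypothetical inputs: supply the real resource IDs from your own environment.
variable "hub_workspace_id" {
  type        = string
  description = "Resource ID of the central Log Analytics workspace in the management subscription."
}

variable "key_vault_id" {
  type        = string
  description = "Resource ID of a Key Vault living in a spoke subscription."
}

# Streams the Key Vault's "audit" category group to the hub workspace.
resource "azurerm_monitor_diagnostic_setting" "key_vault_to_hub" {
  name                       = "diag-to-hub"
  target_resource_id         = var.key_vault_id
  log_analytics_workspace_id = var.hub_workspace_id

  enabled_log {
    category_group = "audit"
  }
}
```

Declaring this by hand for every resource does not scale, which is why section 4 hands the job to Azure Policy.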
With everything in one workspace, a single KQL query can span every subscription, so you can answer questions like “Show me all failed login attempts across the entire production estate” without stitching together results from several workspaces.
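One way to keep such a question on hand is to store it as a saved search in the hub workspace. The sketch below assumes Microsoft Entra ID sign-in logs are already routed to the hub (so the `SigninLogs` table exists) and that the workspace module from section 2 exposes the standard AVM `resource_id` output. The search name and category are hypothetical.

```hcl
# Saved search listing failed sign-ins over the last 24 hours.
# In SigninLogs, ResultType "0" means success; any other value is a failure code.
resource "azurerm_log_analytics_saved_search" "failed_sign_ins" {
  name                       = "failed-sign-ins-24h"
  log_analytics_workspace_id = module.log_analytics_workspace.resource_id
  category                   = "Platform Security"
  display_name               = "Failed sign-ins (last 24 hours)"

  query = <<-KQL
    SigninLogs
    | where TimeGenerated > ago(24h)
    | where ResultType != "0"
    | summarize FailedAttempts = count() by UserPrincipalName, AppDisplayName, ResultType
    | order by FailedAttempts desc
  KQL
}
```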
2. Deploying the Hub with Terraform AVM
The AVM module handles opinionated defaults for retention, tiering, and workspace settings so you’re not hand-rolling those decisions every time.
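module "log_analytics_workspace" {
  source  = "Azure/avm-res-operationalinsights-workspace/azurerm"
  version = "~> 0.4"

  name                = "law-prod-mgt-001"
  resource_group_name = "rg-prod-mgt-001"
  location            = "eastus"

  # Standard production settings. The AVM module prefixes workspace-level
  # inputs with log_analytics_workspace_; confirm the exact names against
  # the module version you pin.
  log_analytics_workspace_sku               = "PerGB2018"
  log_analytics_workspace_retention_in_days = 90

  # Prevents runaway ingestion costs. Once the cap is hit, billable data
  # stops being collected until the daily reset, so size it with headroom.
  log_analytics_workspace_daily_quota_gb = 50
}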
3. Data Collection Rules (DCR) and Cost Optimization
Logging every connection that passes through a firewall is expensive, and most of it is noise. Data Collection Rules (DCRs) let you filter and transform logs at the ingestion point, before they hit your bill.
[Figure: DCR Ingestion Filtering (Cost Optimization)]
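By applying a KQL transform to your firewall logs (e.g., dropping ALLOW logs for internal-to-internal traffic), you can reduce ingestion costs significantly without losing security visibility into high-risk external traffic. Savings of 40-60% are achievable when internal traffic dominates, but the actual figure depends on your traffic mix, so measure it against your own log volume.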
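A rough sketch of such a transform, written as a workspace transformation DCR, is shown below. It makes several assumptions: the firewall writes resource-specific logs to the `AZFWNetworkRule` table, that table supports ingestion-time transformations in your region, "internal" means 10.0.0.0/8, and the workspace module exposes the AVM `resource_id` output. The rule name and destination label are hypothetical.

```hcl
resource "azurerm_monitor_data_collection_rule" "firewall_filter" {
  name                = "dcr-prod-mgt-fw-filter" # hypothetical name
  resource_group_name = "rg-prod-mgt-001"
  location            = "eastus"
  kind                = "WorkspaceTransforms"

  destinations {
    log_analytics {
      name                  = "hub-law" # label referenced by data_flow below
      workspace_resource_id = module.log_analytics_workspace.resource_id
    }
  }

  data_flow {
    streams      = ["Microsoft-Table-AZFWNetworkRule"]
    destinations = ["hub-law"]

    # Keeps every non-Allow record, and keeps Allow records unless BOTH the
    # source and the destination start with "10." (assumed internal range).
    transform_kql = "source | where Action != 'Allow' or SourceIp !startswith '10.' or DestinationIp !startswith '10.'"
  }
}
```

A `WorkspaceTransforms` rule only takes effect once the workspace references it as its default data collection rule. How you set that depends on how the workspace itself is deployed, so it is left out of this sketch.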
4. Automating Diagnostics with Azure Policy
You cannot rely on developers to remember to enable diagnostic settings. Instead, use a DeployIfNotExists (DINE) policy (from Article 4) to automatically connect every new VNet, Key Vault, and Storage Account to your central hub.
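Example: policy existence condition for VNet diagnostic settings. Checking the workspace ID as well as the enabled flag means a resource logging to the wrong workspace is still flagged as non-compliant. The parameter name logAnalytics is illustrative.
{
  "allOf": [
    { "field": "Microsoft.Insights/diagnosticSettings/logs.enabled", "equals": "true" },
    { "field": "Microsoft.Insights/diagnosticSettings/workspaceId", "equals": "[parameters('logAnalytics')]" }
  ]
}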
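Assigning such a policy from Terraform might look like the sketch below. The two variables, the assignment name, and the `logAnalytics` parameter name are hypothetical, and the role granted to the policy's managed identity must match whatever the policy definition lists in its `roleDefinitionIds`; `Log Analytics Contributor` is only a common choice.

```hcl
variable "management_group_id" {
  type        = string
  description = "Resource ID of the management group that contains the spoke subscriptions."
}

variable "diagnostics_policy_definition_id" {
  type        = string
  description = "Resource ID of the DeployIfNotExists policy (or initiative) to assign."
}

resource "azurerm_management_group_policy_assignment" "diag_to_hub" {
  name                 = "diag-to-hub-law"
  management_group_id  = var.management_group_id
  policy_definition_id = var.diagnostics_policy_definition_id
  location             = "eastus" # required because the assignment carries an identity

  # DINE policies deploy resources on your behalf, so they need an identity.
  identity {
    type = "SystemAssigned"
  }

  parameters = jsonencode({
    logAnalytics = { value = module.log_analytics_workspace.resource_id }
  })
}

# Grants the policy identity permission to create diagnostic settings.
resource "azurerm_role_assignment" "diag_policy_identity" {
  scope                = var.management_group_id
  role_definition_name = "Log Analytics Contributor"
  principal_id         = azurerm_management_group_policy_assignment.diag_to_hub.identity[0].principal_id
}
```

DINE policies act on resources as they are created or updated. Resources that already exist stay non-compliant until a remediation task runs against the assignment.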
5. Visualizing the Platform with Workbooks
Raw data in Log Analytics answers questions; Azure Monitor Workbooks help you ask better ones. Workbooks give you an interactive canvas that combines KQL queries, parameters, and visualizations on a single page—much more useful for investigation than a static dashboard. A standard “Platform Health” workbook should include:
- Top 10 Log Producers: Identify noisy resources that need DCR filtering.
- RBAC Change Audit: Track every permission change across the management group.
- Firewall Denials: Map outbound blocks to specific application spokes.
KQL Pro Tip: Use parse_json(tostring(...)) to extract deep metadata from the Properties field in AzureActivity logs, as they are often double-encoded.
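Two of the workbook queries above can be sketched as saved searches, which also makes them easy to paste into a workbook tile. The query keys, category, and display names are hypothetical. The `requestbody` property name follows a pattern commonly seen in `AzureActivity` records, so check it against your own data. The module's `resource_id` output is assumed, as is standard for AVM.

```hcl
locals {
  platform_health_queries = {
    # Billable volume per resource. This scans every table, so run it sparingly.
    "top-log-producers" = {
      display_name = "Top 10 log producers (last 24 hours)"
      query        = <<-KQL
        find where TimeGenerated > ago(24h) project _ResourceId, _BilledSize, _IsBillable
        | where _IsBillable == true
        | summarize BilledBytes = sum(_BilledSize) by _ResourceId
        | top 10 by BilledBytes desc
      KQL
    }

    # Role assignment writes. The request body is JSON stored inside a JSON
    # string, hence the parse_json(tostring(...)) unwrapping.
    "rbac-change-audit" = {
      display_name = "Role assignment changes (last 7 days)"
      query        = <<-KQL
        AzureActivity
        | where TimeGenerated > ago(7d)
        | where OperationNameValue =~ "Microsoft.Authorization/roleAssignments/write"
        | extend RequestBody = parse_json(tostring(parse_json(Properties).requestbody))
        | project TimeGenerated, Caller, SubscriptionId, RequestBody
      KQL
    }
  }
}

resource "azurerm_log_analytics_saved_search" "platform_health" {
  for_each = local.platform_health_queries

  name                       = each.key
  log_analytics_workspace_id = module.log_analytics_workspace.resource_id
  category                   = "Platform Health"
  display_name               = each.value.display_name
  query                      = each.value.query
}
```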
Key Takeaways
- One Hub: A single Log Analytics Workspace for the entire landing zone keeps correlation simple and ingestion costs visible in one place.
- Filter at Ingestion: Use DCRs to drop low-value logs and keep your data clean.
- Policy is the Glue: Use DINE policies so every resource in a new subscription gets diagnostic settings automatically, and run remediation tasks to bring existing resources into line.
- Workbooks over Dashboards: Use Workbooks for deep, interactive investigation and Dashboards for high-level monitoring.
Next Steps:
- Read Security Baseline: Defender for Cloud and Microsoft Sentinel in a Landing Zone to layer Microsoft Sentinel on top of this data for automated threat detection.
- Read Cost Governance in the Landing Zone: Tagging Enforcement, Budgets, and FinOps Automation to implement cost governance queries using this centralized telemetry.
