Most Azure environments have policies. Most of those policies are in audit mode. Audit mode tells you what is wrong after it has happened. It does not stop it from happening. The gap between an Azure environment with audit policies and one with enforced governance is the gap between a compliance report and a compliance posture. This post is about closing that gap.
Why Audit Mode Dominates (And Why It Shouldn’t)
Teams choose audit mode for understandable reasons. Deny mode blocks deployments, and the prospect of a policy blocking a production deployment feels like operational risk. The fear is reasonable. The conclusion — that audit mode is therefore safer — is wrong.
Audit mode as a permanent state produces three predictable outcomes. The first is alert fatigue: compliance dashboards fill with violations that nobody acts on because the volume is too high and the urgency is unclear. The second is governance theatre: policies exist in the portal, engineers know they don’t enforce anything, leadership believes governance is in place, and the environment drifts freely. The third is false confidence: when an audit or incident surfaces the actual state of the environment, the existence of policies makes the governance failure harder to explain, not easier.
The path from audit to enforce is not a one-step switch. It is a process with four stages. First, audit mode to understand the current violation landscape — you need to know what is non-compliant before you can enforce. Second, remediation tasks to bring existing resources into compliance so that switching to enforce mode doesn’t land on a large population of already-non-compliant resources. Third, exemptions defined for legitimate edge cases — the resources that genuinely cannot comply need to be formally documented before enforcement so that they aren’t blocked by the policy switch. Fourth, enforce mode enabled once the exemption scope is understood and the violation baseline is clean.
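The four stages above can be read as a simple gate that decides what an assignment needs next. The sketch below is illustrative only: `AssignmentState`, its fields, and the stage names are hypothetical and are not part of any Azure SDK. It assumes you already have counts of flagged resources from your own compliance export.

```python
from dataclasses import dataclass


@dataclass
class AssignmentState:
    """Hypothetical snapshot of one policy assignment while it is still auditing."""
    non_compliant: int  # resources flagged by the audit that have no exemption
    remediable: int     # of those, how many a remediation task could fix


def next_stage(state: AssignmentState) -> str:
    """Return the next step in the audit -> remediate -> exempt -> enforce sequence."""
    if state.non_compliant == 0:
        # Baseline is clean: switching to enforcement blocks nothing that exists today.
        return "enforce"
    if state.remediable > 0:
        # Some violations can be fixed automatically before enforcement.
        return "remediate"
    # What remains needs a manual fix or a documented exemption.
    return "exempt-or-fix"


print(next_stage(AssignmentState(non_compliant=40, remediable=35)))
print(next_stage(AssignmentState(non_compliant=5, remediable=0)))
print(next_stage(AssignmentState(non_compliant=0, remediable=0)))
```

The point of the gate is that "enforce" is only returned once nothing unexempted is still flagged, which is the clean baseline the fourth stage depends on.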
The resistance from engineering teams during the transition is real and worth addressing directly. Engineers who have operated in an environment without enforced guardrails have developed workarounds, non-standard configurations, and deployment patterns that depend on the absence of enforcement. Switching to enforce mode surfaces all of these at once, usually during deployments, in the form of deny effects that weren’t anticipated. The organisations that handle this well run enforce mode in development environments for a sprint or two before applying it to production, so that the friction is discovered and resolved before it blocks a production deployment.
Policy Initiative Design
Individual policy assignments do not scale. Below about ten to fifteen individual assignments, the compliance dashboard is manageable and the assignment overhead is acceptable. Beyond that, the compliance view becomes a list of percentages with no meaningful grouping, and managing parameter values per assignment becomes a maintenance burden.
Initiatives (policy set definitions in the API, initiative definitions in the portal) group related policies under a single assignment. They allow parameter values to be set once at the initiative level, produce a single compliance score per initiative rather than per-policy noise, and make it possible to assign a coherent governance domain rather than a collection of individual rules.
The most useful design principle for initiative grouping is to organise by compliance domain rather than by technical category. A “Network Security” initiative maps to the question a CISO asks and produces a compliance score that is meaningful to a governance conversation. An “NSG policies” initiative maps to how the policies are implemented internally and is less useful for communicating governance posture upward. The grouping should reflect the questions the governance conversation actually asks.
Parameter discipline matters significantly at the initiative level. Hard-coding values — allowed regions, approved SKUs, required tag keys — into policy definitions makes those definitions inflexible. A policy that denies deployment outside UK South and UK West is not reusable for a workload that legitimately needs to run in North Europe. Parameters at the initiative level make a single initiative reusable across environments and management group scopes with different values, which keeps the initiative catalogue small and the maintenance overhead low.
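To make the parameter point concrete, here is a minimal sketch of a region policy whose allowed regions and effect are both parameters, wrapped in an initiative that passes its own parameters through. The JSON shapes follow the Azure Policy definition structure as I understand it, expressed as Python dictionaries. The definition ID is a placeholder, and the helper `assignment_parameters` and the two example scopes are hypothetical.

```python
import json

# Policy definition: no region is hard-coded; region list and effect are parameters.
allowed_locations_definition = {
    "mode": "Indexed",
    "parameters": {
        "allowedLocations": {"type": "Array"},
        "effect": {"type": "String", "allowedValues": ["Audit", "Deny"], "defaultValue": "Audit"},
    },
    "policyRule": {
        "if": {"not": {"field": "location", "in": "[parameters('allowedLocations')]"}},
        "then": {"effect": "[parameters('effect')]"},
    },
}

# Initiative: declares the parameters once and forwards them to each member policy.
initiative = {
    "parameters": {
        "allowedLocations": {"type": "Array"},
        "effect": {"type": "String", "defaultValue": "Audit"},
    },
    "policyDefinitions": [
        {
            "policyDefinitionReferenceId": "allowed-locations",
            "policyDefinitionId": "<policy-definition-id>",  # placeholder, not a real ID
            "parameters": {
                "allowedLocations": {"value": "[parameters('allowedLocations')]"},
                "effect": {"value": "[parameters('effect')]"},
            },
        }
    ],
}


def assignment_parameters(locations, effect):
    """Hypothetical helper: parameter values supplied when the initiative is assigned."""
    return {"allowedLocations": {"value": list(locations)}, "effect": {"value": effect}}


# Same initiative, two scopes, different values.
uk_production = assignment_parameters(["uksouth", "ukwest"], "Deny")
eu_workload = assignment_parameters(["uksouth", "ukwest", "northeurope"], "Audit")

print(json.dumps(uk_production, indent=2))
print(json.dumps(eu_workload, indent=2))
```

Only the assignment values differ between the two scopes. The definition and the initiative stay untouched, which is what keeps the catalogue small.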
Assignment scope should be as high in the management group hierarchy as the policy intent allows. Subscription-level assignments multiply as the organisation adds subscriptions: an environment with a hundred subscriptions and twenty assignments on each has two thousand assignment objects to manage. Management group assignments are inherited by every subscription beneath them. Assign at the management group level wherever the policy intent applies across a group of subscriptions, and reserve subscription-level assignment for genuine exceptions.
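The arithmetic behind that recommendation is worth spelling out. The sketch below compares the two approaches. The figure of four management groups is an assumption chosen for illustration and does not come from any particular environment.

```python
def assignment_objects(scopes: int, assignments_per_scope: int) -> int:
    """Number of assignment objects to maintain when each scope carries its own set."""
    return scopes * assignments_per_scope


# One hundred subscriptions, twenty assignments on each.
per_subscription = assignment_objects(scopes=100, assignments_per_scope=20)

# The same twenty assignments made once on each of four management groups (assumed figure).
per_management_group = assignment_objects(scopes=4, assignments_per_scope=20)

print(per_subscription)      # 2000
print(per_management_group)  # 80
```

Adding a subscription under an existing management group changes the first number and leaves the second alone.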
Remediation Tasks — The Missing Step
Enforce mode prevents new non-compliant resources from being created. It does not fix the resources that were already non-compliant before enforcement was enabled. Remediation tasks are the mechanism for closing that gap — they execute the policy effect against existing non-compliant resources without requiring manual intervention.
The effect type determines whether a remediation task is possible. Deny and Audit effects cannot be remediated by a task: a deny policy only blocks new non-compliant requests, so resources that were already non-compliant have to be fixed manually or through redeployment. DeployIfNotExists and Modify effects both support remediation tasks. DeployIfNotExists deploys a related resource or configuration when the policy finds a non-compliant resource, for example deploying a diagnostic settings resource to a storage account that doesn’t have one. Modify adds, replaces, or removes tags and supported resource properties, for example adding a required tag or correcting a property value.
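A small lookup makes the distinction easy to apply when triaging a list of non-compliant policies. This is a sketch. `supports_remediation` is a hypothetical helper, and the effect names are compared case-insensitively because definitions do not use consistent casing.

```python
# Effects that a remediation task can act on, per the paragraph above.
REMEDIABLE_EFFECTS = {"deployifnotexists", "modify"}


def supports_remediation(effect: str) -> bool:
    """True if a remediation task can be created for a policy with this effect."""
    return effect.strip().lower() in REMEDIABLE_EFFECTS


for effect in ["Deny", "Audit", "DeployIfNotExists", "Modify"]:
    action = "remediation task" if supports_remediation(effect) else "manual fix or redeploy"
    print(f"{effect}: {action}")
```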
Scope remediation tasks carefully. A remediation task for a Modify policy that adds a tag will attempt to modify every non-compliant resource in its scope. At subscription scale with hundreds of resources, this can produce a large number of near-simultaneous resource modifications, which may trigger downstream automation: runbooks that respond to resource modification events, ITSM integrations that log changes, monitoring alerts that fire on resource state changes. Always run remediation tasks against a test environment or a small representative scope before running them at production scale.
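One way to keep a first run small is to constrain the remediation request itself. The sketch below builds a request body using property names from the remediation resource schema as I understand it (`resourceCount`, `parallelDeployments`, `failureThreshold`, `filters.locations`). Treat these as assumptions and check them against the current API reference before use. The function name, the default values, and the placeholder IDs are hypothetical.

```python
import json


def remediation_request(assignment_id, reference_id=None, locations=None,
                        resource_count=10, parallel_deployments=1, failure_threshold=0.1):
    """Build a deliberately small remediation request body (illustrative defaults)."""
    properties = {
        "policyAssignmentId": assignment_id,
        # Only act on resources already recorded as non-compliant.
        "resourceDiscoveryMode": "ExistingNonCompliant",
        # Cap how many resources this task touches and how many at once.
        "resourceCount": resource_count,
        "parallelDeployments": parallel_deployments,
        # Stop the task if more than this fraction of deployments fail.
        "failureThreshold": {"percentage": failure_threshold},
    }
    if reference_id:
        # Needed when the assignment is an initiative: names the member policy to remediate.
        properties["policyDefinitionReferenceId"] = reference_id
    if locations:
        properties["filters"] = {"locations": list(locations)}
    return {"properties": properties}


pilot = remediation_request(
    assignment_id="<policy-assignment-id>",  # placeholder
    reference_id="require-cost-centre-tag",  # hypothetical reference ID
    locations=["uksouth"],
)
print(json.dumps(pilot, indent=2))
```

Starting with a low resource count and a single region gives downstream automation a chance to show how it reacts before the task is widened.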
The managed identity requirement is a common and non-obvious failure mode. Remediation tasks execute under the managed identity attached to the policy assignment, which can be system-assigned or user-assigned. That identity must hold the RBAC roles needed to make the modifications in scope. The portal grants the roles listed in the policy definition when the assignment is created there; assignments created through the API, CLI, or infrastructure-as-code need the role assignment made explicitly. A remediation task whose identity lacks permissions over the target resources does fail visibly on the task itself, but nothing on the compliance dashboard draws attention to it, so unless you’re monitoring remediation task status, the failure is easy to miss.
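A pre-flight check can catch the missing-permissions case before a task is started. DeployIfNotExists and Modify definitions list the roles they need under `then.details.roleDefinitionIds`. The sketch below compares that list with the roles actually granted to the assignment's identity. The simplified definition shape, the `missing_roles` helper, and the idea that you have already fetched the granted role IDs are all assumptions.

```python
CONTRIBUTOR = "/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"


def missing_roles(policy_definition: dict, granted_role_ids: set) -> set:
    """Role definition IDs the policy requires that the identity has not been granted."""
    details = policy_definition["policyRule"]["then"].get("details", {})
    required = {role_id.lower() for role_id in details.get("roleDefinitionIds", [])}
    granted = {role_id.lower() for role_id in granted_role_ids}
    return required - granted


# Simplified Modify definition that adds a tag and requires Contributor.
tag_policy = {
    "policyRule": {
        "then": {
            "effect": "modify",
            "details": {
                "roleDefinitionIds": [CONTRIBUTOR],
                "operations": [
                    {"operation": "add", "field": "tags['costCentre']", "value": "unassigned"}
                ],
            },
        }
    }
}

print(missing_roles(tag_policy, granted_role_ids=set()))          # Contributor is missing
print(missing_roles(tag_policy, granted_role_ids={CONTRIBUTOR}))  # set()
```

An empty result means the identity holds every role the definition asks for. It does not prove the roles are granted at the right scope, which still needs checking separately.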
Exemptions — The Governance Kill Switch
Exemptions are the mechanism for allowing specific resources, resource groups, or subscriptions to bypass policy evaluation. They are necessary — not every resource can comply with every policy, and the governance model needs a formal way to acknowledge that without creating a permanent violation. They are also the primary mechanism through which governance slowly erodes.
Azure Policy exemptions have two categories. Waiver exemptions acknowledge that a resource is non-compliant but grant an explicit exception for a defined reason and duration — the resource is excluded from compliance evaluation while the exemption is active. Mitigated exemptions assert that the resource meets the intent of the policy through a compensating control — the resource complies with the spirit of the requirement even if it doesn’t comply with the specific policy definition.
Exemption hygiene is where most governance programmes fail. Every exemption needs three things: an expiry date, a business justification, and an owner. Exemptions without expiry dates accumulate indefinitely — an exemption granted in 2022 for a temporary workload may still be active years later, long after the original justification expired and the original owner moved on. The exemption list should be reviewed quarterly and treated as a debt ledger: each entry represents a policy that is not enforced for a defined scope.
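The three-part rule is easy to check mechanically during the quarterly review. The sketch below flags exemptions that lack an expiry date, a justification, or an owner. It assumes exemptions have been exported as dictionaries with `expiresOn` and `description` fields, and that your organisation records the owner under a `metadata.owner` key. That key is a convention chosen for this example, not a built-in field.

```python
from datetime import datetime, timezone


def exemption_findings(exemption: dict, now: datetime) -> list:
    """Return the hygiene problems found on one exemption record."""
    findings = []
    expires_on = exemption.get("expiresOn")
    if not expires_on:
        findings.append("no expiry date")
    elif datetime.fromisoformat(expires_on) <= now:
        findings.append("expired")
    if not exemption.get("description"):
        findings.append("no business justification")
    if not exemption.get("metadata", {}).get("owner"):  # assumed convention
        findings.append("no owner")
    return findings


exemptions = [
    {   # well-formed waiver
        "name": "legacy-vm-waiver",
        "exemptionCategory": "Waiver",
        "expiresOn": "2030-01-01T00:00:00+00:00",
        "description": "Vendor appliance cannot run the agent; replacement planned.",
        "metadata": {"owner": "platform-team"},
    },
    {   # the kind of entry that accumulates
        "name": "temp-workload",
        "exemptionCategory": "Waiver",
    },
]

review_date = datetime(2025, 1, 1, tzinfo=timezone.utc)
for exemption in exemptions:
    print(exemption["name"], exemption_findings(exemption, review_date))
```

Run against the full exemption list, the output is the debt ledger described above: every non-empty result is an entry that needs an owner's decision.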
The exemption approval process matters as much as the exemption content. An exemption granted unilaterally by a resource team is a very different thing from an exemption that has been reviewed by a platform or governance team, documented with a compensating control, and given a defined review date. The process creates accountability. Without it, the exemption mechanism becomes a way for teams to opt out of governance without oversight.
The Compliance Score as a Lagging Indicator
The Azure Policy compliance score is the most visible governance metric in the portal and one of the least useful in isolation. A 94% compliance score sounds like strong governance posture. It might mean 6% of your most critical production resources are operating outside policy. It might mean 6% of sandbox resources are missing a non-critical tag. The number alone is not enough to assess risk.
The metrics that are actually useful for governance posture are more granular. Compliance score by initiative rather than overall, because an initiative compliance score maps to a compliance domain and carries meaning that an aggregate percentage does not. Compliance trend over time — a score that is improving week-over-week indicates a governance programme that is working; one that is degrading indicates drift or new workloads that aren’t landing compliantly. Compliance by resource type, because some resource types carry more risk than others and their compliance status deserves more weight. Time-to-remediation for new violations, because a governance programme that detects violations quickly and remediates them quickly is more mature than one that detects violations and leaves them open.
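Three of these metrics can be computed from data most teams already export. The sketch below is a minimal illustration using made-up records. The input shapes (tuples of initiative name and compliance flag, a list of weekly scores, pairs of detection and fix dates) and the function names are assumptions, not an Azure API.

```python
from collections import defaultdict
from datetime import date


def compliance_by_initiative(states):
    """states: iterable of (initiative_name, is_compliant). Returns fraction compliant per initiative."""
    totals = defaultdict(lambda: [0, 0])  # initiative -> [compliant, total]
    for initiative, is_compliant in states:
        totals[initiative][1] += 1
        if is_compliant:
            totals[initiative][0] += 1
    return {name: compliant / total for name, (compliant, total) in totals.items()}


def trend(weekly_scores):
    """Change from the first to the last score; a negative value means regression."""
    return weekly_scores[-1] - weekly_scores[0]


def mean_days_to_remediate(violations):
    """violations: iterable of (detected, fixed_or_None). Open violations are excluded."""
    durations = [(fixed - detected).days for detected, fixed in violations if fixed is not None]
    return sum(durations) / len(durations) if durations else None


states = [
    ("Network Security", True), ("Network Security", False),
    ("Tagging", True), ("Tagging", True),
]
violations = [
    (date(2024, 1, 1), date(2024, 1, 4)),
    (date(2024, 1, 2), date(2024, 1, 7)),
    (date(2024, 1, 3), None),  # still open
]

print(compliance_by_initiative(states))
print(trend([0.94, 0.93, 0.91]))
print(mean_days_to_remediate(violations))
```

In this sample the overall figure would be 75%, which hides the fact that one initiative is fully compliant and the other is at half. Excluding open violations from time-to-remediation flatters the average, so the count of open violations should be reported alongside it.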
The compliance score is a lagging indicator by design — it reflects the state of already-deployed resources. A governance posture that is deteriorating will show it in the trend before it shows in the absolute score. The organisations that catch governance regression early are the ones tracking the trend, not just the number.
Moving from audit mode to enforced governance is a cultural change as much as a technical one. The technical steps are well-defined: audit, remediate, exempt, enforce. The resistance comes from engineering teams who have operated in an environment without guardrails and experience their introduction as friction. The organisations that navigate this well do it by involving engineering in the exemption design process — letting the teams who will be governed help define where the formal exceptions are — rather than imposing policies and then discovering the informal workarounds that fill the gap. The result is governance that holds because the teams it governs helped design it.