23 May 2026

Governance Drift: How Azure Environments Decay Over Time and How to Prevent It

Every well-governed Azure environment eventually drifts. Here is what drift looks like, how it happens, and the practices that stop it.

Daniel Inman, Cloud Solution Architect


#governance drift #compliance #azure policy #maintenance

Every Azure environment I have seen that started well-governed has drifted without active maintenance. Not catastrophically — drift is subtle. A policy exemption that was meant to be temporary and was never reviewed. A resource group created outside the landing zone for “just this one project.” An RBAC assignment granted for a contractor who left six months ago. Individually, none of these are a crisis. Cumulatively, they are the difference between a governed environment and a compliant-looking one.

What Governance Drift Looks Like

Four visible signals indicate that an environment is drifting:

Compliance score degrading over time without a corresponding increase in workloads: policies are being violated faster than the violations are being remediated. If your Policy compliance dashboard shows a trend line that has been declining for three quarters, the governance process is not keeping pace with what is being deployed into the estate.

Exemption list growing continuously — more resources opted out of governance over time. Exemptions are sometimes appropriate. What is not appropriate is an exemption list that has never had anything removed from it, where the oldest exemptions date to the initial deployment and nobody is certain whether the original justification still applies.

RBAC assignment count increasing without bound — access being granted without a corresponding review and removal process. In an estate with active joiner/mover/leaver management, the assignment count grows when joiners are processed and falls when leavers are processed. An assignment count that only ever grows indicates that the removal half of the process is not happening.

Resources appearing outside managed resource groups — ad hoc deployments that bypass the IaC process. These are resources with no Terraform state, no deployment pipeline, and no guarantee that they conform to the standards the pipeline enforces. They are also the resources most likely to be forgotten about when the team that created them moves on.
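
Two of these signals reduce to simple checks over numbers the platform team already collects. The sketch below assumes compliance scores and RBAC assignment counts have been recorded once per quarter in plain lists. The function names and the sample figures are illustrative and do not come from any Azure API.

```python
def declining_streak(scores):
    """Count how many of the most recent consecutive periods the score fell."""
    streak = 0
    pairs = list(zip(scores, scores[1:]))
    for previous, current in reversed(pairs):
        if current < previous:
            streak += 1
        else:
            break
    return streak


def only_grows(counts):
    """True when a count series never falls and ends higher than it started."""
    if len(counts) < 2:
        return False
    never_falls = all(b >= a for a, b in zip(counts, counts[1:]))
    return never_falls and counts[-1] > counts[0]


# Illustrative quarterly figures (assumed, oldest first).
quarterly_scores = [96, 94, 91, 87]
rbac_counts = [120, 135, 150, 171]

if declining_streak(quarterly_scores) >= 3:
    print("Compliance score has fallen for three or more quarters")
if only_grows(rbac_counts):
    print("RBAC assignment count has never decreased")
```

The thresholds are a judgement call. The value is in comparing against history rather than against whatever the dashboard shows today.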

The insidious property of drift is that it is hard to see from inside it. An organisation that has been drifting for eighteen months has normalised the drift. The compliance dashboard shows 87% and nobody remembers when it was 96% — or questions whether 87% is acceptable. Drift does not announce itself; it becomes the baseline.

Drift is not a failure of intent. Every individual decision that contributed to it — the exemption granted, the access assigned, the resource created outside IaC — was made by someone with a reason. Drift is a failure of process: governance frameworks without maintenance processes decay, regardless of how well they were initially designed.

The Common Drift Vectors

Exemption accumulation is the most structurally damaging drift vector. Each exemption is individually justified at the time it is created; the aggregate is ungoverned scope that grows with every exemption that is never removed. The root cause is exemption creation without expiry dates — Azure Policy exemptions support expiry dates and most teams do not set them. An exemption with an expiry date either expires and is reviewed, or it is extended with documented justification. An exemption without an expiry date is a permanent carve-out from governance that nobody reviews.
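
A check for missing expiry dates is straightforward to script. The sketch below assumes the exemption list has been exported as dictionaries with a name and an optional expiresOn timestamp in ISO 8601 form with an explicit UTC offset. The field name mirrors the exemption's expiry property, but the flattened shape and the sample records are assumptions for illustration.

```python
from datetime import datetime, timezone


def classify_exemptions(exemptions, now):
    """Split exemptions into no-expiry, expired, and active buckets by name."""
    buckets = {"no_expiry": [], "expired": [], "active": []}
    for exemption in exemptions:
        expires_on = exemption.get("expiresOn")
        if not expires_on:
            # A permanent carve-out that nothing forces anyone to review.
            buckets["no_expiry"].append(exemption["name"])
        elif datetime.fromisoformat(expires_on) <= now:
            buckets["expired"].append(exemption["name"])
        else:
            buckets["active"].append(exemption["name"])
    return buckets


# Hypothetical export, not real exemption names.
sample = [
    {"name": "legacy-vm-disk-encryption", "expiresOn": None},
    {"name": "poc-storage-public-access", "expiresOn": "2025-12-31T00:00:00+00:00"},
    {"name": "vendor-appliance-nsg", "expiresOn": "2026-09-30T00:00:00+00:00"},
]
report = classify_exemptions(sample, datetime(2026, 5, 23, tzinfo=timezone.utc))
print(report)
```

The no_expiry bucket is the one to drive toward zero. Every entry in it needs either an expiry date or a documented decision that it is permanent.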

IaC bypass accumulates when the IaC path has too much friction for the urgency of the moment. An engineer needs a storage account for a proof of concept, the pull request process takes three days, and the portal takes three minutes. The result is a resource that exists outside Terraform state, cannot be reproduced by the pipeline, and may not conform to the naming convention, the tagging policy, or the network restrictions that the IaC enforces. One such resource is an annoyance; twenty is a pattern that means the IaC process is not fit for the speed at which the organisation operates.

Stale RBAC follows directly from imperfect joiner/mover/leaver management. The failure mode is usually not that the process does not exist — most organisations have an offboarding process — but that the process is manual, depends on someone remembering to raise a ticket, and is not systematically verified. Guest accounts are the worst category: invited for a vendor engagement, granted Reader on a specific resource group, and then never reviewed because guest account lifecycle is outside the normal HR-driven offboarding process.

Policy version lag is the drift vector that is least visible in a compliance dashboard. The policy set that was correct eighteen months ago reflects the requirements and resource types of eighteen months ago. New Azure resource types appear that the policy set does not cover. Compliance requirements evolve — a new control framework requirement, a new internal security baseline — and the policy set is not updated to reflect them. The dashboard continues to show a high compliance score because the policies being evaluated are the old ones; what it does not show is the gap between the old policies and the current requirement.

The Governance Review Cycle

Drift prevention requires a regular review cycle. Quarterly is the right cadence for most organisations; monthly for highly regulated environments where the compliance evidence obligation means drift cannot be allowed to accumulate for more than a few weeks before it becomes an audit finding.

A quarterly governance review covers five areas:

Policy compliance trend: Is the score improving, stable, or degrading? What are the top five policy violations by count, and have those same violations been in the top five for multiple review cycles? Persistent top-five violations indicate either a policy that is incorrectly scoped and generates false positives, or a genuine compliance gap that is not being remediated.
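
The "same violations for multiple cycles" question is easy to answer if each review records its top five. This sketch assumes each cycle's top five is kept as a list of policy names. The names and the two-cycle threshold are illustrative.

```python
from collections import Counter


def persistent_violations(top_five_by_cycle, min_cycles=2):
    """Return policies that appear in the top five in at least min_cycles reviews."""
    appearances = Counter()
    for cycle in top_five_by_cycle:
        # Use a set so a duplicated entry within one cycle counts once.
        appearances.update(set(cycle))
    return sorted(p for p, n in appearances.items() if n >= min_cycles)


# Hypothetical review history, oldest cycle first.
history = [
    ["require-tags", "deny-public-ip", "audit-diagnostics", "allowed-locations", "https-only"],
    ["require-tags", "deny-public-ip", "audit-diagnostics", "tls-minimum", "allowed-skus"],
    ["require-tags", "audit-diagnostics", "key-vault-purge", "tls-minimum", "allowed-skus"],
]
print(persistent_violations(history, min_cycles=3))
```

Anything this returns deserves an explicit decision: fix the policy scope if it is producing false positives, or schedule the remediation if the gap is real.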

Exemption audit: Review all active exemptions. Remove any that have expired. For exemptions without expiry dates, require a decision: extend with documented justification and set an expiry date, convert to a permanent exemption with documented permanent justification, or remove. The output of the exemption audit is a smaller, more current exemption list — not a longer one.

RBAC audit: Review all assignments at subscription level and above. Identify and remove stale assignments — users who have left, project assignments for projects that have ended, guest accounts that are no longer active. Confirm that break-glass account credentials are accessible and that the monitoring alert for break-glass sign-in is still active.
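
The stale-assignment part of this audit can be sketched as a filter. It assumes role assignments have been exported as dictionaries with principalId and scope fields, and that a set of currently active principal IDs is available from the identity system. Both inputs, and the sample values, are assumptions for illustration.

```python
MG_PREFIX = "/providers/microsoft.management/managementgroups/"


def is_subscription_or_above(scope):
    """True for root, management group, and subscription scopes."""
    s = scope.lower().rstrip("/")
    if s == "" or s.startswith(MG_PREFIX):
        return True
    parts = s.strip("/").split("/")
    return len(parts) == 2 and parts[0] == "subscriptions"


def stale_assignments(assignments, active_principal_ids):
    """High-scope assignments whose principal is no longer active."""
    return [
        a for a in assignments
        if is_subscription_or_above(a["scope"])
        and a["principalId"] not in active_principal_ids
    ]


# Hypothetical export and active-principal list.
assignments = [
    {"principalId": "p-1", "scope": "/subscriptions/sub-a"},
    {"principalId": "p-2", "scope": "/subscriptions/sub-a/resourceGroups/rg-app"},
    {"principalId": "p-3", "scope": "/providers/Microsoft.Management/managementGroups/platform"},
]
active = {"p-3"}
for a in stale_assignments(assignments, active):
    print(a["principalId"], a["scope"])
```

Resource-group-level assignments are filtered out here only because the review above focuses on subscription level and above. The same check works at any scope.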

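IaC drift detection: Run terraform plan or the equivalent against production environments to identify managed resources whose live configuration no longer matches the code. A plan only covers resources that are already in state, so it will not list resources that were created entirely outside IaC; those surface in the resource inventory check, by comparing what exists in the subscription against what state tracks. Any resource found that way is a candidate for either importing into Terraform state or decommissioning.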

Resource inventory: Identify resources created in the last quarter outside managed resource groups. These are the most recent IaC bypass events and the easiest to address before they become normalised.
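
Resources that never entered Terraform state do not show up in a plan, so finding them means comparing the live inventory against state. The sketch below assumes the state has been dumped with terraform show -json and that a list of resource IDs has been exported from the subscription by whatever inventory tool is in use. The sample IDs are placeholders.

```python
def state_resource_ids(state_json):
    """Collect managed resource IDs from `terraform show -json` output."""
    ids = set()

    def walk(module):
        for resource in module.get("resources", []):
            if resource.get("mode") != "managed":
                continue  # skip data sources
            resource_id = (resource.get("values") or {}).get("id")
            if resource_id:
                # Azure resource IDs are case-insensitive.
                ids.add(resource_id.lower())
        for child in module.get("child_modules", []):
            walk(child)

    walk(state_json.get("values", {}).get("root_module", {}))
    return ids


def unmanaged(inventory_ids, state_json):
    """Inventory IDs that no Terraform-managed resource accounts for."""
    managed = state_resource_ids(state_json)
    return sorted(i for i in inventory_ids if i.lower() not in managed)


# Placeholder data shaped like the JSON state representation.
state = {"values": {"root_module": {
    "resources": [{"mode": "managed", "values": {"id": "/subscriptions/s/resourceGroups/rg/providers/Microsoft.Storage/storageAccounts/core"}}],
    "child_modules": [{"resources": [{"mode": "managed", "values": {"id": "/subscriptions/s/resourceGroups/rg/providers/Microsoft.KeyVault/vaults/kv"}}]}],
}}}
inventory = [
    "/subscriptions/s/resourcegroups/rg/providers/microsoft.storage/storageaccounts/core",
    "/subscriptions/s/resourceGroups/rg-poc/providers/Microsoft.Storage/storageAccounts/poc1",
]
print(unmanaged(inventory, state))
```

With several state files, union the ID sets before comparing. Some Azure resources create child resources that are never tracked individually, so expect to maintain a small allow-list.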

The review does not need to be a large ceremony. A ninety-minute session with the platform team, working from a pre-prepared compliance report that aggregates the Policy dashboard, the exemption list, the RBAC assignment count, and the Terraform plan output, is sufficient for most environments. The discipline is in doing it consistently — not treating it as optional when the team is busy, and not treating a clean dashboard as evidence that the review is unnecessary.

Automated Drift Detection

Manual review catches drift that has already accumulated. Automated detection catches it sooner and reduces the amount of drift that accumulates between reviews.

Azure Policy compliance alerts are the first automated signal. Configure an Azure Monitor alert rule that fires when the overall compliance score for a policy initiative drops below a threshold; 90% is a reasonable starting point for a mature environment. The alert does not replace the quarterly review, but it surfaces significant compliance degradation shortly after it is evaluated rather than at the next scheduled review.

Azure Advisor recommendations overlap with governance in areas like security, access, and cost. Integrating Advisor recommendations into the governance review — treating them as a supplementary input rather than a separate process — avoids the common failure mode where Advisor recommendations are visible but never acted on because they belong to nobody’s process.

Terraform state drift detection is the automated IaC complement to the manual drift detection in the quarterly review. Run terraform plan on a schedule (nightly for critical environments, weekly for others) and alert on any output that includes planned changes when no IaC changes have been committed in the preceding window. A plan that shows changes nobody made in the IaC means something changed in the environment outside the deployment pipeline. This alert has a high signal-to-noise ratio when the plan is run against a stable environment, but it is only reliable if the Terraform state is kept accurate, and that takes discipline.
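
A scheduled job can implement this rule with the plan command's -detailed-exitcode flag, which returns 0 for no changes, 1 for an error, and 2 when changes are present. In the sketch below the function names are illustrative, and the number of commits in the window is assumed to come from the source control system. Alerting on a failed plan is an added assumption rather than part of the rule described above.

```python
import subprocess


def classify_plan(exit_code):
    """Map a `terraform plan -detailed-exitcode` result to a status."""
    if exit_code == 0:
        return "clean"
    if exit_code == 2:
        return "changes"
    return "error"


def should_alert(exit_code, commits_in_window):
    """Alert on unexplained planned changes, or on a plan that failed to run."""
    status = classify_plan(exit_code)
    if status == "error":
        return True  # assumption: a broken scheduled plan is worth knowing about
    return status == "changes" and commits_in_window == 0


def run_plan(workdir):
    """Run the plan non-interactively and return its exit code."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return result.returncode


# Example decision without invoking Terraform:
print(should_alert(exit_code=2, commits_in_window=0))
```

In a pipeline, the result of run_plan would be passed to should_alert along with the commit count for the preceding window.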

Governance Debt Repayment

An organisation that has been drifting has governance debt: exemptions to review, stale RBAC to remove, resources to bring under IaC management, policy sets to update against current requirements. The extent of that debt is usually visible in the first governance review that is run with any rigour — it surfaces a backlog that nobody was tracking.

The temptation is to address everything at once. The safer approach is the same as technical debt repayment: prioritise by risk, address the highest-risk items first, and build the maintenance process that prevents new debt accumulating while the existing debt is being cleared. Attempting to address everything simultaneously tends to produce a large coordinated effort that stalls when the team has other delivery commitments, and leaves the estate in an intermediate state that is neither the old posture nor the target.

Prioritisation by risk: security and access policy gaps first — stale RBAC and policy exemptions that expose privileged operations are the attack surface. Cost governance gaps second — untagged resources, resources outside managed resource groups, and workloads without budget alerts represent uncontrolled spend. Operational consistency third — resources outside IaC management represent a reproducibility and audit gap, but not an immediate security or financial exposure.
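
The ordering above can be applied mechanically to a backlog. This sketch assumes each debt item has been tagged with one of three categories. The category labels and the sample items are hypothetical.

```python
# Lower number means address sooner, following the order described above.
RISK_ORDER = {"security": 0, "cost": 1, "operational": 2}


def prioritise(items):
    """Order debt items by risk category, keeping original order within a category."""
    return sorted(items, key=lambda item: RISK_ORDER[item["category"]])


backlog = [
    {"item": "storage account outside IaC", "category": "operational"},
    {"item": "untagged VM scale set", "category": "cost"},
    {"item": "Owner assignment for departed contractor", "category": "security"},
    {"item": "exemption on privileged-operation policy", "category": "security"},
]
for entry in prioritise(backlog):
    print(entry["category"], "-", entry["item"])
```

Python's sort is stable, so items keep their backlog order within each category. Add a secondary key if a finer ranking is needed.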

Set a governance debt target: a compliance score target to reach within a defined period, an exemption count ceiling that the estate should not exceed, an RBAC assignment count limit per subscription. Track progress against the target in the quarterly review. A target that is not tracked tends to be treated as aspirational; a target with a named metric in the governance review becomes a forcing function for consistent progress.


Governance drift is inevitable without active prevention. The organisations that maintain strong governance posture are not the ones with the best initial governance design — well-designed governance frameworks drift just as quickly without maintenance. They are the ones that built a review cycle alongside the governance framework and treated that cycle as non-optional, regardless of delivery pressure. The maintenance process is as important as the policy set.

About the Author

Daniel Inman

Cloud Solution Architect focused on Azure, platform design, and translating technical complexity into decisions that teams can actually execute.
