UK Public Sector · Reference Operating Model

Governing AI-Enabled Public Services at Scale

An interactive reference operating model for the assured delivery of secure, ethical, safe and robust AI services.
Version 1.0 · May 2026 · Reference release
Source synthesis: Level 3 V2.0 / Strategic AI Gov / AAQ v4.3 / RMT / RA

The Reference Operating Model

Four interlocking layers govern AI from ambition to live operation. Strategic intent sets the conditions, tactical capabilities turn intent into reusable assets, operational delivery runs services safely, and continuous assurance evidences trust. Every cell maps to one or more of the source frameworks.

Strategic
Intent & Ambition
10 Foundational Principles: Risk appetite, public trust, ethics
Risk Appetite & Tiering: R1 Low → R4 Critical
Central Deployment Guidance: D1–D7 commitments; cross-government floor
Leadership Choice: Govern strategically
Public Accountability: Transparent records, ATRS, SIRO
Workforce & Capability: AI literacy, multi-disciplinary teams
Tactical
Capability & Platforms
Pre-approved Routes: Common AI use-case patterns
Shared Evaluation Standards: E2VT & test harnesses
Model Registry: Versioning, provenance, exit plan
Model Update Governance: G2 · five triggers, tiered response
Reusable Patterns: Open source, open standards
Procurement & Supplier Mgmt: Vendor lock-in, model-change risk
Operational
Delivery & Service
Pilot → MB → Beta → Live: Phase gates with AI assurance Qs
Service Standard (14): User need, inclusion, reliability
Human-in-the-loop: Override, appeal, fallback
Monitoring & Drift: Hallucination, accuracy, fairness
Incident Mgmt & Kill-switch: Rollback, escalation, comms
Continuous
Assurance & Evidence
E2VT Loop: Evaluate · Evidence · Validate · Trust
DPIA / EqIA / AIA: Privacy, equality, AI impact
Red-team & Adversarial: Prompt injection, abuse cases
Audit Trail & Decision Log: Reproducible reassessment
RACI & Assurance Board: Named accountable owner
  1. Pilot: Is this worth pursuing? Can we identify the risk?
  2. Managed Beta: Safe with limited users under controlled conditions?
  3. Beta: Safe with real users, real data, real constraints?
  4. Live: Safe, reliable, continuously assured at scale?
“Strategic AI governance is not the brake on innovation. It is the infrastructure that allows innovation to travel safely, at speed, and with public trust.” — Strategic AI Governance at Scale, GCAIO / DSIT

How to use this model

  1. Classify the AI use case using Risk Tiering. R-tier sets the proportionate control depth.
  2. Sequence the work through the four lifecycle phases — do not collapse pilot results into live readiness.
  3. Apply the E2VT loop at each phase gate to generate evidence rather than assertion.
  4. Govern using the RACI — every artefact has a named accountable owner.
  5. Stress-test using the failure-mode matrix before formal service assessment.
  6. Evidence using the templates: service story, evidence tracker, model card, phase decision log, incident report.

10 Foundational Principles

Drawn from the Strategic AI Governance position and aligned with the AI Risk Management Toolkit, these are the non-negotiable principles every AI-enabled public service must satisfy regardless of risk tier.

P1
User need first
AI is justified by a real user or operational need, not by technological novelty. Non-AI alternatives have been considered.
P2
Proportionate to risk
Controls scale with the R-tier. Vulnerable cohorts and high-consequence decisions demand independent assurance.
P3
Meaningful human control
A named person can pause, disable, override, appeal or roll back any AI component in production.
P4
Transparency
Users know when AI is involved, in language they understand. The service is recorded under the Algorithmic Transparency Recording Standard (ATRS).
P5
Fair and inclusive
Differential impact is tested before launch and monitored after. Accessibility evidence covers the full user base.
P6
Secure by design
Threat-modelled for prompt injection, data exfiltration and adversarial misuse. Patched continuously.
P7
Privacy preserving
DPIA completed. Minimum personal data principle applied to training, prompts and logs.
P8
Evidenced, not asserted
Every assurance claim is grounded in evidence held in a discoverable evidence pack — not in one person’s head.
P9
Reliable in operation
Runbooks, monitoring, drift alerts, supplier-failure fallback and rollback criteria all exist and have been rehearsed.
P10
Accountable & reassessable
Single SRO. Decision log maintained. Service is reassessable on amber review or material change.

Risk Tiering and Categories

Risk tiering is the cornerstone of proportionate assurance. It converts a binary “is this AI risky?” into a four-band decision that drives the depth of evaluation, the seniority of sign-off, and the cadence of re-certification.

Risk Tiers (R1–R4)

R1
Low
Low consequence
Non-personal content search; internal lookups; low-stakes summarisation.
  • Basic offline evaluation; usage logs
  • Lightweight privacy & security checks
  • A/B testing only after sign-off
R2
Medium
Staff productivity / non-critical user advice
Drafting assistants; meeting summarisation affecting decisions; non-binding guidance.
  • Expanded offline eval; calibration; bias screen
  • Red-team lite
  • Monitoring SLOs and drift alerts
R3
High
Eligibility, triage, sensitive advice
Benefits triage; case-prioritisation; advice to citizens with real consequence.
  • Formal validation protocols (E2VT); SME panels
  • Strong privacy & PII controls
  • Comprehensive red-teaming; HITL; change-freeze windows
R4
Critical
Safety, legal exposure, vulnerable cohorts
Decisions with legal or safety consequence; vulnerable cohorts; statutory functions.
  • Independent assurance & external audit
  • Formal validation campaigns (E2VT); rigorous canary with kill-switch
  • Full traceability; quarterly re-certification
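The tier descriptors above can be turned into a first-pass screening aid. The sketch below is illustrative only: the `UseCase` field names are hypothetical screening questions derived from the R1–R4 descriptions, not official criteria, and a real classification would still need human judgement and sign-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    # Hypothetical yes/no screening flags paraphrasing the tier descriptors.
    legal_or_safety_consequence: bool = False
    vulnerable_cohort: bool = False
    statutory_function: bool = False
    eligibility_triage_or_sensitive_advice: bool = False
    informs_staff_or_user_decisions: bool = False


def classify_tier(uc: UseCase) -> str:
    """Return the highest tier whose descriptor matches the use case."""
    if uc.legal_or_safety_consequence or uc.vulnerable_cohort or uc.statutory_function:
        return "R4"
    if uc.eligibility_triage_or_sensitive_advice:
        return "R3"
    if uc.informs_staff_or_user_decisions:
        return "R2"
    return "R1"


# A benefits-triage assistant matches the R3 descriptor.
print(classify_tier(UseCase(eligibility_triage_or_sensitive_advice=True)))
```

Checking the most severe descriptors first means a use case that matches several tiers always lands in the highest one, which is the cautious default.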

The Nine Risk Categories

Aligned with the AI Risk Management Toolkit (D02 v1). Every assessed AI use case is screened against these nine categories to identify the dominant risk profile and the assurance disciplines required.

Financial
Financial losses from increased operational costs, maintaining AI solutions, or financial implications of automated decisions.
Legal & regulatory compliance
Failing to meet legal frameworks — data protection, equality law, EU AI Act, UK GDPR and evolving AI-specific laws.
Appropriate transparency & explainability
Whether users and those impacted understand how the AI works, its decision-making, and that they are interacting with AI.
Fairness
Unfair, biased or discriminatory outcomes; impact on individual rights; compliance with equality laws.
Accountability & governance
Lack of clear accountability; ineffective governance; absent risk management processes, roles and communication.
Contestability & redress
Ability of users or affected parties to contest outputs or seek redress — accessible, transparent mechanisms.
Technical robustness
Reliable functioning and sustained performance — data quality, model reliability, behaviour under unexpected conditions.
Security
Threats arising from deployment — data poisoning, leakage, cyber-attacks, prompt injection and adversarial misuse.
People & the environment
Impact on physical and mental wellbeing, safety of critical infrastructure, and the environment.

Tier × Category cross-walk

Use this view to choose proportionate controls for each of the nine categories at each tier.

Category | R1 Low | R2 Medium | R3 High | R4 Critical
Financial | Cost log | VfM check | Recurring VfM | Independent VfM audit
Legal & regulatory compliance | Legal screen | DPIA | DPIA + EqIA | DPIA + EqIA + external counsel
Appropriate transparency & explainability | Internal note | User notice | ATRS entry | ATRS + published model card
Fairness | Bias awareness | Bias screen | Sub-group testing | External fairness audit
Accountability & governance | Product owner | SRO named | Assurance board | Board + minister sighting
Contestability & redress | Email route | Help channel | Defined appeal | Statutory appeal & ombudsman
Technical robustness | Offline eval | Calibration + drift | Formal validation + canary | Independent test campaign
Security | Baseline cyber hygiene | Threat model + secrets mgmt | Red-team + prompt-injection tests | Independent pen-test + continuous monitoring
People & the environment | Low | Wellbeing check | Vulnerability & sustainability screen | Safeguarding partnership + env. impact assessment
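A team could hold the cross-walk as a simple lookup so that the control set for a tier is generated rather than retyped. This is a minimal sketch: the cell values are copied from the cross-walk, while the names `CROSSWALK` and `controls_for` are assumptions made for illustration.

```python
TIERS = ("R1", "R2", "R3", "R4")

# One row per risk category; columns follow the order of TIERS.
CROSSWALK = {
    "Financial": ("Cost log", "VfM check", "Recurring VfM", "Independent VfM audit"),
    "Legal & regulatory compliance": ("Legal screen", "DPIA", "DPIA + EqIA", "DPIA + EqIA + external counsel"),
    "Appropriate transparency & explainability": ("Internal note", "User notice", "ATRS entry", "ATRS + published model card"),
    "Fairness": ("Bias awareness", "Bias screen", "Sub-group testing", "External fairness audit"),
    "Accountability & governance": ("Product owner", "SRO named", "Assurance board", "Board + minister sighting"),
    "Contestability & redress": ("Email route", "Help channel", "Defined appeal", "Statutory appeal & ombudsman"),
    "Technical robustness": ("Offline eval", "Calibration + drift", "Formal validation + canary", "Independent test campaign"),
    "Security": ("Baseline cyber hygiene", "Threat model + secrets mgmt", "Red-team + prompt-injection tests", "Independent pen-test + continuous monitoring"),
    "People & the environment": ("Low", "Wellbeing check", "Vulnerability & sustainability screen", "Safeguarding partnership + env. impact assessment"),
}


def controls_for(tier: str) -> dict[str, str]:
    """Return the proportionate control for every category at the given tier."""
    idx = TIERS.index(tier)  # raises ValueError for an unknown tier
    return {category: row[idx] for category, row in CROSSWALK.items()}


for category, control in controls_for("R3").items():
    print(f"{category}: {control}")
```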

Service Lifecycle & Phase Gates

A pilot is not a beta. A beta is not live. Each phase asks a different assurance question and demands different evidence. This view encodes the Level 3 lifecycle, the AI Assurance Questionnaire stages and the Service Standard expectations into one progression.

Phase 1   Pilot — is this worth pursuing?
Primary assurance question

Is the AI use case valid? Can we identify the risk?

AI-specific focus

User need; feasibility; risk discovery; non-AI alternatives appraised.

Expected evidence
  • Use-case rationale (problem → AI → outcome)
  • Prototype results; failure-category log
  • Initial risk register & data assessment
  • Comparison with rules-based / manual alternative
Assessor-style questions
  • What user need does the AI capability serve?
  • What non-AI options did you consider?
  • What risky assumptions did the pilot test?
  • What would cause you to stop or redesign the AI use case?
  • Who owns the AI risk at this phase?
Decision

Proceed to managed beta preparation · Pivot · Stop

Phase 2   Managed Beta — safe under controlled conditions?
Primary assurance question

Can the service work safely with limited users under controlled conditions?

AI-specific focus

Human control; monitoring; security & privacy; performance; accessibility; fallback.

Expected evidence
  • Evaluation results against representative scenarios
  • DPIA, threat model, model card
  • Test logs; support model; rollback plan
  • Accessibility & fairness testing pre-launch
Assessor-style questions
  • How do users know when AI is involved?
  • Where does meaningful human control happen?
  • What happens when the AI output is wrong?
  • How are prompt injection, adversarial inputs or misuse handled?
  • What is the rollback plan?
Decision

Proceed to private/public beta · Remediate · Pause

Phase 3   Beta — safe with real users?
Primary assurance question

Can the service work safely with real users, real data and real constraints?

AI-specific focus

Live drift monitoring; supplier risk; sub-group performance; incident response readiness.

Expected evidence
  • Pre-production performance dashboard
  • External-AI integration test results & versioning controls
  • Rollback rehearsal log; incident runbook
  • Differential-impact monitoring plan
Decision

Proceed to live readiness · Remediate · Pause

Phase 4   Live — safely operating at scale?
Primary assurance question

Can the service operate safely, reliably and continuously at scale — including when the AI degrades, fails or changes?

AI-specific focus

Drift monitoring; incident response; model updates; operational ownership; continuous assurance.

Expected evidence
  • Runbooks & on-call escalation routes
  • Live dashboards: accuracy, fairness, hallucination, cost
  • Audit logs; retraining / update controls
  • Supplier-failure fallback; sustainability evidence
  • Quarterly re-certification record (R3/R4)
Assessor-style questions
  • Who can pause, disable or roll back the AI component?
  • How are model updates approved?
  • What happens if the AI supplier fails?
  • Is AI still the most cost-effective way to meet the user need?
  • What evidence will be maintained for reassessment or amber review?
Decision

Go live · Limited live · Delay · Reassess
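The four gates can be read as a small state machine in which only a "proceed" decision advances the service, and only by one phase. The sketch below assumes that reading. Decision labels are shortened from the Decision lines above, and `next_phase` is a hypothetical helper, not part of any source framework.

```python
PHASES = ("Pilot", "Managed Beta", "Beta", "Live")

# Allowed decisions per gate, abbreviated from each phase's Decision line.
DECISIONS = {
    "Pilot": ("Proceed", "Pivot", "Stop"),
    "Managed Beta": ("Proceed", "Remediate", "Pause"),
    "Beta": ("Proceed", "Remediate", "Pause"),
    "Live": ("Go live", "Limited live", "Delay", "Reassess"),
}


def next_phase(current: str, decision: str) -> str:
    """Return the phase the service is in after a gate decision.

    Only "Proceed" advances, and only to the adjacent phase, so pilot
    results can never be collapsed straight into live readiness.
    Any other decision leaves the service in its current phase.
    """
    if decision not in DECISIONS[current]:
        raise ValueError(f"{decision!r} is not a valid decision at the {current} gate")
    if decision == "Proceed":
        return PHASES[PHASES.index(current) + 1]
    return current


print(next_phase("Pilot", "Proceed"))
print(next_phase("Managed Beta", "Remediate"))
```

How a "Stop" or "Pause" outcome is recorded is left to the caller; this sketch only guards the ordering of phases.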

E2VT & Service Standard alignment

Trust cannot be asserted. It has to be evidenced. The E2VT loop — Evaluate, Evidence, Validate, Trust — is the operational discipline applied at every phase gate. It maps onto the 14 Service Standard points so that AI-specific evidence travels through the same assessment route as any other digital service.

Evaluate

Test models rigorously against defined criteria. Are outcomes good enough? Is real-world impact assessed?

Evidence

Ground every assurance claim in evidence, not assumption. Controls and requirements demonstrably met.

Validate

The system meets user, policy and regulatory needs. Are we solving the right problem in the right way?

Trust

End-to-end workflows defined; monitoring & incident response live; transparency artefacts published.

14 Service Standard points — AI-enabled evidence

Standard point | AI-enabled service evidence | Typical owner
1. Understand users & needs | Research showing AI solves a real user need (not a tech preference) | User research lead
2. Solve a whole problem | End-to-end journey showing where the AI boundary starts and stops | Product / design lead
3. Joined-up channels | Assisted-digital, offline and non-AI fallback journeys defined | Service designer
4. Simple to use | Explanations, confidence language, user-facing control over AI outputs | Interaction / content lead
5. Everyone can use it | Accessibility, inclusion and bias testing covering non-standard speech, dialects, disability, low digital confidence | Design / research lead
6. Multidisciplinary team | Named AI, data, security, policy and service owners | Service owner
7. Agile ways of working | Iteration log, AI learning loop, decision records | Delivery manager
8. Iterate and improve | Evaluation cycles, controlled model-update process | Product / AI lead
9. Secure and private | DPIA, threat model, data minimisation, prompt-injection testing | Tech / security lead
10. Define success | Service KPIs plus AI accuracy, fairness and safety metrics | Performance analyst
11. Right tools and tech | AI option appraisal, model-selection rationale, exit plan | Technical architect
12. Open source | Code, prompts, configs, reusable patterns where appropriate | Tech lead
13. Open standards / components | Reuse of common platforms, open standards, shared patterns | Tech / design lead
14. Reliable service | Runbooks, monitoring, fallback, rollback, incident process | Service / ops lead
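One way to keep this mapping honest is to check the evidence tracker for Standard points that have no locatable, owned evidence. The sketch below borrows its fields from the Evidence Tracker template columns (standard area, claim, evidence location, owner, RAG). The class and function names, and the example location string, are assumptions for illustration.

```python
from dataclasses import dataclass

STANDARD_POINTS = range(1, 15)  # the 14 Service Standard points


@dataclass
class EvidenceItem:
    point: int      # Service Standard point number, 1 to 14
    claim: str
    location: str   # where the evidence can be found
    owner: str
    rag: str        # "red", "amber" or "green"


def gaps(tracker: list[EvidenceItem]) -> list[int]:
    """Return Standard points with no item that has both a location and an owner."""
    covered = {item.point for item in tracker if item.location and item.owner}
    return [point for point in STANDARD_POINTS if point not in covered]


tracker = [
    EvidenceItem(9, "Prompt-injection tests run", "evidence-pack/security", "Security lead", "green"),
    EvidenceItem(5, "Dialect coverage tested", "", "Research lead", "amber"),  # no location yet
]
print(gaps(tracker))
```

An item without a location is treated as a gap, which mirrors the principle that evidence must be discoverable rather than held in one person's head.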

Governance Structure & RACI

Governance must be operational, not ceremonial. Each artefact has a single accountable person, a working responsible team, and clearly defined consulted and informed parties. The model layers a strategic AI Governance Board over departmental assurance boards and product-level day-to-day controls.

Three-tier governance

L1
AI Governance Board (strategic)
Sets risk appetite, approves R-tier policy, owns ATRS publication, commissions external audits. Chaired by senior accountable officer; SIRO and DPO members.
L2
Assurance Board (departmental)
Approves go-live, holds phase-gate decisions, owns evaluation standards and shared platforms, runs assurance clinics (Level 1–3).
L3
Product / Service team (operational)
Executes E2VT, maintains evidence pack, runs monitoring, manages incidents. Day-to-day accountable owner is the Service / Product Owner.

Operating RACI

Area | Accountable (A) | Responsible (R) | Consulted (C) | Informed (I)
Risk Tiering | Service / Product Owner | Principal Technologist, Safety Lead | Legal, DPO | Team
Evaluation Plan | Principal Technologist | ML Eng, Data Scientist, QA | Domain SMEs, UX Researchers | AI Governance Board
Data Provenance | AI / Data Lead | ML Eng, Data Scientist | Domain SMEs, UX Researchers | AI Governance Board
Red Team | Safety Lead | Security, Red-Teamers | Product Owner, Legal | All
DPIA / EqIA | DPO | Product Owner, Legal | SIRO, User Research | Assurance Board
Go-Live decision | SRO | Service / Product Owner | Assurance Board | Stakeholders
Monitoring & Incidents | Service / Product Owner | SRE / On-call | Safety Lead, Comms | All
Re-certification | SRO | Principal Technologist | Assurance Board | AI Governance Board
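The rule that every area has a single accountable person and a working responsible team can be checked mechanically against a service's own RACI worksheet. The sketch below uses three rows from the table above as sample data. `raci_problems` is a hypothetical helper, and splitting names on commas is an assumption about how a worksheet would be filled in.

```python
# (Accountable, Responsible, Consulted, Informed) for three rows of the RACI table.
RACI = {
    "Risk Tiering": ("Service / Product Owner", "Principal Technologist, Safety Lead", "Legal, DPO", "Team"),
    "Go-Live decision": ("SRO", "Service / Product Owner", "Assurance Board", "Stakeholders"),
    "Re-certification": ("SRO", "Principal Technologist", "Assurance Board", "AI Governance Board"),
}


def raci_problems(raci: dict[str, tuple[str, str, str, str]]) -> list[str]:
    """List areas that lack exactly one accountable owner or any responsible team."""
    problems = []
    for area, (accountable, responsible, _consulted, _informed) in raci.items():
        owners = [name.strip() for name in accountable.split(",") if name.strip()]
        if len(owners) != 1:
            problems.append(f"{area}: expected exactly one accountable owner, found {len(owners)}")
        if not responsible.strip():
            problems.append(f"{area}: no responsible team named")
    return problems


print(raci_problems(RACI))
```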

Towards observable maturity

The target operating model is supported by four operational dashboards/registries — each is a capability that should exist at department or cross-government level:

Generic AI Dashboard
Service-level view of uptake, accuracy, cost, incident count.
Model Registry
Versioning, provenance, owner, R-tier, ATRS link, exit plan.
Hallucination Detection
Output quality monitoring with sub-group breakdowns.
Incident Management
Severity, escalation, comms, post-incident learning loop.

AI-enabled Failure-Mode Matrix

The bigger assessment risk is rarely whether the model works. It is whether the team can show why AI is appropriate, how harms are detected, how humans remain in meaningful control, and how the service operates reliably in live conditions. Stress-test these failure modes before assessment.

Failure mode | Pilot intervention | Managed beta intervention | Beta / live intervention | Assessment risk (points)
AI not justified by user need | Compare AI vs non-AI options | Validate with controlled users | Reassess VfM & user impact | 1, 2, 11
Model output wrong or misleading | Identify failure categories | Test against representative scenarios | Monitor accuracy & incidents | 9, 10, 14
Bias or exclusion | Identify protected-characteristic risks | Accessibility & fairness testing | Monitor differential impact | 5, 9, 10
Prompt injection / adversarial misuse | Threat-model attack paths | Test adversarial prompts & abuse | Monitor abuse, patch controls | 9, 14
Weak human control | Define human role & handoffs | Test decision handoff & override | Audit human review & escalation | 4, 6, 9
Model drift | Define baseline | Monitor pre-live quality changes | Drift alerts & retraining governance | 8, 10, 14
Supplier / model unavailable | Identify dependency | Test fallback | Operate fallback & incident route | 11, 14
Governance unclear | Name accountable owner | Confirm governance gates | Maintain live decision log | 6, 8, 14
Evidence scattered | Build evidence map | Rehearse assessor narrative | Maintain evidence pack | All 14

Answer-quality scoring (for reference)

Score | Meaning | Example
1 | Assertion only | “We have tested this.”
2 | Some evidence | “We ran testing and have results.”
3 | Evidence linked to risk | “Testing showed these risks and these controls are in place.”
4 | Evidence + control + ownership + learning | “Testing showed X; control Y is owned by Z; we revise on amber review.”
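For rehearsal purposes, the scoring scale can be sketched as a cumulative check, where each score requires everything the score below it requires. That cumulative reading is an assumption, as are the function and parameter names.

```python
def score_answer(has_evidence: bool, linked_to_risk: bool, has_control_owner_learning: bool) -> int:
    """Score an assessment answer from 1 (assertion only) to 4 (fully evidenced)."""
    if not has_evidence:
        return 1  # assertion only
    if not linked_to_risk:
        return 2  # some evidence
    if not has_control_owner_learning:
        return 3  # evidence linked to risk
    return 4      # evidence + control + ownership + learning


# "We ran testing and have results." has evidence but no link to risk.
print(score_answer(True, False, False))
```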

Operating Templates

Eight reusable templates that turn the operating model into daily artefacts. Together they form the minimum evidence pack any AI-enabled service should hold.

T1 · One-page Service Story
Pilot → Live
T2 · Risk-tier classification
All phases
T3 · Evidence Tracker (14 Standard points)
Beta → Live
Columns: Standard area | Claim | Evidence location | Owner | RAG
T4 · Failure-mode card
Pilot → Live
T5 · Model Card
Managed beta → Live
T6 · Phase-gate decision log
Every gate
T7 · AI Incident Report
Live
T8 · Service RACI worksheet
Setup
Columns: Activity | A | R | C | I

Centrally Endorsed Guidance for Responsible Deployment

Departmental approaches to AI assurance are inconsistent today — some teams over-engineer controls, others under-control. This section sets a single, written and centrally endorsed position covering responsible deployment, model-update governance, accuracy expectations and acceptable risk thresholds. It is designed to unblock safe experimentation by making the rules explicit, so teams stop guessing and risk-aversion stops acting as a default veto.

“A consistent, written deployment standard means innovation can travel safely across government. Where the rules are clear, departments can move faster — not slower.” — Reference Operating Model, central guidance principle

G1 · Responsible deployment standard

Every AI-enabled service deployed in the UK public sector must satisfy the following seven mandatory commitments, regardless of risk tier or department. Anything below this floor is not a deployment; it is an experiment and must remain inside a controlled environment.

D1
Named SRO & risk tier
A single accountable owner and an explicit R1–R4 classification recorded against the service. Aligns: Risk Toolkit RACI; AAQ governance stage.
D2
DPIA / EqIA completed
Privacy and equality impact assessments signed off before any real user touches the service. Aligns: AAQ Ethics; Risk Toolkit categories Legal & Fairness.
D3
ATRS entry published (R2+)
Algorithmic Transparency Recording Standard entry live before public exposure for R2 and above. Aligns: Risk Toolkit Transparency category.
D4
Meaningful human control
Override, appeal, fallback and kill-switch routes defined, tested and owned. Aligns: AAQ governance; Service Standard pt 4.
D5
Monitoring & drift plan
Live dashboards for accuracy, fairness, hallucination and cost — with thresholds defined before launch. Aligns: Toolkit Technical Robustness; AAQ Lifecycle.
D6
Inclusion evidence
Accessibility and sub-group testing covering dialects, accents, low digital confidence and disability. Aligns: Research Assessment; Service Standard pt 5.
D7
Evidence pack & decision log
Single discoverable evidence pack and phase-gate decision log; not held in one person’s head. Aligns: Level 3 Evidence Tracker; AAQ v4.3.

G2 · Model update governance

Model behaviour changes either intentionally (prompt or configuration change, retrain or fine-tune, version bump, knowledge base or RAG update) or externally (supplier model change). All five triggers follow the same governance path. The depth of the path is set by R-tier.

Update type | Trigger | R1 | R2 | R3 | R4
Prompt / configuration change | Team-initiated | Peer review & log | Peer review + regression suite | Change Advisory Board + canary | CAB + change-freeze respected + canary + rollback rehearsal
Retrain / fine-tune | Drift, new data or scheduled | Baseline diff | Eval pack rerun | Full E2VT rerun + SME review | Independent re-validation + external audit hook
Version bump (own model) | Release process | Semver + changelog | Eval rerun + SLO check | Canary 5 → 25 → 100% with kill-switch | Quarterly re-certification supersedes
Supplier model change | Provider-initiated | Notify owner | Regression + bias delta | Pause flow + revalidate + ATRS update | Halt service until revalidated & signed off by SRO
Knowledge base / RAG update | Content team | Quality spot-check | Hallucination diff vs baseline | SME panel + factuality eval | SME + independent eval, with citation audit

Hard rule for all tiers: no silent updates. Every model change leaves a record in the Model Registry, the decision log and (R2+) the ATRS entry. Where the change is provider-initiated, the service operates under a presumption to pause until revalidation completes.
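A change-management tool could encode the G2 table and the "no silent updates" rule as two lookups: one for the required response and one for the records that must be updated. This is a sketch under that assumption. The responses are copied from the table, while `required_response` and `records_to_update` are hypothetical names.

```python
TIERS = ("R1", "R2", "R3", "R4")

# Required response per update type; columns follow the order of TIERS.
G2_RESPONSES = {
    "Prompt / configuration change": (
        "Peer review & log",
        "Peer review + regression suite",
        "Change Advisory Board + canary",
        "CAB + change-freeze respected + canary + rollback rehearsal",
    ),
    "Retrain / fine-tune": (
        "Baseline diff",
        "Eval pack rerun",
        "Full E2VT rerun + SME review",
        "Independent re-validation + external audit hook",
    ),
    "Version bump (own model)": (
        "Semver + changelog",
        "Eval rerun + SLO check",
        "Canary 5 → 25 → 100% with kill-switch",
        "Quarterly re-certification supersedes",
    ),
    "Supplier model change": (
        "Notify owner",
        "Regression + bias delta",
        "Pause flow + revalidate + ATRS update",
        "Halt service until revalidated & signed off by SRO",
    ),
    "Knowledge base / RAG update": (
        "Quality spot-check",
        "Hallucination diff vs baseline",
        "SME panel + factuality eval",
        "SME + independent eval, with citation audit",
    ),
}


def required_response(update_type: str, tier: str) -> str:
    """Look up the tiered response for one of the five update triggers."""
    return G2_RESPONSES[update_type][TIERS.index(tier)]


def records_to_update(tier: str) -> list[str]:
    """No silent updates: every change is recorded; R2 and above also update ATRS."""
    records = ["Model Registry", "decision log"]
    if TIERS.index(tier) >= 1:  # also rejects unknown tiers with ValueError
        records.append("ATRS entry")
    return records


print(required_response("Supplier model change", "R3"))
print(records_to_update("R3"))
```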

G3 · Accuracy expectations

Accuracy is not a single number — and "high accuracy" is not a control. The central position is that each service must declare four metrics with explicit thresholds, set before launch and proportionate to tier. Below the lower threshold the service must pause; in the amber band it must remediate; above the green threshold it may operate.

Metric family | What it measures | R1 floor | R2 floor | R3 floor | R4 floor
Task accuracy | Correct outputs on golden / held-out set | ≥ 70% | ≥ 85% | ≥ 92% | ≥ 95% with CI reported
Sub-group parity | Max gap in accuracy across protected groups | ≤ 15 pp | ≤ 10 pp | ≤ 5 pp | ≤ 3 pp with rationale
Hallucination / factuality | Unsupported claims on representative prompts | ≤ 10% | ≤ 5% | ≤ 2% | ≤ 1% with citation audit
Refusal & safety | Correct refusal of unsafe / out-of-scope queries | ≥ 80% | ≥ 90% | ≥ 95% | ≥ 98% with red-team set

Thresholds are reference defaults. Departments may set stricter values for the same tier; they may not relax them without AI Governance Board approval. pp = percentage points.
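The reference floors lend themselves to an automated pre-launch check. The sketch below covers only the floors in the G3 table. The amber and green bands are not given numerically here, so they are left out, and the metric keys and function names are assumptions. A missing metric is treated as a breach because each service must declare all four.

```python
import operator

TIERS = ("R1", "R2", "R3", "R4")

# (comparison, R1, R2, R3, R4); values are percentages or percentage points.
G3_FLOORS = {
    "task_accuracy": (operator.ge, 70, 85, 92, 95),
    "subgroup_parity_gap": (operator.le, 15, 10, 5, 3),
    "hallucination_rate": (operator.le, 10, 5, 2, 1),
    "correct_refusal": (operator.ge, 80, 90, 95, 98),
}


def breaches(tier: str, measured: dict[str, float]) -> list[str]:
    """Return the metrics that are missing or fail the reference floor for the tier."""
    column = TIERS.index(tier) + 1
    failed = []
    for metric, row in G3_FLOORS.items():
        compare, limit = row[0], row[column]
        if metric not in measured or not compare(measured[metric], limit):
            failed.append(metric)
    return failed


def override_allowed(metric: str, tier: str, proposed: float) -> bool:
    """A departmental threshold is allowed only if it is at least as strict as the floor."""
    compare = G3_FLOORS[metric][0]
    limit = G3_FLOORS[metric][TIERS.index(tier) + 1]
    return compare(proposed, limit)


results = {"task_accuracy": 93, "subgroup_parity_gap": 4, "hallucination_rate": 2.5, "correct_refusal": 96}
print(breaches("R3", results))
print(override_allowed("task_accuracy", "R2", 80))
```

Storing the comparison operator alongside each row lets the same check handle metrics where higher is better and metrics where lower is better.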

G4 · Acceptable risk thresholds

Mapped to the Orange Book-aligned appetite scale (Averse → Eager) used in the AI Risk Management Toolkit, against the nine risk categories.

Risk category | Central position | Floor (will not accept below) | Trigger for pause
Financial | Open | Project not solvent without AI subsidy | Run-rate > 1.5× business case
Legal & regulatory compliance | Minimal | Any unresolved legal challenge with material likelihood | DPIA / EqIA finding above ‘low’ not remediated
Appropriate transparency & explainability | Cautious | Users cannot tell they are using AI | ATRS entry missing / out of date at R2+
Fairness | Minimal | Sub-group parity gap > G3 floor | Live monitoring shows widening gap over 2 cycles
Accountability & governance | Cautious | No named SRO; no decision log | SRO change without handover within 10 working days
Contestability & redress | Cautious | No redress channel defined | Redress SLA breach rate > 5%
Technical robustness | Open | Below G3 task-accuracy floor | Drift alert sustained for > 1 cycle without action
Security | Averse | Unmitigated prompt-injection or data-exfil risk | Any sev-1 incident; CVE on the model path
People & the environment | Minimal | Foreseeable harm to vulnerable cohort without safeguard | Safeguarding incident, or sustainability budget breach
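Three of the pause triggers are numeric and can be sketched as simple predicates. The function names are hypothetical. In particular, reading "widening gap over 2 cycles" as a strict increase across the last two monitoring cycles is an assumption that a department would need to confirm.

```python
def financial_pause(run_rate: float, business_case: float) -> bool:
    """Financial trigger: run-rate exceeds 1.5x the business case."""
    return run_rate > 1.5 * business_case


def redress_pause(breached: int, total: int) -> bool:
    """Contestability trigger: redress SLA breach rate above 5%."""
    return total > 0 and breached / total > 0.05


def fairness_pause(gap_by_cycle: list[float]) -> bool:
    """Fairness trigger: parity gap grew in each of the last two cycles (assumed reading)."""
    if len(gap_by_cycle) < 3:
        return False  # need a starting point plus two later cycles
    first, second, third = gap_by_cycle[-3:]
    return first < second < third


print(financial_pause(run_rate=160_000, business_case=100_000))
print(fairness_pause([2.0, 2.5, 3.1]))
```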

G5 · Pre-approved deployment routes

To remove the most common cause of departmental risk-aversion — uncertainty about what is allowed — the following pre-approved patterns may be deployed at R1/R2 using the reference controls only, without bespoke board review:

Anything outside the pre-approved patterns, or at R3/R4, requires departmental Assurance Board sign-off plus AI Governance Board notification.
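The routing rule itself is simple enough to sketch as a single function. The set of pre-approved patterns is passed in as a parameter because this example does not define it. The pattern names in the usage lines are purely hypothetical placeholders.

```python
def approval_route(tier: str, pattern: str, pre_approved: frozenset[str]) -> str:
    """Return the approval route for a use case under the G5 rule."""
    if tier in ("R1", "R2") and pattern in pre_approved:
        return "Reference controls only; no bespoke board review"
    return "Assurance Board sign-off + AI Governance Board notification"


# Hypothetical pattern names, for illustration only.
patterns = frozenset({"example-pattern-a", "example-pattern-b"})
print(approval_route("R2", "example-pattern-a", patterns))
print(approval_route("R3", "example-pattern-a", patterns))
```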

G6 · Consistency across government

To prevent departmental drift, the following are common across all departments and may not be re-defined locally:

Risk tiers R1–R4
Single definition used across HMG. Departments may add sub-tiers, not redefine the four.
Nine risk categories
AI Risk Management Toolkit D02 v1 categories used verbatim.
Seven deployment commitments (D1–D7)
Floor for any live deployment; departments may add, not subtract.
G3 accuracy floors
Reference minima per tier; stricter allowed, looser requires AI Gov Board approval.
Model-update governance (G2)
Same five triggers, same four tier responses across departments.
Security baseline
Appetite = Averse for security; common minimum across HMG.
Fairness floors
Sub-group parity gap thresholds in G3 are common minima.
Incident notification
Sev-1 incidents notified to SIRO, DPO and AI Gov Board within 24 hours.
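The 24-hour notification rule can be tracked with a small deadline helper. This sketch assumes the clock starts when the incident is detected, which is not specified above, and the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

SEV1_RECIPIENTS = ("SIRO", "DPO", "AI Governance Board")
NOTIFICATION_WINDOW = timedelta(hours=24)


def notification_deadline(detected_at: datetime) -> datetime:
    """Latest time by which a sev-1 incident must be notified."""
    return detected_at + NOTIFICATION_WINDOW


def is_overdue(detected_at: datetime, now: datetime) -> bool:
    """True once the 24-hour window has passed without being met."""
    return now > notification_deadline(detected_at)


detected = datetime(2026, 5, 1, 9, 0, tzinfo=timezone.utc)
print(notification_deadline(detected).isoformat())
```

Using timezone-aware datetimes avoids ambiguity when incidents are detected and reported from different locations.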

G7 · How this navigates risk aversion

The most common failure pattern observed in departmental practice is not deploying badly — it is not deploying at all, because no one is sure what "good enough" looks like. This guidance addresses each pattern:

Symptom of risk aversion | Central guidance response
“We don’t know what controls are required.” | D1–D7 deployment commitments + Tier×Category cross-walk are written and endorsed.
“What accuracy is good enough?” | G3 declares default floors per tier; departments may set stricter, not looser.
“How do we change the model safely?” | G2 sets the five update triggers and tiered response.
“What if the supplier changes the model?” | G2 supplier-change row + ‘presumption to pause’.
“Do we need board sign-off for every use case?” | No: G5 pre-approved patterns are deployable without bespoke review at R1/R2.
“Our department does it differently to others.” | G6 fixes the non-negotiable common ground; local addition allowed, redefinition not.
“How do we know when to stop?” | G4 pause triggers + G3 amber/red thresholds are explicit, not judgemental.

G8 · Where this guidance lives in the operating model

Framework Synergy & Critical Alignment

The five source frameworks each address a slice of the AI assurance problem. The reference operating model integrates them so a single piece of evidence satisfies multiple frameworks simultaneously — reducing duplication and exposing genuine gaps.

Strategic AI Governance
Sets ambition & risk appetite
10 principles, R1–R4 tiering, leadership choice, target operating model components.
Level 3 Intervention
Service-assessment readiness
Lifecycle integrity (pilot ≠ beta ≠ live), failure-mode drill, assessor-role play, evidence challenge against the 14 points.
AI Assurance Questionnaire v4.3
146 evidenced questions
Question bank mapped to project governance stage, lifecycle stage, risk tier and ethical dimension.
AI Risk Management Toolkit
Risk identification & control
Nine risk categories (D02 v1), Orange Book-aligned appetite scale, RACI, control taxonomy and mitigation patterns.
Research Assessment
AI in user research
Hallucination checks on research outputs, synthetic-user validation, dialect & accent coverage.

Cross-framework synergy matrix

Where the same operating-model area is informed by multiple frameworks, alignment is genuine. Gaps below are where the reference operating model adds bridging guidance.

OPM area | Strategic Gov | Level 3 | AAQ v4.3 | Risk Toolkit | Research
Risk appetite / tiering: ● primary
Service lifecycle gates: ● primary
Evidence & questioning: ● primary
Failure modes & controls: ● primary
User research integrity: ● primary
Operating dashboards: ● primary
RACI & accountability: ● primary

● primary contributor · ○ reinforcing contributor · – not in scope

Critical observations

“The hard part is not access to AI. The hard part is operating model maturity. Without clear governance, assurance and accountability, AI adoption fragments.” — Strategic AI Governance at Scale