The Governance Codex — AI in Clinical Trials

§00

The Auditor's Foreword

Why a governance playbook, and why now — explained by someone whose job is to ask the hard questions after the trial is over.

◆ What you'll understand

Why regulators now treat AI as something to be governed, not just adopted
How a federal reviewer reads an AI-enabled submission
What "credibility" means and why it outranks accuracy

◆ What you'll be able to do

Frame AI governance as a patient-safety and evidence-integrity question
Explain to a non-technical board why governance is non-optional
Use this playbook as a working checklist, not a brochure

Picture the moment a marketing application lands on a reviewer's desk and, somewhere inside it, a sentence reads: "Eligibility was determined with the support of a machine-learning model." The reviewer does not panic. The reviewer asks a sequence of very old questions in a very new context: What did the model decide? How much did its answer matter? And can you prove it was right — not once, but reliably?

That is the whole game. AI does not get a special pass and it does not get a special punishment. It gets the same scrutiny every other piece of evidence in a regulated trial receives: show your work, show your controls, and show that a human remained accountable for the outcome. The agencies have now said this out loud. In January 2025 the FDA published its first dedicated draft guidance on using AI to support regulatory decisions, built around a risk-based credibility assessment. One year later, in January 2026, the FDA and EMA jointly issued ten Guiding Principles of Good AI Practice in Drug Development — a shared transatlantic foundation for every guidance still to come.


From the reviewer's chair

I am not impressed by a model that is 98% accurate. I am impressed by a sponsor who can tell me where the model sits in the decision, what happens if it is wrong, and what evidence they gathered in proportion to that risk. Governance is simply the discipline of being able to answer those three questions before I ask them.

How to use this playbook

This is a working document, not a reference shelf. Each section opens with plain-language objectives framed in Bloom's taxonomy — from remembering a definition to creating your own governance artifacts — so you always know what you should be able to do when you finish. Interactive tools let you classify your own AI tools, walk the credibility steps, and score your organization's readiness. Your progress saves automatically; the Resume button in the top bar brings you back to exactly where you stopped. When you reach the end, you can issue yourself a completion certificate.

A note on scope. This playbook is an educational synthesis of public frameworks, offered free by Aurelyn AI Clinical. It is not legal advice and it does not replace the source documents, your regulatory affairs team, or direct engagement with a health authority. Every framework cited is referenced in §10.

§01

The 2026 Regulatory Landscape

A guided tour of the rules that now apply to AI in trials — what each one is for, and how they fit together rather than compete.

◆ What you'll understand

The major AI-relevant frameworks active in 2026
How voluntary frameworks and binding law differ in force
Where these instruments overlap and reinforce each other

◆ What you'll be able to do

Name the framework that governs a given AI question
Distinguish "good practice" from "legal requirement"
Brief a colleague on the 2025–2026 regulatory timeline

The single most common mistake is to assume these frameworks are rivals you must choose between. They are layers. NIST gives you the management system. The FDA gives you the evidence standard for regulatory decisions. The FDA–EMA principles give you the shared values. And data-integrity and privacy rules give you the non-negotiable floor. A mature program speaks all four at once.

The instruments, in plain terms

Instrument	What it is	Force	What it governs for you
NIST AI RMF 1.0 + Generative AI Profile, 2024	A voluntary risk-management framework structured around four functions: Govern, Map, Measure, Manage.	Voluntary	The operating system for your whole AI governance program.
FDA Draft Guidance Jan 2025 · Docket FDA-2024-D-4689	"Considerations for the Use of AI to Support Regulatory Decision-Making" — introduces a 7-step, risk-based credibility assessment tied to a model's context of use.	Draft guidance	How to prove an AI output is trustworthy enough for a specific regulatory decision.
FDA–EMA Guiding Principles Jan 14, 2026	Ten jointly issued principles for good AI practice across the medicine lifecycle.	Principles	The shared expectations that future binding guidance will be built on.
ICH E6(R3) · GCP	The modern Good Clinical Practice standard, written to accommodate digital and computerized systems.	Adopted GCP	The trial-conduct umbrella every AI tool still operates under.
21 CFR Part 11	US rule for electronic records and electronic signatures — trustworthiness, audit trails, access control.	Regulation (US)	The record-keeping floor for any system AI touches.
EU AI Act	Risk-tiered horizontal AI law; many clinical and safety uses fall in the "high-risk" tier.	Binding law (EU)	Hard legal obligations when you operate in or ship to the EU.

How we got here — the timeline that matters

Jan 2023: NIST releases AI RMF 1.0. The four-function framework (Govern · Map · Measure · Manage) becomes the de-facto global standard for managing AI risk.
Jul 2024: NIST Generative AI Profile (AI 600-1). Extends the RMF to large language models and agentic systems, the model classes now appearing in trial operations.
Sep 2024: EMA Reflection Paper on AI in medicines. Europe sets out transparency, data quality, and stakeholder-engagement expectations across the lifecycle.
Jan 2025: FDA's first dedicated AI draft guidance. Introduces context of use and the 7-step credibility assessment framework for AI supporting regulatory decisions.
Apr 2025: Public comment period closes. Industry, academia, and patient groups flag how to apply the steps to generative and LLM-based tools.
Jan 14, 2026: FDA & EMA issue 10 joint Guiding Principles. A transatlantic foundation for harmonized AI guidance, and the centerpiece of §06 of this playbook.
Expected 2026: FDA final guidance anticipated. Expected to incorporate comment-period input and align with the January 2026 joint principles.

The through-line

Read the timeline closely and one term recurs in every document: risk-based. Nobody is asking you to validate a low-stakes scheduling assistant to the same standard as a model that influences who is dosed. The entire system is built to make the amount of proof you owe scale with the amount of harm a wrong answer could cause. Master that one idea and the rest of this playbook becomes common sense.

§02

The Aurelyn Governance Model

NIST's four functions, translated from abstract risk language into concrete clinical-trial controls. Open each function to see what it asks of you.

◆ What you'll understand

The four NIST functions and what each is for
Why Govern is the cross-cutting function that holds the others
How each function maps to a real trial control

◆ What you'll be able to do

Place any governance activity into the right function
Identify which function your organization is weakest in
Assign ownership for each control by ID (GOV / MAP / MEA / MAN)

NIST gives us four verbs. They are not a sequence you finish; they are a loop you keep running. Govern sits in the middle and feeds the other three: it is the culture, the policies, and the people. Map sets the context. Measure tests and evaluates. Manage acts on what you find. Below, each function is expanded into the specific controls a clinical-trial program should own, grouped by control ID prefix (GOV, MAP, MEA, MAN).

Govern

GOV-01 Stand up a cross-functional AI governance body with clinical, regulatory, quality, data-science, ethics, and IT security at the table.
GOV-02 Maintain a written AI policy that defines acceptable uses, prohibited uses, and the approval path before any tool touches trial data.
GOV-03 Assign a named accountable owner for every AI system — a person, never "the vendor."
GOV-04 Map and actively manage AI-related legal and regulatory obligations (Part 11, GCP, EU AI Act, privacy law).
GOV-05 Govern third-party and "AI-as-a-service" risk: contracts, audit rights, and documentation flow-down to vendors.
GOV-06 Train staff on AI literacy and on this policy, and record that they were trained.

Map

MAP-01 Maintain a living inventory of every AI system, model, and AI-enabled vendor service in use across studies.
MAP-02 For each, write a one-paragraph context of use: what it does, where in the trial, and what decision it supports.
MAP-03 Identify affected parties — participants, investigators, reviewers — and the potential impact on each.
MAP-04 Set risk thresholds and categorize each tool before deployment, not after an incident.
MAP-05 Document the data the system relies on and whether it is fit-for-use for this population and question.

Measure

MEA-01 Run testing, evaluation, verification & validation (TEVV) sized to each tool's risk tier.
MEA-02 Evaluate the whole system, including the human reviewing the output — not the model in isolation.
MEA-03 Measure the trustworthiness characteristics in §05 with metrics appropriate to the context of use.
MEA-04 Test for bias and subgroup performance across the populations the trial will enroll.
MEA-05 Document methods, datasets, results, and any deviations so the evidence is independently reviewable.

Manage

MAN-01 Prioritize and treat identified risks; document the rationale where you accept rather than mitigate.
MAN-02 Define and preserve meaningful human oversight — a person who can question, override, and is accountable for the call.
MAN-03 Monitor deployed models on a schedule for data drift and performance decay; re-evaluate periodically.
MAN-04 Run an incident process: detect, respond, recover, and communicate when an AI system misbehaves.
MAN-05 Manage change control — any model update is a change that re-enters Map and Measure.
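
Assigning ownership by control ID can be tracked in something as small as the sketch below. It is only an illustration: the `Control` class, the `unowned_by_function` helper, and the owner names are hypothetical, and the rule that "the vendor" does not count as an owner is taken from GOV-03.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# The four NIST functions, keyed by the ID prefixes used in the control lists.
FUNCTIONS = {"GOV": "Govern", "MAP": "Map", "MEA": "Measure", "MAN": "Manage"}


@dataclass
class Control:
    control_id: str               # e.g. "GOV-03"
    owner: Optional[str] = None   # a named person, never "the vendor"

    @property
    def function(self) -> str:
        """Derive the NIST function from the ID prefix."""
        return FUNCTIONS[self.control_id.split("-")[0]]


def unowned_by_function(controls: List[Control]) -> Dict[str, List[str]]:
    """Group control IDs that lack a named human owner under their function."""
    gaps: Dict[str, List[str]] = {name: [] for name in FUNCTIONS.values()}
    for control in controls:
        owner = (control.owner or "").strip().lower()
        if owner in ("", "the vendor"):
            gaps[control.function].append(control.control_id)
    return gaps


# Hypothetical register; the names are placeholders.
controls = [
    Control("GOV-03", owner="J. Rivera"),
    Control("MAP-01"),
    Control("MEA-04", owner="the vendor"),
    Control("MAN-03", owner="A. Chen"),
]
gaps = unowned_by_function(controls)
print(gaps)
```

The function with the longest list of unowned controls is a reasonable first candidate for "weakest function" in a governance review.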

Why Govern is in the center

Map, Measure, and Manage are activities. Govern is the gravity that keeps them in orbit. Without a governance body, an owner, and a policy, your validation evidence becomes a pile of disconnected PDFs that no one can find during an inspection. Build Govern first — everything else has somewhere to attach.

§03

Context of Use & Model Risk

The single most important calculation in the FDA framework — made interactive. Tell the matrix how much your model matters and what happens if it's wrong, and it tells you how much evidence you owe.

◆ What you'll understand

What "context of use" (COU) precisely means
The two ingredients of model risk: influence & consequence
Why two models with the same accuracy can owe very different proof

◆ What you'll be able to do

Write a defensible context-of-use statement
Classify any AI tool into a risk tier using the matrix
Right-size the rigor of your credibility evidence

Before you can assess an AI model, the FDA asks you to pin down its context of use — the specific role and scope of the model: what question it answers, and how its output will be used in a decision. "A model that flags potential serious adverse events for human review in a Phase II oncology study" is a context of use. "We use AI" is not.
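
One way to keep context-of-use statements honest is to store them as structured records and refuse blanks. The sketch below assumes a three-field breakdown (what the model does, where in the trial, which decision it supports) borrowed from MAP-02; the class and field names are hypothetical, not part of the FDA framework.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ContextOfUse:
    what_it_does: str        # the question the model answers
    decision_supported: str  # how the output feeds a decision
    where_in_trial: str      # study and process step

    def missing_fields(self) -> List[str]:
        """Return the names of fields that were left blank."""
        return [name for name, value in vars(self).items() if not value.strip()]

    def statement(self) -> str:
        """Render the one-sentence context-of-use statement."""
        return (f"A model that {self.what_it_does} "
                f"{self.decision_supported} in {self.where_in_trial}.")


good = ContextOfUse(
    what_it_does="flags potential serious adverse events",
    decision_supported="for human review",
    where_in_trial="a Phase II oncology study",
)
vague = ContextOfUse(what_it_does="uses AI", decision_supported="", where_in_trial="")

print(good.statement())
print(vague.missing_fields())
```

A record with any missing field is the structured equivalent of "We use AI": it cannot be risk-assessed yet.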

From the context of use, the framework derives model risk as the combination of two things:

① Model influence

How much does the model's output drive the decision, relative to other evidence and to human judgment? A model whose output is one input among many a clinician weighs has low influence. A model whose output is acted on automatically has high influence.

② Decision consequence

If the decision is wrong, how bad is the harm? A wrong answer that delays a newsletter is low consequence. A wrong answer that affects participant safety, dosing, or the integrity of the pivotal evidence is high consequence.

The Model Risk Determination Matrix

The matrix has two axes, decision consequence and model influence, each rated Low, Medium, or High. The cell that best matches your AI tool gives its risk tier and the credibility rigor the FDA framework would expect in return.
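
The influence-by-consequence lookup can be sketched in a few lines. Note the loud assumption: the text here does not state which cell maps to which tier, so the additive scoring and the tier boundaries below are illustrative choices, not the FDA's mapping. Replace them with the tiers your governance body has agreed.

```python
LEVELS = ("low", "medium", "high")


def model_risk_tier(influence: str, consequence: str) -> str:
    """Combine model influence and decision consequence into a risk tier.

    ASSUMPTION: each axis scores 0 (low) to 2 (high); the sum picks the tier.
    These boundaries are hypothetical and should be set by your own policy.
    """
    score = LEVELS.index(influence.lower()) + LEVELS.index(consequence.lower())
    if score <= 1:
        return "low"
    if score == 2:
        return "medium"
    return "high"


# A scheduling assistant: one input among many, trivial harm if wrong.
print(model_risk_tier("low", "low"))
# A model whose output is acted on automatically and affects dosing.
print(model_risk_tier("high", "high"))
```

Whatever mapping you adopt, record the justification alongside the tier; an unrecognized rating raises a `ValueError` here rather than defaulting silently to low risk.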


From the reviewer's chair

Sponsors lose credibility two ways: by under-proving a high-risk model, and by drowning a trivial tool in validation no one needed. The matrix protects you from both. When you tell me a tool is low-risk, I expect a crisp justification for why — low influence, low consequence — not an absence of thought.

§04

The 7-Step Credibility Assessment

The FDA's core method, walked one step at a time. Use the controls to move through the sequence; each step shows what to do and a clinical-trial worked example.

◆ What you'll understand

The seven steps in order
Why "plan before you test" is the heart of the method
How each step produces a documented artifact

◆ What you'll be able to do

Build a credibility assessment plan for one of your models
Decide when to engage the agency early
Assemble the documentation a reviewer will expect

The framework's elegance is that it forces you to decide how much proof you need before you go looking for it. You define the question, define the context, weigh the risk, and only then design a credibility plan proportionate to that risk — execute it, document it, and judge whether it was enough.
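
The "plan before you test" ordering can be made checkable. The sketch below encodes the seven steps as an ordered enum, with step names paraphrased from the sequence described above, and flags any step marked complete while an earlier one is still open. The helper names are hypothetical.

```python
from enum import IntEnum
from typing import List, Optional, Set


class CredibilityStep(IntEnum):
    DEFINE_QUESTION = 1
    DEFINE_CONTEXT_OF_USE = 2
    ASSESS_MODEL_RISK = 3
    DEVELOP_CREDIBILITY_PLAN = 4
    EXECUTE_PLAN = 5
    DOCUMENT_RESULTS = 6
    DETERMINE_ADEQUACY = 7


def next_step(completed: Set[CredibilityStep]) -> Optional[CredibilityStep]:
    """Return the earliest step not yet completed, or None when all are done."""
    for step in CredibilityStep:
        if step not in completed:
            return step
    return None


def out_of_order(completed: Set[CredibilityStep]) -> List[CredibilityStep]:
    """List steps marked complete even though an earlier step is still open."""
    pending = next_step(completed)
    if pending is None:
        return []
    return sorted(step for step in completed if step > pending)


# A team that ran validation before writing the credibility plan.
done = {
    CredibilityStep.DEFINE_QUESTION,
    CredibilityStep.DEFINE_CONTEXT_OF_USE,
    CredibilityStep.ASSESS_MODEL_RISK,
    CredibilityStep.EXECUTE_PLAN,
}
print(next_step(done).name)
print([step.name for step in out_of_order(done)])
```

An empty `out_of_order` result is the property a reviewer is looking for: evidence gathered against a plan that existed first.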

Engage early

For higher-risk contexts of use, the framework actively encourages sponsors to discuss the credibility plan with the agency before executing it — for example, through existing meeting pathways. An auditor would far rather see an aligned plan than a clever after-the-fact justification.

§05

The Seven Marks of Trustworthy AI

NIST defines what "trustworthy" actually means in seven concrete characteristics. For each, here is what it means inside a trial — and the question an auditor will ask you.

◆ What you'll understand

The seven NIST trustworthiness characteristics
Why they often trade off against each other
How to weigh which characteristics matter most for a given tool

◆ What you'll be able to do

Translate each characteristic into a testable requirement
Answer the auditor's question for each one with evidence
Document the trade-offs you deliberately accepted

"Trustworthy" is not a vibe. NIST breaks it into seven characteristics, and the honest part of the framework is that they compete: making a model more secure can make it less usable; maximizing accuracy can reduce explainability. Governance is not about scoring perfectly on all seven — it is about making the trade-offs deliberately and writing down why.

CHAR-01

Valid & reliable

The output is accurate for its purpose and stays accurate across repeated, real-world use.

Auditor asks: Show me performance on data that looks like my trial's population, not just your training set.

CHAR-02

Safe

The system does not, under foreseeable conditions, lead to states that endanger human life or health.

Auditor asks: What is the worst thing a wrong output can cause, and what catches it before it reaches a patient?

CHAR-03

Secure & resilient

It withstands adversarial inputs and recovers gracefully from disruption or attack.

Auditor asks: Who can access and alter this system, and how would you know if someone tampered with it?

CHAR-04

Accountable & transparent

Information about the system is available, and a responsible human owner is identifiable.

Auditor asks: Name the person accountable for this model's outputs in this study.

CHAR-05

Explainable & interpretable

You can describe how it works and give meaning to its outputs in human terms.

Auditor asks: When the model flags a case, can the reviewer understand why well enough to act on it?

CHAR-06

Privacy-enhanced

It safeguards individual autonomy, identity, and dignity in how it handles data.

Auditor asks: What participant data does it use, under what basis, and how is it protected and minimized?

CHAR-07

Fair, with bias managed

Harmful bias is identified and managed so the system does not produce inequitable outcomes.

Auditor asks: How does performance differ across the subgroups you will enroll, and what did you do about it?
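
The subgroup question can be turned into a testable requirement. The sketch below computes sensitivity per subgroup and flags any group that trails the best one by more than a set gap. The choice of sensitivity as the metric, the 0.10 gap, and the toy counts are assumptions for illustration; the right metric and threshold depend on the context of use.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sensitivity_by_subgroup(records: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Compute sensitivity per subgroup from (subgroup, y_true, y_pred) rows.

    Labels are 0/1; only true positives and false negatives are counted.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            if y_pred == 1:
                true_pos[group] += 1
            else:
                false_neg[group] += 1
    groups = set(true_pos) | set(false_neg)
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}


def flag_gaps(sensitivity: Dict[str, float], max_gap: float = 0.10) -> List[str]:
    """Return subgroups whose sensitivity trails the best group by more than max_gap."""
    best = max(sensitivity.values())
    return sorted(g for g, s in sensitivity.items() if best - s > max_gap)


# Toy data: group A detects 9 of 10 true events, group B detects 7 of 10.
records = (
    [("A", 1, 1)] * 9 + [("A", 1, 0)] * 1
    + [("B", 1, 1)] * 7 + [("B", 1, 0)] * 3
    + [("A", 0, 0)] * 5
)
sens = sensitivity_by_subgroup(records)
print(sens)
print(flag_gaps(sens))
```

A flagged subgroup is not an automatic failure; it is the prompt for the "what did you do about it" half of the auditor's question.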

The trade-off you must document

A reviewer is not unsettled that you made trade-offs — every real system does. A reviewer is unsettled when trade-offs appear to have happened by accident. If you accepted lower explainability for higher accuracy in a high-risk tool, say so, justify it against the context of use, and show the compensating human oversight. Deliberate beats perfect.

§06

Ten Principles of Good AI Practice

The FDA–EMA joint principles of January 2026, quoted faithfully and paired with the one operational move each demands of you.

◆ What you'll understand

All ten FDA–EMA principles
How they consolidate everything in §02–§05
Which principles your program already satisfies

◆ What you'll be able to do

Map each principle to an owner and an artifact
Use the principles as the agenda for a governance review
Anticipate the expectations of future binding guidance

On 14 January 2026 the FDA and EMA jointly released ten principles to steer responsible AI across the medicine lifecycle. They are not yet binding, but they are explicitly described as the foundation for future guidance. Treat them as a preview of the questions you will be asked. The quoted text under each principle is the agencies' own wording; the note beneath is your operational move.

1

Human-centric by design

"The development and use of AI technologies align with ethical and human-centric values."

Your move · Put patient interest and public health first in design reviews; build safeguards in from the start, not as a patch.

2

Risk-based approach

"…a risk-based approach with proportionate validation, risk mitigation, and oversight based on the context of use and determined model risk."

Your move · Run every tool through the §03 matrix and size your evidence to the tier it lands in.

3

Adherence to standards

"AI technologies adhere to relevant legal, ethical, technical, scientific, cybersecurity, and regulatory standards, including Good Practices (GxP)."

Your move · Connect AI controls to Part 11, ICH E6(R3), GAMP 5, and your security standards — don't run a parallel universe.

4

Clear context of use

"AI technologies have a well-defined context of use (role and scope for why it is being used)."

Your move · Maintain a written COU statement for every tool in your inventory (MAP-02).

5

Multidisciplinary expertise

"Multidisciplinary expertise covering both the AI technology and its context of use are integrated throughout the technology's life cycle."

Your move · Ensure clinical, data-science, and regulatory voices all sign off — not data science alone.

6

Data governance and documentation

"Data source provenance, processing steps, and analytical decisions are documented in a detailed, traceable, and verifiable manner, in line with GxP requirements."

Your move · Apply ALCOA+ thinking to data lineage; protect sensitive data across the whole lifecycle.

7

Model design and development practices

"…best practices in model and system design and software engineering… leverages data that is fit-for-use, considering interpretability, explainability, and predictive performance."

Your move · Treat model development as engineering: version control, testing, and documented design decisions.

8

Risk-based performance assessment

"…evaluate the complete system including human-AI interactions, using fit-for-use data and metrics appropriate for the intended context of use…"

Your move · Validate the human-plus-model system together; pick metrics that match the decision, not generic accuracy.

9

Life cycle management

"Risk-based quality management systems are implemented throughout the AI technologies' life cycles… scheduled monitoring and periodic re-evaluation… (e.g., to address data drift)."

Your move · Schedule monitoring and re-validation; a model is never "done" (MAN-03).

10

Clear, essential information

"Plain language is used to present clear, accessible, and contextually relevant information… regarding the AI technology's context of use, performance, limitations, underlying data, updates, and interpretability or explainability."

Your move · Write a plain-language summary for users and, where relevant, participants — this very playbook models the standard.

§07

Lifecycle Controls in Practice

Governance is not an event at launch — it runs from data sourcing to retirement. These are the four control domains an auditor inspects most closely.

◆ What you'll understand

The four lifecycle control domains
How established GxP rules already cover most AI obligations
What "human oversight" must actually look like

◆ What you'll be able to do

Connect AI controls to Part 11, GAMP 5, and ALCOA+
Design a drift-monitoring and re-validation cadence
Specify vendor obligations in contracts and audits

DATA INTEGRITY

Provenance & ALCOA+

Every dataset feeding a model must be Attributable, Legible, Contemporaneous, Original, and Accurate — plus Complete, Consistent, Enduring, and Available. Document where data came from, how it was processed, and every analytical decision. Under 21 CFR Part 11, the systems holding that data need audit trails, access controls, and trustworthy electronic records.
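
To make "audit trail" concrete, here is a minimal sketch of a tamper-evident log in which every entry stores a hash of the entry before it. Hash chaining is an illustrative technique chosen for this example; 21 CFR Part 11 does not prescribe it, and a validated system needs far more (secure time stamps, access control, retention). The function and field names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: Dict[str, str]) -> str:
    """Hash an entry body deterministically (sorted keys, UTF-8)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(trail: List[Dict[str, str]], user: str, action: str, timestamp: str) -> None:
    """Append an attributable, time-stamped entry chained to the previous one."""
    prev_hash = trail[-1]["hash"] if trail else GENESIS
    body = {"user": user, "action": action, "timestamp": timestamp, "prev_hash": prev_hash}
    trail.append({**body, "hash": _digest(body)})


def verify(trail: List[Dict[str, str]]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in trail:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


trail: List[Dict[str, str]] = []
append_entry(trail, "jrivera", "loaded dataset v3", "2026-02-01T09:00:00Z")
append_entry(trail, "achen", "excluded 4 duplicate records", "2026-02-01T09:30:00Z")
intact = verify(trail)

trail[0]["action"] = "loaded dataset v2"  # simulate a silent after-the-fact edit
tampered_ok = verify(trail)
print(intact, tampered_ok)
```

The point of the sketch is the property, not the mechanism: who did what, when, and in what order should be reconstructable and resistant to quiet edits.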

VALIDATION

Computerized system validation & GAMP 5

GAMP 5's risk-based, category-driven approach maps cleanly onto AI: scale validation effort to the system's risk and novelty. The twist for AI is that validation is not one-and-done — a model that learns or is retrained re-enters validation through change control.

HUMAN OVERSIGHT

Meaningful human control

Oversight is "meaningful" only if the human can realistically understand, question, and override the output, and is accountable for the decision. A rubber-stamp reviewer who cannot interpret the model is not oversight — it is the appearance of oversight, which an auditor will see through immediately.

MONITORING

Drift, decay & re-evaluation

Real-world data shifts away from what a model learned — data drift. The FDA–EMA principles call for scheduled monitoring and periodic re-evaluation. Define your triggers (calendar-based and performance-based), your thresholds, and what happens when a threshold is breached.
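
The two kinds of trigger, calendar-based and performance-based, can be written down as a small check. This is a sketch under assumed values: the 90-day cadence, the 5% relative drop, and the function name are placeholders to be replaced by whatever your monitoring plan specifies.

```python
from datetime import date
from typing import List


def reevaluation_due(
    last_review: date,
    today: date,
    baseline_metric: float,
    current_metric: float,
    max_days: int = 90,               # ASSUMED calendar cadence
    max_relative_drop: float = 0.05,  # ASSUMED performance threshold
) -> List[str]:
    """Return which triggers fired: 'calendar', 'performance', both, or none."""
    reasons = []
    if (today - last_review).days >= max_days:
        reasons.append("calendar")
    relative_drop = (baseline_metric - current_metric) / baseline_metric
    if relative_drop > max_relative_drop:
        reasons.append("performance")
    return reasons


# Reviewed 55 days ago, but sensitivity fell from 0.92 to 0.85.
early = reevaluation_due(date(2026, 1, 5), date(2026, 3, 1), 0.92, 0.85)
# Performance is stable, but the scheduled review date has passed.
late = reevaluation_due(date(2026, 1, 5), date(2026, 6, 1), 0.92, 0.91)
print(early, late)
```

What happens when a trigger fires (pause, human-only fallback, re-validation through change control) belongs in the plan as well; the check only tells you when to open that procedure.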


From the reviewer's chair

"AI-as-a-service" is where I find the most surprises. A sponsor outsources a model to a vendor, and with it, quietly outsources the documentation it can no longer produce for me. You can delegate the work. You cannot delegate the accountability. Build audit rights and documentation flow-down into the contract on day one.

§08

Governance Readiness Self-Assessment

Sixteen questions, mapped to the four NIST functions. Answer honestly — the score is for you, and it saves automatically so you can return and re-test as you improve.

◆ What you'll understand

Where your program is strong and where it is exposed
Your overall readiness against the four functions

◆ What you'll be able to do

Produce a baseline readiness score to track over time
Prioritize the function with the lowest score first
Generate a focused remediation conversation for leadership

Your readiness, scored live

Each answer: Yes = in place & evidenced · Partial = informal or undocumented · No = gap.
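
For readers working outside the interactive page, the scoring can be approximated as below. The weights (Yes = 2, Partial = 1, No = 0), the percentage scale, and the even split of four questions per function are assumptions made for this sketch; the playbook text does not state how its own score is calculated.

```python
from typing import Dict, List

POINTS = {"yes": 2, "partial": 1, "no": 0}  # ASSUMED weighting


def readiness_scores(answers: Dict[str, List[str]]) -> Dict[str, float]:
    """Convert Yes/Partial/No answers per function into a 0-100 score."""
    scores = {}
    for function, replies in answers.items():
        earned = sum(POINTS[reply.lower()] for reply in replies)
        scores[function] = 100 * earned / (2 * len(replies))
    return scores


def weakest_function(scores: Dict[str, float]) -> str:
    """Return the function with the lowest score, the one to prioritize first."""
    return min(scores, key=scores.get)


# Hypothetical answers: sixteen questions, four per function.
answers = {
    "Govern": ["yes", "yes", "partial", "partial"],
    "Map": ["yes", "partial", "partial", "no"],
    "Measure": ["partial", "partial", "no", "no"],
    "Manage": ["yes", "yes", "yes", "yes"],
}
scores = readiness_scores(answers)
print(scores)
print(weakest_function(scores))
```

Keeping the raw answers, not just the score, makes the re-test meaningful: you can see which specific gaps closed between baselines.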


§09

The Audit Evidence Pack

The artifacts a federal reviewer will actually ask to see. Tick them off as your program produces each one — your checkmarks save automatically.

◆ What you'll understand

The documents that constitute an audit-ready file
How each artifact maps back to a framework requirement

◆ What you'll be able to do

Assemble a credibility dossier for a single AI tool
Identify which artifacts you are missing today
Create your own version-controlled evidence index

Inspection-readiness is mostly a documentation problem. If you can hand a reviewer this set of artifacts for any AI tool in your inventory, you are in genuinely good shape. Use it as a packing list.

AI system inventory entry: Tool name, owner, vendor, and where it operates in the study (MAP-01).
Context-of-use statement: The model's role, scope, and the decision it supports (MAP-02).
Model risk determination: Influence × consequence, the resulting tier, and the justification (§03).
Credibility assessment plan: The proportionate evidence plan, ideally agreed with the agency for high risk (§04).
Data provenance & ALCOA+ record: Source, processing steps, analytical decisions, and fitness-for-use (Principle 6).
Validation / TEVV report: Performance, subgroup/bias testing, and whole-system (human-AI) evaluation (MEA-01–05).
Part 11 controls evidence: Audit trails, access control, and electronic-record integrity for systems involved.
Human oversight procedure: Who reviews, what they can override, and the record that they did (MAN-02).
Monitoring & drift plan: Triggers, thresholds, cadence, and re-validation rules (MAN-03).
Change-control log: Every model update, retraining, or version change and its re-assessment (MAN-05).
Vendor governance package: Contracts, audit rights, and documentation flow-down for AI-as-a-service (GOV-05).
Plain-language summary: Context, performance, and limitations stated clearly for users (Principle 10).
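
An evidence index for a single tool can start as a simple mapping from artifact name to document reference, with a helper that reports what is still missing. The artifact names come from the list above; the helper name and the "filed" placeholder values are hypothetical, and in practice the index would live under version control alongside the documents it points to.

```python
from typing import Dict, List, Optional

REQUIRED_ARTIFACTS = [
    "AI system inventory entry",
    "Context-of-use statement",
    "Model risk determination",
    "Credibility assessment plan",
    "Data provenance & ALCOA+ record",
    "Validation / TEVV report",
    "Part 11 controls evidence",
    "Human oversight procedure",
    "Monitoring & drift plan",
    "Change-control log",
    "Vendor governance package",
    "Plain-language summary",
]


def missing_artifacts(index: Dict[str, Optional[str]]) -> List[str]:
    """Return required artifacts with no document reference, in packing-list order."""
    return [name for name in REQUIRED_ARTIFACTS if not index.get(name)]


# Hypothetical index for one tool: everything filed except two artifacts.
index: Dict[str, Optional[str]] = {name: "filed" for name in REQUIRED_ARTIFACTS}
index["Monitoring & drift plan"] = None
del index["Change-control log"]

missing = missing_artifacts(index)
print(missing)
```

Running this per tool in the inventory gives a quick view of which dossiers are inspection-ready and which are not.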

§10

Glossary & Sources

Plain-language definitions for the terms used throughout, and the primary public sources every claim is drawn from.

Glossary

Context of Use (COU): The specific role and scope of an AI model — what question it answers and how its output is used in a decision.
Model risk: The combination of model influence (how much it drives the decision) and decision consequence (how bad a wrong answer is).
Credibility: Trust in an AI model's output for a particular context of use, established by evidence proportionate to its risk.
TEVV: Testing, Evaluation, Verification & Validation — the evidence-gathering activities under NIST's Measure function.
Data drift: The gradual divergence of real-world data from the data a model was trained on, which degrades performance over time.
ALCOA+: Data-integrity expectations: Attributable, Legible, Contemporaneous, Original, Accurate — plus Complete, Consistent, Enduring, Available.
GxP: The family of "Good Practice" quality regulations (GCP, GMP, GLP) governing regulated clinical and manufacturing work.
GAMP 5: A risk-based framework for validating computerized systems, scaling effort to system risk and novelty.

Primary sources

NIST. Artificial Intelligence Risk Management Framework (AI RMF 1.0). January 26, 2023; plus the Generative AI Profile (NIST AI 600-1), July 26, 2024.
FDA. "Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products." Draft Guidance for Industry, January 2025 (Docket FDA-2024-D-4689).
FDA & EMA. "Guiding Principles of Good AI Practice in Drug Development." January 2026 (10 jointly issued principles, quoted in §06).
FDA. "Artificial Intelligence and Medical Products: How CBER, CDER, CDRH, and OCP are Working Together." March 2024, revised February 2025.
ICH. E6(R3) Good Clinical Practice.
US FDA. 21 CFR Part 11 — Electronic Records; Electronic Signatures.
European Union. Regulation (EU) 2024/1689 (the EU AI Act); EMA Reflection Paper on the use of AI in the medicinal product lifecycle (September 2024).
ISPE. GAMP 5: A Risk-Based Approach to Compliant GxP Computerized Systems (2nd ed.).

You've reached the end

You now have the full operating model: the landscape, the four functions, context of use and risk, the seven credibility steps, the seven marks of trust, the ten principles, the lifecycle controls, a scored readiness baseline, and an evidence packing list. Issue yourself a certificate of completion — then go put one tool through the §03 matrix this week.

Governing Artificial Intelligence in Clinical Trials
