#framework #AI #agentic-workflows #operations
# The AGENTIC Framework
**The operating system that tells organisations what agents to build and when.**
*Developed by Madeleine Pierce*
The AGENTIC Framework is a practical system for figuring out where AI agents belong across your operations, and making that real. Not just pilots. A governed, structured, self-improving system with built-in agents that handle the orchestration.
It's a living framework, still developing, being tested across real organisations. This is where it stands today.
> [!info] Where this stands
> This framework is built from first principles, drawing on experience across digital transformation, product management, design thinking, and hands-on workflow mapping. It's currently being applied at two very different organisations (a global marine conservation non-profit and a minerals exploration venture studio) and they're hitting the same wall: not "should we use AI?" but "how do we actually do this properly?" I'm refining it through application and feedback as I go.
---
## In short
**The 30-second version. If you read nothing else, read this.**
**Scan fast. Focus where it counts. Go deep only where it's earned.**
Start with a one-day exploratory per function or role. Capture the broad strokes: what's painful, what's repetitive, where the judgment calls live. Feed that into the **AGENT Prioritisation Matrix**, a living dashboard designed to be managed by agents that would score, rank, and continuously resurface what to work on next. Then go deep only on the workflows the dashboard surfaces.
**AGENT**: the pipeline that does the deep work.
- **A: Assess**: map the workflow, produce the machine-readable spec
- **G: Greenlight**: score, prioritise, and commit to what gets built
- **E: Engineer**: build, prove, and ship
- **N: Nurture**: monitor, fix, and feed corrections back
- **T: Track**: watch the frontier, resurface what's ready, and feed the next cycle
**The AI Governance Stream** governs the boundaries throughout: accountability, risk, ethical red lines.
**The AI Adoption Stream** brings people with you: change management, trust, adoption from day one.
Everything the framework produces is structured documentation stored in the **AGENTIC Vault**. Because it's structured, agents could read it, scan it, and improve it over time. The documentation isn't paperwork. It's infrastructure.
The system is designed to compound. Every workflow you put through it should make the next one faster. The framework comes with agents designed to handle the orchestration, so your team can focus on the decisions that actually need human judgment.
---
## Overview
**The structural map. This is what you'd hand someone on day one.**
The AGENTIC Framework has four parts:
1. **Kickoff**: the intake engine. Two passes: task list sweep, then deep conversations. Feeds the dashboard.
2. **The AGENT Pipeline**: a five-stage sequence for preparing, building, and sustaining workflows
3. **The AI Governance Stream**: boundaries, accountability, risk, and oversight
4. **The AI Adoption Stream**: change management, transition, trust, and adoption
Kickoff feeds the pipeline. The pipeline does the deep work. The streams run alongside, not after.
At the centre of everything sits the **AGENT Prioritisation Matrix**, a living dashboard designed to be populated and managed by agents, tracking every workflow the framework has ever touched and continuously rescored as technology evolves and adoption data flows in. Every part of the framework either feeds the dashboard or acts on what it surfaces.
And underneath everything sits the **AGENTIC Vault**, the structured repository of every artefact the framework produces. Specifications, governance models, collaboration designs, performance baselines, build files. Every piece of documentation is structured so that AI can act on it. Nothing is random. Every agent has a source file. Every build decision has a rationale. When someone leaves, nothing becomes a black box.
---
## What AGENTIC produces
Every function scanned through the framework produces tangible, reusable outputs, regardless of how far through the pipeline each workflow goes.
- **Documented SOPs** for every workflow captured. Processes that lived in people's heads are now on paper, validated by the people who do the work. This has standalone value even if nothing else happens.
- **A scored, prioritised dashboard** showing which workflows are worth building, which need more investigation, and which aren't ready yet. Always current, always showing what's next.
- **Machine-readable specifications** with defined success criteria and test cases. Structured enough that an agent can build from them. Human-readable enough that the Workflow Owner can sign off.
- **Collaboration models** defining exactly which steps are AI-run, AI-led, human-led, or human-run. Per step, not per workflow.
- **Governance models** with risk tiers, accountability, escalation paths, and compliance checkpoints embedded in the specifications.
- **Performance data and override logs** from every deployed workflow. Every human correction is an instruction. Every override is a data point.
- **A structured vault** of reusable artefacts (specifications, patterns, modules, templates, governance models) that makes every subsequent workflow faster to prepare than the last.
Even if a workflow never makes it past Assess, you still end up with documented processes you didn't have before.
---
## How to read this framework
**Layer 1: The Overview**
The AGENTIC Framework helps organisations figure out where AI belongs in their operations, build and prove the solutions, govern the boundaries, and bring people with them. The dashboard shows where you are. The pipeline does the work. The vault holds the intelligence.
**Layer 2: The structure**
Kickoff, the AGENT pipeline (five stages), the AI Governance stream, and the AI Adoption stream. Described below with short summaries of each.
**Layer 3: The full method**
All the detail: actions, outputs, stream connections, under-the-hood methodology, and the feedback loops that keep it alive. Each stage and stream has its own detail page linked below.
Most of the time, Layer 2 is enough. Layer 3 is for when you're actually doing the work.
---
## The relationship between the parts
```mermaid
graph LR
subgraph intake["KICKOFF"]
K["Scan → Score → Surface"]
end
subgraph pipeline["THE AGENT PIPELINE"]
direction LR
A["<b style='font-size:1.3em'>A</b> : Assess"] --> G["<b style='font-size:1.3em'>G</b> : Greenlight"]
G --> E["<b style='font-size:1.3em'>E</b> : Engineer"]
E --> N["<b style='font-size:1.3em'>N</b> : Nurture"]
N --> T["<b style='font-size:1.3em'>T</b> : Track"]
end
subgraph centre[" "]
M["AGENT Prioritisation Matrix: the living dashboard"]
end
subgraph streams[" "]
direction LR
GOV["AI Governance Stream ─────────────────────────────"]
PPL["AI Adoption Stream ─────────────────────────────────"]
end
K -->|"feeds"| M
M -->|"surfaces"| A
N -->|"feeds"| M
T -->|"feeds"| M
A -.->|"Refinement loop"| A
G -.->|"Design loop"| A
N -.->|"Reality loop"| A
N -.->|"Reality loop"| G
T -.->|"Reshuffle"| G
style K fill:#6c5ce7,stroke:#6c5ce7,color:#fff
style A fill:#2d3436,stroke:#636e72,color:#dfe6e9
style G fill:#2d3436,stroke:#636e72,color:#dfe6e9
style E fill:#2d3436,stroke:#636e72,color:#dfe6e9
style N fill:#2d3436,stroke:#636e72,color:#dfe6e9
style T fill:#2d3436,stroke:#636e72,color:#dfe6e9
style M fill:#d63031,stroke:#d63031,color:#fff
style GOV fill:#0984e3,stroke:#0984e3,color:#fff
style PPL fill:#00b894,stroke:#00b894,color:#fff
style intake fill:none,stroke:#6c5ce7,color:#dfe6e9
style pipeline fill:none,stroke:#636e72,color:#dfe6e9
style centre fill:none,stroke:none
style streams fill:none,stroke:none
```
Kickoff scans fast and feeds the **AGENT Prioritisation Matrix**. The assessment agent would score and surface what's worth the deep dive. The AGENT Pipeline does the deep work: Assess, Greenlight, Engineer, Nurture, Track. Nurture and Track feed performance and capability data back into the dashboard. The dashboard reshuffles. The next cycle begins.
The pipeline is sequential on first pass. But the pipeline is not one-and-done. Four feedback loops keep it alive:
**The Refinement loop (within Assess).** You observe a workflow, try to formalise the specification, and realise something's missing: a decision rule, an exception, an undocumented shortcut. You go back to the workflow owner. This loop runs repeatedly during initial preparation, and it's completely normal.
**The Design loop (Greenlight → Assess).** You evaluate agent capability, then discover the specification needs more structured input, or the workflow itself needs to change to make the collaboration model work. The specification gets revised. The collaboration model and the specification co-evolve.
**The Reality loop (Nurture → back into the pipeline).** After deployment, human overrides reveal gaps: a missing rule, an ambiguous step, unclear data. Nurture captures the pattern and feeds it back into Assess or Greenlight. The workflow gets updated. This loop runs continuously.
**The Intake loop (Kickoff → Matrix → Pipeline → Matrix → Kickoff).** After the first batch goes through the pipeline, the assessment agent goes back through the Kickoff data and resurfaces the next candidates. As capability evolves, workflows that were "not yet" become "focus here." The intake loop keeps the pipeline fed.
Beyond the loops, the parts also interact:
- **Track** reshuffles **Greenlight**: priorities shift as capability evolves
- **Nurture** feeds the **AI Governance stream**: risk profiles change as the system learns
- **Nurture** feeds the **AI Adoption stream**: adoption data informs whether a workflow is truly working
- The **AI Adoption stream** surfaces issues for **Governance**: resistance sometimes reveals accountability gaps
- The **AI Governance stream** constrains **Track**: the pipeline cannot expand beyond what governance can cover
- The **AGENTIC Vault** accelerates **Assess**: reusable specifications make new workflows faster to prepare
The pipeline is a sequence on first use and a learning system in ongoing use. The streams are constants throughout. The dashboard is always running.
---
## Who this is for
AGENTIC is designed for organisations that know they should be using AI but don't have a system for figuring out where it belongs or how to do it properly.
It's built for **operations and transformation leads** who need to orchestrate AI adoption across teams, not just run one-off experiments. For **small-to-mid organisations** without dedicated AI engineering teams, where one or two people need to drive the work with agent support. And for **consultants and advisors** helping clients navigate AI adoption and needing a structured, repeatable methodology to work from.
You don't need advanced AI maturity. You don't need a data science team. You need one person willing to run the process, one function willing to talk, and a recording device.
---
## What you need to start
- One person to run the process (the AGENTIC Orchestrator)
- One function or team willing to talk about their work
- A way to record a conversation (phone voice memo, Zoom, Otter, anything)
- Access to an AI tool that can process transcripts (Claude, GPT-4, or equivalent)
- An executive sponsor who has endorsed the initiative (they don't need to be in the room)
That's it. No tooling migration. No data infrastructure. No AI maturity assessment. Start with a conversation.
---
**Quick Start**: Want to get going? The Quick Start guide gives you a concrete two-week plan for taking your first workflow from discovery to parallel-run.
→ [AGENTIC Quick Start](AGENT%20Pipeline/AGENTIC%20Quick%20Start.md)
**Agent Architecture**: The framework comes with purpose-built agents. This page maps what they do, what they read, what they produce, and how they relate.
→ [AGENTIC Agent Architecture](AGENT%20Pipeline/AGENTIC%20Agent%20Architecture.md)
**Foundations**: Theoretical and practical references underpinning the framework.
→ [AGENTIC Foundations](AGENT%20Pipeline/AGENTIC%20Foundations.md)
---
## Kickoff
*Day one. Pick a function. Two passes: a fast task list sweep, then deep conversations on the ones that earned it.*
Kickoff is the entry point. Before anything enters the AGENT Pipeline, someone has to identify which workflows are worth the deep dive. That's Kickoff's job: a fast, broad scan that captures enough signal to know where to focus. It runs as two passes.
**Pass one: the task list sweep.** Start with what's fast and low-friction. Capture the tech stack the function uses, then get a list of every task people do in their roles. Just the names and a line of context, not detailed steps. This is deliberately lightweight: you don't need people to understand AI to participate. They just need to describe their week. A filtering agent would take that task list alongside the tech stack and flag which tasks are worth a deeper look, based on patterns like repetition, manual data handling, and handoff points. The task list sweep also becomes the foundation for the inventory that Track uses later to resurface candidates as capability evolves.
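The filtering agent's first pass could be approximated with a simple rule-based pre-filter before any model is involved. A minimal sketch, assuming hypothetical signal keywords and weights (the `Task` fields mirror "just the names and a line of context"; the patterns scored are the ones named above: repetition, manual data handling, handoff points):

```python
from dataclasses import dataclass

# Illustrative signal keywords and weights: assumptions for the sketch,
# not the framework's actual filtering criteria.
SIGNALS = {
    "repetition": (["every day", "each week", "again", "recurring"], 2),
    "manual data handling": (["copy", "paste", "re-enter", "export", "spreadsheet"], 2),
    "handoff": (["send to", "wait for", "approve", "forward", "chase"], 1),
}

@dataclass
class Task:
    name: str
    context: str  # the one line of context from the task list sweep

def flag_tasks(tasks: list[Task], threshold: int = 2) -> list[tuple[Task, int]]:
    """Return tasks scoring at or above the threshold, highest first."""
    flagged = []
    for task in tasks:
        text = f"{task.name} {task.context}".lower()
        score = sum(
            weight
            for keywords, weight in SIGNALS.values()
            if any(k in text for k in keywords)
        )
        if score >= threshold:
            flagged.append((task, score))
    return sorted(flagged, key=lambda pair: -pair[1])

tasks = [
    Task("Weekly report", "copy figures from three exports into a spreadsheet each week"),
    Task("Strategy offsite", "annual planning discussion"),
]
shortlist = flag_tasks(tasks)  # only "Weekly report" survives the filter
```

In practice the filtering agent would bring far more context than keyword matching, but the shape is the same: lightweight input in, ranked shortlist out.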
**Pass two: the deep conversation.** The filtering agent produces a shortlist. Now you go deep, but only on the tasks that earned it. Hit record and let people talk. Not a structured interview, but a conversation. Let them ramble. The nuance lives in the ramble: the workarounds, the frustrations, the "oh you're not gonna believe how this actually gets done." You'll hear what they hate doing, what they're proud of, what they'd hand off tomorrow if they could. That's your data.
The transcript goes to an agent that extracts the workflows, restructures the insights, and builds user profiles: who owns what, how they feel about it, where the pain is. That feeds straight into the **AGENT Prioritisation Matrix**. An assessment agent would then score each workflow based on current AI capability, organisational readiness, and the broad-strokes data captured in the scan, and surface where to focus. It could also run quick capability tests by trying a single well-crafted prompt against a workflow and reporting back: "this one can be largely solved with a prompt, here's where it breaks." Sometimes the simplest solution is the right one.
The two-pass structure matters because the hardest problem in AI adoption isn't the technology. It's that most people don't know where AI applies to their work. An urban design firm, an architectural studio, a conservation field team: they're not going to walk in and say "automate my permit compliance tracking." But they can list their tasks. The filtering agent bridges the gap between what people know about their work and what AI can do with it.
Kickoff isn't a one-time exercise. It's an intake engine that runs continuously: new functions, new roles, revisits as capability changes. After the first batch goes through the pipeline, the assessment agent would go back through the Kickoff data and resurface the next candidates. The data doesn't expire. It gets revisited.
→ [Full Kickoff detail, actions, outputs, and methodology](AGENT%20Pipeline/Kickoff/AGENTIC%20Kickoff.md)
---
## The AGENT Pipeline
Five stages. Sequential on first pass. Each one answers a question.
| Stage | Question |
|---|---|
| **A: Assess** | What is really happening, and how do we express it? |
| **G: Greenlight** | Which workflows should we commit to building? |
| **E: Engineer** | Can we build it, prove it, and ship it? |
| **N: Nurture** | Is it working, and how do we keep it healthy? |
| **T: Track** | What's changed out there, and what should we work on next? |
The pipeline processes what Kickoff surfaces. You don't document everything then decide. You scan fast, focus where it counts, then go deep.
The sequencing is non-negotiable.
- Without **Assess**, you automate chaos
- Without **Greenlight**, you waste effort on the wrong workflows
- Without **Engineer**, you have ideas but no working systems
- Without **Nurture**, deployed workflows degrade and nobody notices
- Without **Track**, the system never compounds
---
### A: Assess
*What is really happening, and how do we express it?*
Assess is the deep dive. A workflow enters here because the assessment agent surfaced it from Kickoff. It earned its place. Now it gets the full treatment.
Observe and map the workflow as it actually runs, with the people who do the work. Capture every step, handoff, decision, exception, and workaround, including the ones that never made it into documentation. Step back and look hard: where are the bottlenecks, redundancies, and fragile points? Optimise before you formalise.
Then turn it into an executable specification. Every shortcut, every judgment call, every "I just know which ones need special handling" has to become explicit, structured, and machine-readable. This translation is handled by the **AGENTIC Specification Generator**, a prompt-driven agent that produces a first-pass specification from the assessment outputs. The Workflow Owner validates and refines.
Define success criteria here. What does "working" look like for this workflow? What are the test cases? These travel with the specification into Engineer and become the basis for evaluation. You can't prove something works if you haven't defined what "works" means.
The most common mistake in AI adoption is encoding existing inefficiency into a new system and scaling it. Assess is your best chance to fix the workflow before it gets formalised, though the feedback loops mean it's never too late to revisit.
Because the specification is structured and machine-readable, it becomes an asset, not paperwork. Agents could read it, score it, scan it for patterns, and surface improvements when capability changes. The documentation is the infrastructure.
→ [Full Assess detail, actions, outputs, and methodology](AGENT%20Pipeline/A%20–%20Assess/AGENTIC%20Assess.md)
---
### G: Greenlight
*Which workflows should we commit to building?*
Two jobs at this stage. First: evaluate what current AI can reliably do in this specific workflow. Break it into individual tasks and decisions. Test each one for repeatability, ambiguity, and consequence of error. Design the human-AI collaboration model using the **AGENT Collaboration Spectrum**, whose four levels are AI-run, AI-led with human verification, human-led with AI assistance, and human-run. Most workflows use all four across different steps.
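The per-step nature of the collaboration model can be made concrete as a data structure. A sketch only: the four spectrum levels are the framework's own, but the step names and the helper function are hypothetical illustrations:

```python
from enum import Enum

class Collaboration(Enum):
    AI_RUN = "AI-run"                       # agent executes, no routine human touch
    AI_LED = "AI-led, human verification"   # agent drafts, human signs off
    HUMAN_LED = "human-led, AI assistance"  # human drives, agent supports
    HUMAN_RUN = "human-run"                 # no agent involvement

# The collaboration model is assigned per step, not per workflow.
# Step names below are hypothetical.
collaboration_model = {
    "gather source data": Collaboration.AI_RUN,
    "draft summary": Collaboration.AI_LED,
    "judge edge cases": Collaboration.HUMAN_LED,
    "final external sign-off": Collaboration.HUMAN_RUN,
}

def steps_needing_human(model: dict[str, Collaboration]) -> list[str]:
    """Every step where a person appears somewhere in the loop."""
    return [step for step, level in model.items() if level is not Collaboration.AI_RUN]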
Second: score, rank, and decide. The **AGENT Prioritisation Matrix** is the living dashboard at the centre of the framework. Every workflow gets scored across multiple dimensions covering workflow characteristics (frequency, time cost, leverage), implementation factors (difficulty, risk, cost-benefit), and organisational readiness (team readiness, owner buy-in, staff dependency). The full scoring model and dimension definitions are in the [Matrix detail page](AGENT%20Pipeline/G%20–%20Greenlight/AGENT%20Prioritisation%20Matrix.md). The traffic-light system makes the call visible. Green means go. This is a human decision point: the agent would populate and recommend, but people commit.
The best workflows to start with are the repetitive, low-risk, high-frequency ones. They compound fastest and carry the least downside.
→ [Full Greenlight detail, actions, outputs, and methodology](AGENT%20Pipeline/G%20–%20Greenlight/AGENTIC%20Greenlight.md)
---
### E: Engineer
*Can we build it, prove it, and ship it?*
This is where the specification becomes real. "Build" doesn't always mean building a bespoke agent from scratch. It might mean configuring native AI features in tools the organisation already uses, designing templates and database structures, or setting up automation rules inside existing software. What matters is that the specification is implemented and the collaboration model is honoured.
The framework draws on established architectural patterns for agentic systems. Anthropic's published patterns (prompt chaining, routing, parallelisation, orchestrator-worker, evaluator-optimizer) provide one useful build vocabulary (source: Anthropic, "Building Effective Agents," anthropic.com/engineering/building-effective-agents). But the build space is broader than any single taxonomy: tool-use agents, retrieval-augmented generation (RAG) pipelines, multi-agent orchestration, and hybrid architectures that combine several patterns are all valid approaches depending on the workflow. The framework is pattern-agnostic. When you know where a workflow sits on the Collaboration Spectrum, these patterns tell you how to architect it.
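Two of the named patterns, prompt chaining and routing, can be sketched with a stubbed model call. This is an illustrative skeleton, not the framework's build method; `call_model` is a stand-in for whatever LLM API the organisation actually uses, and the invoice/general split is a made-up routing rule:

```python
from typing import Callable

# Stand-in for a real LLM call; in practice this would hit Claude, GPT-4, etc.
def call_model(prompt: str) -> str:
    return f"[model output for: {prompt[:40]}]"

# Prompt chaining: each step's output feeds the next step's prompt.
def chain(steps: list[str], initial_input: str) -> str:
    output = initial_input
    for instruction in steps:
        output = call_model(f"{instruction}\n\nInput:\n{output}")
    return output

# Routing: a classifier picks which specialised handler gets the input.
# The keyword check stands in for a real classification step.
def route(text: str, handlers: dict[str, Callable[[str], str]]) -> str:
    category = "invoice" if "invoice" in text.lower() else "general"
    return handlers.get(category, handlers["general"])(text)

handlers = {
    "invoice": lambda t: call_model(f"Extract invoice fields:\n{t}"),
    "general": lambda t: call_model(f"Summarise:\n{t}"),
}
result = route("Invoice #123 from Acme", handlers)
```

The other patterns compose the same primitives: orchestrator-worker fans one chain out into several, evaluator-optimizer wraps a chain in a scoring loop.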
Because the full machine-readable spec exists from Assess, agents could autonomously build and test workflows in a sandbox (a technology proving ground where the specification meets reality). But a sandbox prototype is rarely complete without the Workflow Owner. They bring the judgment, the system access, and the edge-case knowledge that turns a proof-of-concept into something that actually works.
Before moving up the **AGENT Collaboration Spectrum**, workflows pass through a **parallel-run phase** where the agent shadows the human process. Evaluation is explicit: run the test cases defined at Assess, measure against the success criteria, document the evidence. Trust is built through data, not promises.
Once proven, this is also where production build and deployment happen. Engineer takes a workflow from specification to working system.
→ [Full Engineer detail, actions, outputs, and methodology](AGENT%20Pipeline/E%20–%20Engineer/AGENTIC%20Engineer.md)
---
### N: Nurture
*Is it working, and how do we keep it healthy?*
Monitor every live workflow. Capture every signal. Feed it all back. Every human override is a data point. Every correction is an instruction. Nurture keeps deployed workflows healthy and improving.
The framework describes two roles at this stage: the **Assessor** and the **Builder**. These are likely to be agents, though how they're implemented will vary by organisation and will evolve as the technology matures. The Assessor monitors live workflows by logging overrides, catching recurring gaps, and feeding corrections back so agents improve with every run. The Builder acts on what the Assessor finds by opening specifications, revising them, and proving fixes in sandbox before anything goes live.
Nurture also maintains the **AGENTIC Vault**, the structured repository that every agent in the system reads from and writes to. Specifications, collaboration models, performance baselines, proven modules, build files, override logs, capability updates. The vault is what makes the framework compound and what would make the whole system scannable and self-improving. When new capability lands, an agent could scan the spec vault and identify which workflows could now handle their exception cases differently.
All performance and adoption data feeds back into the **AGENT Prioritisation Matrix**, keeping the dashboard current.
→ [Full Nurture detail, actions, outputs, and methodology](AGENT%20Pipeline/N%20–%20Nurture/AGENTIC%20Nurture.md)
---
### T: Track
*What's changed out there, and what should we work on next?*
Watch the AI capability frontier: both the broader landscape and the organisation's own software stack. When Notion ships a new AI feature, when Copilot adds a capability, when a new model release changes what's possible, workflows that previously scored too low get resurfaced and reprioritised. What wasn't worth automating last month might be ready tomorrow because the technology caught up.
Track feeds capability changes to the assessment agent. The agent would rescore the **AGENT Prioritisation Matrix**. The dashboard reshuffles. The next batch surfaces. Track also identifies expansion opportunities: adjacent workflows that share structure with proven ones, reusable modules from the **AGENTIC Vault** that unlock new parent workflows.
Track also owns the **decommission path**. Not everything that gets automated should stay automated forever. When performance drops below threshold, when a governance model is invalidated, when a team restructures, or when the tool an agent was built on gets deprecated, Track identifies the trigger and initiates a governed retirement. Decommissioning requires the same level of governance sign-off as greenlighting. A workflow doesn't just get quietly turned off.
Track also monitors the financial picture: cost per run, ROI trends, and whether the cost-benefit equation still holds. A workflow that was cost-effective six months ago might not be now.
Without Track, the pipeline runs once and stops. With Track, it compounds. This is the stage that turns a process into a learning system.
→ [Full Track detail, actions, outputs, and methodology](AGENT%20Pipeline/T%20–%20Track/AGENTIC%20Track.md)
---
## The AI Governance Stream
Governance runs alongside the AGENT pipeline, not inside it. It connects to every stage but doesn't sit in the sequence. You build it early, you reference it throughout, and you update it as the pipeline grows.
It covers risk classification, accountability, escalation paths, ethical red lines, regulatory and compliance checkpoints, auditability, external output gates, and review cadence. The organisations that handle AI ethics well are the ones that have turned principles into operational criteria: things that get checked, measured, and acted on at the workflow level.
Governance also covers decommissioning. Turning off an automation requires the same rigour as turning one on: defined triggers, a structured process, stakeholder communication, and formal sign-off.
Governance that works for three automated workflows may be inadequate for thirty. The AI Governance stream must be revisited as the pipeline grows.
→ [Full AI Governance Stream detail, connections, and methodology](AI%20Governance%20Stream/AGENTIC%20AI%20Governance%20Stream.md)
---
## The AI Adoption Stream
Change management runs alongside the AGENT pipeline, not after it. It starts the moment you begin talking to people about their workflows and doesn't stop until the new way of working is the normal way of working.
It covers transition planning, workforce impact, trust building, resistance as data, enthusiasm as risk, failure protocols, and adoption measurement.
When people resist a new system, the instinct is to push harder. But resistance usually contains information: where trust hasn't been built, where the design doesn't match how people actually work. The person who resists most is often the person who cares most about getting the work right.
Equally, people who love the technology can run ahead of the guardrails by using general-purpose tools for specialised tasks, bypassing purpose-built systems, and producing external-facing outputs without the right checks. The most robust safeguards are structural: build the guardrail into the tool, not around the person, and put human checkpoints on outputs that matter.
→ [Full AI Adoption Stream detail, connections, and methodology](AI%20Adoption%20Stream/AGENTIC%20AI%20Adoption%20Stream.md)
---
## The AGENT Prioritisation Matrix: the living centre
The **AGENT Prioritisation Matrix** is the operational centre of the entire framework. It's not a spreadsheet filled in during a workshop. It's designed to be a living dashboard populated, scored, and managed by agents.
Every workflow the framework touches lives in the matrix. It tracks readiness, priority, cost, performance, and status. The assessment agent would populate it from Kickoff data and rescore it as technology evolves and adoption data flows in from Nurture. The tracking agent would feed capability changes from Track. The intent is a dashboard that's always current, always showing: where you are, what's ready, what just changed, the financial picture. How much of this is agent-driven versus human-maintained will depend on the organisation and the maturity of available tools.
Workflows are scored across multiple dimensions and assigned a traffic-light status: 🟢 Ready to Build, 🟡 Evaluate Further, 🔴 Not Yet, ⚫ Human-run, or ⬜ Retired. The matrix reshuffles continuously as signals flow in. A workflow scored 🔴 three months ago might become 🟡 today as a new capability lands.
The human's job shifts from "figure out what to do" to "decide whether to act on what the agent recommends." The agent would do the analysis. The human makes the call.
**Worked example.** A Finance team's weekly reporting compilation scores high on frequency (5, its compilation tasks run daily), time cost (5, a full day of senior time), and implementation difficulty (5, simple and linear). It scores lower on leverage (4, impacts compliance) and organisational readiness (4, leadership backing). Total: 42 out of 45. Status: 🟢 Ready to Build. Compare that with board pack preparation: lower frequency (2, monthly), high leverage (5, material to company outcomes), but high risk (2, regulatory implications) and moderate buy-in (3). Total: 32. Status: 🟡 Evaluate Further, needs deeper governance design before build. The full scoring model with all dimension definitions is in the [Matrix detail page](AGENT%20Pipeline/G%20–%20Greenlight/AGENT%20Prioritisation%20Matrix.md).
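The scoring arithmetic can be sketched directly. The nine dimension names follow the framework; the traffic-light thresholds and the split of individual dimension scores below are illustrative assumptions, chosen only so the totals land on the statuses in the worked example (the real bands live in the Matrix detail page):

```python
# The nine scoring dimensions named in the framework, each scored 1-5.
DIMENSIONS = [
    "frequency", "time_cost", "leverage",                   # workflow characteristics
    "difficulty", "risk", "cost_benefit",                   # implementation factors
    "team_readiness", "owner_buy_in", "staff_dependency",   # organisational readiness
]

# Illustrative thresholds (an assumption): chosen so 42 -> green, 32 -> yellow.
def status(total: int) -> str:
    if total >= 38:
        return "🟢 Ready to Build"
    if total >= 25:
        return "🟡 Evaluate Further"
    return "🔴 Not Yet"

def score(workflow: dict[str, int]) -> tuple[int, str]:
    missing = [d for d in DIMENSIONS if d not in workflow]
    if missing:
        raise ValueError(f"unscored dimensions: {missing}")
    total = sum(workflow[d] for d in DIMENSIONS)
    return total, status(total)

# Hypothetical per-dimension breakdown that reproduces the 42 total.
reporting_compilation = dict(
    frequency=5, time_cost=5, leverage=4, difficulty=5, risk=5,
    cost_benefit=5, team_readiness=5, owner_buy_in=4, staff_dependency=4,
)
total, light = score(reporting_compilation)
```

The point of the sketch is the shape, not the numbers: scoring is mechanical and agent-friendly, which is exactly why the human's job can shift to deciding whether to act on the recommendation.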
---
## The AGENTIC Vault: documentation as infrastructure
The **AGENTIC Vault** is the structured repository that makes the whole system work. It's not a catalogue. It's infrastructure.
Every piece of documentation the framework produces is an asset that AI can act on:
- **Specifications are executable**: structured enough that an agent could read them, build from them, and verify against them
- **The vault is scannable**: an agent could look across all specifications and find patterns, redundancies, and optimisation opportunities
- **Improvements can surface automatically**: when a new capability lands, an agent could scan the spec vault and identify which workflows could now handle their exception cases differently
- **Governance is auditable**: every decision, every score, every human override is recorded. An auditor (human or AI) can trace any live agent back to its specification, its scoring rationale, its governance approvals, and the original conversation where the workflow was mapped
- **Nothing becomes a black box**: when someone leaves, every agent they built has a source file, a specification, and a build record. The system doesn't depend on any one person's memory
What it stores: Kickoff scan data, Assess specifications, Greenlight scores and decisions, Engineer build files and configurations, Nurture override logs and adoption data, Track capability updates. Everything feeds in. Everything is structured. Everything is scannable.
In practice, the vault might be a structured workspace in Notion, a Google Drive with a defined folder convention, a SharePoint library, a Confluence space, a Git repository, or a set of markdown files with consistent naming. Start simple. The structure is what makes it compound, not the platform.
"Machine-readable" here means structured enough that an agent could parse it without human interpretation. In practice, that means:
- **Consistent templates** for each artefact type: every specification follows the same structure, every governance model uses the same fields
- **Predictable file naming and folder hierarchy**, so an agent can locate artefacts programmatically
- **Structured metadata** (status, owner, version, pipeline stage, last-modified date) attached to each artefact
Today, this might be as simple as a Notion database with defined properties, or markdown files with YAML frontmatter in a Git repo. The bar isn't "fully automated." The bar is "an agent given access to this vault could find what it needs and understand what it's looking at."
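A minimal sketch of what "an agent could parse it" means in practice, assuming the markdown-with-frontmatter option. The field names (`status`, `pipeline_stage`, `owner`) follow the metadata listed above; the filenames and the flat key-value parser are assumptions for illustration:

```python
# Minimal frontmatter reader: enough for an agent to locate artefacts by
# metadata, stdlib only. Handles flat "key: value" pairs between --- fences.
def read_frontmatter(text: str) -> dict[str, str]:
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return {}  # no closing fence: treat as no frontmatter

def find_artefacts(files: dict[str, str], **criteria: str) -> list[str]:
    """Filenames whose frontmatter matches every criterion, e.g. status='live'."""
    return [
        name for name, text in files.items()
        if all(read_frontmatter(text).get(k) == v for k, v in criteria.items())
    ]

# Hypothetical vault contents keyed by filename.
vault = {
    "spec-invoice-triage.md": "---\nstatus: live\npipeline_stage: Nurture\nowner: finance\n---\n# Spec",
    "spec-board-pack.md": "---\nstatus: draft\npipeline_stage: Greenlight\nowner: finance\n---\n# Spec",
}
live_specs = find_artefacts(vault, status="live")
```

If the agent can answer "show me every live spec owned by finance" with code this simple, the vault clears the bar.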
Without structured documentation, AI adoption tends to fragment: different people using different tools for different tasks, with no shared memory and no way to build on what came before. The vault is what prevents that.
---
## Workflow decomposition
Real workflows are not flat. A parent workflow like "submit an ASX announcement" contains dozens of sub-workflows, and each of those may contain steps that are significant capabilities in their own right: generating a figure from a data table, cross-referencing compliance requirements, formatting to regulatory standards.
The atomic unit here is the **workflow capability**: the organisation's repeatable ability to produce a specific outcome, with a defined specification, a known autonomy level, and measurable performance characteristics. A capability might be a full workflow, a sub-workflow, or a reusable module. The framework discovers, decomposes, builds, and governs capabilities. The vault catalogues them. The dashboard tracks them. The more capabilities you prove, the faster the next one is to prepare, because proven capabilities become building blocks.
The framework handles this through decomposition:
- **At Assess**: map the parent workflow first, then identify which steps are actually sub-workflows that need their own pass through the pipeline. Structure specifications as modular components. A sub-workflow that appears inside multiple parent workflows gets its own specification once, then gets referenced wherever it's needed
- **At Greenlight**: evaluate modules individually. A sub-workflow might be fully automatable even if the parent workflow isn't. Score both parent workflows and reusable modules. A module that appears across five parent workflows has different leverage than a one-off step
- **At Engineer**: build and prove modules independently. A proven module deployed inside one workflow is immediately available for any other workflow that needs the same capability
- **At Nurture**: the **AGENTIC Vault** catalogues modules alongside full workflows. Every reusable module that gets built and proven accelerates every parent workflow that uses it
- **At Track**: when new capability ships, modules that were previously scored too low get resurfaced. A proven module that unlocks a new parent workflow gets flagged for Greenlight
**When to decompose and when to stop.** Not every step needs its own pass through the pipeline.
- **Decompose** when a step is complex enough that it has its own decision logic, exceptions, or failure modes that would clutter the parent specification
- **Decompose** when a step appears across multiple parent workflows and would benefit from being catalogued as a reusable module
- **Stop** when a step is simple enough to be fully defined within the parent specification (a single action, no branching logic), when it has no reuse potential outside the current workflow, or when the overhead of a separate pipeline pass would exceed the value of the module it produces

A useful test: if the step can be described in two or three sentences with no ambiguity, it probably doesn't need its own specification. If you find yourself writing "it depends" or "except when," that's a signal to decompose.
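The decompose-or-stop test above can be sketched as a small decision helper. This is a hedged encoding of the heuristic, not part of the framework; the `Step` fields are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Step:
    """A single step inside a parent workflow (illustrative fields only)."""
    name: str
    has_own_decision_logic: bool      # branching, exceptions, failure modes
    parent_workflow_count: int        # how many parent workflows include this step
    fully_definable_in_parent: bool   # "two or three sentences, no ambiguity"

def should_decompose(step: Step) -> bool:
    """Return True if the step deserves its own pass through the pipeline."""
    if step.fully_definable_in_parent and step.parent_workflow_count <= 1:
        return False  # simple, single-use: keep it inside the parent specification
    # Own decision logic, or reuse across parents, earns a separate specification
    return step.has_own_decision_logic or step.parent_workflow_count > 1
```

A formatting step used once stays in the parent; a compliance cross-reference used by five parent workflows gets catalogued as a module.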
This is how the framework compounds at scale. The second parent workflow that uses a proven module is dramatically faster to prepare than the first.
---
## How to measure success
The framework measures its own impact across four categories:
- **Operational**: completion rates, cycle time, error rates, escalation frequency across deployed workflows
- **Financial**: hours saved, cost per run, return on automation investment, payback period
- **Quality**: accuracy, consistency, exception handling success rate
- **Human**: override frequency, team satisfaction, adoption rate, tool-substitution rate
At the system level, the framework tracks **compounding metrics**: preparation time per workflow (is each new workflow faster than the last?), template reuse rate, pattern coverage, and time to adoption. These are the signals that show whether the system is learning.
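Two of the compounding metrics can be sketched concretely. The trend heuristic and the reuse calculation below are one possible implementation, assuming preparation times and module lists are logged in the vault; nothing here is prescribed by the framework:

```python
def is_compounding(prep_hours: list[float]) -> bool:
    """Is each new workflow faster to prepare than the last?
    Simple trend check: mean of the recent half vs the early half."""
    if len(prep_hours) < 4:
        return False  # not enough data points to call a trend
    mid = len(prep_hours) // 2
    early = sum(prep_hours[:mid]) / mid
    late = sum(prep_hours[mid:]) / (len(prep_hours) - mid)
    return late < early

def reuse_rate(workflows: list[list[str]]) -> float:
    """Share of module references that point at an already-proven module.
    `workflows` is a list of module-name lists, in build order."""
    seen: set[str] = set()
    reused = total = 0
    for modules in workflows:
        for m in modules:
            total += 1
            if m in seen:
                reused += 1
        seen.update(modules)
    return reused / total if total else 0.0
```

A rising reuse rate alongside falling preparation hours is the clearest signal that proven capabilities are acting as building blocks.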
---
## Workflow maturity
Every workflow capability has a maturity state that maps to where it sits in the pipeline. This gives organisations a common language for describing progress and setting targets.
| State | Meaning | Pipeline stage |
|---|---|---|
| **Captured** | The workflow has been identified and broadly scanned. It exists in the dashboard with rough data. | Kickoff |
| **Specified** | The workflow has been mapped in detail, formalised into a machine-readable specification, and validated by the Workflow Owner. | Assess |
| **Scored** | The workflow has been scored across all dimensions, assigned a collaboration model, and given a traffic-light status. A build decision has been made. | Greenlight |
| **Proven** | The workflow has been built, tested in sandbox, and validated through parallel-run against the success criteria defined at Assess. | Engineer |
| **Live** | The workflow is deployed in production, being monitored, and feeding override data back through the pipeline. | Nurture |
| **Compounding** | The workflow is stable, its modules are being reused in other workflows, and it is being rescored as capability evolves. | Track |
Not every workflow reaches every state, and that's fine. A workflow might stay at Captured indefinitely if it never scores high enough to enter the pipeline. A workflow at Scored might be marked ⚫ Human-run and stay there by design. The maturity states describe where a capability is, not where it should be.
At the organisation level, maturity shows up as the distribution across these states: how many capabilities are captured versus specified versus live versus compounding. An organisation early in adoption will have many workflows at Captured and few at Live. A mature portfolio will have a pipeline that's continuously moving capabilities through the states, with Track feeding new candidates back in as technology evolves.
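The six states and the organisation-level distribution view can be represented directly in a dashboard's data model. A minimal sketch (the enum names mirror the table above; `portfolio_distribution` is an invented helper, not a framework artefact):

```python
from collections import Counter
from enum import Enum

class Maturity(Enum):
    """The six workflow maturity states, mapped to their pipeline stage."""
    CAPTURED    = "Kickoff"
    SPECIFIED   = "Assess"
    SCORED      = "Greenlight"
    PROVEN      = "Engineer"
    LIVE        = "Nurture"
    COMPOUNDING = "Track"

def portfolio_distribution(states: list[Maturity]) -> dict[str, int]:
    """Organisation-level view: how many capabilities sit in each state."""
    return {s.name: n for s, n in Counter(states).items()}
```

An early-stage organisation would see a distribution weighted toward `CAPTURED`; a mature portfolio shows capabilities spread across all six states.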
---
## Common misconceptions
**"So you document everything first?"**
No. Kickoff scans fast. Detailed documentation happens only on the workflows that earn it through scoring. The framework is designed for action, not analysis paralysis.
**"This is a one-time project?"**
No. Kickoff keeps scanning, Nurture and Track run continuously, and the Governance and Adoption streams never stop. It's a living practice, not a transformation programme with an end date.
**"This replaces people?"**
No. It makes workflows legible enough that AI can handle the structured, repetitive steps, freeing people up for the work that actually needs their judgment, creativity, and care.
**"Does it lock you into specific tools?"**
No. The framework is tool-agnostic and portable. Nothing about AGENTIC creates lock-in.
---
## How AGENTIC relates to common approaches
Most organisations approach AI adoption through one or more of these. AGENTIC doesn't replace them. It's the layer that connects them.
**Maturity assessments** score an organisation's readiness for AI. Useful for benchmarking, but they produce a report, not a next step. AGENTIC can take that readiness data and feed it into workflow-level prioritisation.
**Use case workshops** generate ideas for where AI could add value. The challenge is what happens after: ideas fragment across teams without a system for prioritising or following through. AGENTIC provides that system.
**Vendor-led pilots** start with a tool and look for places it fits. They're useful for proving capability, but adoption gets shaped by what the tool can do rather than what the organisation needs. AGENTIC starts with the work and is tool-agnostic.
**Architecture guides** (like Anthropic's "Building Effective Agents") explain how to build agents: the patterns, the tooling, the evaluation methods. AGENTIC uses these as build vocabulary at the Engineer stage. They answer *how to build*. AGENTIC tries to answer *what to build, when, and how to keep it working*.
Any of these can operate inside AGENTIC. The framework is designed to be the connecting layer, not a competitor to any of them.
---
## What I haven't solved yet
I'm applying this at two organisations right now, both under 50 people. That means the framework works at small scale. I think the architecture scales, but I haven't proved that yet, and I'm not going to claim otherwise.
The agents described throughout this framework are a mix of what works today and what I'm building toward. An LLM can already extract workflows from a transcript, generate a first-pass specification, and score against defined dimensions with a human reviewing the output. A fully autonomous assessment agent that manages the dashboard end-to-end, a Builder that revises specs without supervision, agents that scan the vault and surface improvements unprompted: that's where this is heading, not where it is today. For now, while the system is being developed, a human maintains the dashboard, reviews every score, and drives the feedback loops. The methodology works either way. The agents accelerate it. They don't gate it.
I don't have published case studies yet. The worked examples are based on real patterns but presented as composites. Formal case studies with measured outcomes will come as implementations progress.
The AI Governance Stream gives you a solid structure, but if you're in a regulated industry — healthcare, financial services, legal — you'll need to extend it with your own domain-specific compliance requirements. I've built the hooks for that, not the detail.
The framework doesn't solve organisational politics. It has structural answers for some of the human dynamics: resistance-as-data in the AI Adoption Stream, an Executive Sponsor role, governance sign-off gates. But when a sponsor loses interest at month three, or budget gets redirected, or someone quietly blocks adoption because it threatens their team's structure — those are political problems, not methodology problems. The framework can surface those signals early, especially through the AI Adoption Stream and the override data in Nurture. But surviving them depends on the people involved, not the system.
---
## Implementation team
The AGENTIC Framework defines roles, not headcount. In a small organisation one person might cover multiple roles. In a larger organisation each role might be a dedicated person or team. The framework works at any scale.
**AGENTIC Orchestrator**
Runs the AGENT pipeline end to end. Leads Kickoff scans, coordinates Assess and Greenlight, manages the **AGENT Prioritisation Matrix** alongside the assessment agent, and keeps the Governance and AI Adoption streams connected to the pipeline. This is the person who makes the framework happen. Could be an internal operations or transformation lead, or an external consultant.
**Workflow Owner**
The person closest to the work. They know how the workflow actually runs, not just how it's documented. They are the primary source in Assess, the reality check in Greenlight, the collaborative partner in Engineer. Every workflow needs one. This is not an AI role. Their value is domain knowledge.
**AI Governance Lead**
Owns the AI Governance stream. Defines risk tiers, accountability structures, ethical red lines, compliance checkpoints. Reviews and updates governance as the pipeline grows. Owns the decommission governance process. Could sit in legal, compliance, risk, or be a dedicated AI governance function.
**AI Adoption Lead**
Owns the AI Adoption stream. Transition planning, reskilling, trust building, resistance-as-data, enthusiasm-as-risk, adoption measurement. Responsible for closing the gap between a working system and an adopted one. Could sit in HR, be a dedicated change manager, or be the AGENTIC Orchestrator wearing another hat.
**Assessor and Builder**
These are roles, not people, and likely agents, though how they're implemented will depend on the organisation and will evolve with the technology. The Assessor lives across Nurture and Track: it monitors live workflows, feeds corrections back, logs every override as a learning signal, watches the AI capability frontier, and feeds the assessment agent that manages the dashboard. The Builder acts on what the Assessor finds by revising specifications and proving fixes in sandbox so the team can decide what's ready to configure and deploy. These are the only roles in the framework designed to be machines.
**Executive Sponsor**
Senior enough to clear blockers, make resourcing decisions, and back the hard calls. Especially important when the AI Adoption stream surfaces uncomfortable truths about role changes or when governance and speed are in tension. The Executive Sponsor is the person who looks at the dashboard and says "go."
> [!tip] Minimum viable team
> Three roles: AGENTIC Orchestrator, Workflow Owner, and Executive Sponsor. The Orchestrator covers governance and people early on. As the pipeline grows and more workflows come through, the AI Governance Lead and AI Adoption Lead become their own roles. The Assessor and Builder come online at Nurture when there are deployed workflows to monitor. The assessment agent that manages the dashboard can begin work as soon as Kickoff data exists.
**Framework overhead and what to defer.** The full framework describes a lot of moving parts. For a small team (one orchestrator, a few workflow owners, a supportive sponsor), running everything at once would be heavier than the work it's trying to improve. The framework is designed to be adopted incrementally. Start with Kickoff and Assess. These produce documented SOPs and specifications with no infrastructure requirements. Add Greenlight when you have enough workflows to need prioritisation (three or more candidates is a reasonable threshold). Engineer and Nurture come online when you're actually building. Track becomes valuable once you have deployed workflows and want the system to compound. The Governance and Adoption streams can start as lightweight checklists maintained by the Orchestrator and grow into dedicated functions as the pipeline scales. The Vault can start as a single folder with a naming convention. Resist the urge to build infrastructure ahead of the work that needs it.
**What changes at scale.** The framework is the same at any size, but how you run it shifts.
At **5 workflows or fewer** (first function, first pass): one Orchestrator runs everything. The vault is a folder. The dashboard is a spreadsheet. Governance is a checklist. The Adoption stream is conversations with the people affected. This is where most organisations start, and it's enough.
At **10–20 workflows** (multiple functions scanned, several live): the vault needs real structure, consistent templates and naming conventions so artefacts are findable. The dashboard needs to be a proper shared tool (a Notion database, a structured spreadsheet with defined views). The Governance and Adoption streams become recurring agenda items, not ad hoc conversations. A dedicated AI Governance Lead starts to make sense. The Orchestrator is spending most of their time on the framework.
At **50+ workflows** (portfolio scale): governance needs its own function, not a checklist. The vault needs tooling that supports search, versioning, and access control. The dashboard needs to support filtering by function, status, and owner. Track becomes critical because the volume of capability changes and resurfacing decisions is too high for manual management. The AI Adoption Lead is a dedicated role. You're likely running multiple Kickoff scans in parallel across different functions. The Assessor and Builder agents become essential because manual monitoring of 50+ live workflows isn't sustainable.
At **200+ workflows** (enterprise scale): the framework hasn't been tested here yet. The architecture is designed to scale (the vault, the dashboard, the feedback loops all work in principle at any size), but governance, tooling, and team structure at this scale will need adaptation that doesn't exist in the current version. This is an honest gap. If you're operating at this scale and applying the framework, your experience would shape what V6 looks like.
---
## Applying the AGENTIC Framework in practice
Before you score anything, ask the people who do the work one question: *what would you love to stop doing?*
Everyone has tasks they'd get rid of tomorrow if they could: the repetitive admin, the weekly report nobody reads, the inbox triage that eats the first hour of every day. Start there. When someone sees the thing they hate most start to disappear, they don't need to be sold on the framework. They're already in.
The best place to start is a Kickoff scan with one function or team. One day. Two passes: a fast task list sweep to identify candidates, then deep conversations on the ones that earned it. Let the filtering agent surface which workflows are worth the deep dive. Then run the first one through the full AGENT pipeline before starting the second. The discipline of completing all five stages on a single workflow teaches the methodology faster than starting five workflows at once.
> [!tip] Where to start
> Pick one function. Spend a day. Start with the task list sweep: capture every task, capture the tech stack. Let the filtering agent flag what's worth going deep on. Then have the conversations, but only on the shortlisted tasks. Feed the data into the dashboard. Pick one workflow to take through the full pipeline, ideally one that runs at least weekly, involves at least one handoff, and that someone in the organisation could describe differently to how it actually runs. That gap between the official version and the real version is exactly where AGENTIC begins.
---
*Framework developed by Madeleine Pierce. Currently being applied at Marine Megafauna Foundation (MMF) and MDF Global.*
---
## Version history
| Version | Date | What changed and why |
| ------- | ------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| V1 | February 2026 | First draft. Seven stages in three phases (Foundation, Execution, Evolution). Got the ideas down but the structure was too linear. |
| V2 | February 2026 | Pulled governance and people out as continuous streams instead of stages. This was the key structural insight: governance isn't a phase you complete, and the human side isn't an afterthought. Five pipeline stages, two parallel streams. |
| V3 | March 2026 | Split into overview and linked detail pages. The framework had outgrown a single document. Added the Kickoff Playbook as a fast-start entry point. |
| V4 | March 2026 | Major pipeline restructure. Merged early stages into Assess. Created Greenlight as a proper decision gate. Added Engineer as a dedicated build stage. Split monitoring into Nurture (live workflows) and Track (capability frontier). The pipeline went from mapping-heavy to action-oriented. |
| V5 | March 2026 | Kickoff became a first-class entry point with one-day scans per function. The Prioritisation Matrix moved to the centre of everything. The Vault became structured infrastructure. Assessment agent concept introduced. Decommission path added. Architectural patterns integrated into Engineer. The framework went from "here's a process" to "here's an operating system." |
| V5.1 | March 2026 | Credibility pass after external review. Sharpened the line between what works today and what's design intent. Grouped scoring dimensions into three categories. Named "workflow capability" as the framework's atomic unit. Added workflow maturity states, scaling guidance, and decomposition heuristics. |
| V5.2 | March 2026 | Kickoff evolved to a two-pass discovery model (task list sweep, then deep conversations on shortlisted tasks). |
---
## Glossary
**AGENTIC Framework**: The full operating model: Kickoff, the AGENT Pipeline, the AI Governance Stream, the AI Adoption Stream, the AGENT Prioritisation Matrix, and the AGENTIC Vault.
**AGENT Pipeline**: The five-stage sequence for preparing, building, and sustaining AI workflows: Assess, Greenlight, Engineer, Nurture, Track.
**AGENT Prioritisation Matrix**: The living dashboard at the centre of the framework. Designed to be agent-populated and agent-managed. Scores every workflow across multiple dimensions (workflow characteristics, implementation factors, organisational readiness) and continuously resurfaces what to work on next.
**AGENT Collaboration Spectrum**: Four levels defining how humans and AI work together on each step of a workflow: AI-run, AI-led with human verification, human-led with AI assistance, human-run.
**AGENTIC Vault**: The structured repository of every artefact the framework produces. Specifications, governance models, performance data, override logs, reusable patterns. Infrastructure, not a catalogue.
**Workflow capability**: The organisation's repeatable ability to produce a specific outcome, with a defined specification, a known autonomy level, and measurable performance characteristics. A capability might be a full workflow, a sub-workflow, or a reusable module. The atomic unit the framework discovers, builds, and governs.
**AGENTIC Orchestrator**: The person who runs the framework end to end. Leads Kickoff, coordinates the pipeline, manages the dashboard, connects the streams.
**Workflow Owner**: The person closest to the work. Their knowledge of how the workflow actually runs is the primary input to Assess.
**Assessment agent**: The agent designed to score Kickoff data, manage the Prioritisation Matrix, run quick capability tests, and resurface candidates as capability evolves.
**Greenlighting Agent**: The agent that would read specifications from Assess and produce first-pass scoring across all dimensions with evidence and confidence flags.
**Specification Generator**: The prompt-driven agent that translates Assess outputs into two-layer workflow specifications (human-readable playbook + structured specification). Also operates as "the Builder" at Nurture, revising specifications based on live data.
**Assessor**: The agent role at Nurture that monitors live workflows, logs overrides, catches recurring gaps, and feeds signals back through the pipeline.
**Builder**: The agent role at Nurture that acts on what the Assessor finds: revising specifications and proving fixes in sandbox before anything goes live. The Specification Generator in autonomous mode.
**Kickoff**: The entry point. Two passes: a task list sweep that a filtering agent uses to identify candidates, then deep conversations on the shortlisted tasks. Feeds the Prioritisation Matrix.
**Parallel-run**: The phase where an agent shadows the human process before any handoff. Outputs are compared side by side. Trust is built through evidence, not promises.
**Traffic-light system**: The readiness statuses in the Prioritisation Matrix: 🟢 Ready to Build, 🟡 Evaluate Further, 🔴 Not Yet, ⚫ Human-run, ⬜ Retired.
**Intake loop**: The feedback loop where the assessment agent resurfaces previously scanned workflows as capability evolves. Keeps the pipeline fed continuously.
**Reality loop**: The feedback loop where human overrides in Nurture reveal gaps that feed back into Assess or Greenlight for specification revision.
**Design loop**: The feedback loop between Greenlight and Assess, where collaboration model design reveals specification gaps that need revision.
**Refinement loop**: The feedback loop within Assess, where formalising a specification reveals missing rules or edge cases that send you back to the Workflow Owner.
**Workflow maturity states**: Six states describing where a workflow capability sits in the pipeline: Captured (Kickoff), Specified (Assess), Scored (Greenlight), Proven (Engineer), Live (Nurture), Compounding (Track).