PROJ - Docker Pi Agent Runtime - SQLite Container Setup - mnml's vault

# Docker Pi Agent Runtime - SQLite Container Setup

This project builds a small local runtime for running [[pi]] as a Dockerized coding agent over a SQLite-backed input/output boundary. The goal is not to replace pi's agent loop. The goal is to place pi inside a controlled execution shell: a read-only input database, a writable output database, a Docker container, a Go HTTP API, and a dashboard that can observe runs as they happen.

> [!summary]
> The project is a local agent-container runtime with three central ideas:
> 1. **Pi owns the agent loop.** The Go service starts and observes pi; it does not reimplement tool calling, turns, model selection, or session history.
> 2. **SQLite defines the boundary.** The input database is mounted read-only, while the output database records runs, events, and dashboard messages.
> 3. **Docker gives each run a clear execution envelope.** The container receives only the mounted files and pi configuration it needs.

The system began as a hand-drawn architecture: project state flows from a central database into a read-only SQLite snapshot; an agent runs in a container; outputs flow into a writable database and a dashboard. The implementation deliberately starts smaller. Postgres and project export are future upstream concerns. The runtime today accepts an existing SQLite input file, starts Dockerized pi, and writes observable run state into a managed SQLite output database.

## Why this project exists

A coding agent is most useful when it can act with enough context and enough tools to get work done. It is also most dangerous when it has too much ambient authority. If an agent has direct access to the full host filesystem, long-lived credentials, arbitrary network egress, and an unstructured log stream, then every run becomes difficult to inspect and difficult to reproduce.
You may know that the agent did something useful, but you cannot easily answer: what was the initial state, what did it see, what did it emit, and what remains after the process exits?

This project answers that problem with a simple discipline: the agent run has an explicit input, an explicit output, and a disposable execution envelope. The input is a SQLite database mounted read-only. The output is another SQLite database that records run metadata, events, and messages. The execution envelope is a Docker container running pi. The Go server is the coordinator that prepares these pieces, starts the container, and presents the result through an HTTP API and a small dashboard.

That separation matters because it changes how we reason about agent work. A normal shell session is a blur of commands, stdout, files, and history. A run in this system is a record:

- There was an input database at a path.
- There was a run ID.
- There was a Docker image.
- There was a pi session file.
- There were events with timestamps.
- There were messages suitable for a dashboard.
- There was a final status.

The design is not complete sandboxing. Docker is not a hostile-code security boundary by itself, and the current prototype mounts `~/.pi/agent` for pi credentials and settings. But the project already gives us a much clearer shape than an ad hoc script.

## Current project status

The repository is active and has a working prototype.
The current implementation can:

- build a Docker image named `claw-pi-agent:latest`,
- bake pi and the configured pi provider/extension packages into that image,
- mount a read-only input SQLite database into the container as `/data/input.db`,
- mount a writable output database as `/data/output.db`,
- mount a run session directory as `/session`,
- mount the host pi configuration directory into `/root/.pi/agent`,
- start pi in print mode,
- stream process stdout and stderr into the output database,
- ingest pi's JSONL session log after the run finishes,
- expose an HTTP API and simple web dashboard,
- manage the whole dev setup through `devctl`.

The current runtime still uses `pi --print`. A new design ticket, `PI-RPC-SDK-RUNTIME`, plans the next step: switch to pi RPC or SDK integration so the dashboard can receive structured live events such as assistant text deltas, tool execution updates, queue updates, extension UI requests, aborts, and steering messages.

## Project shape

At a high level, the project has five layers.

```mermaid
flowchart TD
    User[User or browser dashboard]
    API[Go HTTP API and Glazed CLI]
    Runtime[Go runtime service]
    Docker[Docker container]
    Pi[pi-coding-agent]
    Input[(Read-only input SQLite)]
    Output[(Writable output SQLite)]
    Session[pi JSONL session log]
    Devctl[devctl startup system]

    Devctl --> API
    User --> API
    API --> Runtime
    Runtime --> Docker
    Docker --> Pi
    Input --> Docker
    Docker --> Output
    Pi --> Session
    Session --> Runtime
    Runtime --> Output
    API --> Output

    style Input fill:#d8f3dc,stroke:#2d6a4f
    style Output fill:#ffe8cc,stroke:#d9480f
    style Docker fill:#dbeafe,stroke:#1d4ed8
    style Pi fill:#f3e8ff,stroke:#7e22ce
```

The Go API is the face of the system. It is what the browser and CLI talk to. The runtime service behind it knows how to validate paths, create the output database, assign run IDs, start Docker, stream process logs, and record completion.
The Docker image contains pi and all pinned pi packages needed by the mounted `~/.pi/agent/settings.json`. SQLite is used twice: once as the input snapshot and once as the output ledger.

This shape deliberately avoids a distributed queue, a separate Postgres dependency, and a bespoke agent protocol. Those may appear later. The prototype is a single-machine system that can be understood by reading a handful of files.

## The core mental model

The simplest way to understand the runtime is to think of an agent run as a function:

```text
run(input.db, prompt, pi-config, image) -> output.db + session.jsonl + status
```

That notation is not merely a simplification. It captures the reason the system is useful. The input database is the initial world. The prompt is the instruction. The pi config provides model/provider credentials and extensions. The image provides the software environment. The output database and session log are the observable result.

A conventional process can mutate anything it can reach. This runtime tries to narrow what the process can reach. The agent sees the input database at a stable path, `/data/input.db`, but it is mounted read-only. The agent can write to the output database and session directory. The host can inspect the output database while the run proceeds. When the container exits, the host still has a compact, queryable record.

```text
Before run:
  tmp/input.db            # project snapshot or fixture
  ~/.pi/agent             # pi settings, auth, models, packages
  claw-pi-agent:latest    # software environment

During run:
  /data/input.db          # read-only view inside container
  /data/output.db         # writable output DB inside container
  /session/session.jsonl

After run:
  .claw-runs/.../claw-output.db
  .claw-runs/.../run_<id>/session.jsonl
```

The project uses SQLite not because SQLite is glamorous, but because it is exactly the right size for the boundary. A SQLite file is copyable, mountable, inspectable, and easy to archive.
It gives the system a concrete artifact instead of a vague claim that "the agent had context."

## Architecture in detail

### The run lifecycle

When a caller starts a run, the runtime follows a fixed sequence. Understanding this sequence is the key to understanding the whole project.

```mermaid
sequenceDiagram
    participant Browser as Browser or CLI
    participant API as Go API
    participant RT as Runtime Service
    participant Store as Output SQLite
    participant Docker as Docker
    participant Pi as pi in container
    participant Session as session.jsonl

    Browser->>API: POST /v1/runs
    API->>RT: StartRun(request)
    RT->>Store: Init schema
    RT->>Store: Insert runs row status=starting
    RT->>Docker: docker run claw-pi-agent:latest
    RT->>Store: Insert event container_starting
    Docker->>Pi: pi --print --session /session/session.jsonl prompt
    Pi-->>RT: stdout/stderr lines
    RT->>Store: Insert process_stdout/process_stderr messages
    Pi->>Session: Write JSONL session entries
    Docker-->>RT: process exit
    RT->>Session: Read session.jsonl
    RT->>Store: Insert user/assistant/tool messages
    RT->>Store: Update run status=succeeded or failed
    Browser->>API: Poll events/messages
    API->>Store: Query rows
    Store-->>Browser: JSON response
```

There are three moments where the runtime writes useful state:

1. **At run creation**, it writes the run row and `run_created` event. This gives the dashboard something to display immediately.
2. **During execution**, it streams stdout and stderr as `process_stdout` and `process_stderr` messages. This makes setup logs, warnings, and pi output visible while the process is active.
3. **After execution**, it ingests pi's session JSONL file and records structured user, assistant, and tool result messages.

The third step is the limitation that motivates the next ticket. The structured pi messages arrive after completion because print mode is a one-shot interface. RPC mode will let the system receive structured events while pi is running.

### The Docker boundary

The Docker image is defined in `Dockerfile.pi-agent`. It uses `node:22-bookworm-slim`, installs shell basics, `git`, `ripgrep`, and `sqlite3`, then installs pi and the pi packages currently configured in the host pi settings.

The pinned package list is not an implementation detail; it is a startup-time decision. Earlier versions of the prototype allowed pi to install packages at container startup. That worked, but it made the first seconds of every run noisy and slow. Baking the packages into the image moves that cost to build time.

Current pinned packages:

```dockerfile
RUN npm install -g \
    @mariozechner/[email protected] \
    [email protected] \
    @thesethrose/[email protected] \
    @imsus/[email protected] \
    [email protected]
```

The image sets:

```dockerfile
ENTRYPOINT ["pi"]
```

That means Docker arguments after the image name are pi arguments. The runtime must append `--print`, `--session`, and the prompt. It must not append another `pi` executable name.

The Docker mount set is assembled in `internal/runtime/runtime.go`:

```text
Host path                  Container path     Mode
---------                  --------------     ----
<input DB>                 /data/input.db     read-only
<output DB>                /data/output.db    read-write
<session directory>        /session           read-write
~/.pi/agent or --pi-home   /root/.pi/agent    read-write by default
```

The read-only input mount is the important one. It means pi can inspect the database but cannot rewrite it. The writable output mount gives the run a place to leave state. The pi home mount provides credentials and settings. That mount is convenient and necessary for the current prototype, but it is also the most sensitive part of the system.

## SQLite as the boundary

The output database is intentionally small. It is not meant to be a full clone of pi's internal session model. Pi already writes a session JSONL file containing full turn history, tool calls, usage, and provider details. The output database is the dashboard index.

The current schema lives in `internal/store/store.go`.

### runs

A run row is the spine of the system.

```sql
CREATE TABLE IF NOT EXISTS runs (
    id TEXT PRIMARY KEY,
    input_db_path TEXT NOT NULL,
    output_db_path TEXT NOT NULL,
    session_path TEXT,
    agent_image TEXT NOT NULL,
    status TEXT NOT NULL,
    started_at TEXT NOT NULL,
    finished_at TEXT,
    error TEXT
);
```

A run answers the question: what process did we start, over what input, and how did it end?

### events

Events are compact lifecycle facts.

```sql
CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL,
    ts TEXT NOT NULL,
    level TEXT NOT NULL,
    type TEXT NOT NULL,
    message TEXT NOT NULL,
    payload_json TEXT,
    FOREIGN KEY(run_id) REFERENCES runs(id)
);
```

The dashboard uses events for status display. Examples include:

- `run_created`
- `container_starting`
- `session_ingested`
- `run_succeeded`
- `run_failed`

### messages

Messages are transcript-like rows for the dashboard.

```sql
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL,
    session_entry_id TEXT,
    parent_entry_id TEXT,
    ts TEXT NOT NULL,
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    raw_json TEXT,
    FOREIGN KEY(run_id) REFERENCES runs(id)
);
```

The same table holds two classes of messages:

- Process messages streamed during execution, with roles such as `process_stdout` and `process_stderr`.
- Pi session messages ingested after execution, with roles such as `user`, `assistant`, and `toolResult`.

This dual use is pragmatic. It lets the dashboard present one timeline. But the distinction matters: process stdout is a stream of lines, while pi session messages are structured agent history.

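
Because both classes share one table, a consumer has to tell them apart by role. The sketch below shows one way to do that in Go, assuming only the role names listed above. The `Message` struct, `isProcessRole`, and `splitTimeline` are hypothetical names for illustration and are not taken from `internal/store/store.go`.

```go
package main

import "fmt"

// Message mirrors a few columns of the messages table (hypothetical struct).
type Message struct {
	ID      int64
	Role    string
	Content string
}

// isProcessRole reports whether a role marks raw process output rather than
// structured pi session history.
func isProcessRole(role string) bool {
	return role == "process_stdout" || role == "process_stderr"
}

// splitTimeline partitions rows into process output and session messages,
// preserving the original order within each group.
func splitTimeline(rows []Message) (process, session []Message) {
	for _, m := range rows {
		if isProcessRole(m.Role) {
			process = append(process, m)
		} else {
			session = append(session, m)
		}
	}
	return process, session
}

func main() {
	rows := []Message{
		{ID: 1, Role: "process_stderr", Content: "warning: slow start"},
		{ID: 2, Role: "user", Content: "Inspect /data/input.db"},
		{ID: 3, Role: "assistant", Content: "OK demo-project"},
	}
	process, session := splitTimeline(rows)
	fmt.Println(len(process), len(session)) // prints "1 2"
}
```

A dashboard could use a split like this to render process output as a log pane and session messages as a transcript, while still reading from a single table.
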
## The Go runtime

The runtime's central type is `Service` in `internal/runtime/runtime.go`:

```go
type Service struct {
	RunsDir         string
	DefaultOutputDB string
}
```

The default output database is derived from the run directory:

```go
return &Service{
	RunsDir:         runsDir,
	DefaultOutputDB: filepath.Join(runsDir, "claw-output.db"),
}
```

This small default changed the user experience. Earlier, the dashboard required a user to enter an output database path and a run ID. The current server owns a default database under `.claw-runs/devctl/claw-output.db`, so the browser can start and select runs without asking the user to paste internal paths.

### StartRun

`StartRun` is the entry point for a run. It resolves defaults, validates the input database, opens or creates the output database, initializes the schema, creates a run ID, creates the session directory, inserts the run row, and either runs synchronously or starts a goroutine.

The flow can be read as pseudocode:

```text
function StartRun(request):
    request.output_db = request.output_db or service.default_output_db
    request.image = request.image or "claw-pi-agent:latest"
    request.pi_home = request.pi_home or "$HOME/.pi/agent"
    request.prompt = request.prompt or default prompt

    validate input database opens read-only
    create output database parent directory
    initialize output schema

    run_id = timestamp-based id
    session_path = runs_dir/run_id/session.jsonl

    insert runs row with status "starting"
    insert event "run_created"

    if request.wait:
        execute run in this goroutine
        return final run row
    else:
        execute run in background
        return starting run row
```

The `--wait` option is important for smoke tests. Without it, a CLI invocation can return before the Docker process has finished. For the web server, background execution is correct. For tests, synchronous execution is easier to reason about.

### execute

`execute` is the host-side run supervisor.
It opens the output database, marks the run as running, starts the Docker command, attaches to stdout and stderr, waits for the process to finish, ingests the session file, and updates the final status.

The key implementation idea is that stdout and stderr are streamed into SQLite while the process is still running:

```go
go streamLines(ctx, &wg, stdout, st, runID, "process_stdout")
go streamLines(ctx, &wg, stderr, st, runID, "process_stderr")
```

That makes the dashboard live enough to show process output during execution. It does not yet make pi's internal events live. That is why the next design moves to RPC.

### buildCommand

`buildCommand` is where the container boundary becomes concrete. The current command is equivalent to:

```bash
docker run --rm \
  --mount type=bind,src=/abs/input.db,dst=/data/input.db,readonly \
  --mount type=bind,src=/abs/output.db,dst=/data/output.db \
  --mount type=bind,src=/abs/session-dir,dst=/session \
  --mount type=bind,src=$HOME/.pi/agent,dst=/root/.pi/agent \
  claw-pi-agent:latest \
  --print \
  --session /session/session.jsonl \
  "<prompt>"
```

The actual Go code uses `exec.CommandContext`, not a shell string. That is the right choice because it avoids shell quoting bugs and makes each Docker argument explicit.

## The HTTP API and dashboard

The HTTP server is intentionally simple. It uses Go's standard `net/http` mux and serves both JSON endpoints and a static HTML/JavaScript dashboard from `internal/api/server.go`.

Important endpoints:

```text
GET  /healthz
GET  /
GET  /app.js
POST /v1/runs
GET  /v1/runs
GET  /v1/runs/{id}
GET  /v1/runs/{id}/events
GET  /v1/runs/{id}/messages
```

The dashboard lets a user enter an input DB and a prompt, start a run, refresh the run list, click a run, and poll its events/messages. It uses the server's default output database unless the user provides an override.

The dashboard is not meant to be the final UI. It is a thin teaching tool and debugging surface.
Its value is that it proves the runtime loop works end to end:

```text
browser -> Go API -> Dockerized pi -> SQLite output -> browser polling
```

## devctl as the startup system

The project now uses `devctl` to make local startup repeatable. That matters because this repo is no longer just a Go binary. To start it correctly, a developer must ensure Docker is available, the image is built, tests pass, and the dashboard process is supervised.

The workflow is documented in `DEVCTL.md`:

```bash
devctl up --force --timeout 900s
```

The devctl plugin does four things:

1. `config.mutate` defines stable config values such as the service port and image name.
2. `validate.run` checks for `go`, `docker`, `make`, Docker daemon availability, and the pi home directory.
3. `build.run` runs `go test ./...` and `make docker-build`.
4. `launch.plan` returns a `claw-dashboard` service for devctl to supervise.

The service command is:

```bash
go run ./cmd/claw-agent serve --addr :8787 --runs-dir .claw-runs/devctl
```

The health check is:

```text
http://127.0.0.1:8787/healthz
```

This is the right division of responsibilities. The plugin computes what should be built and launched. devctl handles the process lifecycle, logs, state file, and shutdown.

## A worked example

A minimal input database can be created with Python:

```bash
mkdir -p tmp
python3 - <<'PY'
import sqlite3
conn = sqlite3.connect('tmp/input.db')
conn.execute('create table if not exists project(id text primary key, name text)')
conn.execute('insert or replace into project values (?,?)', ('p1', 'demo-project'))
conn.commit()
conn.close()
PY
```

Start the environment:

```bash
devctl up --force --timeout 900s
```

Then open:

```text
http://127.0.0.1:8787/
```

Use a prompt like:

```text
Use bash to run: sqlite3 /data/input.db "select name from project limit 1;". Then answer exactly: OK <name>
```

The intended trace is:

```text
run_created
container_starting
process_stdout / process_stderr lines, if any
session_ingested
run_succeeded
```

The final messages should include:

```text
user: Use bash to run sqlite3 ...
assistant: [tool call] bash
toolResult: demo-project
assistant: OK demo-project
```

This example is deliberately small. It proves the boundary. If pi can read `/data/input.db`, run `sqlite3`, and return the expected answer, then the Docker mount, pi configuration, output database, session ingestion, and dashboard are all connected.

## Design decisions and why they matter

### Pi owns the agent loop

The Go runtime does not implement an agent loop. It does not decide when to call a model, how to parse tool calls, how to apply thinking levels, or how to manage session compaction. Pi already does those things. Reimplementing them would create a second agent framework and immediately make the project harder to maintain.

The Go runtime is therefore an orchestrator. Its job is to define the boundary around pi, not to become pi.

### SQLite is the run ledger

The output database is not a message queue and not a warehouse. It is a ledger for local development and dashboard display. SQLite is well suited to this because it keeps run state in a single file and gives us SQL inspection without requiring a server.

The current schema is intentionally minimal. Adding too many tables too early would force the runtime to guess which pi details matter. The next ticket adds raw RPC events and tool execution tables only because the dashboard needs them for live streaming.

### Docker is a boundary, not a complete security model

Docker gives us mount control, a disposable process, a pinned software environment, and repeatable startup. It does not make arbitrary agent behavior safe by itself. The host pi credentials are mounted into the container. The model can request tool calls.
The system should be treated as a developer-convenience isolation boundary, not as a hostile sandbox.

### devctl captures operational knowledge

Before devctl, startup knowledge lived in memory and scattered commands: build the image, run tests, start the server, check the port. The devctl plugin makes that knowledge executable. A new developer can run `devctl up` and get the same build and launch path.

## Current limitations

The biggest limitation is that the system still uses `pi --print`. Structured pi session messages are available only after the run completes and the session JSONL file is ingested. The dashboard can stream process stdout/stderr during execution, but process output is a blunt instrument.

Other limitations:

- The pi home mount gives the container access to host pi configuration and credentials.
- There is no per-run network policy yet.
- The dashboard is a minimal static page, not a polished application.
- There is no cancellation endpoint wired to a running Docker process yet.
- The output database is a dashboard index, not a full structured representation of every pi event.
- The Docker image's pinned package list must be kept in sync with `~/.pi/agent/settings.json`.

## The next step: pi RPC/SDK integration

The project already has a follow-up ticket for this: `PI-RPC-SDK-RUNTIME`. The idea is to replace the opaque print-mode process with a structured integration.

In RPC mode, pi runs as:

```bash
pi --mode rpc --session-dir /session
```

The Go runtime sends JSON commands to stdin:

```json
{"id":"req-1","type":"prompt","message":"Inspect /data/input.db and summarize it."}
```

Pi emits JSON events on stdout:

```json
{"type":"message_update","assistantMessageEvent":{"type":"text_delta","delta":"Hello"}}
```

That changes the dashboard from a polling log viewer into a live agent console. It can show assistant text deltas, tool execution progress, queue changes, compaction, retries, and extension UI requests while the run is active.

The SDK path goes one step further. A Node worker can import `@mariozechner/pi-coding-agent`, create an `AgentSession`, subscribe to events directly, and expose a worker protocol to Go. That is more powerful, but it adds a TypeScript service. The design therefore recommends RPC first and SDK worker later.

## Implementation map

A new reader should start with these files:

| File | Why it matters |
|---|---|
| `internal/runtime/runtime.go` | Shows how runs are started, how Docker is invoked, and how stdout/stderr/session ingestion works. |
| `internal/store/store.go` | Defines the SQLite schema and all current DB read/write paths. |
| `internal/api/server.go` | Defines the HTTP API and the static dashboard. |
| `internal/session/ingest.go` | Converts pi JSONL session entries into dashboard messages. |
| `Dockerfile.pi-agent` | Defines the container image and pinned pi package set. |
| `scripts/devctl-plugin.py` | Encodes the devctl build/validate/launch pipeline. |
| `DEVCTL.md` | Explains how to run the project in daily development. |
| `ttmp/2026/04/29/PI-RPC-SDK-RUNTIME--pi-rpc-sdk-runtime-integration/design-doc/01-pi-rpc-sdk-runtime-integration-design-and-implementation-guide.md` | Plans the RPC/SDK streaming implementation. |

## Failure modes to remember

### Repeated npm installs at startup

This happened when configured pi packages were not baked into the image. Pi saw packages in `~/.pi/agent/settings.json`, found them missing in the container, and installed them on startup. The result was slow runs and noisy stderr. The fix was to pin and install the packages in `Dockerfile.pi-agent`.

### Stale manual server on port 8787

A manually launched server can keep the port busy and confuse devctl. The working rule is to use devctl for lifecycle:

```bash
devctl status
devctl down
devctl up --force --timeout 900s
```

### JSON columns scanning as `json.RawMessage`

SQLite returned JSON text columns as strings. Scanning directly into `json.RawMessage` caused API failures.
The fix was to scan into strings and then convert to `json.RawMessage`.

### Losing structured events with print mode

Print mode is not a streaming event API. It is useful for a prototype, but any dashboard that needs live tool events must move to RPC or SDK.

## Working rules

- Treat the input database as immutable during a run.
- Treat the output database as the dashboard index, not the full pi source of truth.
- Keep pi's JSONL session file as the authoritative transcript for print-mode runs.
- Do not reimplement pi's agent loop in Go.
- Use devctl to start and stop the dashboard server.
- Rebuild the Docker image after changing the pinned pi package list.
- Keep Docker arguments explicit with `exec.CommandContext`; avoid shell-assembled Docker commands.
- Preserve print mode as a fallback until RPC mode has its own smoke tests.

## Near-term next steps

1. Implement an `internal/pi/rpc_client.go` package with fake JSONL tests.
2. Add a `pi_mode` option so runs can choose `print` or `rpc`.
3. Store raw RPC events in a new `rpc_events` table.
4. Map tool execution events into a `tool_executions` table.
5. Add HTTP control endpoints for `abort`, `steer`, `follow-up`, and `state`.
6. Add a Server-Sent Events endpoint for live browser updates.
7. Update the dashboard to show active tool calls and streaming assistant text.
8. Decide whether a Node SDK worker is needed after RPC mode is stable.

## Closing thought

The project is small, but it teaches a general pattern. When an agent becomes powerful enough to do useful work, the engineering problem shifts from "how do I call the model?" to "how do I make the run observable, bounded, and replayable?" This runtime answers that question with ordinary tools: Docker for process boundaries, SQLite for artifacts, Go for the control plane, pi for the agent loop, and devctl for repeatable startup.

The value is not any single component. The value is the shape they create together.