AI in Action - Agents - mnml's vault

# [[LLM Agents]]

---

## About this talk

- assume we use LLMs at the core of our agents
- focus on the fundamentals
  - how can agents be implemented?
  - patterns, new and old
- think about production systems
  - change the way we think about implementing agents

---

## Resources - Books

- [Multiagent Systems](https://mitpress.mit.edu/9780262533874/multiagent-systems/)
- [Artificial Intelligence: A Modern Approach, 4th US ed.](http://aima.cs.berkeley.edu)

---

## Resources - Codebases

- [LangChain](https://www.langchain.com)
- [dust.tt](https://github.com/dust-tt/dust)
- [Microsoft AutoGen](https://github.com/microsoft/autogen)
- [agency-swarm](https://github.com/VRSEN/agency-swarm/blob/main/README.md)
- [generative agents (Smallville)](https://github.com/joonspk-research/generative_agents)
- [ai-town](https://github.com/a16z-infra/ai-town)

---

### Resources - Papers

- [Generative Agents: Interactive Simulacra of Human Behavior](https://arxiv.org/pdf/2304.03442.pdf)
- [AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation](https://arxiv.org/abs/2308.08155)
- [Encouraging Divergent Thinking through Multiagent Debate](https://www.semanticscholar.org/paper/Encouraging-Divergent-Thinking-in-Large-Language-Liang-He/385c74957858e7d6856d48e72b5a902b4c1aa28c)
- [LLMs as Tool Makers](https://www.semanticscholar.org/paper/Large-Language-Models-as-Tool-Makers-Cai-Wang/32dcd0887537cece54e214f531d2c384470b023f)
- [Survey on LLM Autonomous Agents](https://www.semanticscholar.org/paper/A-Survey-on-Large-Language-Model-based-Autonomous-Wang-Ma/28c6ac721f54544162865f41c5692e70d61bccab)

---

### Resources - Patterns

- [Function Calling with LLMs](https://www.promptingguide.ai/applications/function_calling)
- [Finite-state machine - Wikipedia](https://en.wikipedia.org/wiki/Finite-state_machine)
- [From Callback to Future to Monad](https://medium.com/hackernoon/from-callback-to-future-functor-monad-6c86d9c16cb5)
- [Blackboard system - Wikipedia](https://en.wikipedia.org/wiki/Blackboard_system)

---

## What is an agent?

![[AI in Action - Agents-1708106592551.jpeg|600]]

- From Russell & Norvig:
  - "anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators"

---

## What is an agent?

- Further:
  - reasoning over a representation of the world (knowledge-based)
  - multiple agents that communicate with each other

---

## Russell & Norvig

- Part III: Knowledge, reasoning and planning
- Part IV: Decision making
- Chapter 10, 11: Knowledge representation, automated planning
- Chapter 14, 17: Simple decisions, multiagent decision making
- Part II: search
  - LLM killer feature: heuristics

Patterns are important, the actual methods less so, since LLMs are so good.

---

## Fundamentals with LLMs

- environment perception
  - everything is in the prompt
- knowledge:
  - world representation
  - special cases of world representation:
    - agent inner state
    - communication with other agents (other agents' inner state)

---

## Fundamentals with LLMs

- reasoning step
  - call out to the LLM
  - trivial conceptually, tempting to implement trivially
  - at scale, different beast
- acting upon the world
  - parse LLM response and do tool calls
  - update the world representation and agent inner state

---

## So, what is an agent?

- collection of tools
  - usually, async functions
- constructing world representation from KB
  - a set of "selectors" and transformation rules
  - chunking, summarization, structured data extraction, prompt templates

---

## What is an agent?

- reasoning backend "router"
  - call to an LLM, to a more traditional reasoning engine
  - this is really just another "tool" though
- updating world and knowledge of the world by parsing backend answer
  - a set of "selectors"

---

## What is an agent?
- a "meta-plan" that allows us to select what next step to execute
  - traditional program (think Python code)
  - state machine
  - monadic representation
- triggers that tell us when to execute the next step
  - user input, event trigger, HTTP call, etc.
  - KB update: event-driven blackboard system

---

## Multiagent systems

![[AI in Action - Agents-1708109256247.jpeg|600]]

---

## Multiagents

- two key components
  - communication / shared state
  - decision making
    - which agent does what
    - synchronization
      - when does which agent do what

---

## Implement for production

- LLM calls:
  - expensive
  - slow
  - resource contention
  - probabilistic

We should base our engineering around these problematic technical constraints, not around how our brains conceptualize agents.

---

## Implement for production

The hard part of building LLM-based production systems is chaining a series of slow HTTP calls:

- we know how to do that very well! that's all HTTP apps out there.
- one long-running Python application per client is not how you scale.
- microservices, event-driven systems:
  - easily scale up or down and leverage constrained resources
  - observability and tracing
  - debuggability and simplified development

---

## Implement for production

We need to think about how "agents", as autonomous "things" that execute a series of steps, can be converted to "exploded" event architectures.

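
One way to picture an "exploded" agent is a set of stateless handlers connected by events. The sketch below is only an illustration under stated assumptions: the event names, the `KB` shape, `fakeLLM` and `drain` are all hypothetical, and the in-memory array stands in for whatever queue or event bus a real deployment would use.

```typescript
// Hypothetical KB and event shapes; the talk does not prescribe them.
type KB = { history: string[] };
type AgentEvent =
  | { type: "userInput"; text: string }
  | { type: "llmResponse"; text: string }
  | { type: "done" };

// A handler is an independent function: event + KB snapshot in,
// updated KB + follow-up events out. Each one could be deployed separately.
type Handler = (
  event: AgentEvent,
  kb: KB
) => Promise<{ kb: KB; emit: AgentEvent[] }>;

// Stub standing in for a slow, rate-limited HTTP call to an LLM.
async function fakeLLM(prompt: string): Promise<string> {
  return `echo: ${prompt}`;
}

const handlers: Record<AgentEvent["type"], Handler> = {
  userInput: async (event, kb) => {
    if (event.type !== "userInput") return { kb, emit: [] };
    const text = await fakeLLM(event.text);
    return {
      kb: { history: [...kb.history, `user: ${event.text}`] },
      emit: [{ type: "llmResponse", text }],
    };
  },
  llmResponse: async (event, kb) => {
    if (event.type !== "llmResponse") return { kb, emit: [] };
    return {
      kb: { history: [...kb.history, `llm: ${event.text}`] },
      emit: [{ type: "done" }],
    };
  },
  done: async (_event, kb) => ({ kb, emit: [] }),
};

// In-memory stand-in for a message queue: pop an event, run its handler,
// enqueue whatever the handler emits, stop when the queue is empty.
async function drain(initial: AgentEvent, kb: KB): Promise<KB> {
  const queue: AgentEvent[] = [initial];
  while (queue.length > 0) {
    const event = queue.shift()!;
    const result = await handlers[event.type](event, kb);
    kb = result.kb;
    queue.push(...result.emit);
  }
  return kb;
}
```

Because no handler holds state between calls, retries, rate limiting and tracing can live in the dispatcher rather than in the agent logic.
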

---

## Agent sketch

```typescript
// Signatures only: KB, Prompt, StructuredData and Tools are left abstract here.
declare let kb: KB; // global knowledge base

declare function doLLMCall(input: Prompt): Promise<string>;
declare function parseLLMResponse(kb: KB, input: string): Promise<StructuredData>;
declare function executeLLMResponse(kb: KB, tools: Tools, input: StructuredData): Promise<{ updatedKB: KB }>;
declare function computeNextStep(kb: KB): Promise<{ nextStep: Prompt; updatedKB: KB }>;
```

Decomposing it this way shows that:

- an agent is actually a set of independent functions

---

## Agent sketch

If you squint at the types:

- multi-agent systems have the exact same signature, except that nextStep is `Prompt[]`.
- in fact, nothing keeps us from running the inference for multiple agents in a single LLM call!
  - this also bypasses the RLHF, which is usually beneficial

---

## Agent sketch

This impacts real-life implementation:

- error handling
- priority routing / retries / rate limiting
- observability
- debuggability (can be rerun entirely separately)
- scalability (lambdas)

---

## Agent sketch

- snapshot of the KB is a snapshot of your entire agent world, ever.
  - free reproducibility and benchmarking!

### Main benefit

- splitting an agent out this way:
  - allows for pattern reuse
  - easily experiment with different paradigms
  - easily mocked environment for dev and debugging

---

## Code interpreter example

- `triggerEvent`: user input: "write program XYZ"
- loop `codeInterpreter`
  - until `checkCorrectness`

---

### Code interpreter (cont.)

- `codeInterpreter`
  - `doLLMCall`:
    - user input: "write program XYZ"
    - optional: "here is the result of our previous attempt"
      - previous program
      - result of previous execution
      - previous reasoning history

---

### Code interpreter (cont.)

- `codeInterpreter`
  - `parseLLMResponse`: parse output into program and reasoning steps
  - `executeLLMResponse`:
    - update history in KB with reasoning steps
    - run program in sandbox environment
    - store program output in KB

---

### Code interpreter (cont.)
- `codeInterpreter`
  - `computeNextStep`: call the LLM to check the program output

---

### Code interpreter (cont.)

- `checkCorrectness`
  - `doLLMCall`: is this correct:
    - user input: create program XYZ
    - previous program
    - previous output
  - `parseLLMResponse`: program correct/program incorrect and why
  - `executeLLMResponse`:
    - update KB with program correctness output

---

### Code interpreter (cont.)

- `checkCorrectness`
  - `computeNextStep`:
    - if program correct:
      - `finish`
    - else
      - goto `loop` codeInterpreter

---

## Code interpreter

- `computeNextStep_writeProgram`:
  - user input + program + program output + reasoning + correctness evaluation
- `parseLLMResponse_writeProgram`:
  - extract program + reasoning
- `executeLLMResponse_writeProgram`:
  - run program
  - update KB with reasoning and program

---

- `computeNextStep_correctnessEvaluation`:
  - "verify this output" + user input + program + program output + reasoning
- `parseLLMResponse_correctnessEvaluation`:
  - extract yes/no + correctness evaluation
- `executeLLMResponse_correctnessEvaluation`:
  - update KB with answer
- `computeNextStep_correctnessEvaluation`:
  - finish() or return computeNextStep_writeProgram()

---

## Code interpreter

- Easy to update:
  - Add RAG step before checkCorrectness
  - Parallelize generation and select best
  - Add RAG before writing
  - Add user feedback loop after writing the code
- Easy to run:
  - A/B testing, observability, tracing, ...
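
The write/check loop described in the code interpreter slides can be sketched as a small state machine over the KB. This is a minimal illustration, not the talk's implementation: the `KB` fields, the `Backends` interface, `makeSteps`, `run` and the `maxAttempts` cap are all assumptions, and the LLM and sandbox are injected so they can be mocked.

```typescript
// Hypothetical KB shape for the code-interpreter example.
interface KB {
  userInput: string;
  program?: string;
  output?: string;
  correct?: boolean;
  attempts: number;
}

type StepName = "writeProgram" | "checkCorrectness" | "finish";
type Step = (kb: KB) => Promise<{ kb: KB; next: StepName }>;

// Injected dependencies, so the loop can run against mocks in dev and tests.
interface Backends {
  llm: (prompt: string) => Promise<string>;
  sandbox: (program: string) => Promise<string>;
}

function makeSteps(b: Backends): Record<Exclude<StepName, "finish">, Step> {
  return {
    // Ask the LLM for a program, run it, store program and output in the KB.
    writeProgram: async (kb) => {
      const prompt =
        `write program: ${kb.userInput}\n` +
        `previous program: ${kb.program ?? "none"}\n` +
        `previous output: ${kb.output ?? "none"}`;
      const program = await b.llm(prompt);
      const output = await b.sandbox(program);
      return {
        kb: { ...kb, program, output, attempts: kb.attempts + 1 },
        next: "checkCorrectness",
      };
    },
    // Ask the LLM for a yes/no verdict and pick the next step from it.
    checkCorrectness: async (kb) => {
      const verdict = await b.llm(
        `is this correct? task: ${kb.userInput}\n` +
          `program: ${kb.program ?? ""}\noutput: ${kb.output ?? ""}`
      );
      const correct = verdict.trim().toLowerCase().startsWith("yes");
      return { kb: { ...kb, correct }, next: correct ? "finish" : "writeProgram" };
    },
  };
}

// Drive the steps until "finish", or give up after maxAttempts programs.
async function run(kb: KB, b: Backends, maxAttempts = 3): Promise<KB> {
  const steps = makeSteps(b);
  let next: StepName = "writeProgram";
  while (next !== "finish") {
    if (next === "writeProgram" && kb.attempts >= maxAttempts) break;
    const result = await steps[next](kb);
    kb = result.kb;
    next = result.next;
  }
  return kb;
}
```

Under this layout, adding a RAG step or a user feedback step would amount to one more entry in the step table and one more `StepName`, which is the kind of easy update the last slide refers to.
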

---

## Patterns

- **Function calling**:
  - call the (async) tool, write the results in the KB (async)
- **RAG**:
  - call the RAG system, write the results in the KB (async)
  - aka: function calling

---

## Patterns

- **Deterministic computation**:
  - call out to Mathematica / solvers / search engine, write results in the KB
  - aka: function calling
- **Memory**:
  - `f(KB, results)` to KB
  - aka: function calling
  - usually:
    - pure function (no side effects)
    - or async step (recursive)

---

## Patterns

- **Communication**:
  - `f(KB, KB.otheragents)` to KB
  - give agents access to parts of the KB written by other agents
  - aka: function calling, blackboard system
  - usually:
    - pure function (no side effects)
    - async step (recursive)

---

## Patterns

- **Agent coordination**:
  - `f(KB, KB.otheragents)` to KB
  - give agents access to parts of the KB written by other agents, compute next step
  - compute agent coordination based on KB, write actions into KB (blackboard system)
  - aka: function calling, blackboard system
  - usually:
    - pure function (no side effects)
    - async step (recursive)

---

## Conclusion

- long-running single-threaded Python agents seem obvious
  - terrible choice for debugging / production
- fundamental patterns of agents are well known
  - lots of literature
  - LLMs drastically simplify many parts
- model most of your agents as async functions
  - serverless / event-driven makes it easy to deal with GPU compute constraints
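
To make "model most of your agents as async functions" concrete, the memory, communication and function-calling patterns can be sketched as functions from KB to KB. Everything here is an assumption for illustration: the per-agent blackboard layout and the names `remember`, `readOthers` and `callTool` are hypothetical.

```typescript
// Hypothetical blackboard layout: one section of the KB per agent.
type AgentId = string;
interface AgentState {
  notes: string[];
  inbox: string[];
}
type KB = Record<AgentId, AgentState>;

const emptyState: AgentState = { notes: [], inbox: [] };

// Memory: f(KB, results) to KB. Pure: returns a new KB, input is untouched.
function remember(kb: KB, agent: AgentId, result: string): KB {
  const state = kb[agent] ?? emptyState;
  return { ...kb, [agent]: { ...state, notes: [...state.notes, result] } };
}

// Communication: f(KB, KB.otheragents) to KB. Copies the latest note of
// every other agent into this agent's inbox.
function readOthers(kb: KB, agent: AgentId): KB {
  const state = kb[agent] ?? emptyState;
  const messages = Object.entries(kb)
    .filter(([id, other]) => id !== agent && other.notes.length > 0)
    .map(([id, other]) => `${id}: ${other.notes[other.notes.length - 1]}`);
  return { ...kb, [agent]: { ...state, inbox: [...state.inbox, ...messages] } };
}

// Function calling: run an async tool, then write its result with the
// same memory rule.
async function callTool(
  kb: KB,
  agent: AgentId,
  tool: () => Promise<string>
): Promise<KB> {
  return remember(kb, agent, await tool());
}
```

Since each function returns a fresh KB, any intermediate value is a snapshot that can be stored and replayed for debugging or benchmarking.
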