<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208154776194252890/slono_desolate_cybernetic_landscape_with_little_critter_robots__30bd5b82-3e5f-45c7-a1db-9f3974f3ff3f.png?ex=65e240d1&is=65cfcbd1&hm=eeafcb8c5fb560e38cfe1b670426f977701c4bdbba41f42da2ae1f9ad23acb42&" data-background-opacity="80%" -->

# [[LLM Agents]]

---

<!-- slide bg="https://media.discordapp.net/attachments/1002025510194389062/1208151257144102972/slono_desolate_cybernetic_landscape_with_little_critter_robots__0645ff58-8499-49bf-a8e7-e735a6f346a2.png?ex=65e23d8a&is=65cfc88a&hm=0e626aa1da37c31652df442aa12ca9e9899a0902ff2dbc5313237ed11b842c27&=&format=webp&quality=lossless&width=930&height=700" data-background-opacity="30%" -->

## About this talk

- assume we use LLMs at the core of our agents
- focus on the fundamentals
    - how can agents be implemented?
    - patterns, new and old
- think about production systems
    - change the way we think about implementing agents

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208151244380835840/slono_desolate_cybernetic_landscape_with_little_critter_robots__1ae64a5f-4e61-4dba-bdec-f63bf5dc73ea.png?ex=65e23d87&is=65cfc887&hm=1c01c937e8586fac724cdf6e34e2dc8d2a63aeb8d4f0ea729bb746258eb8db97&" data-background-opacity="30%" -->

## Resources - Books

- [Multiagent Systems](https://mitpress.mit.edu/9780262533874/multiagent-systems/)
- [Artificial Intelligence: A Modern Approach, 4th US ed.](http://aima.cs.berkeley.edu)

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208151759864864858/slono_desolate_cybernetic_landscape_with_little_critter_robots__4f46d3f6-f8b6-4740-9166-6cd6687e2973.png?ex=65e23e01&is=65cfc901&hm=27f1132b7e8298077da0bd91f460efd653f42c42e70549b27692834f18d45e1a&" data-background-opacity="30%" -->

## Resources - Codebases

- [LangChain](https://www.langchain.com)
- [dust.tt](https://github.com/dust-tt/dust)
- [microsoft autogen](https://github.com/microsoft/autogen)
- 
[agency-swarm](https://github.com/VRSEN/agency-swarm/blob/main/README.md)
- [generative agents (Smallville)](https://github.com/joonspk-research/generative_agents)
- [ai-town](https://github.com/a16z-infra/ai-town)

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208151762800738315/slono_desolate_cybernetic_landscape_with_little_critter_robots__0f3867aa-9130-4874-be51-0bea42270682.png?ex=65e23e02&is=65cfc902&hm=36eb72583985301f47316a785b426629a870dbf15271964b44e0c2af9b28f843&" data-background-opacity="30%" -->

### Resources - Papers

- [Generative Agents: Interactive Simulacra of Human Behavior](https://arxiv.org/pdf/2304.03442.pdf)
- [AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation](https://arxiv.org/abs/2308.08155)
- [Encouraging Divergent Thinking through Multiagent Debate](https://www.semanticscholar.org/paper/Encouraging-Divergent-Thinking-in-Large-Language-Liang-He/385c74957858e7d6856d48e72b5a902b4c1aa28c)
- [LLMs as Tool Makers](https://www.semanticscholar.org/paper/Large-Language-Models-as-Tool-Makers-Cai-Wang/32dcd0887537cece54e214f531d2c384470b023f)
- [Survey on LLM Autonomous Agents](https://www.semanticscholar.org/paper/A-Survey-on-Large-Language-Model-based-Autonomous-Wang-Ma/28c6ac721f54544162865f41c5692e70d61bccab)

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208152216410525727/slono_desolate_cybernetic_landscape_with_little_critter_robots__dfb4a8c6-f1d2-4883-80d4-c86718d4e0fc.png?ex=65e23e6e&is=65cfc96e&hm=0bf7709319c2d9c6ddf1d70c9a3583ffb7edc3c3aa93f5145640c86548dcc93f&" data-background-opacity="30%" -->

### Resources - Patterns

- [Function Calling with LLMs](https://www.promptingguide.ai/applications/function_calling)
- [Finite-state machine - Wikipedia](https://en.wikipedia.org/wiki/Finite-state_machine)
- [From Callback to Future to Monad](https://medium.com/hackernoon/from-callback-to-future-functor-monad-6c86d9c16cb5)
- [Blackboard system - 
Wikipedia](https://en.wikipedia.org/wiki/Blackboard_system)

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208152219820752906/slono_desolate_cybernetic_landscape_with_little_critter_robots__dcd462c6-5128-43ae-926a-a57cadf078ab.png?ex=65e23e6f&is=65cfc96f&hm=d49fad0506e2f111dca4a5e652d9a762dc96df1a8d4f2c331fcb5f0badda7f91&" data-background-opacity="30%" -->

## What is an agent?

![[AI in Action - Agents-1708106592551.jpeg|600]]

- From Russell & Norvig:
    - "anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators"

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208152224530698270/slono_desolate_cybernetic_landscape_with_little_critter_robots__14df45bf-b302-4529-89a0-d57532a94902.png?ex=65e23e70&is=65cfc970&hm=c7e1fbe65df38d155ba1b51b996b5903c1e56e1395de666d89e67d045b6d684d&" data-background-opacity="30%" -->

## What is an agent?

- Further:
    - reasoning over a representation of the world (knowledge-based)
    - multiple agents that communicate with each other

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208152227093418004/slono_desolate_cybernetic_landscape_with_little_critter_robots__d189a103-fa7e-4e57-948f-c5ec64ed5540.png?ex=65e23e71&is=65cfc971&hm=bf53570df0acd2452baf4595ce941922bd0c4f671ee32c675959e0be37d51b26&" data-background-opacity="30%" -->

## Russell & Norvig

- Part III: Knowledge, reasoning and planning
- Part IV: Decision making
- Chapter 10, 11: Knowledge representation, automated planning
- Chapter 14, 17: Simple decisions, multiagent decision making
- Part II: search
    - LLM killer feature: heuristics

Patterns are important, actual methods less so, since LLMs are so good.

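
To make the "LLM killer feature: heuristics" point concrete, here is a minimal sketch of greedy best-first search where the heuristic is just an async function. This is an illustration under assumptions, not a method from the book or the talk: `bestFirstSearch`, `Heuristic` and `distanceTo` are hypothetical names, and a numeric stub stands in for what would be an LLM scoring call.

```typescript
// Hypothetical sketch: none of these names come from a real library.
// The heuristic is async because, in practice, it could be an LLM call.
type Heuristic<S> = (state: S) => Promise<number>; // lower = more promising

async function bestFirstSearch<S>(
  start: S,
  isGoal: (s: S) => boolean,
  successors: (s: S) => S[],
  heuristic: Heuristic<S>,
  maxExpansions = 1000,
): Promise<S | undefined> {
  const frontier: { state: S; score: number }[] = [
    { state: start, score: await heuristic(start) },
  ];
  const seen = new Set<string>([JSON.stringify(start)]);
  for (let i = 0; i < maxExpansions && frontier.length > 0; i++) {
    // Pop the most promising state (a real implementation would use a heap).
    frontier.sort((a, b) => a.score - b.score);
    const { state } = frontier.shift()!;
    if (isGoal(state)) return state;
    for (const next of successors(state)) {
      const key = JSON.stringify(next);
      if (seen.has(key)) continue; // skip states we already queued
      seen.add(key);
      frontier.push({ state: next, score: await heuristic(next) });
    }
  }
  return undefined; // budget exhausted or frontier empty
}

// Stand-in for an LLM-scored heuristic: numeric distance to a target.
const distanceTo = (target: number): Heuristic<number> =>
  async (n) => Math.abs(target - n);
```

The search pattern stays the same whatever scores the states: calling `bestFirstSearch(1, (n) => n === 10, (n) => [n + 1, n * 2], distanceTo(10))` resolves to `10`, and swapping `distanceTo` for an LLM-backed scorer would not change the loop.
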

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208152236207771739/slono_desolate_cybernetic_landscape_with_little_critter_robots__382d7733-193b-4c3b-92df-57754cd5d5c5.png?ex=65e23e73&is=65cfc973&hm=a1f41bc809eb3654455425f3a390e46eb67dba78bf63fc00e37a47d5789563f1&" data-background-opacity="30%" -->

## Fundamentals with LLMs

- environment perception
    - everything is in the prompt
- knowledge:
    - world representation
    - special cases of world representation:
        - agent inner state
        - communication with other agents (other agents' inner state)

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208152247255703712/slono_desolate_cybernetic_landscape_with_little_critter_robots__3599d9a7-4d21-44fb-a5fe-d3b78c006a35.png?ex=65e23e76&is=65cfc976&hm=296da7df9b1c0930ee53cba80b0c8cd26cbd93eff2fe5b98e1ee365ffaa8b7ac&" data-background-opacity="30%" -->

## Fundamentals with LLMs

- reasoning step
    - call out to the LLM
    - trivial conceptually, tempting to implement trivially
    - at scale, different beast
- acting upon the world
    - parse LLM response and do tool calls
    - update the world representation and agent inner state

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208152252439597076/slono_desolate_cybernetic_landscape_with_little_critter_robots__5ef8e2a0-7d34-41db-af63-635a7eb37779.png?ex=65e23e77&is=65cfc977&hm=17356f270bd144bdb5849b2a07cbfbf244dcbb37b14cb94732f230a5d74329be&" data-background-opacity="30%" -->

## So, what is an agent?

- collection of tools
    - usually, async functions
- constructing world representation from KB
    - a set of "selectors" and transformation rules
    - chunking, summarization, structured data extraction, prompt templates

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153048552054864/slono_desolate_cybernetic_landscape_with_little_critter_robots__6b22732f-da50-4a41-89d4-30e4cd5ee1e7.png?ex=65e23f35&is=65cfca35&hm=e8b80714f22606dad549bea53524062bfc4b123f56a64c9c0c38693e3acca2f5&" data-background-opacity="30%" -->

## What is an agent?

- reasoning backend "router"
    - call to an LLM, or to a more traditional reasoning engine
    - this is really just another "tool" though
- updating world and knowledge of the world by parsing backend answer
    - a set of "selectors"

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153060887629854/slono_desolate_cybernetic_landscape_with_little_critter_robots__0b0b1995-ff59-4ca6-b8ee-9de4fe156071.png?ex=65e23f38&is=65cfca38&hm=918d8e03ffbfb7c0885c40b8e0e1edf58225d1cb8ff6aea95c6e436a2664ce8e&" data-background-opacity="30%" -->

## What is an agent?

- a "meta-plan" that allows us to select what next step to execute
    - traditional program (think Python code)
    - state machine
    - monadic representation
- triggers that tell us when to execute the next step
    - user input, event trigger, HTTP call, etc...
    - KB update: event-driven blackboard system

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153069678891048/slono_desolate_cybernetic_landscape_with_little_critter_robots__f1ae56e0-5179-48ce-95ba-706374ea7275.png?ex=65e23f3a&is=65cfca3a&hm=aa0e63ad16b523d2c67a50085003851f29ff3722838b6946cf7060b00ee315da&" data-background-opacity="30%" -->

## Multiagent systems

![[AI in Action - Agents-1708109256247.jpeg|600]]

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153507933331497/slono_desolate_cybernetic_landscape_with_little_critter_robots__8849ecca-2049-4ecf-9a4e-523c8e9b69f1.png?ex=65e23fa2&is=65cfcaa2&hm=56d54ec2f96d3dd5df71d0e2da1e2ca9fd4326db4bd5f1322dc5c13a5061d7ca&" data-background-opacity="30%" -->

## Multiagents - two key components

- communication / shared state
- decision making
    - which agent does what
    - synchronization
        - when does which agent do what

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153511032787076/slono_desolate_cybernetic_landscape_with_little_critter_robots__047394e4-1030-47da-82d7-89938e0eb893.png?ex=65e23fa3&is=65cfcaa3&hm=f28487e9eb1d825c6728f93b1f42f208946d55de6c1169a30ae8d0491e218e44&" data-background-opacity="30%" -->

## Implement for production

- LLM calls:
    - expensive
    - slow
    - resource contention
    - probabilistic

We should base our engineering on these really problematic technical constraints, not on how our brains conceptualize agents.

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153515437072425/slono_desolate_cybernetic_landscape_with_little_critter_robots__435b42ee-a70f-48e6-950c-a18b425b20e7.png?ex=65e23fa4&is=65cfcaa4&hm=ceb4ea219633db93d4ee947df8bce611e086ef9d5c40f1baaa3990ed7bba927a&" data-background-opacity="30%" -->

## Implement for production

The hard part of building LLM-based production systems is how to chain a series of slow HTTP calls:

- we know how to do that very well! 
that's all HTTP apps out there.
- one long-running Python application per client is not how you scale.
- microservices, event-driven systems:
    - easily scale up or down and leverage constrained resources
    - observability and tracing
    - debuggability and simplified development

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153520872890389/slono_desolate_cybernetic_landscape_with_little_critter_robots__004be441-d193-41d0-b250-35fd10f9306c.png?ex=65e23fa5&is=65cfcaa5&hm=e4a3dfaeaca51b83f5ad085b0346440819b35526d8c841df36218ab7b12a2798&" data-background-opacity="30%" -->

## Implement for production

We need to think about how an "agent", as an autonomous "thing" that executes a series of steps, can be converted into an "exploded" event architecture.

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153572521541723/slono_desolate_cybernetic_landscape_with_little_critter_robots__2eefab83-b4e8-407f-b7f3-132fe28370a9.png?ex=65e23fb2&is=65cfcab2&hm=e7d56dd327277cbed0b5eee9bd1e1537603e0a371784ecb9dae151e8eec7a11a&" data-background-opacity="30%" -->

## Agent sketch

```typescript
// Placeholder types for this sketch; the KB is the shared (global) knowledge base.
type Prompt = string;
type StructuredData = unknown;
type KB = Record<string, unknown>;
type Tools = Record<string, (...args: unknown[]) => Promise<unknown>>;

declare function doLLMCall(input: Prompt): Promise<string>;
declare function parseLLMResponse(kb: KB, input: string): Promise<StructuredData>;
declare function executeLLMResponse(kb: KB, tools: Tools, input: StructuredData): Promise<{ updatedKB: KB }>;
declare function computeNextStep(kb: KB): Promise<{ nextStep: Prompt; updatedKB: KB }>;
```

Decomposing it this way shows that:

- an agent is actually a set of independent functions

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153577269231667/slono_desolate_cybernetic_landscape_with_little_critter_robots__fcde5491-272e-4aa6-a7a6-1b6869ba47af.png?ex=65e23fb3&is=65cfcab3&hm=1b6a15b63bda2332538c7887f87f93828b8cbf86379f15a10226f6535bd82b13&" data-background-opacity="30%" -->

## Agent sketch

If you squint at the types:

- multi agent systems have the exact same 
signature, except that nextStep is `Prompt[]`.
- in fact, nothing keeps us from running the inference for multiple agents in a single LLM call!
    - this also bypasses the RLHF, which is usually beneficial

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153582877155430/slono_desolate_cybernetic_landscape_with_little_critter_robots__de3d4cbe-afe9-4689-b898-355d3b1d7b71.png?ex=65e23fb4&is=65cfcab4&hm=0fa1132812b81211d362b34c265b17d4fa9da4dd5a1b40a270d671563b815f8b&" data-background-opacity="30%" -->

## Agent sketch

This impacts real-life implementation:

- error handling
- priority routing / retries / rate limiting
- observability
- debuggability (can be rerun entirely separately)
- scalability (lambdas)

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153588971347968/slono_desolate_cybernetic_landscape_with_little_critter_robots__1f295442-cde7-4721-92db-6e5842eae47f.png?ex=65e23fb6&is=65cfcab6&hm=f97f99cb59daad31e833c59f197fa95b1f0b30116b2b4f02fcceedfc606b3af4&" data-background-opacity="30%" -->

## Agent sketch

- a snapshot of the KB is a snapshot of your entire agent world, ever.
    - free reproducibility and benchmarking!
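
The snapshot claim can be sketched as follows. This is a minimal illustration under assumptions, not the talk's implementation: the `KB` shape, `step`, `run` and `recordedLLM` are hypothetical, and the LLM is injected as a plain async function so it can be replaced by recorded answers.

```typescript
// Hypothetical sketch: the KB shape and all function names are illustrative.
type KB = { history: string[]; done: boolean };
type LLM = (prompt: string) => Promise<string>;

// One agent step: KB in, new KB out. The input KB is never mutated.
async function step(kb: KB, llm: LLM): Promise<KB> {
  const prompt = `History so far: ${kb.history.join(" | ")}`;
  const answer = await llm(prompt);
  return { history: [...kb.history, answer], done: answer === "DONE" };
}

// Run the agent and keep a snapshot of the KB after every step.
async function run(initial: KB, llm: LLM, maxSteps = 10): Promise<KB[]> {
  const snapshots: KB[] = [initial];
  let kb = initial;
  for (let i = 0; i < maxSteps && !kb.done; i++) {
    kb = await step(kb, llm);
    snapshots.push(kb);
  }
  return snapshots;
}

// A mocked LLM that replays canned answers in order, then says "DONE".
function recordedLLM(answers: string[]): LLM {
  let i = 0;
  return async () => answers[i++] ?? "DONE";
}
```

Because each step only depends on the KB and the injected LLM, any snapshot can be used as a starting point: `run(snapshots[2], recordedLLM(["DONE"]))` resumes from the third snapshot without replaying the earlier steps, which is what makes rerunning and benchmarking cheap.
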

### Main benefit

- splitting an agent out this way:
    - allows for pattern reuse
    - easily experiment with different paradigms
    - easily mocked environment for dev and debugging

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153840470200320/slono_desolate_cybernetic_landscape_with_little_critter_robots__6294175f-5719-4497-81fa-67c97c6369ed.png?ex=65e23ff1&is=65cfcaf1&hm=05c135b78db11acc33ebdabfa95b0d19cdfd715b86b81576351c6e5619ba9b3c&" data-background-opacity="30%" -->

## Code interpreter example

- `triggerEvent`: user input: "write program XYZ"
- loop `codeInterpreter`
    - until `checkCorrectness`

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153844236820500/slono_desolate_cybernetic_landscape_with_little_critter_robots__094fb8aa-ceea-4755-a8fc-46da40d339cd.png?ex=65e23ff2&is=65cfcaf2&hm=e052a5dd81e691cd9c309c011ea760d8c9d4c804b22e9740c6dfda98ec451a71&" data-background-opacity="30%" -->

### Code interpreter (cont.)

- `codeInterpreter`
    - `doLLMCall`:
        - user input: "write program XYZ"
        - optional: "here is the result of our previous attempt"
            - previous program
            - result of previous execution
            - previous reasoning history

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153849106530324/slono_desolate_cybernetic_landscape_with_little_critter_robots__4ec06919-5825-42a2-b22b-f8998952c231.png?ex=65e23ff4&is=65cfcaf4&hm=7b2b996a3fec096454cc71751d6d6f42058d6202a1b387efd922a148f6e1e820&" data-background-opacity="30%" -->

### Code interpreter (cont.)

- `codeInterpreter`
    - `parseLLMResponse`: parse output into program and reasoning steps
    - `executeLLMResponse`:
        - update history in KB with reasoning steps
        - run program in sandbox environment
        - store program output in KB

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153858790924319/slono_desolate_cybernetic_landscape_with_little_critter_robots__7773fb3b-6d69-4415-a598-4aeccc1dea57.png?ex=65e23ff6&is=65cfcaf6&hm=dbf17285b50eda478d0946a8f413066f4cb5565ae5a62a6f014dc2153279e9b5&" data-background-opacity="30%" -->

### Code interpreter (cont.)

- `codeInterpreter`
    - `computeNextStep`: call the LLM to check the program output

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153861559418962/slono_desolate_cybernetic_landscape_with_little_critter_robots__a7eec2f4-c7d4-4d41-9818-704669feea86.png?ex=65e23ff7&is=65cfcaf7&hm=fdef2a4b05a34bf16bfce2e921178c4193b2e0c2fb93775e1e1ce3fa6ab3e105&" data-background-opacity="30%" -->

### Code interpreter (cont.)

- `checkCorrectness`
    - `doLLMCall`: is this correct:
        - user input: create program XYZ
        - previous program
        - previous output
    - `parseLLMResponse`: program correct/program incorrect and why
    - `executeLLMResponse`:
        - update KB with program correctness output

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153866017706005/slono_desolate_cybernetic_landscape_with_little_critter_robots__25a9e380-d747-4163-b089-b5cdaa1a2e78.png?ex=65e23ff8&is=65cfcaf8&hm=f0a2837d56ead92eb1a3e58722799107d29d44d9628e967cac6e315495228455&" data-background-opacity="30%" -->

### Code interpreter (cont.)

- `checkCorrectness`
    - `computeNextStep`:
        - if program correct:
            - `finish`
        - else
            - goto loop `codeInterpreter`

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153871286013952/slono_desolate_cybernetic_landscape_with_little_critter_robots__997d0cfa-9fac-484d-928f-484f80d05f5d.png?ex=65e23ff9&is=65cfcaf9&hm=74d420ad0ca1101a89fe3837c6708fb2cf1e5cb1d14eca1ae3e78ce08623a1bc&" data-background-opacity="30%" -->

## Code interpreter

- `computeNextStep_writeProgram`:
    - user input + program + program output + reasoning + correctness evaluation
- `parseLLMResponse_writeProgram`:
    - extract program + reasoning
- `executeLLMResponse_writeProgram`:
    - run program
    - update KB with reasoning and program

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208153878051160084/slono_desolate_cybernetic_landscape_with_little_critter_robots__0deb940d-1ab6-4cb0-bbf5-8d688a263015.png?ex=65e23ffa&is=65cfcafa&hm=e890122ad089d85ffe1beeffc94b6154831d39b65383f1c18d1064e36435d2ce&" data-background-opacity="30%" -->

- `computeNextStep_correctnessEvaluation`:
    - "verify this output" + user input + program + program output + reasoning
- `parseLLMResponse_correctnessEvaluation`:
    - extract yes/no + correctness evaluation
- `executeLLMResponse_correctnessEvaluation`:
    - update KB with answer
- `computeNextStep_correctnessEvaluation`:
    - `finish()` or return `computeNextStep_writeProgram()`

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208154610146091058/slono_desolate_cybernetic_landscape_with_little_critter_robots__a1db88f8-3aaa-4dec-8efd-8f7abb640237.png?ex=65e240a9&is=65cfcba9&hm=21cccbe06b05e0a5820d8012004987499638f4bfd2a7aa00900e822f66ef718c&" data-background-opacity="30%" -->

## Code interpreter

- Easy to update:
    - Add RAG step before checkCorrectness
    - Parallelize generation and select best
    - Add RAG before writing
    - Add user feedback loop after writing the code
- Easy to run:
    - A/B testing, observability, 
tracing, ...

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208154632312856608/slono_desolate_cybernetic_landscape_with_little_critter_robots__d4c5a2a9-54f0-42bf-b00c-f5f7aa4a59a4.png?ex=65e240ae&is=65cfcbae&hm=336f855f0826a22826694d15bc62cc35e729f51ac705693e423934a2ac2639b4&" data-background-opacity="30%" -->

## Patterns

- **Function calling**:
    - call the (async) tool, write the results in the KB (async)
- **RAG**:
    - call the RAG system, write the results in the KB (async)
    - aka: function calling

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208154637757321297/slono_desolate_cybernetic_landscape_with_little_critter_robots__9cf8c43b-bfca-4339-b83d-1c51f2a47b5e.png?ex=65e240b0&is=65cfcbb0&hm=9f5264a2fbf7baab1c13246b5cecc8b0b4f270121dd03e8d1d6a6d627abf80f3&" data-background-opacity="30%" -->

## Patterns

- **Deterministic computation**:
    - call out to Mathematica / solvers / search engine, write results in the KB
    - aka: function calling
- **Memory**:
    - `f(KB, results)` to KB
    - aka: function calling
    - usually:
        - pure function (no side effects)
        - or async step (recursive)

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208154643130224650/slono_desolate_cybernetic_landscape_with_little_critter_robots__3c1ca76a-8d8f-4960-a5f3-ff50c7603999.png?ex=65e240b1&is=65cfcbb1&hm=127e29d46804df2f3725081303bc54d677436c538624c11fd6769b4c51bada5b&" data-background-opacity="30%" -->

## Patterns

- **Communication**:
    - `f(KB, KB.otheragents)` to KB
    - give agents access to parts of the KB written by other agents
    - aka: function calling, blackboard system
    - usually:
        - pure function (no side effects)
        - async step (recursive)

---

<!-- slide 
bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208154763787767808/slono_desolate_cybernetic_landscape_with_little_critter_robots__24621ecb-ad23-479e-8beb-986fd9c62c44.png?ex=65e240ce&is=65cfcbce&hm=9109771207eb71c41decd0f3853e67d458ceccb34cdc821f87bc0e6cc9ee1eab&" data-background-opacity="30%" -->

## Patterns

- **Agent coordination**:
    - `f(KB, KB.otheragents)` to KB
    - give agents access to parts of the KB written by other agents, compute next step
    - compute agent coordination based on KB, write actions into KB (blackboard system)
    - aka: function calling, blackboard system
    - usually:
        - pure function (no side effects)
        - async step (recursive)

---

<!-- slide bg="https://cdn.discordapp.com/attachments/1002025510194389062/1208154767885467719/slono_desolate_cybernetic_landscape_with_little_critter_robots__b49dc525-68e0-449e-8d05-4724a087395c.png?ex=65e240cf&is=65cfcbcf&hm=ac0a1dcb07e2e42fa2f5c5523ed28089205a70f634eae7ec4413545662fcdc28&" data-background-opacity="30%" -->

## Conclusion

- long-running single-threaded Python agents seem obvious
    - terrible choice for debugging / production
- fundamental patterns of agents are well known
    - lots of literature
    - LLMs drastically simplify many parts
- model most of your agents as async functions
    - serverless / event-driven makes it easy to deal with GPU compute constraints
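
As a closing sketch of the "agents as async functions" point, the code interpreter walkthrough from earlier could be wired as event handlers over a KB. Everything here is an assumption for illustration: `AgentEvent`, `handlers`, `dispatch` and `fakeWriteAndRun` are hypothetical names, and the LLM call and sandbox are replaced by a stub that succeeds on the second attempt.

```typescript
// Hypothetical sketch: all names are illustrative; LLM and sandbox are stubbed.
type KB = { task: string; program?: string; output?: string; correct?: boolean; attempts: number };
type AgentEvent = "writeProgram" | "checkCorrectness" | "finish";
type Handler = (kb: KB) => Promise<{ kb: KB; next: AgentEvent }>;

// Stub standing in for "call the LLM, then run the program in a sandbox".
async function fakeWriteAndRun(kb: KB): Promise<{ program: string; output: string }> {
  const attempt = kb.attempts + 1;
  return { program: `attempt ${attempt}`, output: attempt >= 2 ? "ok" : "error" };
}

// Each handler is an independent async function: KB in, updated KB + next event out.
const handlers: Record<Exclude<AgentEvent, "finish">, Handler> = {
  writeProgram: async (kb) => {
    const { program, output } = await fakeWriteAndRun(kb);
    return { kb: { ...kb, program, output, attempts: kb.attempts + 1 }, next: "checkCorrectness" };
  },
  checkCorrectness: async (kb) => {
    const correct = kb.output === "ok"; // a real checker would ask the LLM
    return { kb: { ...kb, correct }, next: correct ? "finish" : "writeProgram" };
  },
};

// In-process dispatcher; each handler could instead run as its own lambda behind a queue.
async function dispatch(kb: KB, event: AgentEvent, maxEvents = 20): Promise<KB> {
  let steps = 0;
  while (event !== "finish" && steps++ < maxEvents) {
    const result = await handlers[event](kb);
    kb = result.kb;
    event = result.next;
  }
  return kb;
}
```

Starting it with `dispatch({ task: "write program XYZ", attempts: 0 }, "writeProgram")` loops write/check until the stub reports success. Adding a RAG step or a user feedback step would mean adding one more event name and one more handler, leaving the existing ones untouched.
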