related:
- [[LLM based Agents - detailed]]
- [[LLM based Agents - concise]]
- [[LLM are Greedy Agents - first draft]]
- [[LLM are Greedy Agents - Claude final version]]
- [[AI Agents - how LLM can grok the real world]]
- [[LLM are Greedy Agents - Original chat with Claude]]
- [[LLM - Detailed dive into LLM w Andrej Karpathy]]
- [[LLM are Greedy Agents - Grok version]]
---
share_link: https://share.note.sx/h0fymfvb#waoMuGDMcsMZPRxeBFw/QtY12MB30+5E6xrY50NNtn4
share_updated: 2025-09-17T15:37:24+09:00
---
2025-04-13 gemini chatgpt

# The Rise and Potential of Large Language Model Based Agents: A Survey

#### Summary (3 sentences)

This paper surveys the development of AI agents, with a focus on those built using large language models (LLMs). It introduces a general Brain–Perception–Action framework, reviews applications in single/multi-agent and human-agent scenarios, and explores agent societies. The paper also identifies key challenges and future directions, including the potential role of LLM-based agents in achieving Artificial General Intelligence (AGI).

---

### Detailed Summary

The paper provides a comprehensive review of AI agents centered around large language models (LLMs). It traces the evolution of the agent concept from its philosophical roots to its contemporary instantiations in AI. The authors propose a structured framework composed of three components: the brain (LLM), perception modules (sensory input such as text, vision, or sound), and action modules (physical, virtual, and verbal actions). This triadic model forms the architecture for LLM-based agents across a wide spectrum of applications.

Applications include single-agent systems for task execution, multi-agent setups involving cooperation and competition, and scenarios where agents assist or partner with humans. The authors also explore the concept of agent societies: complex systems of interacting agents capable of exhibiting social behaviors, personalities, and even emergent cultural phenomena.

Finally, the paper outlines unresolved issues such as evaluation difficulties, risks like bias or hallucination, and resource demands. It emphasizes LLM agents’ potential to evolve into AGI and recommends future research on embodiment, long-term planning, and robust alignment.

---

### Detailed Outline

#### 1. Introduction

- Evolution of AI agents
- LLMs as the foundation for agents
- The Brain–Perception–Action framework
- Survey scope: single-agent, multi-agent, and human-agent systems
- Emergence of agent societies
- Overview of open problems

#### 2. Background

- **2.1 Philosophical Origins**
  - Debates on artificial agency
  - Agent as an autonomous entity
- **2.2 Technological Trends**
  - Symbolic → reactive → RL → transfer/meta-learning → LLMs
- **2.3 Why LLMs as Brains?**
  - Autonomy, reactivity, proactivity, social abilities

#### 3. The Birth of an Agent

- **3.1 Brain (LLM Core)**
  - **3.1.1 Natural Language**
    - High-quality generation, comprehension
  - **3.1.2 Knowledge**
    - Pretraining, commonsense/actionable knowledge, hallucination
  - **3.1.3 Memory**
    - Token limits, summarization, compression, retrieval methods
  - **3.1.4 Reasoning & Planning**
  - **3.1.5 Transferability & Generalization**
- **3.2 Perception**
  - **3.2.1 Text**
  - **3.2.2 Vision**
  - **3.2.3 Audio**
  - **3.2.4 Multimodal Fusion**
- **3.3 Action**
  - **3.3.1 Physical (robots)**
  - **3.3.2 Virtual (web, simulation)**
  - **3.3.3 Verbal (dialogue, instruction)**

#### 4. Application Scenarios

- **4.1 Single-Agent**
  - Task execution, exploration
- **4.2 Multi-Agent**
  - Competitive and collaborative dynamics
- **4.3 Human-Agent Cooperation**
  - Assistant roles
  - Creative and social partnerships

#### 5. Agent Societies

- **5.1 Behavior**
  - Reflexive, reasoned, emotional responses
- **5.2 Personality**
  - Definition and synthesis strategies
- **5.3 Social Phenomena**
  - Relationship formation, influence, culture
- **5.4 Insights for Human Society**
  - Mirroring and analyzing human social dynamics

#### 6. Open Problems & Future Work

- **6.1 Evaluation**
  - Metrics and benchmarks
- **6.2 Risks**
  - Misuse, bias, hallucination
- **6.3 Future Directions**
  - Embodiment
  - Long-term planning
  - Scalable agent models
  - AGI pathways

#### 7. Conclusion

---

### Genius

- Framing LLMs as "brains" in agent architectures.
- Adapting Brain–Perception–Action to LLMs.
- Exploring agent societies and emergent behaviors.
- Connecting LLM-based agents to the long-term goal of AGI.

---

### Interesting

- NLP and agent architecture convergence.
- Agent societies as a new research paradigm.
- Philosophical roots of agency revisited through AI.
- Human-agent collaboration as a major use case.
- Ethical tensions, embodiment, and grounded cognition.

---

### Surprising

- LLMs trained on language can reason and plan.
- Social behavior emerges without consciousness.
- The philosophical agent concept was once neglected.
- Human-like thinking patterns in LLM agents.
- Speed of LLM-to-agent transition.

---

### Significant

- Timely and relevant survey of a fast-moving field.
- Broad and deep coverage of agent capabilities.
- Introduced a versatile framework for agent architecture.
- Connected micro (individual agent) and macro (society) views.
- Opened explicit discussion around AGI potential.

---

### Paradoxical

- Language-only training enables non-linguistic intelligence.
- Simulated agents showing social complexity without awareness.
- Agents being autonomous yet requiring safety constraints.
- Emergent unpredictability from deterministic systems.

---

### Well Done

- Clear structure and visual aids.
- Strong conceptual framing with intuitive triad.
- Comprehensive treatment of history, state-of-the-art, and future.
- Honest discussion of risks and unknowns.
- Extensive and curated references.

---

### Could Be Improved

- More concrete evaluation criteria and benchmarks.
- Deeper dive into long-term planning challenges.
- Expanded safety/reliability strategies.
- Practical examples of implementations.
- Analysis of computational bottlenecks.
- More robust discussion on hallucination mitigation.

---

### Key Takeaways

- LLMs are now central candidates for building intelligent agents.
- A triadic framework (Brain–Perception–Action) offers modular design.
- Agent societies are a powerful metaphor and experimental testbed.
- The field must address evaluation, risk, and infrastructure challenges.
- LLM-based agents might serve as precursors to AGI.
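
---

### Sketch: Brain–Perception–Action as code

To make the "modular design" takeaway concrete, here is a minimal Python sketch of how the three components might fit together. It is an illustration under stated assumptions, not the paper's implementation: every name (`Observation`, `Decision`, `perceive`, `EchoBrain`, `Agent`) is hypothetical, and the brain is a rule-based stand-in where a real agent would call an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Observation:
    modality: str  # e.g. "text", "vision", "audio"
    content: str


@dataclass
class Decision:
    action: str    # key of the action module to invoke
    argument: str


def perceive(raw_inputs: Dict[str, str]) -> List[Observation]:
    """Perception: wrap raw inputs, keyed by modality, into observations."""
    return [Observation(modality, content) for modality, content in raw_inputs.items()]


class EchoBrain:
    """Hypothetical stand-in for the LLM core (no model is called here)."""

    def __init__(self) -> None:
        self.memory: List[str] = []  # naive memory: every text input seen so far

    def decide(self, observations: List[Observation]) -> Decision:
        text = " ".join(o.content for o in observations if o.modality == "text")
        self.memory.append(text)
        # Toy "planning" rule: a "search:" prefix routes to the virtual action.
        if text.startswith("search:"):
            return Decision("virtual", text[len("search:"):].strip())
        return Decision("verbal", f"You said: {text}")


@dataclass
class Agent:
    brain: EchoBrain
    actions: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def step(self, raw_inputs: Dict[str, str]) -> str:
        """One perceive -> decide -> act cycle."""
        decision = self.brain.decide(perceive(raw_inputs))
        return self.actions[decision.action](decision.argument)


agent = Agent(
    brain=EchoBrain(),
    actions={
        "verbal": lambda reply: reply,
        "virtual": lambda query: f"[pretend web search for '{query}']",
    },
)

print(agent.step({"text": "hello"}))
print(agent.step({"text": "search: LLM agents"}))
```

Because the action modules are plain callables in a dictionary, a physical or virtual action can be added without touching the brain, which is the kind of modularity the triad is meant to suggest.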