**Hierarchical Task Decomposition and Retrieval-Augmented Code Generation - A Tree-Based Approach to Malleable Software Development with LLMs**
Bridging Human-AI Collaboration for Conversational, Self-Adapting Software Systems
This process factors the product development workflow into a chained system that can co-develop software with end users. It allows users to build and modify applications by generating and editing individual components, which enables the software to evolve and scale according to their needs.
It's about putting more power in the hands of end users by allowing them to build and customize software conversationally. While it's a small proof of concept with its own limitations, its aim is to demonstrate the value and potential of malleable software, an emergent field rooted in an early philosophy of computing.
![[chat_edit_only.mp4]]
*Requesting visual design changes to a basic chat-app*
**Contexts**
Up until around the 1940s, hard goods were mostly tin, steel, iron, wood, ceramic, or glass. Plastic, a new material at the time named for its flexibility in application, decreased the cost of parts to the point where many secondary products became more widely available, and eventually, hyperabundant.
As LLMs and agentic architectures become more proficient in software development, the cost of software will fall as the main input cost shifts from developer-hours to tokens used. A diminished cost of a material tends to encourage the hyperavailability of its downstream products. Seeing software as a material, this new material abundance opens up a ton of opportunity.
**Malleable software**
If software were free to develop, it wouldn’t require specialized teams to do top-down design. It wouldn’t need to be designed by a few people and then mass distributed. It could be custom tailored to you, but more importantly, it could be constructed and edited by you, without needing to write all the code yourself. What if we could craft our own software, and shape it as we use it? This brings us to the emerging field of Malleable Software.
While I recommend reading the original works, in a brief and lossy summary it essentially means developing software that allows end users to modify, adapt, and reconfigure it at a granular level, giving them the ability to shape the software they use, as they use it.
Philip Tchernavskij (Human-Computer Interaction (cs.HC), Université Paris Saclay (COmUE), 2019) formalizes the term and demonstrates its use in his PhD thesis [Designing and Programming Malleable Software](https://www.researchgate.net/publication/341553733_Designing_and_Programming_Malleable_Software).
“*Malleable software aims to increase the power of existing adaptation behaviors by allowing users to pull apart and re-combine their interfaces at the granularity of individual UI elements ... the goal is to erase the boundaries between apps and create an end-user accessible “physics of interfaces” that dictate how different interfaces and documents can be assembled.*” [Tchernavskij, p. 64](https://www.researchgate.net/publication/341553733_Designing_and_Programming_Malleable_Software)
The [Malleable Systems Collective](https://malleable.systems/), led by [J. Ryan Stinnett](https://convolv.es/), lists projects in the field and a few notable people advancing it [here](https://malleable.systems/catalog/). They offer a few guiding principles on their website:
- Software must be as easy to change as it is to use it
- All layers, from the user interface through functionality to the data within, must support arbitrary recombination and reuse in new environments
- Tools should strive to be easy to begin working with but still have lots of open-ended potential
- People of all experience levels must be able to retain ownership and control
- Recombined workflows and experiences must be freely sharable with others
- Modifying a system should happen in the context of use, rather than through some separate development toolchain and skill set
- Computing should be a thoughtfully crafted, fun, and empowering experience
[Geoffrey Litt](https://www.geoffreylitt.com/), a PhD graduate of MIT CSAIL's Software Design Group and a Senior Researcher at [Ink and Switch](https://www.inkandswitch.com/), focuses on the intersection of AI and malleable software, exploring how LLMs can enable people to create their own custom software. He writes in great depth on the opportunities at this intersection in [Malleable software in the age of LLMs](https://www.geoffreylitt.com/2023/03/25/llm-end-user-programming).
"*LLMs will represent a step change in tool support for end-user programming: the ability of normal people to fully harness the general power of computers without resorting to the complexity of normal programming.*”
The concept of end-user programming goes back to the 1960s, with its rich history detailed in [End User Programming](https://www.inkandswitch.com/end-user-programming/) by Szymon Kaliski, Adam Wiggins, and James Lindenbaum at Ink and Switch, March 2019.
**Proof-of-Concept**
It’s really a system of pretty standard technologies woven together in such a way that React projects can be created and edited conversationally. It’s driven by a few approaches that allow LLMs like gpt-4o or Claude Haiku to produce functional software that can even leverage external libraries and call third-party APIs.
**Composability**
While the [[Felix plugin]] I made can generate code for a selected design in Figma, there are limitations to that approach. It can produce single-page apps and individual React components, but coherence can be lost when the mental load is too high, so it can't be asked for everything all at once. In practice, there's an approximate limit to the size and complexity of the interfaces that can be produced in one response.
Composability is about breaking larger things into smaller elements that can be recomposed. In software development, it translates to building apps with modular, interchangeable, and focused components that can be combined, assembled, and reconfigured. It’s a model or approach that helps designers and developers manage the complexity of building an app.
In this case, the goal is to build a script that represents a design and development team that can co-develop a React app with the user, conversationally. React is a fair choice given its focus on modular components that are assembled from multiple smaller files, rather than one monolithic file.
**Chains & Pipelines**
A chain is a sequence of calls or operations where the outputs of one feed into the inputs of the next, and subsequent calls wait for their priors to finish before they start. Implementing one involves breaking a larger task, like developing an app, into a sequence of many smaller subtasks. Narrowing each subtask's mental load and available information to the most granular and relevant set possible helps improve the quality of the final output. Anthropic offers a [walkthrough here](https://docs.anthropic.com/en/docs/chain-prompts).
Sequentializing and codifying software design and development, an otherwise recursive and parallel series of tasks, is a challenge. But building software generally follows a sequence: while work is often done in parallel, the process flows along a linear path where downstream work like development relies on upstream work like research and design. This script progresses from one task to another, using the output of previous tasks as input for subsequent ones, working through each step one at a time until the entire codebase is created according to the user's request.
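As a minimal sketch of how one link feeds the next, assuming the OpenAI Python SDK; the two-step designer-to-developer flow, the prompts, and the example request are illustrative, not the project's actual ones:

```python
# A minimal chain: each LLM call waits on the previous one, and its
# output becomes part of the next prompt. Assumes the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm(prompt: str) -> str:
    """One link in the chain: a single model call."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

request = "Build a simple kanban board app."

# Downstream subtasks see only the upstream output they need.
design_notes = llm(f"As a product designer, outline the UX for: {request}")
project_plan = llm(f"As a lead developer, plan a React project from these notes:\n{design_notes}")
```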
**Actions and Tools**
The output of some tasks is fed directly into a function, so while the LLM doesn’t actually determine *which* tool to use, it can still decide *what* exactly to return into the function based on the given context. Whether it’s editing, saving files, generating search queries, or determining the directory structure, these functions give the LLM the ability to search its knowledge base and read and write files in a local directory. The rigidity of the task flow reduces the risk of recursive deviation from the narrow set of expectations we have for the result. The "product designer" and "developer" determine their own tasks and then carry them out, so it's more of a hybrid of an agentic system and a chain.
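A sketch of that pattern, reusing the hypothetical `llm()` helper from the chain sketch above; `save_file` and the prompt wording are illustrative stand-ins for the project's own functions:

```python
# The script, not the model, picks the action; the model only decides
# what to return into it.
from pathlib import Path

def save_file(path: str, code: str) -> None:
    """The 'tool': write LLM output into the local project directory."""
    target = Path("my-app") / path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(code)

# The model is always asked for file contents at this step; it never
# chooses between tools, which limits recursive deviation.
code = llm("Write src/components/Card.jsx per the kanban plan: ...")
save_file("src/components/Card.jsx", code)
```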
**Hierarchical Task Decomposition**
The process involves breaking down tasks into smaller, manageable subtasks, similar to how a tree structure branches out from a root node to leaves.
“*A major feature of components is the ability to compose components of other components. As we nest components, we have the concept of parent and child components, where each parent component may itself be a child of another component.*” - [Your UI as a tree](https://react.dev/learn/understanding-your-ui-as-a-tree).
Central to the approach is the system's self-guided determination, decomposition, and sorting of tasks. The overall plan outlines the project structure and the functionality of the files, which is then translated into a structured representation of the directory structure to create. That representation of the files to create is then used to set the subsequent tasks to recursively think, recall, and program each file.
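A rough sketch of the decomposition: the planning step is assumed to return the directory tree as nested JSON (the shape shown is hypothetical), and the leaves of the tree become the per-file programming subtasks:

```python
# Turn a nested directory-structure JSON into a flat list of file tasks.
import json

plan_json = """
{
  "src": {
    "components": {"Board.jsx": "kanban board", "Card.jsx": "a task card"},
    "App.jsx": "root component composing Board"
  }
}
"""

def leaf_tasks(tree: dict, prefix: str = "") -> list[tuple[str, str]]:
    """Walk the tree from root to leaves; each file is one subtask."""
    tasks = []
    for name, node in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(node, dict):   # a directory: recurse into children
            tasks.extend(leaf_tasks(node, path))
        else:                        # a file: a focused programming subtask
            tasks.append((path, node))
    return tasks

for path, purpose in leaf_tasks(json.loads(plan_json)):
    print(path, "->", purpose)
# src/components/Board.jsx -> kanban board
# src/components/Card.jsx -> a task card
# src/App.jsx -> root component composing Board
```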
**Knowledge Curation**
Developer docs are essentially codified and curated information on how to create, maintain, and integrate software with libraries, SDKs, or APIs. When building a project, it's important to check whether the docs or the SDK/API have changed since you last researched and updated your mental model. The same is true for LLMs: they may have trained on project documentation, but the precision of that memory and understanding can be unstable, and if the project being depended on or integrated with has changed since training, the model won't be able to program against it from intuition alone. Older, well-known, and well-documented projects are more likely to be represented in the training data and weights of the LLM. But novel frameworks, those with recently updated documentation, or projects less reflected in the training corpus benefit most from additional, up-to-date, relevant information.
The answer to that challenge here is to search, scrape, clean, and chunk the most recent documentation for any necessary project integrations, API calls, or libraries, as a separate knowledge-building step. The domain of the documentation becomes a filterable metadata tag, and the embeddings of the text/code content become the basis for similarity search. This way, the LLM-developers can search and recall existing snippets of text and code from the dev docs based on whatever they have to program. Creating a knowledge base for code generation ends up being an interesting mirror of the typical dev process, and brings us to retrieval-augmented code generation.
**Retrieval Augmented Code Generation: Developer Documentation as a Source of Truth**
RAG is often associated with the use of vector databases for memory in chatbots, but it’s really about the tech-stack agnostic process of bringing in information to ‘augment’ or otherwise enhance the quality, reliability, or accuracy of any type of response from a language model.
We know RAG is also applicable to so much more than just chat memory, and that it can help language models do Q&A, retrieve relevant facts, and cite their sources, so the natural progression is to use official developer documentation as an additional grounding source of truth.
Retrieval-augmented code generation is a few-part problem. To program an app that depends on recent knowledge unlikely to be perfectly known to a naive LLM, there needs to be some form of self-determined search and retrieval. Search means we need queries, which means we need to generate thoughts about the request on which to base those queries. Thus, the retrieval process becomes think > generate_queries > get_embeddings > query_vectorstore. This is the basic retrieval process used, and it returns relevant snippets from the curated developer-doc knowledge base. The relevant knowledge can then be used to plan further and program files based on up-to-date developer docs.
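A minimal sketch of that retrieval pipeline, assuming OpenAI embeddings, a Pinecone index populated by the study step, and the hypothetical `llm()` helper from the chain sketch; the index name and prompts are illustrative:

```python
# think > generate_queries > get_embeddings > query_vectorstore
from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()
index = Pinecone().Index("dev-docs")  # reads PINECONE_API_KEY from the environment

def retrieve(request: str, domain: str) -> list[str]:
    """Return doc snippets relevant to the request, scoped to one doc site."""
    thoughts = llm(f"Think through what documentation is needed to build: {request}")
    queries = llm(f"Write three short search queries, one per line, based on:\n{thoughts}")
    snippets = []
    for query in queries.splitlines():
        vector = client.embeddings.create(
            model="text-embedding-3-small", input=query
        ).data[0].embedding
        results = index.query(vector=vector, top_k=3, include_metadata=True,
                              filter={"domain": {"$eq": domain}})  # one doc site only
        snippets += [match.metadata["text"] for match in results.matches]
    return snippets
```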
![[HTD_RACG.png|Malleable Systems Graph]]
**Implementation**
The system is currently grouped into the scripts/modules `LLM`, `Study`, `Chat`, `Build`, and `Edit`.
**LLM**
The LLM script houses the API calls to the different language models; it currently supports any model provided by Anthropic, OpenAI, TogetherAI, and OpenRouter.
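A sketch of what that dispatch can look like, assuming the official `anthropic` and `openai` Python SDKs; TogetherAI and OpenRouter expose OpenAI-compatible endpoints, so they can share the same client with a different `base_url`:

```python
# Route a single-turn prompt to the right provider by model name.
# Both SDKs read their API keys from the environment.
import anthropic
import openai

def complete(model: str, prompt: str) -> str:
    if model.startswith("claude"):
        message = anthropic.Anthropic().messages.create(
            model=model,
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}],
        )
        return message.content[0].text
    # OpenAI-compatible providers differ only by base_url and key,
    # e.g. openai.OpenAI(base_url="https://openrouter.ai/api/v1")
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
```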
**Study**
The study script is responsible for scraping websites, processing the content, and storing it in a Pinecone vector database. It serves as the knowledge-acquisition component, allowing the system to learn from websites, store that knowledge in a structured manner, and make it available for retrieval and use. It either recursively scrapes the webpage content of a given set of URLs, or lets you navigate the browser yourself to train the knowledge base on the pages you visit, whether general guidance, how-tos, or official developer documentation. For each URL visited, it parses the site content, collects the text and code snippets, groups and chunks them based on the site architecture, generates embeddings, and upserts them with the text/code content to the vector database.
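A rough sketch of that ingestion loop, assuming `requests` plus BeautifulSoup for scraping and the same hypothetical Pinecone index as above; the fixed-width chunking here is a naive stand-in for the site-architecture-aware grouping described above:

```python
# Scrape, clean, chunk, embed, and upsert one page of documentation.
from urllib.parse import urlparse
import requests
from bs4 import BeautifulSoup
from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()
index = Pinecone().Index("dev-docs")  # hypothetical index name

def study(url: str) -> None:
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    text = soup.get_text(" ", strip=True)                            # clean: strip markup
    chunks = [text[i:i + 1500] for i in range(0, len(text), 1500)]   # naive chunking
    domain = urlparse(url).netloc                                    # filterable metadata tag
    for n, chunk in enumerate(chunks):
        vector = client.embeddings.create(
            model="text-embedding-3-small", input=chunk
        ).data[0].embedding
        index.upsert(vectors=[{
            "id": f"{domain}-{n}",
            "values": vector,
            "metadata": {"domain": domain, "text": chunk},
        }])
```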
Over time, the agent's knowledge base grows and shapes itself to your needs. While web search can be swapped in for the local database, it can introduce risks, occasionally retrieving out-of-date or unwanted information from the internet. Careful information curation helps steer and improve the reliability of the final outcome.
**Chat**
The chat.py script serves as a conversational interface for users to discuss their current directory or the external knowledge base, and to learn how to code or tweak the code themselves if they want to. It integrates with the Pinecone database to retrieve relevant information based on user queries, and uses an LLM to generate responses, provide explanations, and help users understand their codebase or the knowledge acquired through study.py.
**Build**
Build is the module that creates the React app based on the user request. The process begins with the product designer thinking about the request and articulating the major pillars of the design process in their typical order. While it's a rough representation of the human-centric design process, it considers the user's needs, the ideal user experience, and the information architecture of the features; defines the user interface and its elements; and notes any additional information that might be required for the development process.
These designer’s thoughts are then given to the developer and used to retrieve knowledge from the knowledge base, based on the fact that it will have to develop the app and on any aforementioned needs for additional information. The knowledge base can contain docs on API integrations or even React components from an SDK with a pre-existing design system. In this case, it searches for the components it needs, any information necessary for API integration, and general best practices in the field.
The developer then sets off given the user input, designer thoughts, and retrieved knowledge. The lead dev plans the overall project, formalizes the directory and component architecture, and lays out the files to create. This high-level blueprint informs and aligns the subsequent file-specific processes. Next, we ask an LLM to return a JSON representation of the directory structure based on the plan. This JSON is then used to create the directories and set up the recursive programming of the individual files.
Now that the directory structure is formalized, the files are selected, and the overall plan is established, the developer loops through each file, starting with the smallest child components: it retrieves relevant knowledge for programming that specific file, then programs it. Once all the files are saved, you can read the readme and run the React app to test it.
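Condensed, the per-file build loop might look like the sketch below, reusing the hypothetical helpers from the earlier sketches (`leaf_tasks`, `retrieve`, `llm`, `save_file`, `project_plan`); `directory_json` stands in for the JSON returned by the planning step, and the domain is illustrative:

```python
# Child components first: retrieve relevant docs, then program the file.
import json

files = leaf_tasks(json.loads(directory_json))  # from the decomposition step

for path, purpose in files:
    docs = "\n".join(retrieve(f"{purpose} ({path})", domain="polaris.shopify.com"))
    prompt = (f"Project plan:\n{project_plan}\n\n"
              f"Relevant docs:\n{docs}\n\n"
              f"Write the complete contents of {path}. Return only code.")
    save_file(path, llm(prompt))
```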
**Edit**
Edit is the module that allows users to request changes or extensions to their app. In essence, it retrieves the directory structure of the software's current state in order to edit existing files and add new ones to accomplish the user's requested changes. In edit.py, the current directory structure and contents are provided to the product designer and developer, along with the user's input requirements.
One difference from build.py is the way it selects files to edit or add. The LLM is asked to return a JSON with two lists: one for existing files that require updates and another for new files, if any, that need to be created. This step is crucial for transitioning from planning to execution. By identifying specific files, it sets up the task environment for each file to be processed individually, following the requirements and direction outlined in the overall plan.
It can then iterate through the files, given the context, in a process similar to the build script: the LLM responds with the updated code, and each file is saved to the appropriate folder.
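A sketch of that selection-and-edit step; the JSON shape, `read_file`, `current_listing`, and `user_request` are assumptions standing in for the script's actual interface, and `llm()`/`save_file()` are the hypothetical helpers from earlier:

```python
# The model returns two lists as JSON, which become the per-file tasks.
import json

selection = llm(
    'Return JSON like {"edit": [...], "create": [...]} listing existing '
    "files to update and new files to add.\n"
    f"Files:\n{current_listing}\nRequest:\n{user_request}"
)
targets = json.loads(selection)

for path in targets["edit"]:
    updated = llm(f"Current {path}:\n{read_file(path)}\n\n"
                  f"Apply the request and return the full updated file.")
    save_file(path, updated)
for path in targets["create"]:
    save_file(path, llm(f"Write {path} to satisfy the request, "
                        f"consistent with the existing app."))
```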
**Design Systems and Component Libraries**
Design systems already appear in public developer docs, but many companies have internal-only design systems and libraries that can be leveraged in the build process. Internal documentation, principles, code snippets, and examples can all be uploaded to the knowledge base for the design and development chain to use, enabling brand-specific rapid prototyping tools.
Here is a demo of a script tuned for the Polaris design system. I asked for a simple weather app that talks to weatherapi.com and displays the weather for a given location. While I had to paste in my API key and feed a set of errors into the edit script, it managed to build a weather app with some basic Polaris components.
![[final_polaris_weather_app.mp4]]
What I find surprising here is that it was able to build an app, use a design system and component library, and integrate with a third-party API in essentially just two steps.
One challenge was overcoming the model's inclination to use deprecated components, likely due to an update to the library after the LLM's training date. This made it all the more important to curate up-to-date documentation and filter out information related to deprecated components.
![[polaris_imports_1.png]]
Editing an app built with a given design system is a similar process to editing one with custom CSS. Here we ask for more data to be fetched from the API and displayed in the interface.
![[final_polaris_weather_edit.mp4]]
And here it integrates with the OpenAI API to build a chat application with Polaris components.
![[gpt4o_chat_build_polaris.mp4]]
*Programming a simple chat app using Shopify Polaris design system and library*
It hallucinated the name of the Send icon, which I had to fix, but other than that all I had to do was drop in my API key. The components were imported and used pretty well, from text, card, and button to the input field, and the resulting chat app had conversational memory, enabling multi-message conversations.
**Ideal State**
Ideally, these modules will be wrapped together into a single piece of software where the conversation takes place in the app itself, relative to the interface; i.e., all four processes would be available in the place of use. Eventually, you'd open a blank canvas, talk to it, and shape a new app for your OS.
**Caveats**
It’s currently a rough proof of concept, so depending on the model used, some debugging is required, a process that could be improved. There are many caveats to make around limitations: more affordable models, like 7B LLMs, can struggle to produce the codebase without a single bug. A simple 10-file chat app can cost under 5 cents with Haiku, and a few dollars with Opus or GPT-4.
While the natural language interface lowers the barrier to entry, some people might still find the idea of conversationally developing software challenging without any form of guidance or introduction. This script uses a simple conversational interface, but a production-ready tool would need to offer flexibility in how users build and shape their software, possibly integrating options like a paired GUI.
**Software in an era of hyperabundant cognition**
Language models are democratizing the process of creating software, blurring the lines between user, designer and engineer. That said, I don't think AI will "replace" or displace human creativity or capability.
The term "Computer" used to be a job title [after all.](https://en.wikipedia.org/wiki/Computer_(occupation)) It was used interchangeably with "mathematician", and was often used to describe early astronomers as well. But the digitization and automation of the task of computing didn't obsolesce the roles of mathematician or astronomer. It actually just allowed those people to accomplish more.
Eventually, any tasks automated by "AI" will, not long after, be forgotten as ever having been performed manually by people, lost in our collective memory like the job title of "computer".
The integration of automation into the software development process is centered on the idea that it will extend our capacity to achieve our goals. Designers, software developers, and even end users may not be hand-typing code; they will likely be more concerned with orchestrating, conducting, curating, and expressing intent and sensibilities in ways that our systems can build on.