# Workshop Handout: Abstract Syntax Trees, Domain-Specific Languages, and LLMs
- [[TALK - SLIDES - 01 - ASTs and python ast]]
- [[TALK - SLIDES - 02 - DSLs]]
## Introduction to Abstract Syntax Trees (ASTs)
Abstract Syntax Trees (ASTs) are hierarchical tree representations of the abstract syntactic structure of source code. They play a crucial role in various aspects of programming and language processing.
### Key Concepts:
- **Definition**: ASTs abstract away syntax details, focusing on the meaningful elements of code.
- **Purpose**: Simplify code analysis and transformation.
- **Applications**: Used in compilers, interpreters, code analysis tools, and more.
## Trees in Computer Science and ASTs
Trees are fundamental structures in computer science, used to represent hierarchical relationships in various contexts. Abstract Syntax Trees (ASTs) are a specific application of tree structures in programming language processing.
### Trees in Different Contexts
1. **JSON (JavaScript Object Notation)**
JSON is a lightweight data interchange format that naturally represents a tree structure.
Example:
```json
{
  "name": "John Doe",
  "age": 30,
  "address": {
    "street": "123 Main St",
    "city": "Anytown"
  },
  "hobbies": ["reading", "cycling"]
}
```
Tree representation:
```
Object
├── name: "John Doe"
├── age: 30
├── address
│   ├── street: "123 Main St"
│   └── city: "Anytown"
└── hobbies
    ├── "reading"
    └── "cycling"
```
2. **YAML (YAML Ain't Markup Language)**
YAML is a human-friendly data serialization standard that can represent complex hierarchical structures.
Example:
```yaml
person:
  name: Alice Smith
  age: 28
  skills:
    - Python
    - JavaScript
  education:
    degree: Bachelor's
    major: Computer Science
```
Tree representation:
```
person
├── name: Alice Smith
├── age: 28
├── skills
│   ├── Python
│   └── JavaScript
└── education
    ├── degree: Bachelor's
    └── major: Computer Science
```
3. **HTML (HyperText Markup Language)**
HTML documents are represented as a tree structure called the Document Object Model (DOM).
Example:
```html
<html>
  <head>
    <title>My Page</title>
  </head>
  <body>
    <h1>Welcome</h1>
    <p>This is a <em>sample</em> page.</p>
  </body>
</html>
```
Tree representation (DOM):
```
html
├── head
│   └── title
│       └── "My Page"
└── body
    ├── h1
    │   └── "Welcome"
    └── p
        ├── "This is a "
        ├── em
        │   └── "sample"
        └── " page."
```
4. **Programming Language AST**
Abstract Syntax Trees represent the structure of source code in a tree format, abstracting away syntax details.
Python code example:
```python
def greet(name):
    return f"Hello, {name}!"
```
AST representation:
```
FunctionDef (name: "greet")
├── args
│   └── arg (name: "name")
└── body
    └── Return
        └── JoinedStr
            ├── Constant: "Hello, "
            ├── FormattedValue
            │   └── Name: "name"
            └── Constant: "!"
```
## Representing ASTs in Python:
```python
class ASTNode:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

# Example usage: build the tree for  a = b + c * d
a = ASTNode('a')
b = ASTNode('b')
c = ASTNode('c')
d = ASTNode('d')
multiply = ASTNode('*', [c, d])
add = ASTNode('+', [b, multiply])
assign = ASTNode('=', [a, add])
```
### AST Walkers:
- Programs that traverse an AST, performing operations on each node.
- Used for code analysis, transformation, and evaluation.
- Typically implement recursive algorithms to visit each node in the tree.
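A minimal walker over the `ASTNode` class above can be written as a recursive function. In this sketch, `to_infix` (a name invented for this example) unparses the expression tree back into an infix string; the class is repeated so the snippet is self-contained:

```python
class ASTNode:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

def to_infix(node):
    """Recursively walk the tree, placing children around the operator."""
    if not node.children:  # leaf: a variable or constant
        return str(node.value)
    left, right = node.children
    return f"{to_infix(left)} {node.value} {to_infix(right)}"

# The same tree as above:  a = b + c * d
expr = ASTNode('=', [ASTNode('a'),
                     ASTNode('+', [ASTNode('b'),
                                   ASTNode('*', [ASTNode('c'), ASTNode('d')])])])
print(to_infix(expr))  # a = b + c * d
```

The same recursive shape underlies evaluators, pretty-printers, and code generators; only the per-node action changes.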
## Parsing Python Code with `ast`
Python's built-in `ast` module provides tools for working with ASTs:
```python
import ast
code = "a = b + c * d"
tree = ast.parse(code)
print(ast.dump(tree, indent=2))
```
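Beyond dumping the tree, `ast.walk` lets you iterate over every node directly. For example, tallying the node types in the same snippet:

```python
import ast
from collections import Counter

tree = ast.parse("a = b + c * d")

# ast.walk yields every node in the tree, parents before children
counts = Counter(type(node).__name__ for node in ast.walk(tree))
print(counts["BinOp"])  # 2: one for '+', one for '*'
print(counts["Name"])   # 4: a, b, c, d
```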
### Modifying ASTs:
The `ast` module allows for programmatic modification of ASTs:
```python
import ast

class PrintAdder(ast.NodeTransformer):
    def visit_FunctionDef(self, node):
        print_stmt = ast.Expr(
            value=ast.Call(
                func=ast.Name(id='print', ctx=ast.Load()),
                args=[ast.Constant(value=f"Entering {node.name}")],
                keywords=[]
            )
        )
        node.body.insert(0, print_stmt)
        return node

# Usage
tree = ast.parse("def greet(): return 'Hello'")
transformer = PrintAdder()
modified_tree = transformer.visit(tree)
modified_tree = ast.fix_missing_locations(modified_tree)
modified_code = ast.unparse(modified_tree)
print(modified_code)
```
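A transformed tree does not have to go back through source text at all: it can be compiled and executed directly with the built-in `compile` and `exec`. A sketch reusing the `PrintAdder` above:

```python
import ast

class PrintAdder(ast.NodeTransformer):
    def visit_FunctionDef(self, node):
        print_stmt = ast.Expr(
            value=ast.Call(
                func=ast.Name(id='print', ctx=ast.Load()),
                args=[ast.Constant(value=f"Entering {node.name}")],
                keywords=[],
            )
        )
        node.body.insert(0, print_stmt)
        return node

tree = ast.parse("def greet(): return 'Hello'")
tree = ast.fix_missing_locations(PrintAdder().visit(tree))

# Compile the AST object directly; no round-trip through source text
namespace = {}
exec(compile(tree, filename="<ast>", mode="exec"), namespace)
result = namespace["greet"]()  # prints: Entering greet
print(result)                  # Hello
```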
## AST Visiting and Modification
AST visiting and modification are powerful techniques for analyzing and transforming Python code. The `ast` module provides two main classes for this purpose: `NodeVisitor` and `NodeTransformer`.
### AST Visiting with `NodeVisitor`
`NodeVisitor` is used for traversing the AST without modifying it. It's useful for code analysis tasks.
```python
import ast

class FunctionNameVisitor(ast.NodeVisitor):
    def __init__(self):
        self.function_names = []

    def visit_FunctionDef(self, node):
        self.function_names.append(node.name)
        self.generic_visit(node)

# Usage
code = """
def foo():
    pass

def bar():
    pass
"""
tree = ast.parse(code)
visitor = FunctionNameVisitor()
visitor.visit(tree)
print(visitor.function_names)  # Output: ['foo', 'bar']
```
### AST Modification with `NodeTransformer`
`NodeTransformer` allows you to modify the AST by replacing or removing nodes.
```python
import ast

class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            if isinstance(node.op, ast.Add):
                return ast.Constant(value=node.left.value + node.right.value)
        return node

# Usage
code = "x = 2 + 3"
tree = ast.parse(code)
transformer = ConstantFolder()
modified_tree = transformer.visit(tree)
print(ast.unparse(modified_tree))  # Output: x = 5
```
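`NodeTransformer` can also remove nodes: returning `None` from a `visit_*` method deletes the visited node from the tree. For example, a transformer that strips `assert` statements:

```python
import ast

class AssertRemover(ast.NodeTransformer):
    """Returning None from a visit_* method removes the node entirely."""
    def visit_Assert(self, node):
        return None

code = """
x = 10
assert x > 0
y = x * 2
"""
tree = ast.parse(code)
tree = ast.fix_missing_locations(AssertRemover().visit(tree))
print(ast.unparse(tree))  # the assert is gone
```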
## Creating Pylint Rules with AST Visiting
Pylint is a popular Python linter that uses AST analysis to detect issues in code. You can create custom Pylint rules by implementing AST visitors.
### Example: Custom Pylint Rule
Here's an example of a custom Pylint rule that detects the use of `print` statements:
```python
from pylint.checkers import BaseChecker
from pylint.interfaces import IAstroidChecker
from astroid import nodes

class PrintStatementChecker(BaseChecker):
    __implements__ = IAstroidChecker

    name = 'print-statement'
    msgs = {
        'W0001': (
            'Use of print statement',
            'print-statement-used',
            'Used when a print statement is detected.'
        ),
    }

    def visit_call(self, node):
        if isinstance(node.func, nodes.Name) and node.func.name == 'print':
            self.add_message('print-statement-used', node=node)

def register(linter):
    linter.register_checker(PrintStatementChecker(linter))
```
To use this custom rule:
1. Save the code in a file (e.g., `print_checker.py`).
2. Run Pylint with the custom checker: `pylint --load-plugins=print_checker your_code.py`
This rule will now detect and report uses of `print` in your code. Note that `__implements__` and `IAstroidChecker` belong to an older Pylint plugin API; recent Pylint releases drop the `__implements__` declaration entirely, but the `visit_*` methods and `register` function work the same way.
## Tree-sitter: A Powerful AST Matcher Library
Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. Tree-sitter is language-agnostic and can be used with many programming languages.
### Key Features of Tree-sitter:
1. **Fast and Incremental**: Tree-sitter can parse code quickly and update the AST incrementally as changes are made.
2. **Language-Agnostic**: Supports many programming languages out of the box.
3. **Robust**: Can produce a valid syntax tree even for incomplete or invalid code.
4. **Query Language**: Provides a powerful query language for searching syntax trees.
### Using Tree-sitter in Python
Here's an example of using Tree-sitter in Python to parse and query JavaScript code:
```python
from tree_sitter import Language, Parser

# Load the JavaScript language from a pre-built grammar.
# (This is the pre-0.22 py-tree-sitter loading API; newer releases
# use per-language wheels such as tree_sitter_javascript instead.)
JS_LANGUAGE = Language('build/my-languages.so', 'javascript')

# Create a parser
parser = Parser()
parser.set_language(JS_LANGUAGE)

# Parse some code
code = """
function greet(name) {
    console.log("Hello, " + name + "!");
}
"""
tree = parser.parse(bytes(code, "utf8"))

# Query the syntax tree
query = JS_LANGUAGE.query("""
(function_declaration
  name: (identifier) @function_name
  parameters: (formal_parameters (identifier) @param_name))
""")
captures = query.captures(tree.root_node)
for node, capture_name in captures:
    print(f"{capture_name}: {node.text.decode('utf8')}")
```
This script will output:
```
function_name: greet
param_name: name
```
### Advantages of Tree-sitter:
1. **Performance**: Tree-sitter is designed to be fast and efficient, making it suitable for real-time applications like text editors.
2. **Flexibility**: The query language allows for complex pattern matching across the syntax tree.
3. **Multi-language Support**: Tree-sitter can be used with many programming languages, making it versatile for polyglot projects.
4. **Integration**: Can be easily integrated into various tools and environments, including text editors and IDEs.
### Use Cases:
- Syntax highlighting
- Code navigation
- Refactoring tools
- Static analysis
- Code search and replace
## Using LLMs for Generating AST Visitors and Transformers
Large Language Models (LLMs) can be powerful tools for generating AST visitors and transformers. However, it's important to be aware of their limitations and potential for hallucinations. Here are some strategies for effectively using LLMs in this context:
### Generating AST Visitors and Transformers with LLMs
1. **Provide Clear Context**: Give the LLM a clear description of the AST structure and the desired transformation or analysis.
2. **Use Examples**: Provide example inputs and outputs to guide the LLM's generation.
3. **Iterative Refinement**: Generate initial code, then ask the LLM to refine or extend specific parts.
4. **Combine with Templates**: Use LLMs to fill in specific parts of pre-defined templates for visitors or transformers.
Example prompt:
```
Generate a Python AST visitor that counts the number of function definitions in a given AST. Use the ast module and subclass ast.NodeVisitor. Include a simple example of how to use the visitor.
```
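For reference, an implementation along the lines this prompt asks for might look like the following sketch (`FunctionDefCounter` is an invented name, and the visitor also counts nested definitions because it calls `generic_visit`):

```python
import ast

class FunctionDefCounter(ast.NodeVisitor):
    """Counts function definitions, including nested ones."""
    def __init__(self):
        self.count = 0

    def visit_FunctionDef(self, node):
        self.count += 1
        self.generic_visit(node)  # descend into nested definitions

code = """
def outer():
    def inner():
        pass

def other():
    pass
"""
counter = FunctionDefCounter()
counter.visit(ast.parse(code))
print(counter.count)  # 3
```

Having a known-good reference like this makes it easy to spot when an LLM's generated visitor forgets `generic_visit` and silently misses nested functions.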
### Improved Approach for LLM-Assisted AST Code Generation
To further mitigate the risks of hallucinations and improve the quality of LLM-generated AST code, follow this process:
1. **Generate a Tutorial**:
   - Ask the LLM to create a comprehensive tutorial with many self-contained examples covering various AST operations.
   - Ensure the tutorial includes a wide range of node types and common transformation patterns.
2. **Create a Reference Document**:
   - Have the LLM generate a detailed reference document that outlines the structure of the AST, available node types, and common visitor/transformer patterns.
3. **Run and Fix Examples**:
   - Implement and run the examples provided in the tutorial.
   - Identify and fix any errors or inconsistencies.
   - This process helps validate the LLM's understanding and output.
4. **Seed LLM Generation**:
   - Use the corrected tutorial and reference document to seed subsequent LLM interactions.
   - When generating new AST visitors or transformers, refer back to these validated resources.
## Introduction to Domain-Specific Languages (DSLs)
Domain-Specific Languages are programming languages tailored to a particular application domain. They provide specific abstractions and notations to simplify programming tasks within that domain. ASTs play a crucial role in the implementation and use of DSLs, serving as the backbone for parsing, interpretation, and code generation.
### Key Concepts:
- **Purpose**: Simplify programming tasks within a specific domain.
- **Types**: Internal DSLs (embedded in a host language) and External DSLs (standalone languages).
- **AST Role**: Central to parsing, analyzing, and executing DSL code.
### Example (YAML DSL for simple data processing):
```yaml
- load_file: users.csv
- filter:
    column: age
    condition: greater_than
    value: 18
- sort:
    column: last_name
- export:
    format: json
    file: adult_users.json
```
### AST Representation of the DSL:
```
Program
├── LoadFile
│   └── Filename: "users.csv"
├── Filter
│   ├── Column: "age"
│   ├── Condition: "greater_than"
│   └── Value: 18
├── Sort
│   └── Column: "last_name"
└── Export
    ├── Format: "json"
    └── Filename: "adult_users.json"
```
### ASTs in DSL Implementation:
1. **Parsing**: Convert DSL code into an AST for easier manipulation.
2. **Validation**: Traverse the AST to check for semantic correctness.
3. **Interpretation**: Walk the AST to execute the DSL commands.
4. **Code Generation**: Transform the AST into target language code.
5. **Optimization**: Analyze and modify the AST to improve performance.
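The interpretation step can be sketched concretely for the pipeline DSL above. Here the parsed YAML is represented as a plain list of dicts (standing in for `yaml.safe_load` output), and `run_pipeline` with its in-memory `data_source` is invented for illustration; only the load, filter, and sort steps are handled:

```python
def run_pipeline(steps, data_source):
    """Walk the pipeline 'AST' step by step, threading data through."""
    data = []
    for step in steps:
        (op, args), = step.items()  # each step is a single-key mapping
        if op == "load_file":
            data = list(data_source[args])  # stand-in for reading a CSV
        elif op == "filter":
            if args["condition"] == "greater_than":
                data = [r for r in data if r[args["column"]] > args["value"]]
        elif op == "sort":
            data = sorted(data, key=lambda r: r[args["column"]])
    return data

source = {"users.csv": [
    {"last_name": "Zed", "age": 17},
    {"last_name": "Able", "age": 30},
    {"last_name": "Young", "age": 25},
]}
steps = [
    {"load_file": "users.csv"},
    {"filter": {"column": "age", "condition": "greater_than", "value": 18}},
    {"sort": {"column": "last_name"}},
]
rows = run_pipeline(steps, source)
print(rows)
```

A fuller interpreter would dispatch each step name to a handler function (the Command pattern below) rather than growing one `if`/`elif` chain.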
### Design Patterns for DSLs (leveraging ASTs):
- **Visitor Pattern**:
  - Separate interpretation logic from AST structure
  - Easily extend with new operations on the AST
- **Strategy Pattern**:
  - Encapsulate different AST traversal strategies
  - Useful for handling different DSL versions or dialects
- **Command Pattern**:
  - Represent DSL actions as AST nodes
  - Useful for undo/redo functionality or logging
- **Template Pattern**:
  - Define AST-based templates for the DSL targets
  - Interpreter can use these templates to generate the final output
- **Intermediate DSL** / **Intermediate Representation** (IR):
  - Create an intermediate AST that is easier to transform
  - IRs can also easily be transformed to other targets
## DSL Examples and Techniques
Domain-Specific Languages (DSLs) can be applied to a wide variety of domains, each with its own unique requirements and structures. Let's explore some more advanced examples of DSLs and techniques for working with them.
### Test Case Description DSL
Test case description DSLs allow developers and QA engineers to define test scenarios in a structured, readable format. This approach can improve test coverage, readability, and maintainability.
```yaml
test_suite: User Registration
cases:
  - name: Valid Registration
    input:
      username: john_doe
      email: john@example.com
      password: securePass123
    expected:
      status: success
      message: User registered successfully
  - name: Invalid Email
    input:
      username: jane_doe
      email: invalid-email
      password: pass123
    expected:
      status: error
      message: Invalid email format
```
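A runner for this DSL is little more than a loop over the cases. In this sketch, `register_user` is a hypothetical system under test, and the suite is shown already parsed into a dict:

```python
# Hypothetical system under test
def register_user(username, email, password):
    domain = email.split("@")[-1]
    if "@" not in email or "." not in domain:
        return {"status": "error", "message": "Invalid email format"}
    return {"status": "success", "message": "User registered successfully"}

suite = {
    "test_suite": "User Registration",
    "cases": [
        {"name": "Valid Registration",
         "input": {"username": "john_doe", "email": "john@example.com",
                   "password": "securePass123"},
         "expected": {"status": "success",
                      "message": "User registered successfully"}},
        {"name": "Invalid Email",
         "input": {"username": "jane_doe", "email": "invalid-email",
                   "password": "pass123"},
         "expected": {"status": "error",
                      "message": "Invalid email format"}},
    ],
}

# Run every case and compare actual vs. expected
outcomes = []
for case in suite["cases"]:
    actual = register_user(**case["input"])
    outcomes.append("PASS" if actual == case["expected"] else "FAIL")
    print(f"{case['name']}: {outcomes[-1]}")
```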
### UI Layout DSL
UI Layout DSLs can simplify the process of designing user interfaces by providing a declarative way to describe UI components and their properties.
```yaml
window:
  title: My Application
  size: [800, 600]
  components:
    - type: button
      text: Click me!
      position: [10, 10]
      size: [100, 30]
      on_click: handle_button_click
    - type: text_input
      position: [10, 50]
      size: [200, 25]
      placeholder: Enter your name
    - type: label
      text: Welcome to my app
      position: [10, 90]
      font:
        name: Arial
        size: 16
        style: bold
```
### Data Validation DSL
Data validation is a common requirement in many applications. A DSL for data validation can provide a concise way to define validation rules for different data types.
```yaml
schema:
  user:
    fields:
      - name: username
        type: string
        rules:
          - min_length: 3
          - max_length: 20
          - pattern: ^[a-zA-Z0-9_]+$
      - name: email
        type: string
        rules:
          - format: email
      - name: age
        type: integer
        rules:
          - min: 18
          - max: 120
    required:
      - username
      - email
```
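A validator for such a schema can be sketched as a small rule table. The `validate` function and `RULES` mapping below are names invented for this example, and only a subset of the rules (length, pattern, numeric bounds) is handled:

```python
import re

# Each rule name maps to a predicate: (value, rule_argument) -> bool
RULES = {
    "min_length": lambda v, n: len(v) >= n,
    "max_length": lambda v, n: len(v) <= n,
    "pattern":    lambda v, p: re.fullmatch(p, v) is not None,
    "min":        lambda v, n: v >= n,
    "max":        lambda v, n: v <= n,
}

def validate(fields, record):
    errors = []
    for field in fields:
        value = record.get(field["name"])
        if value is None:
            continue
        for rule in field.get("rules", []):
            (name, arg), = rule.items()  # each rule is a single-key mapping
            check = RULES.get(name)
            if check and not check(value, arg):
                errors.append(f"{field['name']}: failed {name}")
    return errors

fields = [
    {"name": "username", "type": "string",
     "rules": [{"min_length": 3}, {"max_length": 20},
               {"pattern": r"^[a-zA-Z0-9_]+$"}]},
    {"name": "age", "type": "integer",
     "rules": [{"min": 18}, {"max": 120}]},
]
errors = validate(fields, {"username": "jo", "age": 17})
print(errors)  # ['username: failed min_length', 'age: failed min']
```

New rule types become one-line additions to `RULES`, which is the main attraction of the table-driven design.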
### Network Configuration DSL
Network configuration often involves complex setups with multiple devices and subnets. A DSL for network configuration can simplify this process and reduce errors.
```yaml
network:
  name: MyHomeNetwork
  devices:
    - name: MainRouter
      type: router
      ip: 192.168.1.1
      connections:
        - Switch1
        - WiFiAP
    - name: Switch1
      type: switch
      ports: 24
    - name: WiFiAP
      type: access_point
      ssid: MyHomeWiFi
      security: wpa2
      password: securePassword123
  subnets:
    - name: MainSubnet
      range: 192.168.1.0/24
      dhcp:
        start: 192.168.1.100
        end: 192.168.1.200
```
### Internal vs. External DSLs
DSLs can be categorized into two main types: internal and external. Each has its own advantages and use cases.
#### Internal DSLs
Internal DSLs are embedded within a host programming language. They leverage the syntax and features of the host language to create domain-specific constructs. Here's an example of an internal DSL in Python for UI layout:
```python
from ui_builder import Window, Button, TextInput, Label, Font
window = (
    Window("My Application")
    .size(800, 600)
    .add(
        Button("Click me!")
        .position(10, 10)
        .size(100, 30)
        .on_click(handle_button_click)
    )
    .add(
        TextInput()
        .position(10, 50)
        .size(200, 25)
        .placeholder("Enter your name")
    )
    .add(
        Label("Welcome to my app")
        .position(10, 90)
        .font(Font("Arial", 16, "bold"))
    )
)
```
Internal DSLs are often easier to implement and integrate with existing codebases. They can leverage the host language's tooling and don't require separate parsing or compilation steps.
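Under the hood, such a fluent interface is usually implemented by having every builder method return `self`. A minimal sketch (class names follow the hypothetical `ui_builder` module above, with most components and methods omitted):

```python
class Window:
    def __init__(self, title):
        self.title = title
        self._size = None
        self.children = []

    def size(self, width, height):
        self._size = (width, height)
        return self  # returning self is what enables chaining

    def add(self, component):
        self.children.append(component)
        return self

class Button:
    def __init__(self, text):
        self.text = text
        self._position = None

    def position(self, x, y):
        self._position = (x, y)
        return self

window = (
    Window("My Application")
    .size(800, 600)
    .add(Button("Click me!").position(10, 10))
)
print(window.title, window._size, len(window.children))
```

Note that the chained calls effectively build a tree of objects, so the internal DSL yields an AST-like structure for free.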
#### External DSLs
External DSLs are standalone languages with their own syntax and parsing rules. They are typically more flexible and can be tailored precisely to the domain's needs. The YAML examples shown earlier are all examples of external DSLs.
External DSLs often require more upfront work to implement, including creating parsers and interpreters or compilers. However, they can provide a more focused and streamlined experience for domain experts who may not be programmers.
### Interpreters vs. Compilers for DSLs
When working with DSLs, you can choose between an interpreter-based or compiler-based approach. Each has its own advantages and trade-offs.
#### Interpreter Approach
An interpreter executes the DSL code directly, processing and acting on DSL statements at runtime. Here's a simplified example of an interpreter for a query DSL:
```python
import yaml
from collections import namedtuple, defaultdict

QueryNode = namedtuple('QueryNode', ['select', 'from_table', 'join', 'where', 'group_by'])
WhereNode = namedtuple('WhereNode', ['field', 'operator', 'value'])
JoinNode = namedtuple('JoinNode', ['table', 'on'])

class QueryInterpreter:
    def __init__(self, database):
        self.database = database

    def parse_query(self, yaml_string):
        data = yaml.safe_load(yaml_string)
        query = data['query']
        select = query['select']
        from_table = query['from']
        join = None
        if 'join' in query:
            join_data = query['join']
            join = JoinNode(join_data['table'], join_data['on'])
        where = None
        if 'where' in query:
            where_data = query['where']
            field, condition = next(iter(where_data.items()))
            operator, value = next(iter(condition.items()))
            where = WhereNode(field, operator, value)
        group_by = query.get('group_by', [])
        return QueryNode(select, from_table, join, where, group_by)

    def execute_query(self, query_node):
        # Step 1: Get data from the main table
        data = self.database[query_node.from_table]
        # Step 2: Apply join if present
        if query_node.join:
            data = self.apply_join(data, query_node.join)
        # Step 3: Apply where clause if present
        if query_node.where:
            data = self.apply_where(data, query_node.where)
        # Step 4: Apply group by if present
        if query_node.group_by:
            data = self.apply_group_by(data, query_node.group_by, query_node.select)
        # Step 5: Select fields
        result = self.apply_select(data, query_node.select)
        return result

    def apply_join(self, data, join):
        joined_data = []
        join_table = self.database[join.table]
        for row in data:
            for join_row in join_table:
                if all(row[k] == join_row[v] for k, v in join.on.items()):
                    joined_data.append({**row, **join_row})
        return joined_data

    def apply_where(self, data, where):
        ops = {
            'greater_than': lambda x, y: x > y,
            'less_than': lambda x, y: x < y,
            'equals': lambda x, y: x == y,
        }
        return [row for row in data if ops[where.operator](row[where.field], where.value)]

    def apply_group_by(self, data, group_by, select):
        grouped_data = defaultdict(list)
        for row in data:
            key = tuple(row[field] for field in group_by)
            grouped_data[key].append(row)
        result = []
        for key, group in grouped_data.items():
            new_row = {field: key[i] for i, field in enumerate(group_by)}
            for field in select:
                if isinstance(field, dict):  # Aggregate function
                    func, column = next(iter(field.items()))
                    values = [row[column['field']] for row in group]
                    if func == 'avg':
                        new_row[f"{func}_{column['field']}"] = sum(values) / len(values)
                elif field not in group_by:
                    new_row[field] = group[0][field]  # Just take the first value
            result.append(new_row)
        return result

    def apply_select(self, data, select):
        result = []
        for row in data:
            new_row = {}
            for field in select:
                if isinstance(field, dict):  # Aggregate function
                    func, column = next(iter(field.items()))
                    new_row[f"{func}_{column['field']}"] = row[f"{func}_{column['field']}"]
                else:
                    new_row[field] = row[field]
            result.append(new_row)
        return result

# Example usage
database = {
    'employees': [
        {'id': 1, 'name': 'John', 'age': 30, 'salary': 50000, 'dept_id': 1},
        {'id': 2, 'name': 'Jane', 'age': 35, 'salary': 60000, 'dept_id': 2},
        {'id': 3, 'name': 'Bob', 'age': 40, 'salary': 70000, 'dept_id': 1},
        {'id': 4, 'name': 'Alice', 'age': 25, 'salary': 45000, 'dept_id': 2},
    ],
    'departments': [
        {'id': 1, 'name': 'Sales'},
        {'id': 2, 'name': 'Engineering'},
    ]
}

yaml_query = """
query:
  select:
    - name
    - age
    - avg:
        field: salary
  from: employees
  join:
    table: departments
    "on":
      dept_id: id
  where:
    age:
      greater_than: 30
  group_by:
    - dept_id
"""

interpreter = QueryInterpreter(database)
query_node = interpreter.parse_query(yaml_query)
result = interpreter.execute_query(query_node)
print("Query Result:")
for row in result:
    print(row)
```
```output
Query Result:
{'name': 'Engineering', 'age': 35, 'avg_salary': 60000.0}
{'name': 'Sales', 'age': 40, 'avg_salary': 70000.0}
```
Interpreters are often more flexible and easier to implement. They can handle dynamic changes to the DSL code at runtime and are well-suited for interactive or scripting environments.
#### Compiler Approach
A compiler translates the DSL code into another language (e.g., SQL, machine code, or an intermediate representation). Here's an example of a compiler for the same query DSL:
```python
class QueryCompiler:
    def compile(self, query_ast):
        select_clause = self.compile_select(query_ast.select)
        from_clause = self.compile_from(query_ast.from_table)
        join_clause = self.compile_join(query_ast.join)
        where_clause = self.compile_where(query_ast.where)
        group_by_clause = self.compile_group_by(query_ast.group_by)
        sql_parts = [
            select_clause,
            from_clause,
            join_clause,
            where_clause,
            group_by_clause
        ]
        return ' '.join(part for part in sql_parts if part)

    def compile_select(self, select_fields):
        compiled_fields = []
        for field in select_fields:
            if isinstance(field, dict):  # For aggregate functions
                func, column = next(iter(field.items()))
                compiled_fields.append(f"{func.upper()}({column['field']}) AS {func}_{column['field']}")
            else:
                compiled_fields.append(field)
        return f"SELECT {', '.join(compiled_fields)}"

    def compile_from(self, from_table):
        return f"FROM {from_table}"

    def compile_join(self, join_node):
        if join_node:
            on_condition = next(iter(join_node.on.items()))
            return f"JOIN {join_node.table} ON {on_condition[0]} = {on_condition[1]}"
        return ""

    def compile_where(self, where_node):
        if where_node:
            sql_operator = self.get_sql_operator(where_node.operator)
            return f"WHERE {where_node.field} {sql_operator} {where_node.value}"
        return ""

    def compile_group_by(self, group_by_fields):
        if group_by_fields:
            return f"GROUP BY {', '.join(group_by_fields)}"
        return ""

    def get_sql_operator(self, operator):
        operators = {
            'greater_than': '>',
            'less_than': '<',
            'equals': '=',
            'not_equals': '!=',
            'greater_than_or_equal': '>=',
            'less_than_or_equal': '<='
        }
        return operators.get(operator, operator)

# Usage: query_node is the QueryNode produced by parse_query in the
# interpreter example above
compiler = QueryCompiler()
sql_query = compiler.compile(query_node)
print(sql_query)
```
The compiled SQL query (the compiler joins the clauses with spaces into a single line; it is shown wrapped here for readability, and the join condition uses the bare column names from the `on` mapping):
```sql
SELECT name, age, AVG(salary) AS avg_salary
FROM employees
JOIN departments ON dept_id = id
WHERE age > 30
GROUP BY dept_id
```
Compilers can often produce more efficient code, as they can perform optimizations during the compilation process. They're well-suited for DSLs that need to be translated into existing languages or systems.
### Choosing Between Interpreters and Compilers
The choice between an interpreter and a compiler depends on your specific use case:
- **Interpreters** are better for:
  - Interactive environments
  - Rapid prototyping
  - DSLs that need to be highly dynamic
  - Situations where runtime flexibility is more important than performance
- **Compilers** are better for:
  - Performance-critical applications
  - DSLs that map well to existing languages or systems
  - Situations where static analysis and optimization are important
## Using Intermediate DSLs:
LLMs can be particularly effective when used with intermediate DSLs:
1. Design a custom, easy-to-generate DSL.
2. Use LLMs to translate natural language to this intermediate DSL.
3. Transform the intermediate DSL into the target language or system configuration.
Example:
```yaml
# High-level DSL
employee_query:
  find:
    - name
    - age
    - average_salary
  in_department: Sales
  age_above: 30
  group_by_department: true

# Intermediate DSL
query:
  select:
    - name
    - age
    - avg:
        field: salary
  from: employees
  join:
    table: departments
    on:
      employees.dept_id: departments.id
  where:
    age:
      greater_than: 30
  group_by:
    - dept_id
```
The converter is reasonably simple: it fills in a template for the intermediate DSL.
```python
import yaml

def convert_employee_query_to_intermediate(employee_query):
    intermediate_query = {
        "query": {
            "select": [],
            "from": "employees",
            "join": {
                "table": "departments",
                "on": {
                    "employees.dept_id": "departments.id"
                }
            },
            "where": {},
            "group_by": []
        }
    }
    # Handle 'find' fields
    for field in employee_query["find"]:
        if field == "average_salary":
            intermediate_query["query"]["select"].append({"avg": {"field": "salary"}})
        else:
            intermediate_query["query"]["select"].append(field)
    # Handle 'in_department'
    if "in_department" in employee_query:
        intermediate_query["query"]["where"]["departments.name"] = {"equals": employee_query["in_department"]}
    # Handle 'age_above'
    if "age_above" in employee_query:
        intermediate_query["query"]["where"]["age"] = {"greater_than": employee_query["age_above"]}
    # Handle 'group_by_department'
    if employee_query.get("group_by_department", False):
        intermediate_query["query"]["group_by"].append("dept_id")
    return intermediate_query

# Example usage
employee_query = {
    "employee_query": {
        "find": ["name", "age", "average_salary"],
        "in_department": "Sales",
        "age_above": 30,
        "group_by_department": True
    }
}
intermediate_query = convert_employee_query_to_intermediate(employee_query["employee_query"])
print(yaml.dump(intermediate_query, default_flow_style=False))
```
This approach combines the ease of generation for LLMs with the precision and control needed for complex systems.
## Using LLMs for DSL Generation and Interpretation
Large Language Models (LLMs) can be powerful tools for working with DSLs:
### Creating New DSLs:
1. Use LLMs to generate initial DSL designs.
2. Regenerate, refine, and iterate on the generated designs.
3. Use LLMs to generate a clean specification document.
4. Use LLMs to create many use cases (for testing and documentation purposes).
5. Use LLMs to help implement the DSL interpreter or compiler, though this step is often worth doing by hand.
### Advantages of Using LLMs with DSLs:
- Less prone to hallucinations, thanks to the DSL's constrained, domain-specific structure.
- Can express more logic in fewer tokens than general-purpose code.
- Readable by non-programmers.
- Safer to sandbox.
- Enables streaming interpretation.
### LLM Applications for DSLs:
- Create new DSL instances.
- Transform human language into formal DSL code.
- Validate, preview, and interpret DSL code.
## Recommended Literature
For those interested in diving deeper into the topics of interpreters, compilers, and programming language design, the following books are highly recommended:
1. **"Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp" by Peter Norvig**
- More than AI, it's a deep dive into software engineering with a focus on pattern matching, language creation, interpreters and compilers.
- The perfect companion to LLMs.
- Offers insights into symbolic computation and language processing
- Best programming book ever written.
2. **"Crafting Interpreters" by Robert Nystrom**
- A comprehensive guide to building interpreters from scratch
- Covers both tree-walk and bytecode virtual machine interpreters
- Provides practical, hands-on examples in Java and C
3. **"Modern Compiler Implementation in ML" by Andrew W. Appel**
- A thorough treatment of compiler construction using ML
- Covers advanced topics like dataflow analysis and instruction selection
- Provides a solid theoretical foundation with practical implementations
4. **"Structure and Interpretation of Computer Programs" (SICP) by Harold Abelson and Gerald Jay Sussman**
- A classic text on programming concepts and techniques
- Includes sections on metalinguistic abstraction and implementing interpreters
- Uses Scheme to illustrate fundamental principles of computation
5. **"Principles of Program Analysis" by Flemming Nielson, Hanne R. Nielson, and Chris Hankin**
- Focuses on static program analysis techniques
- Covers data flow analysis, abstract interpretation, and type-based analysis
- Provides a theoretical foundation for understanding and implementing program analysis tools