PLIH Lecture Notes Published - Obsidian Publish

# Modeling Languages

---

## Languages are a solved problem

- They were solved in the 2010s with the emergence of JavaScript and Rust and various frameworks.
- They were solved in the 2000s with the emergence of Python and Perl and C#.
- They were solved in the 1990s with the emergence of C++ and Java.
- They were solved in the 1980s with the emergence of C. We have C. We wrote Unix and Linux with C. We don't need anything else.
- They were solved in the 1970s with the emergence of Pascal and Simula. Some really smart people told me that's all I would need to know, and I learned them well. I never wrote a line of deliverable code in either language.
- They were solved in the 1960s with the emergence of Lisp, COBOL and FORTRAN. We still use all three.
- They were solved in the 1950s with ENIAC and MANIAC and JOHNNIAC.
- They were solved in the 1930s with the Universal A-Machine and Lambda calculus. We still use both.

Hopefully you get the picture.

---

## Every application of any reasonable size has a built-in customization language:

- emacs - elisp
- autocad - lisp
- word - visual basic
- excel - scripting language
- unix - shells and shell scripts
- browsers - HTML
- all the things that read XML
- gaming engines
- protocols
- yacc, lex, and friends
- any program that reads a config file

Studying languages makes us better programmers and gives us insight into the code we write. Language is how we represent information and knowledge.

## How we think of languages

When we learn or use languages as programmers, we tend to think about them in several ways:

- syntax
- behavior associated with syntax (semantics)
- libraries
- programming idioms

Syntax is a solved problem and largely social. Syntax tells us very little about a language:

```text
a [25]            - Java access of array element 25
(vector-ref a 25) - Scheme access of vector element 25
a [25]            - C access of array element 25 with no boundary checks
a [25]            - Haskell call of a on a list of length 1
```

Syntax is a fickle friend.
Libraries help us program, but can actually make language study harder. Idioms, like syntax, are a study of sociological issues.

Semantics - what programs mean - is what we're interested in: precisely defining and implementing what a program means.

## Describing Meaning

Describe the meaning of each syntactic element of a language. This is critical for developing compilers and interpreters. We have to know what a language is *supposed* to do before we can determine if our tools are implemented correctly.

- Define the new language's concrete syntax
- Define the meaning of each syntactic element using a known language
    - Evaluation semantics tells us what it does (execution)
    - Static semantics tells us what we can predict (type checking)

Three ways of doing this (EECS 762):

- denotational - map each language structure to a mathematical function
- operational - define how legal strings in a language are evaluated
- axiomatic - define pre- and post-conditions on execution of language constructs

We're going to do something very close to operational semantics (Shriram Krishnamurthi calls this *interpreter semantics*), where we define a golden interpreter for our language. This is how many languages, including Verilog and OCaml, are defined.

## Compilers and Interpreters

Two primary styles for language processing:

- *Compilers* translate language structures into an executable form and throw the rest away
    - The *source* language is the language being translated
    - The *target* language is the language being targeted
- *Interpreters* define a function that executes language syntax directly
    - The *embedded* language is the language being interpreted
    - The *host* language is the known language defining the interpreter

We will be building interpreters. Our *host* language will be Haskell, while our *embedded* language will evolve over the course of the semester.

Most real languages are neither purely interpreted nor purely compiled.
## Defining Syntax

_Programs are data structures._

```haskell
AE ::= num
     | AE + AE
     | AE - AE
     | (AE)
```

It's a set, AE. How big? Infinite. Recursive! Inductive.

Examples

```haskell
4
1 + 3
(2 + 2) + (5 - 7)
1 + 3 - (5 + (8 - 4))
```

- *concrete syntax* - what programmers write
- *abstract syntax* - what the interpreter operates over

```haskell
{-# LANGUAGE GADTs #-}  -- needed for the "data ... where" form

data AE where
  Num   :: Int -> AE
  Plus  :: AE -> AE -> AE
  Minus :: AE -> AE -> AE
  deriving (Show,Eq)
```

- `AE` - type name
- `Num`, `Plus`, ... - Constructors construct elements of the type. ALL elements of the type.
- `AE -> AE ...` - Signature

We'll write an interpreter over AE.

This is not the standard syntax for a Haskell algebraic type, but instead uses the GADT form. It is equivalent to:

```haskell
data AE = Num Int
        | Plus AE AE
        | Minus AE AE
        deriving (Show,Eq)
```

- *parser* - concrete syntax -> abstract syntax: "1+3" == (Plus (Num 1) (Num 3))

```haskell
expr :: Parser AE
expr = buildExpressionParser operators term

operators = [ [ inFix "+" Plus AssocLeft
              , inFix "-" Minus AssocLeft ] ]

numExpr :: Parser AE
numExpr = do i <- integer lexer
             return (Num (fromInteger i))

term = parens lexer expr <|> numExpr

-- Parser invocation
parseAE = parseString expr
```

_Note_: This is not the abstract syntax we will actually use.

Examples

```haskell
(parse "3") == (Num 3)
(parse "3 + 4") == (Plus (Num 3) (Num 4))
(parse "((3 - 4) + 7)") == (Plus (Minus (Num 3) (Num 4)) (Num 7))
```

Parsers are solved problems and this is the last we will speak of them in detail. We're going to skip the parser because the abstract syntax will be as easy to read as the concrete syntax. What a bonus.

# Interpreters

---

## Monadic Interpreters

_We will learn about languages by building interpreters for them in Haskell_

The general notion of an interpreter maps a _language_ to a _value_. Mathematically:

$E: L\rightarrow V$

$E$ is our interpreter, $L$ is our language, and $V$ is our set of values. Values are good results: they cannot be evaluated further.
Let's start with the simplest language ever:

```haskell
AE ::= num
```

```haskell
data AE where
  Nat :: Int -> AE
  deriving (Eq,Show)
```

`Nat` - constructor

```haskell
eval :: AE -> Int
```

A parser will translate numbers into `AE`. An aside on symbol names, since `!` shows up in the examples as the result of a failed parse:

- `!` - Bang
- `?` - Hook
- `*` - Splat
- `#!` - Shebang

```haskell
parse "1" == (Nat 1)
parse "2" == (Nat 2)
parse "a" == !
parse "1+2" == !
```

An interpreter will translate `AE` into values:

```haskell
eval :: AE -> Int
eval (Nat x) = x
```

All together now:

```haskell
interp x = eval (parse x)

interp "1" == 1
interp "3" == 3
```

or

```haskell
interp = eval . parse
```

This is goofy. In and out and that's it.

Now let's add addition to our language. Just another term:

```haskell
data AE where
  Nat  :: Int -> AE
  Plus :: AE -> AE -> AE
  deriving (Eq,Show)
```

This is not much harder:

```haskell
eval :: AE -> Int
eval (Nat x) = x
eval (Plus x y) = (eval x) + (eval y)
```

`x` and `y` in `Plus` are bound to the input arguments.

```haskell
eval (Plus (Nat 1) (Nat 3))
== (eval (Nat 1)) + (eval (Nat 3))
== 1 + 3
== 4
```

Do programs in AE terminate? Yes, and that's okay.

Do programs in AE ever crash? No. But you can't do anything powerful.

Let's add another operator, `Minus`:

```haskell
data AE where
  Nat   :: Int -> AE
  Plus  :: AE -> AE -> AE
  Minus :: AE -> AE -> AE
```

and extend `eval` with a new case:

```haskell
eval :: AE -> Int
eval (Nat x) = x
eval (Plus l r) = (eval l) + (eval r)
eval (Minus l r) = (eval l) - (eval r) -- Not good: could be negative
```

What does `Minus` force us to deal with? Errors.

Simple error handling using `error`:

```haskell
eval (Minus l r) = let x = (eval l) - (eval r) in
                     if x<0 then error "!" else x
```

What if I don't want to crash? Return an error value:

```haskell
eval (Minus l r) = let x = (eval l) - (eval r) in
                     if x < 0 then -1 else x
```

Whatever we choose, it must be of type `Int`. Why is that a problem? We can pick a magic value like `-1`, but that easily introduces errors.
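To see why a magic value is risky, here is a minimal sketch of the `-1` version applied to `(2-3)+4`. The name `evalMagic` is made up for this example, and the data type is written in the standard (non-GADT) form so that no language extension is needed.

```haskell
-- AE with subtraction over naturals, using -1 as the error sentinel.
data AE = Nat Int
        | Plus AE AE
        | Minus AE AE
        deriving (Show, Eq)

-- evalMagic is a hypothetical name; -1 means "subtraction went negative".
evalMagic :: AE -> Int
evalMagic (Nat x)     = x
evalMagic (Plus l r)  = evalMagic l + evalMagic r
evalMagic (Minus l r) =
  let x = evalMagic l - evalMagic r
  in if x < 0 then -1 else x

-- (2 - 3) + 4: the inner subtraction fails, but Plus cannot tell the
-- sentinel from a real number, so it happily adds 4 to it.
example :: Int
example = evalMagic (Plus (Minus (Nat 2) (Nat 3)) (Nat 4))
```

Here `example` is `3`, a perfectly plausible answer, so the failure has vanished. Nothing in the type `Int` separates "error" from "result", which is the gap `Maybe` fills.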
## Maybe

Two constructors:

- `Just x` - where `x` is the result of a computation
- `Nothing` - is not the result of a successful computation

`Maybe` is parameterized over a type:

```haskell
data Maybe a where
  Just    :: a -> Maybe a
  Nothing :: Maybe a
```

What does `Maybe` do to `a`?

Using `Maybe` in a traditional way:

```haskell
eval :: AE -> (Maybe Int)
eval (Nat x) = Just x
eval (Plus l r) = case (eval l) of
                    Nothing -> Nothing
                    (Just l') -> case (eval r) of
                                   (Just r') -> Just (l'+r')
                                   Nothing -> Nothing
eval (Minus l r) = case (eval l) of
                     Nothing -> Nothing
                     (Just l') -> case (eval r) of
                                    Nothing -> Nothing
                                    (Just r') -> if (l'<r')
                                                 then Nothing
                                                 else Just (l'-r')
```

Consider `((2-3) + 4)`. How does `Maybe` help here?

## Maybe the Monad

Using `Maybe` as a Monad:

```haskell
eval :: AE -> Maybe Int
eval (Nat x) = Just x
eval (Plus l r) = do { x <- eval l;
                       y <- eval r;
                       Just (x+y) }
eval (Minus l r) = do { x <- eval l;
                        y <- eval r;
                        if x<y then Nothing else Just (x-y) }
```

`x <- e` is called *bind* and we're binding the result of evaluating `e` to `x`. Bind only works when `e` is a monadic data structure.

The bind arrow does this with `Maybe`:

1. Evaluates the right side
2. If the right side is `Just a`, assign `a` to `x` and go to the next line
3. If the right side is `Nothing`, fall through and return `Nothing`

`return a` == `Just a`

```haskell
eval :: AE -> Maybe Int
eval (Nat x) = return x
eval (Plus l r) = do { x <- eval l;
                       y <- eval r;
                       return (x+y) }
eval (Minus l r) = do { x <- eval l;
                        y <- eval r;
                        if x<y then Nothing else return (x-y) }
```

This is pretty cool. The Monad and the `do` notation capture the shunting of control around a case when `Nothing` appears. We don't have to worry about it anymore.

It gets cooler:

```haskell
eval :: AE -> Maybe AE
```

What changed?
```haskell
eval (Nat x) = return (Nat x)
eval (Plus l r) = do { (Nat x) <- eval l;
                       (Nat y) <- eval r;
                       return (Nat (x+y)) }
eval (Minus l r) = do { (Nat x) <- eval l;
                        (Nat y) <- eval r;
                        if x<y then Nothing else return (Nat (x-y)) }
```

Pattern matching _inside the bind_. This will come in very handy later, but file it away for now.

```haskell
(Boolean Bool)
```

If patterns do not match, then `Nothing` is returned.

---

## Language Properties

- Completeness - every wff that we put into `eval` will get evaluated
- Determinism - every wff we put into `eval` will produce only one value
- Normalizing - every wff we put into `eval` will terminate in a value
- Value - a good computation result
- wff - Well Formed Formula ("woof")

## Inference Rules and Axioms

- Axioms - Things we know. Givens.
- Inference Rules - Things we can deduce from what we know in 1 step
- Derivations - Sequences of inference rule applications.

An inference rule is a set of _antecedents_ and a _consequent_. If the antecedents are true, the consequent follows immediately:

$\begin{prooftree}\AXC{$A$}\AXC{$B$}\RLS{Inference Rule}\BIC{$C$}\end{prooftree}$

Means *if we know $A$ and we know $B$ then we know $C$*

An _axiom_ is an inference rule with no antecedents.

$\begin{prooftree}\AXC{}\RLS{Axiom}\UIC{$A$}\end{prooftree}$

Means *if we know nothing then we know $A$*. So $A$ is always true with no need for proof.

We can define all kinds of things with inference rules:

$\begin{prooftree}\AXC{$t_1\in L$}\AXC{$t_2\in L$}\RLS{Syntax}\BIC{$t_1$+``-''+$t_2\in L$} \end{prooftree}$

`t ::= Nat | t1-t2`

$\begin{prooftree}\AXC{$A\Rightarrow B$}\AXC{$A$}\RLS{Logic}\BIC{$B$}\end{prooftree}$

$\begin{prooftree}\AXC{$A\wedge B$}\RLS{Logic}\UIC{$B$}\end{prooftree}$

And we can build trees that define proofs.

$\begin{prooftree} \AXC{$A\Rightarrow B$} \AXC{$A\wedge A$}\UIC{$A$}\RLS{Proofs} \BIC{$B$} \end{prooftree}$

Hilbert defined this system to define all of mathematics. It did not go well.
However, Hilbert Systems and inference rules are pretty cool creatures with great utility. We will use them to define languages mathematically, where they are the dominant definition mechanism.

First let's define some notational conventions:

- $v$ is a variable representing _values_
- $t$ is a variable representing _terms_
- $\underline{+}$ is an operation in our concrete syntax while $+$ is an operation in Haskell

$t_1\Downarrow t_2$ is an evaluation relation and is read "$t_1$ evaluates to $t_2$ in one step"

The Haskell function we've defined called `eval` corresponds with $\Downarrow$

- `eval` is a function, not a relation
- $t_1\Downarrow t_2$ == `eval t1 = t2`

Let's walk through some inference rules for our first little language AE:

Values evaluate to themselves. Note the underline.

$\begin{prooftree}\AXC{}\RLS{NumE}\UIC{$\underline{v} \Downarrow v$}\end{prooftree}$

Addition in AE is addition in Haskell.

$\begin{prooftree}\AXC{$t_1 \Downarrow v_1$}\AXC{$t_2 \Downarrow v_2$}\RLS{PlusE}\BIC{$t_1 \underline{+} t_2 \Downarrow v_1+v_2$}\end{prooftree}$

If $t_1$ evaluates to $v_1$ and $t_2$ evaluates to $v_2$, then $t_1\underline{+}t_2$ evaluates to $v_1+v_2$.

$3\underline{+}5\Downarrow 3+5$

$3\Downarrow 3$

$5\Downarrow 5$

$8$

Subtraction in AE is subtraction in Haskell.

$\begin{prooftree}\AXC{$t_1 \Downarrow v_1$}\AXC{$t_2 \Downarrow v_2$}\RLS{MinusE}\BIC{$t_1 \underline{-} t_2 \Downarrow v_1-v_2$}\end{prooftree}$

If $t_1$ evaluates to $v_1$ and $t_2$ evaluates to $v_2$, then $t_1\underline{-}t_2$ evaluates to $v_1-v_2$.

But should it be? What does this say?

$\begin{prooftree}\AXC{$t_1 \Downarrow v_1$}\AXC{$t_2 \Downarrow v_2$}\AXC{$v_1\geq v_2$}\RLS{MinusE+}\TIC{$t_1 \underline{-} t_2 \Downarrow v_1-v_2$}\end{prooftree}$

What happens if we add this rule to what we already have?

$\begin{prooftree}\AXC{$t_1 \Downarrow v_1$}\AXC{$t_2 \Downarrow v_2$}\AXC{$v_1 < v_2$}\RLS{MinusEZero}\TIC{$t_1 \underline{-} t_2 \Downarrow 0$}\end{prooftree}$

Another alternative.
$\begin{prooftree}\AXC{$t_1 \Downarrow v_1$}\AXC{$t_2 \Downarrow v_2$}\AXC{$v_1 < v_2$}\RLS{MinusEBottom}\TIC{$t_1 \underline{-} t_2 \Downarrow \bot$}\end{prooftree}$

This definitional style is called _Big Step Semantics_ or _Natural Semantics_.

But there's more: we can define evaluation with our rules:

$\begin{prooftree}\AXC{$\underline{5}\Downarrow 5$}\AXC{$\underline{2}\Downarrow 2$}\RLS{PlusE}\BIC{$\underline{5+2} \Downarrow 5+2 $}\AXC{$\underline{3} \Downarrow 3$}\RLS{PlusE}\BIC{$\underline{5 + 2+ 3} \Downarrow 10$}\end{prooftree}$

So is this a proof or an evaluation?

## Our First Language

A complete definition that allows:

- Parsing
- Evaluation
- Reasoning

### Concrete Syntax

```other
AE ::= num
     | AE + AE
     | AE - AE
     | (AE)
```

### Inference Rules

$\begin{prooftree}\AXC{}\RLS{NumE}\UIC{$\underline{v} \Downarrow v$}\end{prooftree}$

$\begin{prooftree}\AXC{$t_1 \Downarrow v_1$}\AXC{$t_2 \Downarrow v_2$}\RLS{PlusE}\BIC{$t_1 \underline{+} t_2 \Downarrow v_1+v_2$}\end{prooftree}$

$\begin{prooftree}\AXC{$t_1 \Downarrow v_1$}\AXC{$t_2 \Downarrow v_2$}\AXC{$v_1\geq v_2$}\RLS{MinusE+}\TIC{$t_1 \underline{-} t_2 \Downarrow v_1-v_2$}\end{prooftree}$

$\begin{prooftree}\AXC{$t_1 \Downarrow v_1$}\AXC{$t_2 \Downarrow v_2$}\AXC{$v_1 < v_2$}\RLS{MinusEBottom}\TIC{$t_1 \underline{-} t_2 \Downarrow \bot$}\end{prooftree}$

### Abstract Syntax

```haskell
data AE = Nat Int
        | Plus AE AE
        | Minus AE AE
        deriving (Show,Eq)
```

### Interpreter

```haskell
eval :: AE -> Maybe AE
eval (Nat x) = return (Nat x)
eval (Plus l r) = do { (Nat x) <- eval l;
                       (Nat y) <- eval r;
                       return (Nat (x+y)) }
eval (Minus l r) = do { (Nat x) <- eval l;
                        (Nat y) <- eval r;
                        if x<y then Nothing else return (Nat (x-y)) }
```

## Adding Booleans to AE

The definition of ABE is AE with Booleans added:

```other
ABE ::= Nat
      | ABE + ABE
      | ABE - ABE
      | (ABE)
      | true | false
      | if ABE then ABE else ABE
      | ABE <= ABE
      | ABE && ABE
      | isZero ABE

v ::= Nat | true | false
Nat ::= 0 | succ Nat
```

And the abstract syntax.
```haskell
data ABE where
  Num     :: Int -> ABE
  Plus    :: ABE -> ABE -> ABE
  Minus   :: ABE -> ABE -> ABE
  Boolean :: Bool -> ABE
  And     :: ABE -> ABE -> ABE
  Leq     :: ABE -> ABE -> ABE
  IsZero  :: ABE -> ABE
  If      :: ABE -> ABE -> ABE -> ABE
  deriving (Show,Eq)
```

What changed?

This is Project 0.

# Adding Identifiers

---

## bind and identifiers

Things we need to do:

1. Concrete Syntax (`t::=t...`)
2. Inference Rules (antecedents and consequents)
3. Abstract Syntax (`data ABE ...`)
4. Interpreter (`eval t`)

`bind` creates a _binding_ between an _identifier_ and a _value_. Normally called `let`.

Some examples to ponder when defining a variable using `bind`.

```other
bind x=5 in x+x
== 5+5
== 10
```

```other
bind x=5 in
  bind y=6 in
    x+y
== bind y=6 in 5+y
== 5+6
== 11
```

```other
bind x=5 in
  bind x=6 in
    x+x
== bind x=6 in x+x
== 6+6
== 12
```

```other
bind x=5 in
  bind x=6+x in
    x+x
== bind x=6+5 in x+x
== bind x=11 in x+x
== 11+11
== 22
```

```other
bind x=5 in
  x + bind y=6 in x+y
== 5 + bind y=6 in 5+y
== 5 + 5 + 6
== 5 + 11
== 16
```

```other
bind x=5 in
  x + bind x=6 in x+x
== 5 + bind x=6 in x+x
== 5 + 6 + 6
== 5 + 12
== 17
```

```other
bind x=5 in x + y
== 5 + y
== BOOM
```

```other
bind x=x+1 in x
== BOOM
```

## Concrete Syntax

```
BAE ::= num
      | BAE + BAE
      | BAE - BAE
      | (BAE)
      | bind ID = BAE in BAE
      | ID

ID ::= string
```

### Useful Definitions

- instance - occurrence of an identifier

```other
bind ->x = ->x+5 in ->y-4
```

- binding instance - where an identifier is declared and given a value

```other
bind ->x = x+5 in y-4
```

- bound value - value given to an identifier in a binding instance

```other
bind x = ->(x+5) in x-4
```

- scope - the region where an identifier is defined and can be used

```other
bind x = x+5 in [x-4]
```

- bound instance - where an identifier is used _in scope_

```other
bind x = x+5 in [->x-4]
```

- free instance - where an identifier is used _outside scope_

```other
bind x = ->x+5 in x-4
```

```other
bind x=5 in
  bind y=6 in
    x+y+z
```

What is the scope for a
variable defined with `bind`? Everything after `in`.

## Inference Rules for `bind` and Identifiers

First definition will use substitution.

### Substitution operator

- $[x \rightarrow v]t$ - Replace all _free_ instances of $x$ in $t$ with $v$
- $[x\rightarrow 5]3 == 3$
- $[x\rightarrow 5]x == 5$
- $[x\rightarrow 5]5+5 ==$
- $[x\rightarrow 7]$ `bind x=7 in bind y=5 in x+y` $==$ `bind x=7 in bind y=5 in x+y`
- $[x\rightarrow 7]$ `bind y=5 in x+y` $==$ `bind y=5 in 7+y`
- $[x\rightarrow 5]$ `x + bind x=10 in x` $==$ `5 + bind x=10 in x`

Substitution is a common mathematical operator that we will assume exists.

### Inference Rules - `bind`

$\begin{prooftree} \AXC{$a\Downarrow v_a$}\AXC{$[i\rightarrow v_a]s\Downarrow v_s$}\RLS{BindE} \BIC{$\mathsf{bind}\ i=a\ \mathsf{in}\ s\Downarrow v_s$} \end{prooftree}$

`bind x=3+2 in x+x`

#### Explanation

- $a$ evaluates to $v_a$
- substitute $v_a$ for $i$ in the body of `bind`, evaluate to $v_s$
- Result is $v_s$

### Inference Rules - Identifiers

$\begin{prooftree} \AXC{}\RLS{IDE} \UIC{$x\Downarrow\bot$} \end{prooftree}$

#### Explanation

- Evaluating an identifier means the identifier was not replaced
- No `bind` defined the identifier or it would no longer be there

```
bind x = 3 in y == ???
```

## Abstract Syntax

Add constructors for new constructs in concrete syntax:

```haskell
data BAE where
  Num   :: Int -> BAE
  Id    :: String -> BAE
  Plus  :: BAE -> BAE -> BAE
  Minus :: BAE -> BAE -> BAE
  Bind  :: String -> BAE -> BAE -> BAE
  deriving (Show,Eq)
```

## Evaluation

```haskell
eval :: BAE -> Maybe BAE
eval (Num x) = return (Num x)
eval (Id s) = Nothing
eval (Plus l r) = do { (Num x) <- eval l;
                       (Num y) <- eval r;
                       return (Num (x+y)) }
eval (Minus l r) = do { (Num x) <- eval l;
                        (Num y) <- eval r;
                        if x<y then Nothing else return (Num (x-y)) }
eval (Bind i a b) = do { a' <- eval a;
                         eval (subst i a' b) }
```

We will need to define substitution for this to work:

```haskell
subst :: String -> BAE -> BAE -> BAE
subst _ _ (Num n) = (Num n)
subst x v (Id x') = if x==x' then v else (Id x')
subst x v (Plus l r) = (Plus (subst x v l) (subst x v r))
subst x v (Minus l r) = (Minus (subst x v l) (subst x v r))
-- The bound value v' is outside the scope of x', so always substitute there;
-- the body t' is only touched when x' does not shadow x.
subst x v (Bind x' v' t') = if x==x'
                            then (Bind x' (subst x v v') t')
                            else (Bind x' (subst x v v') (subst x v t'))
```

```haskell
eval (Bind "x" (Num 5)
       (Bind "y" (Num 6)
         (Plus (Id "x") (Id "y"))))
== (Bind "y" (Num 6) (Plus (Num 5) (Id "y")))
== (Plus (Num 5) (Num 6))
== (Num 11)
```

```haskell
eval (Bind "x" (Num 5)
       (Plus (Id "x") (Id "x")))
== (Plus (Num 5) (Num 5))
== (Num 10)
```

```haskell
eval (Bind "x" (Num 5)
       (Bind "x" (Num 6)
         (Plus (Id "x") (Id "x"))))
== (Bind "x" (Num 6) (Plus (Id "x") (Id "x")))
== (Plus (Num 6) (Num 6))
== (Num 12)
```

```haskell
eval (Bind "x" (Num 5)
       (Bind "y" (Num 5)
         (Bind "x" (Num 6)
           (Bind "y" (Num 4)
             (Bind "z" (Num 7)
               (Id "z"))))))
==
```

Anything wrong here?

This is a _reference interpreter_ that defines what we want BAE to be, but not necessarily its implementation.

- Natural language
- Inference rules
- Reference interpreter

# Deferring Substitution

How would you build this interpreter "for real"?

Instead of immediately substituting, let's remember bindings of identifiers to values.

`[("x",5),("y",7)]`

Old Way:

```other
eval bind x = 5 in
       bind y = 6 in
         x + y
== bind y=6 in 5+y
== 5+6
== 11
```

New Way:

```other
eval bind x = 5 in      [("x",5)] -- "x" is 5 in this scope
       bind y = 6 in    [("y",6),("x",5)]
         5 + 6          [("y",6),("x",5)]
== 11
```

- Environment - list of identifiers and values currently in scope

```haskell
eval (Bind "x" (Num 5)               -- [("x",5)]
       (Bind "y" (Num 6)             -- [("y",6),("x",5)]
         (Plus (Id "x") (Id "y"))))  -- [("y",6),("x",5)]
```

```haskell
eval (Bind "x" (Num 5)     -- [("x",5)]  Environment
       (Bind "x" (Num 6)   -- [("x",6),("x",5)]  Shadowing "x"
         (Plus 6 6)))      -- [("x",6),("x",5)]
```

```haskell
eval (Bind "x" (Num 5)         -- [("x",5)]
       (Bind "x" (Num 6)       -- [("x",6),("x",5)]
         (Plus 6 (Id "y"))))   -- [("x",6),("x",5)]  Lookup of y fails
== BOOM
```

```haskell
eval (Bind "x" (Num 5)         -- [("x",5)]
       (Plus 5                 -- [("x",5)]
         (Bind "x" (Num 6)     -- [("x",6),("x",5)]
           (Plus 6 6))))       -- [("x",6),("x",5)]
== 17
```

```haskell
eval (Bind "x" (Num 5)         -- [("x",5)]
       (Plus
         (Bind "x" (Num 6)     -- [("x",6),("x",5)]
           (Plus 6 6))         -- [("x",6),("x",5)]
         5))                   -- [("x",5)]
== 17
```

Now more inference rules?

$\begin{prooftree} \AXC{}\AXC{} \BIC{$\mathsf{bind}\; x=v\;\mathsf{in}\ t\Downarrow v$} \end{prooftree}$

Why or why not?
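The shadowing in these traces comes for free if the environment is an association list and new bindings go on the front. A small sketch, assuming a simplified environment of plain `Int` values; the names `Env` and `extend` are chosen here for illustration and are not part of the notes:

```haskell
-- Simplified environment: identifiers mapped to Ints, newest binding first.
type Env = [(String, Int)]

-- Entering a bind pushes the new binding on the front of the environment.
extend :: String -> Int -> Env -> Env
extend x v env = (x, v) : env

-- bind x = 5 in bind x = 6 in ...
inner :: Env
inner = extend "x" 6 (extend "x" 5 [])

-- Prelude's lookup returns the first match, so the inner "x" wins.
shadowed :: Maybe Int
shadowed = lookup "x" inner

-- An identifier with no binding is the BOOM case.
unbound :: Maybe Int
unbound = lookup "y" inner
```

Because `lookup` stops at the first match, the outer binding of `x` is still in the list but is never seen while the inner one is in scope. Leaving the inner scope just means going back to the shorter list.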
## eval with env

Cool Haskell built-ins:

```haskell
type Env = [(String,BAE)]

lookup :: Eq a => a -> [(a,b)] -> Maybe b

x:xs -- Adds x to front of xs
     -- Pattern matches with non-empty list (that's kinda cool)
```

Let's define `eval`:

`return` == `Just`

```haskell
eval :: Env -> BAE -> Maybe BAE
eval _ (Num n) = Just (Num n)
eval e (Plus l r) = do { (Num l') <- (eval e l) ;
                         (Num r') <- (eval e r) ;
                         return (Num (l'+r')) }
eval e (Minus l r) = do { (Num l') <- (eval e l) ;
                          (Num r') <- (eval e r) ;
                          if r'<=l' then return (Num (l'-r')) else Nothing }
eval e (Bind x a s) = do { a' <- (eval e a) ;
                           eval ((x,a'):e) s }
eval e (Id x) = (lookup x e)
```

Good things about `eval` using deferred substitution:

- No more `subst`
- No more walking the code

We will say our original eval function performs _direct substitution_ and will use the name `evals`.

We will say our new eval function performs _deferred substitution_ and will use the name `eval`.

How do we know our new `eval` is correct?

$\forall t:BAE\cdot \mathsf{eval}\; []\; t = \mathsf{evals}\; t$

What does this say? Can we do it?

# Predicting Behavior

Start with a tiny language with representative terms:

```haskell
data BAE where
  Nat  :: Int -> BAE
  Plus :: BAE -> BAE -> BAE
  Bind :: String -> BAE -> BAE -> BAE
  Id   :: String -> BAE
```

## Abstract Interpretation

Can we predict when a returned value is odd or even?

```haskell
data EOVal where
  Even :: EOVal
  Odd  :: EOVal
  deriving (Show,Eq)
```

```haskell
-- c is the environment, here mapping identifiers to Even or Odd
predict :: [(String,EOVal)] -> BAE -> Maybe EOVal
predict _ (Nat n) = return (if even n then Even else Odd) -- Just Even means n is even
predict c (Plus l r) = do { l' <- predict c l ;
                            r' <- predict c r ;
                            return (if (l'==r') then Even else Odd) }
predict c (Bind x a s) = do { a' <- (predict c a) ;
                              predict ((x,a'):c) s }
predict c (Id x) = (lookup x c)
```

The `predict` function implements _abstract interpretation_, which makes predictions without running code.
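One way to build confidence in a prediction is to compare it with what actually happens when the term is run. The sketch below pairs the parity predictor with a plain integer evaluator over the same tiny language and checks that they agree. The names `run`, `parity`, and `agrees` are made up for this example, and the data types use standard (non-GADT) syntax so that no extension is required.

```haskell
data BAE = Nat Int
         | Plus BAE BAE
         | Bind String BAE BAE
         | Id String
         deriving (Show, Eq)

data EOVal = Even | Odd deriving (Show, Eq)

-- Abstract interpreter: tracks only parity, never the number itself.
predict :: [(String, EOVal)] -> BAE -> Maybe EOVal
predict _ (Nat n)      = return (if even n then Even else Odd)
predict c (Plus l r)   = do { l' <- predict c l
                            ; r' <- predict c r
                            ; return (if l' == r' then Even else Odd) }
predict c (Bind x a s) = do { a' <- predict c a
                            ; predict ((x, a') : c) s }
predict c (Id x)       = lookup x c

-- Concrete interpreter: same shape, but the environment holds real Ints.
run :: [(String, Int)] -> BAE -> Maybe Int
run _ (Nat n)      = return n
run e (Plus l r)   = do { l' <- run e l; r' <- run e r; return (l' + r') }
run e (Bind x a s) = do { a' <- run e a; run ((x, a') : e) s }
run e (Id x)       = lookup x e

parity :: Int -> EOVal
parity n = if even n then Even else Odd

-- The prediction is right for t when it matches the parity of the real result
-- (and both fail together when an identifier is unbound).
agrees :: BAE -> Bool
agrees t = predict [] t == fmap parity (run [] t)
```

For example, `agrees (Bind "x" (Nat 3) (Plus (Id "x") (Nat 4)))` holds: `predict` answers `Just Odd` without ever computing `7`. Checking a handful of terms this way is testing, not proof, but it is the same shape of statement as the `eval`/`evals` equivalence above.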
Let's add some new structures to our language and look at a different kind of prediction:

```haskell
data BAE where
  Nat     :: Int -> BAE
  Boolean :: Bool -> BAE
  Plus    :: BAE -> BAE -> BAE
  And     :: BAE -> BAE -> BAE
  If      :: BAE -> BAE -> BAE -> BAE
  Bind    :: String -> BAE -> BAE -> BAE
  Id      :: String -> BAE
  deriving (Show,Eq)
```

Instead of `Odd` and `Even` let's use `TNum` and `TBool` for numbers and booleans:

```haskell
data TBAE where
  TNum  :: TBAE
  TBool :: TBAE
  deriving (Show,Eq)
```

Can we calculate the number-ness or boolean-ness of a term?

```haskell
predict :: [(String,TBAE)] -> BAE -> Maybe TBAE
predict _ (Nat _) = return TNum
predict _ (Boolean _) = return TBool
predict g (Plus l r) = do { TNum <- predict g l ;
                            TNum <- predict g r ;
                            return TNum }
predict g (And l r) = do { TBool <- predict g l ;
                           TBool <- predict g r ;
                           return TBool }
predict g (If c t e) = do { TBool <- predict g c ;
                            t' <- predict g t ;
                            e' <- predict g e ;
                            if t'==e' then return t' else Nothing }
predict g (Bind x a s) = do { a' <- (predict g a) ;
                              predict ((x,a'):g) s }
predict g (Id x) = (lookup x g)
```

What is `predict` in this case?

## Type Checking

As you might have guessed, we just wrote a type checker for our little language. Let's formalize this idea using inference rules. First, a few definitions:

- $x : T$ - typing relation saying `x` is of type `T`. Just like $x\Downarrow t$

$ \begin{prooftree} \AXC{}\RLS{NumT} \UIC{$N:TNum$} \end{prooftree} $

$ \begin{prooftree} \AXC{$l:TNum$}\AXC{$r:TNum$}\RLS{PlusT} \BIC{$l+r : TNum$} \end{prooftree} $

$ \begin{prooftree} \AXC{$l:TBool$}\AXC{$r:TBool$}\RLS{AndT} \BIC{$l \wedge r : TBool$} \end{prooftree} $

$ \begin{prooftree} \AXC{$c:TBool$}\AXC{$t:T$}\AXC{$e:T$}\RLS{IfT} \TIC{$if\ c\ then\ t\ else\ e : T$} \end{prooftree} $

- _context_ - a list of identifiers and _types_ in scope, represented by $\Gamma$
- $\Gamma\vdash x:T$ - $\Gamma$ derives that $x$ has type $T$, that is, $x$ has type $T$ given context $\Gamma$.

_Context_ is the same as _Environment_ except it contains types.
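To watch the checker accept and reject terms, here is a self-contained sketch of the same idea with the context made explicit. The names `typeof` and `Cont` are chosen for this example, and the types are written in standard (non-GADT) syntax so the block loads without extensions; treat it as one possible arrangement, not the course's reference code.

```haskell
data BAE = Nat Int
         | Boolean Bool
         | Plus BAE BAE
         | And BAE BAE
         | If BAE BAE BAE
         | Bind String BAE BAE
         | Id String
         deriving (Show, Eq)

data TBAE = TNum | TBool deriving (Show, Eq)

-- A context is an environment that holds types instead of values.
type Cont = [(String, TBAE)]

typeof :: Cont -> BAE -> Maybe TBAE
typeof _ (Nat _)      = return TNum
typeof _ (Boolean _)  = return TBool
typeof g (Plus l r)   = do { TNum <- typeof g l     -- a non-TNum operand
                           ; TNum <- typeof g r     -- fails the pattern,
                           ; return TNum }          -- which yields Nothing
typeof g (And l r)    = do { TBool <- typeof g l
                           ; TBool <- typeof g r
                           ; return TBool }
typeof g (If c t e)   = do { TBool <- typeof g c
                           ; t' <- typeof g t
                           ; e' <- typeof g e
                           ; if t' == e' then return t' else Nothing }
typeof g (Bind x a s) = do { a' <- typeof g a
                           ; typeof ((x, a') : g) s }
typeof g (Id x)       = lookup x g

-- bind x = 1 in if true then x else 2   : well typed
good :: BAE
good = Bind "x" (Nat 1) (If (Boolean True) (Id "x") (Nat 2))

-- bind x = true in x + 1                : x is a boolean, Plus wants numbers
bad :: BAE
bad = Bind "x" (Boolean True) (Plus (Id "x") (Nat 1))
```

`typeof [] good` is `Just TNum` and `typeof [] bad` is `Nothing`, and neither term is ever evaluated. The failed pattern match inside `do` is what turns a type mismatch into `Nothing`.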
$ \begin{prooftree} \AXC{$(x:T)\in \Gamma$}\RLS{Name} \UIC{$\Gamma\vdash x : T$} \end{prooftree} $

$ \begin{prooftree} \AXC{$\Gamma\vdash a:T_a$}\AXC{$(x:T_a):\Gamma\vdash t:T$}\RLS{BindT} \BIC{$\Gamma\vdash bind\ x=a\ in\ t : T$} \end{prooftree} $

## Optimization

We know several identities over our calculations:

- $0+x==x$
- $True \wedge x == x$
- `if` $True$ `then` $t$ `else` $e == t$

Can we optimize such expressions out of a program?

```haskell
optimize :: BAE -> BAE
optimize (Nat n) = (Nat n)
optimize (Boolean b) = (Boolean b)
optimize (Plus l r) = if (optimize l)==(Nat 0)
                      then (optimize r)
                      else (Plus (optimize l) (optimize r))
-- And follows the same pattern as Plus, using True /\ x == x
optimize (And l r) = if (optimize l)==(Boolean True)
                     then (optimize r)
                     else (And (optimize l) (optimize r))
optimize (If c t e) = if (optimize c)==(Boolean True)
                      then (optimize t)
                      else (If (optimize c) (optimize t) (optimize e))
optimize (Bind x a s) = (Bind x (optimize a) (optimize s))
optimize (Id x) = (Id x)
```

## General Purpose Evaluators

Let's change our data structure just a bit:

```haskell
data BAE a where
  Nat  :: a -> BAE a
  Plus :: BAE a -> BAE a -> BAE a
  Bind :: String -> BAE a -> BAE a -> BAE a
  Id   :: String -> BAE a
```

```
predict :: ???
predict _ (Nat n) =
predict c (Plus l r) =
predict c (Bind x a s) =
```

# Lambda The Ultimate!

Now things get serious...

```other
inc x = x + 1
```

```other
inc 3
== 3 + 1
== 4
```

```other
inc ((5+1)-3)
== ((5+1)-3)+1
== 4
or
== inc 3
== 3+1
== 4
```

Some nifty definitions:

`inc x = x + 1`

- `x` - formal parameter
- `x + 1` - body

```other
inc 3
```

- `3` - actual parameter
- `inc 3` - application of `inc` to `3`

How do we evaluate function application?

- $inc\ 3 == [x\rightarrow 3]x+1$

## Kinds of Functions in Languages

- First order functions - Cannot take other functions as arguments. Have special representation in the language that is not accessible

```
int foo(int x) { return x + 1; }
```

- Higher order functions - Can take functions as arguments. Can return functions.
```other
map foo [1,2,3] == [2,3,4]
```

- First class functions - Functions are values like any other value in the language

```haskell
foo = \x -> x+1
-- equivalently
foo x = x + 1
```

- We will look at languages with first-class functions
- First-class functions are the current trajectory of modern languages
- ... it's been a long time coming.

## Concrete Syntax

- `lambda x in x + x`
    - Defines a function value over _formal parameter_ `x`
    - Called a `lambda` or an `abstraction`.
    - Lambdas are values
    - From a logical perspective, `lambda` introduces a variable
- `(l a)`
    - Applies a function `l` to _actual parameter_ or _argument_ `a`
    - Called an _application_ or simply an _app_
    - From a logical perspective, `app` eliminates a variable.

```other
((lambda x in x+x) 3)
== 3+3
== 6
```

Have you seen this kind of evaluation before?

```
((lambda x in x+x) 3) == bind x=3 in x+x
```

```other
FBAE ::= V
       | FBAE+FBAE
       | FBAE-FBAE
       | bind id = FBAE in FBAE
       | lambda id in FBAE
       | (FBAE FBAE)
       | id

V ::= Nat | lambda id in FBAE
```

```
(lambda x in x)
== (lambda x in x)
```

```
((lambda x in x) 5)
== [x->5]x
== 5
```

```
((lambda x in x) (lambda x in x))
== [x->(lambda x in x)]x
== (lambda x in x)
```

```
(((lambda x in (lambda y in x+y)) 3) 2)
== ([x->3](lambda y in x+y) 2)
== ([y->2](3+y))
== 3+2
== 5

x->3
(lambda y in x+y)
y
x+y

(lambda x y z ...) == (lambda x in (lambda y in (lambda z in ...)))
```

```
((lambda x in (lambda y in x+y)) 1)
== [x->1](lambda y in x+y)
== (lambda y in 1+y)

Nat->Nat->Nat
Nat->Nat
```

```
((lambda x in (x 3)) (lambda x in x))
== [x->(lambda x in x)](x 3)
== ((lambda x in x) 3)
== [x->3]x
== 3
```

```
bind inc=(lambda x in x+1) in
  (inc 3)
== [inc->(lambda x in x+1)](inc 3)
== ((lambda x in x+1) 3)
...
== 3+1
== 4
```

```
bind inc=(lambda x in x+1) in
  bind dec=(lambda x in x-1) in
    bind sqr=(lambda x in x*x) in
      (inc (sqr (sqr 3)))
...
```

Alternate concrete syntax: $\lambda x.s$

More common in mathematical presentations.

## Abstract Syntax

```haskell
data FBAE where
  Num    :: Int -> FBAE
  Plus   :: FBAE -> FBAE -> FBAE
  Minus  :: FBAE -> FBAE -> FBAE
  Bind   :: String -> FBAE -> FBAE -> FBAE
  Lambda :: String -> FBAE -> FBAE
  App    :: FBAE -> FBAE -> FBAE
  Id     :: String -> FBAE
  deriving (Show,Eq)
```

## Inference Rules

$ \begin{prooftree} \AXC{$f\Downarrow (\mathsf{lambda}\ i\ \mathsf{in}\ s)$} \AXC{$a\Downarrow v_a$} \AXC{$[i\rightarrow v_a]s\Downarrow v_s$}\RLS{Beta} \TIC{$(f\ a)\Downarrow v_s$} \end{prooftree} $

$ \begin{prooftree} \AXC{}\RLS{LambdaE} \UIC{$\mathsf{lambda}\ i\ \mathsf{in}\ s\Downarrow \mathsf{lambda}\ i\ \mathsf{in}\ s$} \end{prooftree} $

Remember bind?

$\begin{prooftree} \AXC{$a\Downarrow v_a$}\AXC{$[i\rightarrow v_a]s\Downarrow v_s$}\RLS{BindE} \BIC{$\mathsf{bind}\ i=a\ \mathsf{in}\ s\Downarrow v_s$} \end{prooftree}$

What's the difference here?

We can define `bind` using `lambda`:

```
bind i=a in s == ((lambda i in s) a)
```

- This is called a _derived form_

## Church's Lambda Calculus

- Can represent _any_ computable function
- Equivalent to Turing Machines
- Basis for functional programming
    - Turing Machines give rise to _imperative programming_
    - Lambda Calculus gives rise to _functional programming_

### Concrete Syntax

```other
LC ::= id
     | lambda id in LC
     | (LC LC)

V ::= lambda id in LC
```

$ \begin{prooftree} \AXC{$f\Downarrow \mathsf{lambda}\ i\ \mathsf{in}\ s$} \AXC{$a\Downarrow v_a$} \AXC{$[i\rightarrow v_a]s\Downarrow v_s$}\RLS{CBV Beta} \TIC{$(f\ a)\Downarrow v_s$} \end{prooftree} $

$ \begin{prooftree} \AXC{$f\Downarrow \mathsf{lambda}\ i\ \mathsf{in}\ s$} \AXC{$[i\rightarrow a]s\Downarrow v_s$}\RLS{CBN Beta} \BIC{$(f\ a)\Downarrow v_s$} \end{prooftree} $

That's it!
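The two Beta rules differ only in whether the argument is evaluated before it is substituted. A minimal sketch of both, transcribed from the rules; the names `evalCBV`, `evalCBN`, and `omega` are chosen for this example. The substitution here is the naive one, which is only safe under the assumption that the whole program is closed (no free identifiers), so nothing substituted can be captured.

```haskell
data LC = Id String
        | Lambda String LC
        | App LC LC
        deriving (Show, Eq)

-- Naive substitution [i -> v]t; assumes v is closed, so capture cannot occur.
subst :: String -> LC -> LC -> LC
subst i v (Id x)       = if i == x then v else Id x
subst i v (Lambda x s) = if i == x then Lambda x s else Lambda x (subst i v s)
subst i v (App f a)    = App (subst i v f) (subst i v a)

-- CBV Beta: evaluate the argument to a value, then substitute.
evalCBV :: LC -> Maybe LC
evalCBV (Id _)       = Nothing
evalCBV (Lambda i s) = Just (Lambda i s)
evalCBV (App f a)    = do { Lambda i s <- evalCBV f
                          ; a' <- evalCBV a
                          ; evalCBV (subst i a' s) }

-- CBN Beta: substitute the argument unevaluated.
evalCBN :: LC -> Maybe LC
evalCBN (Id _)       = Nothing
evalCBN (Lambda i s) = Just (Lambda i s)
evalCBN (App f a)    = do { Lambda i s <- evalCBN f
                          ; evalCBN (subst i a s) }

-- A term with no value: (lambda x in x x)(lambda x in x x)
omega :: LC
omega = App w w
  where w = Lambda "x" (App (Id "x") (Id "x"))
```

The difference shows up on `App (Lambda "x" (Lambda "y" (Id "y"))) omega`: `evalCBN` discards the unused argument and returns `Just (Lambda "y" (Id "y"))`, while `evalCBV` tries to evaluate `omega` first and never returns.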
### Fun with Lambda ``` (lambda x in x) is a value ``` ``` (lambda y in y)(lambda x in x) == (lambda x in x) ``` ``` (lambda x in x x) 3 == [x->3](x x) == (3 3) ``` ``` (lambda x in x x)(lambda y in y) == [x->(lambda y in y)](x x) == ((lambda y in y) (lambda y in y)) == [x->(lambda y in y)]y == (lambda y in y) ``` ``` (lambda x in x x)(lambda y in y y) == [x->(lambda y in y y)](x x) == ((lambda x in x x) (lambda x in x x)) == [x->(lambda x in x x)](x x) == ((lambda x in x x) (lambda x in x x)) ``` Omega combinator ## Curried Functions ``` f = (lambda x in (lambda y in x + y)) f 3 4 == ((f 3) 4) == ``` ## Evaluation First lets add to our direct substitution interpreter. Call-by-value ```haskell eval Lambda i s = return (Lambda i s) eval (Id _) = Nothing eval App f a = do {(Lambda i s) <- eval f ; a' <- eval a ; eval (subst a' i s)} eval Bind i a s = do {a' <- eval a ; eval (subst a' i s)} ``` ==Midterm Stops Here== [[2023-10-03-Midterm-Discussion|Midterm Discussion]] ``` ((Lambda i s) a) == bind i a s ``` ``` bind f = (lambda x in x) in (f 2) == ``` ``` bind n = 1 in bind f = (lambda x in x + n) in bind n = 2 in f 1 == bind f = (lambda x in x + 1) in bind n = 2 in f 1 == bind n = 2 in (lambda x in x + 1) 1 == (lambda x in x+1) 1 == [x->1]x+1 == 1+1 == 2 ``` `(f a) -> (App f a)` ``` f(x)=x+1 == bind f = (lambda x in x+1) f(x,y)=x+y == bind f = (lambda x in (lambda y in x+y)) == f(x)=(lambda y in x+y) ``` Same problem as before - inefficient and kind of clumsy Try again with environments: ``` Env = [(string,FBAE)] eval e (Lambda i s) = return (Lambda i s) eval e (App f a) = do { (Lambda i s) <- eval e f; a' <- eval e a; eval ((i,a'):e) s } eval e (Id i) = (lookup i e) ``` ``` bind f = (lambda x in x) in [(f,(lambda x in x))] (f f) '' == ((lambda x in x)(lambda x in x)) [(x,(lambda x in x)),(f,(lambda x in x))] == (lambda x in x) ``` ``` bind n = 1 in [(n,1)] bind f = (lambda x in x + n) in [(f,(lambda x in x + n)),(n,1)] bind n = 2 in [(n,2),(f,(lambda x in x + 
n)),(n,1)] f 1 == (lambda x in x + n) 1 [(x,1),(n,2),(f,(lambda x in x + n)),(n,1)] == x + n [(x,1),(n,2),(f,(lambda x in x + n)),(n,1)] == 1 + 2 == 3 ``` Oops... ## Static and Dynamic Scoping - Static Scoping - Identifier scope in a lambda is the scope where a lambda is _defined_. - Dynamic Scoping - Identifier scope in a lambda is the scope where a lambda is _used_. ``` bind n = 1 in. [] bind f = (lambda x in x + n) in [] bind n = 2 in [] f 1 ``` How do we fix this problem? *Closures* implement static scoping by including an environment in the function value. Naively: ``` data FBAE where ... Closure :: String -> FBAE -> Env -> FBAE ... ``` ``` (lambda x in x+1 [(n,3))] ``` We will keep a copy of the definition environment in the closure - `Env` in the argument list. Closures are now function values. Introducing a Value type returned by the interpreter ``` data FBAEVal where NumV :: Int -> FBAEVal ClosureV :: String -> FBAE -> Env -> FBAEVal eval :: Env -> FBAE -> (Maybe FBAEVal) ``` How do we: 1. Use the return value 2. Use the closure for static scoping ``` eval e (Num n) = return (NumV n) eval e (Plus l r) = do { (NumV l') <- (eval e l) ; (NumV r') <- (eval e r) ; return (NumV (l'+r'))} eval e (Lambda i s) = return (ClosureV i s e) eval e (App f a) = do {(ClosureV i s e') <- (eval e f) ; a' <- (eval e a) ; eval ((i,a'):e') s }} ``` Where are closures in the evaluator? Let's try the evaluation one more time. ``` bind n = 1 in [(n,1)] bind f = (lambda x in x + n) in [(f,(ClosureV x x+n [(n,1)])),(n,1)] bind n = 2 in [(n,2),(f,(ClosureV x x+n [(n,1)])),(n,1)] f 1 == (ClosureV x x+n [(n,1)]) 1 == x+n [(x,1),(n,1)] == 1+1 == 2 ``` - Immediate substitution $\Rightarrow$ static scoping - Deferred substitution $\Rightarrow$ dynamic scoping - Deferred substitution + closures $\Rightarrow$ static scoping ## Derived Forms Defining new language constructs in terms of existing language constructs `unless` as an example ``` unless ... 1. unless c e or e unless c 2. 
Unless :: BBAE -> BBAE -> BBAE
3. unless c e == if c then false else e
4.
```

$ \begin{prooftree} \AXC{$c\Downarrow False$}\AXC{$e\Downarrow v$}\RLS{UnlessF} \BIC{$\mathsf{unless}\ c\ e\Downarrow v$} \end{prooftree} $

$ \begin{prooftree} \AXC{$c\Downarrow True$}\RLS{UnlessT} \UIC{$\mathsf{unless}\ c\ e\Downarrow False$} \end{prooftree} $

```haskell
-- 5.
eval (Unless c e) = do { (Boolean c') <- eval c;
                         if c' then return (Boolean False) else eval e }
-- 6.
```

$ \begin{prooftree} \AXC{$\Gamma\vdash c:TBool$} \AXC{$\Gamma\vdash e:TBool$} \RLS{TUnless} \BIC{$\Gamma\vdash\mathsf{unless}\ c\ e:TBool$} \end{prooftree} $

- _Elaboration_ - Defining new language constructs in terms of existing constructs
  1. Define concrete syntax
  2. Define abstract syntax
  3. _Define elaboration function_
  4. Define type rules
  5. Extend type inference function
- Embedded Language - New language with extensions
- Host Language - Target language for translation

```haskell
elab :: Embedded Language AST -> Host Language AST
```

To evaluate, translate the embedded language to the host language and execute as usual:

```haskell
evale t = eval [] (elab t)
```

or

```haskell
evale = eval [] . elab
```

Let's introduce a new expression in FBAE that performs increment. Sort of like ++ in C. In fact, we could use the notation `t++` to represent this if we wanted. To keep things simple, let's use `inc t`.

Here is the abstract syntax for the host language, FAE:

```haskell
data FAE where
  Num :: Int -> FAE
  Plus :: FAE -> FAE -> FAE
  Lambda :: String -> FAE -> FAE
  App :: FAE -> FAE -> FAE
  Id :: String -> FAE
```

This is just FBAE without `bind`

Here is the abstract syntax for our embedded language

```haskell
data FAEX where
  NumX :: Int -> FAEX
  PlusX :: FAEX -> FAEX -> FAEX
  LambdaX :: String -> FAEX -> FAEX
  AppX :: FAEX -> FAEX -> FAEX
  IdX :: String -> FAEX
  IncX :: FAEX -> FAEX
```

What changed?
Concrete syntax for the new `inc` function:

```haskell
inc x == x+1
```

Now an inference rule for `inc`:

$ \begin{prooftree} \AXC{$x\Downarrow v-1$} \UIC{$inc\ x\Downarrow v$} \end{prooftree} $

Does this work? Do we even need it?

Now our elaborator:

```haskell
elab :: FAEX -> FAE
elab (NumX n) = Num n
elab (PlusX l r) = Plus (elab l) (elab r)
elab (LambdaX i b) = Lambda i (elab b)
elab (AppX f a) = App (elab f) (elab a)
elab (IdX s) = Id s
elab (IncX x) = Plus (elab x) (Num 1)
```

And finally `eval` that calls `elab` before evaluating `t`:

```haskell
evalX t = eval [] (elab t)
```

What about type checking? Anything wrong with this?

```haskell
typeofX t = typeof [] (elab t) ???
```

We really do need a type checker for our embedded language.

```haskell
type Con = [(String,FAEType)]

typeofX :: Con -> FAEX -> Maybe FAEType
typeofX _ (NumX _) = return TNum
typeofX c (PlusX l r) = do { TNum <- typeofX c l;
                             TNum <- typeofX c r;
                             return TNum }
...
typeofX c (IncX t) = do { TNum <- typeofX c t ;
                          return TNum }
typeofX c (IdX s) = lookup s c

interpX t = do { typeofX [] t;
                 evalX t }
```

What about `bind`?

`bind i = v in t == (app (lambda i t) v)`

1. Concrete Syntax - already defined for FBAE
2. Abstract Syntax - already defined for FBAE
3. Evaluation Rules - already defined for FBAE
4. Elaboration - some work required
5. Type Rules - already defined for FBAE
6. typeof - already defined for FBAE

Let's define the `elab` case for `bind`:

```
elab (Bind i a t) = (App (Lambda i (elab t)) (elab a))
```

# Recursion

## Dynamically Scoped Recursion

```
bind fact =                                         []
  lambda x in if x=0 then 1 else x * (fact x-1) in
  (fact 3)
== (fact 3)                                         [(fact,(lambda x in ...))]
== ((lambda x in if x=0 then 1 else x * (fact x-1)) 3)
== if x=0 then 1 else x * (fact x-1)                [(x,3),(fact,(lambda x in ...))]
== if 3=0 then 1 else 3 * ((lambda x in ...) 3-1)
== 3 * ((lambda x in ...) 2)
== 3 * if x=0 then 1 else x * (fact x-1)            [(x,2),(x,3),(fact,(lambda x in ...))]
== 3 * if 2=0 then 1 else 2 * ((lambda x in ...)
2-1) == 3 * 2 * ((lambda x in ...) 1) == 3 * 2 * if x=0 then 1 else x * (fact x-1) [(x,1),(x,2),(x,3),(fact,(lambda x in ...))] == 3 * 2 * if 1=0 then 1 else 1 * ((lambda x in ...) 0) == 3 * 2 * 1 * ((lambda x in ...) 0) == 3 * 2 * 1 * if 0=0 then 1 else 0 * (fact x-1) [(x,0),(x,1),(x,2),(x,3),(fact,(lambda x in ...))] == 3 * 2 * 1 * 1 == 6 ``` ### Execution Sequence ((lambda x in if x=0 then 1 else x * (fact x-1)) 3) [(fact,(lambda x in ...))] Recursion works like recursion should. Cool ## Statically Scoped Recursion ``` bind fact = [] lambda x in if x=0 then 1 else x * (fact x-1) in [(fact,closure x ... [])] (fact 3) == (fact 3) [(fact,closure x ... [])] == ((closure x ... []) 3) [(fact,closure x ... [])] == if 3=0 then 1 else 3 * (fact 3-1) [(x,3),(fact,closure x ... [])] == 3 * (fact 2) [(x,3),(fact,closure x ... [(fact,closure x ... [])])] == 3 * ((closure x ... []) 2) [] --- CAN'T HAPPEN fact is not in the environment == ``` ### Execution Sequence ((closure x (if x=0 then 1 else x*(fact x-1))) 3) [(fact,closure x ... [])] == if x=0 then 1 else x*(fact x-1) [(x,3)] == 3 * fact 2 [(x,3)] ## Omega (redux) ``` bind o = lambda x in x x o o == (lambda x in x x)(lambda x in x x) == (x x) [(x,(lambda x in x x))] == (lambda x in x x)(lambda x in x x) ``` Does this work with closures? 
``` bind o = lambda x in x x o o == o o [(o,(closure x in x x []))] == (closure x in x x [])(closure x in x x []) [(o,(closure x in x x []))] == (x x) [(x,(closure x in x x []))] == (closure x in x x [])(closure x in x x []) [(o,(closure x in x x []))] ``` ## The Y Combinator ``` bind Y = (lambda f (lambda x in (f (x x))) (lambda x in (f (x x)))) in (Y F) == ((lambda x in (F (x x)))) (lambda x in (F (x x)))) == (F (x x)) [(x,(lambda x in (F (x x)))] == (F ((closure x in (F (x x)) [] (closure x in (F (x x)) [])) == (F (F (x x))) [(x,(lambda x in (F (x x)))] == (F (F (F (x x)))) [(x,(lambda x in (F (x x)))] ``` - `f` is the function being called recursively - `F` the function we're applying recursively, not an identifier The Y Combinator calculates a _fixed point_ for F Calculating `sum` as an example. ``` bind F = (lambda g in (lambda z in if z=0 then z else z + (g (z-1)))) in bind Y = (lambda f (lambda x in (f (x x))) (lambda x in (f (x x)))) in ((Y F) 5) ``` Ummm. Where is `sum` defined here? ``` == (((lambda x in (F (x x)))) (lambda x in (F (x x)))) 5) Apply the first lambda to the second lambda == ((F (x x)) 5) [(x,(lambda x in (F (x x))))] Expand F == (((lambda g in (lambda z in if z=0 then ...)) (x x)) 5) [(x,(lambda x in (F (x x))))] Bind g to (x x) and evaluate the body == (((lambda z in if z=0 then z else z + (g (z-1))) 5) [(g,(x x)),(x,lambda x in (F (x x))))] Bind 5 to z == (z + (g z-1)) [(z,5),(g,(x x)),(x,(lambda x in (F (x x))))] Substitute z == (5 + ((x x) 4))) [(z,5),(g,(x x)),(x,(lambda x in (F (x x))))] Substitute x == (5 + (lambda x in (F (x x)))(lambda x in (F (x x))) 4)) repeat... 
== (5+(4+(3+(2+(1+(lambda x in (F (x x)))(lambda x in (F (x x))) 0)))))) [(z,1),(z,2),(z,3),(z,4),(z,5),(g,(x x)),(x,(lambda x in (F (x x))))] == (5+(4+(3+(2+(1+(((lambda z in if z=0 then 0 else z + (g (z-1))) 0)))))) == (5+(4+(3+(2+(1+0))))) == 15 ``` # Midterm Review - Interpreters - Concrete and Abstract Syntax - Parsing and Interpretation - Values and terms - Predicting types - Optimization - Extending Languages - Simple Inference Rules - Identifiers and Substitution - Binding identifiers to values - Substitution and Interpretation - Environments and Deferring Substitution - Identifier Scoping - Static and Dynamic Scoping - Functions - Taxonomy of functions - first order, higher order, first class - Interpreting functions - Closures for functions There will be very little if any Haskell on the exam. You will be asked to analyze code written in our little languages like FBAE, BAE and friends. I will provide those language definitions for you, so there is no need to memorize them. # Typing functions ``` bind inc = lambda x in x + 1 in (inc inc) ``` This is a problem, but why? ``` bind x=1 in [(x,1)] Evaluation x+1 ``` ``` bind x=1 in [(x,TNum)] Type checking x+1 == x+1 [(x,TNum)] == TNum + TNum == TNum ``` Where do 1 and `TNum` come from? Let's do the same with lambda expressions: ``` (lambda x in [(x,???)] Evaluation x+1) == x+1 [(x,???)] (lambda x in [(x,???)] Type checking x+1) ((lambda x in x+1) 1) Now what??? ``` What's the issue here? - `T->T` is a _function type_ or _signature_ Our concrete syntax for types becomes: ``` T ::= TNum | TBool | T -> T ``` `->` can be thought of as a type constructor. We will often write `D->R` where: - D - domain type - R - range type What's different about a function type? 
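The type grammar above, with `->` treated as a type constructor, can be sketched as a Haskell data type. This is an illustrative sketch: the type name `TFBAE`, the infix constructor `:->:`, and the helpers `domain` and `range` are assumptions made for the example.

```haskell
-- Types for our language: numbers, Booleans, and function types D -> R.
-- (TFBAE, domain and range are illustrative names.)
data TFBAE = TNum
           | TBool
           | TFBAE :->: TFBAE
           deriving (Show, Eq)

-- Function types associate to the right, so
-- TNum :->: TNum :->: TBool reads as TNum :->: (TNum :->: TBool).
infixr 5 :->:

-- Domain type D of a function type, if there is one.
domain :: TFBAE -> Maybe TFBAE
domain (d :->: _) = Just d
domain _          = Nothing

-- Range type R of a function type, if there is one.
range :: TFBAE -> Maybe TFBAE
range (_ :->: r) = Just r
range _          = Nothing
```

Unlike `TNum` and `TBool`, a function type is built from other types, which is why `domain` and `range` only succeed on the `:->:` case.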
Function types represent promises:

- `TBool->TNum` - Input a Boolean and get back a number
- `TNum->TNum->TNum` - Input a number and get back a function from number to number
- `TNum->TNum->TBool` - Input a number and get back a function from number to Boolean

Curry-Howard says function types are also theorems. We'll come back to that.

Unsurprisingly, `lambda` has a function type:

`lambda x in t : D->R`

## Finding D and R

Let's start with `bind`

What is the type of `bind`?

```
[] |- bind x=1 in x+1 : T
```

$\Gamma=[\ ]$ derives `bind...` is of some type `T`

`T` is the type of the _body_ of bind. That's what `bind` returns.

To find the type of the body, add the type of `x` to the empty context:

```
== [(x,TNum)] |- x+1 : T
== TNum + TNum : T
== TNum
```

Where did we get that type?

The type of `bind` is the type of the _body_ with the type of the _identifier_ added to its context

Does the same thing work for lambda?

Let's look at the type of `bind` as `app`

`bind x = 3 in x+1 == ((lambda x in x+1) 3)`

- Where do we get `D`?
- Where do we get `R`?
- What happens when we pull the `lambda` out of the `app`?

`(lambda x in x+1):D->R`

- Where do we get `D`?
- Where do we get `R`?

```
[] |- lambda x in x+1 : D->R
== [(x,???)] |- x+1 :
==
```

- We need the type of `x` to compute the type of the function body.
- Where do we get it?
- `D` is the type of the input parameter that we don't have.
- `R` is the type of the body _given the parameter_ type. Just like `bind`.

Let's assume the type of `x`:

```
[] |- lambda x:TNat in x+1 : TNat->TNat
== [(x,TNat)] |- x+1 : TNat
```

Can we find `D` and `R`?

- The type of `x` is `D` because we assumed it so
- The type of the body assuming `x:D` is `R`

```
[] |- lambda x:TNum in       [(x,TNum)]
        x+1                  [(x,TNum)]
== TNum + TNum
== TNum
== TNum -> TNum
```

If you give the lambda `x:D` you will get `z:R`

What is the scope of `x`?
`f::Int->Int->Int`

`f x::Int->Int`

`f x y::Int`

```
lambda x:TNum in lambda y:TNum in x+y

lambda x:TNum in                 [(x,TNum)]
  lambda y:TNum in               [(y,TNum),(x,TNum)]
    x+y:TNum
  == TNum -> TNum
== TNum -> TNum -> TNum
```

```
lambda f:TBool->TNum in          [(f,TBool->TNum)]
  lambda a:TBool in              [(a,TBool),(f,TBool->TNum)]
    f a : TNum
  == TBool->TNum
== (TBool -> TNum) -> TBool -> TNum
```

## Type Inference for Lambda

The Haskell code:

```haskell
typeof c (Lambda x d t) = do {r <- typeof ((x,d):c) t;
                              return (d:->:r)}
```

Type Rule:

$ \begin{prooftree} \AXC{$(x:D):\Gamma\vdash t:R$}\RLS{LambdaT} \UIC{$\Gamma\vdash(\mathsf{lambda}\ x:D\ \mathsf{in}\ t):D\rightarrow R$} \end{prooftree} $

`app` makes us keep our promise

What is the type of:

`((lambda x:TNum in x+1) 2) : T`

We know the type of the lambda and the actual parameter:

- `(lambda x:TNum in x+1) : TNum -> TNum`
- `2 : TNum`

What is the type of the `App`?

What is the type of:

`((lambda x:TNum in x+1) True) : T`

We know the type of the lambda and the actual parameter:

- `(lambda x:TNum in x+1) : TNum -> TNum`
- `True : TBool`

What is the type of the `App`? Did we keep our promise?
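The lambda rule, together with the check that an application's actual parameter matches the function's domain, can be sketched as a small type checker. This is a minimal sketch, not the course's full `typeof`: the cut-down `FBAE` term type, the `Boolean` constructor and the `Cont` context alias are assumptions made so the block stands alone.

```haskell
-- Types, with :->: as the function type constructor.
data TFBAE = TNum | TBool | TFBAE :->: TFBAE
  deriving (Show, Eq)
infixr 5 :->:

-- A cut-down term language (assumed for this sketch).
-- Lambda carries the declared type of its parameter.
data FBAE = Num Int
          | Boolean Bool
          | Plus FBAE FBAE
          | Lambda String TFBAE FBAE
          | App FBAE FBAE
          | Id String
          deriving (Show, Eq)

-- Context: identifiers in scope and their types.
type Cont = [(String, TFBAE)]

typeof :: Cont -> FBAE -> Maybe TFBAE
typeof _ (Num _)     = return TNum
typeof _ (Boolean _) = return TBool
typeof c (Plus l r)  = do
  TNum <- typeof c l
  TNum <- typeof c r
  return TNum
typeof c (Id x) = lookup x c
-- LambdaT: type the body with x:d added to the context.
typeof c (Lambda x d t) = do
  r <- typeof ((x, d) : c) t
  return (d :->: r)
-- App: the function must have type d -> r and the
-- actual parameter must have exactly type d.
typeof c (App f a) = do
  (d :->: r) <- typeof c f
  a'         <- typeof c a
  if a' == d then return r else Nothing
```

With `inc = Lambda "x" TNum (Plus (Id "x") (Num 1))`, `typeof [] (App inc (Num 2))` gives `Just TNum`, while `typeof [] (App inc (Boolean True))` gives `Nothing`: the promise `TNum -> TNum` is only kept for a `TNum` input.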
## Type Inference for App

The Haskell code:

```haskell
typeof c (App f a) = do { (d:->:r) <- typeof c f;
                          a' <- typeof c a;
                          if a'==d then return r else Nothing}
```

Type rule:

$ \begin{prooftree} \AXC{$\Gamma\vdash f:D\rightarrow R$}\AXC{$\Gamma\vdash a:D$}\RLS{TApp} \BIC{$\Gamma\vdash (f\ a):R$} \end{prooftree} $

Examples:

```
bind inc = lambda x:TNum in x+1 in     [(inc,TNum->TNum)]
  inc 3
== inc : TNum->TNum, 3:TNum
== TNum
```

```
bind plus = (lambda x:TNum in          [(x,TNum)]
              (lambda y:TNum in        [(y,TNum),(x,TNum)]
                x+y)) in
  plus 3 4 : TNum                      [(plus,TNum->TNum->TNum)]

plus 4 : (TNum->TNum)
(plus 3) 4 : (TNum->TNum)->TNum
TNum->(TNum->TNum)
```

```
(bind app = lambda f:(TBool->TNum) in              [(f,(TBool->TNum))]
              lambda a:TBool in                    [(a,TBool),(f,(TBool->TNum))]
                f a:TNum in                        [(app,(TBool->TNum)->TBool->TNum)]
   app (lambda x:TBool in if x then 1 else 0) : TBool -> TNum
       3 :TNum) : TNum
```

# Simply Typed Lambda Calculus

Functions and function types. Nothing else.

```
STC ::= id | lambda id in STC | STC STC
T ::= T -> T
```

```
lambda x:T in lambda y:T in x : T->T->T -- BAD No T
```

```
lambda x:T in lambda y:T in ((x false) y) -- BAD No Booleans
```

```
omega = (lambda x:T->T->T->T in (x x))(lambda x:T in (x x))
```

We cannot define types for recursive functions. Actually, we cannot write functions at all. A perfectly fine language that we can write no programs in. Mathematical oddities abound...

What would fix this problem?

# Recursion (again)

What we know so far about recursion

1. Recursion works in untyped, dynamically scoped FBAE (Lambda, Bind, AE)
2. Recursion does not work in untyped, statically scoped FBAE
3. Recursion does not work in typed FBAE

```
bind f=(lambda x in if x=0 then 1 else x*(f (x-1))) in    [(f,(ClosureV "x" (if ...) []))]
f 0
== (ClosureV "x" (if x=0 then 1 else x*(f (x-1))) []) 0
== if 0=0 then 1 else ..)
== 1 f 1 == (ClosureV “x” if x=0 then 1 else x*(f (x-1)) []) 1 ==if 1=0 then 1 else 1*(f 1) [] ``` ### Solution ``` == (if x=0 then 1 else x*(f (x-1))) [(x,1),(f,(ClosureV ... []))] == 1*(f 0) [(x,1),(f,ClosureV ... []))] == 1*if 0=0 then 1 else == 1 ``` ## How to _fix_ this problem? ``` bind f=lambda x in if x=0 then 1 else x*(f (x-1)) ``` Interpret the lambda and get: ``` (ClosureV “x” TNum (if x=0 then 1 else x * (f (x-1))) []) ``` Apply to 0: ``` (ClosureV “x” TNum (if x=0 then 1 else x * (f (x-1))) []) 0 e == 1 ``` Now Apply to 1: ``` (ClosureV “x” TNum (if x=0 then 1 else x * (f (x-1))) []) 1 e == ``` Let's spike the closure with its own definition! ``` ((ClosureV “x” TNum (if x=0 then 1 else x*(fact (x-1)) [(fact,(ClosureV “x” TNum (if ...) []))]) 1 e == if 1=0 then 1 else 1*(fact 0) == 1*(ClosureV "x" TNum (if ...) []) == 1*1 ((ClosureV ...) 2 e) [(fact,(ClosureV “x” TNum (if ...) [(fact,(ClosureV “x” TNum (if ...) [])]))] == ``` Let's spike the closure's closure with its own definition! ``` ((ClosureV “x” TNum (if x=0 then 1 else x*(fact (x-1)) [(fact,(ClosureV “x” TNum (if ...)) [(fact,(ClosureV “x” TNum (if ...) []))]))]) 2 e == ``` What about 3? Pre-seeding the environment doesn't work. Why? What's the problem? ## The Fix Let's define a thing called `fix` that takes a function as its argument. When evaluated, `fix` will substitute its function argument into itself _before_ evaluation. What might that look like? ``` bind f = (lambda g in (lambda x in (if x=0 then 1 else x * (g (x-1)))) in ((fix f) 2) == ``` `f` takes two arguments, the first is a function `g` that defines the recursive call. Have we seen this? `fix` will replace `g` with `(fix f)`. `(fix f)` - not `f` - Then evaluate What does that do? 
Let's look at an inference rule: $ \begin{prooftree} \AXC{$[g\rightarrow(\mathsf{fix}\ (\mathsf{lambda}\ g\ \mathsf{in}\ t))]t\Downarrow v$}\RLS{FixE} \UIC{$(\mathsf{fix}\ (\mathsf{lambda}\ g\ \mathsf{in}\ t))\Downarrow v$} \end{prooftree} $ `fix` is copying itself _before_ evaluating. Remember substitution! Not quite as simple as throwing `fix` in front of a function. - We are effectively passing in the recursive call as a parameter - $g$ is the recursive function, not the data parameter - Need to extend the actual function to account for this new parameter ``` bind f = (lambda g in (lambda x in (if x=0 then 1 else x * (g (x-1)))) in (fact 2) == ((lambda x in (if x=0 then 1 else x * ((fix f) (x-1)))) 2) == (if 2=0 then 1 else 2 * ((fix f) (2-1))) == 2 * ((fix f) 1) == 2 * ((lambda x in (if x=0 then 1 else x * ((fix f) (x-1)))) 1) == 2 * (if 1=0 then 1 else 1 * ((fix f) (1-1))) == 2 * 1 * ((fix f) 0) == 2 * 1 * ((lambda x in (if x=0 then 1 else x * ((fix f) (x-1)))) 0) == 2 * 1 * (if 0=0 then 1 else 1 * ((fix f) 0-1))) == 2 * 1 * 1 ``` - `fix f` creates one recursive step at a time. - nothing is recursive? Really? Apply fix to the extended function and we get... ### Old Example ``` bind f = (lambda g in (lambda x in (if x=0 then 1 else x * (g (x-1)))) in ((fix f) 2) == [g->fix (lambda g b)]b == (lambda x in (if x=1 then 1 else x * (g (x-1)))) == (lambda x in (if x=1 then 1 else x * ((fix (lambda g b)) (x-1)) 2 == 2 * ((fix (lambda g b)) 1) == 2 * ((fix (lambda g (lambda x in if x=0 then 1 else x * (g (x-1))))) 1 == 2 * (lambda x in if x=0 then 1 else x (fix (lambda g b)) (x-1)))))) 1 == 2*1 == 2*1*1 == 2 ``` ## Extending FBAE for `fix` We have everything we need. First, the eval case: ```haskell eval e (Fix f) = do { (ClosureV g b e’) <- (eval e f); eval e’ (subst g (Fix (Lambda g b)) b) } ``` This is the same `subst` that we defined before Look carefully at what is being substituted. What is it?? 
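To see the `Fix` case run, here is a self-contained sketch of a closure-based evaluator with `fix`. Several things are assumptions made for the example: the cut-down `FBAE` type, the `Mult` and `If0` constructors (standing in for `*` and `if x=0`), the `arith` helper, and a naive `subst` that is only safe because the term substituted in is closed.

```haskell
-- Cut-down FBAE with Fix. Mult and If0 are assumed constructors.
data FBAE = Num Int
          | Plus FBAE FBAE
          | Minus FBAE FBAE
          | Mult FBAE FBAE
          | If0 FBAE FBAE FBAE          -- if c=0 then t else e
          | Lambda String FBAE
          | App FBAE FBAE
          | Id String
          | Fix FBAE
          deriving (Show, Eq)

data FBAEVal = NumV Int
             | ClosureV String FBAE Env
             deriving (Show, Eq)

type Env = [(String, FBAEVal)]

-- subst i v t computes [i -> v]t (naive, assumes v is closed).
subst :: String -> FBAE -> FBAE -> FBAE
subst _ _ (Num n)     = Num n
subst i v (Plus l r)  = Plus (subst i v l) (subst i v r)
subst i v (Minus l r) = Minus (subst i v l) (subst i v r)
subst i v (Mult l r)  = Mult (subst i v l) (subst i v r)
subst i v (If0 c t e) = If0 (subst i v c) (subst i v t) (subst i v e)
subst i v (Lambda x b)
  | x == i    = Lambda x b              -- i is shadowed
  | otherwise = Lambda x (subst i v b)
subst i v (App f a) = App (subst i v f) (subst i v a)
subst i v (Id x)
  | x == i    = v
  | otherwise = Id x
subst i v (Fix f) = Fix (subst i v f)

-- Helper for the binary arithmetic cases.
arith :: (Int -> Int -> Int) -> Env -> FBAE -> FBAE -> Maybe FBAEVal
arith op e l r = do
  NumV l' <- eval e l
  NumV r' <- eval e r
  return (NumV (op l' r'))

eval :: Env -> FBAE -> Maybe FBAEVal
eval _ (Num n)     = return (NumV n)
eval e (Plus l r)  = arith (+) e l r
eval e (Minus l r) = arith (-) e l r
eval e (Mult l r)  = arith (*) e l r
eval e (If0 c t f) = do
  NumV c' <- eval e c
  if c' == 0 then eval e t else eval e f
eval e (Lambda i b) = return (ClosureV i b e)
eval e (App f a) = do
  ClosureV i b e' <- eval e f
  a' <- eval e a
  eval ((i, a') : e') b
eval e (Id i) = lookup i e
-- FixE: replace g in the body with (fix (lambda g in b)), then evaluate.
eval e (Fix f) = do
  ClosureV g b e' <- eval e f
  eval e' (subst g (Fix (Lambda g b)) b)

-- The extended factorial: g stands for the recursive call.
factF :: FBAE
factF = Lambda "g" (Lambda "x"
          (If0 (Id "x") (Num 1)
               (Mult (Id "x") (App (Id "g") (Minus (Id "x") (Num 1))))))
```

In GHCi, `eval [] (App (Fix factF) (Num 3))` returns `Just (NumV 6)`. Each evaluation of `Fix` unrolls exactly one recursive step, and the term substituted for `g` is a `Fix` term, not a closure.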
## Typing Fix `f` == the non-recursive form of `f` `(fix f)` == the recursive form of function `f` What is the type of `(fix f)`? ``` (fix f):D->R ``` What is the type of the lambda for factorial? ``` f = (lambda g:(D->R)->D->R in (lambda x:TNat in if x=0 then 1 else x*(g (x-1)))) ``` And the fix? ``` (fix f):D->R ``` $ \begin{prooftree} \AXC{$\Gamma\vdash f:D\rightarrow R$} \UIC{$\Gamma\vdash (\textsf{fix}\ f):R$} \end{prooftree} $ What is the type of the lambda for factorial? ``` f = (lambda g:TNat->TNat in (lambda x:TNat in if x=0 then 1 else x*(g (x-1)))) (fix f):TNat->TNat f:(TNat->TNat)->(TNat->TNat) ``` And now for something completely different... # Mutable State The big difference between Haskell and C ``` ... x=x+3 ; y=x++ ... ``` What does the `“;”` do? ``` bind x=(f a) in (g 3) == ((lambda _ in (g 3)) (f a)) ``` What’s happening besides substitution? ``` do { l’ <- eval l ; r’ <- eval r ; return (op l’ r’) } ``` What’s happening besides binding? Order! We tend to think of two paradigms: - functional - ordering is achieved via calling convention - imperative - ordering is achieved via sequencing mutable state What’s happening in this sequence? Where does x get a value? ``` x := 1; x := x + 1; x := x + 1; ... ``` `:=` is an assignment that updates a variable in the next state ``` x := x+1 == x’ = x+1 ``` The tick notation traditionally means next. The next `x` is the current `x` plus 1 ## Adding State to FBAE How is _state_ different than _environment_? 1. A Seq sequence operator 2. Define a store 3. Update eval to maintain state in addition to environment New concrete syntax: `t ; t` New abstract syntax: `Seq::FBAE->FBAE->FBAE` Define sequence by elaboration: `Seq l r` == `(App (Lambda x r) l)` ``` eval Seq l r = do { s <- eval q l; eval s r } ``` Interesting syntax! We'll skip typing for now. 
## New operations for state ``` FBAE := new FBAE | deref FBAE | set loc FBAE ``` - `loc` - new type of value for locations - `new t` - creates a new location, puts t there, returns the location - `deref l` - retrieves a value from l and returns it - `set l t` - stores t in location l and returns l ``` deref (new 5) == 5 ``` ``` deref (set (new 5) 6) == 6 ``` ``` bind l = new 2+3 in deref l == 5 ``` ``` bind l = new 2+3 in set l ((deref l) + 1) ; (deref l) == ``` ``` l := r == (set l (deref r)) ``` ``` bind m = new 5 in bind n = m in set m 6 ; deref n == 6 ``` ``` bind m = new 5 in bind n = m in bind n = new 5 in set m 6 ; deref n == 5 ``` ``` bind inc = (lambda l in (set l ((deref l) + 1))) bind n = new 5 inc n ; deref n == 6 ``` ## Implementing State Store as a function with location as a number We could use an array or a sequence, but let's try a technique used in formal modeling. ``` type Sto = Loc -> Maybe FBAEVal type Loc = Int ``` `Sto` is the store and is a function from `Loc` to `Maybe FBAEVal`. Why a `Maybe`? - `Just x` - a good location is accessed. - `Nothing` - a bad location is accessed. What are good and bad memory locations? What does this have to do with language design? Memory contains `FBAEVal`. - How is this different from a C pointer? - How is this different from a Java reference? What does this have to do with language design? Dereferencing is just using the store as a function: ``` derefSto s l = (s l) ``` The initial store is a store with nothing in it. ``` initSto :: Sto initSto x = Nothing ``` - What is `initSto 3`? - What is `initSto` for any value? Updating the store is a bit trickier... `m0 = \l -> if l=3 then 1 else Nothing` `m1 = \l -> if l=1 then 2 else (m0 l)` `m2 = \l -> if l=2 then 0 else (m1 l)` `m3 = \l -> if l=3 then 4 else (m2 l)` ``` setSto :: Sto -> Loc -> FBAEVal -> Sto setSto s l v = \m -> if m==l then (Just v) else (s m) ``` - `\x -> t` is `lambda x in t` in FBAE - Given that, how does this work? 
- Why would I ever do this?

The store becomes a collection of nested if statements. Not the most efficient, but it does what it's supposed to for our purposes.

Putting things together to model memory:

```
type Store = (Loc,Sto)
```

A location, store pair.

- `loc` is the next memory element
- `sto` is the current memory

```
derefStore (_,s) l = derefSto s l
```

Dereferencing is accessing the memory location

```
initStore = (0,initSto)
```

Initial memory is the empty memory with a next value of 0.

```
setStore (m,s) l v = (m,setSto s l v)
```

Storing a value

```
newStore (l,s) = (l+1,s)
```

Allocating a memory location.

- Now we know what `l` is for
- Keep track of where the next memory location should be.

## Integrating Mutable Store

How does store differ from environment?

- Changes to store persist across scopes
- Locations are values

Let's add a constructor for locations

```
data FBAEVal where
  ...
  Loc :: Int -> FBAEVal
```

Seems simple enough

- `eval` can return a location
- Locations can be calculated? Maybe?
- Locations can be stored? Maybe?

Our choices are important. Even the tiny ones.

How to implement mutability of `Store`??

Start with a new return value:

```
type Retval = (FBAEVal,Store)
```

`Retval` is the return value for our interpreter

- A value from FBAE
- The resulting store

What's this for??

```
eval :: Env -> Store -> FBAE -> Maybe Retval
```

`eval` now takes an environment _and_ a store.

- `Env` - identifiers and values in scope
- `Store` - contents and current location in memory.

What should an initial call to `eval` look like?

- environment
- store
- term

`x:=x+1 ; x:=x+1`

```
eval e s (Seq l r) = do {(v,s') <- (eval e s l) ;
                         (v',s'') <- (eval e s' r) ;
                         return (v',s'')}
```

- One thing, then the other....

```
eval e s (Set l t) = do {((Loc l'),s') <- (eval e s l) ;
                         (v,s'') <- (eval e s' t) ;
                         return ((Loc l'),(setStore s'' l' v))}
```

- All that work finally pays off
- `Set` calls `setStore`
- What is the pattern matching in the `do` clause doing?
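The store functions and the state-threading idiom can be put together in one small, runnable sketch. To keep it short it makes some simplifying assumptions that are not in the notes: there is no environment (so no `bind` or identifiers), the term type is a cut-down `T`, and values are called `Val` with a `LocV` constructor for locations.

```haskell
type Loc = Int

-- Values: numbers and locations (Val, LocV are illustrative names).
data Val = NumV Int | LocV Loc
  deriving (Show, Eq)

-- The store as a function from locations to values.
type Sto   = Loc -> Maybe Val
-- Next free location paired with the current memory.
type Store = (Loc, Sto)

initSto :: Sto
initSto _ = Nothing

-- Wrap the old store in one more "if": the newest write wins.
setSto :: Sto -> Loc -> Val -> Sto
setSto s l v = \m -> if m == l then Just v else s m

initStore :: Store
initStore = (0, initSto)

derefStore :: Store -> Loc -> Maybe Val
derefStore (_, s) l = s l

setStore :: Store -> Loc -> Val -> Store
setStore (n, s) l v = (n, setSto s l v)

-- Allocate by bumping the next-location counter.
newStore :: Store -> Store
newStore (l, s) = (l + 1, s)

-- Cut-down term language with state operations only.
data T = Num Int | Plus T T | New T | Deref T | Set T T | Seq T T
  deriving (Show, Eq)

-- Every case returns a value and the store that results.
eval :: Store -> T -> Maybe (Val, Store)
eval s (Num n) = return (NumV n, s)
eval s (Plus l r) = do
  (NumV l', s')  <- eval s l
  (NumV r', s'') <- eval s' r            -- r sees the store left by l
  return (NumV (l' + r'), s'')
eval s (New t) = do
  (v, (l, sto)) <- eval s t
  return (LocV l, setStore (newStore (l, sto)) l v)
eval s (Deref t) = do
  (LocV l, s') <- eval s t
  v <- derefStore s' l                   -- Nothing for a bad location
  return (v, s')
eval s (Set l t) = do
  (LocV l', s') <- eval s l
  (v, s'')      <- eval s' t
  return (LocV l', setStore s'' l' v)    -- returns the location
eval s (Seq l r) = do
  (_, s') <- eval s l
  eval s' r
```

Because a `Store` contains a function it cannot be compared or shown directly; in GHCi, look at the value only, for example `fmap fst (eval initStore (Deref (Set (New (Num 5)) (Num 6))))`, which gives `Just (NumV 6)`.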
```
eval e s (Plus l r) = do {((NumV l'),s') <- (eval e s l) ;
                          ((NumV r'),s'') <- (eval e s' r) ;
                          return ((NumV (l'+r')),s'')}
```

This passing of state is the new idiom:

```
eval e s (Deref t) = do {((Loc l'),s') <- (eval e s t) ;
                         v <- (derefStore s' l') ;
                         return (v,s')}
```

Let's unpack `New` and figure out what's going on:

```
eval e s (New t) = do { (t',(l,s')) <- eval e s t ;
                        return ((Loc l),(setStore (newStore (l,s')) l t')) }
```

Wow. What the heck?

- `(t',(l,s'))` - `t'` value to be stored in `s'` in `l`
- `(Loc l+1)` - next location after `l`
- `(newStore (l,s'))` - new store with `l` allocated
- `(setStore ...)` - stores `t'`

# Variables and Assignment

Let's do this together...

We have utilities for mutable store. What things do we need?

- Variable declaration
- Variable dereference
- Variable assignment

We want to control the way that memory gets accessed.

- No more `set`, `deref`, `new`

## Variable Declaration

- What tools might we have?

`var x:=t1` == `bind x=new t1 in t2`

```
var x:=0
var y:=1
var z:=3
```

## Variable Dereference

- It would be nice to just say `x+1`
- Is that possible if `x` is a variable?

1. Get the variable's location from e
2. Get the value from s by dereferencing the location

`(deref (lookup x e) s)` - value stored in the location stored in `x`

## Variable Assignment

- What tools might we have?

`x := t == (set x t)`

```
Asn :: FBAE -> FBAE -> FBAE

x := x+y+z
```

- Evaluate the right side
- Already declared variable
- Variable - Identifier whose value can be changed

`(set x t)`

## Type Checking Variables

- What tools might we have?
- `bind x=new 3 in x:=x+1`
- `var x:=3;` `[x:NumT]` `x:=x+1`

```
(LocT tl) <- typeof c l
tr <- typeof c l
(Loc tv) <- lookup x c
```

`LocT :: TFBAE -> TFBAE`

## Throwing Errors

- What tools might we have?

`eval _ bang = Nothing`

## While Loop

- What tools might we have?
`while c do t == if c then (t ; while c do t) else skip` `while c do t == lambda f in lambda ` ``` while x<5 do x:=x+1 == w = lambda _ in if x<5 then x:=x+1 ; w _ else skip == lambda w in lambda _ in if x<5 then x:=x+1 ; w _ else skip == fix (lambda w in (lambda _ in if x<5 then x:=x+1 ; w _ else skip)) ``` ## Goto - Should we bother? ## Lists ## Objects & Object Oriented ``` let F=(lambda f in (lambda x in if x=0 then 1 else x*(f x-1))) in let fact=(fix F) (fact 3) ``` ``` let fact = (lambda x in if x=0 then 1 else x*(fact (x-1))) in (fact 3) ``` ``` typeofM c (Plus t1 t2) = do {NumT <- typeofM c t1; NumT <- typeofM c t2; return NumT } ``` `D -> R` ``` typeofM c (Lambda x D b) = do {R <- typeofM (x,D):c b; return (D :->: R)} ``` ``` typeofM c (Bind x t1 t2) = do {tx <- (typeofM c t1); (typeofM (x,tx):c t2)} ``` ``` bind n=3 in [(n,3)] bind f=(lambda x in x+n) in [(f,lambda...),(n,3)] bind n=1 in [(n,1),(f,lambda...),(n,3)] (f n) == x+n [(x,1),(n,1),(f,lambda...)(n,3)] == 2 Dynamic Scoping ``` ``` bind n=3 in [(n,3)] bind f=(lambda x in x+n) in [(f,(ClosureV x x+n [(n,3)]),(n,3)] bind n=1 in [(n,1),(f,(ClosureV x x+n [(n,3)]),(n,3)] (f n) == x+n [(x,1),(n,3)] == 4 Static Scoping ``` ``` bind x=v in b == ((lambda x in b) v) ``` ``` bind z=3+4 in z+z match with left side of elaboration rule x=z v=3+4 b=z+z substitute into right side ((lambda z in z+z) 3+4) ```
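The match-and-substitute step for `bind` shown last is exactly what an elaborator does. As a hedged sketch: the host-language type `FAE` and its `F`-suffixed constructors are names invented here so both languages can live in one module; only the `Bind` case does real work.

```haskell
-- Embedded language: FBAE, with bind.
data FBAE = Num Int
          | Plus FBAE FBAE
          | Bind String FBAE FBAE
          | Lambda String FBAE
          | App FBAE FBAE
          | Id String
          deriving (Show, Eq)

-- Host language: the same thing without bind.
-- (Constructor names with an F suffix are illustrative.)
data FAE = NumF Int
         | PlusF FAE FAE
         | LambdaF String FAE
         | AppF FAE FAE
         | IdF String
         deriving (Show, Eq)

-- Translate FBAE into FAE. Every case is structural except Bind,
-- which uses: bind x=v in b == ((lambda x in b) v)
elab :: FBAE -> FAE
elab (Num n)      = NumF n
elab (Plus l r)   = PlusF (elab l) (elab r)
elab (Bind x v b) = AppF (LambdaF x (elab b)) (elab v)
elab (Lambda x b) = LambdaF x (elab b)
elab (App f a)    = AppF (elab f) (elab a)
elab (Id x)       = IdF x
```

Elaborating `bind z=3+4 in z+z`, written `Bind "z" (Plus (Num 3) (Num 4)) (Plus (Id "z") (Id "z"))`, yields the host term for `((lambda z in z+z) 3+4)`.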