diff --git a/llvm/docs/tutorial/OCamlLangImpl1.rst b/llvm/docs/tutorial/OCamlLangImpl1.rst deleted file mode 100644 --- a/llvm/docs/tutorial/OCamlLangImpl1.rst +++ /dev/null @@ -1,285 +0,0 @@ -================================================= -Kaleidoscope: Tutorial Introduction and the Lexer -================================================= - -.. contents:: - :local: - -Tutorial Introduction -===================== - -Welcome to the "Implementing a language with LLVM" tutorial. This -tutorial runs through the implementation of a simple language, showing -how fun and easy it can be. This tutorial will get you up and started as -well as help to build a framework you can extend to other languages. The -code in this tutorial can also be used as a playground to hack on other -LLVM specific things. - -The goal of this tutorial is to progressively unveil our language, -describing how it is built up over time. This will let us cover a fairly -broad range of language design and LLVM-specific usage issues, showing -and explaining the code for it all along the way, without overwhelming -you with tons of details up front. - -It is useful to point out ahead of time that this tutorial is really -about teaching compiler techniques and LLVM specifically, *not* about -teaching modern and sane software engineering principles. In practice, -this means that we'll take a number of shortcuts to simplify the -exposition. For example, the code leaks memory, uses global variables -all over the place, doesn't use nice design patterns like -`visitors `_, etc... but -it is very simple. If you dig in and use the code as a basis for future -projects, fixing these deficiencies shouldn't be hard. - -I've tried to put this tutorial together in a way that makes chapters -easy to skip over if you are already familiar with or are uninterested -in the various pieces. The structure of the tutorial is: - -- `Chapter #1 <#language>`_: Introduction to the Kaleidoscope - language, and the definition of its Lexer - This shows where we are - going and the basic functionality that we want it to do. In order to - make this tutorial maximally understandable and hackable, we choose - to implement everything in Objective Caml instead of using lexer and - parser generators. LLVM obviously works just fine with such tools, - feel free to use one if you prefer. -- `Chapter #2 `_: Implementing a Parser and - AST - With the lexer in place, we can talk about parsing techniques - and basic AST construction. This tutorial describes recursive descent - parsing and operator precedence parsing. Nothing in Chapters 1 or 2 - is LLVM-specific, the code doesn't even link in LLVM at this point. - :) -- `Chapter #3 `_: Code generation to LLVM IR - - With the AST ready, we can show off how easy generation of LLVM IR - really is. -- `Chapter #4 `_: Adding JIT and Optimizer - Support - Because a lot of people are interested in using LLVM as a - JIT, we'll dive right into it and show you the 3 lines it takes to - add JIT support. LLVM is also useful in many other ways, but this is - one simple and "sexy" way to shows off its power. :) -- `Chapter #5 `_: Extending the Language: - Control Flow - With the language up and running, we show how to - extend it with control flow operations (if/then/else and a 'for' - loop). This gives us a chance to talk about simple SSA construction - and control flow. -- `Chapter #6 `_: Extending the Language: - User-defined Operators - This is a silly but fun chapter that talks - about extending the language to let the user program define their own - arbitrary unary and binary operators (with assignable precedence!). - This lets us build a significant piece of the "language" as library - routines. -- `Chapter #7 `_: Extending the Language: - Mutable Variables - This chapter talks about adding user-defined - local variables along with an assignment operator. The interesting - part about this is how easy and trivial it is to construct SSA form - in LLVM: no, LLVM does *not* require your front-end to construct SSA - form! -- `Chapter #8 `_: Conclusion and other useful - LLVM tidbits - This chapter wraps up the series by talking about - potential ways to extend the language, but also includes a bunch of - pointers to info about "special topics" like adding garbage - collection support, exceptions, debugging, support for "spaghetti - stacks", and a bunch of other tips and tricks. - -By the end of the tutorial, we'll have written a bit less than 700 lines -of non-comment, non-blank, lines of code. With this small amount of -code, we'll have built up a very reasonable compiler for a non-trivial -language including a hand-written lexer, parser, AST, as well as code -generation support with a JIT compiler. While other systems may have -interesting "hello world" tutorials, I think the breadth of this -tutorial is a great testament to the strengths of LLVM and why you -should consider it if you're interested in language or compiler design. - -A note about this tutorial: we expect you to extend the language and -play with it on your own. Take the code and go crazy hacking away at it, -compilers don't need to be scary creatures - it can be a lot of fun to -play with languages! - -The Basic Language -================== - -This tutorial will be illustrated with a toy language that we'll call -"`Kaleidoscope `_" (derived -from "meaning beautiful, form, and view"). Kaleidoscope is a procedural -language that allows you to define functions, use conditionals, math, -etc. Over the course of the tutorial, we'll extend Kaleidoscope to -support the if/then/else construct, a for loop, user defined operators, -JIT compilation with a simple command line interface, etc. - -Because we want to keep things simple, the only datatype in Kaleidoscope -is a 64-bit floating point type (aka 'float' in OCaml parlance). As -such, all values are implicitly double precision and the language -doesn't require type declarations. This gives the language a very nice -and simple syntax. For example, the following simple example computes -`Fibonacci numbers: `_ - -:: - - # Compute the x'th fibonacci number. - def fib(x) - if x < 3 then - 1 - else - fib(x-1)+fib(x-2) - - # This expression will compute the 40th number. - fib(40) - -We also allow Kaleidoscope to call into standard library functions (the -LLVM JIT makes this completely trivial). This means that you can use the -'extern' keyword to define a function before you use it (this is also -useful for mutually recursive functions). For example: - -:: - - extern sin(arg); - extern cos(arg); - extern atan2(arg1 arg2); - - atan2(sin(.4), cos(42)) - -A more interesting example is included in Chapter 6 where we write a -little Kaleidoscope application that `displays a Mandelbrot -Set `_ at various levels of magnification. - -Lets dive into the implementation of this language! - -The Lexer -========= - -When it comes to implementing a language, the first thing needed is the -ability to process a text file and recognize what it says. The -traditional way to do this is to use a -"`lexer `_" (aka -'scanner') to break the input up into "tokens". Each token returned by -the lexer includes a token code and potentially some metadata (e.g. the -numeric value of a number). First, we define the possibilities: - -.. code-block:: ocaml - - (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) - type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char - -Each token returned by our lexer will be one of the token variant -values. An unknown character like '+' will be returned as -``Token.Kwd '+'``. If the curr token is an identifier, the value will be -``Token.Ident s``. If the current token is a numeric literal (like 1.0), -the value will be ``Token.Number 1.0``. - -The actual implementation of the lexer is a collection of functions -driven by a function named ``Lexer.lex``. The ``Lexer.lex`` function is -called to return the next token from standard input. We will use -`Camlp4 `_ to -simplify the tokenization of the standard input. Its definition starts -as: - -.. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - - let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - -``Lexer.lex`` works by recursing over a ``char Stream.t`` to read -characters one at a time from the standard input. It eats them as it -recognizes them and stores them in a ``Token.token`` variant. The -first thing that it has to do is ignore whitespace between tokens. This -is accomplished with the recursive call above. - -The next thing ``Lexer.lex`` needs to do is recognize identifiers and -specific keywords like "def". Kaleidoscope does this with a pattern -match and a helper function. - -.. code-block:: ocaml - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - ... - - and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | id -> [< 'Token.Ident id; stream >] - -Numeric values are similar: - -.. code-block:: ocaml - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - ... - - and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -This is all pretty straight-forward code for processing input. When -reading a numeric value from input, we use the ocaml ``float_of_string`` -function to convert it to a numeric value that we store in -``Token.Number``. Note that this isn't doing sufficient error checking: -it will raise ``Failure`` if the string "1.23.45.67". Feel free to -extend it :). Next we handle comments: - -.. code-block:: ocaml - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - ... - - and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] - -We handle comments by skipping to the end of the line and then return -the next token. Finally, if the input doesn't match one of the above -cases, it is either an operator character like '+' or the end of the -file. These are handled with this code: - -.. code-block:: ocaml - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -With this, we have the complete lexer for the basic Kaleidoscope -language (the `full code listing `_ for the -Lexer is available in the `next chapter `_ of the -tutorial). Next we'll `build a simple parser that uses this to build an -Abstract Syntax Tree `_. When we have that, we'll -include a driver so that you can use the lexer and parser together. - -`Next: Implementing a Parser and AST `_ - diff --git a/llvm/docs/tutorial/OCamlLangImpl2.rst b/llvm/docs/tutorial/OCamlLangImpl2.rst deleted file mode 100644 --- a/llvm/docs/tutorial/OCamlLangImpl2.rst +++ /dev/null @@ -1,899 +0,0 @@ -=========================================== -Kaleidoscope: Implementing a Parser and AST -=========================================== - -.. contents:: - :local: - -Chapter 2 Introduction -====================== - -Welcome to Chapter 2 of the "`Implementing a language with LLVM in -Objective Caml `_" tutorial. This chapter shows you how to -use the lexer, built in `Chapter 1 `_, to build a -full `parser `_ for our -Kaleidoscope language. Once we have a parser, we'll define and build an -`Abstract Syntax -Tree `_ (AST). - -The parser we will build uses a combination of `Recursive Descent -Parsing `_ and -`Operator-Precedence -Parsing `_ to -parse the Kaleidoscope language (the latter for binary expressions and -the former for everything else). Before we get to parsing though, lets -talk about the output of the parser: the Abstract Syntax Tree. - -The Abstract Syntax Tree (AST) -============================== - -The AST for a program captures its behavior in such a way that it is -easy for later stages of the compiler (e.g. code generation) to -interpret. We basically want one object for each construct in the -language, and the AST should closely model the language. In -Kaleidoscope, we have expressions, a prototype, and a function object. -We'll start with expressions first: - -.. code-block:: ocaml - - (* expr - Base type for all expression nodes. *) - type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - -The code above shows the definition of the base ExprAST class and one -subclass which we use for numeric literals. The important thing to note -about this code is that the Number variant captures the numeric value of -the literal as an instance variable. This allows later phases of the -compiler to know what the stored numeric value is. - -Right now we only create the AST, so there are no useful functions on -them. It would be very easy to add a function to pretty print the code, -for example. Here are the other expression AST node definitions that -we'll use in the basic form of the Kaleidoscope language: - -.. code-block:: ocaml - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - -This is all (intentionally) rather straight-forward: variables capture -the variable name, binary operators capture their opcode (e.g. '+'), and -calls capture a function name as well as a list of any argument -expressions. One thing that is nice about our AST is that it captures -the language features without talking about the syntax of the language. -Note that there is no discussion about precedence of binary operators, -lexical structure, etc. - -For our basic language, these are all of the expression nodes we'll -define. Because it doesn't have conditional control flow, it isn't -Turing-complete; we'll fix that in a later installment. The two things -we need next are a way to talk about the interface to a function, and a -way to talk about functions themselves: - -.. code-block:: ocaml - - (* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) - type proto = Prototype of string * string array - - (* func - This type represents a function definition itself. *) - type func = Function of proto * expr - -In Kaleidoscope, functions are typed with just a count of their -arguments. Since all values are double precision floating point, the -type of each argument doesn't need to be stored anywhere. In a more -aggressive and realistic language, the "expr" variants would probably -have a type field. - -With this scaffolding, we can now talk about parsing expressions and -function bodies in Kaleidoscope. - -Parser Basics -============= - -Now that we have an AST to build, we need to define the parser code to -build it. The idea here is that we want to parse something like "x+y" -(which is returned as three tokens by the lexer) into an AST that could -be generated with calls like this: - -.. code-block:: ocaml - - let x = Variable "x" in - let y = Variable "y" in - let result = Binary ('+', x, y) in - ... - -The error handling routines make use of the builtin ``Stream.Failure`` -and ``Stream.Error``s. ``Stream.Failure`` is raised when the parser is -unable to find any matching token in the first position of a pattern. -``Stream.Error`` is raised when the first token matches, but the rest do -not. The error recovery in our parser will not be the best and is not -particular user-friendly, but it will be enough for our tutorial. These -exceptions make it easier to handle errors in routines that have various -return types. - -With these basic types and exceptions, we can implement the first piece -of our grammar: numeric literals. - -Basic Expression Parsing -======================== - -We start with numeric literals, because they are the simplest to -process. For each production in our grammar, we'll define a function -which parses that production. We call this class of expressions -"primary" expressions, for reasons that will become more clear `later in -the tutorial `_. In order to parse an -arbitrary primary expression, we need to determine what sort of -expression it is. For numeric literals, we have: - -.. code-block:: ocaml - - (* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr *) - parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - -This routine is very simple: it expects to be called when the current -token is a ``Token.Number`` token. It takes the current number value, -creates a ``Ast.Number`` node, advances the lexer to the next token, and -finally returns. - -There are some interesting aspects to this. The most important one is -that this routine eats all of the tokens that correspond to the -production and returns the lexer buffer with the next token (which is -not part of the grammar production) ready to go. This is a fairly -standard way to go for recursive descent parsers. For a better example, -the parenthesis operator is defined like this: - -.. code-block:: ocaml - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - -This function illustrates a number of interesting things about the -parser: - -1) It shows how we use the ``Stream.Error`` exception. When called, this -function expects that the current token is a '(' token, but after -parsing the subexpression, it is possible that there is no ')' waiting. -For example, if the user types in "(4 x" instead of "(4)", the parser -should emit an error. Because errors can occur, the parser needs a way -to indicate that they happened. In our parser, we use the camlp4 -shortcut syntax ``token ?? "parse error"``, where if the token before -the ``??`` does not match, then ``Stream.Error "parse error"`` will be -raised. - -2) Another interesting aspect of this function is that it uses recursion -by calling ``Parser.parse_primary`` (we will soon see that -``Parser.parse_primary`` can call ``Parser.parse_primary``). This is -powerful because it allows us to handle recursive grammars, and keeps -each production very simple. Note that parentheses do not cause -construction of AST nodes themselves. While we could do it this way, the -most important role of parentheses are to guide the parser and provide -grouping. Once the parser constructs the AST, parentheses are not -needed. - -The next simple production is for handling variable references and -function calls: - -.. code-block:: ocaml - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - -This routine follows the same style as the other routines. (It expects -to be called if the current token is a ``Token.Ident`` token). It also -has recursion and error handling. One interesting aspect of this is that -it uses *look-ahead* to determine if the current identifier is a stand -alone variable reference or if it is a function call expression. It -handles this by checking to see if the token after the identifier is a -'(' token, constructing either a ``Ast.Variable`` or ``Ast.Call`` node -as appropriate. - -We finish up by raising an exception if we received a token we didn't -expect: - -.. code-block:: ocaml - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -Now that basic expressions are handled, we need to handle binary -expressions. They are a bit more complex. - -Binary Expression Parsing -========================= - -Binary expressions are significantly harder to parse because they are -often ambiguous. For example, when given the string "x+y\*z", the parser -can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common -definitions from mathematics, we expect the later parse, because "\*" -(multiplication) has higher *precedence* than "+" (addition). - -There are many ways to handle this, but an elegant and efficient way is -to use `Operator-Precedence -Parsing `_. -This parsing technique uses the precedence of binary operators to guide -recursion. To start with, we need a table of precedences: - -.. code-block:: ocaml - - (* binop_precedence - This holds the precedence for each binary operator that is - * defined *) - let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - - (* precedence - Get the precedence of the pending binary operator token. *) - let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - - ... - - let main () = - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - ... - -For the basic form of Kaleidoscope, we will only support 4 binary -operators (this can obviously be extended by you, our brave and intrepid -reader). The ``Parser.precedence`` function returns the precedence for -the current token, or -1 if the token is not a binary operator. Having a -``Hashtbl.t`` makes it easy to add new operators and makes it clear that -the algorithm doesn't depend on the specific operators involved, but it -would be easy enough to eliminate the ``Hashtbl.t`` and do the -comparisons in the ``Parser.precedence`` function. (Or just use a -fixed-size array). - -With the helper above defined, we can now start parsing binary -expressions. The basic idea of operator precedence parsing is to break -down an expression with potentially ambiguous binary operators into -pieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g". -Operator precedence parsing considers this as a stream of primary -expressions separated by binary operators. As such, it will first parse -the leading primary expression "a", then it will see the pairs [+, b] -[+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are -primary expressions, the binary expression parser doesn't need to worry -about nested subexpressions like (c+d) at all. - -To start, an expression is a primary expression potentially followed by -a sequence of [binop,primaryexpr] pairs: - -.. code-block:: ocaml - - (* expression - * ::= primary binoprhs *) - and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream - -``Parser.parse_bin_rhs`` is the function that parses the sequence of -pairs for us. It takes a precedence and a pointer to an expression for -the part that has been parsed so far. Note that "x" is a perfectly valid -expression: As such, "binoprhs" is allowed to be empty, in which case it -returns the expression that is passed into it. In our example above, the -code passes the expression for "a" into ``Parser.parse_bin_rhs`` and the -current token is "+". - -The precedence value passed into ``Parser.parse_bin_rhs`` indicates the -*minimal operator precedence* that the function is allowed to eat. For -example, if the current pair stream is [+, x] and -``Parser.parse_bin_rhs`` is passed in a precedence of 40, it will not -consume any tokens (because the precedence of '+' is only 20). With this -in mind, ``Parser.parse_bin_rhs`` starts with: - -.. code-block:: ocaml - - (* binoprhs - * ::= ('+' primary)* *) - and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - -This code gets the precedence of the current token and checks to see if -if is too low. Because we defined invalid tokens to have a precedence of --1, this check implicitly knows that the pair-stream ends when the token -stream runs out of binary operators. If this check succeeds, we know -that the token is a binary operator and that it will be included in this -expression: - -.. code-block:: ocaml - - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - -As such, this code eats (and remembers) the binary operator and then -parses the primary expression that follows. This builds up the whole -pair, the first of which is [+, b] for the running example. - -Now that we parsed the left-hand side of an expression and one pair of -the RHS sequence, we have to decide which way the expression associates. -In particular, we could have "(a+b) binop unparsed" or "a + (b binop -unparsed)". To determine this, we look ahead at "binop" to determine its -precedence and compare it to BinOp's precedence (which is '+' in this -case): - -.. code-block:: ocaml - - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - -If the precedence of the binop to the right of "RHS" is lower or equal -to the precedence of our current operator, then we know that the -parentheses associate as "(a+b) binop ...". In our example, the current -operator is "+" and the next operator is "+", we know that they have the -same precedence. In this case we'll create the AST node for "a+b", and -then continue parsing: - -.. code-block:: ocaml - - ... if body omitted ... - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - -In our example above, this will turn "a+b+" into "(a+b)" and execute the -next iteration of the loop, with "+" as the current token. The code -above will eat, remember, and parse "(c+d)" as the primary expression, -which makes the current pair equal to [+, (c+d)]. It will then evaluate -the 'if' conditional above with "\*" as the binop to the right of the -primary. In this case, the precedence of "\*" is higher than the -precedence of "+" so the if condition will be entered. - -The critical question left here is "how can the if condition parse the -right hand side in full"? In particular, to build the AST correctly for -our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression -variable. The code to do this is surprisingly simple (code from the -above two blocks duplicated for context): - -.. code-block:: ocaml - - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - if token_prec < precedence c2 - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - -At this point, we know that the binary operator to the RHS of our -primary has higher precedence than the binop we are currently parsing. -As such, we know that any sequence of pairs whose operators are all -higher precedence than "+" should be parsed together and returned as -"RHS". To do this, we recursively invoke the ``Parser.parse_bin_rhs`` -function specifying "token\_prec+1" as the minimum precedence required -for it to continue. In our example above, this will cause it to return -the AST node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of -the '+' expression. - -Finally, on the next iteration of the while loop, the "+g" piece is -parsed and added to the AST. With this little bit of code (14 -non-trivial lines), we correctly handle fully general binary expression -parsing in a very elegant way. This was a whirlwind tour of this code, -and it is somewhat subtle. I recommend running through it with a few -tough examples to see how it works. - -This wraps up handling of expressions. At this point, we can point the -parser at an arbitrary token stream and build an expression from it, -stopping at the first token that is not part of the expression. Next up -we need to handle function definitions, etc. - -Parsing the Rest -================ - -The next thing missing is handling of function prototypes. In -Kaleidoscope, these are used both for 'extern' function declarations as -well as function body definitions. The code to do this is -straight-forward and not very interesting (once you've survived -expressions): - -.. code-block:: ocaml - - (* prototype - * ::= id '(' id* ')' *) - let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -Given this, a function definition is very simple, just a prototype plus -an expression to implement the body: - -.. code-block:: ocaml - - (* definition ::= 'def' prototype expression *) - let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -In addition, we support 'extern' to declare functions like 'sin' and -'cos' as well as to support forward declaration of user functions. These -'extern's are just prototypes with no body: - -.. code-block:: ocaml - - (* external ::= 'extern' prototype *) - let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e - -Finally, we'll also let the user type in arbitrary top-level expressions -and evaluate them on the fly. We will handle this by defining anonymous -nullary (zero argument) functions for them: - -.. code-block:: ocaml - - (* toplevelexpr ::= expression *) - let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - -Now that we have all the pieces, let's build a little driver that will -let us actually *execute* this code we've built! - -The Driver -========== - -The driver for this simply invokes all of the parsing pieces with a -top-level dispatch loop. There isn't much interesting here, so I'll just -include the top-level loop. See `below <#full-code-listing>`_ for full code in the -"Top-Level Parsing" section. - -.. code-block:: ocaml - - (* top ::= definition | external | expression | ';' *) - let rec main_loop stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop stream - - | Some token -> - begin - try match token with - | Token.Def -> - ignore(Parser.parse_definition stream); - print_endline "parsed a function definition."; - | Token.Extern -> - ignore(Parser.parse_extern stream); - print_endline "parsed an extern."; - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - ignore(Parser.parse_toplevel stream); - print_endline "parsed a top-level expr"; - with Stream.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop stream - -The most interesting part of this is that we ignore top-level -semicolons. Why is this, you ask? The basic reason is that if you type -"4 + 5" at the command line, the parser doesn't know whether that is the -end of what you will type or not. For example, on the next line you -could type "def foo..." in which case 4+5 is the end of a top-level -expression. Alternatively you could type "\* 6", which would continue -the expression. Having top-level semicolons allows you to type "4+5;", -and the parser will know you are done. - -Conclusions -=========== - -With just under 300 lines of commented code (240 lines of non-comment, -non-blank code), we fully defined our minimal language, including a -lexer, parser, and AST builder. With this done, the executable will -validate Kaleidoscope code and tell us if it is grammatically invalid. -For example, here is a sample interaction: - -.. code-block:: bash - - $ ./toy.byte - ready> def foo(x y) x+foo(y, 4.0); - Parsed a function definition. - ready> def foo(x y) x+y y; - Parsed a function definition. - Parsed a top-level expr - ready> def foo(x y) x+y ); - Parsed a function definition. - Error: unknown token when expecting an expression - ready> extern sin(a); - ready> Parsed an extern - ready> ^D - $ - -There is a lot of room for extension here. You can define new AST nodes, -extend the language in many ways, etc. In the `next -installment `_, we will describe how to generate -LLVM Intermediate Representation (IR) from the AST. - -Full Code Listing -================= - -Here is the complete code listing for this and the previous chapter. -Note that it is fully self-contained: you don't need LLVM or any -external libraries at all for this. (Besides the ocaml standard -libraries, of course.) To build this, just compile with: - -.. code-block:: bash - - # Compile - ocamlbuild toy.byte - # Run - ./toy.byte - -Here is the code: - -\_tags: - :: - - <{lexer,parser}.ml>: use_camlp4, pp(camlp4of) - -token.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - - (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) - type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char - -lexer.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - - let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - - and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - - and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | id -> [< 'Token.Ident id; stream >] - - and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] - -ast.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - - (* expr - Base type for all expression nodes. *) - type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - - (* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) - type proto = Prototype of string * string array - - (* func - This type represents a function definition itself. *) - type func = Function of proto * expr - -parser.ml: - .. code-block:: ocaml - - (*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - - (* binop_precedence - This holds the precedence for each binary operator that is - * defined *) - let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - - (* precedence - Get the precedence of the pending binary operator token. *) - let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - - (* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr *) - let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - - (* binoprhs - * ::= ('+' primary)* *) - and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - - (* expression - * ::= primary binoprhs *) - and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream - - (* prototype - * ::= id '(' id* ')' *) - let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") - - (* definition ::= 'def' prototype expression *) - let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - - (* toplevelexpr ::= expression *) - let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - - (* external ::= 'extern' prototype *) - let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e - -toplevel.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - - (* top ::= definition | external | expression | ';' *) - let rec main_loop stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop stream - - | Some token -> - begin - try match token with - | Token.Def -> - ignore(Parser.parse_definition stream); - print_endline "parsed a function definition."; - | Token.Extern -> - ignore(Parser.parse_extern stream); - print_endline "parsed an extern."; - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - ignore(Parser.parse_toplevel stream); - print_endline "parsed a top-level expr"; - with Stream.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop stream - -toy.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - - let main () = - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop stream; - ;; - - main () - -`Next: Implementing Code Generation to LLVM IR `_ - diff --git a/llvm/docs/tutorial/OCamlLangImpl3.rst b/llvm/docs/tutorial/OCamlLangImpl3.rst deleted file mode 100644 --- a/llvm/docs/tutorial/OCamlLangImpl3.rst +++ /dev/null @@ -1,961 +0,0 @@ -======================================== -Kaleidoscope: Code generation to LLVM IR -======================================== - -.. contents:: - :local: - -Chapter 3 Introduction -====================== - -Welcome to Chapter 3 of the "`Implementing a language with -LLVM `_" tutorial. This chapter shows you how to transform -the `Abstract Syntax Tree `_, built in Chapter 2, -into LLVM IR. This will teach you a little bit about how LLVM does -things, as well as demonstrate how easy it is to use. It's much more -work to build a lexer and parser than it is to generate LLVM IR code. :) - -**Please note**: the code in this chapter and later require LLVM 2.3 or -LLVM SVN to work. LLVM 2.2 and before will not work with it. - -Code Generation Setup -===================== - -In order to generate LLVM IR, we want some simple setup to get started. -First we define virtual code generation (codegen) methods in each AST -class: - -.. code-block:: ocaml - - let rec codegen_expr = function - | Ast.Number n -> ... - | Ast.Variable name -> ... - -The ``Codegen.codegen_expr`` function says to emit IR for that AST node -along with all the things it depends on, and they all return an LLVM -Value object. "Value" is the class used to represent a "`Static Single -Assignment -(SSA) `_ -register" or "SSA value" in LLVM. The most distinct aspect of SSA values -is that their value is computed as the related instruction executes, and -it does not get a new value until (and if) the instruction re-executes. -In other words, there is no way to "change" an SSA value. For more -information, please read up on `Static Single -Assignment `_ -- the concepts are really quite natural once you grok them. - -The second thing we want is an "Error" exception like we used for the -parser, which will be used to report errors found during code generation -(for example, use of an undeclared parameter): - -.. code-block:: ocaml - - exception Error of string - - let context = global_context () - let the_module = create_module context "my cool jit" - let builder = builder context - let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 - let double_type = double_type context - -The static variables will be used during code generation. -``Codegen.the_module`` is the LLVM construct that contains all of the -functions and global variables in a chunk of code. In many ways, it is -the top-level structure that the LLVM IR uses to contain code. - -The ``Codegen.builder`` object is a helper object that makes it easy to -generate LLVM instructions. Instances of the -`IRBuilder `_ -class keep track of the current place to insert instructions and has -methods to create new instructions. - -The ``Codegen.named_values`` map keeps track of which values are defined -in the current scope and what their LLVM representation is. (In other -words, it is a symbol table for the code). In this form of Kaleidoscope, -the only things that can be referenced are function parameters. As such, -function parameters will be in this map when generating code for their -function body. - -With these basics in place, we can start talking about how to generate -code for each expression. Note that this assumes that the -``Codegen.builder`` has been set up to generate code *into* something. -For now, we'll assume that this has already been done, and we'll just -use it to emit code. - -Expression Code Generation -========================== - -Generating LLVM code for expression nodes is very straightforward: less -than 30 lines of commented code for all four of our expression nodes. -First we'll do numeric literals: - -.. code-block:: ocaml - - | Ast.Number n -> const_float double_type n - -In the LLVM IR, numeric constants are represented with the -``ConstantFP`` class, which holds the numeric value in an ``APFloat`` -internally (``APFloat`` has the capability of holding floating point -constants of Arbitrary Precision). This code basically just creates -and returns a ``ConstantFP``. Note that in the LLVM IR that constants -are all uniqued together and shared. For this reason, the API uses "the -foo::get(..)" idiom instead of "new foo(..)" or "foo::Create(..)". - -.. code-block:: ocaml - - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) - -References to variables are also quite simple using LLVM. In the simple -version of Kaleidoscope, we assume that the variable has already been -emitted somewhere and its value is available. In practice, the only -values that can be in the ``Codegen.named_values`` map are function -arguments. This code simply checks to see that the specified name is in -the map (if not, an unknown variable is being referenced) and returns -the value for it. In future chapters, we'll add support for `loop -induction variables `_ in the symbol table, and for -`local variables `_. - -.. code-block:: ocaml - - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_fadd lhs_val rhs_val "addtmp" builder - | '-' -> build_fsub lhs_val rhs_val "subtmp" builder - | '*' -> build_fmul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> raise (Error "invalid binary operator") - end - -Binary operators start to get more interesting. The basic idea here is -that we recursively emit code for the left-hand side of the expression, -then the right-hand side, then we compute the result of the binary -expression. In this code, we do a simple switch on the opcode to create -the right LLVM instruction. - -In the example above, the LLVM builder class is starting to show its -value. IRBuilder knows where to insert the newly created instruction, -all you have to do is specify what instruction to create (e.g. with -``Llvm.create_add``), which operands to use (``lhs`` and ``rhs`` here) -and optionally provide a name for the generated instruction. - -One nice thing about LLVM is that the name is just a hint. For instance, -if the code above emits multiple "addtmp" variables, LLVM will -automatically provide each one with an increasing, unique numeric -suffix. Local value names for instructions are purely optional, but it -makes it much easier to read the IR dumps. - -`LLVM instructions <../LangRef.html#instruction-reference>`_ are constrained by strict -rules: for example, the Left and Right operators of an `add -instruction <../LangRef.html#add-instruction>`_ must have the same type, and the -result type of the add must match the operand types. Because all values -in Kaleidoscope are doubles, this makes for very simple code for add, -sub and mul. - -On the other hand, LLVM specifies that the `fcmp -instruction <../LangRef.html#fcmp-instruction>`_ always returns an 'i1' value (a -one bit integer). The problem with this is that Kaleidoscope wants the -value to be a 0.0 or 1.0 value. In order to get these semantics, we -combine the fcmp instruction with a `uitofp -instruction <../LangRef.html#uitofp-to-instruction>`_. This instruction converts its -input integer into a floating point value by treating the input as an -unsigned value. In contrast, if we used the `sitofp -instruction <../LangRef.html#sitofp-to-instruction>`_, the Kaleidoscope '<' operator -would return 0.0 and -1.0, depending on the input value. - -.. code-block:: ocaml - - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - -Code generation for function calls is quite straightforward with LLVM. -The code above initially does a function name lookup in the LLVM -Module's symbol table. Recall that the LLVM Module is the container that -holds all of the functions we are JIT'ing. By giving each function the -same name as what the user specifies, we can use the LLVM symbol table -to resolve function names for us. - -Once we have the function to call, we recursively codegen each argument -that is to be passed in, and create an LLVM `call -instruction <../LangRef.html#call-instruction>`_. Note that LLVM uses the native C -calling conventions by default, allowing these calls to also call into -standard library functions like "sin" and "cos", with no additional -effort. - -This wraps up our handling of the four basic expressions that we have so -far in Kaleidoscope. Feel free to go in and add some more. For example, -by browsing the `LLVM language reference <../LangRef.html>`_ you'll find -several other interesting instructions that are really easy to plug into -our basic framework. - -Function Code Generation -======================== - -Code generation for prototypes and functions must handle a number of -details, which make their code less beautiful than expression code -generation, but allows us to illustrate some important points. First, -lets talk about code generation for prototypes: they are used both for -function bodies and external function declarations. The code starts -with: - -.. code-block:: ocaml - - let codegen_proto = function - | Ast.Prototype (name, args) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - -This code packs a lot of power into a few lines. Note first that this -function returns a "Function\*" instead of a "Value\*" (although at the -moment they both are modeled by ``llvalue`` in ocaml). Because a -"prototype" really talks about the external interface for a function -(not the value computed by an expression), it makes sense for it to -return the LLVM Function it corresponds to when codegen'd. - -The call to ``Llvm.function_type`` creates the ``Llvm.llvalue`` that -should be used for a given Prototype. Since all function arguments in -Kaleidoscope are of type double, the first line creates a vector of "N" -LLVM double types. It then uses the ``Llvm.function_type`` method to -create a function type that takes "N" doubles as arguments, returns one -double as a result, and that is not vararg (that uses the function -``Llvm.var_arg_function_type``). Note that Types in LLVM are uniqued -just like ``Constant``'s are, so you don't "new" a type, you "get" it. - -The final line above checks if the function has already been defined in -``Codegen.the_module``. If not, we will create it. - -.. code-block:: ocaml - - | None -> declare_function name ft the_module - -This indicates the type and name to use, as well as which module to -insert into. By default we assume a function has -``Llvm.Linkage.ExternalLinkage``. "`external -linkage <../LangRef.html#linkage>`_" means that the function may be defined -outside the current module and/or that it is callable by functions -outside the module. The "``name``" passed in is the name the user -specified: this name is registered in "``Codegen.the_module``"s symbol -table, which is used by the function call code above. - -In Kaleidoscope, I choose to allow redefinitions of functions in two -cases: first, we want to allow 'extern'ing a function more than once, as -long as the prototypes for the externs match (since all arguments have -the same type, we just have to check that the number of arguments -match). Second, we want to allow 'extern'ing a function and then -defining a body for it. This is useful when defining mutually recursive -functions. - -.. code-block:: ocaml - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if Array.length (basic_blocks f) == 0 then () else - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if Array.length (params f) == Array.length args then () else - raise (Error "redefinition of function with different # args"); - f - in - -In order to verify the logic above, we first check to see if the -pre-existing function is "empty". In this case, empty means that it has -no basic blocks in it, which means it has no body. If it has no body, it -is a forward declaration. Since we don't allow anything after a full -definition of the function, the code rejects this case. If the previous -reference to a function was an 'extern', we simply verify that the -number of arguments for that definition and this one match up. If not, -we emit an error. - -.. code-block:: ocaml - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - -The last bit of code for prototypes loops over all of the arguments in -the function, setting the name of the LLVM Argument objects to match, -and registering the arguments in the ``Codegen.named_values`` map for -future use by the ``Ast.Variable`` variant. Once this is set up, it -returns the Function object to the caller. Note that we don't check for -conflicting argument names here (e.g. "extern foo(a b a)"). Doing so -would be very straight-forward with the mechanics we have already used -above. - -.. code-block:: ocaml - - let codegen_func = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - -Code generation for function definitions starts out simply enough: we -just codegen the prototype (Proto) and verify that it is ok. We then -clear out the ``Codegen.named_values`` map to make sure that there isn't -anything in it from the last function we compiled. Code generation of -the prototype ensures that there is an LLVM Function object that is -ready to go for us. - -.. code-block:: ocaml - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in - -Now we get to the point where the ``Codegen.builder`` is set up. The -first line creates a new `basic -block `_ (named "entry"), -which is inserted into ``the_function``. The second line then tells the -builder that new instructions should be inserted into the end of the new -basic block. Basic blocks in LLVM are an important part of functions -that define the `Control Flow -Graph `_. Since we -don't have any control flow, our functions will only contain one block -at this point. We'll fix this in `Chapter 5 `_ :). - -.. code-block:: ocaml - - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - the_function - -Once the insertion point is set up, we call the ``Codegen.codegen_func`` -method for the root expression of the function. If no error happens, -this emits code to compute the expression into the entry block and -returns the value that was computed. Assuming no error, we then create -an LLVM `ret instruction <../LangRef.html#ret-instruction>`_, which completes the -function. Once the function is built, we call -``Llvm_analysis.assert_valid_function``, which is provided by LLVM. This -function does a variety of consistency checks on the generated code, to -determine if our compiler is doing everything right. Using this is -important: it can catch a lot of bugs. Once the function is finished and -validated, we return it. - -.. code-block:: ocaml - - with e -> - delete_function the_function; - raise e - -The only piece left here is handling of the error case. For simplicity, -we handle this by merely deleting the function we produced with the -``Llvm.delete_function`` method. This allows the user to redefine a -function that they incorrectly typed in before: if we didn't delete it, -it would live in the symbol table, with a body, preventing future -redefinition. - -This code does have a bug, though. Since the ``Codegen.codegen_proto`` -can return a previously defined forward declaration, our code can -actually delete a forward declaration. There are a number of ways to fix -this bug, see what you can come up with! Here is a testcase: - -:: - - extern foo(a b); # ok, defines foo. - def foo(a b) c; # error, 'c' is invalid. - def bar() foo(1, 2); # error, unknown function "foo" - -Driver Changes and Closing Thoughts -=================================== - -For now, code generation to LLVM doesn't really get us much, except that -we can look at the pretty IR calls. The sample code inserts calls to -Codegen into the "``Toplevel.main_loop``", and then dumps out the LLVM -IR. This gives a nice way to look at the LLVM IR for simple functions. -For example: - -:: - - ready> 4+5; - Read top-level expression: - define double @""() { - entry: - %addtmp = fadd double 4.000000e+00, 5.000000e+00 - ret double %addtmp - } - -Note how the parser turns the top-level expression into anonymous -functions for us. This will be handy when we add `JIT -support `_ in the next chapter. Also note that -the code is very literally transcribed, no optimizations are being -performed. We will `add -optimizations `_ explicitly in the -next chapter. - -:: - - ready> def foo(a b) a*a + 2*a*b + b*b; - Read function definition: - define double @foo(double %a, double %b) { - entry: - %multmp = fmul double %a, %a - %multmp1 = fmul double 2.000000e+00, %a - %multmp2 = fmul double %multmp1, %b - %addtmp = fadd double %multmp, %multmp2 - %multmp3 = fmul double %b, %b - %addtmp4 = fadd double %addtmp, %multmp3 - ret double %addtmp4 - } - -This shows some simple arithmetic. Notice the striking similarity to the -LLVM builder calls that we use to create the instructions. - -:: - - ready> def bar(a) foo(a, 4.0) + bar(31337); - Read function definition: - define double @bar(double %a) { - entry: - %calltmp = call double @foo(double %a, double 4.000000e+00) - %calltmp1 = call double @bar(double 3.133700e+04) - %addtmp = fadd double %calltmp, %calltmp1 - ret double %addtmp - } - -This shows some function calls. Note that this function will take a long -time to execute if you call it. In the future we'll add conditional -control flow to actually make recursion useful :). - -:: - - ready> extern cos(x); - Read extern: - declare double @cos(double) - - ready> cos(1.234); - Read top-level expression: - define double @""() { - entry: - %calltmp = call double @cos(double 1.234000e+00) - ret double %calltmp - } - -This shows an extern for the libm "cos" function, and a call to it. - -:: - - ready> ^D - ; ModuleID = 'my cool jit' - - define double @""() { - entry: - %addtmp = fadd double 4.000000e+00, 5.000000e+00 - ret double %addtmp - } - - define double @foo(double %a, double %b) { - entry: - %multmp = fmul double %a, %a - %multmp1 = fmul double 2.000000e+00, %a - %multmp2 = fmul double %multmp1, %b - %addtmp = fadd double %multmp, %multmp2 - %multmp3 = fmul double %b, %b - %addtmp4 = fadd double %addtmp, %multmp3 - ret double %addtmp4 - } - - define double @bar(double %a) { - entry: - %calltmp = call double @foo(double %a, double 4.000000e+00) - %calltmp1 = call double @bar(double 3.133700e+04) - %addtmp = fadd double %calltmp, %calltmp1 - ret double %addtmp - } - - declare double @cos(double) - - define double @""() { - entry: - %calltmp = call double @cos(double 1.234000e+00) - ret double %calltmp - } - -When you quit the current demo, it dumps out the IR for the entire -module generated. Here you can see the big picture with all the -functions referencing each other. - -This wraps up the third chapter of the Kaleidoscope tutorial. Up next, -we'll describe how to `add JIT codegen and optimizer -support `_ to this so we can actually start running -code! - -Full Code Listing -================= - -Here is the complete code listing for our running example, enhanced with -the LLVM code generator. Because this uses the LLVM libraries, we need -to link them in. To do this, we use the -`llvm-config `_ tool to inform -our makefile/command line about which options to use: - -.. code-block:: bash - - # Compile - ocamlbuild toy.byte - # Run - ./toy.byte - -Here is the code: - -\_tags: - :: - - <{lexer,parser}.ml>: use_camlp4, pp(camlp4of) - <*.{byte,native}>: g++, use_llvm, use_llvm_analysis - -myocamlbuild.ml: - .. code-block:: ocaml - - open Ocamlbuild_plugin;; - - ocaml_lib ~extern:true "llvm";; - ocaml_lib ~extern:true "llvm_analysis";; - - flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);; - -token.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - - (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) - type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char - -lexer.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - - let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - - and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - - and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | id -> [< 'Token.Ident id; stream >] - - and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] - -ast.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - - (* expr - Base type for all expression nodes. *) - type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - - (* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) - type proto = Prototype of string * string array - - (* func - This type represents a function definition itself. *) - type func = Function of proto * expr - -parser.ml: - .. code-block:: ocaml - - (*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - - (* binop_precedence - This holds the precedence for each binary operator that is - * defined *) - let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - - (* precedence - Get the precedence of the pending binary operator token. *) - let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - - (* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr *) - let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - - (* binoprhs - * ::= ('+' primary)* *) - and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - - (* expression - * ::= primary binoprhs *) - and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream - - (* prototype - * ::= id '(' id* ')' *) - let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") - - (* definition ::= 'def' prototype expression *) - let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - - (* toplevelexpr ::= expression *) - let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - - (* external ::= 'extern' prototype *) - let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e - -codegen.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - - open Llvm - - exception Error of string - - let context = global_context () - let the_module = create_module context "my cool jit" - let builder = builder context - let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 - let double_type = double_type context - - let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> raise (Error "invalid binary operator") - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - - let codegen_proto = function - | Ast.Prototype (name, args) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - - let codegen_func = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - the_function - with e -> - delete_function the_function; - raise e - -toplevel.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - - open Llvm - - (* top ::= definition | external | expression | ';' *) - let rec main_loop stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - dump_value (Codegen.codegen_func e); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop stream - -toy.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - - open Llvm - - let main () = - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module - ;; - - main () - -`Next: Adding JIT and Optimizer Support `_ - diff --git a/llvm/docs/tutorial/OCamlLangImpl4.rst b/llvm/docs/tutorial/OCamlLangImpl4.rst deleted file mode 100644 --- a/llvm/docs/tutorial/OCamlLangImpl4.rst +++ /dev/null @@ -1,915 +0,0 @@ -============================================== -Kaleidoscope: Adding JIT and Optimizer Support -============================================== - -.. contents:: - :local: - -Chapter 4 Introduction -====================== - -Welcome to Chapter 4 of the "`Implementing a language with -LLVM `_" tutorial. Chapters 1-3 described the implementation -of a simple language and added support for generating LLVM IR. This -chapter describes two new techniques: adding optimizer support to your -language, and adding JIT compiler support. These additions will -demonstrate how to get nice, efficient code for the Kaleidoscope -language. - -Trivial Constant Folding -======================== - -**Note:** the default ``IRBuilder`` now always includes the constant -folding optimisations below. - -Our demonstration for Chapter 3 is elegant and easy to extend. -Unfortunately, it does not produce wonderful code. For example, when -compiling simple code, we don't get obvious optimizations: - -:: - - ready> def test(x) 1+2+x; - Read function definition: - define double @test(double %x) { - entry: - %addtmp = fadd double 1.000000e+00, 2.000000e+00 - %addtmp1 = fadd double %addtmp, %x - ret double %addtmp1 - } - -This code is a very, very literal transcription of the AST built by -parsing the input. As such, this transcription lacks optimizations like -constant folding (we'd like to get "``add x, 3.0``" in the example -above) as well as other more important optimizations. Constant folding, -in particular, is a very common and very important optimization: so much -so that many language implementors implement constant folding support in -their AST representation. - -With LLVM, you don't need this support in the AST. Since all calls to -build LLVM IR go through the LLVM builder, it would be nice if the -builder itself checked to see if there was a constant folding -opportunity when you call it. If so, it could just do the constant fold -and return the constant instead of creating an instruction. This is -exactly what the ``LLVMFoldingBuilder`` class does. - -All we did was switch from ``LLVMBuilder`` to ``LLVMFoldingBuilder``. -Though we change no other code, we now have all of our instructions -implicitly constant folded without us having to do anything about it. -For example, the input above now compiles to: - -:: - - ready> def test(x) 1+2+x; - Read function definition: - define double @test(double %x) { - entry: - %addtmp = fadd double 3.000000e+00, %x - ret double %addtmp - } - -Well, that was easy :). In practice, we recommend always using -``LLVMFoldingBuilder`` when generating code like this. It has no -"syntactic overhead" for its use (you don't have to uglify your compiler -with constant checks everywhere) and it can dramatically reduce the -amount of LLVM IR that is generated in some cases (particular for -languages with a macro preprocessor or that use a lot of constants). - -On the other hand, the ``LLVMFoldingBuilder`` is limited by the fact -that it does all of its analysis inline with the code as it is built. If -you take a slightly more complex example: - -:: - - ready> def test(x) (1+2+x)*(x+(1+2)); - ready> Read function definition: - define double @test(double %x) { - entry: - %addtmp = fadd double 3.000000e+00, %x - %addtmp1 = fadd double %x, 3.000000e+00 - %multmp = fmul double %addtmp, %addtmp1 - ret double %multmp - } - -In this case, the LHS and RHS of the multiplication are the same value. -We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``" -instead of computing "``x*3``" twice. - -Unfortunately, no amount of local analysis will be able to detect and -correct this. This requires two transformations: reassociation of -expressions (to make the add's lexically identical) and Common -Subexpression Elimination (CSE) to delete the redundant add instruction. -Fortunately, LLVM provides a broad range of optimizations that you can -use, in the form of "passes". - -LLVM Optimization Passes -======================== - -LLVM provides many optimization passes, which do many different sorts of -things and have different tradeoffs. Unlike other systems, LLVM doesn't -hold to the mistaken notion that one set of optimizations is right for -all languages and for all situations. LLVM allows a compiler implementor -to make complete decisions about what optimizations to use, in which -order, and in what situation. - -As a concrete example, LLVM supports both "whole module" passes, which -look across as large of body of code as they can (often a whole file, -but if run at link time, this can be a substantial portion of the whole -program). It also supports and includes "per-function" passes which just -operate on a single function at a time, without looking at other -functions. For more information on passes and how they are run, see the -`How to Write a Pass <../WritingAnLLVMPass.html>`_ document and the -`List of LLVM Passes <../Passes.html>`_. - -For Kaleidoscope, we are currently generating functions on the fly, one -at a time, as the user types them in. We aren't shooting for the -ultimate optimization experience in this setting, but we also want to -catch the easy and quick stuff where possible. As such, we will choose -to run a few per-function optimizations as the user types the function -in. If we wanted to make a "static Kaleidoscope compiler", we would use -exactly the code we have now, except that we would defer running the -optimizer until the entire file has been parsed. - -In order to get per-function optimizations going, we need to set up a -`Llvm.PassManager <../WritingAnLLVMPass.html#what-passmanager-does>`_ to hold and -organize the LLVM optimizations that we want to run. Once we have that, -we can add a set of optimizations to run. The code looks like this: - -.. code-block:: ocaml - - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combining the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. *) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). *) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop the_fpm the_execution_engine stream; - -The meat of the matter here, is the definition of "``the_fpm``". It -requires a pointer to the ``the_module`` to construct itself. Once it is -set up, we use a series of "add" calls to add a bunch of LLVM passes. -The first pass is basically boilerplate, it adds a pass so that later -optimizations know how the data structures in the program are laid out. -The "``the_execution_engine``" variable is related to the JIT, which we -will get to in the next section. - -In this case, we choose to add 4 optimization passes. The passes we -chose here are a pretty standard set of "cleanup" optimizations that are -useful for a wide variety of code. I won't delve into what they do but, -believe me, they are a good starting place :). - -Once the ``Llvm.PassManager.`` is set up, we need to make use of it. We -do this by running it after our newly created function is constructed -(in ``Codegen.codegen_func``), but before it is returned to the client: - -.. code-block:: ocaml - - let codegen_func the_fpm = function - ... - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. *) - let _ = PassManager.run_function the_function the_fpm in - - the_function - -As you can see, this is pretty straightforward. The ``the_fpm`` -optimizes and updates the LLVM Function\* in place, improving -(hopefully) its body. With this in place, we can try our test above -again: - -:: - - ready> def test(x) (1+2+x)*(x+(1+2)); - ready> Read function definition: - define double @test(double %x) { - entry: - %addtmp = fadd double %x, 3.000000e+00 - %multmp = fmul double %addtmp, %addtmp - ret double %multmp - } - -As expected, we now get our nicely optimized code, saving a floating -point add instruction from every execution of this function. - -LLVM provides a wide variety of optimizations that can be used in -certain circumstances. Some `documentation about the various -passes <../Passes.html>`_ is available, but it isn't very complete. -Another good source of ideas can come from looking at the passes that -``Clang`` runs to get started. The "``opt``" tool allows you to -experiment with passes from the command line, so you can see if they do -anything. - -Now that we have reasonable code coming out of our front-end, lets talk -about executing it! - -Adding a JIT Compiler -===================== - -Code that is available in LLVM IR can have a wide variety of tools -applied to it. For example, you can run optimizations on it (as we did -above), you can dump it out in textual or binary forms, you can compile -the code to an assembly file (.s) for some target, or you can JIT -compile it. The nice thing about the LLVM IR representation is that it -is the "common currency" between many different parts of the compiler. - -In this section, we'll add JIT compiler support to our interpreter. The -basic idea that we want for Kaleidoscope is to have the user enter -function bodies as they do now, but immediately evaluate the top-level -expressions they type in. For example, if they type in "1 + 2;", we -should evaluate and print out 3. If they define a function, they should -be able to call it from the command line. - -In order to do this, we first declare and initialize the JIT. This is -done by adding a global variable and a call in ``main``: - -.. code-block:: ocaml - - ... - let main () = - ... - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - ... - -This creates an abstract "Execution Engine" which can be either a JIT -compiler or the LLVM interpreter. LLVM will automatically pick a JIT -compiler for you if one is available for your platform, otherwise it -will fall back to the interpreter. - -Once the ``Llvm_executionengine.ExecutionEngine.t`` is created, the JIT -is ready to be used. There are a variety of APIs that are useful, but -the simplest one is the -"``Llvm_executionengine.ExecutionEngine.run_function``" function. This -method JIT compiles the specified LLVM Function and returns a function -pointer to the generated machine code. In our case, this means that we -can change the code that parses a top-level expression to look like -this: - -.. code-block:: ocaml - - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function, returning a function pointer. *) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); - -Recall that we compile top-level expressions into a self-contained LLVM -function that takes no arguments and returns the computed double. -Because the LLVM JIT compiler matches the native platform ABI, this -means that you can just cast the result pointer to a function pointer of -that type and call it directly. This means, there is no difference -between JIT compiled code and native machine code that is statically -linked into your application. - -With just these two changes, lets see how Kaleidoscope works now! - -:: - - ready> 4+5; - define double @""() { - entry: - ret double 9.000000e+00 - } - - Evaluated to 9.000000 - -Well this looks like it is basically working. The dump of the function -shows the "no argument function that always returns double" that we -synthesize for each top level expression that is typed in. This -demonstrates very basic functionality, but can we do more? - -:: - - ready> def testfunc(x y) x + y*2; - Read function definition: - define double @testfunc(double %x, double %y) { - entry: - %multmp = fmul double %y, 2.000000e+00 - %addtmp = fadd double %multmp, %x - ret double %addtmp - } - - ready> testfunc(4, 10); - define double @""() { - entry: - %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01) - ret double %calltmp - } - - Evaluated to 24.000000 - -This illustrates that we can now call user code, but there is something -a bit subtle going on here. Note that we only invoke the JIT on the -anonymous functions that *call testfunc*, but we never invoked it on -*testfunc* itself. What actually happened here is that the JIT scanned -for all non-JIT'd functions transitively called from the anonymous -function and compiled all of them before returning from -``run_function``. - -The JIT provides a number of other more advanced interfaces for things -like freeing allocated machine code, rejit'ing functions to update them, -etc. However, even with this simple code, we get some surprisingly -powerful capabilities - check this out (I removed the dump of the -anonymous functions, you should get the idea by now :) : - -:: - - ready> extern sin(x); - Read extern: - declare double @sin(double) - - ready> extern cos(x); - Read extern: - declare double @cos(double) - - ready> sin(1.0); - Evaluated to 0.841471 - - ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x); - Read function definition: - define double @foo(double %x) { - entry: - %calltmp = call double @sin(double %x) - %multmp = fmul double %calltmp, %calltmp - %calltmp2 = call double @cos(double %x) - %multmp4 = fmul double %calltmp2, %calltmp2 - %addtmp = fadd double %multmp, %multmp4 - ret double %addtmp - } - - ready> foo(4.0); - Evaluated to 1.000000 - -Whoa, how does the JIT know about sin and cos? The answer is -surprisingly simple: in this example, the JIT started execution of a -function and got to a function call. It realized that the function was -not yet JIT compiled and invoked the standard set of routines to resolve -the function. In this case, there is no body defined for the function, -so the JIT ended up calling "``dlsym("sin")``" on the Kaleidoscope -process itself. Since "``sin``" is defined within the JIT's address -space, it simply patches up calls in the module to call the libm version -of ``sin`` directly. - -The LLVM JIT provides a number of interfaces (look in the -``llvm_executionengine.mli`` file) for controlling how unknown functions -get resolved. It allows you to establish explicit mappings between IR -objects and addresses (useful for LLVM global variables that you want to -map to static tables, for example), allows you to dynamically decide on -the fly based on the function name, and even allows you to have the JIT -compile functions lazily the first time they're called. - -One interesting application of this is that we can now extend the -language by writing arbitrary C code to implement operations. For -example, if we add: - -.. code-block:: c++ - - /* putchard - putchar that takes a double and returns 0. */ - extern "C" - double putchard(double X) { - putchar((char)X); - return 0; - } - -Now we can produce simple output to the console by using things like: -"``extern putchard(x); putchard(120);``", which prints a lowercase 'x' -on the console (120 is the ASCII code for 'x'). Similar code could be -used to implement file I/O, console input, and many other capabilities -in Kaleidoscope. - -This completes the JIT and optimizer chapter of the Kaleidoscope -tutorial. At this point, we can compile a non-Turing-complete -programming language, optimize and JIT compile it in a user-driven way. -Next up we'll look into `extending the language with control flow -constructs `_, tackling some interesting LLVM IR -issues along the way. - -Full Code Listing -================= - -Here is the complete code listing for our running example, enhanced with -the LLVM JIT and optimizer. To build this example, use: - -.. code-block:: bash - - # Compile - ocamlbuild toy.byte - # Run - ./toy.byte - -Here is the code: - -\_tags: - :: - - <{lexer,parser}.ml>: use_camlp4, pp(camlp4of) - <*.{byte,native}>: g++, use_llvm, use_llvm_analysis - <*.{byte,native}>: use_llvm_executionengine, use_llvm_target - <*.{byte,native}>: use_llvm_scalar_opts, use_bindings - -myocamlbuild.ml: - .. code-block:: ocaml - - open Ocamlbuild_plugin;; - - ocaml_lib ~extern:true "llvm";; - ocaml_lib ~extern:true "llvm_analysis";; - ocaml_lib ~extern:true "llvm_executionengine";; - ocaml_lib ~extern:true "llvm_target";; - ocaml_lib ~extern:true "llvm_scalar_opts";; - - flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);; - dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; - -token.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - - (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) - type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char - -lexer.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - - let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - - and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - - and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | id -> [< 'Token.Ident id; stream >] - - and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] - -ast.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - - (* expr - Base type for all expression nodes. *) - type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - - (* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) - type proto = Prototype of string * string array - - (* func - This type represents a function definition itself. *) - type func = Function of proto * expr - -parser.ml: - .. code-block:: ocaml - - (*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - - (* binop_precedence - This holds the precedence for each binary operator that is - * defined *) - let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - - (* precedence - Get the precedence of the pending binary operator token. *) - let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - - (* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr *) - let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - - (* binoprhs - * ::= ('+' primary)* *) - and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - - (* expression - * ::= primary binoprhs *) - and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream - - (* prototype - * ::= id '(' id* ')' *) - let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") - - (* definition ::= 'def' prototype expression *) - let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - - (* toplevelexpr ::= expression *) - let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - - (* external ::= 'extern' prototype *) - let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e - -codegen.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - - open Llvm - - exception Error of string - - let context = global_context () - let the_module = create_module context "my cool jit" - let builder = builder context - let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 - let double_type = double_type context - - let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> raise (Error "invalid binary operator") - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - - let codegen_proto = function - | Ast.Prototype (name, args) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - - let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. *) - let _ = PassManager.run_function the_function the_fpm in - - the_function - with e -> - delete_function the_function; - raise e - -toplevel.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - - open Llvm - open Llvm_executionengine - - (* top ::= definition | external | expression | ';' *) - let rec main_loop the_fpm the_execution_engine stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop the_fpm the_execution_engine stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func the_fpm e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function, returning a function pointer. *) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop the_fpm the_execution_engine stream - -toy.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - - open Llvm - open Llvm_executionengine - open Llvm_target - open Llvm_scalar_opts - - let main () = - ignore (initialize_native_target ()); - - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combination the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. *) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). *) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop the_fpm the_execution_engine stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module - ;; - - main () - -bindings.c - .. code-block:: c - - #include - - /* putchard - putchar that takes a double and returns 0. */ - extern double putchard(double X) { - putchar((char)X); - return 0; - } - -`Next: Extending the language: control flow `_ - diff --git a/llvm/docs/tutorial/OCamlLangImpl5.rst b/llvm/docs/tutorial/OCamlLangImpl5.rst deleted file mode 100644 --- a/llvm/docs/tutorial/OCamlLangImpl5.rst +++ /dev/null @@ -1,1350 +0,0 @@ -================================================== -Kaleidoscope: Extending the Language: Control Flow -================================================== - -.. contents:: - :local: - -Chapter 5 Introduction -====================== - -Welcome to Chapter 5 of the "`Implementing a language with -LLVM `_" tutorial. Parts 1-4 described the implementation of -the simple Kaleidoscope language and included support for generating -LLVM IR, followed by optimizations and a JIT compiler. Unfortunately, as -presented, Kaleidoscope is mostly useless: it has no control flow other -than call and return. This means that you can't have conditional -branches in the code, significantly limiting its power. In this episode -of "build that compiler", we'll extend Kaleidoscope to have an -if/then/else expression plus a simple 'for' loop. - -If/Then/Else -============ - -Extending Kaleidoscope to support if/then/else is quite straightforward. -It basically requires adding lexer support for this "new" concept to the -lexer, parser, AST, and LLVM code emitter. This example is nice, because -it shows how easy it is to "grow" a language over time, incrementally -extending it as new ideas are discovered. - -Before we get going on "how" we add this extension, lets talk about -"what" we want. The basic idea is that we want to be able to write this -sort of thing: - -:: - - def fib(x) - if x < 3 then - 1 - else - fib(x-1)+fib(x-2); - -In Kaleidoscope, every construct is an expression: there are no -statements. As such, the if/then/else expression needs to return a value -like any other. Since we're using a mostly functional form, we'll have -it evaluate its conditional, then return the 'then' or 'else' value -based on how the condition was resolved. This is very similar to the C -"?:" expression. - -The semantics of the if/then/else expression is that it evaluates the -condition to a boolean equality value: 0.0 is considered to be false and -everything else is considered to be true. If the condition is true, the -first subexpression is evaluated and returned, if the condition is -false, the second subexpression is evaluated and returned. Since -Kaleidoscope allows side-effects, this behavior is important to nail -down. - -Now that we know what we "want", lets break this down into its -constituent pieces. - -Lexer Extensions for If/Then/Else ---------------------------------- - -The lexer extensions are straightforward. First we add new variants for -the relevant tokens: - -.. code-block:: ocaml - - (* control *) - | If | Then | Else | For | In - -Once we have that, we recognize the new keywords in the lexer. This is -pretty simple stuff: - -.. code-block:: ocaml - - ... - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - | id -> [< 'Token.Ident id; stream >] - -AST Extensions for If/Then/Else -------------------------------- - -To represent the new expression we add a new AST variant for it: - -.. code-block:: ocaml - - type expr = - ... - (* variant for if/then/else. *) - | If of expr * expr * expr - -The AST variant just has pointers to the various subexpressions. - -Parser Extensions for If/Then/Else ----------------------------------- - -Now that we have the relevant tokens coming from the lexer and we have -the AST node to build, our parsing logic is relatively straightforward. -Next we add a new case for parsing a if-expression as a primary expression: - -.. code-block:: ocaml - - let rec parse_primary = parser - ... - (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) - | [< 'Token.If; c=parse_expr; - 'Token.Then ?? "expected 'then'"; t=parse_expr; - 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> - Ast.If (c, t, e) - -LLVM IR for If/Then/Else ------------------------- - -Now that we have it parsing and building the AST, the final piece is -adding LLVM code generation support. This is the most interesting part -of the if/then/else example, because this is where it starts to -introduce new concepts. All of the code above has been thoroughly -described in previous chapters. - -To motivate the code we want to produce, lets take a look at a simple -example. Consider: - -:: - - extern foo(); - extern bar(); - def baz(x) if x then foo() else bar(); - -If you disable optimizations, the code you'll (soon) get from -Kaleidoscope looks like this: - -.. code-block:: llvm - - declare double @foo() - - declare double @bar() - - define double @baz(double %x) { - entry: - %ifcond = fcmp one double %x, 0.000000e+00 - br i1 %ifcond, label %then, label %else - - then: ; preds = %entry - %calltmp = call double @foo() - br label %ifcont - - else: ; preds = %entry - %calltmp1 = call double @bar() - br label %ifcont - - ifcont: ; preds = %else, %then - %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ] - ret double %iftmp - } - -To visualize the control flow graph, you can use a nifty feature of the -LLVM '`opt `_' tool. If you put this LLVM -IR into "t.ll" and run "``llvm-as < t.ll | opt -analyze -view-cfg``", `a -window will pop up <../ProgrammersManual.html#viewing-graphs-while-debugging-code>`_ and you'll -see this graph: - -.. figure:: MyFirstLanguageFrontend/LangImpl05-cfg.png - :align: center - :alt: Example CFG - - Example CFG - -Another way to get this is to call -"``Llvm_analysis.view_function_cfg f``" or -"``Llvm_analysis.view_function_cfg_only f``" (where ``f`` is a -"``Function``") either by inserting actual calls into the code and -recompiling or by calling these in the debugger. LLVM has many nice -features for visualizing various graphs. - -Getting back to the generated code, it is fairly simple: the entry block -evaluates the conditional expression ("x" in our case here) and compares -the result to 0.0 with the "``fcmp one``" instruction ('one' is "Ordered -and Not Equal"). Based on the result of this expression, the code jumps -to either the "then" or "else" blocks, which contain the expressions for -the true/false cases. - -Once the then/else blocks are finished executing, they both branch back -to the 'ifcont' block to execute the code that happens after the -if/then/else. In this case the only thing left to do is to return to the -caller of the function. The question then becomes: how does the code -know which expression to return? - -The answer to this question involves an important SSA operation: the -`Phi -operation `_. -If you're not familiar with SSA, `the wikipedia -article `_ -is a good introduction and there are various other introductions to it -available on your favorite search engine. The short version is that -"execution" of the Phi operation requires "remembering" which block -control came from. The Phi operation takes on the value corresponding to -the input control block. In this case, if control comes in from the -"then" block, it gets the value of "calltmp". If control comes from the -"else" block, it gets the value of "calltmp1". - -At this point, you are probably starting to think "Oh no! This means my -simple and elegant front-end will have to start generating SSA form in -order to use LLVM!". Fortunately, this is not the case, and we strongly -advise *not* implementing an SSA construction algorithm in your -front-end unless there is an amazingly good reason to do so. In -practice, there are two sorts of values that float around in code -written for your average imperative programming language that might need -Phi nodes: - -#. Code that involves user variables: ``x = 1; x = x + 1;`` -#. Values that are implicit in the structure of your AST, such as the - Phi node in this case. - -In `Chapter 7 `_ of this tutorial ("mutable -variables"), we'll talk about #1 in depth. For now, just believe me that -you don't need SSA construction to handle this case. For #2, you have -the choice of using the techniques that we will describe for #1, or you -can insert Phi nodes directly, if convenient. In this case, it is really -really easy to generate the Phi node, so we choose to do it directly. - -Okay, enough of the motivation and overview, lets generate code! - -Code Generation for If/Then/Else --------------------------------- - -In order to generate code for this, we implement the ``Codegen`` method -for ``IfExprAST``: - -.. code-block:: ocaml - - let rec codegen_expr = function - ... - | Ast.If (cond, then_, else_) -> - let cond = codegen_expr cond in - - (* Convert condition to a bool by comparing equal to 0.0 *) - let zero = const_float double_type 0.0 in - let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in - -This code is straightforward and similar to what we saw before. We emit -the expression for the condition, then compare that value to zero to get -a truth value as a 1-bit (bool) value. - -.. code-block:: ocaml - - (* Grab the first block so that we might later add the conditional branch - * to it at the end of the function. *) - let start_bb = insertion_block builder in - let the_function = block_parent start_bb in - - let then_bb = append_block context "then" the_function in - position_at_end then_bb builder; - -As opposed to the `C++ tutorial `_, we have to build our -basic blocks bottom up since we can't have dangling BasicBlocks. We -start off by saving a pointer to the first block (which might not be the -entry block), which we'll need to build a conditional branch later. We -do this by asking the ``builder`` for the current BasicBlock. The fourth -line gets the current Function object that is being built. It gets this -by the ``start_bb`` for its "parent" (the function it is currently -embedded into). - -Once it has that, it creates one block. It is automatically appended -into the function's list of blocks. - -.. code-block:: ocaml - - (* Emit 'then' value. *) - position_at_end then_bb builder; - let then_val = codegen_expr then_ in - - (* Codegen of 'then' can change the current block, update then_bb for the - * phi. We create a new name because one is used for the phi node, and the - * other is used for the conditional branch. *) - let new_then_bb = insertion_block builder in - -We move the builder to start inserting into the "then" block. Strictly -speaking, this call moves the insertion point to be at the end of the -specified block. However, since the "then" block is empty, it also -starts out by inserting at the beginning of the block. :) - -Once the insertion point is set, we recursively codegen the "then" -expression from the AST. - -The final line here is quite subtle, but is very important. The basic -issue is that when we create the Phi node in the merge block, we need to -set up the block/value pairs that indicate how the Phi will work. -Importantly, the Phi node expects to have an entry for each predecessor -of the block in the CFG. Why then, are we getting the current block when -we just set it to ThenBB 5 lines above? The problem is that the "Then" -expression may actually itself change the block that the Builder is -emitting into if, for example, it contains a nested "if/then/else" -expression. Because calling Codegen recursively could arbitrarily change -the notion of the current block, we are required to get an up-to-date -value for code that will set up the Phi node. - -.. code-block:: ocaml - - (* Emit 'else' value. *) - let else_bb = append_block context "else" the_function in - position_at_end else_bb builder; - let else_val = codegen_expr else_ in - - (* Codegen of 'else' can change the current block, update else_bb for the - * phi. *) - let new_else_bb = insertion_block builder in - -Code generation for the 'else' block is basically identical to codegen -for the 'then' block. - -.. code-block:: ocaml - - (* Emit merge block. *) - let merge_bb = append_block context "ifcont" the_function in - position_at_end merge_bb builder; - let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in - let phi = build_phi incoming "iftmp" builder in - -The first two lines here are now familiar: the first adds the "merge" -block to the Function object. The second changes the insertion -point so that newly created code will go into the "merge" block. Once -that is done, we need to create the PHI node and set up the block/value -pairs for the PHI. - -.. code-block:: ocaml - - (* Return to the start block to add the conditional branch. *) - position_at_end start_bb builder; - ignore (build_cond_br cond_val then_bb else_bb builder); - -Once the blocks are created, we can emit the conditional branch that -chooses between them. Note that creating new blocks does not implicitly -affect the IRBuilder, so it is still inserting into the block that the -condition went into. This is why we needed to save the "start" block. - -.. code-block:: ocaml - - (* Set a unconditional branch at the end of the 'then' block and the - * 'else' block to the 'merge' block. *) - position_at_end new_then_bb builder; ignore (build_br merge_bb builder); - position_at_end new_else_bb builder; ignore (build_br merge_bb builder); - - (* Finally, set the builder to the end of the merge block. *) - position_at_end merge_bb builder; - - phi - -To finish off the blocks, we create an unconditional branch to the merge -block. One interesting (and very important) aspect of the LLVM IR is -that it `requires all basic blocks to be -"terminated" <../LangRef.html#functionstructure>`_ with a `control flow -instruction <../LangRef.html#terminators>`_ such as return or branch. -This means that all control flow, *including fall throughs* must be made -explicit in the LLVM IR. If you violate this rule, the verifier will -emit an error. - -Finally, the CodeGen function returns the phi node as the value computed -by the if/then/else expression. In our example above, this returned -value will feed into the code for the top-level function, which will -create the return instruction. - -Overall, we now have the ability to execute conditional code in -Kaleidoscope. With this extension, Kaleidoscope is a fairly complete -language that can calculate a wide variety of numeric functions. Next up -we'll add another useful expression that is familiar from non-functional -languages... - -'for' Loop Expression -===================== - -Now that we know how to add basic control flow constructs to the -language, we have the tools to add more powerful things. Lets add -something more aggressive, a 'for' expression: - -:: - - extern putchard(char); - def printstar(n) - for i = 1, i < n, 1.0 in - putchard(42); # ascii 42 = '*' - - # print 100 '*' characters - printstar(100); - -This expression defines a new variable ("i" in this case) which iterates -from a starting value, while the condition ("i < n" in this case) is -true, incrementing by an optional step value ("1.0" in this case). If -the step value is omitted, it defaults to 1.0. While the loop is true, -it executes its body expression. Because we don't have anything better -to return, we'll just define the loop as always returning 0.0. In the -future when we have mutable variables, it will get more useful. - -As before, lets talk about the changes that we need to Kaleidoscope to -support this. - -Lexer Extensions for the 'for' Loop ------------------------------------ - -The lexer extensions are the same sort of thing as for if/then/else: - -.. code-block:: ocaml - - ... in Token.token ... - (* control *) - | If | Then | Else - | For | In - - ... in Lexer.lex_ident... - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - | id -> [< 'Token.Ident id; stream >] - -AST Extensions for the 'for' Loop ---------------------------------- - -The AST variant is just as simple. It basically boils down to capturing -the variable name and the constituent expressions in the node. - -.. code-block:: ocaml - - type expr = - ... - (* variant for for/in. *) - | For of string * expr * expr * expr option * expr - -Parser Extensions for the 'for' Loop ------------------------------------- - -The parser code is also fairly standard. The only interesting thing here -is handling of the optional step value. The parser code handles it by -checking to see if the second comma is present. If not, it sets the step -value to null in the AST node: - -.. code-block:: ocaml - - let rec parse_primary = parser - ... - (* forexpr - ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) - | [< 'Token.For; - 'Token.Ident id ?? "expected identifier after for"; - 'Token.Kwd '=' ?? "expected '=' after for"; - stream >] -> - begin parser - | [< - start=parse_expr; - 'Token.Kwd ',' ?? "expected ',' after for"; - end_=parse_expr; - stream >] -> - let step = - begin parser - | [< 'Token.Kwd ','; step=parse_expr >] -> Some step - | [< >] -> None - end stream - in - begin parser - | [< 'Token.In; body=parse_expr >] -> - Ast.For (id, start, end_, step, body) - | [< >] -> - raise (Stream.Error "expected 'in' after for") - end stream - | [< >] -> - raise (Stream.Error "expected '=' after for") - end stream - -LLVM IR for the 'for' Loop --------------------------- - -Now we get to the good part: the LLVM IR we want to generate for this -thing. With the simple example above, we get this LLVM IR (note that -this dump is generated with optimizations disabled for clarity): - -.. code-block:: llvm - - declare double @putchard(double) - - define double @printstar(double %n) { - entry: - ; initial value = 1.0 (inlined into phi) - br label %loop - - loop: ; preds = %loop, %entry - %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ] - ; body - %calltmp = call double @putchard(double 4.200000e+01) - ; increment - %nextvar = fadd double %i, 1.000000e+00 - - ; termination test - %cmptmp = fcmp ult double %i, %n - %booltmp = uitofp i1 %cmptmp to double - %loopcond = fcmp one double %booltmp, 0.000000e+00 - br i1 %loopcond, label %loop, label %afterloop - - afterloop: ; preds = %loop - ; loop always returns 0.0 - ret double 0.000000e+00 - } - -This loop contains all the same constructs we saw before: a phi node, -several expressions, and some basic blocks. Lets see how this fits -together. - -Code Generation for the 'for' Loop ----------------------------------- - -The first part of Codegen is very simple: we just output the start -expression for the loop value: - -.. code-block:: ocaml - - let rec codegen_expr = function - ... - | Ast.For (var_name, start, end_, step, body) -> - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in - -With this out of the way, the next step is to set up the LLVM basic -block for the start of the loop body. In the case above, the whole loop -body is one block, but remember that the body code itself could consist -of multiple blocks (e.g. if it contains an if/then/else or a for/in -expression). - -.. code-block:: ocaml - - (* Make the new basic block for the loop header, inserting after current - * block. *) - let preheader_bb = insertion_block builder in - let the_function = block_parent preheader_bb in - let loop_bb = append_block context "loop" the_function in - - (* Insert an explicit fall through from the current block to the - * loop_bb. *) - ignore (build_br loop_bb builder); - -This code is similar to what we saw for if/then/else. Because we will -need it to create the Phi node, we remember the block that falls through -into the loop. Once we have that, we create the actual block that starts -the loop and create an unconditional branch for the fall-through between -the two blocks. - -.. code-block:: ocaml - - (* Start insertion in loop_bb. *) - position_at_end loop_bb builder; - - (* Start the PHI node with an entry for start. *) - let variable = build_phi [(start_val, preheader_bb)] var_name builder in - -Now that the "preheader" for the loop is set up, we switch to emitting -code for the loop body. To begin with, we move the insertion point and -create the PHI node for the loop induction variable. Since we already -know the incoming value for the starting value, we add it to the Phi -node. Note that the Phi will eventually get a second value for the -backedge, but we can't set it up yet (because it doesn't exist!). - -.. code-block:: ocaml - - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - Hashtbl.add named_values var_name variable; - - (* Emit the body of the loop. This, like any other expr, can change the - * current BB. Note that we ignore the value computed by the body, but - * don't allow an error *) - ignore (codegen_expr body); - -Now the code starts to get more interesting. Our 'for' loop introduces a -new variable to the symbol table. This means that our symbol table can -now contain either function arguments or loop variables. To handle this, -before we codegen the body of the loop, we add the loop variable as the -current value for its name. Note that it is possible that there is a -variable of the same name in the outer scope. It would be easy to make -this an error (emit an error and return null if there is already an -entry for VarName) but we choose to allow shadowing of variables. In -order to handle this correctly, we remember the Value that we are -potentially shadowing in ``old_val`` (which will be None if there is no -shadowed variable). - -Once the loop variable is set into the symbol table, the code -recursively codegen's the body. This allows the body to use the loop -variable: any references to it will naturally find it in the symbol -table. - -.. code-block:: ocaml - - (* Emit the step value. *) - let step_val = - match step with - | Some step -> codegen_expr step - (* If not specified, use 1.0. *) - | None -> const_float double_type 1.0 - in - - let next_var = build_add variable step_val "nextvar" builder in - -Now that the body is emitted, we compute the next value of the iteration -variable by adding the step value, or 1.0 if it isn't present. -'``next_var``' will be the value of the loop variable on the next -iteration of the loop. - -.. code-block:: ocaml - - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - (* Convert condition to a bool by comparing equal to 0.0. *) - let zero = const_float double_type 0.0 in - let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in - -Finally, we evaluate the exit value of the loop, to determine whether -the loop should exit. This mirrors the condition evaluation for the -if/then/else statement. - -.. code-block:: ocaml - - (* Create the "after loop" block and insert it. *) - let loop_end_bb = insertion_block builder in - let after_bb = append_block context "afterloop" the_function in - - (* Insert the conditional branch into the end of loop_end_bb. *) - ignore (build_cond_br end_cond loop_bb after_bb builder); - - (* Any new code will be inserted in after_bb. *) - position_at_end after_bb builder; - -With the code for the body of the loop complete, we just need to finish -up the control flow for it. This code remembers the end block (for the -phi node), then creates the block for the loop exit ("afterloop"). Based -on the value of the exit condition, it creates a conditional branch that -chooses between executing the loop again and exiting the loop. Any -future code is emitted in the "afterloop" block, so it sets the -insertion position to it. - -.. code-block:: ocaml - - (* Add a new entry to the PHI node for the backedge. *) - add_incoming (next_var, loop_end_bb) variable; - - (* Restore the unshadowed variable. *) - begin match old_val with - | Some old_val -> Hashtbl.add named_values var_name old_val - | None -> () - end; - - (* for expr always returns 0.0. *) - const_null double_type - -The final code handles various cleanups: now that we have the -"``next_var``" value, we can add the incoming value to the loop PHI -node. After that, we remove the loop variable from the symbol table, so -that it isn't in scope after the for loop. Finally, code generation of -the for loop always returns 0.0, so that is what we return from -``Codegen.codegen_expr``. - -With this, we conclude the "adding control flow to Kaleidoscope" chapter -of the tutorial. In this chapter we added two control flow constructs, -and used them to motivate a couple of aspects of the LLVM IR that are -important for front-end implementors to know. In the next chapter of our -saga, we will get a bit crazier and add `user-defined -operators `_ to our poor innocent language. - -Full Code Listing -================= - -Here is the complete code listing for our running example, enhanced with -the if/then/else and for expressions.. To build this example, use: - -.. code-block:: bash - - # Compile - ocamlbuild toy.byte - # Run - ./toy.byte - -Here is the code: - -\_tags: - :: - - <{lexer,parser}.ml>: use_camlp4, pp(camlp4of) - <*.{byte,native}>: g++, use_llvm, use_llvm_analysis - <*.{byte,native}>: use_llvm_executionengine, use_llvm_target - <*.{byte,native}>: use_llvm_scalar_opts, use_bindings - -myocamlbuild.ml: - .. code-block:: ocaml - - open Ocamlbuild_plugin;; - - ocaml_lib ~extern:true "llvm";; - ocaml_lib ~extern:true "llvm_analysis";; - ocaml_lib ~extern:true "llvm_executionengine";; - ocaml_lib ~extern:true "llvm_target";; - ocaml_lib ~extern:true "llvm_scalar_opts";; - - flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);; - dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; - -token.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - - (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) - type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char - - (* control *) - | If | Then | Else - | For | In - -lexer.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - - let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - - and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - - and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - | id -> [< 'Token.Ident id; stream >] - - and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] - -ast.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - - (* expr - Base type for all expression nodes. *) - type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - - (* variant for if/then/else. *) - | If of expr * expr * expr - - (* variant for for/in. *) - | For of string * expr * expr * expr option * expr - - (* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) - type proto = Prototype of string * string array - - (* func - This type represents a function definition itself. *) - type func = Function of proto * expr - -parser.ml: - .. code-block:: ocaml - - (*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - - (* binop_precedence - This holds the precedence for each binary operator that is - * defined *) - let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - - (* precedence - Get the precedence of the pending binary operator token. *) - let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - - (* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr - * ::= ifexpr - * ::= forexpr *) - let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) - | [< 'Token.If; c=parse_expr; - 'Token.Then ?? "expected 'then'"; t=parse_expr; - 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> - Ast.If (c, t, e) - - (* forexpr - ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) - | [< 'Token.For; - 'Token.Ident id ?? "expected identifier after for"; - 'Token.Kwd '=' ?? "expected '=' after for"; - stream >] -> - begin parser - | [< - start=parse_expr; - 'Token.Kwd ',' ?? "expected ',' after for"; - end_=parse_expr; - stream >] -> - let step = - begin parser - | [< 'Token.Kwd ','; step=parse_expr >] -> Some step - | [< >] -> None - end stream - in - begin parser - | [< 'Token.In; body=parse_expr >] -> - Ast.For (id, start, end_, step, body) - | [< >] -> - raise (Stream.Error "expected 'in' after for") - end stream - | [< >] -> - raise (Stream.Error "expected '=' after for") - end stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - - (* binoprhs - * ::= ('+' primary)* *) - and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - - (* expression - * ::= primary binoprhs *) - and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream - - (* prototype - * ::= id '(' id* ')' *) - let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") - - (* definition ::= 'def' prototype expression *) - let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - - (* toplevelexpr ::= expression *) - let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - - (* external ::= 'extern' prototype *) - let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e - -codegen.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - - open Llvm - - exception Error of string - - let context = global_context () - let the_module = create_module context "my cool jit" - let builder = builder context - let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 - let double_type = double_type context - - let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> raise (Error "invalid binary operator") - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - | Ast.If (cond, then_, else_) -> - let cond = codegen_expr cond in - - (* Convert condition to a bool by comparing equal to 0.0 *) - let zero = const_float double_type 0.0 in - let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in - - (* Grab the first block so that we might later add the conditional branch - * to it at the end of the function. *) - let start_bb = insertion_block builder in - let the_function = block_parent start_bb in - - let then_bb = append_block context "then" the_function in - - (* Emit 'then' value. *) - position_at_end then_bb builder; - let then_val = codegen_expr then_ in - - (* Codegen of 'then' can change the current block, update then_bb for the - * phi. We create a new name because one is used for the phi node, and the - * other is used for the conditional branch. *) - let new_then_bb = insertion_block builder in - - (* Emit 'else' value. *) - let else_bb = append_block context "else" the_function in - position_at_end else_bb builder; - let else_val = codegen_expr else_ in - - (* Codegen of 'else' can change the current block, update else_bb for the - * phi. *) - let new_else_bb = insertion_block builder in - - (* Emit merge block. *) - let merge_bb = append_block context "ifcont" the_function in - position_at_end merge_bb builder; - let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in - let phi = build_phi incoming "iftmp" builder in - - (* Return to the start block to add the conditional branch. *) - position_at_end start_bb builder; - ignore (build_cond_br cond_val then_bb else_bb builder); - - (* Set a unconditional branch at the end of the 'then' block and the - * 'else' block to the 'merge' block. *) - position_at_end new_then_bb builder; ignore (build_br merge_bb builder); - position_at_end new_else_bb builder; ignore (build_br merge_bb builder); - - (* Finally, set the builder to the end of the merge block. *) - position_at_end merge_bb builder; - - phi - | Ast.For (var_name, start, end_, step, body) -> - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in - - (* Make the new basic block for the loop header, inserting after current - * block. *) - let preheader_bb = insertion_block builder in - let the_function = block_parent preheader_bb in - let loop_bb = append_block context "loop" the_function in - - (* Insert an explicit fall through from the current block to the - * loop_bb. *) - ignore (build_br loop_bb builder); - - (* Start insertion in loop_bb. *) - position_at_end loop_bb builder; - - (* Start the PHI node with an entry for start. *) - let variable = build_phi [(start_val, preheader_bb)] var_name builder in - - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - Hashtbl.add named_values var_name variable; - - (* Emit the body of the loop. This, like any other expr, can change the - * current BB. Note that we ignore the value computed by the body, but - * don't allow an error *) - ignore (codegen_expr body); - - (* Emit the step value. *) - let step_val = - match step with - | Some step -> codegen_expr step - (* If not specified, use 1.0. *) - | None -> const_float double_type 1.0 - in - - let next_var = build_add variable step_val "nextvar" builder in - - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - (* Convert condition to a bool by comparing equal to 0.0. *) - let zero = const_float double_type 0.0 in - let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in - - (* Create the "after loop" block and insert it. *) - let loop_end_bb = insertion_block builder in - let after_bb = append_block context "afterloop" the_function in - - (* Insert the conditional branch into the end of loop_end_bb. *) - ignore (build_cond_br end_cond loop_bb after_bb builder); - - (* Any new code will be inserted in after_bb. *) - position_at_end after_bb builder; - - (* Add a new entry to the PHI node for the backedge. *) - add_incoming (next_var, loop_end_bb) variable; - - (* Restore the unshadowed variable. *) - begin match old_val with - | Some old_val -> Hashtbl.add named_values var_name old_val - | None -> () - end; - - (* for expr always returns 0.0. *) - const_null double_type - - let codegen_proto = function - | Ast.Prototype (name, args) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - - let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. *) - let _ = PassManager.run_function the_function the_fpm in - - the_function - with e -> - delete_function the_function; - raise e - -toplevel.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - - open Llvm - open Llvm_executionengine - - (* top ::= definition | external | expression | ';' *) - let rec main_loop the_fpm the_execution_engine stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop the_fpm the_execution_engine stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func the_fpm e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function, returning a function pointer. *) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop the_fpm the_execution_engine stream - -toy.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - - open Llvm - open Llvm_executionengine - open Llvm_target - open Llvm_scalar_opts - - let main () = - ignore (initialize_native_target ()); - - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combination the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. *) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). *) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop the_fpm the_execution_engine stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module - ;; - - main () - -bindings.c - .. code-block:: c - - #include - - /* putchard - putchar that takes a double and returns 0. */ - extern double putchard(double X) { - putchar((char)X); - return 0; - } - -`Next: Extending the language: user-defined -operators `_ - diff --git a/llvm/docs/tutorial/OCamlLangImpl6.rst b/llvm/docs/tutorial/OCamlLangImpl6.rst deleted file mode 100644 --- a/llvm/docs/tutorial/OCamlLangImpl6.rst +++ /dev/null @@ -1,1441 +0,0 @@ -============================================================ -Kaleidoscope: Extending the Language: User-defined Operators -============================================================ - -.. contents:: - :local: - -Chapter 6 Introduction -====================== - -Welcome to Chapter 6 of the "`Implementing a language with -LLVM `_" tutorial. At this point in our tutorial, we now -have a fully functional language that is fairly minimal, but also -useful. There is still one big problem with it, however. Our language -doesn't have many useful operators (like division, logical negation, or -even any comparisons besides less-than). - -This chapter of the tutorial takes a wild digression into adding -user-defined operators to the simple and beautiful Kaleidoscope -language. This digression now gives us a simple and ugly language in -some ways, but also a powerful one at the same time. One of the great -things about creating your own language is that you get to decide what -is good or bad. In this tutorial we'll assume that it is okay to use -this as a way to show some interesting parsing techniques. - -At the end of this tutorial, we'll run through an example Kaleidoscope -application that `renders the Mandelbrot set <#kicking-the-tires>`_. This gives an -example of what you can build with Kaleidoscope and its feature set. - -User-defined Operators: the Idea -================================ - -The "operator overloading" that we will add to Kaleidoscope is more -general than languages like C++. In C++, you are only allowed to -redefine existing operators: you can't programmatically change the -grammar, introduce new operators, change precedence levels, etc. In this -chapter, we will add this capability to Kaleidoscope, which will let the -user round out the set of operators that are supported. - -The point of going into user-defined operators in a tutorial like this -is to show the power and flexibility of using a hand-written parser. -Thus far, the parser we have been implementing uses recursive descent -for most parts of the grammar and operator precedence parsing for the -expressions. See `Chapter 2 `_ for details. Without -using operator precedence parsing, it would be very difficult to allow -the programmer to introduce new operators into the grammar: the grammar -is dynamically extensible as the JIT runs. - -The two specific features we'll add are programmable unary operators -(right now, Kaleidoscope has no unary operators at all) as well as -binary operators. An example of this is: - -:: - - # Logical unary not. - def unary!(v) - if v then - 0 - else - 1; - - # Define > with the same precedence as <. - def binary> 10 (LHS RHS) - RHS < LHS; - - # Binary "logical or", (note that it does not "short circuit") - def binary| 5 (LHS RHS) - if LHS then - 1 - else if RHS then - 1 - else - 0; - - # Define = with slightly lower precedence than relationals. - def binary= 9 (LHS RHS) - !(LHS < RHS | LHS > RHS); - -Many languages aspire to being able to implement their standard runtime -library in the language itself. In Kaleidoscope, we can implement -significant parts of the language in the library! - -We will break down implementation of these features into two parts: -implementing support for user-defined binary operators and adding unary -operators. - -User-defined Binary Operators -============================= - -Adding support for user-defined binary operators is pretty simple with -our current framework. We'll first add support for the unary/binary -keywords: - -.. code-block:: ocaml - - type token = - ... - (* operators *) - | Binary | Unary - - ... - - and lex_ident buffer = parser - ... - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - | "binary" -> [< 'Token.Binary; stream >] - | "unary" -> [< 'Token.Unary; stream >] - -This just adds lexer support for the unary and binary keywords, like we -did in `previous chapters `_. One nice -thing about our current AST, is that we represent binary operators with -full generalisation by using their ASCII code as the opcode. For our -extended operators, we'll use this same representation, so we don't need -any new AST or parser support. - -On the other hand, we have to be able to represent the definitions of -these new operators, in the "def binary\| 5" part of the function -definition. In our grammar so far, the "name" for the function -definition is parsed as the "prototype" production and into the -``Ast.Prototype`` AST node. To represent our new user-defined operators -as prototypes, we have to extend the ``Ast.Prototype`` AST node like -this: - -.. code-block:: ocaml - - (* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) - type proto = - | Prototype of string * string array - | BinOpPrototype of string * string array * int - -Basically, in addition to knowing a name for the prototype, we now keep -track of whether it was an operator, and if it was, what precedence -level the operator is at. The precedence is only used for binary -operators (as you'll see below, it just doesn't apply for unary -operators). Now that we have a way to represent the prototype for a -user-defined operator, we need to parse it: - -.. code-block:: ocaml - - (* prototype - * ::= id '(' id* ')' - * ::= binary LETTER number? (id, id) - * ::= unary LETTER number? (id) *) - let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - let parse_operator = parser - | [< 'Token.Unary >] -> "unary", 1 - | [< 'Token.Binary >] -> "binary", 2 - in - let parse_binary_precedence = parser - | [< 'Token.Number n >] -> int_of_float n - | [< >] -> 30 - in - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - | [< (prefix, kind)=parse_operator; - 'Token.Kwd op ?? "expected an operator"; - (* Read the precedence if present. *) - binary_precedence=parse_binary_precedence; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - let name = prefix ^ (String.make 1 op) in - let args = Array.of_list (List.rev args) in - - (* Verify right number of arguments for operator. *) - if Array.length args != kind - then raise (Stream.Error "invalid number of operands for operator") - else - if kind == 1 then - Ast.Prototype (name, args) - else - Ast.BinOpPrototype (name, args, binary_precedence) - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -This is all fairly straightforward parsing code, and we have already -seen a lot of similar code in the past. One interesting part about the -code above is the couple lines that set up ``name`` for binary -operators. This builds names like "binary@" for a newly defined "@" -operator. This then takes advantage of the fact that symbol names in the -LLVM symbol table are allowed to have any character in them, including -embedded nul characters. - -The next interesting thing to add, is codegen support for these binary -operators. Given our current structure, this is a simple addition of a -default case for our existing binary operator node: - -.. code-block:: ocaml - - let codegen_expr = function - ... - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> - (* If it wasn't a builtin binary operator, it must be a user defined - * one. Emit a call to it. *) - let callee = "binary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "binary operator not found!") - in - build_call callee [|lhs_val; rhs_val|] "binop" builder - end - -As you can see above, the new code is actually really simple. It just -does a lookup for the appropriate operator in the symbol table and -generates a function call to it. Since user-defined operators are just -built as normal functions (because the "prototype" boils down to a -function with the right name) everything falls into place. - -The final piece of code we are missing, is a bit of top level magic: - -.. code-block:: ocaml - - let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* If this is an operator, install it. *) - begin match proto with - | Ast.BinOpPrototype (name, args, prec) -> - let op = name.[String.length name - 1] in - Hashtbl.add Parser.binop_precedence op prec; - | _ -> () - end; - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - ... - -Basically, before codegening a function, if it is a user-defined -operator, we register it in the precedence table. This allows the binary -operator parsing logic we already have in place to handle it. Since we -are working on a fully-general operator precedence parser, this is all -we need to do to "extend the grammar". - -Now we have useful user-defined binary operators. This builds a lot on -the previous framework we built for other operators. Adding unary -operators is a bit more challenging, because we don't have any framework -for it yet - lets see what it takes. - -User-defined Unary Operators -============================ - -Since we don't currently support unary operators in the Kaleidoscope -language, we'll need to add everything to support them. Above, we added -simple support for the 'unary' keyword to the lexer. In addition to -that, we need an AST node: - -.. code-block:: ocaml - - type expr = - ... - (* variant for a unary operator. *) - | Unary of char * expr - ... - -This AST node is very simple and obvious by now. It directly mirrors the -binary operator AST node, except that it only has one child. With this, -we need to add the parsing logic. Parsing a unary operator is pretty -simple: we'll add a new function to do it: - -.. code-block:: ocaml - - (* unary - * ::= primary - * ::= '!' unary *) - and parse_unary = parser - (* If this is a unary operator, read it. *) - | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] -> - Ast.Unary (op, operand) - - (* If the current token is not an operator, it must be a primary expr. *) - | [< stream >] -> parse_primary stream - -The grammar we add is pretty straightforward here. If we see a unary -operator when parsing a primary operator, we eat the operator as a -prefix and parse the remaining piece as another unary operator. This -allows us to handle multiple unary operators (e.g. "!!x"). Note that -unary operators can't have ambiguous parses like binary operators can, -so there is no need for precedence information. - -The problem with this function, is that we need to call ParseUnary from -somewhere. To do this, we change previous callers of ParsePrimary to -call ``parse_unary`` instead: - -.. code-block:: ocaml - - (* binoprhs - * ::= ('+' primary)* *) - and parse_bin_rhs expr_prec lhs stream = - ... - (* Parse the unary expression after the binary operator. *) - let rhs = parse_unary stream in - ... - - ... - - (* expression - * ::= primary binoprhs *) - and parse_expr = parser - | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream - -With these two simple changes, we are now able to parse unary operators -and build the AST for them. Next up, we need to add parser support for -prototypes, to parse the unary operator prototype. We extend the binary -operator code above with: - -.. code-block:: ocaml - - (* prototype - * ::= id '(' id* ')' - * ::= binary LETTER number? (id, id) - * ::= unary LETTER number? (id) *) - let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - let parse_operator = parser - | [< 'Token.Unary >] -> "unary", 1 - | [< 'Token.Binary >] -> "binary", 2 - in - let parse_binary_precedence = parser - | [< 'Token.Number n >] -> int_of_float n - | [< >] -> 30 - in - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - | [< (prefix, kind)=parse_operator; - 'Token.Kwd op ?? "expected an operator"; - (* Read the precedence if present. *) - binary_precedence=parse_binary_precedence; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - let name = prefix ^ (String.make 1 op) in - let args = Array.of_list (List.rev args) in - - (* Verify right number of arguments for operator. *) - if Array.length args != kind - then raise (Stream.Error "invalid number of operands for operator") - else - if kind == 1 then - Ast.Prototype (name, args) - else - Ast.BinOpPrototype (name, args, binary_precedence) - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -As with binary operators, we name unary operators with a name that -includes the operator character. This assists us at code generation -time. Speaking of, the final piece we need to add is codegen support for -unary operators. It looks like this: - -.. code-block:: ocaml - - let rec codegen_expr = function - ... - | Ast.Unary (op, operand) -> - let operand = codegen_expr operand in - let callee = "unary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown unary operator") - in - build_call callee [|operand|] "unop" builder - -This code is similar to, but simpler than, the code for binary -operators. It is simpler primarily because it doesn't need to handle any -predefined operators. - -Kicking the Tires -================= - -It is somewhat hard to believe, but with a few simple extensions we've -covered in the last chapters, we have grown a real-ish language. With -this, we can do a lot of interesting things, including I/O, math, and a -bunch of other things. For example, we can now add a nice sequencing -operator (printd is defined to print out the specified value and a -newline): - -:: - - ready> extern printd(x); - Read extern: declare double @printd(double) - ready> def binary : 1 (x y) 0; # Low-precedence operator that ignores operands. - .. - ready> printd(123) : printd(456) : printd(789); - 123.000000 - 456.000000 - 789.000000 - Evaluated to 0.000000 - -We can also define a bunch of other "primitive" operations, such as: - -:: - - # Logical unary not. - def unary!(v) - if v then - 0 - else - 1; - - # Unary negate. - def unary-(v) - 0-v; - - # Define > with the same precedence as <. - def binary> 10 (LHS RHS) - RHS < LHS; - - # Binary logical or, which does not short circuit. - def binary| 5 (LHS RHS) - if LHS then - 1 - else if RHS then - 1 - else - 0; - - # Binary logical and, which does not short circuit. - def binary& 6 (LHS RHS) - if !LHS then - 0 - else - !!RHS; - - # Define = with slightly lower precedence than relationals. - def binary = 9 (LHS RHS) - !(LHS < RHS | LHS > RHS); - -Given the previous if/then/else support, we can also define interesting -functions for I/O. For example, the following prints out a character -whose "density" reflects the value passed in: the lower the value, the -denser the character: - -:: - - ready> - - extern putchard(char) - def printdensity(d) - if d > 8 then - putchard(32) # ' ' - else if d > 4 then - putchard(46) # '.' - else if d > 2 then - putchard(43) # '+' - else - putchard(42); # '*' - ... - ready> printdensity(1): printdensity(2): printdensity(3) : - printdensity(4): printdensity(5): printdensity(9): putchard(10); - *++.. - Evaluated to 0.000000 - -Based on these simple primitive operations, we can start to define more -interesting things. For example, here's a little function that solves -for the number of iterations it takes a function in the complex plane to -converge: - -:: - - # determine whether the specific location diverges. - # Solve for z = z^2 + c in the complex plane. - def mandelconverger(real imag iters creal cimag) - if iters > 255 | (real*real + imag*imag > 4) then - iters - else - mandelconverger(real*real - imag*imag + creal, - 2*real*imag + cimag, - iters+1, creal, cimag); - - # return the number of iterations required for the iteration to escape - def mandelconverge(real imag) - mandelconverger(real, imag, 0, real, imag); - -This "z = z\ :sup:`2`\ + c" function is a beautiful little creature -that is the basis for computation of the `Mandelbrot -Set `_. Our -``mandelconverge`` function returns the number of iterations that it -takes for a complex orbit to escape, saturating to 255. This is not a -very useful function by itself, but if you plot its value over a -two-dimensional plane, you can see the Mandelbrot set. Given that we are -limited to using putchard here, our amazing graphical output is limited, -but we can whip together something using the density plotter above: - -:: - - # compute and plot the mandelbrot set with the specified 2 dimensional range - # info. - def mandelhelp(xmin xmax xstep ymin ymax ystep) - for y = ymin, y < ymax, ystep in ( - (for x = xmin, x < xmax, xstep in - printdensity(mandelconverge(x,y))) - : putchard(10) - ) - - # mandel - This is a convenient helper function for plotting the mandelbrot set - # from the specified position with the specified Magnification. - def mandel(realstart imagstart realmag imagmag) - mandelhelp(realstart, realstart+realmag*78, realmag, - imagstart, imagstart+imagmag*40, imagmag); - -Given this, we can try plotting out the mandelbrot set! Lets try it out: - -:: - - ready> mandel(-2.3, -1.3, 0.05, 0.07); - *******************************+++++++++++************************************* - *************************+++++++++++++++++++++++******************************* - **********************+++++++++++++++++++++++++++++**************************** - *******************+++++++++++++++++++++.. ...++++++++************************* - *****************++++++++++++++++++++++.... ...+++++++++*********************** - ***************+++++++++++++++++++++++..... ...+++++++++********************* - **************+++++++++++++++++++++++.... ....+++++++++******************** - *************++++++++++++++++++++++...... .....++++++++******************* - ************+++++++++++++++++++++....... .......+++++++****************** - ***********+++++++++++++++++++.... ... .+++++++***************** - **********+++++++++++++++++....... .+++++++**************** - *********++++++++++++++........... ...+++++++*************** - ********++++++++++++............ ...++++++++************** - ********++++++++++... .......... .++++++++************** - *******+++++++++..... .+++++++++************* - *******++++++++...... ..+++++++++************* - *******++++++....... ..+++++++++************* - *******+++++...... ..+++++++++************* - *******.... .... ...+++++++++************* - *******.... . ...+++++++++************* - *******+++++...... ...+++++++++************* - *******++++++....... ..+++++++++************* - *******++++++++...... .+++++++++************* - *******+++++++++..... ..+++++++++************* - ********++++++++++... .......... .++++++++************** - ********++++++++++++............ ...++++++++************** - *********++++++++++++++.......... ...+++++++*************** - **********++++++++++++++++........ .+++++++**************** - **********++++++++++++++++++++.... ... ..+++++++**************** - ***********++++++++++++++++++++++....... .......++++++++***************** - ************+++++++++++++++++++++++...... ......++++++++****************** - **************+++++++++++++++++++++++.... ....++++++++******************** - ***************+++++++++++++++++++++++..... ...+++++++++********************* - *****************++++++++++++++++++++++.... ...++++++++*********************** - *******************+++++++++++++++++++++......++++++++************************* - *********************++++++++++++++++++++++.++++++++*************************** - *************************+++++++++++++++++++++++******************************* - ******************************+++++++++++++************************************ - ******************************************************************************* - ******************************************************************************* - ******************************************************************************* - Evaluated to 0.000000 - ready> mandel(-2, -1, 0.02, 0.04); - **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++ - ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++. - *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++... - *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++..... - ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........ - **************++++++++++++++++++++++++++++++++++++++++++++++++++++++........... - ************+++++++++++++++++++++++++++++++++++++++++++++++++++++.............. - ***********++++++++++++++++++++++++++++++++++++++++++++++++++........ . - **********++++++++++++++++++++++++++++++++++++++++++++++............. - ********+++++++++++++++++++++++++++++++++++++++++++.................. - *******+++++++++++++++++++++++++++++++++++++++....................... - ******+++++++++++++++++++++++++++++++++++........................... - *****++++++++++++++++++++++++++++++++............................ - *****++++++++++++++++++++++++++++............................... - ****++++++++++++++++++++++++++...... ......................... - ***++++++++++++++++++++++++......... ...... ........... - ***++++++++++++++++++++++............ - **+++++++++++++++++++++.............. - **+++++++++++++++++++................ - *++++++++++++++++++................. - *++++++++++++++++............ ... - *++++++++++++++.............. - *+++....++++................ - *.......... ........... - * - *.......... ........... - *+++....++++................ - *++++++++++++++.............. - *++++++++++++++++............ ... - *++++++++++++++++++................. - **+++++++++++++++++++................ - **+++++++++++++++++++++.............. - ***++++++++++++++++++++++............ - ***++++++++++++++++++++++++......... ...... ........... - ****++++++++++++++++++++++++++...... ......................... - *****++++++++++++++++++++++++++++............................... - *****++++++++++++++++++++++++++++++++............................ - ******+++++++++++++++++++++++++++++++++++........................... - *******+++++++++++++++++++++++++++++++++++++++....................... - ********+++++++++++++++++++++++++++++++++++++++++++.................. - Evaluated to 0.000000 - ready> mandel(-0.9, -1.4, 0.02, 0.03); - ******************************************************************************* - ******************************************************************************* - ******************************************************************************* - **********+++++++++++++++++++++************************************************ - *+++++++++++++++++++++++++++++++++++++++*************************************** - +++++++++++++++++++++++++++++++++++++++++++++********************************** - ++++++++++++++++++++++++++++++++++++++++++++++++++***************************** - ++++++++++++++++++++++++++++++++++++++++++++++++++++++************************* - +++++++++++++++++++++++++++++++++++++++++++++++++++++++++********************** - +++++++++++++++++++++++++++++++++.........++++++++++++++++++******************* - +++++++++++++++++++++++++++++++.... ......+++++++++++++++++++**************** - +++++++++++++++++++++++++++++....... ........+++++++++++++++++++************** - ++++++++++++++++++++++++++++........ ........++++++++++++++++++++************ - +++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++********** - ++++++++++++++++++++++++++........... ....++++++++++++++++++++++******** - ++++++++++++++++++++++++............. .......++++++++++++++++++++++****** - +++++++++++++++++++++++............. ........+++++++++++++++++++++++**** - ++++++++++++++++++++++........... ..........++++++++++++++++++++++*** - ++++++++++++++++++++........... .........++++++++++++++++++++++* - ++++++++++++++++++............ ...........++++++++++++++++++++ - ++++++++++++++++............... .............++++++++++++++++++ - ++++++++++++++................. ...............++++++++++++++++ - ++++++++++++.................. .................++++++++++++++ - +++++++++.................. .................+++++++++++++ - ++++++........ . ......... ..++++++++++++ - ++............ ...... ....++++++++++ - .............. ...++++++++++ - .............. ....+++++++++ - .............. .....++++++++ - ............. ......++++++++ - ........... .......++++++++ - ......... ........+++++++ - ......... ........+++++++ - ......... ....+++++++ - ........ ...+++++++ - ....... ...+++++++ - ....+++++++ - .....+++++++ - ....+++++++ - ....+++++++ - ....+++++++ - Evaluated to 0.000000 - ready> ^D - -At this point, you may be starting to realize that Kaleidoscope is a -real and powerful language. It may not be self-similar :), but it can be -used to plot things that are! - -With this, we conclude the "adding user-defined operators" chapter of -the tutorial. We have successfully augmented our language, adding the -ability to extend the language in the library, and we have shown how -this can be used to build a simple but interesting end-user application -in Kaleidoscope. At this point, Kaleidoscope can build a variety of -applications that are functional and can call functions with -side-effects, but it can't actually define and mutate a variable itself. - -Strikingly, variable mutation is an important feature of some languages, -and it is not at all obvious how to `add support for mutable -variables `_ without having to add an "SSA -construction" phase to your front-end. In the next chapter, we will -describe how you can add variable mutation without building SSA in your -front-end. - -Full Code Listing -================= - -Here is the complete code listing for our running example, enhanced with -the if/then/else and for expressions.. To build this example, use: - -.. code-block:: bash - - # Compile - ocamlbuild toy.byte - # Run - ./toy.byte - -Here is the code: - -\_tags: - :: - - <{lexer,parser}.ml>: use_camlp4, pp(camlp4of) - <*.{byte,native}>: g++, use_llvm, use_llvm_analysis - <*.{byte,native}>: use_llvm_executionengine, use_llvm_target - <*.{byte,native}>: use_llvm_scalar_opts, use_bindings - -myocamlbuild.ml: - .. code-block:: ocaml - - open Ocamlbuild_plugin;; - - ocaml_lib ~extern:true "llvm";; - ocaml_lib ~extern:true "llvm_analysis";; - ocaml_lib ~extern:true "llvm_executionengine";; - ocaml_lib ~extern:true "llvm_target";; - ocaml_lib ~extern:true "llvm_scalar_opts";; - - flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"; A"-cclib"; A"-rdynamic"]);; - dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; - -token.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - - (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) - type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char - - (* control *) - | If | Then | Else - | For | In - - (* operators *) - | Binary | Unary - -lexer.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - - let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - - and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - - and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - | "binary" -> [< 'Token.Binary; stream >] - | "unary" -> [< 'Token.Unary; stream >] - | id -> [< 'Token.Ident id; stream >] - - and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] - -ast.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - - (* expr - Base type for all expression nodes. *) - type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a unary operator. *) - | Unary of char * expr - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - - (* variant for if/then/else. *) - | If of expr * expr * expr - - (* variant for for/in. *) - | For of string * expr * expr * expr option * expr - - (* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) - type proto = - | Prototype of string * string array - | BinOpPrototype of string * string array * int - - (* func - This type represents a function definition itself. *) - type func = Function of proto * expr - -parser.ml: - .. code-block:: ocaml - - (*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - - (* binop_precedence - This holds the precedence for each binary operator that is - * defined *) - let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - - (* precedence - Get the precedence of the pending binary operator token. *) - let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - - (* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr - * ::= ifexpr - * ::= forexpr *) - let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) - | [< 'Token.If; c=parse_expr; - 'Token.Then ?? "expected 'then'"; t=parse_expr; - 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> - Ast.If (c, t, e) - - (* forexpr - ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) - | [< 'Token.For; - 'Token.Ident id ?? "expected identifier after for"; - 'Token.Kwd '=' ?? "expected '=' after for"; - stream >] -> - begin parser - | [< - start=parse_expr; - 'Token.Kwd ',' ?? "expected ',' after for"; - end_=parse_expr; - stream >] -> - let step = - begin parser - | [< 'Token.Kwd ','; step=parse_expr >] -> Some step - | [< >] -> None - end stream - in - begin parser - | [< 'Token.In; body=parse_expr >] -> - Ast.For (id, start, end_, step, body) - | [< >] -> - raise (Stream.Error "expected 'in' after for") - end stream - | [< >] -> - raise (Stream.Error "expected '=' after for") - end stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - - (* unary - * ::= primary - * ::= '!' unary *) - and parse_unary = parser - (* If this is a unary operator, read it. *) - | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] -> - Ast.Unary (op, operand) - - (* If the current token is not an operator, it must be a primary expr. *) - | [< stream >] -> parse_primary stream - - (* binoprhs - * ::= ('+' primary)* *) - and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the unary expression after the binary operator. *) - let rhs = parse_unary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - - (* expression - * ::= primary binoprhs *) - and parse_expr = parser - | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream - - (* prototype - * ::= id '(' id* ')' - * ::= binary LETTER number? (id, id) - * ::= unary LETTER number? (id) *) - let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - let parse_operator = parser - | [< 'Token.Unary >] -> "unary", 1 - | [< 'Token.Binary >] -> "binary", 2 - in - let parse_binary_precedence = parser - | [< 'Token.Number n >] -> int_of_float n - | [< >] -> 30 - in - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - | [< (prefix, kind)=parse_operator; - 'Token.Kwd op ?? "expected an operator"; - (* Read the precedence if present. *) - binary_precedence=parse_binary_precedence; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - let name = prefix ^ (String.make 1 op) in - let args = Array.of_list (List.rev args) in - - (* Verify right number of arguments for operator. *) - if Array.length args != kind - then raise (Stream.Error "invalid number of operands for operator") - else - if kind == 1 then - Ast.Prototype (name, args) - else - Ast.BinOpPrototype (name, args, binary_precedence) - | [< >] -> - raise (Stream.Error "expected function name in prototype") - - (* definition ::= 'def' prototype expression *) - let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - - (* toplevelexpr ::= expression *) - let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - - (* external ::= 'extern' prototype *) - let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e - -codegen.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - - open Llvm - - exception Error of string - - let context = global_context () - let the_module = create_module context "my cool jit" - let builder = builder context - let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 - let double_type = double_type context - - let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) - | Ast.Unary (op, operand) -> - let operand = codegen_expr operand in - let callee = "unary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown unary operator") - in - build_call callee [|operand|] "unop" builder - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> - (* If it wasn't a builtin binary operator, it must be a user defined - * one. Emit a call to it. *) - let callee = "binary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "binary operator not found!") - in - build_call callee [|lhs_val; rhs_val|] "binop" builder - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - | Ast.If (cond, then_, else_) -> - let cond = codegen_expr cond in - - (* Convert condition to a bool by comparing equal to 0.0 *) - let zero = const_float double_type 0.0 in - let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in - - (* Grab the first block so that we might later add the conditional branch - * to it at the end of the function. *) - let start_bb = insertion_block builder in - let the_function = block_parent start_bb in - - let then_bb = append_block context "then" the_function in - - (* Emit 'then' value. *) - position_at_end then_bb builder; - let then_val = codegen_expr then_ in - - (* Codegen of 'then' can change the current block, update then_bb for the - * phi. We create a new name because one is used for the phi node, and the - * other is used for the conditional branch. *) - let new_then_bb = insertion_block builder in - - (* Emit 'else' value. *) - let else_bb = append_block context "else" the_function in - position_at_end else_bb builder; - let else_val = codegen_expr else_ in - - (* Codegen of 'else' can change the current block, update else_bb for the - * phi. *) - let new_else_bb = insertion_block builder in - - (* Emit merge block. *) - let merge_bb = append_block context "ifcont" the_function in - position_at_end merge_bb builder; - let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in - let phi = build_phi incoming "iftmp" builder in - - (* Return to the start block to add the conditional branch. *) - position_at_end start_bb builder; - ignore (build_cond_br cond_val then_bb else_bb builder); - - (* Set a unconditional branch at the end of the 'then' block and the - * 'else' block to the 'merge' block. *) - position_at_end new_then_bb builder; ignore (build_br merge_bb builder); - position_at_end new_else_bb builder; ignore (build_br merge_bb builder); - - (* Finally, set the builder to the end of the merge block. *) - position_at_end merge_bb builder; - - phi - | Ast.For (var_name, start, end_, step, body) -> - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in - - (* Make the new basic block for the loop header, inserting after current - * block. *) - let preheader_bb = insertion_block builder in - let the_function = block_parent preheader_bb in - let loop_bb = append_block context "loop" the_function in - - (* Insert an explicit fall through from the current block to the - * loop_bb. *) - ignore (build_br loop_bb builder); - - (* Start insertion in loop_bb. *) - position_at_end loop_bb builder; - - (* Start the PHI node with an entry for start. *) - let variable = build_phi [(start_val, preheader_bb)] var_name builder in - - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - Hashtbl.add named_values var_name variable; - - (* Emit the body of the loop. This, like any other expr, can change the - * current BB. Note that we ignore the value computed by the body, but - * don't allow an error *) - ignore (codegen_expr body); - - (* Emit the step value. *) - let step_val = - match step with - | Some step -> codegen_expr step - (* If not specified, use 1.0. *) - | None -> const_float double_type 1.0 - in - - let next_var = build_add variable step_val "nextvar" builder in - - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - (* Convert condition to a bool by comparing equal to 0.0. *) - let zero = const_float double_type 0.0 in - let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in - - (* Create the "after loop" block and insert it. *) - let loop_end_bb = insertion_block builder in - let after_bb = append_block context "afterloop" the_function in - - (* Insert the conditional branch into the end of loop_end_bb. *) - ignore (build_cond_br end_cond loop_bb after_bb builder); - - (* Any new code will be inserted in after_bb. *) - position_at_end after_bb builder; - - (* Add a new entry to the PHI node for the backedge. *) - add_incoming (next_var, loop_end_bb) variable; - - (* Restore the unshadowed variable. *) - begin match old_val with - | Some old_val -> Hashtbl.add named_values var_name old_val - | None -> () - end; - - (* for expr always returns 0.0. *) - const_null double_type - - let codegen_proto = function - | Ast.Prototype (name, args) | Ast.BinOpPrototype (name, args, _) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - - let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* If this is an operator, install it. *) - begin match proto with - | Ast.BinOpPrototype (name, args, prec) -> - let op = name.[String.length name - 1] in - Hashtbl.add Parser.binop_precedence op prec; - | _ -> () - end; - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. *) - let _ = PassManager.run_function the_function the_fpm in - - the_function - with e -> - delete_function the_function; - raise e - -toplevel.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - - open Llvm - open Llvm_executionengine - - (* top ::= definition | external | expression | ';' *) - let rec main_loop the_fpm the_execution_engine stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop the_fpm the_execution_engine stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func the_fpm e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function, returning a function pointer. *) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop the_fpm the_execution_engine stream - -toy.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - - open Llvm - open Llvm_executionengine - open Llvm_target - open Llvm_scalar_opts - - let main () = - ignore (initialize_native_target ()); - - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combination the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. *) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). *) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop the_fpm the_execution_engine stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module - ;; - - main () - -bindings.c - .. code-block:: c - - #include - - /* putchard - putchar that takes a double and returns 0. */ - extern double putchard(double X) { - putchar((char)X); - return 0; - } - - /* printd - printf that takes a double prints it as "%f\n", returning 0. */ - extern double printd(double X) { - printf("%f\n", X); - return 0; - } - -`Next: Extending the language: mutable variables / SSA -construction `_ - diff --git a/llvm/docs/tutorial/OCamlLangImpl7.rst b/llvm/docs/tutorial/OCamlLangImpl7.rst deleted file mode 100644 --- a/llvm/docs/tutorial/OCamlLangImpl7.rst +++ /dev/null @@ -1,1723 +0,0 @@ -======================================================= -Kaleidoscope: Extending the Language: Mutable Variables -======================================================= - -.. contents:: - :local: - -Chapter 7 Introduction -====================== - -Welcome to Chapter 7 of the "`Implementing a language with -LLVM `_" tutorial. In chapters 1 through 6, we've built a -very respectable, albeit simple, `functional programming -language `_. In our -journey, we learned some parsing techniques, how to build and represent -an AST, how to build LLVM IR, and how to optimize the resultant code as -well as JIT compile it. - -While Kaleidoscope is interesting as a functional language, the fact -that it is functional makes it "too easy" to generate LLVM IR for it. In -particular, a functional language makes it very easy to build LLVM IR -directly in `SSA -form `_. -Since LLVM requires that the input code be in SSA form, this is a very -nice property and it is often unclear to newcomers how to generate code -for an imperative language with mutable variables. - -The short (and happy) summary of this chapter is that there is no need -for your front-end to build SSA form: LLVM provides highly tuned and -well tested support for this, though the way it works is a bit -unexpected for some. - -Why is this a hard problem? -=========================== - -To understand why mutable variables cause complexities in SSA -construction, consider this extremely simple C example: - -.. code-block:: c - - int G, H; - int test(_Bool Condition) { - int X; - if (Condition) - X = G; - else - X = H; - return X; - } - -In this case, we have the variable "X", whose value depends on the path -executed in the program. Because there are two different possible values -for X before the return instruction, a PHI node is inserted to merge the -two values. The LLVM IR that we want for this example looks like this: - -.. code-block:: llvm - - @G = weak global i32 0 ; type of @G is i32* - @H = weak global i32 0 ; type of @H is i32* - - define i32 @test(i1 %Condition) { - entry: - br i1 %Condition, label %cond_true, label %cond_false - - cond_true: - %X.0 = load i32* @G - br label %cond_next - - cond_false: - %X.1 = load i32* @H - br label %cond_next - - cond_next: - %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] - ret i32 %X.2 - } - -In this example, the loads from the G and H global variables are -explicit in the LLVM IR, and they live in the then/else branches of the -if statement (cond\_true/cond\_false). In order to merge the incoming -values, the X.2 phi node in the cond\_next block selects the right value -to use based on where control flow is coming from: if control flow comes -from the cond\_false block, X.2 gets the value of X.1. Alternatively, if -control flow comes from cond\_true, it gets the value of X.0. The intent -of this chapter is not to explain the details of SSA form. For more -information, see one of the many `online -references `_. - -The question for this article is "who places the phi nodes when lowering -assignments to mutable variables?". The issue here is that LLVM -*requires* that its IR be in SSA form: there is no "non-ssa" mode for -it. However, SSA construction requires non-trivial algorithms and data -structures, so it is inconvenient and wasteful for every front-end to -have to reproduce this logic. - -Memory in LLVM -============== - -The 'trick' here is that while LLVM does require all register values to -be in SSA form, it does not require (or permit) memory objects to be in -SSA form. In the example above, note that the loads from G and H are -direct accesses to G and H: they are not renamed or versioned. This -differs from some other compiler systems, which do try to version memory -objects. In LLVM, instead of encoding dataflow analysis of memory into -the LLVM IR, it is handled with `Analysis -Passes <../WritingAnLLVMPass.html>`_ which are computed on demand. - -With this in mind, the high-level idea is that we want to make a stack -variable (which lives in memory, because it is on the stack) for each -mutable object in a function. To take advantage of this trick, we need -to talk about how LLVM represents stack variables. - -In LLVM, all memory accesses are explicit with load/store instructions, -and it is carefully designed not to have (or need) an "address-of" -operator. Notice how the type of the @G/@H global variables is actually -"i32\*" even though the variable is defined as "i32". What this means is -that @G defines *space* for an i32 in the global data area, but its -*name* actually refers to the address for that space. Stack variables -work the same way, except that instead of being declared with global -variable definitions, they are declared with the `LLVM alloca -instruction <../LangRef.html#alloca-instruction>`_: - -.. code-block:: llvm - - define i32 @example() { - entry: - %X = alloca i32 ; type of %X is i32*. - ... - %tmp = load i32* %X ; load the stack value %X from the stack. - %tmp2 = add i32 %tmp, 1 ; increment it - store i32 %tmp2, i32* %X ; store it back - ... - -This code shows an example of how you can declare and manipulate a stack -variable in the LLVM IR. Stack memory allocated with the alloca -instruction is fully general: you can pass the address of the stack slot -to functions, you can store it in other variables, etc. In our example -above, we could rewrite the example to use the alloca technique to avoid -using a PHI node: - -.. code-block:: llvm - - @G = weak global i32 0 ; type of @G is i32* - @H = weak global i32 0 ; type of @H is i32* - - define i32 @test(i1 %Condition) { - entry: - %X = alloca i32 ; type of %X is i32*. - br i1 %Condition, label %cond_true, label %cond_false - - cond_true: - %X.0 = load i32* @G - store i32 %X.0, i32* %X ; Update X - br label %cond_next - - cond_false: - %X.1 = load i32* @H - store i32 %X.1, i32* %X ; Update X - br label %cond_next - - cond_next: - %X.2 = load i32* %X ; Read X - ret i32 %X.2 - } - -With this, we have discovered a way to handle arbitrary mutable -variables without the need to create Phi nodes at all: - -#. Each mutable variable becomes a stack allocation. -#. Each read of the variable becomes a load from the stack. -#. Each update of the variable becomes a store to the stack. -#. Taking the address of a variable just uses the stack address - directly. - -While this solution has solved our immediate problem, it introduced -another one: we have now apparently introduced a lot of stack traffic -for very simple and common operations, a major performance problem. -Fortunately for us, the LLVM optimizer has a highly-tuned optimization -pass named "mem2reg" that handles this case, promoting allocas like this -into SSA registers, inserting Phi nodes as appropriate. If you run this -example through the pass, for example, you'll get: - -.. code-block:: bash - - $ llvm-as < example.ll | opt -mem2reg | llvm-dis - @G = weak global i32 0 - @H = weak global i32 0 - - define i32 @test(i1 %Condition) { - entry: - br i1 %Condition, label %cond_true, label %cond_false - - cond_true: - %X.0 = load i32* @G - br label %cond_next - - cond_false: - %X.1 = load i32* @H - br label %cond_next - - cond_next: - %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] - ret i32 %X.01 - } - -The mem2reg pass implements the standard "iterated dominance frontier" -algorithm for constructing SSA form and has a number of optimizations -that speed up (very common) degenerate cases. The mem2reg optimization -pass is the answer to dealing with mutable variables, and we highly -recommend that you depend on it. Note that mem2reg only works on -variables in certain circumstances: - -#. mem2reg is alloca-driven: it looks for allocas and if it can handle - them, it promotes them. It does not apply to global variables or heap - allocations. -#. mem2reg only looks for alloca instructions in the entry block of the - function. Being in the entry block guarantees that the alloca is only - executed once, which makes analysis simpler. -#. mem2reg only promotes allocas whose uses are direct loads and stores. - If the address of the stack object is passed to a function, or if any - funny pointer arithmetic is involved, the alloca will not be - promoted. -#. mem2reg only works on allocas of `first - class <../LangRef.html#first-class-types>`_ values (such as pointers, - scalars and vectors), and only if the array size of the allocation is - 1 (or missing in the .ll file). mem2reg is not capable of promoting - structs or arrays to registers. Note that the "sroa" pass is - more powerful and can promote structs, "unions", and arrays in many - cases. - -All of these properties are easy to satisfy for most imperative -languages, and we'll illustrate it below with Kaleidoscope. The final -question you may be asking is: should I bother with this nonsense for my -front-end? Wouldn't it be better if I just did SSA construction -directly, avoiding use of the mem2reg optimization pass? In short, we -strongly recommend that you use this technique for building SSA form, -unless there is an extremely good reason not to. Using this technique -is: - -- Proven and well tested: clang uses this technique - for local mutable variables. As such, the most common clients of LLVM - are using this to handle a bulk of their variables. You can be sure - that bugs are found fast and fixed early. -- Extremely Fast: mem2reg has a number of special cases that make it - fast in common cases as well as fully general. For example, it has - fast-paths for variables that are only used in a single block, - variables that only have one assignment point, good heuristics to - avoid insertion of unneeded phi nodes, etc. -- Needed for debug info generation: `Debug information in - LLVM <../SourceLevelDebugging.html>`_ relies on having the address of - the variable exposed so that debug info can be attached to it. This - technique dovetails very naturally with this style of debug info. - -If nothing else, this makes it much easier to get your front-end up and -running, and is very simple to implement. Lets extend Kaleidoscope with -mutable variables now! - -Mutable Variables in Kaleidoscope -================================= - -Now that we know the sort of problem we want to tackle, lets see what -this looks like in the context of our little Kaleidoscope language. -We're going to add two features: - -#. The ability to mutate variables with the '=' operator. -#. The ability to define new variables. - -While the first item is really what this is about, we only have -variables for incoming arguments as well as for induction variables, and -redefining those only goes so far :). Also, the ability to define new -variables is a useful thing regardless of whether you will be mutating -them. Here's a motivating example that shows how we could use these: - -:: - - # Define ':' for sequencing: as a low-precedence operator that ignores operands - # and just returns the RHS. - def binary : 1 (x y) y; - - # Recursive fib, we could do this before. - def fib(x) - if (x < 3) then - 1 - else - fib(x-1)+fib(x-2); - - # Iterative fib. - def fibi(x) - var a = 1, b = 1, c in - (for i = 3, i < x in - c = a + b : - a = b : - b = c) : - b; - - # Call it. - fibi(10); - -In order to mutate variables, we have to change our existing variables -to use the "alloca trick". Once we have that, we'll add our new -operator, then extend Kaleidoscope to support new variable definitions. - -Adjusting Existing Variables for Mutation -========================================= - -The symbol table in Kaleidoscope is managed at code generation time by -the '``named_values``' map. This map currently keeps track of the LLVM -"Value\*" that holds the double value for the named variable. In order -to support mutation, we need to change this slightly, so that it -``named_values`` holds the *memory location* of the variable in -question. Note that this change is a refactoring: it changes the -structure of the code, but does not (by itself) change the behavior of -the compiler. All of these changes are isolated in the Kaleidoscope code -generator. - -At this point in Kaleidoscope's development, it only supports variables -for two things: incoming arguments to functions and the induction -variable of 'for' loops. For consistency, we'll allow mutation of these -variables in addition to other user-defined variables. This means that -these will both need memory locations. - -To start our transformation of Kaleidoscope, we'll change the -``named_values`` map so that it maps to AllocaInst\* instead of Value\*. -Once we do this, the C++ compiler will tell us what parts of the code we -need to update: - -**Note:** the ocaml bindings currently model both ``Value*``'s and -``AllocInst*``'s as ``Llvm.llvalue``'s, but this may change in the future -to be more type safe. - -.. code-block:: ocaml - - let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 - -Also, since we will need to create these alloca's, we'll use a helper -function that ensures that the allocas are created in the entry block of -the function: - -.. code-block:: ocaml - - (* Create an alloca instruction in the entry block of the function. This - * is used for mutable variables etc. *) - let create_entry_block_alloca the_function var_name = - let builder = builder_at (instr_begin (entry_block the_function)) in - build_alloca double_type var_name builder - -This funny looking code creates an ``Llvm.llbuilder`` object that is -pointing at the first instruction of the entry block. It then creates an -alloca with the expected name and returns it. Because all values in -Kaleidoscope are doubles, there is no need to pass in a type to use. - -With this in place, the first functionality change we want to make is to -variable references. In our new scheme, variables live on the stack, so -code generating a reference to them actually needs to produce a load -from the stack slot: - -.. code-block:: ocaml - - let rec codegen_expr = function - ... - | Ast.Variable name -> - let v = try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name") - in - (* Load the value. *) - build_load v name builder - -As you can see, this is pretty straightforward. Now we need to update -the things that define the variables to set up the alloca. We'll start -with ``codegen_expr Ast.For ...`` (see the `full code listing <#id1>`_ -for the unabridged code): - -.. code-block:: ocaml - - | Ast.For (var_name, start, end_, step, body) -> - let the_function = block_parent (insertion_block builder) in - - (* Create an alloca for the variable in the entry block. *) - let alloca = create_entry_block_alloca the_function var_name in - - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in - - (* Store the value into the alloca. *) - ignore(build_store start_val alloca builder); - - ... - - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - Hashtbl.add named_values var_name alloca; - - ... - - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - (* Reload, increment, and restore the alloca. This handles the case where - * the body of the loop mutates the variable. *) - let cur_var = build_load alloca var_name builder in - let next_var = build_add cur_var step_val "nextvar" builder in - ignore(build_store next_var alloca builder); - ... - -This code is virtually identical to the code `before we allowed mutable -variables `_. The big difference is that -we no longer have to construct a PHI node, and we use load/store to -access the variable as needed. - -To support mutable argument variables, we need to also make allocas for -them. The code for this is also pretty simple: - -.. code-block:: ocaml - - (* Create an alloca for each argument and register the argument in the symbol - * table so that references to it will succeed. *) - let create_argument_allocas the_function proto = - let args = match proto with - | Ast.Prototype (_, args) | Ast.BinOpPrototype (_, args, _) -> args - in - Array.iteri (fun i ai -> - let var_name = args.(i) in - (* Create an alloca for this variable. *) - let alloca = create_entry_block_alloca the_function var_name in - - (* Store the initial value into the alloca. *) - ignore(build_store ai alloca builder); - - (* Add arguments to variable symbol table. *) - Hashtbl.add named_values var_name alloca; - ) (params the_function) - -For each argument, we make an alloca, store the input value to the -function into the alloca, and register the alloca as the memory location -for the argument. This method gets invoked by ``Codegen.codegen_func`` -right after it sets up the entry block for the function. - -The final missing piece is adding the mem2reg pass, which allows us to -get good codegen once again: - -.. code-block:: ocaml - - let main () = - ... - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Promote allocas to registers. *) - add_memory_to_register_promotion the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combining the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - -It is interesting to see what the code looks like before and after the -mem2reg optimization runs. For example, this is the before/after code -for our recursive fib function. Before the optimization: - -.. code-block:: llvm - - define double @fib(double %x) { - entry: - %x1 = alloca double - store double %x, double* %x1 - %x2 = load double* %x1 - %cmptmp = fcmp ult double %x2, 3.000000e+00 - %booltmp = uitofp i1 %cmptmp to double - %ifcond = fcmp one double %booltmp, 0.000000e+00 - br i1 %ifcond, label %then, label %else - - then: ; preds = %entry - br label %ifcont - - else: ; preds = %entry - %x3 = load double* %x1 - %subtmp = fsub double %x3, 1.000000e+00 - %calltmp = call double @fib(double %subtmp) - %x4 = load double* %x1 - %subtmp5 = fsub double %x4, 2.000000e+00 - %calltmp6 = call double @fib(double %subtmp5) - %addtmp = fadd double %calltmp, %calltmp6 - br label %ifcont - - ifcont: ; preds = %else, %then - %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] - ret double %iftmp - } - -Here there is only one variable (x, the input argument) but you can -still see the extremely simple-minded code generation strategy we are -using. In the entry block, an alloca is created, and the initial input -value is stored into it. Each reference to the variable does a reload -from the stack. Also, note that we didn't modify the if/then/else -expression, so it still inserts a PHI node. While we could make an -alloca for it, it is actually easier to create a PHI node for it, so we -still just make the PHI. - -Here is the code after the mem2reg pass runs: - -.. code-block:: llvm - - define double @fib(double %x) { - entry: - %cmptmp = fcmp ult double %x, 3.000000e+00 - %booltmp = uitofp i1 %cmptmp to double - %ifcond = fcmp one double %booltmp, 0.000000e+00 - br i1 %ifcond, label %then, label %else - - then: - br label %ifcont - - else: - %subtmp = fsub double %x, 1.000000e+00 - %calltmp = call double @fib(double %subtmp) - %subtmp5 = fsub double %x, 2.000000e+00 - %calltmp6 = call double @fib(double %subtmp5) - %addtmp = fadd double %calltmp, %calltmp6 - br label %ifcont - - ifcont: ; preds = %else, %then - %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] - ret double %iftmp - } - -This is a trivial case for mem2reg, since there are no redefinitions of -the variable. The point of showing this is to calm your tension about -inserting such blatent inefficiencies :). - -After the rest of the optimizers run, we get: - -.. code-block:: llvm - - define double @fib(double %x) { - entry: - %cmptmp = fcmp ult double %x, 3.000000e+00 - %booltmp = uitofp i1 %cmptmp to double - %ifcond = fcmp ueq double %booltmp, 0.000000e+00 - br i1 %ifcond, label %else, label %ifcont - - else: - %subtmp = fsub double %x, 1.000000e+00 - %calltmp = call double @fib(double %subtmp) - %subtmp5 = fsub double %x, 2.000000e+00 - %calltmp6 = call double @fib(double %subtmp5) - %addtmp = fadd double %calltmp, %calltmp6 - ret double %addtmp - - ifcont: - ret double 1.000000e+00 - } - -Here we see that the simplifycfg pass decided to clone the return -instruction into the end of the 'else' block. This allowed it to -eliminate some branches and the PHI node. - -Now that all symbol table references are updated to use stack variables, -we'll add the assignment operator. - -New Assignment Operator -======================= - -With our current framework, adding a new assignment operator is really -simple. We will parse it just like any other binary operator, but handle -it internally (instead of allowing the user to define it). The first -step is to set a precedence: - -.. code-block:: ocaml - - let main () = - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '=' 2; - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - ... - -Now that the parser knows the precedence of the binary operator, it -takes care of all the parsing and AST generation. We just need to -implement codegen for the assignment operator. This looks like: - -.. code-block:: ocaml - - let rec codegen_expr = function - begin match op with - | '=' -> - (* Special case '=' because we don't want to emit the LHS as an - * expression. *) - let name = - match lhs with - | Ast.Variable name -> name - | _ -> raise (Error "destination of '=' must be a variable") - in - -Unlike the rest of the binary operators, our assignment operator doesn't -follow the "emit LHS, emit RHS, do computation" model. As such, it is -handled as a special case before the other binary operators are handled. -The other strange thing is that it requires the LHS to be a variable. It -is invalid to have "(x+1) = expr" - only things like "x = expr" are -allowed. - -.. code-block:: ocaml - - (* Codegen the rhs. *) - let val_ = codegen_expr rhs in - - (* Lookup the name. *) - let variable = try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name") - in - ignore(build_store val_ variable builder); - val_ - | _ -> - ... - -Once we have the variable, codegen'ing the assignment is -straightforward: we emit the RHS of the assignment, create a store, and -return the computed value. Returning a value allows for chained -assignments like "X = (Y = Z)". - -Now that we have an assignment operator, we can mutate loop variables -and arguments. For example, we can now run code like this: - -:: - - # Function to print a double. - extern printd(x); - - # Define ':' for sequencing: as a low-precedence operator that ignores operands - # and just returns the RHS. - def binary : 1 (x y) y; - - def test(x) - printd(x) : - x = 4 : - printd(x); - - test(123); - -When run, this example prints "123" and then "4", showing that we did -actually mutate the value! Okay, we have now officially implemented our -goal: getting this to work requires SSA construction in the general -case. However, to be really useful, we want the ability to define our -own local variables, lets add this next! - -User-defined Local Variables -============================ - -Adding var/in is just like any other other extensions we made to -Kaleidoscope: we extend the lexer, the parser, the AST and the code -generator. The first step for adding our new 'var/in' construct is to -extend the lexer. As before, this is pretty trivial, the code looks like -this: - -.. code-block:: ocaml - - type token = - ... - (* var definition *) - | Var - - ... - - and lex_ident buffer = parser - ... - | "in" -> [< 'Token.In; stream >] - | "binary" -> [< 'Token.Binary; stream >] - | "unary" -> [< 'Token.Unary; stream >] - | "var" -> [< 'Token.Var; stream >] - ... - -The next step is to define the AST node that we will construct. For -var/in, it looks like this: - -.. code-block:: ocaml - - type expr = - ... - (* variant for var/in. *) - | Var of (string * expr option) array * expr - ... - -var/in allows a list of names to be defined all at once, and each name -can optionally have an initializer value. As such, we capture this -information in the VarNames vector. Also, var/in has a body, this body -is allowed to access the variables defined by the var/in. - -With this in place, we can define the parser pieces. The first thing we -do is add it as a primary expression: - -.. code-block:: ocaml - - (* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr - * ::= ifexpr - * ::= forexpr - * ::= varexpr *) - let rec parse_primary = parser - ... - (* varexpr - * ::= 'var' identifier ('=' expression? - * (',' identifier ('=' expression)?)* 'in' expression *) - | [< 'Token.Var; - (* At least one variable name is required. *) - 'Token.Ident id ?? "expected identifier after var"; - init=parse_var_init; - var_names=parse_var_names [(id, init)]; - (* At this point, we have to have 'in'. *) - 'Token.In ?? "expected 'in' keyword after 'var'"; - body=parse_expr >] -> - Ast.Var (Array.of_list (List.rev var_names), body) - - ... - - and parse_var_init = parser - (* read in the optional initializer. *) - | [< 'Token.Kwd '='; e=parse_expr >] -> Some e - | [< >] -> None - - and parse_var_names accumulator = parser - | [< 'Token.Kwd ','; - 'Token.Ident id ?? "expected identifier list after var"; - init=parse_var_init; - e=parse_var_names ((id, init) :: accumulator) >] -> e - | [< >] -> accumulator - -Now that we can parse and represent the code, we need to support -emission of LLVM IR for it. This code starts out with: - -.. code-block:: ocaml - - let rec codegen_expr = function - ... - | Ast.Var (var_names, body) - let old_bindings = ref [] in - - let the_function = block_parent (insertion_block builder) in - - (* Register all variables and emit their initializer. *) - Array.iter (fun (var_name, init) -> - -Basically it loops over all the variables, installing them one at a -time. For each variable we put into the symbol table, we remember the -previous value that we replace in OldBindings. - -.. code-block:: ocaml - - (* Emit the initializer before adding the variable to scope, this - * prevents the initializer from referencing the variable itself, and - * permits stuff like this: - * var a = 1 in - * var a = a in ... # refers to outer 'a'. *) - let init_val = - match init with - | Some init -> codegen_expr init - (* If not specified, use 0.0. *) - | None -> const_float double_type 0.0 - in - - let alloca = create_entry_block_alloca the_function var_name in - ignore(build_store init_val alloca builder); - - (* Remember the old variable binding so that we can restore the binding - * when we unrecurse. *) - - begin - try - let old_value = Hashtbl.find named_values var_name in - old_bindings := (var_name, old_value) :: !old_bindings; - with Not_found > () - end; - - (* Remember this binding. *) - Hashtbl.add named_values var_name alloca; - ) var_names; - -There are more comments here than code. The basic idea is that we emit -the initializer, create the alloca, then update the symbol table to -point to it. Once all the variables are installed in the symbol table, -we evaluate the body of the var/in expression: - -.. code-block:: ocaml - - (* Codegen the body, now that all vars are in scope. *) - let body_val = codegen_expr body in - -Finally, before returning, we restore the previous variable bindings: - -.. code-block:: ocaml - - (* Pop all our variables from scope. *) - List.iter (fun (var_name, old_value) -> - Hashtbl.add named_values var_name old_value - ) !old_bindings; - - (* Return the body computation. *) - body_val - -The end result of all of this is that we get properly scoped variable -definitions, and we even (trivially) allow mutation of them :). - -With this, we completed what we set out to do. Our nice iterative fib -example from the intro compiles and runs just fine. The mem2reg pass -optimizes all of our stack variables into SSA registers, inserting PHI -nodes where needed, and our front-end remains simple: no "iterated -dominance frontier" computation anywhere in sight. - -Full Code Listing -================= - -Here is the complete code listing for our running example, enhanced with -mutable variables and var/in support. To build this example, use: - -.. code-block:: bash - - # Compile - ocamlbuild toy.byte - # Run - ./toy.byte - -Here is the code: - -\_tags: - :: - - <{lexer,parser}.ml>: use_camlp4, pp(camlp4of) - <*.{byte,native}>: g++, use_llvm, use_llvm_analysis - <*.{byte,native}>: use_llvm_executionengine, use_llvm_target - <*.{byte,native}>: use_llvm_scalar_opts, use_bindings - -myocamlbuild.ml: - .. code-block:: ocaml - - open Ocamlbuild_plugin;; - - ocaml_lib ~extern:true "llvm";; - ocaml_lib ~extern:true "llvm_analysis";; - ocaml_lib ~extern:true "llvm_executionengine";; - ocaml_lib ~extern:true "llvm_target";; - ocaml_lib ~extern:true "llvm_scalar_opts";; - - flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"; A"-cclib"; A"-rdynamic"]);; - dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; - -token.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - - (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) - type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char - - (* control *) - | If | Then | Else - | For | In - - (* operators *) - | Binary | Unary - - (* var definition *) - | Var - -lexer.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - - let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - - and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - - and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - | "binary" -> [< 'Token.Binary; stream >] - | "unary" -> [< 'Token.Unary; stream >] - | "var" -> [< 'Token.Var; stream >] - | id -> [< 'Token.Ident id; stream >] - - and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] - -ast.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - - (* expr - Base type for all expression nodes. *) - type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a unary operator. *) - | Unary of char * expr - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - - (* variant for if/then/else. *) - | If of expr * expr * expr - - (* variant for for/in. *) - | For of string * expr * expr * expr option * expr - - (* variant for var/in. *) - | Var of (string * expr option) array * expr - - (* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) - type proto = - | Prototype of string * string array - | BinOpPrototype of string * string array * int - - (* func - This type represents a function definition itself. *) - type func = Function of proto * expr - -parser.ml: - .. code-block:: ocaml - - (*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - - (* binop_precedence - This holds the precedence for each binary operator that is - * defined *) - let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - - (* precedence - Get the precedence of the pending binary operator token. *) - let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - - (* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr - * ::= ifexpr - * ::= forexpr - * ::= varexpr *) - let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) - | [< 'Token.If; c=parse_expr; - 'Token.Then ?? "expected 'then'"; t=parse_expr; - 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> - Ast.If (c, t, e) - - (* forexpr - ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) - | [< 'Token.For; - 'Token.Ident id ?? "expected identifier after for"; - 'Token.Kwd '=' ?? "expected '=' after for"; - stream >] -> - begin parser - | [< - start=parse_expr; - 'Token.Kwd ',' ?? "expected ',' after for"; - end_=parse_expr; - stream >] -> - let step = - begin parser - | [< 'Token.Kwd ','; step=parse_expr >] -> Some step - | [< >] -> None - end stream - in - begin parser - | [< 'Token.In; body=parse_expr >] -> - Ast.For (id, start, end_, step, body) - | [< >] -> - raise (Stream.Error "expected 'in' after for") - end stream - | [< >] -> - raise (Stream.Error "expected '=' after for") - end stream - - (* varexpr - * ::= 'var' identifier ('=' expression? - * (',' identifier ('=' expression)?)* 'in' expression *) - | [< 'Token.Var; - (* At least one variable name is required. *) - 'Token.Ident id ?? "expected identifier after var"; - init=parse_var_init; - var_names=parse_var_names [(id, init)]; - (* At this point, we have to have 'in'. *) - 'Token.In ?? "expected 'in' keyword after 'var'"; - body=parse_expr >] -> - Ast.Var (Array.of_list (List.rev var_names), body) - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - - (* unary - * ::= primary - * ::= '!' unary *) - and parse_unary = parser - (* If this is a unary operator, read it. *) - | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] -> - Ast.Unary (op, operand) - - (* If the current token is not an operator, it must be a primary expr. *) - | [< stream >] -> parse_primary stream - - (* binoprhs - * ::= ('+' primary)* *) - and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_unary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - - and parse_var_init = parser - (* read in the optional initializer. *) - | [< 'Token.Kwd '='; e=parse_expr >] -> Some e - | [< >] -> None - - and parse_var_names accumulator = parser - | [< 'Token.Kwd ','; - 'Token.Ident id ?? "expected identifier list after var"; - init=parse_var_init; - e=parse_var_names ((id, init) :: accumulator) >] -> e - | [< >] -> accumulator - - (* expression - * ::= primary binoprhs *) - and parse_expr = parser - | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream - - (* prototype - * ::= id '(' id* ')' - * ::= binary LETTER number? (id, id) - * ::= unary LETTER number? (id) *) - let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - let parse_operator = parser - | [< 'Token.Unary >] -> "unary", 1 - | [< 'Token.Binary >] -> "binary", 2 - in - let parse_binary_precedence = parser - | [< 'Token.Number n >] -> int_of_float n - | [< >] -> 30 - in - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - | [< (prefix, kind)=parse_operator; - 'Token.Kwd op ?? "expected an operator"; - (* Read the precedence if present. *) - binary_precedence=parse_binary_precedence; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - let name = prefix ^ (String.make 1 op) in - let args = Array.of_list (List.rev args) in - - (* Verify right number of arguments for operator. *) - if Array.length args != kind - then raise (Stream.Error "invalid number of operands for operator") - else - if kind == 1 then - Ast.Prototype (name, args) - else - Ast.BinOpPrototype (name, args, binary_precedence) - | [< >] -> - raise (Stream.Error "expected function name in prototype") - - (* definition ::= 'def' prototype expression *) - let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - - (* toplevelexpr ::= expression *) - let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - - (* external ::= 'extern' prototype *) - let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e - -codegen.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - - open Llvm - - exception Error of string - - let context = global_context () - let the_module = create_module context "my cool jit" - let builder = builder context - let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 - let double_type = double_type context - - (* Create an alloca instruction in the entry block of the function. This - * is used for mutable variables etc. *) - let create_entry_block_alloca the_function var_name = - let builder = builder_at context (instr_begin (entry_block the_function)) in - build_alloca double_type var_name builder - - let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - let v = try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name") - in - (* Load the value. *) - build_load v name builder - | Ast.Unary (op, operand) -> - let operand = codegen_expr operand in - let callee = "unary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown unary operator") - in - build_call callee [|operand|] "unop" builder - | Ast.Binary (op, lhs, rhs) -> - begin match op with - | '=' -> - (* Special case '=' because we don't want to emit the LHS as an - * expression. *) - let name = - match lhs with - | Ast.Variable name -> name - | _ -> raise (Error "destination of '=' must be a variable") - in - - (* Codegen the rhs. *) - let val_ = codegen_expr rhs in - - (* Lookup the name. *) - let variable = try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name") - in - ignore(build_store val_ variable builder); - val_ - | _ -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> - (* If it wasn't a builtin binary operator, it must be a user defined - * one. Emit a call to it. *) - let callee = "binary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "binary operator not found!") - in - build_call callee [|lhs_val; rhs_val|] "binop" builder - end - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - | Ast.If (cond, then_, else_) -> - let cond = codegen_expr cond in - - (* Convert condition to a bool by comparing equal to 0.0 *) - let zero = const_float double_type 0.0 in - let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in - - (* Grab the first block so that we might later add the conditional branch - * to it at the end of the function. *) - let start_bb = insertion_block builder in - let the_function = block_parent start_bb in - - let then_bb = append_block context "then" the_function in - - (* Emit 'then' value. *) - position_at_end then_bb builder; - let then_val = codegen_expr then_ in - - (* Codegen of 'then' can change the current block, update then_bb for the - * phi. We create a new name because one is used for the phi node, and the - * other is used for the conditional branch. *) - let new_then_bb = insertion_block builder in - - (* Emit 'else' value. *) - let else_bb = append_block context "else" the_function in - position_at_end else_bb builder; - let else_val = codegen_expr else_ in - - (* Codegen of 'else' can change the current block, update else_bb for the - * phi. *) - let new_else_bb = insertion_block builder in - - (* Emit merge block. *) - let merge_bb = append_block context "ifcont" the_function in - position_at_end merge_bb builder; - let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in - let phi = build_phi incoming "iftmp" builder in - - (* Return to the start block to add the conditional branch. *) - position_at_end start_bb builder; - ignore (build_cond_br cond_val then_bb else_bb builder); - - (* Set a unconditional branch at the end of the 'then' block and the - * 'else' block to the 'merge' block. *) - position_at_end new_then_bb builder; ignore (build_br merge_bb builder); - position_at_end new_else_bb builder; ignore (build_br merge_bb builder); - - (* Finally, set the builder to the end of the merge block. *) - position_at_end merge_bb builder; - - phi - | Ast.For (var_name, start, end_, step, body) -> - (* Output this as: - * var = alloca double - * ... - * start = startexpr - * store start -> var - * goto loop - * loop: - * ... - * bodyexpr - * ... - * loopend: - * step = stepexpr - * endcond = endexpr - * - * curvar = load var - * nextvar = curvar + step - * store nextvar -> var - * br endcond, loop, endloop - * outloop: *) - - let the_function = block_parent (insertion_block builder) in - - (* Create an alloca for the variable in the entry block. *) - let alloca = create_entry_block_alloca the_function var_name in - - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in - - (* Store the value into the alloca. *) - ignore(build_store start_val alloca builder); - - (* Make the new basic block for the loop header, inserting after current - * block. *) - let loop_bb = append_block context "loop" the_function in - - (* Insert an explicit fall through from the current block to the - * loop_bb. *) - ignore (build_br loop_bb builder); - - (* Start insertion in loop_bb. *) - position_at_end loop_bb builder; - - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - Hashtbl.add named_values var_name alloca; - - (* Emit the body of the loop. This, like any other expr, can change the - * current BB. Note that we ignore the value computed by the body, but - * don't allow an error *) - ignore (codegen_expr body); - - (* Emit the step value. *) - let step_val = - match step with - | Some step -> codegen_expr step - (* If not specified, use 1.0. *) - | None -> const_float double_type 1.0 - in - - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - (* Reload, increment, and restore the alloca. This handles the case where - * the body of the loop mutates the variable. *) - let cur_var = build_load alloca var_name builder in - let next_var = build_add cur_var step_val "nextvar" builder in - ignore(build_store next_var alloca builder); - - (* Convert condition to a bool by comparing equal to 0.0. *) - let zero = const_float double_type 0.0 in - let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in - - (* Create the "after loop" block and insert it. *) - let after_bb = append_block context "afterloop" the_function in - - (* Insert the conditional branch into the end of loop_end_bb. *) - ignore (build_cond_br end_cond loop_bb after_bb builder); - - (* Any new code will be inserted in after_bb. *) - position_at_end after_bb builder; - - (* Restore the unshadowed variable. *) - begin match old_val with - | Some old_val -> Hashtbl.add named_values var_name old_val - | None -> () - end; - - (* for expr always returns 0.0. *) - const_null double_type - | Ast.Var (var_names, body) -> - let old_bindings = ref [] in - - let the_function = block_parent (insertion_block builder) in - - (* Register all variables and emit their initializer. *) - Array.iter (fun (var_name, init) -> - (* Emit the initializer before adding the variable to scope, this - * prevents the initializer from referencing the variable itself, and - * permits stuff like this: - * var a = 1 in - * var a = a in ... # refers to outer 'a'. *) - let init_val = - match init with - | Some init -> codegen_expr init - (* If not specified, use 0.0. *) - | None -> const_float double_type 0.0 - in - - let alloca = create_entry_block_alloca the_function var_name in - ignore(build_store init_val alloca builder); - - (* Remember the old variable binding so that we can restore the binding - * when we unrecurse. *) - begin - try - let old_value = Hashtbl.find named_values var_name in - old_bindings := (var_name, old_value) :: !old_bindings; - with Not_found -> () - end; - - (* Remember this binding. *) - Hashtbl.add named_values var_name alloca; - ) var_names; - - (* Codegen the body, now that all vars are in scope. *) - let body_val = codegen_expr body in - - (* Pop all our variables from scope. *) - List.iter (fun (var_name, old_value) -> - Hashtbl.add named_values var_name old_value - ) !old_bindings; - - (* Return the body computation. *) - body_val - - let codegen_proto = function - | Ast.Prototype (name, args) | Ast.BinOpPrototype (name, args, _) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - - (* Create an alloca for each argument and register the argument in the symbol - * table so that references to it will succeed. *) - let create_argument_allocas the_function proto = - let args = match proto with - | Ast.Prototype (_, args) | Ast.BinOpPrototype (_, args, _) -> args - in - Array.iteri (fun i ai -> - let var_name = args.(i) in - (* Create an alloca for this variable. *) - let alloca = create_entry_block_alloca the_function var_name in - - (* Store the initial value into the alloca. *) - ignore(build_store ai alloca builder); - - (* Add arguments to variable symbol table. *) - Hashtbl.add named_values var_name alloca; - ) (params the_function) - - let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* If this is an operator, install it. *) - begin match proto with - | Ast.BinOpPrototype (name, args, prec) -> - let op = name.[String.length name - 1] in - Hashtbl.add Parser.binop_precedence op prec; - | _ -> () - end; - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - (* Add all arguments to the symbol table and create their allocas. *) - create_argument_allocas the_function proto; - - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. *) - let _ = PassManager.run_function the_function the_fpm in - - the_function - with e -> - delete_function the_function; - raise e - -toplevel.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - - open Llvm - open Llvm_executionengine - - (* top ::= definition | external | expression | ';' *) - let rec main_loop the_fpm the_execution_engine stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop the_fpm the_execution_engine stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func the_fpm e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function, returning a function pointer. *) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop the_fpm the_execution_engine stream - -toy.ml: - .. code-block:: ocaml - - (*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - - open Llvm - open Llvm_executionengine - open Llvm_target - open Llvm_scalar_opts - - let main () = - ignore (initialize_native_target ()); - - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '=' 2; - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Promote allocas to registers. *) - add_memory_to_register_promotion the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combination the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. *) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). *) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop the_fpm the_execution_engine stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module - ;; - - main () - -bindings.c - .. code-block:: c - - #include - - /* putchard - putchar that takes a double and returns 0. */ - extern double putchard(double X) { - putchar((char)X); - return 0; - } - - /* printd - printf that takes a double prints it as "%f\n", returning 0. */ - extern double printd(double X) { - printf("%f\n", X); - return 0; - } - -`Next: Conclusion and other useful LLVM tidbits `_ - diff --git a/llvm/docs/tutorial/OCamlLangImpl8.rst b/llvm/docs/tutorial/OCamlLangImpl8.rst deleted file mode 100644 --- a/llvm/docs/tutorial/OCamlLangImpl8.rst +++ /dev/null @@ -1,267 +0,0 @@ -====================================================== -Kaleidoscope: Conclusion and other useful LLVM tidbits -====================================================== - -.. contents:: - :local: - -Tutorial Conclusion -=================== - -Welcome to the final chapter of the "`Implementing a language with -LLVM `_" tutorial. In the course of this tutorial, we have -grown our little Kaleidoscope language from being a useless toy, to -being a semi-interesting (but probably still useless) toy. :) - -It is interesting to see how far we've come, and how little code it has -taken. We built the entire lexer, parser, AST, code generator, and an -interactive run-loop (with a JIT!) by-hand in under 700 lines of -(non-comment/non-blank) code. - -Our little language supports a couple of interesting features: it -supports user defined binary and unary operators, it uses JIT -compilation for immediate evaluation, and it supports a few control flow -constructs with SSA construction. - -Part of the idea of this tutorial was to show you how easy and fun it -can be to define, build, and play with languages. Building a compiler -need not be a scary or mystical process! Now that you've seen some of -the basics, I strongly encourage you to take the code and hack on it. -For example, try adding: - -- **global variables** - While global variables have questional value - in modern software engineering, they are often useful when putting - together quick little hacks like the Kaleidoscope compiler itself. - Fortunately, our current setup makes it very easy to add global - variables: just have value lookup check to see if an unresolved - variable is in the global variable symbol table before rejecting it. - To create a new global variable, make an instance of the LLVM - ``GlobalVariable`` class. -- **typed variables** - Kaleidoscope currently only supports variables - of type double. This gives the language a very nice elegance, because - only supporting one type means that you never have to specify types. - Different languages have different ways of handling this. The easiest - way is to require the user to specify types for every variable - definition, and record the type of the variable in the symbol table - along with its Value\*. -- **arrays, structs, vectors, etc** - Once you add types, you can start - extending the type system in all sorts of interesting ways. Simple - arrays are very easy and are quite useful for many different - applications. Adding them is mostly an exercise in learning how the - LLVM `getelementptr <../LangRef.html#getelementptr-instruction>`_ instruction - works: it is so nifty/unconventional, it `has its own - FAQ <../GetElementPtr.html>`_! If you add support for recursive types - (e.g. linked lists), make sure to read the `section in the LLVM - Programmer's Manual <../ProgrammersManual.html#TypeResolve>`_ that - describes how to construct them. -- **standard runtime** - Our current language allows the user to access - arbitrary external functions, and we use it for things like "printd" - and "putchard". As you extend the language to add higher-level - constructs, often these constructs make the most sense if they are - lowered to calls into a language-supplied runtime. For example, if - you add hash tables to the language, it would probably make sense to - add the routines to a runtime, instead of inlining them all the way. -- **memory management** - Currently we can only access the stack in - Kaleidoscope. It would also be useful to be able to allocate heap - memory, either with calls to the standard libc malloc/free interface - or with a garbage collector. If you would like to use garbage - collection, note that LLVM fully supports `Accurate Garbage - Collection <../GarbageCollection.html>`_ including algorithms that - move objects and need to scan/update the stack. -- **debugger support** - LLVM supports generation of `DWARF Debug - info <../SourceLevelDebugging.html>`_ which is understood by common - debuggers like GDB. Adding support for debug info is fairly - straightforward. The best way to understand it is to compile some - C/C++ code with "``clang -g -O0``" and taking a look at what it - produces. -- **exception handling support** - LLVM supports generation of `zero - cost exceptions <../ExceptionHandling.html>`_ which interoperate with - code compiled in other languages. You could also generate code by - implicitly making every function return an error value and checking - it. You could also make explicit use of setjmp/longjmp. There are - many different ways to go here. -- **object orientation, generics, database access, complex numbers, - geometric programming, ...** - Really, there is no end of crazy - features that you can add to the language. -- **unusual domains** - We've been talking about applying LLVM to a - domain that many people are interested in: building a compiler for a - specific language. However, there are many other domains that can use - compiler technology that are not typically considered. For example, - LLVM has been used to implement OpenGL graphics acceleration, - translate C++ code to ActionScript, and many other cute and clever - things. Maybe you will be the first to JIT compile a regular - expression interpreter into native code with LLVM? - -Have fun - try doing something crazy and unusual. Building a language -like everyone else always has, is much less fun than trying something a -little crazy or off the wall and seeing how it turns out. If you get -stuck or want to talk about it, feel free to email the `llvm-dev mailing -list `_: it has lots -of people who are interested in languages and are often willing to help -out. - -Before we end this tutorial, I want to talk about some "tips and tricks" -for generating LLVM IR. These are some of the more subtle things that -may not be obvious, but are very useful if you want to take advantage of -LLVM's capabilities. - -Properties of the LLVM IR -========================= - -We have a couple common questions about code in the LLVM IR form - lets -just get these out of the way right now, shall we? - -Target Independence -------------------- - -Kaleidoscope is an example of a "portable language": any program written -in Kaleidoscope will work the same way on any target that it runs on. -Many other languages have this property, e.g. lisp, java, haskell, -javascript, python, etc (note that while these languages are portable, -not all their libraries are). - -One nice aspect of LLVM is that it is often capable of preserving target -independence in the IR: you can take the LLVM IR for a -Kaleidoscope-compiled program and run it on any target that LLVM -supports, even emitting C code and compiling that on targets that LLVM -doesn't support natively. You can trivially tell that the Kaleidoscope -compiler generates target-independent code because it never queries for -any target-specific information when generating code. - -The fact that LLVM provides a compact, target-independent, -representation for code gets a lot of people excited. Unfortunately, -these people are usually thinking about C or a language from the C -family when they are asking questions about language portability. I say -"unfortunately", because there is really no way to make (fully general) -C code portable, other than shipping the source code around (and of -course, C source code is not actually portable in general either - ever -port a really old application from 32- to 64-bits?). - -The problem with C (again, in its full generality) is that it is heavily -laden with target specific assumptions. As one simple example, the -preprocessor often destructively removes target-independence from the -code when it processes the input text: - -.. code-block:: c - - #ifdef __i386__ - int X = 1; - #else - int X = 42; - #endif - -While it is possible to engineer more and more complex solutions to -problems like this, it cannot be solved in full generality in a way that -is better than shipping the actual source code. - -That said, there are interesting subsets of C that can be made portable. -If you are willing to fix primitive types to a fixed size (say int = -32-bits, and long = 64-bits), don't care about ABI compatibility with -existing binaries, and are willing to give up some other minor features, -you can have portable code. This can make sense for specialized domains -such as an in-kernel language. - -Safety Guarantees ------------------ - -Many of the languages above are also "safe" languages: it is impossible -for a program written in Java to corrupt its address space and crash the -process (assuming the JVM has no bugs). Safety is an interesting -property that requires a combination of language design, runtime -support, and often operating system support. - -It is certainly possible to implement a safe language in LLVM, but LLVM -IR does not itself guarantee safety. The LLVM IR allows unsafe pointer -casts, use after free bugs, buffer over-runs, and a variety of other -problems. Safety needs to be implemented as a layer on top of LLVM and, -conveniently, several groups have investigated this. Ask on the `llvm-dev -mailing list `_ if -you are interested in more details. - -Language-Specific Optimizations -------------------------------- - -One thing about LLVM that turns off many people is that it does not -solve all the world's problems in one system (sorry 'world hunger', -someone else will have to solve you some other day). One specific -complaint is that people perceive LLVM as being incapable of performing -high-level language-specific optimization: LLVM "loses too much -information". - -Unfortunately, this is really not the place to give you a full and -unified version of "Chris Lattner's theory of compiler design". Instead, -I'll make a few observations: - -First, you're right that LLVM does lose information. For example, as of -this writing, there is no way to distinguish in the LLVM IR whether an -SSA-value came from a C "int" or a C "long" on an ILP32 machine (other -than debug info). Both get compiled down to an 'i32' value and the -information about what it came from is lost. The more general issue -here, is that the LLVM type system uses "structural equivalence" instead -of "name equivalence". Another place this surprises people is if you -have two types in a high-level language that have the same structure -(e.g. two different structs that have a single int field): these types -will compile down into a single LLVM type and it will be impossible to -tell what it came from. - -Second, while LLVM does lose information, LLVM is not a fixed target: we -continue to enhance and improve it in many different ways. In addition -to adding new features (LLVM did not always support exceptions or debug -info), we also extend the IR to capture important information for -optimization (e.g. whether an argument is sign or zero extended, -information about pointers aliasing, etc). Many of the enhancements are -user-driven: people want LLVM to include some specific feature, so they -go ahead and extend it. - -Third, it is *possible and easy* to add language-specific optimizations, -and you have a number of choices in how to do it. As one trivial -example, it is easy to add language-specific optimization passes that -"know" things about code compiled for a language. In the case of the C -family, there is an optimization pass that "knows" about the standard C -library functions. If you call "exit(0)" in main(), it knows that it is -safe to optimize that into "return 0;" because C specifies what the -'exit' function does. - -In addition to simple library knowledge, it is possible to embed a -variety of other language-specific information into the LLVM IR. If you -have a specific need and run into a wall, please bring the topic up on -the llvm-dev list. At the very worst, you can always treat LLVM as if it -were a "dumb code generator" and implement the high-level optimizations -you desire in your front-end, on the language-specific AST. - -Tips and Tricks -=============== - -There is a variety of useful tips and tricks that you come to know after -working on/with LLVM that aren't obvious at first glance. Instead of -letting everyone rediscover them, this section talks about some of these -issues. - -Implementing portable offsetof/sizeof -------------------------------------- - -One interesting thing that comes up, if you are trying to keep the code -generated by your compiler "target independent", is that you often need -to know the size of some LLVM type or the offset of some field in an -llvm structure. For example, you might need to pass the size of a type -into a function that allocates memory. - -Unfortunately, this can vary widely across targets: for example the -width of a pointer is trivially target-specific. However, there is a -`clever way to use the getelementptr -instruction `_ -that allows you to compute this in a portable way. - -Garbage Collected Stack Frames ------------------------------- - -Some languages want to explicitly manage their stack frames, often so -that they are garbage collected or to allow easy implementation of -closures. There are often better ways to implement these features than -explicit stack frames, but `LLVM does support -them, `_ -if you want. It requires your front-end to convert the code into -`Continuation Passing -Style `_ and -the use of tail calls (which LLVM also supports). - diff --git a/llvm/docs/tutorial/index.rst b/llvm/docs/tutorial/index.rst --- a/llvm/docs/tutorial/index.rst +++ b/llvm/docs/tutorial/index.rst @@ -14,15 +14,12 @@ This is the "Kaleidoscope" Language tutorial, showing how to implement a simple language using LLVM components in C++. -Kaleidoscope: Implementing a Language with LLVM in Objective Caml -================================================================= - .. toctree:: :titlesonly: :glob: :numbered: - OCamlLangImpl* + MyFirstLanguageFrontend/LangImpl* Building a JIT in LLVM =============================================== diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter2/_tags b/llvm/examples/OCaml-Kaleidoscope/Chapter2/_tags deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter2/_tags +++ /dev/null @@ -1 +0,0 @@ -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter2/ast.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter2/ast.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter2/ast.ml +++ /dev/null @@ -1,25 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = Prototype of string * string array - -(* func - This type represents a function definition itself. *) -type func = Function of proto * expr diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter2/lexer.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter2/lexer.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter2/lexer.ml +++ /dev/null @@ -1,52 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter2/parser.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter2/parser.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter2/parser.ml +++ /dev/null @@ -1,122 +0,0 @@ -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter2/token.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter2/token.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter2/token.ml +++ /dev/null @@ -1,15 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter2/toplevel.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter2/toplevel.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter2/toplevel.ml +++ /dev/null @@ -1,34 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -(* top ::= definition | external | expression | ';' *) -let rec main_loop stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop stream - - | Some token -> - begin - try match token with - | Token.Def -> - ignore(Parser.parse_definition stream); - print_endline "parsed a function definition."; - | Token.Extern -> - ignore(Parser.parse_extern stream); - print_endline "parsed an extern."; - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - ignore(Parser.parse_toplevel stream); - print_endline "parsed a top-level expr"; - with Stream.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop stream diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter2/toy.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter2/toy.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter2/toy.ml +++ /dev/null @@ -1,21 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - -let main () = - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop stream; -;; - -main () diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/_tags b/llvm/examples/OCaml-Kaleidoscope/Chapter3/_tags deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter3/_tags +++ /dev/null @@ -1,2 +0,0 @@ -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) -<*.{byte,native}>: g++, use_llvm, use_llvm_analysis diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/ast.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter3/ast.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter3/ast.ml +++ /dev/null @@ -1,25 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = Prototype of string * string array - -(* func - This type represents a function definition itself. *) -type func = Function of proto * expr diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/codegen.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter3/codegen.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter3/codegen.ml +++ /dev/null @@ -1,100 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - -open Llvm - -exception Error of string - -let context = global_context () -let the_module = create_module context "my cool jit" -let builder = builder context -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -let double_type = double_type context - -let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> raise (Error "invalid binary operator") - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - -let codegen_proto = function - | Ast.Prototype (name, args) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - -let codegen_func = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - the_function - with e -> - delete_function the_function; - raise e diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/lexer.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter3/lexer.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter3/lexer.ml +++ /dev/null @@ -1,52 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/myocamlbuild.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter3/myocamlbuild.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter3/myocamlbuild.ml +++ /dev/null @@ -1,6 +0,0 @@ -open Ocamlbuild_plugin;; - -ocaml_lib ~extern:true "llvm";; -ocaml_lib ~extern:true "llvm_analysis";; - -flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);; diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/parser.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter3/parser.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter3/parser.ml +++ /dev/null @@ -1,122 +0,0 @@ -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/token.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter3/token.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter3/token.ml +++ /dev/null @@ -1,15 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/toplevel.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter3/toplevel.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter3/toplevel.ml +++ /dev/null @@ -1,39 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -open Llvm - -(* top ::= definition | external | expression | ';' *) -let rec main_loop stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - dump_value (Codegen.codegen_func e); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop stream diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/toy.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter3/toy.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter3/toy.ml +++ /dev/null @@ -1,26 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - -open Llvm - -let main () = - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module -;; - -main () diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter4/_tags b/llvm/examples/OCaml-Kaleidoscope/Chapter4/_tags deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter4/_tags +++ /dev/null @@ -1,4 +0,0 @@ -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) -<*.{byte,native}>: g++, use_llvm, use_llvm_analysis -<*.{byte,native}>: use_llvm_executionengine, use_llvm_target -<*.{byte,native}>: use_llvm_scalar_opts, use_bindings diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter4/ast.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter4/ast.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter4/ast.ml +++ /dev/null @@ -1,25 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = Prototype of string * string array - -(* func - This type represents a function definition itself. *) -type func = Function of proto * expr diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter4/bindings.c b/llvm/examples/OCaml-Kaleidoscope/Chapter4/bindings.c deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter4/bindings.c +++ /dev/null @@ -1,7 +0,0 @@ -#include - -/* putchard - putchar that takes a double and returns 0. */ -extern double putchard(double X) { - putchar((char)X); - return 0; -} diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter4/codegen.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter4/codegen.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter4/codegen.ml +++ /dev/null @@ -1,103 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - -open Llvm - -exception Error of string - -let context = global_context () -let the_module = create_module context "my cool jit" -let builder = builder context -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -let double_type = double_type context - -let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_fadd lhs_val rhs_val "addtmp" builder - | '-' -> build_fsub lhs_val rhs_val "subtmp" builder - | '*' -> build_fmul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> raise (Error "invalid binary operator") - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - -let codegen_proto = function - | Ast.Prototype (name, args) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - -let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. *) - let _ = PassManager.run_function the_function the_fpm in - - the_function - with e -> - delete_function the_function; - raise e diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter4/lexer.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter4/lexer.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter4/lexer.ml +++ /dev/null @@ -1,52 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter4/myocamlbuild.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter4/myocamlbuild.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter4/myocamlbuild.ml +++ /dev/null @@ -1,10 +0,0 @@ -open Ocamlbuild_plugin;; - -ocaml_lib ~extern:true "llvm";; -ocaml_lib ~extern:true "llvm_analysis";; -ocaml_lib ~extern:true "llvm_executionengine";; -ocaml_lib ~extern:true "llvm_target";; -ocaml_lib ~extern:true "llvm_scalar_opts";; - -flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);; -dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter4/parser.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter4/parser.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter4/parser.ml +++ /dev/null @@ -1,122 +0,0 @@ -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter4/token.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter4/token.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter4/token.ml +++ /dev/null @@ -1,15 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter4/toplevel.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter4/toplevel.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter4/toplevel.ml +++ /dev/null @@ -1,49 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine - -(* top ::= definition | external | expression | ';' *) -let rec main_loop the_fpm the_execution_engine stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop the_fpm the_execution_engine stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func the_fpm e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function, returning a function pointer. *) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop the_fpm the_execution_engine stream diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter4/toy.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter4/toy.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter4/toy.ml +++ /dev/null @@ -1,53 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine -open Llvm_target -open Llvm_scalar_opts - -let main () = - ignore (initialize_native_target ()); - - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combination the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. *) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). *) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop the_fpm the_execution_engine stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module -;; - -main () diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter5/_tags b/llvm/examples/OCaml-Kaleidoscope/Chapter5/_tags deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter5/_tags +++ /dev/null @@ -1,4 +0,0 @@ -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) -<*.{byte,native}>: g++, use_llvm, use_llvm_analysis -<*.{byte,native}>: use_llvm_executionengine, use_llvm_target -<*.{byte,native}>: use_llvm_scalar_opts, use_bindings diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter5/ast.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter5/ast.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter5/ast.ml +++ /dev/null @@ -1,31 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - - (* variant for if/then/else. *) - | If of expr * expr * expr - - (* variant for for/in. *) - | For of string * expr * expr * expr option * expr - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = Prototype of string * string array - -(* func - This type represents a function definition itself. *) -type func = Function of proto * expr diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter5/bindings.c b/llvm/examples/OCaml-Kaleidoscope/Chapter5/bindings.c deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter5/bindings.c +++ /dev/null @@ -1,7 +0,0 @@ -#include - -/* putchard - putchar that takes a double and returns 0. */ -extern double putchard(double X) { - putchar((char)X); - return 0; -} diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter5/codegen.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter5/codegen.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter5/codegen.ml +++ /dev/null @@ -1,225 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - -open Llvm - -exception Error of string - -let context = global_context () -let the_module = create_module context "my cool jit" -let builder = builder context -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -let double_type = double_type context - -let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_fadd lhs_val rhs_val "addtmp" builder - | '-' -> build_fsub lhs_val rhs_val "subtmp" builder - | '*' -> build_fmul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> raise (Error "invalid binary operator") - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - | Ast.If (cond, then_, else_) -> - let cond = codegen_expr cond in - - (* Convert condition to a bool by comparing equal to 0.0 *) - let zero = const_float double_type 0.0 in - let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in - - (* Grab the first block so that we might later add the conditional branch - * to it at the end of the function. *) - let start_bb = insertion_block builder in - let the_function = block_parent start_bb in - - let then_bb = append_block context "then" the_function in - - (* Emit 'then' value. *) - position_at_end then_bb builder; - let then_val = codegen_expr then_ in - - (* Codegen of 'then' can change the current block, update then_bb for the - * phi. We create a new name because one is used for the phi node, and the - * other is used for the conditional branch. *) - let new_then_bb = insertion_block builder in - - (* Emit 'else' value. *) - let else_bb = append_block context "else" the_function in - position_at_end else_bb builder; - let else_val = codegen_expr else_ in - - (* Codegen of 'else' can change the current block, update else_bb for the - * phi. *) - let new_else_bb = insertion_block builder in - - (* Emit merge block. *) - let merge_bb = append_block context "ifcont" the_function in - position_at_end merge_bb builder; - let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in - let phi = build_phi incoming "iftmp" builder in - - (* Return to the start block to add the conditional branch. *) - position_at_end start_bb builder; - ignore (build_cond_br cond_val then_bb else_bb builder); - - (* Set a unconditional branch at the end of the 'then' block and the - * 'else' block to the 'merge' block. *) - position_at_end new_then_bb builder; ignore (build_br merge_bb builder); - position_at_end new_else_bb builder; ignore (build_br merge_bb builder); - - (* Finally, set the builder to the end of the merge block. *) - position_at_end merge_bb builder; - - phi - | Ast.For (var_name, start, end_, step, body) -> - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in - - (* Make the new basic block for the loop header, inserting after current - * block. *) - let preheader_bb = insertion_block builder in - let the_function = block_parent preheader_bb in - let loop_bb = append_block context "loop" the_function in - - (* Insert an explicit fall through from the current block to the - * loop_bb. *) - ignore (build_br loop_bb builder); - - (* Start insertion in loop_bb. *) - position_at_end loop_bb builder; - - (* Start the PHI node with an entry for start. *) - let variable = build_phi [(start_val, preheader_bb)] var_name builder in - - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - Hashtbl.add named_values var_name variable; - - (* Emit the body of the loop. This, like any other expr, can change the - * current BB. Note that we ignore the value computed by the body, but - * don't allow an error *) - ignore (codegen_expr body); - - (* Emit the step value. *) - let step_val = - match step with - | Some step -> codegen_expr step - (* If not specified, use 1.0. *) - | None -> const_float double_type 1.0 - in - - let next_var = build_add variable step_val "nextvar" builder in - - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - (* Convert condition to a bool by comparing equal to 0.0. *) - let zero = const_float double_type 0.0 in - let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in - - (* Create the "after loop" block and insert it. *) - let loop_end_bb = insertion_block builder in - let after_bb = append_block context "afterloop" the_function in - - (* Insert the conditional branch into the end of loop_end_bb. *) - ignore (build_cond_br end_cond loop_bb after_bb builder); - - (* Any new code will be inserted in after_bb. *) - position_at_end after_bb builder; - - (* Add a new entry to the PHI node for the backedge. *) - add_incoming (next_var, loop_end_bb) variable; - - (* Restore the unshadowed variable. *) - begin match old_val with - | Some old_val -> Hashtbl.add named_values var_name old_val - | None -> () - end; - - (* for expr always returns 0.0. *) - const_null double_type - -let codegen_proto = function - | Ast.Prototype (name, args) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - -let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. *) - let _ = PassManager.run_function the_function the_fpm in - - the_function - with e -> - delete_function the_function; - raise e diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter5/lexer.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter5/lexer.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter5/lexer.ml +++ /dev/null @@ -1,57 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter5/myocamlbuild.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter5/myocamlbuild.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter5/myocamlbuild.ml +++ /dev/null @@ -1,10 +0,0 @@ -open Ocamlbuild_plugin;; - -ocaml_lib ~extern:true "llvm";; -ocaml_lib ~extern:true "llvm_analysis";; -ocaml_lib ~extern:true "llvm_executionengine";; -ocaml_lib ~extern:true "llvm_target";; -ocaml_lib ~extern:true "llvm_scalar_opts";; - -flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);; -dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter5/parser.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter5/parser.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter5/parser.ml +++ /dev/null @@ -1,158 +0,0 @@ -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr - * ::= ifexpr - * ::= forexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) - | [< 'Token.If; c=parse_expr; - 'Token.Then ?? "expected 'then'"; t=parse_expr; - 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> - Ast.If (c, t, e) - - (* forexpr - ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) - | [< 'Token.For; - 'Token.Ident id ?? "expected identifier after for"; - 'Token.Kwd '=' ?? "expected '=' after for"; - stream >] -> - begin parser - | [< - start=parse_expr; - 'Token.Kwd ',' ?? "expected ',' after for"; - end_=parse_expr; - stream >] -> - let step = - begin parser - | [< 'Token.Kwd ','; step=parse_expr >] -> Some step - | [< >] -> None - end stream - in - begin parser - | [< 'Token.In; body=parse_expr >] -> - Ast.For (id, start, end_, step, body) - | [< >] -> - raise (Stream.Error "expected 'in' after for") - end stream - | [< >] -> - raise (Stream.Error "expected '=' after for") - end stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter5/token.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter5/token.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter5/token.ml +++ /dev/null @@ -1,19 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char - - (* control *) - | If | Then | Else - | For | In diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter5/toplevel.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter5/toplevel.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter5/toplevel.ml +++ /dev/null @@ -1,49 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine - -(* top ::= definition | external | expression | ';' *) -let rec main_loop the_fpm the_execution_engine stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop the_fpm the_execution_engine stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func the_fpm e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function, returning a function pointer. *) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop the_fpm the_execution_engine stream diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter5/toy.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter5/toy.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter5/toy.ml +++ /dev/null @@ -1,53 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine -open Llvm_target -open Llvm_scalar_opts - -let main () = - ignore (initialize_native_target ()); - - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combination the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. *) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). *) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop the_fpm the_execution_engine stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module -;; - -main () diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter6/_tags b/llvm/examples/OCaml-Kaleidoscope/Chapter6/_tags deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter6/_tags +++ /dev/null @@ -1,4 +0,0 @@ -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) -<*.{byte,native}>: g++, use_llvm, use_llvm_analysis -<*.{byte,native}>: use_llvm_executionengine, use_llvm_target -<*.{byte,native}>: use_llvm_scalar_opts, use_bindings diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter6/ast.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter6/ast.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter6/ast.ml +++ /dev/null @@ -1,36 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a unary operator. *) - | Unary of char * expr - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - - (* variant for if/then/else. *) - | If of expr * expr * expr - - (* variant for for/in. *) - | For of string * expr * expr * expr option * expr - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = - | Prototype of string * string array - | BinOpPrototype of string * string array * int - -(* func - This type represents a function definition itself. *) -type func = Function of proto * expr diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter6/bindings.c b/llvm/examples/OCaml-Kaleidoscope/Chapter6/bindings.c deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter6/bindings.c +++ /dev/null @@ -1,13 +0,0 @@ -#include - -/* putchard - putchar that takes a double and returns 0. */ -extern double putchard(double X) { - putchar((char)X); - return 0; -} - -/* printd - printf that takes a double prints it as "%f\n", returning 0. */ -extern double printd(double X) { - printf("%f\n", X); - return 0; -} diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter6/codegen.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter6/codegen.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter6/codegen.ml +++ /dev/null @@ -1,251 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - -open Llvm - -exception Error of string - -let context = global_context () -let the_module = create_module context "my cool jit" -let builder = builder context -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -let double_type = double_type context - -let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) - | Ast.Unary (op, operand) -> - let operand = codegen_expr operand in - let callee = "unary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown unary operator") - in - build_call callee [|operand|] "unop" builder - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_fadd lhs_val rhs_val "addtmp" builder - | '-' -> build_fsub lhs_val rhs_val "subtmp" builder - | '*' -> build_fmul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> - (* If it wasn't a builtin binary operator, it must be a user defined - * one. Emit a call to it. *) - let callee = "binary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "binary operator not found!") - in - build_call callee [|lhs_val; rhs_val|] "binop" builder - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - | Ast.If (cond, then_, else_) -> - let cond = codegen_expr cond in - - (* Convert condition to a bool by comparing equal to 0.0 *) - let zero = const_float double_type 0.0 in - let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in - - (* Grab the first block so that we might later add the conditional branch - * to it at the end of the function. *) - let start_bb = insertion_block builder in - let the_function = block_parent start_bb in - - let then_bb = append_block context "then" the_function in - - (* Emit 'then' value. *) - position_at_end then_bb builder; - let then_val = codegen_expr then_ in - - (* Codegen of 'then' can change the current block, update then_bb for the - * phi. We create a new name because one is used for the phi node, and the - * other is used for the conditional branch. *) - let new_then_bb = insertion_block builder in - - (* Emit 'else' value. *) - let else_bb = append_block context "else" the_function in - position_at_end else_bb builder; - let else_val = codegen_expr else_ in - - (* Codegen of 'else' can change the current block, update else_bb for the - * phi. *) - let new_else_bb = insertion_block builder in - - (* Emit merge block. *) - let merge_bb = append_block context "ifcont" the_function in - position_at_end merge_bb builder; - let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in - let phi = build_phi incoming "iftmp" builder in - - (* Return to the start block to add the conditional branch. *) - position_at_end start_bb builder; - ignore (build_cond_br cond_val then_bb else_bb builder); - - (* Set a unconditional branch at the end of the 'then' block and the - * 'else' block to the 'merge' block. *) - position_at_end new_then_bb builder; ignore (build_br merge_bb builder); - position_at_end new_else_bb builder; ignore (build_br merge_bb builder); - - (* Finally, set the builder to the end of the merge block. *) - position_at_end merge_bb builder; - - phi - | Ast.For (var_name, start, end_, step, body) -> - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in - - (* Make the new basic block for the loop header, inserting after current - * block. *) - let preheader_bb = insertion_block builder in - let the_function = block_parent preheader_bb in - let loop_bb = append_block context "loop" the_function in - - (* Insert an explicit fall through from the current block to the - * loop_bb. *) - ignore (build_br loop_bb builder); - - (* Start insertion in loop_bb. *) - position_at_end loop_bb builder; - - (* Start the PHI node with an entry for start. *) - let variable = build_phi [(start_val, preheader_bb)] var_name builder in - - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - Hashtbl.add named_values var_name variable; - - (* Emit the body of the loop. This, like any other expr, can change the - * current BB. Note that we ignore the value computed by the body, but - * don't allow an error *) - ignore (codegen_expr body); - - (* Emit the step value. *) - let step_val = - match step with - | Some step -> codegen_expr step - (* If not specified, use 1.0. *) - | None -> const_float double_type 1.0 - in - - let next_var = build_add variable step_val "nextvar" builder in - - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - (* Convert condition to a bool by comparing equal to 0.0. *) - let zero = const_float double_type 0.0 in - let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in - - (* Create the "after loop" block and insert it. *) - let loop_end_bb = insertion_block builder in - let after_bb = append_block context "afterloop" the_function in - - (* Insert the conditional branch into the end of loop_end_bb. *) - ignore (build_cond_br end_cond loop_bb after_bb builder); - - (* Any new code will be inserted in after_bb. *) - position_at_end after_bb builder; - - (* Add a new entry to the PHI node for the backedge. *) - add_incoming (next_var, loop_end_bb) variable; - - (* Restore the unshadowed variable. *) - begin match old_val with - | Some old_val -> Hashtbl.add named_values var_name old_val - | None -> () - end; - - (* for expr always returns 0.0. *) - const_null double_type - -let codegen_proto = function - | Ast.Prototype (name, args) | Ast.BinOpPrototype (name, args, _) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - -let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* If this is an operator, install it. *) - begin match proto with - | Ast.BinOpPrototype (name, args, prec) -> - let op = name.[String.length name - 1] in - Hashtbl.add Parser.binop_precedence op prec; - | _ -> () - end; - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. *) - let _ = PassManager.run_function the_function the_fpm in - - the_function - with e -> - delete_function the_function; - raise e diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter6/lexer.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter6/lexer.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter6/lexer.ml +++ /dev/null @@ -1,59 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - | "binary" -> [< 'Token.Binary; stream >] - | "unary" -> [< 'Token.Unary; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter6/myocamlbuild.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter6/myocamlbuild.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter6/myocamlbuild.ml +++ /dev/null @@ -1,10 +0,0 @@ -open Ocamlbuild_plugin;; - -ocaml_lib ~extern:true "llvm";; -ocaml_lib ~extern:true "llvm_analysis";; -ocaml_lib ~extern:true "llvm_executionengine";; -ocaml_lib ~extern:true "llvm_target";; -ocaml_lib ~extern:true "llvm_scalar_opts";; - -flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"; A"-cclib"; A"-rdynamic"]);; -dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter6/parser.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter6/parser.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter6/parser.ml +++ /dev/null @@ -1,195 +0,0 @@ -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr - * ::= ifexpr - * ::= forexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) - | [< 'Token.If; c=parse_expr; - 'Token.Then ?? "expected 'then'"; t=parse_expr; - 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> - Ast.If (c, t, e) - - (* forexpr - ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) - | [< 'Token.For; - 'Token.Ident id ?? "expected identifier after for"; - 'Token.Kwd '=' ?? "expected '=' after for"; - stream >] -> - begin parser - | [< - start=parse_expr; - 'Token.Kwd ',' ?? "expected ',' after for"; - end_=parse_expr; - stream >] -> - let step = - begin parser - | [< 'Token.Kwd ','; step=parse_expr >] -> Some step - | [< >] -> None - end stream - in - begin parser - | [< 'Token.In; body=parse_expr >] -> - Ast.For (id, start, end_, step, body) - | [< >] -> - raise (Stream.Error "expected 'in' after for") - end stream - | [< >] -> - raise (Stream.Error "expected '=' after for") - end stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* unary - * ::= primary - * ::= '!' unary *) -and parse_unary = parser - (* If this is a unary operator, read it. *) - | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] -> - Ast.Unary (op, operand) - - (* If the current token is not an operator, it must be a primary expr. *) - | [< stream >] -> parse_primary stream - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the unary expression after the binary operator. *) - let rhs = parse_unary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' - * ::= binary LETTER number? (id, id) - * ::= unary LETTER number? (id) *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - let parse_operator = parser - | [< 'Token.Unary >] -> "unary", 1 - | [< 'Token.Binary >] -> "binary", 2 - in - let parse_binary_precedence = parser - | [< 'Token.Number n >] -> int_of_float n - | [< >] -> 30 - in - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - | [< (prefix, kind)=parse_operator; - 'Token.Kwd op ?? "expected an operator"; - (* Read the precedence if present. *) - binary_precedence=parse_binary_precedence; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - let name = prefix ^ (String.make 1 op) in - let args = Array.of_list (List.rev args) in - - (* Verify right number of arguments for operator. *) - if Array.length args != kind - then raise (Stream.Error "invalid number of operands for operator") - else - if kind == 1 then - Ast.Prototype (name, args) - else - Ast.BinOpPrototype (name, args, binary_precedence) - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter6/token.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter6/token.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter6/token.ml +++ /dev/null @@ -1,22 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char - - (* control *) - | If | Then | Else - | For | In - - (* operators *) - | Binary | Unary diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter6/toplevel.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter6/toplevel.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter6/toplevel.ml +++ /dev/null @@ -1,49 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine - -(* top ::= definition | external | expression | ';' *) -let rec main_loop the_fpm the_execution_engine stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop the_fpm the_execution_engine stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func the_fpm e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function, returning a function pointer. *) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop the_fpm the_execution_engine stream diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter6/toy.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter6/toy.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter6/toy.ml +++ /dev/null @@ -1,53 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine -open Llvm_target -open Llvm_scalar_opts - -let main () = - ignore (initialize_native_target ()); - - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combination the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. *) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). *) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop the_fpm the_execution_engine stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module -;; - -main () diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter7/_tags b/llvm/examples/OCaml-Kaleidoscope/Chapter7/_tags deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter7/_tags +++ /dev/null @@ -1,4 +0,0 @@ -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) -<*.{byte,native}>: g++, use_llvm, use_llvm_analysis -<*.{byte,native}>: use_llvm_executionengine, use_llvm_target -<*.{byte,native}>: use_llvm_scalar_opts, use_bindings diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter7/ast.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter7/ast.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter7/ast.ml +++ /dev/null @@ -1,39 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a unary operator. *) - | Unary of char * expr - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - - (* variant for if/then/else. *) - | If of expr * expr * expr - - (* variant for for/in. *) - | For of string * expr * expr * expr option * expr - - (* variant for var/in. *) - | Var of (string * expr option) array * expr - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = - | Prototype of string * string array - | BinOpPrototype of string * string array * int - -(* func - This type represents a function definition itself. *) -type func = Function of proto * expr diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter7/bindings.c b/llvm/examples/OCaml-Kaleidoscope/Chapter7/bindings.c deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter7/bindings.c +++ /dev/null @@ -1,13 +0,0 @@ -#include - -/* putchard - putchar that takes a double and returns 0. */ -extern double putchard(double X) { - putchar((char)X); - return 0; -} - -/* printd - printf that takes a double prints it as "%f\n", returning 0. */ -extern double printd(double X) { - printf("%f\n", X); - return 0; -} diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter7/codegen.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter7/codegen.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter7/codegen.ml +++ /dev/null @@ -1,370 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - -open Llvm - -exception Error of string - -let context = global_context () -let the_module = create_module context "my cool jit" -let builder = builder context -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -let double_type = double_type context - -(* Create an alloca instruction in the entry block of the function. This - * is used for mutable variables etc. *) -let create_entry_block_alloca the_function var_name = - let builder = builder_at context (instr_begin (entry_block the_function)) in - build_alloca double_type var_name builder - -let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - let v = try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name") - in - (* Load the value. *) - build_load v name builder - | Ast.Unary (op, operand) -> - let operand = codegen_expr operand in - let callee = "unary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown unary operator") - in - build_call callee [|operand|] "unop" builder - | Ast.Binary (op, lhs, rhs) -> - begin match op with - | '=' -> - (* Special case '=' because we don't want to emit the LHS as an - * expression. *) - let name = - match lhs with - | Ast.Variable name -> name - | _ -> raise (Error "destination of '=' must be a variable") - in - - (* Codegen the rhs. *) - let val_ = codegen_expr rhs in - - (* Lookup the name. *) - let variable = try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name") - in - ignore(build_store val_ variable builder); - val_ - | _ -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_fadd lhs_val rhs_val "addtmp" builder - | '-' -> build_fsub lhs_val rhs_val "subtmp" builder - | '*' -> build_fmul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> - (* If it wasn't a builtin binary operator, it must be a user defined - * one. Emit a call to it. *) - let callee = "binary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "binary operator not found!") - in - build_call callee [|lhs_val; rhs_val|] "binop" builder - end - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - | Ast.If (cond, then_, else_) -> - let cond = codegen_expr cond in - - (* Convert condition to a bool by comparing equal to 0.0 *) - let zero = const_float double_type 0.0 in - let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in - - (* Grab the first block so that we might later add the conditional branch - * to it at the end of the function. *) - let start_bb = insertion_block builder in - let the_function = block_parent start_bb in - - let then_bb = append_block context "then" the_function in - - (* Emit 'then' value. *) - position_at_end then_bb builder; - let then_val = codegen_expr then_ in - - (* Codegen of 'then' can change the current block, update then_bb for the - * phi. We create a new name because one is used for the phi node, and the - * other is used for the conditional branch. *) - let new_then_bb = insertion_block builder in - - (* Emit 'else' value. *) - let else_bb = append_block context "else" the_function in - position_at_end else_bb builder; - let else_val = codegen_expr else_ in - - (* Codegen of 'else' can change the current block, update else_bb for the - * phi. *) - let new_else_bb = insertion_block builder in - - (* Emit merge block. *) - let merge_bb = append_block context "ifcont" the_function in - position_at_end merge_bb builder; - let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in - let phi = build_phi incoming "iftmp" builder in - - (* Return to the start block to add the conditional branch. *) - position_at_end start_bb builder; - ignore (build_cond_br cond_val then_bb else_bb builder); - - (* Set a unconditional branch at the end of the 'then' block and the - * 'else' block to the 'merge' block. *) - position_at_end new_then_bb builder; ignore (build_br merge_bb builder); - position_at_end new_else_bb builder; ignore (build_br merge_bb builder); - - (* Finally, set the builder to the end of the merge block. *) - position_at_end merge_bb builder; - - phi - | Ast.For (var_name, start, end_, step, body) -> - (* Output this as: - * var = alloca double - * ... - * start = startexpr - * store start -> var - * goto loop - * loop: - * ... - * bodyexpr - * ... - * loopend: - * step = stepexpr - * endcond = endexpr - * - * curvar = load var - * nextvar = curvar + step - * store nextvar -> var - * br endcond, loop, endloop - * outloop: *) - - let the_function = block_parent (insertion_block builder) in - - (* Create an alloca for the variable in the entry block. *) - let alloca = create_entry_block_alloca the_function var_name in - - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in - - (* Store the value into the alloca. *) - ignore(build_store start_val alloca builder); - - (* Make the new basic block for the loop header, inserting after current - * block. *) - let loop_bb = append_block context "loop" the_function in - - (* Insert an explicit fall through from the current block to the - * loop_bb. *) - ignore (build_br loop_bb builder); - - (* Start insertion in loop_bb. *) - position_at_end loop_bb builder; - - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - Hashtbl.add named_values var_name alloca; - - (* Emit the body of the loop. This, like any other expr, can change the - * current BB. Note that we ignore the value computed by the body, but - * don't allow an error *) - ignore (codegen_expr body); - - (* Emit the step value. *) - let step_val = - match step with - | Some step -> codegen_expr step - (* If not specified, use 1.0. *) - | None -> const_float double_type 1.0 - in - - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - (* Reload, increment, and restore the alloca. This handles the case where - * the body of the loop mutates the variable. *) - let cur_var = build_load alloca var_name builder in - let next_var = build_add cur_var step_val "nextvar" builder in - ignore(build_store next_var alloca builder); - - (* Convert condition to a bool by comparing equal to 0.0. *) - let zero = const_float double_type 0.0 in - let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in - - (* Create the "after loop" block and insert it. *) - let after_bb = append_block context "afterloop" the_function in - - (* Insert the conditional branch into the end of loop_end_bb. *) - ignore (build_cond_br end_cond loop_bb after_bb builder); - - (* Any new code will be inserted in after_bb. *) - position_at_end after_bb builder; - - (* Restore the unshadowed variable. *) - begin match old_val with - | Some old_val -> Hashtbl.add named_values var_name old_val - | None -> () - end; - - (* for expr always returns 0.0. *) - const_null double_type - | Ast.Var (var_names, body) -> - let old_bindings = ref [] in - - let the_function = block_parent (insertion_block builder) in - - (* Register all variables and emit their initializer. *) - Array.iter (fun (var_name, init) -> - (* Emit the initializer before adding the variable to scope, this - * prevents the initializer from referencing the variable itself, and - * permits stuff like this: - * var a = 1 in - * var a = a in ... # refers to outer 'a'. *) - let init_val = - match init with - | Some init -> codegen_expr init - (* If not specified, use 0.0. *) - | None -> const_float double_type 0.0 - in - - let alloca = create_entry_block_alloca the_function var_name in - ignore(build_store init_val alloca builder); - - (* Remember the old variable binding so that we can restore the binding - * when we unrecurse. *) - begin - try - let old_value = Hashtbl.find named_values var_name in - old_bindings := (var_name, old_value) :: !old_bindings; - with Not_found -> () - end; - - (* Remember this binding. *) - Hashtbl.add named_values var_name alloca; - ) var_names; - - (* Codegen the body, now that all vars are in scope. *) - let body_val = codegen_expr body in - - (* Pop all our variables from scope. *) - List.iter (fun (var_name, old_value) -> - Hashtbl.add named_values var_name old_value - ) !old_bindings; - - (* Return the body computation. *) - body_val - -let codegen_proto = function - | Ast.Prototype (name, args) | Ast.BinOpPrototype (name, args, _) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - -(* Create an alloca for each argument and register the argument in the symbol - * table so that references to it will succeed. *) -let create_argument_allocas the_function proto = - let args = match proto with - | Ast.Prototype (_, args) | Ast.BinOpPrototype (_, args, _) -> args - in - Array.iteri (fun i ai -> - let var_name = args.(i) in - (* Create an alloca for this variable. *) - let alloca = create_entry_block_alloca the_function var_name in - - (* Store the initial value into the alloca. *) - ignore(build_store ai alloca builder); - - (* Add arguments to variable symbol table. *) - Hashtbl.add named_values var_name alloca; - ) (params the_function) - -let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* If this is an operator, install it. *) - begin match proto with - | Ast.BinOpPrototype (name, args, prec) -> - let op = name.[String.length name - 1] in - Hashtbl.add Parser.binop_precedence op prec; - | _ -> () - end; - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - (* Add all arguments to the symbol table and create their allocas. *) - create_argument_allocas the_function proto; - - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. *) - let _ = PassManager.run_function the_function the_fpm in - - the_function - with e -> - delete_function the_function; - raise e diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter7/lexer.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter7/lexer.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter7/lexer.ml +++ /dev/null @@ -1,60 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - | "binary" -> [< 'Token.Binary; stream >] - | "unary" -> [< 'Token.Unary; stream >] - | "var" -> [< 'Token.Var; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter7/myocamlbuild.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter7/myocamlbuild.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter7/myocamlbuild.ml +++ /dev/null @@ -1,10 +0,0 @@ -open Ocamlbuild_plugin;; - -ocaml_lib ~extern:true "llvm";; -ocaml_lib ~extern:true "llvm_analysis";; -ocaml_lib ~extern:true "llvm_executionengine";; -ocaml_lib ~extern:true "llvm_target";; -ocaml_lib ~extern:true "llvm_scalar_opts";; - -flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"; A"-cclib"; A"-rdynamic"]);; -dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter7/parser.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter7/parser.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter7/parser.ml +++ /dev/null @@ -1,221 +0,0 @@ -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr - * ::= ifexpr - * ::= forexpr - * ::= varexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) - | [< 'Token.If; c=parse_expr; - 'Token.Then ?? "expected 'then'"; t=parse_expr; - 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> - Ast.If (c, t, e) - - (* forexpr - ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) - | [< 'Token.For; - 'Token.Ident id ?? "expected identifier after for"; - 'Token.Kwd '=' ?? "expected '=' after for"; - stream >] -> - begin parser - | [< - start=parse_expr; - 'Token.Kwd ',' ?? "expected ',' after for"; - end_=parse_expr; - stream >] -> - let step = - begin parser - | [< 'Token.Kwd ','; step=parse_expr >] -> Some step - | [< >] -> None - end stream - in - begin parser - | [< 'Token.In; body=parse_expr >] -> - Ast.For (id, start, end_, step, body) - | [< >] -> - raise (Stream.Error "expected 'in' after for") - end stream - | [< >] -> - raise (Stream.Error "expected '=' after for") - end stream - - (* varexpr - * ::= 'var' identifier ('=' expression? - * (',' identifier ('=' expression)?)* 'in' expression *) - | [< 'Token.Var; - (* At least one variable name is required. *) - 'Token.Ident id ?? "expected identifier after var"; - init=parse_var_init; - var_names=parse_var_names [(id, init)]; - (* At this point, we have to have 'in'. *) - 'Token.In ?? "expected 'in' keyword after 'var'"; - body=parse_expr >] -> - Ast.Var (Array.of_list (List.rev var_names), body) - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* unary - * ::= primary - * ::= '!' unary *) -and parse_unary = parser - (* If this is a unary operator, read it. *) - | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] -> - Ast.Unary (op, operand) - - (* If the current token is not an operator, it must be a primary expr. *) - | [< stream >] -> parse_primary stream - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_unary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -and parse_var_init = parser - (* read in the optional initializer. *) - | [< 'Token.Kwd '='; e=parse_expr >] -> Some e - | [< >] -> None - -and parse_var_names accumulator = parser - | [< 'Token.Kwd ','; - 'Token.Ident id ?? "expected identifier list after var"; - init=parse_var_init; - e=parse_var_names ((id, init) :: accumulator) >] -> e - | [< >] -> accumulator - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' - * ::= binary LETTER number? (id, id) - * ::= unary LETTER number? (id) *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - let parse_operator = parser - | [< 'Token.Unary >] -> "unary", 1 - | [< 'Token.Binary >] -> "binary", 2 - in - let parse_binary_precedence = parser - | [< 'Token.Number n >] -> int_of_float n - | [< >] -> 30 - in - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - | [< (prefix, kind)=parse_operator; - 'Token.Kwd op ?? "expected an operator"; - (* Read the precedence if present. *) - binary_precedence=parse_binary_precedence; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - let name = prefix ^ (String.make 1 op) in - let args = Array.of_list (List.rev args) in - - (* Verify right number of arguments for operator. *) - if Array.length args != kind - then raise (Stream.Error "invalid number of operands for operator") - else - if kind == 1 then - Ast.Prototype (name, args) - else - Ast.BinOpPrototype (name, args, binary_precedence) - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter7/token.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter7/token.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter7/token.ml +++ /dev/null @@ -1,25 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char - - (* control *) - | If | Then | Else - | For | In - - (* operators *) - | Binary | Unary - - (* var definition *) - | Var diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter7/toplevel.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter7/toplevel.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter7/toplevel.ml +++ /dev/null @@ -1,49 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine - -(* top ::= definition | external | expression | ';' *) -let rec main_loop the_fpm the_execution_engine stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop the_fpm the_execution_engine stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func the_fpm e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function, returning a function pointer. *) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop the_fpm the_execution_engine stream diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter7/toy.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter7/toy.ml deleted file mode 100644 --- a/llvm/examples/OCaml-Kaleidoscope/Chapter7/toy.ml +++ /dev/null @@ -1,57 +0,0 @@ -(*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine -open Llvm_target -open Llvm_scalar_opts - -let main () = - ignore (initialize_native_target ()); - - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '=' 2; - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Promote allocas to registers. *) - add_memory_to_register_promotion the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combination the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. *) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). *) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop the_fpm the_execution_engine stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module -;; - -main ()