Abstract Syntax Trees

CSC 310 - Programming Languages

Review of Parsing

  • Given a language L(G), a parser consumes a sequence of tokens \(s\) and produces a parse tree

  • Issues:

    • How do we recognize that \(s \in L(G)\)?

    • A parse tree of \(s\) describes how \(s \in L(G)\)

    • Ambiguity: more than one parse tree for some string \(s\)

    • Error: no parse tree for some string \(s\)

    • How do we construct the parse tree?

Abstract Syntax Trees

  • So far, a parser traces the derivation of a sequence of tokens

  • The rest of the compiler needs a structural representation of the program

  • Abstract syntax trees (ASTs) are like parse trees, but ignore some details

Abstract Syntax Trees

  • Consider the grammar \[E \rightarrow int \; \vert \; (E) \; \vert \; E + E\]

  • and the string \[5 + (2 + 3)\]

  • After lexical analysis (a list of tokens) \[int(5), plus, lparen, int(2), plus, int(3), rparen\]

  • During parsing, we build a parse tree \(\ldots\)
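The lexical analysis step above can be sketched as a small regular-expression tokenizer. This is only an illustration: the function name tokenize and the (kind, value) tuple encoding are assumptions, and the token names are chosen to match the token list on the slide.

```python
import re

# Token kinds mirror the slide's token list; the "skip" kind (whitespace)
# is an assumption added so that spaces in the input are ignored.
TOKEN_SPEC = [
    ("int", r"\d+"),
    ("plus", r"\+"),
    ("lparen", r"\("),
    ("rparen", r"\)"),
    ("skip", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn a string into a list of (kind, value) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        kind = match.lastgroup
        if kind == "int":
            # The lexer computes the attribute of the terminal: its integer value.
            tokens.append(("int", int(match.group())))
        elif kind != "skip":
            tokens.append((kind, None))
        pos = match.end()
    return tokens


print(tokenize("5 + (2 + 3)"))
```

Only int tokens carry a value here; the other tokens are fully described by their kind.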

Example of Parse Tree

  • Traces the operation of the parser

  • Captures the nesting structure

  • But has too much info, for example parentheses

Example of AST

  • Also captures the nesting structure

  • But abstracts from the concrete syntax, making it more compact and easier to use

  • An important data structure in a compiler

Semantic Actions

  • Each grammar symbol may have attributes

    • An attribute is a property of a programming language construct

    • For terminal symbols attributes can be calculated by the lexer

  • Each production may have an action

    • Written as: \(X \rightarrow Y_1 \ldots Y_n \{action\}\)

    • That can refer to or compute symbol attributes

  • This is what we will use to construct ASTs

Semantic Actions: Example

  • Consider the grammar \[E \rightarrow int \; \vert \; (E) \; \vert \; E + E\]

  • For each symbol \(X\) define an attribute \(X.val\)

    • For terminals, \(val\) is the associated lexeme

    • For non-terminals, \(val\) is the expression’s value

  • We annotate the grammar with actions: \[\begin{aligned} E & \rightarrow int &\{&E.val = int.val\}\\ & \quad \vert \; (E_1) &\{&E.val = E_1.val\}\\ & \quad \vert \; E_1 + E_2 &\{&E.val = E_1.val + E_2.val\}\\ \end{aligned}\]

Semantic Actions: Example Continued

  • String: \(5 + (2 + 3)\)

  • Tokens: int(5), plus, lparen, int(2), plus, int(3), rparen

    Production                        Equation
    \(E \rightarrow E_1 + E_2\)       \(E.val = E_1.val + E_2.val\)
    \(E_1 \rightarrow int(5)\)        \(E_1.val = int(5).val = 5\)
    \(E_2 \rightarrow (E_3)\)         \(E_2.val = E_3.val\)
    \(E_3 \rightarrow E_4 + E_5\)     \(E_3.val = E_4.val + E_5.val\)
    \(E_4 \rightarrow int(2)\)        \(E_4.val = int(2).val = 2\)
    \(E_5 \rightarrow int(3)\)        \(E_5.val = int(3).val = 3\)
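The equations above can be evaluated by a recursive function with one case per production. This is a sketch under an assumed encoding: parse-tree nodes are written as tagged tuples, and the function name val is simply borrowed from the attribute name.

```python
# Assumed encoding of parse-tree nodes, one tag per production:
#   ("int", n)        for E -> int
#   ("paren", e)      for E -> (E1)
#   ("plus", e1, e2)  for E -> E1 + E2

def val(node):
    """Compute the synthesized attribute val of an E node."""
    kind = node[0]
    if kind == "int":
        return node[1]                       # E.val = int.val
    if kind == "paren":
        return val(node[1])                  # E.val = E1.val
    if kind == "plus":
        return val(node[1]) + val(node[2])   # E.val = E1.val + E2.val
    raise ValueError(f"unknown node kind {kind!r}")


# Parse tree for 5 + (2 + 3)
tree = ("plus", ("int", 5), ("paren", ("plus", ("int", 2), ("int", 3))))
print(val(tree))  # prints 10
```

The recursion visits children before combining their values, which matches solving the equations from the leaves upward.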

Semantic Actions: Dependencies

  • Semantic actions specify a system of equations, but the order of executing the actions is not specified

  • Example: \[E_3.val = E_4.val + E_5.val\]

    • Must compute \(E_4.val\) and \(E_5.val\) before \(E_3.val\)

    • We say that \(E_3.val\) depends on \(E_4.val\) and \(E_5.val\)

  • The parser must find the order of evaluation

Evaluating Attributes

  • An attribute must be computed after all the attributes it depends on (its successors in the dependency graph) have been computed

  • Such an order exists when the dependency graph has no cycles

  • In the previous example, attributes can be computed bottom-up
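One way to find a valid evaluation order is a topological sort of the dependency graph. The sketch below uses Python's standard graphlib module (Python 3.9 or later) on the attributes of the 5 + (2 + 3) example; the dictionaries deps and rules are hypothetical names introduced for illustration.

```python
from graphlib import TopologicalSorter

# Each attribute maps to the set of attributes it depends on.
deps = {
    "E.val": {"E1.val", "E2.val"},
    "E1.val": set(),
    "E2.val": {"E3.val"},
    "E3.val": {"E4.val", "E5.val"},
    "E4.val": set(),
    "E5.val": set(),
}

# One equation per attribute, reading already-computed values from v.
rules = {
    "E.val": lambda v: v["E1.val"] + v["E2.val"],
    "E1.val": lambda v: 5,
    "E2.val": lambda v: v["E3.val"],
    "E3.val": lambda v: v["E4.val"] + v["E5.val"],
    "E4.val": lambda v: 2,
    "E5.val": lambda v: 3,
}

# static_order() yields every attribute after the attributes it depends on,
# and raises graphlib.CycleError if the graph contains a cycle.
order = list(TopologicalSorter(deps).static_order())

values = {}
for attr in order:
    values[attr] = rules[attr](values)

print(values["E.val"])  # prints 10
```

Several orders are valid for this graph, so the exact sequence in order may vary; any of them yields the same attribute values.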

Types of Attributes

  • Synthesized attributes

    • Calculated from attributes of descendants in the parse tree

    • \(E.val\) is a synthesized attribute

    • Can always be calculated in a bottom-up order

  • Grammars with only synthesized attributes are called S-attributed grammars

  • Inherited attributes

    • Calculated from attributes of the parent node(s) and/or siblings in the parse tree

Example: Line Calculator

  • Each line contains an expression \[E \rightarrow int \; \vert \; E + E\]

  • Each line is terminated with the \(=\) sign \[L \rightarrow E = \; \vert \; + E =\]

  • In the second form, the value of the previous line is used as the starting value

  • A program is a sequence of lines \[P \rightarrow \epsilon \; \vert \; P L\]

Attributes for the Line Calculator

  • Each \(E\) has a synthesized attribute \(val\)

  • Each \(L\) has a synthesized attribute \(val\) \[\begin{aligned} L & \rightarrow E = &\{&L.val = E.val\}\\ & \quad \vert \; + E = &\{&L.val = E.val + L.prev\}\\ \end{aligned}\]

  • We need the value of the previous line

  • We use an inherited attribute \(L.prev\)

Attributes for the Line Calculator

  • Each \(P\) has a synthesized attribute \(val\) \[\begin{aligned} P & \rightarrow \epsilon &\{&P.val = 0\}\\ & \quad \vert \; P_1 L &\{&P.val = L.val;\\ & & &L.prev = P_1.val \}\\ \end{aligned}\]

  • Each \(L\) has an inherited attribute \(prev\)

    • \(L.prev\) is inherited from sibling \(P_1.val\)
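The flow of the synthesized and inherited attributes can be sketched as follows. To keep the example short it works directly on strings rather than on tokens; that shortcut, and the function names eval_expr, eval_line, and eval_program, are assumptions made for illustration.

```python
def eval_expr(text):
    """E -> int | E + E: E.val is the sum of the integers in the expression."""
    return sum(int(part) for part in text.split("+"))


def eval_line(text, prev):
    """L -> E = | + E =, with inherited attribute prev; returns L.val."""
    body = text.strip()
    if not body.endswith("="):
        raise SyntaxError("each line must end with '='")
    body = body[:-1].strip()
    if body.startswith("+"):
        return eval_expr(body[1:]) + prev   # L.val = E.val + L.prev
    return eval_expr(body)                  # L.val = E.val


def eval_program(lines):
    """P -> epsilon | P1 L; returns P.val."""
    val = 0                                 # P -> epsilon: P.val = 0
    for line in lines:
        # L.prev = P1.val, then P.val = L.val
        val = eval_line(line, prev=val)
    return val


print(eval_program(["1 + 2 =", "+ 4 =", "10 ="]))  # prints 10
print(eval_program(["1 + 2 =", "+ 4 ="]))          # prints 7
```

The prev argument plays the role of the inherited attribute: it is passed down into each line, while each function's return value plays the role of the synthesized attribute val.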

Semantic Actions: Notes

  • Semantic actions can be used to build ASTs

  • And many other things, such as type checking and code generation

  • This process is called syntax-directed translation, a substantial generalization of context-free grammars

Constructing an AST

  • We first define the AST data type

  • Consider an abstract tree type with two constructors:

    • mkleaf(n)

    • mkplus(left_tree, right_tree)
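A minimal sketch of such a tree type, using Python dataclasses. The class names Leaf and Plus and their field names are assumptions; only mkleaf and mkplus come from the slide.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    """An integer literal."""
    value: int


@dataclass(frozen=True)
class Plus:
    """An addition with two subtrees."""
    left: "AST"
    right: "AST"


AST = Union[Leaf, Plus]


def mkleaf(n: int) -> AST:
    return Leaf(n)


def mkplus(left_tree: AST, right_tree: AST) -> AST:
    return Plus(left_tree, right_tree)


tree = mkplus(mkleaf(5), mkplus(mkleaf(2), mkleaf(3)))
print(tree)
```

Frozen dataclasses give immutable nodes with structural equality, which makes trees easy to compare in tests.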

Constructing an AST

  • We define a synthesized attribute \(ast\)

    • Values of \(ast\) are ASTs

    • We assume that \(int.lexval\) is the value of the integer lexeme

    • Computed using semantic actions

    \[\begin{aligned} E & \rightarrow int &\{&E.ast = mkleaf(int.lexval)\}\\ & \quad \vert \; (E_1) &\{&E.ast = E_1.ast\}\\ & \quad \vert \; E_1 + E_2 &\{&E.ast = mkplus(E_1.ast, E_2.ast)\}\\ \end{aligned}\]

AST Example

  • Consider the string: \(5 + (2 + 3)\)

  • A bottom-up evaluation of the \(ast\) attribute: \[\begin{aligned} E.ast = mkplus(&mkleaf(5),\\ &mkplus(mkleaf(2), mkleaf(3)))\\ \end{aligned}\]
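The construction can be sketched as a small parser whose actions call mkleaf and mkplus. Two assumptions are made here. First, the slide's grammar is ambiguous and left-recursive, so this recursive-descent sketch parses an equivalent rewritten form, E -> T ('+' T)* with T -> int | '(' E ')', which treats + as left-associative. Second, tokens are (kind, value) tuples and the AST classes Leaf and Plus are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    value: int


@dataclass(frozen=True)
class Plus:
    left: object
    right: object


def mkleaf(n):
    return Leaf(n)


def mkplus(left_tree, right_tree):
    return Plus(left_tree, right_tree)


def parse(tokens):
    """Parse a token list and return the ast attribute of the whole expression."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()}")
        token = tokens[pos]
        pos += 1
        return token

    def term():
        if peek() == "int":
            return mkleaf(expect("int")[1])   # action: ast = mkleaf(value of int)
        expect("lparen")
        inner = expr()
        expect("rparen")
        return inner                          # action: ast = E1.ast

    def expr():
        ast = term()
        while peek() == "plus":
            expect("plus")
            ast = mkplus(ast, term())         # action: ast = mkplus(E1.ast, E2.ast)
        return ast

    ast = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return ast


tokens = [("int", 5), ("plus", None), ("lparen", None),
          ("int", 2), ("plus", None), ("int", 3), ("rparen", None)]
print(parse(tokens))
```

For this token list the result has the same shape as the mkplus(mkleaf(5), mkplus(mkleaf(2), mkleaf(3))) value shown above, with no node for the parentheses.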

Review of Abstract Syntax Trees

  • We can specify language syntax using a context-free grammar

  • A parser will answer whether \(s \in L(G)\)

  • \(\ldots\) and will build a parse tree

  • \(\ldots\) which we convert to an AST

  • \(\ldots\) and pass on to the next phase