Scoping and Type Checking

CSC 310 - Programming Languages

Outline

  • The role of semantic analysis in a compiler

  • Scope

    • static vs. dynamic scoping

    • implementation: symbol tables

  • Types

    • static analyses that detect type errors

    • statically vs. dynamically typed languages

The Compiler Front-End

  • Lexical analysis: the program is lexically well-formed

    • tokens are legal

    • detects inputs with illegal tokens

  • Parsing

    • declarations have correct structure, expressions are syntactically valid, etc.

    • detects inputs with ill-formed syntax

  • Semantic analysis

    • last "front end" compilation phase

    • catches the remaining errors that can be detected at compile time

Beyond Syntax Errors

  • Example C program semantic errors:
foo(int a, char *s){...}

int bar() {
  int f[3];
  int i, j, k;
  char q, *p;
  float k;
  foo(f[6], 10, j);
  break;
  i->val = 42;
  j = m + k;
  printf("%s,%s.\n",p,q);
  goto label42;
}

Beyond Syntax Errors (continued)

  • Example C program semantic errors:

    • Undeclared identifier

    • Multiple declarations of identifier

    • Index out of bounds

    • Incorrect number or types of arguments to function call

    • Incompatible types for operation

    • A break statement outside of a loop

    • A goto whose target label is never defined

Why Have a Separate Semantic Analysis Phase?

  • Parsing cannot catch some errors

  • Some language constructs are not context-free

    • Example: All used variables must have been declared (that is, scoping)

    • Example: A method must be invoked with arguments of proper type (that is, typing)

What Does Semantic Analysis Do?

  • Performs many kinds of checks that go beyond syntax

  • Examples for Cool:

    • All used identifiers are declared
    • Static types
    • Inheritance relationships
    • Classes defined only once
    • Methods in a class defined only once
    • Reserved identifiers are not misused
  • The requirements depend on the language

Scope

  • The scope of an identifier (a binding of a name to the entity it names) is the textual part of the program in which the binding is active

  • Scoping rules match identifier declarations with uses; this matching is an important static analysis step in most languages

  • The scope of an identifier is the portion of a program in which that identifier is accessible

  • The same identifier may refer to different things in different parts of the program

  • An identifier may have restricted scope

Static vs. Dynamic Scope

  • Most languages have static (lexical) scope

    • Scope depends only on the physical structure of program text, not its run-time behavior

    • The determination of scope is made by the compiler

  • A few languages are dynamically scoped

    • Scope depends on execution of the program

Static Scoping Example

  • Uses of x refer to the closest enclosing definition of x

    let integer x := 0 in
    {
      x;
      let integer x := 1 in
        x;
      x;
    }

Static vs. Dynamic Scope

  • Example

    program scopes(input, output);
    var a: integer;
    procedure first;
      begin a := 1; end;
    procedure second;
      var a: integer;
      begin first; end;
    begin
      a := 2; second; write(a);
    end.
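  • With static scope, the result is 1 (first assigns to the global a)

  • With dynamic scope, the result is 2 (first assigns to the a declared in second, so the global a is unchanged)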
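
The difference can be traced with a small simulation. The following is a minimal Python sketch, not a real interpreter: it models the Pascal program above with one dictionary per activation, and the helper names (run, assign, frames) are invented for illustration. Under static scope the assignment in first goes to the frame fixed by the program text; under dynamic scope it goes to the most recent frame on the call stack that declares a.

```python
# Simulates the Pascal program `scopes` under both scoping disciplines.
# All names here (run, assign, frames) are hypothetical helpers.

def run(dynamic: bool) -> int:
    global_frame = {"a": None}      # var a: integer (program level)
    frames = [global_frame]         # call stack, innermost frame last

    def assign(name, value, defining_frame):
        if dynamic:
            # Dynamic scope: search the call stack from the newest frame.
            for frame in reversed(frames):
                if name in frame:
                    frame[name] = value
                    return
            raise NameError(name)
        # Static scope: the target frame is fixed by the program text.
        defining_frame[name] = value

    def first():
        frames.append({})           # first declares no locals
        # Textually, the only `a` visible inside first is the global one.
        assign("a", 1, global_frame)
        frames.pop()

    def second():
        frames.append({"a": None})  # var a: integer (local to second)
        first()
        frames.pop()

    assign("a", 2, global_frame)    # a := 2
    second()
    return global_frame["a"]        # write(a)

print("static scope: ", run(dynamic=False))
print("dynamic scope:", run(dynamic=True))
```

Running the sketch shows that the two disciplines disagree only because second declares its own a before calling first.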

Scope in Cool

  • Cool identifier bindings are introduced by:

    • Class declarations (introduce class names)
    • Method definitions (introduce method names)
    • Let expressions (introduce object identifiers)
    • Formal parameters (introduce object identifiers)
    • Attribute definitions in a class (introduce object identifiers)
    • Case expressions (introduce object identifiers)

Scope of Identifiers

  • In most programming languages identifier bindings are introduced by

    • Function declarations (introduce function names)

    • Procedure definitions (introduce procedure names)

    • Identifier declarations (introduce identifiers)

    • Formal parameters (introduce identifiers)

Implementing the Most Closely Nested Rule

  • Much of semantic analysis can be expressed as a recursive descent of an AST

    • Process an AST node \(n\)

    • Process the children of \(n\)

    • Finish processing node \(n\)

  • When performing semantic analysis on a portion of the AST, we need to know which identifiers are defined.
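
The three steps above can be sketched as a recursive function. This is a minimal Python illustration under assumed names: the Node class and the analyze function are hypothetical, and the "processing" is just logging, where a real analysis would open a scope on entry and close it on exit.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                                   # e.g. "let", "plus", "id"
    children: list = field(default_factory=list)

def analyze(node, depth=0, log=None):
    """Visit the AST, recording when each node is entered and left."""
    if log is None:
        log = []
    log.append("  " * depth + "enter " + node.kind)  # start processing n
    for child in node.children:                        # process children of n
        analyze(child, depth + 1, log)
    log.append("  " * depth + "leave " + node.kind)  # finish processing n
    return log

# A tiny tree shaped like: let x <- 0 in x + x
tree = Node("let", [Node("int_const"),
                    Node("plus", [Node("id"), Node("id")])])
trace = analyze(tree)
print("\n".join(trace))
```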

Implementing the Most Closely Nested Rule

  • Example: the scope of variable declarations is one subtree

    let x : Int <- 0 in E
  • x can be used in subtree E

Symbol Tables

  • Purpose: to hold information about identifiers that is computed at some point and looked up at later times during compilation

  • Example information:

    • type of a variable

    • entry point for a function

  • Operations: insert, lookup, delete

  • Common implementations: linked lists, hash tables

Symbol Tables

  • Assuming static scope, consider again

    let x : Int <- 1 in E
  • Idea:

    • before processing E, add a definition of x to the current definitions, overriding any other definition of x

    • after processing E, remove the definition of x and, if needed, restore old definition of x

  • A symbol table is a data structure that tracks the current bindings of identifiers
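
One common way to get the "override, then restore" behavior is a stack of scopes. The following Python sketch is one possible design under that assumption, not the implementation used by any particular compiler; the class and method names (SymbolTable, enter_scope, exit_scope, add, lookup) are chosen for illustration.

```python
class SymbolTable:
    """A stack of scopes; each scope maps identifier names to information."""

    def __init__(self):
        self.scopes = [{}]               # innermost scope is last

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()                # discards the scope's bindings

    def add(self, name, info):
        self.scopes[-1][name] = info     # hides any outer binding of name

    def lookup(self, name):
        # Most closely nested rule: search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None                      # name is not defined here

# Mirror two nested lets that both bind x.
table = SymbolTable()
seen = []
table.enter_scope()
table.add("x", "outer x : Int")
seen.append(table.lookup("x"))           # outer binding
table.enter_scope()
table.add("x", "inner x : Int")
seen.append(table.lookup("x"))           # inner binding hides the outer one
table.exit_scope()
seen.append(table.lookup("x"))           # outer binding is visible again
table.exit_scope()
seen.append(table.lookup("x"))           # no binding remains
print(seen)
```

With a scope stack, "restore the old definition" needs no extra bookkeeping: popping the inner scope exposes the outer binding again.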

Scope in Cool

  • Not all kinds of identifiers follow the most-closely nested rule

  • For example, class definitions in Cool

    • Cannot be nested
    • Are globally visible throughout the program
  • In other words, a class name can be used before it is defined

Scope in Cool (Continued)

  • Attribute names are global within the class in which they are defined

    Class Foo {
        f(): Int { tm };
        tm : Int <- 0;
    };

Scope in Cool (Continued)

  • Method and attribute names have complex rules

  • A method need not be defined in the class in which it is used; it may instead be defined in some parent class (this is standard inheritance)

  • Methods may also be redefined (overridden)

Class Definitions

  • Class names can be used before being defined

  • We cannot check this property

    • using a symbol table
    • or even in one pass
  • Solution

    • Pass 1: Collect all class names
    • Pass 2: Do the checking
  • Semantic analysis requires multiple passes (probably more than two)
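
The two-pass idea can be sketched as follows. This is a simplified Python illustration: the program is assumed to be a list of (class name, parent name) pairs, the function name check_classes is hypothetical, and only Object is treated as predefined.

```python
# Hypothetical program representation: (class name, parent class name).
classes = [
    ("Main", "Base"),      # Base is used here before it is defined
    ("Base", "Object"),
    ("Widget", "Gadget"),  # Gadget is never defined anywhere
]

def check_classes(classes, builtin=("Object",)):
    errors = []

    # Pass 1: collect every class name, reporting duplicate definitions.
    defined = set(builtin)
    for name, _parent in classes:
        if name in defined:
            errors.append(f"class {name} is defined more than once")
        defined.add(name)

    # Pass 2: every parent must now be known, regardless of textual order.
    for name, parent in classes:
        if parent not in defined:
            errors.append(
                f"class {name} inherits from undefined class {parent}")
    return errors

errors = check_classes(classes)
print(errors)
```

Main's use of Base is accepted even though Base appears later in the list, which a single left-to-right pass could not decide.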

Types

  • What is a type?

    • This is the subject of some debate

    • The notion varies from language to language

  • Consensus

    • A type is a set of values and

    • A set of operations on those values

  • Classes are one instantiation of the modern notion of type

Types and Operations

  • Consider the assembly language fragment

    add $r1, $r2, $r3

    What are the types of $r1, $r2, and $r3?

  • Certain operations are legal for values of each type

    • It does not make sense to add a function pointer and an integer in C

    • It does make sense to add two integers

    • But both have the same assembly language implementation
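
The point that the machine does not record what a value means can be illustrated outside of assembly. The following Python sketch uses the standard struct module to read the same four bytes first as an unsigned integer and then as a 32-bit float; the specific bit pattern is an arbitrary choice for the example.

```python
import struct

bits = struct.pack("<I", 0x3F800000)      # four raw bytes, little-endian
as_int = struct.unpack("<I", bits)[0]     # interpret them as an unsigned int
as_float = struct.unpack("<f", bits)[0]   # interpret the same bytes as a float

print(as_int)     # the integer reading
print(as_float)   # the floating-point reading
```

Nothing in the bytes themselves says which reading is intended; that information lives only in the type.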

Type Systems

  • A language’s type system specifies which operations are valid for which types

  • The goal of type checking is to ensure that operations are used with the correct types

    • Enforces intended interpretation of values, because nothing else will
  • Type systems provide a concise formalization of the semantic checking rules

What Can Types do For Us?

  • Allow for a more efficient compilation of programs

    • Allocate the correct amount of space for variables

    • Select the correct machine instructions

  • Statically detect certain kinds of errors

    • Memory errors (reading from an invalid pointer, etc.)

    • Violation of abstraction boundaries

    • Security and access rights violations

Type Checking Overview

  • Three kinds of languages

    • Statically typed: all or almost all checking of types is done as part of compilation

    • Dynamically typed: almost all checking of types is done as part of program execution

    • Untyped: no checking (machine code)
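
Python itself is dynamically typed, so it can illustrate the second category. In this small sketch (the function names add and try_add are made up for the example), the ill-typed call is accepted when the program is loaded and rejected only when the addition actually executes.

```python
def add(a, b):
    return a + b          # operand types are checked when this line runs

def try_add(a, b):
    try:
        return add(a, b)
    except TypeError as exc:
        return f"TypeError: {exc}"

ok = try_add(1, 2)        # both operands are integers
bad = try_add(1, "two")   # the type error surfaces only at run time
print(ok)
print(bad)
```

A statically typed language would reject the second call during compilation, before any code runs.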

The Type Wars

  • Competing views on static vs. dynamic typing

  • Static typing proponents say:

    • Static checking catches many programming errors at compile time

    • Avoids overhead of runtime type checks

  • Dynamic typing proponents say:

    • Static type systems are restrictive

    • Rapid prototyping is easier in a dynamic type system

Cool Types

  • The types are:

    • Class names
    • SELF_TYPE
  • There are no unboxed base types

  • The user declares types for all identifiers

  • The compiler infers types for expressions

Type Checking and Type Inference

  • Type checking is the process of verifying fully typed programs

  • Type inference is the process of filling in missing type information

  • The two are different, but the terms are often used interchangeably

Rules of Inference

  • We have seen two examples of formal notation for specifying parts of a compiler

    • Regular expressions (for the lexer)

    • Context-free grammars (for the parser)

  • The appropriate formalism for type checking is logical rules of inference

Why Rules of Inference?

  • Inference rules have the form: If Hypothesis is true, then Conclusion is true

  • Type checking computes via reasoning: If \(E_1\) and \(E_2\) have certain types, then \(E_3\) has a certain type

  • Rules of inference are a compact notation for “If-Then” statements

From English to an Inference Rule

  • The notation is easy to read (with practice)

  • Start with a simplified system and gradually add features

  • Building blocks:

    • Symbol \(\land\) is “and”

    • Symbol \(\Rightarrow\) is “if-then”

    • \(x:T\) is “\(x\) has type \(T\)”

  • Example:

    • If \(e_1\) has type \(int\) and \(e_2\) has type \(int\), then \(e_1 + e_2\) has type \(int\)

    • \((e_1 \text{ has type } int \land e_2 \text{ has type } int) \Rightarrow e_1 + e_2 \text{ has type } int\)

    • \((e_1:int \land e_2:int) \Rightarrow e_1 + e_2 : int\)

  • The statement \((e_1:int \land e_2:int) \Rightarrow e_1 + e_2 : int\) is a special case of \(H_1 \land \ldots \land H_n \Rightarrow C\); this is an inference rule

Notation for Inference Rules

  • By tradition, inference rules are written \[\frac{\vdash Hypothesis_1 \ldots \vdash Hypothesis_n}{\vdash Conclusion}\]

  • Type rules have hypotheses and conclusions of the form: \[\vdash e : T\]

  • \(\vdash\) means “it is provable that …”

Example Rules

  • Example \[\frac{i \text{ is an integer}}{\vdash i : Int}\text{[Int]}\]

    \[\frac{ \begin{array}{l} \vdash e_1 : Int\\ \vdash e_2 : Int \end{array}} {\vdash e_1 + e_2 : Int}\text{[Add]}\]

  • These rules give templates describing how to type integers and \(+\) expressions

  • By filling in the templates, we can produce complete typings for expressions

Example: 1 + 2

\[\frac{ \begin{array}{l} \vdash 1 : Int\\ \vdash 2 : Int \end{array}} {\vdash 1 + 2 : Int}\text{[Add]}\]
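
The [Int] and [Add] rules translate almost directly into a recursive function: each rule becomes one case, and each hypothesis becomes a recursive call. The following Python sketch makes that reading concrete. The AST classes and the function type_of are hypothetical, and BoolLit is an assumption added only so that there is an expression for which the [Add] hypotheses fail.

```python
from dataclasses import dataclass

@dataclass
class IntLit:
    value: int

@dataclass
class BoolLit:          # assumption: not in the rules above, used to show a failure
    value: bool

@dataclass
class Add:
    left: object
    right: object

class TypeCheckError(Exception):
    pass

def type_of(e):
    if isinstance(e, IntLit):       # [Int]: i is an integer, so i : Int
        return "Int"
    if isinstance(e, BoolLit):      # assumed rule for Boolean literals
        return "Bool"
    if isinstance(e, Add):          # [Add]: needs e1 : Int and e2 : Int
        t1 = type_of(e.left)        # hypothesis 1
        t2 = type_of(e.right)       # hypothesis 2
        if t1 == "Int" and t2 == "Int":
            return "Int"            # conclusion: e1 + e2 : Int
        raise TypeCheckError(f"cannot add {t1} and {t2}")
    raise TypeCheckError(f"no rule applies to {e!r}")

print(type_of(Add(IntLit(1), IntLit(2))))   # the derivation for 1 + 2
```

If no rule's hypotheses can be satisfied, as for Add(IntLit(1), BoolLit(True)), the function raises TypeCheckError, which corresponds to the expression having no typing derivation.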

Summary

  • Scoping rules match identifier uses with identifier definitions

  • A type is a set of values coupled with a set of operations on those values

  • A type system specifies which operations are valid for which types

  • Type checking can be done statically (at compile time) or dynamically (at run time)