More Type Checking

CSC 310 - Programming Languages

Assignment

  • Review: what are \(\vdash\), \(O\), and \(\leq\)?

    \[ \frac{ \begin{array}{l} O(id) = T_0\\ O \vdash e_1 : T_1\\ T_1 \leq T_0 \end{array}} {O \vdash id \; \texttt{<-} \; e_1 : T_1}\text{[Assign]} \]
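As a rough illustration, the [Assign] rule can be read as a small checking function. The sketch below is only one possible encoding: the class table `PARENT`, the helper `conforms`, and the function `check_assign` are hypothetical names, and the type of \(e_1\) is assumed to have been inferred already.

```python
# Hypothetical class table: each class maps to its parent (None for the root).
PARENT = {"Object": None, "Int": "Object", "Count": "Object", "Stock": "Count"}

def conforms(sub, sup):
    """Return True if sub <= sup, found by walking up the inheritance chain."""
    while sub is not None:
        if sub == sup:
            return True
        sub = PARENT[sub]
    return False

def check_assign(O, ident, t1):
    """[Assign]: with O(id) = T0 and e1 : T1, require T1 <= T0 and return T1."""
    t0 = O[ident]                      # premise: O(id) = T0
    if not conforms(t1, t0):           # premise: T1 <= T0
        raise TypeError(f"cannot assign {t1} to {ident} : {t0}")
    return t1                          # conclusion: the assignment has type T1

# O maps object identifiers to their declared types (assumed example data).
O = {"c": "Count", "n": "Int"}
print(check_assign(O, "c", "Stock"))   # a Stock may be stored in a Count variable
```

Note that the result is the type of the right-hand side, not the declared type of the identifier, exactly as in the conclusion of the rule.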

Initialized Attributes

  • Let \(O_C(x) = T\) for all attributes \(x : T\) in class \(C\)

    • \(O_C\) represents the class-wide scope
  • Attribute initialization is similar to let, except for the scope of names

    \[ \frac{ \begin{array}{l} O_C(id) = T_0\\ O_C \vdash e_1 : T_1\\ T_1 \leq T_0 \end{array}} {O_C \vdash id : T_0 \; \texttt{<-} \; e_1;}\text{[Attr-Init]} \]

Method Dispatch

  • There is a problem with type checking method calls:

    \[ \frac{ \begin{array}{l} O \vdash e_0 : T_0\\ O \vdash e_1 : T_1\\ ...\\ O \vdash e_n : T_n\\ \end{array}} {O \vdash e_0.f(e_1, \ldots, e_n) : \; ?}\text{[Dispatch]} \]

  • We need information about the formal parameters and return type of \(f\)

Notes on Dispatch

  • In Cool, method and object identifiers live in different name spaces

    • A method foo and an object foo can coexist in the same scope
  • In the type rules, this is reflected by a separate mapping \(M\) for method signatures:

    \[M(C, f) = (T_1, \ldots, T_n, T_{ret})\]

    which means in class \(C\) there is a method \(f\) where

    \[f(x_1 : T_1, \ldots, x_n : T_n) : T_{ret}\]
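One way to picture the two name spaces is as two separate tables. The following is a minimal sketch, assuming a hypothetical `DECLARED` table of method signatures and a lookup function `M` that also finds inherited methods; the object environment `O` is an ordinary dictionary that never collides with it.

```python
# Hypothetical class table: each class maps to its parent (None for the root).
PARENT = {"Object": None, "Count": "Object", "Stock": "Count"}

# Signatures as written in each class: (T_1, ..., T_n, T_ret).
DECLARED = {
    ("Count", "inc"): ("Count",),      # inc() : Count
    ("Stock", "name"): ("String",),    # name() : String
}

def M(cls, f):
    """M(C, f): search C and then its ancestors, so inherited methods are found."""
    while cls is not None:
        if (cls, f) in DECLARED:
            return DECLARED[(cls, f)]
        cls = PARENT[cls]
    raise KeyError(f"no method {f}")

# An object identifier named inc lives in O and does not clash with method inc.
O = {"inc": "Int"}
print(O["inc"], M("Stock", "inc"))
```

Keeping `O` and `M` as separate mappings is what lets a method foo and an object foo share a scope.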

An Extended Typing Judgment

  • Now we have two environments: \(O\) and \(M\)

  • The form of the typing judgment is

    \[O, M \vdash e : T\]

    which can be read as: “with the assumption that the object identifiers have types as given by \(O\) and the method identifiers have signatures as given by \(M\), the expression \(e\) has type \(T\)”

The Method Environment

  • The method environment must be added to all rules

  • In most cases, \(M\) is passed down but not actually used

  • Only the dispatch rule uses \(M\)

The Dispatch Rule Revisited

  • Steps: check the receiver object, check the actual arguments, then look up the formal argument types \(T_i'\)

    \[ \frac{ \begin{array}{l} O, M \vdash e_0 : T_0\\ O, M \vdash e_1 : T_1\\ ...\\ O, M \vdash e_n : T_n\\ M(T_0, f) = (T_1', \ldots, T_n', T_{n+1}')\\ T_i \leq T_i' \; \text{for} \; 1 \leq i \leq n \end{array}} {O, M \vdash e_0.f(e_1, \ldots, e_n) : \; T_{n+1}'}\text{[Dispatch]} \]
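The three steps of the rule can be sketched as a function over already-inferred types. This is an illustrative encoding only: `PARENT`, `SIGS`, `lookup`, and `check_dispatch` are hypothetical names, and the method `add(n : Int) : Count` is invented for the example.

```python
# Hypothetical class table and method signatures (T_1', ..., T_n', T_ret).
PARENT = {"Object": None, "Int": "Object", "String": "Object",
          "Count": "Object", "Stock": "Count"}
SIGS = {("Count", "add"): ("Int", "Count")}   # assumed: add(n : Int) : Count

def conforms(sub, sup):
    """Return True if sub <= sup in the inheritance hierarchy."""
    while sub is not None:
        if sub == sup:
            return True
        sub = PARENT[sub]
    return False

def lookup(cls, f):
    """M(cls, f), including methods inherited from ancestors."""
    while cls is not None:
        if (cls, f) in SIGS:
            return SIGS[(cls, f)]
        cls = PARENT[cls]
    raise TypeError(f"no method {f}")

def check_dispatch(t0, f, arg_types):
    """[Dispatch]: t0 is the receiver type, arg_types the actual argument types."""
    *formals, ret = lookup(t0, f)              # M(T0, f) = (T1', ..., Tn', Tn+1')
    if len(formals) != len(arg_types):
        raise TypeError("wrong number of arguments")
    for actual, formal in zip(arg_types, formals):
        if not conforms(actual, formal):       # Ti <= Ti'
            raise TypeError(f"{actual} does not conform to {formal}")
    return ret                                 # the dispatch has type Tn+1'

print(check_dispatch("Stock", "add", ["Int"]))
```

The lookup starts at the static type of the receiver, so a call on a Stock finds the signature that Stock inherits from Count.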

Static Dispatch

  • Static dispatch is a variation of normal dispatch

  • The method is found in the class explicitly named by the programmer (not via \(e_0\))

  • The inferred type of the dispatch expression must conform to the specified type

Static Dispatch (Continued)

\[ \frac{ \begin{array}{l} O, M \vdash e_0 : T_0\\ O, M \vdash e_1 : T_1\\ ...\\ O, M \vdash e_n : T_n\\ T_0 \leq T\\ M(T, f) = (T_1', \ldots, T_n', T_{n+1}')\\ T_i \leq T_i' \; \text{for} \; 1 \leq i \leq n \end{array}} {O, M \vdash e_0@T.f(e_1, \ldots, e_n) : \; T_{n+1}'}\text{[Static Dispatch]} \]
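Compared with normal dispatch, static dispatch adds the premise \(T_0 \leq T\) and looks the method up in \(T\) rather than in \(T_0\). A minimal sketch, assuming the same hypothetical table encoding (`PARENT`, `SIGS`, `lookup`) and a made-up function name `check_static_dispatch`:

```python
# Hypothetical class table and method signatures (T_1', ..., T_n', T_ret).
PARENT = {"Object": None, "Count": "Object", "Stock": "Count"}
SIGS = {("Count", "inc"): ("Count",)}          # inc() : Count

def conforms(sub, sup):
    """Return True if sub <= sup in the inheritance hierarchy."""
    while sub is not None:
        if sub == sup:
            return True
        sub = PARENT[sub]
    return False

def lookup(cls, f):
    """M(cls, f), including methods inherited from ancestors."""
    while cls is not None:
        if (cls, f) in SIGS:
            return SIGS[(cls, f)]
        cls = PARENT[cls]
    raise TypeError(f"no method {f}")

def check_static_dispatch(t0, t, f, arg_types):
    """[Static Dispatch] for e0@T.f(...): t0 is the type of e0, t the named class."""
    if not conforms(t0, t):                    # extra premise: T0 <= T
        raise TypeError(f"{t0} does not conform to {t}")
    *formals, ret = lookup(t, f)               # signature comes from T, not T0
    if len(formals) != len(arg_types):
        raise TypeError("wrong number of arguments")
    for actual, formal in zip(arg_types, formals):
        if not conforms(actual, formal):       # Ti <= Ti'
            raise TypeError(f"{actual} does not conform to {formal}")
    return ret

print(check_static_dispatch("Stock", "Count", "inc", []))
```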

Flexibility vs. Soundness

  • Recall that type systems have two conflicting goals:

    • Give flexibility to the programmer
    • Prevent well-typed programs from “going wrong”
  • An active line of research is inventing more flexible type systems while preserving soundness

Dynamic and Static Types

  • The dynamic type of an object is the class C that is used in the new C expression that created it

    • A run-time notion
    • Even languages that are not statically typed have the notion of dynamic type
  • The static type of an expression is a notion that captures all possible dynamic types the expression could take

    • A compile-time notion

Recall: Soundness

  • Soundness theorem for the Cool type system: \[\forall E. dynamic\_type(E) \leq static\_type(E)\]

  • Why is this correct?

    • For \(E\), compiler uses \(static\_type(E)\)
    • All operations that can be used on an object of type \(C\) can also be used on an object of type \(C' \leq C\)
    • Subclasses can only add attributes or methods
    • Methods can be redefined, but only with the same signature (parameter and return types)

An Example

  • Class Count incorporates a counter; the inc method works for any subclass

    class Count {
        i : Int <- 0;
        inc () : Count {
            {
                i <- i + 1;
                self;
            }
        };
    };
  • But, there is disaster lurking in the type system

Continuing Example

  • Consider a subclass Stock of Count

    class Stock inherits Count {
        name() : String {...}; -- name of item
    };
  • And the following use of Stock

    class Main {
        a : Stock <- (new Stock).inc(); -- Type checking error
        ...  a.name() ...
    };

Post-Mortem

  • (new Stock).inc() has dynamic type Stock
  • So it is legitimate to write

    a : Stock <- (new Stock).inc()
  • But (new Stock).inc() has static type Count

  • The type checker “loses” type information

  • This makes inheriting inc useless

    • That is, we must redefine inc for each of the subclasses, with a specialized return type
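To see where the information is lost, the [Dispatch] rule can be instantiated for this call. The derivation below is a sketch that assumes the usual rule for new (so new Stock has type Stock) and that Stock inherits inc unchanged, so \(M(\texttt{Stock}, \texttt{inc}) = (\texttt{Count})\):

    \[ \frac{ \begin{array}{l} O, M \vdash \texttt{new Stock} : \texttt{Stock}\\ M(\texttt{Stock}, \texttt{inc}) = (\texttt{Count}) \end{array}} {O, M \vdash \texttt{(new Stock).inc()} : \texttt{Count}}\text{[Dispatch]} \]

Initializing a : Stock then requires \(\texttt{Count} \leq \texttt{Stock}\), which does not hold, so the program is rejected even though it would run correctly.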

SELF_TYPE to the Rescue

  • We will extend the type system

  • Insight:

    • inc returns self
    • Therefore the return value has the same type as self
    • Which could be Count or any subtype of Count
    • In the case of (new Stock).inc() the type is Stock
  • We introduce the keyword SELF_TYPE to use for the return value of such functions

    • We will also modify the typing rules to handle SELF_TYPE

SELF_TYPE to the Rescue (Continued)

  • SELF_TYPE allows the return type of inc to change when inc is inherited

  • Modify the declaration of inc to read

    inc() : SELF_TYPE { ... }
  • The type checker can now prove:

    • \(O, M \vdash\) (new Count).inc() : Count
    • \(O, M \vdash\) (new Stock).inc() : Stock
  • The program from before is now well typed

SELF_TYPE as a Tool

  • SELF_TYPE is not a dynamic type
  • SELF_TYPE is a static type
  • It helps the type checker to keep better track of types
  • It enables the type checker to accept more correct programs
  • In short, having SELF_TYPE increases the expressive power of the type system

SELF_TYPE and Dynamic Types

  • What can the dynamic type of the object returned by inc be?

  • Answer: whatever the type of self could be

  • Example: the dynamic type could be Count or any subtype of Count

    class A inherits Count { };
    class B inherits Count { };
    class C inherits Count { };

SELF_TYPE and Dynamic Types (Continued)

  • In general, if SELF_TYPE appears textually in the class \(C\) as the declared type of \(E\) then it denotes the dynamic type of the self expression:

    \[dynamic\_type(E) = dynamic\_type(\texttt{self}) \leq C\]

  • Note: the meaning of SELF_TYPE depends on where it appears

    • We write \(\texttt{SELF_TYPE}_C\) to refer to an occurrence of SELF_TYPE in the body of \(C\)

Type Checking

  • This suggests a typing rule:

    \[\texttt{SELF_TYPE}_C \leq C\]

  • This rule has an important consequence:

    • In type checking it is always safe to replace \(\texttt{SELF_TYPE}_C\) with \(C\)
  • This suggests one way to handle SELF_TYPE: replace all \(\texttt{SELF_TYPE}_C\) with \(C\)

  • This would be correct but it is like not having SELF_TYPE at all (whoops!)

Operations on SELF_TYPE

  • Recall the operations on types:

    • \(T_1 \leq T_2\): \(T_1\) is a subtype of \(T_2\)
    • \(lub(T_1, T_2)\): the least-upper bound of \(T_1\) and \(T_2\)
  • We must extend these operations to handle SELF_TYPE

Extending \(\leq\)

Let \(T\) and \(T'\) be any types except SELF_TYPE. There are four cases in the definition of \(\leq\):

  1. \(\texttt{SELF_TYPE}_C \leq T\) if \(C \leq T\)

    • \(\texttt{SELF_TYPE}_C\) can be any subtype of \(C\)
    • This includes \(C\) itself
    • Thus this is the most flexible rule we can allow
  2. \(\texttt{SELF_TYPE}_C \leq \texttt{SELF_TYPE}_C\)

    • \(\texttt{SELF_TYPE}_C\) is the type of the self expression
    • In Cool, we never need to compare SELF_TYPEs coming from different classes

Extending \(\leq\) (Continued)

  3. \(T \leq \texttt{SELF_TYPE}_C\) is always false

    • Note: \(\texttt{SELF_TYPE}_C\) can denote any subtype of \(C\), so \(T\) might not conform to the class it actually denotes
  4. \(T \leq T'\) (according to the rules from before)
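The four cases translate almost directly into code. The sketch below is one possible encoding, not a prescribed one: `SelfType` is a hypothetical wrapper recording the class \(C\) in which SELF_TYPE occurs, ordinary types are plain class names, and `PARENT` is an assumed class table.

```python
from dataclasses import dataclass

# Hypothetical class table: each class maps to its parent (None for the root).
PARENT = {"Object": None, "Count": "Object", "Stock": "Count"}

@dataclass(frozen=True)
class SelfType:
    cls: str   # the class C whose body contains this occurrence of SELF_TYPE

def class_conforms(sub, sup):
    """The original <= on ordinary class names."""
    while sub is not None:
        if sub == sup:
            return True
        sub = PARENT[sub]
    return False

def conforms(t1, t2):
    """Extended <= covering the four cases for SELF_TYPE."""
    if isinstance(t1, SelfType) and isinstance(t2, SelfType):
        # Case 2: only the same-class comparison ever arises in Cool; answering
        # False for different classes is a conservative assumption of this sketch.
        return t1 == t2
    if isinstance(t1, SelfType):
        return class_conforms(t1.cls, t2)      # Case 1: SELF_TYPE_C <= T if C <= T
    if isinstance(t2, SelfType):
        return False                           # Case 3: T <= SELF_TYPE_C never holds
    return class_conforms(t1, t2)              # Case 4: the rules from before

print(conforms(SelfType("Stock"), "Count"), conforms("Stock", SelfType("Stock")))
```

The second call shows why case 3 is needed: even Stock itself does not conform to \(\texttt{SELF_TYPE}_{Stock}\), because self might be an instance of a proper subclass of Stock.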

Based on these rules, we can extend \(lub\)

Extending \(lub(T, T')\)

Let \(T\) and \(T'\) be any types except SELF_TYPE. Again, there are four cases:

  1. \(lub(\texttt{SELF_TYPE}_C, \texttt{SELF_TYPE}_C) = \texttt{SELF_TYPE}_C\)
  2. \(lub(\texttt{SELF_TYPE}_C, T) = lub(C, T)\)
  3. \(lub(T, \texttt{SELF_TYPE}_C) = lub(T, C)\)
  4. \(lub(T, T')\) defined as before
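The extended \(lub\) follows the same pattern: apart from the case where both arguments are the same \(\texttt{SELF_TYPE}_C\), each SELF_TYPE is replaced by its class and the ordinary \(lub\) is used. A minimal sketch, assuming a hypothetical `SelfType` wrapper, an assumed class table `PARENT`, and an invented sibling class Bond:

```python
from dataclasses import dataclass

# Hypothetical class table; Bond is a made-up sibling of Stock.
PARENT = {"Object": None, "Int": "Object", "Count": "Object",
          "Stock": "Count", "Bond": "Count"}

@dataclass(frozen=True)
class SelfType:
    cls: str   # the class C whose body contains this occurrence of SELF_TYPE

def ancestors(c):
    """Return [c, parent(c), ..., Object]."""
    chain = []
    while c is not None:
        chain.append(c)
        c = PARENT[c]
    return chain

def class_lub(a, b):
    """The original lub: the closest common ancestor of two class names."""
    b_chain = set(ancestors(b))
    for c in ancestors(a):
        if c in b_chain:
            return c
    raise ValueError("no common ancestor")     # unreachable when Object is the root

def lub(t1, t2):
    """Extended lub covering the four cases for SELF_TYPE."""
    if isinstance(t1, SelfType) and isinstance(t2, SelfType) and t1 == t2:
        return t1                              # Case 1: stays SELF_TYPE_C
    a = t1.cls if isinstance(t1, SelfType) else t1   # Cases 2 and 3: use C
    b = t2.cls if isinstance(t2, SelfType) else t2
    return class_lub(a, b)                     # Case 4: defined as before

print(lub(SelfType("Stock"), "Bond"))
```

Only case 1 preserves SELF_TYPE in the result; in every mixed case the result is an ordinary class.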

Where Can SELF_TYPE Appear in Cool?

  • The parser checks that SELF_TYPE appears only where a type is expected

  • But SELF_TYPE is not allowed everywhere a type can appear:

  • class \(T\) inherits \(T'\) {...}

    • \(T, T'\) cannot be SELF_TYPE because SELF_TYPE is never a dynamic type
  • x : \(T\)

    • \(T\) can be SELF_TYPE
    • This declares an attribute whose type is \(\texttt{SELF_TYPE}_C\)

Where Can SELF_TYPE Appear in Cool? (Continued)

  • let x : \(T\) in \(E\)

    • \(T\) can be SELF_TYPE
    • x has type \(\texttt{SELF_TYPE}_C\)
  • new \(T\)

    • \(T\) can be SELF_TYPE
    • Creates an object of the same type as self
  • \(E_0\)@\(T\).m\((E_1, \ldots, E_n)\)

    • \(T\) cannot be SELF_TYPE

Typing Rules for SELF_TYPE

  • Since occurrences of SELF_TYPE depend on the enclosing class we need to carry more context during type checking

  • New form of the typing judgment:

    \[O, M, C \vdash e : T\]

    (an expression \(e\) occurring in the body of \(C\) has static type \(T\) given a variable type environment \(O\) and method signatures \(M\))

Type Checking Rules

  • The next step is to design type rules using SELF_TYPE for each language construct

  • Most of the rules remain the same except that \(\leq\) and \(lub\) are the new ones

  • Example:

    \[ \frac{ \begin{array}{l} O(id) = T_0\\ O,M,C \vdash e_1 : T_1\\ T_1 \leq T_0 \end{array}} {O,M,C \vdash id \; \texttt{<-} \; e_1 : T_1}\text{[Assign]} \]

What is Different?

  • Compare this to the old rule for dispatch

    \[ \frac{ \begin{array}{l} O, M, C \vdash e_0 : T_0\\ O, M, C \vdash e_1 : T_1\\ ...\\ O, M, C \vdash e_n : T_n\\ M(T_0, f) = (T_1', \ldots, T_n', T_{n+1}')\\ T_{n+1}' \neq \texttt{SELF_TYPE}\\ T_i \leq T_i' \; \text{for} \; 1 \leq i \leq n \end{array}} {O, M, C \vdash e_0.f(e_1, \ldots, e_n) : \; T_{n+1}'} \]

The Big Rule for SELF_TYPE

  • If the return type of the method is SELF_TYPE, then the type of the dispatch is the type \(T_0\) of the dispatch expression \(e_0\):

    \[ \frac{ \begin{array}{l} O, M, C \vdash e_0 : T_0\\ O, M, C \vdash e_1 : T_1\\ ...\\ O, M, C \vdash e_n : T_n\\ M(T_0, f) = (T_1', \ldots, T_n', \texttt{SELF_TYPE})\\ T_i \leq T_i' \; \text{for} \; 1 \leq i \leq n \end{array}} {O, M, C \vdash e_0.f(e_1, \ldots, e_n) : \; T_0} \]

What is Different?

  • Note this rule handles the Stock example
  • Formal parameters cannot be SELF_TYPE
  • Actual arguments can be SELF_TYPE

    • The extended \(\leq\) relation handles this case
  • The type \(T_0\) of the dispatch expression could be SELF_TYPE

    • Which class is used to find the declaration of \(f\)?
    • Answer: it is safe to use the class where the dispatch appears (that is, \(C\) when \(T_0 = \texttt{SELF_TYPE}_C\))
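Both dispatch rules, together with the notes above, can be sketched in one function. This is an illustrative encoding under stated assumptions: `SelfType` is a hypothetical wrapper for \(\texttt{SELF_TYPE}_C\), the string "SELF_TYPE" marks a declared return type, and `PARENT`, `SIGS`, `lookup`, and `check_dispatch` are invented names.

```python
from dataclasses import dataclass

# Hypothetical class table and signatures; inc is declared as inc() : SELF_TYPE.
PARENT = {"Object": None, "Count": "Object", "Stock": "Count"}
SIGS = {("Count", "inc"): ("SELF_TYPE",)}

@dataclass(frozen=True)
class SelfType:
    cls: str   # the class C whose body contains this occurrence of SELF_TYPE

def conforms(t1, t2):
    """Extended <= for arguments: t2 is a formal type, which is never SELF_TYPE."""
    c = t1.cls if isinstance(t1, SelfType) else t1
    while c is not None:
        if c == t2:
            return True
        c = PARENT[c]
    return False

def lookup(cls, f):
    """M(cls, f), including methods inherited from ancestors."""
    while cls is not None:
        if (cls, f) in SIGS:
            return SIGS[(cls, f)]
        cls = PARENT[cls]
    raise TypeError(f"no method {f}")

def check_dispatch(t0, f, arg_types):
    """Dispatch with SELF_TYPE: t0 is the static type of the receiver e0."""
    # If e0 has type SELF_TYPE_C, look f up in the enclosing class C.
    recv = t0.cls if isinstance(t0, SelfType) else t0
    *formals, ret = lookup(recv, f)
    if len(formals) != len(arg_types):
        raise TypeError("wrong number of arguments")
    for actual, formal in zip(arg_types, formals):
        if not conforms(actual, formal):       # Ti <= Ti'
            raise TypeError(f"{actual} does not conform to {formal}")
    # A SELF_TYPE return type means the result has the receiver's type T0.
    return t0 if ret == "SELF_TYPE" else ret

print(check_dispatch("Stock", "inc", []), check_dispatch("Count", "inc", []))
```

With inc declared this way, (new Stock).inc() is given type Stock, which is what the Stock example needs.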

Static Dispatch

  • Compare this to the old rule for static dispatch

    \[ \frac{ \begin{array}{l} O, M, C \vdash e_0 : T_0\\ O, M, C \vdash e_1 : T_1\\ ...\\ O, M, C \vdash e_n : T_n\\ T_0 \leq T\\ M(T, f) = (T_1', \ldots, T_n', T_{n+1}')\\ T_{n+1}' \neq \texttt{SELF_TYPE}\\ T_i \leq T_i' \; \text{for} \; 1 \leq i \leq n \end{array}} {O, M, C \vdash e_0@T.f(e_1, \ldots, e_n) : \; T_{n+1}'} \]

Static Dispatch (Continued)

  • If the return type of the method is SELF_TYPE, then we have:

    \[ \frac{ \begin{array}{l} O, M, C \vdash e_0 : T_0\\ O, M, C \vdash e_1 : T_1\\ ...\\ O, M, C \vdash e_n : T_n\\ T_0 \leq T\\ M(T, f) = (T_1', \ldots, T_n', \texttt{SELF_TYPE})\\ T_i \leq T_i' \; \text{for} \; 1 \leq i \leq n \end{array}} {O, M, C \vdash e_0@T.f(e_1, \ldots, e_n) : \; T_0} \]

Static Dispatch (Continued)

  • Why is this rule correct?

  • If we dispatch a method returning SELF_TYPE in class \(T\), don’t we get back a \(T\)?

  • Answer: No. SELF_TYPE is the type of the self parameter, which may be a subtype of the class in which the method body appears (not the class in which the call appears)

  • The static dispatch class cannot be SELF_TYPE

New Rules

  • There are two new rules using SELF_TYPE

    \[\frac{}{O,M,C \vdash \texttt{self} : \texttt{SELF_TYPE}_C}\]

    \[\frac{}{O,M,C \vdash \texttt{new SELF_TYPE} : \texttt{SELF_TYPE}_C}\]

  • There are a number of other places where SELF_TYPE is used

Where is SELF_TYPE Illegal in Cool?

  • In m(x : \(T\) ) : \(T'\), only \(T'\) can be SELF_TYPE

  • Example: what could go wrong if \(T\) were SELF_TYPE?

    class A { comp(x : SELF_TYPE) : Bool {...}; };
    class B inherits A {
        b(): Int {...};
        comp(y: SELF_TYPE) : Bool {... y.b() ...};
    };
    ...
    let x : A <- new B in ... x.comp(new A); ...
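The failure is easiest to see at run time. The snippet below is a loose Python analogue of the Cool example, offered only as an illustration: Python performs no static checking, so the call that a checker allowing SELF_TYPE parameters would have accepted simply fails when it runs.

```python
class A:
    def comp(self, x):
        return True

class B(A):
    def b(self):
        return 0

    def comp(self, y):
        # B's override relies on y being a B, as a SELF_TYPE parameter would promise.
        return y.b() == 0

x = B()   # in the Cool example: static type A, dynamic type B

try:
    # Against A's signature the argument new A looks fine, but dispatch
    # selects B.comp, which calls b() on an object that has no such method.
    x.comp(A())
except AttributeError as err:
    print("run-time failure:", err)
```

The static type of x is A, so the checker would compare the argument with \(\texttt{SELF_TYPE}_A\), while the method that actually runs expects a B. This is why formal parameters may not be SELF_TYPE.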

Summary of SELF_TYPE

  • The extended \(\leq\) and \(lub\) operations can do a lot of the work; implement them to handle SELF_TYPE

  • SELF_TYPE can be used only in a few places; be sure it is not used anywhere else

  • A use of SELF_TYPE always refers to any subtype of the current class

    • The exception is the type checking of dispatch
    • SELF_TYPE as the return type of an invoked method might have nothing to do with the current class

Why Cover SELF_TYPE?

  • SELF_TYPE is a research idea; it adds more expressiveness to the type system without allowing any “bad” programs

  • SELF_TYPE itself is not so important (except for the course project)

  • In practice, there should be a balance between the complexity of the type system and its expressiveness

Type Systems

  • The rules in this lecture were Cool-specific

    • Other languages have very different rules
    • We will survey a few more type systems later as time permits
  • General Themes

    • Type rules are defined on the structure of expressions
    • Types of variables are modeled by an environment
  • Type systems are a trade-off between flexibility and safety