The Semantic Analyzer Checkpoint

Due: 11:00 pm, Monday April 5, 2021

Max grace days: 2

Overview

For this assignment you will write a partial semantic analyzer. Among other things, this involves traversing the abstract syntax tree and the class hierarchy. You will reject all Cool programs that do not comply with the Cool type system.

Specification

You must create two artifacts:

A program that takes a single command-line argument (e.g., file.cl-ast). That argument will be an ASCII text Cool abstract syntax tree file. Your program must either indicate that there is an error in the input or emit file.cl-type, a class map. Your program will consist of a number of OCaml files. The starter code contains a file named semantic_analysis.ml; this is the file that you need to edit.
A plain ASCII text file called README describing your design decisions. See the grading rubric. A few paragraphs should suffice.
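
As a minimal sketch of the naming convention above, the output file name can be derived from the command-line argument. The function name `type_file_of_ast_file` is hypothetical and not part of the starter code; the fallback for an argument without the `.cl-ast` suffix is likewise an assumption.

```ocaml
(* Hypothetical helper (not from the starter code):
   map "file.cl-ast" to "file.cl-type". *)
let type_file_of_ast_file (ast_file : string) : string =
  (* Only chop the suffix when it is actually present. *)
  if Filename.check_suffix ast_file ".cl-ast"
  then Filename.chop_suffix ast_file ".cl-ast" ^ ".cl-type"
  else ast_file ^ ".cl-type"  (* assumption: append when suffix is missing *)
```

In a driver, this would typically be applied to `Sys.argv.(1)` before opening the output channel.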

Considerations:

Line numbers: The typing rules do not directly specify the line numbers on which errors are to be reported. As of v1.11, the Cool reference compiler uses these guidelines (possibly surprising ones are italicized):
- Errors related to parameter-less method main in class Main: always line 0
- Inheritance cycle: always line 0
- Other inheritance type problem: inherited type identifier location
- self or SELF_TYPE used in wrong place: self (resp. SELF_TYPE) identifier (resp. type) location
- Redefining a feature: (second) feature location
- Redefining a formal or class: (second) identifier location
- Other attribute problems: attribute location
- Redefining a method and changing types: (second) type location
- Other problems with redefining a method: method location
- Method body type does not conform: method name identifier location
- Attribute initializer does not conform: attribute name identifier location
- Errors with types of arguments to relational/arithmetic operations: location of relational/arithmetic operation expression
- Errors with types of while / if subexpression(s): location of (enclosing) while or if expression (not the location of the conditional expression)
- Errors with case expression (e.g., lub): location of case expression
- Errors with conformance in let: location of let expression (not location of initializer)
- Errors in blocks: location of (beginning of) block expression
- Errors in actual arguments: location of method invocation expression (not the location of any particular actual argument)
- Assignment does not conform: assignment expression location (not right-hand-side location)
- Unknown identifier: location of identifier
- Unknown method: location of method name identifier
- Unknown type: location of type
Error reporting: To report an error, write the string
```
ERROR: line_number: Type-Check: message
```
to standard output and terminate the program. You may write whatever you want in the message, but it should be fairly indicative. Example erroneous input:
```
class Main inherits IO {
 main() : Object {
   out_string("Hello, world.\n" + 16777216) -- adding string + int !?
 } ;
} ;
```
Example error report output:
```
ERROR: 3: Type-Check: arithmetic on String Int instead of Ints
```

Remember that you do not have to match the English prose of the reference compiler's error messages at all. You just have to get the line number right.
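
The report format could be wrapped in a small helper, sketched below. The names `error_string` and `report_error` are hypothetical, and the exit status of 1 is an assumption; the text above only requires writing to standard output and terminating.

```ocaml
(* Build the report in the required shape:
   ERROR: line_number: Type-Check: message *)
let error_string (line : int) (msg : string) : string =
  Printf.sprintf "ERROR: %d: Type-Check: %s" line msg

(* Print the report to standard output and stop the program.
   Assumption: exit status 1; the handout does not specify one. *)
let report_error (line : int) (msg : string) : 'a =
  print_endline (error_string line msg);
  exit 1
```

Because `report_error` never returns, it can be used in any expression position, for example in one arm of a `match`.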

Semantic checks are unordered — if a program contains two or more errors, you may indicate whichever you like. You can infer from this that all of our test cases will contain at most one error.

The `.cl-type` File Format

If there are no errors in file.cl-ast your program should create file.cl-type and serialize the class map to it.

The class map is described in the Cool Reference Manual.

A .cl-type file consists of one section for this assignment:

The class map.

We will now describe exactly what to output for the class map. The general idea and notation (one string per line, recursive descent) are the same as in the previous assignment.

The Class Map

Output class_map \n.
Output the number of classes and then \n.
Output each class in turn (in ascending alphabetical order):
- Output the name of the class and then \n.
- Output the number of attributes and then \n.
- Output each attribute in turn (in order of appearance, with inherited attributes from a superclass coming first):
  - Output no_initializer \n and then the attribute name \n and then the type name \n.
  - or Output initializer \n and then the attribute name \n and then the type name \n and then the initializer expression.
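
The steps above can be sketched roughly as follows. The `attribute` record and the `class_map` type are hypothetical stand-ins; the types in the starter code's ast.ml may look different. The sketch also assumes the initializer expression has already been serialized to a string ending in a newline.

```ocaml
(* Hypothetical attribute record, for illustration only. *)
type attribute = {
  attr_name : string;
  attr_type : string;
  init : string option;  (* pre-serialized initializer expression, if any *)
}

(* Each class name paired with its attributes, inherited ones first. *)
type class_map = (string * attribute list) list

let serialize_class_map (cm : class_map) : string =
  let buf = Buffer.create 256 in
  let line s = Buffer.add_string buf s; Buffer.add_char buf '\n' in
  line "class_map";
  line (string_of_int (List.length cm));
  (* Byte-wise string comparison places IO before Int. *)
  let sorted = List.sort (fun (a, _) (b, _) -> compare a b) cm in
  List.iter
    (fun (cname, attrs) ->
      line cname;
      line (string_of_int (List.length attrs));
      List.iter
        (fun a ->
          match a.init with
          | None ->
            line "no_initializer"; line a.attr_name; line a.attr_type
          | Some e ->
            line "initializer"; line a.attr_name; line a.attr_type;
            Buffer.add_string buf e)
        attrs)
    sorted;
  Buffer.contents buf
```

Building the whole output in a `Buffer` before writing means no partial .cl-type file is left behind if an error is detected late.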

Detailed `.cl-type` Example

Now that we've formally defined the output specification, we can present a worked example. Here's the example input we will consider:

class Main inherits IO {
  my_attribute : Int <- 5 ;
  main() : Object {
    out_string("Hello, world.\n")
  } ;
} ;

Resulting .cl-type class map output with comments:

class_map
6              -- number of classes
Bool           -- note: includes predefined base classes
0
IO
0
Int
0
Main
1              -- our Main has 1 attribute
initializer
my_attribute   -- named "my_attribute"
Int            -- with type Int
2              -- initializer expression line number
Int            -- initializer expression type (NOT PART OF THIS ASSIGNMENT)
integer        -- initializer expression kind
5              -- which integer constant is it?
Object
0
String
0

Commentary

This is a checkpoint (a partial implementation) of the complete Semantic Analysis assignment. The implementation of the checkpoint should do the following:

Read in the .cl-ast file given as a command-line argument.
Do every bit of typechecking and semantic analysis possible without typechecking expressions.
- Thus you should not annotate types in initializer expressions in the class map.
Print out error messages as normal.
Output only the class map to .cl-type if there are no errors.

Thus you should build the class hierarchy and check everything related to that. For example:

Check to see if a class inherits from Int (etc.).
Check to see if a class inherits from an undeclared class.
Check for cycles in the class hierarchy.
Check for duplicate method or attribute definitions in the same class.
Check for a child class that redefines a parent method but changes the parameters.
Check for a missing method main in class Main.
Check for self and SELF_TYPE mistakes in classes and methods.
This list is not exhaustive -- read the Cool Reference Manual carefully and find everything you might check for without typechecking expressions.
Basically, you'll look at classes, methods and attributes (but not method bodies).
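
Two of the hierarchy checks listed above might be sketched as follows. The representation is an assumption: user classes are given as hypothetical (class name, parent name) pairs, with "Object" as the parent when no `inherits` clause is present. Treating Bool and String like Int is based on a reading of the Cool Reference Manual, so verify it there.

```ocaml
(* Assumption: these are the built-in classes that may not be inherited from. *)
let forbidden_parents = [ "Int"; "String"; "Bool" ]

(* Return the first (class, parent) pair whose parent is forbidden, if any. *)
let inherits_forbidden (parents : (string * string) list)
  : (string * string) option =
  List.find_opt (fun (_, p) -> List.mem p forbidden_parents) parents

(* Follow parent links upward from every class. Meeting a class that is
   already on the current path means the hierarchy contains a cycle. *)
let has_cycle (parents : (string * string) list) : bool =
  let rec climb seen c =
    if List.mem c seen then true
    else
      match List.assoc_opt c parents with
      | None -> false  (* reached a built-in or undeclared class *)
      | Some p -> climb (c :: seen) p
  in
  List.exists (fun (c, _) -> climb [] c) parents
```

This walk is quadratic in the worst case, which is likely acceptable for small test programs; a topological sort would do the same job in linear time.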

You can do basic testing with something like the following:

linux> cool.exe --parse file.cl
linux> cool.exe --out reference --class-map file.cl
linux> my-checker file.cl-ast
linux> diff -b -B -E -w file.cl-type reference.cl-type

However, the reference implementation produces expressions with type annotations (an extra line of output per expression), so the diff command will report differences; you need to examine the differences. You can output the diff results side-by-side by passing the -y flag.

Getting the Assignment

The starter code for the assignment is on the Linux server at the path:

/export/home/public/schwesin/csc310/semantic-analyzer-checkpoint-handout

Turning in the Assignment

You must turn in a zip file containing these files:

ast.ml
serialize.ml
deserialize.ml
semantic_analysis.ml
main.ml
README

There is a makefile provided with this assignment. To submit the assignment, execute the command:

    make submit

from within the assignment directory.

Grading Criteria

Grading (out of 100 points):

70 points — for autograder tests
- 70 points: 90% or greater passing test cases
- 55 points: between 75% and 89% passing test cases
- 35 points: between 50% and 74% passing test cases
- 20 points: between 25% and 49% passing test cases
- 0 points: less than 25% passing test cases
15 points — for a clear description in your README
- 15 — thorough discussion of design decisions (e.g., handling of the class hierarchy, case and new and dispatch); a few paragraphs of coherent English sentences should be fine
- 8 — vague or hard to understand; omits important details
- 0 — little to no effort, or submitted an RTF/DOC/PDF file instead of plain TXT
15 points — for code cleanliness
- 15 — code is mostly clean and well-commented
- 8 — code is sloppy and/or poorly commented in places
- 0 — little to no effort to organize and document code