The Semantic Analyzer
- Due:
- 11:00 pm, Friday April 23, 2021
Max grace days: 2
Overview
For this assignment you will write a semantic analyzer. Among other things, this involves traversing the abstract syntax tree and the class hierarchy. You will reject all Cool programs that do not comply with the Cool type system.
You will also write additional code to serialize the class map, implementation map, parent map, and annotated AST produced by your semantic analysis.
Specification
You must create two artifacts:
- A program that takes a single command-line argument (e.g.,
file.cl-ast
). That argument will be an ASCII text Cool abstract syntax tree file. Your program must either indicate that there is an error in the input (for example, a type error) or emitfile.cl-type
, a serialized Cool abstract syntax tree, class map, implementation map, and parent map. If your program is calledchecker
, invokingchecker file.cl-ast
should yield the same output ascool --type file.cl
. Your program will consist of a number of OCaml files. - A plain ASCII text file called
README
describing your design decisions and choice of test cases. See the grading rubric. A few paragraphs should suffice.
Considerations:
Line numbers: The typing rules do not directly specify the line numbers on which errors are to be reported. As of v1.11, the Cool reference compiler uses these guidelines (possibly surprising ones are italicized):
- Errors related to parameter-less method
main
in classMain
: always line 0 - Inheritance cycle: always line 0
- Other inheritance type problem: inherited type identifier location
self
orSELF_TYPE
used in wrong place: self (resp. SELF_TYPE) identifier (resp. type) location- Redefining a feature: (second) feature location
- Redefining a formal or class: (second) identifier location
- Other attribute problems: attribute location
- Redefining a method and changing types: (second) type location
- Other problems with redefining a method: method location
- Method body type does not conform: method name identifier location
- Attribute initializer does not conform: attribute name identifier location
- Errors with types of arguments to relational/arithmetic operations: location of relational/arithmetic operation expression
- Errors with types of
while
/if
subexpression(s): location of (enclosing)while
orif
expression (not the location of the conditional expression) - Errors with
case
expression (e.g., lub): location ofcase
expression - Errors with conformance in
let
: location oflet
expression (not location of initializer) - Errors in blocks: location of (beginning of) block expression
- Errors in actual arguments: location of method invocation expression (not the location of any particular actual argument)
- Assignment does not conform: assignment expression location (not right-hand-side location)
- Unknown identifier: location of identifier
- Unknown method: location of method name identifier
- Unknown type: location of type
- Errors related to parameter-less method
Error reporting: To report an error, write the string
ERROR: line_number: Type-Check: message
to standard output and terminate the program. You may write whatever you want in the message, but it should be fairly indicative. Example erroneous input:
class Main inherits IO { main() : Object { out_string("Hello, world.\n" + 16777216) -- adding string + int !? } ; } ;
Example error report output:
ERROR: 3: Type-Check: arithmetic on String Int instead of Ints
Remember that you do not have to match the English prose of the reference compiler's error messages at all. You just have to get the line number right.
Semantic checks are unordered — if a program contains two or more errors, you may indicate whichever you like. You can infer from this that all of our test cases will contain at most one error.
The .cl-type
File Format
If there are no errors in file.cl-ast
your program should create file.cl-type
and serialize the class map, implementation map, parent map, and annotated AST to it.
The class and implementation maps are described in the Cool Reference Manual.
A .cl-type
file consists of four sections:
- The class map.
- The implementation map.
- The parent map.
- The annotated AST.
Simply output the four sections in order, one after the other.
We will now describe exactly what to output for the class and implementation maps. The general idea and notation (one string per line, recursive descent) are the same as in the previous assignment.
The Class Map
- Output
class_map
\n. - Output the number of classes and then \n.
- Output each class in turn (in ascending alphabetical order):
- Output the name of the class and then \n.
- Output the number of attributes and then \n.
- Output each attribute in turn (in order of appearance, with inherited attributes from a superclass coming first):
- Output
no_initializer
\n and then the attribute name \n and then the type name \n. - or Output
initializer
\n and then the attribute name \n and then the type name \n and then the initializer expression.
- Output
The Implementation Map
- Output
implementation_map
\n. - Output the number of classes and then \n.
- Output each class in turn (in ascending alphabetical order):
- Output the name of the class and then \n.
- Output the number of methods for that class and then \n.
- Output each method in turn (in order of appearance, with inherited or overridden methods from a superclass coming first; internal methods are defined to appear in ascending alphabetical order):
- Output the method name and then \n.
- Output the number of formals and then \n.
- Output each formal's name only:
- Output the name and then \n
- If this method is inherited from a parent class and not overriden, output the name of the ultimate parent class that defined the method body expression and then \n. Otherwise, output the name of the current class and then \n.
- Output the method body expression.
The Parent Map
- Output
parent_map
\n. - Output the number of parent-child inheritance relations and then \n. This number is equal to the number of classes minus one (since
Object
has no parent). - Output each child class in turn (in ascending alphabetical order):
- Output the name of the child class and then \n.
- Output the name of the child class's parent and then \n.
The Annotated AST
With two exceptions, the annotated AST format is identical to the normal AST from the previous assignment.
- The first change involves expressions. To output an Expression:
- Output the line number of the expression and then a newline.
- Output the name of type associated with the expression and then a newline. For example, the expression
3+x
is associated with the typeInt
. This is new; it should be done for the final submission, but not for the checkpoint. - Output the name of the expression and then a newline and then any subparts.
The second change is a new kind of expression, internal, used to represent the bodies of predefined methods. Internal expressions are those that are handled by the run-time system — you might think of them as part of the standard library. You output Internal Expressions (including the type annotation, as above) as follows:
- 0 \n type \n
internal
\n Class.method \n
The valid kinds of internal expressions (i.e., the values for Class.method) are:
- IO.in_int IO.in_string IO.out_int IO.out_string Object.abort Object.copy Object.type_name String.concat String.length String.substr
They are formally defined in the Cool Reference Manual.
Note that you must output information about all classes and methods defined in the program as well as all base classes (and their methods). Do not just print out "classes actually used" or "methods actually called" or something like that. Output all classes and methods — no optimizations or shortcuts!
- 0 \n type \n
Detailed .cl-type
Example
Now that we've formally defined the output specification, we can present a worked example. Here's the example input we will consider:
class Main inherits IO {
my_attribute : Int <- 5 ;
main() : Object {
out_string("Hello, world.\n")
} ;
} ;
Resulting .cl-type
class map output with comments:
class_map
6 -- number of classes
Bool -- note: includes predefined base classes
0
IO
0
Int
0
Main
1 -- our Main has 1 attribute
initializer
my_attribute -- named "my_attribute" Int with type Int
2 -- initializer expression line number
Int -- initializer expression type
integer -- initializer expression kind
5 -- which integer constant is it?
Object
0
String
0
Resulting .cl-type
implementation map output with comments:
implementation_map
6 -- six classes
Bool -- first is Bool
3 -- - it has three methods
abort -- - first is abort()
0 -- -- abort has 0 formal arguments
Object -- -- name of parent class from which Bool inherits abort()
0 -- -- abort's body expression starts on line 0
Object -- -- abort's body expression has type Object
internal -- -- abort's body is an internal kind of expression (i.e., a system call; see above)
Object.abort -- -- extra detail on abort's body expression
copy -- - second of Bool's three methods is copy()
0 -- -- copy has 0 formal arguments
Object -- -- name of parent class from which Bool inherits copy()
0 -- -- copy's body expression starts on line 0
SELF_TYPE -- -- copy's body expression has type SELF_TYPE
internal -- -- copy's body is an internal kind of expression (i.e., a system call; see above)
Object.copy -- -- extra detail on copy's body expression
... many lines skipped ...
Main -- another class is Main
8 -- - it has 8 methods
... many lines skipped ... --
main -- - one of Main's methods is main()
0 -- -- main has 0 formal arguments
Main -- -- the name of the class where Main.main() is defined
4 -- -- the body expression of Main.main starts on line 4
SELF_TYPE -- -- the body expression of Main.main has type SELF_TYPE
self_dispatch -- -- the body of Main.main() is a self_dispatch kind of expression
... many lines skipped ...
Finally, the resulting .cl-type
parent map output with comments:
parent_map
5 -- there are five classes with parents (Object is the sixth class)
Bool -- Bool's parent ...
Object -- ... is Object.
IO -- IO's parent ...
Object -- ... is Object.
Int -- Int's parent ...
Object -- ... is Object.
Main -- Main's parent ...
IO -- ... is IO.
String -- String's parent ...
Object -- ... is Object.
Commentary
You can do basic testing with something like the following:
linux> cool --parse file.cl
linux> cool --out reference --type file.cl
linux> my-checker file.cl-ast
linux> diff -b -B -E -w file.cl-type reference.cl-type
You should implement all of the typing rules in the Cool Reference Manual. There are also a number of other rules and corner cases you have to check (e.g., no class can inherit from Int, you cannot redefine a class, you cannot have an attribute named self, etc.). They are sprinkled throughout the manual. Check everything you possibly can.
Getting the Assignment
The starter code for the assignment is on the Linux server at the path:
/export/home/public/schwesin/csc310/semantic-analyzer-full-handout
Turning in the Assignment
You must turn in a zip file containing these files:
ast.ml
serialize.ml
deserialize.ml
semantic_analysis.ml
main.ml
README
There is a makefile provided with this assignment. To submit the assignment, execute the command:
make submit
from within the assignment directory.
Grading Criteria
Grading (out of 100 points):
- 70 points — for the negative autograder tests
- 70 points 90% or greater passing test cases
- 55 points between 75% and 89% passing test cases
- 35 points between 50% and 74% passing test cases
- 20 points between 25% and 49% passing test cases
- 0 points less than 25% passing test cases
- 15 points — for a clear description in your README
- 15 — thorough discussion of design decisions; a few paragraphs of coherent English sentences should be fine
- 8 — vague or hard to understand; omits important details
- 0 — little to no effort, or submitted an RTF/DOC/PDF file instead of plain TXT
- 15 points — for code cleanliness
- 15 — code is mostly clean and well-commented
- 8 — code is sloppy and/or poorly commented in places
- 0 — little to no effort to organize and document code
Extra Credit
You can receive extra credit for your final course grade by passing the positive test cases:
- 5%: 90% or greater passing test cases
- 3%: between 75% and 89% passing test cases
- 1%: between 50% and 74% passing test cases