The Parser

Due:
11:00 pm, Friday March 12, 2021

Max grace days: 2

Overview

For this assignment you will write a parser using a parser generator. You will describe the Cool grammar in an appropriate input format and the parser generator will generate actual code (in OCaml). You will also write additional code to unserialize the tokens produced by the lexer stage and to serialize the abstract syntax tree produced by your parser.

Specification

You must create a program that takes a single command-line argument (for example, file.cl-lex). That argument will be an ASCII text Cool tokens file (as described in PA2). The cl-lex file will always be well-formed (i.e., there will be no syntax errors in the cl-lex file itself). However, the cl-lex file may describe a sequence of Cool tokens that does not form a valid Cool program.
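As a rough illustration of the unserializing step, the sketch below reads a token file into a list of records. The exact cl-lex layout is defined in PA2, not here, so the layout used below is an assumption: each token is a line number on one line, a token kind on the next, and (only for the kinds identifier, type, integer and string) a lexeme on a third line. The names raw_token, has_lexeme, tokens_of_lines and read_lines are hypothetical.

```ocaml
(* A raw token as read from the file: source line, token kind, optional lexeme.
   This record is hypothetical; adapt it to your own token representation. *)
type raw_token = { line_no : int; tag : string; lexeme : string option }

(* ASSUMPTION: only these kinds are followed by an extra lexeme line.
   Check the PA2 description for the authoritative list. *)
let has_lexeme tag =
  List.mem tag [ "identifier"; "type"; "integer"; "string" ]

(* Group the flat list of lines into tokens, two or three lines at a time.
   Not tail-recursive, which is fine for a sketch. *)
let rec tokens_of_lines = function
  | [] -> []
  | line :: tag :: rest when has_lexeme tag ->
      (match rest with
       | lexeme :: rest' ->
           { line_no = int_of_string line; tag; lexeme = Some lexeme }
           :: tokens_of_lines rest'
       | [] -> failwith "truncated token: missing lexeme")
  | line :: tag :: rest ->
      { line_no = int_of_string line; tag; lexeme = None }
      :: tokens_of_lines rest
  | [ _ ] -> failwith "truncated token: missing kind"

(* Read every line of a text file into a list, in order. *)
let read_lines path =
  let ic = open_in path in
  let rec loop acc =
    match input_line ic with
    | l -> loop (l :: acc)
    | exception End_of_file -> close_in ic; List.rev acc
  in
  loop []
```

With these definitions, tokens_of_lines (read_lines "file.cl-lex") would give the token list for a file. Because the handout guarantees a well-formed cl-lex file, the failwith branches should never trigger on graded inputs; they are there to make mistakes in your own test files visible.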

Your program must either indicate that there is an error in the Cool program described by the cl-lex file (e.g., a parse error in the Cool file) or emit file.cl-ast, a serialized Cool abstract syntax tree. Your program’s main parser component must be constructed by a parser generator. The “glue code” for processing command-line arguments, unserializing tokens and serializing the resulting abstract syntax tree should be written by hand. If your program is called parser, invoking parser file.cl-lex should yield the same output as cool --parse file.cl. Your program will consist of a number of OCaml files.
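One small piece of the glue code is deriving the output file name from the command-line argument. A minimal sketch, assuming the argument ends in .cl-lex as in the example above; the function name ast_path_of_lex_path and the fallback for other suffixes are assumptions, not part of the specification.

```ocaml
(* Map "file.cl-lex" to "file.cl-ast".
   ASSUMPTION: if the argument does not end in ".cl-lex", ".cl-ast" is simply
   appended; the handout does not say what should happen in that case. *)
let ast_path_of_lex_path path =
  if Filename.check_suffix path ".cl-lex"
  then Filename.chop_suffix path ".cl-lex" ^ ".cl-ast"
  else path ^ ".cl-ast"
```

In main.ml this could be applied to Sys.argv.(1) before opening the output channel with open_out.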

You must use ocamlyacc. Do not write your entire parser by hand. Parts of it must be tool-generated from context-free grammar rules you provide.
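An entry point generated by ocamlyacc expects a lexer function of type Lexing.lexbuf -> token, whereas here the tokens come from a file that has already been lexed. One possible way to bridge the two is sketched below. The token type shown is a stand-in with hypothetical constructor names (in a real build it comes from the %token declarations in parser.mly), and lexer_of_tokens and dummy_lexbuf are hypothetical helper names.

```ocaml
(* Stand-in for the token type ocamlyacc generates from %token declarations.
   These constructor names are hypothetical. *)
type token = CLASS | TYPE of string | LBRACE | RBRACE | SEMI | EOF

(* Wrap a token list in a function with the type of a lexer.
   Each call hands out the next token; once the list is exhausted,
   every further call returns EOF. The lexbuf argument is ignored. *)
let lexer_of_tokens (tokens : token list) : Lexing.lexbuf -> token =
  let remaining = ref tokens in
  fun _lexbuf ->
    match !remaining with
    | [] -> EOF
    | t :: rest -> remaining := rest; t

(* A placeholder buffer: all tokens come from the closure above, so this
   exists only to satisfy the type of the generated entry point. *)
let dummy_lexbuf () = Lexing.from_string ""

(* Hypothetical usage, assuming parser.mly declares a start symbol "program":
     let ast = Parser.program (lexer_of_tokens toks) (dummy_lexbuf ()) *)
```

When the token sequence does not match the grammar, an ocamlyacc-generated parser raises Parsing.Parse_error, which the glue code can catch to report the error. Since the placeholder buffer carries no useful positions, the line number for the error message would have to come from the token list itself, for example by remembering the line of the most recently returned token.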

Considerations:

The .cl-ast File Format

If there are no errors in file.cl-lex your program should create file.cl-ast and serialize the abstract syntax tree to it. The general format of a .cl-ast file follows the Cool Reference Manual Syntax chart. Basically, we do a pre-order traversal of the abstract syntax tree, writing down every node as we come to it.

We will now describe exactly what to output for each kind of node. You can view this as specifying a set of mutually-recursive tree-walking functions. The notation "superclass:identifier" means "output the superclass using the rule (below) for outputting an identifier". The notation "\n" means "output a newline".

Example input:

(* Line 01 *)
(* Line 02 *)
(* Line 03 *)  class List {
(* Line 04 *)     -- Define operations on lists.
(* Line 05 *)
(* Line 06 *)     cons(i : Int) : List {
(* Line 07 *)        (new Cons).init(i, self)
(* Line 08 *)     };
(* Line 09 *)
(* Line 10 *)  };

Example .cl-ast output with comments:

1                      -- number of classes                   
3                      --  line number of class name identifier
List                   --  class name identifier
no_inherits            --  does this class inherit? 
1                      --  number of features
method                 --   what kind of feature? 
6                      --   line number of method name identifier
cons                   --   method name identifier
1                      --   number of formal parameters
6                      --    line number of formal parameter identifier
i                      --    formal parameter identifier
6                      --    line number of formal parameter type identifier
Int                    --    formal parameter type identifier
6                      --   line number of return type identifier
List                   --   return type identifier
7                      --    line number of body expression 
dynamic_dispatch       --    kind of body expression 
7                      --     line number of dispatch receiver expression 
new                    --     kind of dispatch receiver expression  
7                      --      line number of new-class identifier 
Cons                   --      new-class identifier
7                      --     line number of dispatch method identifier
init                   --     dispatch method identifier
2                      --     number of arguments in dispatch 
7                      --      line number of first argument expression
identifier             --      kind of first argument expression
7                      --       line number of the identifier
i                      --       what is the identifier? 
7                      --      line number of second argument expression
identifier             --      kind of second argument expression
7                      --       line number of the identifier
self                   --       what is the identifier? 

The .cl-ast format is quite verbose, but it is particularly easy for later stages (e.g., the type checker) to read in again without having to go through all of the trouble of "actually parsing". It will also make it particularly easy for you to notice where things are going awry if your parser is not producing the correct output.

Writing the rote code to output a .cl-ast text file given an AST may take a bit of time but it should not be difficult; our reference implementation does it in 116 lines and cleaves closely to the structure given above.
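To make the "mutually-recursive tree-walking functions" idea concrete, here is a minimal sketch covering only the three expression kinds that appear in the example above (dynamic_dispatch, new and identifier). The type and function names are hypothetical and this is not the reference implementation; it writes to a Buffer so the result is easy to inspect.

```ocaml
(* Hypothetical AST types, covering only the node kinds in the example. *)
type identifier = int * string          (* line number, name *)

type expr = { line : int; kind : expr_kind }
and expr_kind =
  | Dynamic_dispatch of expr * identifier * expr list
  | New of identifier
  | Identifier of identifier

(* An identifier is its line number, then its name, one per line. *)
let output_identifier buf ((line, name) : identifier) =
  Printf.bprintf buf "%d\n%s\n" line name

(* A list is its length, then each element in order. *)
let output_list buf output_elem elems =
  Printf.bprintf buf "%d\n" (List.length elems);
  List.iter (output_elem buf) elems

(* Pre-order traversal: write the node's line and kind, then its children. *)
let rec output_expr buf e =
  Printf.bprintf buf "%d\n" e.line;
  match e.kind with
  | Dynamic_dispatch (receiver, meth, args) ->
      Buffer.add_string buf "dynamic_dispatch\n";
      output_expr buf receiver;
      output_identifier buf meth;
      output_list buf output_expr args
  | New cls ->
      Buffer.add_string buf "new\n";
      output_identifier buf cls
  | Identifier id ->
      Buffer.add_string buf "identifier\n";
      output_identifier buf id

let serialize e =
  let buf = Buffer.create 256 in
  output_expr buf e;
  Buffer.contents buf

(* The body of cons from the example: (new Cons).init(i, self) on line 7. *)
let example =
  { line = 7;
    kind = Dynamic_dispatch
        ({ line = 7; kind = New (7, "Cons") },
         (7, "init"),
         [ { line = 7; kind = Identifier (7, "i") };
           { line = 7; kind = Identifier (7, "self") } ]) }
```

Calling serialize example yields the body-expression lines of the example output, from the line number 7 that precedes dynamic_dispatch through the final self. The remaining node kinds (classes, features, formals and the other expressions) would follow the same pattern: one function per kind of node, each writing its own fields and delegating to the functions for its children.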

Parser Generators

You must use a parser generator or similar library for this assignment.

Most parser generators are derived from yacc, the original parser generator for C, or from bison, its GNU successor. Thus you may find it handy to refer to the Yacc paper or the Bison manual. When you're reading, mentally translate the C code references into the language of your choice.

Commentary

You can do basic testing with something like the following:

linux> ./cool.exe --lex file.cl
linux> ./cool.exe --out reference --parse file.cl
linux> ./main.exe file.cl-lex
linux> diff -b -B -E -w file.cl-ast reference.cl-ast

You may find the reference compiler's --unparse option useful for debugging your .cl-ast files.

Getting the Assignment

The starter code for the assignment is on the Linux server at the path:

/export/home/public/schwesin/csc310/parser-handout

Turning in the Assignment

You must turn in a zip file containing these files:

  1. ast.ml
  2. parser.mly
  3. main.ml

There is a makefile provided with this assignment. To submit the assignment, execute the command:

    make submit

from within the assignment directory.

Grading Criteria

Grading (out of 50 points):