Implementation of Lexical Analysis

Outline

Regular Expressions in Lexical Specification

Regular Expressions to Lexical Specification

  1. Select a set of tokens

    • Integer, Keyword, Identifier, \(\ldots\)
  2. Write a regular expression (or rule) for the lexemes of each token

    • Integer = \([0123456789]+\)

    • Keyword = \((\texttt{if} \; | \; \texttt{else} \; | \; \ldots)\)

    • Identifier = \([A-Za-z] \; ( \; [A-Za-z] \; | \; [0123456789] \; )*\)
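The three rules above can be tried out directly. The following is a minimal sketch using Python's `re` module; the constant names and the `is_lexeme` helper are hypothetical, and the keyword pattern lists only the two keywords the rule spells out.

```python
import re

# Hypothetical names; the patterns mirror the three rules above.
# The keyword list is elided in the rule, so only "if" and "else" appear here.
INTEGER = re.compile(r"[0123456789]+")
KEYWORD = re.compile(r"if|else")
IDENTIFIER = re.compile(r"[A-Za-z]([A-Za-z]|[0123456789])*")


def is_lexeme(pattern, text):
    """Return True when the whole string is a lexeme of the given token."""
    return pattern.fullmatch(text) is not None
```

Note that `is_lexeme(IDENTIFIER, "if")` is also true: the string `if` belongs to the language of both the Keyword rule and the Identifier rule.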

Regular Expressions to Lexical Specification

  1. Construct a regular expression that matches all lexemes for all tokens

    • \(R\) = Integer \(|\) Keyword \(|\) Identifier \(|\) \(\ldots\)

    • \(R\) = \(R_1\) \(|\) \(R_2\) \(|\) \(R_3\) \(|\) \(\ldots\)
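One way to build the combined expression \(R\) is sketched below, assuming Python's `re` module. The `RULES` list and the `matching_rule` helper are hypothetical names, and only three \(R_j\) are included because the remaining ones are elided. Wrapping each \(R_j\) in a named group makes it possible to recover which alternative matched.

```python
import re

# Hypothetical rule list standing in for R_1, R_2, R_3.
RULES = [
    ("Integer", r"[0123456789]+"),
    ("Keyword", r"if|else"),
    ("Identifier", r"[A-Za-z][A-Za-z0123456789]*"),
]

# R = R_1 | R_2 | R_3, with each alternative wrapped in a named group.
R = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))


def matching_rule(text):
    """Return the name of the R_j that matched the whole string, or None."""
    m = R.fullmatch(text)
    return m.lastgroup if m else None
```

When a string is in more than one \(L(R_j)\), as `if` is here, this sketch reports the rule listed first. That is a property of Python's ordered alternation, not something the construction of \(R\) itself prescribes.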

Regular Expressions to Lexical Specification

  1. Let the input be \(x_1 \ldots x_n\)

    • \(x_1 \ldots x_n\) are characters

    • For \(1 \leq i \leq n\) check if \(x_1 \ldots x_i \in L(R)\)

    • If so, it must be that \(x_1 \ldots x_i \in L(R_j)\) for some \(j\)

    • Otherwise, \(x_1 \ldots x_i \notin L(R)\)

  2. Remove \(x_1 \ldots x_i\) from the input and go to the previous step
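The two steps above can be sketched as a loop. This is an illustration under stated assumptions, not a definitive lexer: the rule list and the `tokenize` name are hypothetical, and the choice to try the longest prefix first is an assumption, since the steps do not say which \(i\) to pick when several prefixes are in \(L(R)\).

```python
import re

# Hypothetical rule list standing in for R_1, R_2, R_3.
RULES = [
    ("Integer", r"[0123456789]+"),
    ("Keyword", r"if|else"),
    ("Identifier", r"[A-Za-z][A-Za-z0123456789]*"),
]
R = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))


def tokenize(text):
    """Repeatedly strip a prefix that is in L(R) and record which R_j matched."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Check candidate prefixes text[pos:i], longest first (an assumption).
        for i in range(len(text), pos, -1):
            m = R.fullmatch(text, pos, i)
            if m:
                tokens.append((m.lastgroup, text[pos:i]))
                pos = i  # remove the matched prefix from the input
                break
        else:
            # No prefix of the remaining input is in L(R).
            raise ValueError(f"no token matches at position {pos}")
    return tokens
```

For example, `tokenize("42x")` splits the input into an Integer lexeme `42` followed by an Identifier lexeme `x`. Checking every prefix from scratch is quadratic in the input length; it is written this way only to stay close to the steps as stated.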

Options for Handling Whitespace and Comments

  1. We could create a token for whitespace or comments

    • Whitespace = \((\; \texttt{' '} \; | \; \texttt{'\n'} \; | \; \texttt{'\t'} \; )+\)

    • Comment = \(\ldots\)

    • An input of " \t\n 42 " is transformed to the token stream Whitespace Integer Whitespace

  2. The lexer skips whitespace and comments

    • This is the preferred method because whitespace and comments are irrelevant to the parser (for most languages)

    • The lexer still has to match the whitespace (or comment) regular expression, but it emits no token for the match
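Both options can be compared in a small sketch, assuming Python's `re` module. The `skip` flag, the `SKIPPED` set, and the `tokenize` name are hypothetical, and the Comment rule is left out because its pattern is elided above.

```python
import re

# Hypothetical rule list: the Whitespace rule from above plus Integer.
RULES = [
    ("Whitespace", r"[ \n\t]+"),
    ("Integer", r"[0123456789]+"),
]
R = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))

# Token kinds that are matched but not emitted under option 2.
SKIPPED = {"Whitespace"}


def tokenize(text, skip=True):
    """Return the token names; with skip=True, whitespace is matched but dropped."""
    tokens = []
    pos = 0
    while pos < len(text):
        # The two rules cannot match the same text, so a plain match suffices here.
        m = R.match(text, pos)
        if m is None:
            raise ValueError(f"no token matches at position {pos}")
        if not (skip and m.lastgroup in SKIPPED):
            tokens.append(m.lastgroup)
        pos = m.end()
    return tokens
```

With `skip=False` the input `" \t\n 42 "` yields `["Whitespace", "Integer", "Whitespace"]`, as in option 1; with the default `skip=True` it yields `["Integer"]`.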

Ambiguities

Error Handling

Summary

Regular Languages and Finite Automata

Finite Automata

Finite Automata State Graphs

A Simple Example

Another Simple Example

Another Example

Epsilon Transitions

Deterministic and Non-Deterministic Automata

Execution of Finite Automata

Acceptance of NFAs

NFA versus DFA

Regular Expressions to Finite Automata

Regular Expressions to NFA

Regular Expressions to NFA Example

NFA to DFA (The Trick)

NFA to DFA Remark

NFA to DFA Example

Implementation

Example: Table Implementation of a DFA

State   0   1
S       T   U
T       T   U
U       T   U
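The table can be turned into code almost literally. In the sketch below, the `TABLE` and `run_dfa` names are hypothetical, and treating `S` as the start state is an assumption; the accepting states are not given above, so the function only reports the state the automaton ends in.

```python
# Transition table from the example: outer keys are current states,
# inner keys are input symbols, values are next states.
TABLE = {
    "S": {"0": "T", "1": "U"},
    "T": {"0": "T", "1": "U"},
    "U": {"0": "T", "1": "U"},
}


def run_dfa(text, start="S"):
    """Follow one table lookup per input character and return the final state.

    The start state "S" is an assumption; a character outside {0, 1}
    raises KeyError because the table has no entry for it.
    """
    state = start
    for ch in text:
        state = TABLE[state][ch]
    return state
```

Each input character costs a single lookup, so the running time is linear in the length of the input regardless of how many states the table has.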

Implementation Continued

Theory versus Practice