Language Lexer

Overview

In this project, you will implement a lexer for a small C-like language. Your lexer functions will convert the source text of a program into a list of tokens. The input could be an entire program or a fragment of a program. This document describes the lexical specification.

The only requirement for error handling is that input that cannot be lexed according to the specification should raise an InvalidInputException. Use informative error messages when raising these exceptions to make debugging easier.
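As an illustration of an informative error message, here is a minimal sketch. It assumes InvalidInputException carries a string message; the real declaration comes with the project files and is redeclared here only so the snippet compiles on its own. The helper name lex_error is hypothetical.

```ocaml
(* Assumption: the real exception is declared in the project skeleton and
   carries a string; it is redeclared here so the sketch is self-contained. *)
exception InvalidInputException of string

(* Hypothetical helper: report the offending character and its offset. *)
let lex_error (input : string) (pos : int) =
  let msg =
    Printf.sprintf "cannot lex input at offset %d: unexpected character %C"
      pos input.[pos]
  in
  raise (InvalidInputException msg)
```

Including the offset and the character in the message makes it much easier to locate the problem in a long test input.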

Testing

You can run your lexer directly on a program by running

  dune exec bin/main.exe <filename>

where the <filename> argument is required.

All of the tests compare the output of your implementation to the output of the reference implementation.

The Lexer (aka Scanner or Tokenizer)

The lexer transforms source text into tokens. The goal is to transform a program, represented as a string, into a list of tokens that capture the different elements of the program. This process can be handled with regular expressions. OCaml’s regular expression library is the Str module; see its documentation for details. You are not required to use it, but you may find it useful.

Your lexer must be written in lexer.ml. For unit testing purposes, you may want to implement a pure function with a type signature of: string -> token list.
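The shape of such a function can be sketched as follows. This is only an illustration under several assumptions: the token type below is a reduced stand-in for the one in token.ml, Tok_Int and Tok_ID are assumed to carry an int and a string, whitespace is assumed to separate tokens and is skipped, and the helper names (span, go) are hypothetical. It uses only the standard library rather than Str, and it covers just a handful of the tokens in the specification.

```ocaml
(* Assumption: reduced copy of the token type; the full one is in token.ml. *)
type token =
  | Tok_LParen
  | Tok_RParen
  | Tok_Semi
  | Tok_Assign
  | Tok_Equal
  | Tok_Add
  | Tok_Int of int
  | Tok_ID of string

(* Assumption: redeclared here so the sketch compiles on its own. *)
exception InvalidInputException of string

let is_digit c = '0' <= c && c <= '9'
let is_alpha c = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')

(* Return the first index >= i at which predicate p stops holding. *)
let rec span p s i =
  if i < String.length s && p s.[i] then span p s (i + 1) else i

let tokenize (input : string) : token list =
  let len = String.length input in
  (* pos is the current offset; acc holds the tokens so far, reversed. *)
  let rec go pos acc =
    if pos >= len then List.rev acc
    else
      match input.[pos] with
      | ' ' | '\t' | '\n' | '\r' -> go (pos + 1) acc
      | '(' -> go (pos + 1) (Tok_LParen :: acc)
      | ')' -> go (pos + 1) (Tok_RParen :: acc)
      | ';' -> go (pos + 1) (Tok_Semi :: acc)
      | '+' -> go (pos + 1) (Tok_Add :: acc)
      (* Check the two-character operator before its one-character prefix. *)
      | '=' when pos + 1 < len && input.[pos + 1] = '=' ->
        go (pos + 2) (Tok_Equal :: acc)
      | '=' -> go (pos + 1) (Tok_Assign :: acc)
      | c when is_digit c ->
        let stop = span is_digit input pos in
        let n = int_of_string (String.sub input pos (stop - pos)) in
        go stop (Tok_Int n :: acc)
      | c when is_alpha c ->
        let stop = span (fun ch -> is_alpha ch || is_digit ch) input pos in
        go stop (Tok_ID (String.sub input pos (stop - pos)) :: acc)
      | c ->
        raise (InvalidInputException
                 (Printf.sprintf "unexpected character %C at offset %d" c pos))
  in
  go 0 []
```

Because the function is pure, it can be exercised directly from a unit test, for example by comparing tokenize "a == b" against [Tok_ID "a"; Tok_Equal; Tok_ID "b"].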

The token type is implemented in token.ml.

A few important notes to consider:

The following table shows all mappings of tokens to their lexical representations. Tok_Bool, Tok_Int, and Tok_ID are listed as regular expressions.

Token Name         Lexical Representation
Tok_LParen         (
Tok_RParen         )
Tok_LBrace         {
Tok_RBrace         }
Tok_Equal          ==
Tok_NotEqual       !=
Tok_Assign         =
Tok_Greater        >
Tok_Less           <
Tok_GreaterEqual   >=
Tok_LessEqual      <=
Tok_Or             ||
Tok_And            &&
Tok_Not            !
Tok_Semi           ;
Tok_Int_Type       int
Tok_Bool_Type      bool
Tok_Print          print
Tok_If             if
Tok_Else           else
Tok_For            for
Tok_From           from
Tok_To             to
Tok_While          while
Tok_Add            +
Tok_Sub            -
Tok_Mult           *
Tok_Div            /
Tok_Pow            ^
Tok_Bool           /true|false/
Tok_Int            /-?[0-9]+/
Tok_ID             /[a-zA-Z][a-zA-Z0-9]*/
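Several entries in the table overlap: = is a prefix of ==, and keywords such as if also match the Tok_ID pattern. The sketch below shows one possible way to resolve both, assuming the usual longest-match convention (so iffy is a single identifier, not if followed by fy) and assuming keywords and booleans take priority over identifiers. The table itself does not state these rules, so treat them as assumptions. The token type is a reduced stand-in for the one in token.ml, the payload types of Tok_Bool and Tok_ID are assumed, and the helper names are hypothetical.

```ocaml
(* Assumption: reduced copy of the token type; the full one is in token.ml. *)
type token =
  | Tok_If
  | Tok_Else
  | Tok_While
  | Tok_Int_Type
  | Tok_Bool_Type
  | Tok_Bool of bool
  | Tok_ID of string
  | Tok_Equal
  | Tok_Assign
  | Tok_GreaterEqual
  | Tok_Greater

(* Classify a whole word that already matched [a-zA-Z][a-zA-Z0-9]*.
   Keywords and booleans are checked first; anything else is an identifier. *)
let token_of_word (w : string) : token =
  match w with
  | "if" -> Tok_If
  | "else" -> Tok_Else
  | "while" -> Tok_While
  | "int" -> Tok_Int_Type
  | "bool" -> Tok_Bool_Type
  | "true" -> Tok_Bool true
  | "false" -> Tok_Bool false
  | _ -> Tok_ID w

(* Operators listed longest spelling first, so "==" is tried before "=". *)
let operators =
  [ ("==", Tok_Equal); (">=", Tok_GreaterEqual);
    ("=", Tok_Assign); (">", Tok_Greater) ]

(* True when lexeme occurs in s starting at offset pos. *)
let matches_at (s : string) (pos : int) (lexeme : string) : bool =
  let n = String.length lexeme in
  pos + n <= String.length s && String.sub s pos n = lexeme

(* Return the first operator matching at pos, with the offset just past it. *)
let operator_at (s : string) (pos : int) : (token * int) option =
  let rec find = function
    | [] -> None
    | (lexeme, tok) :: rest ->
      if matches_at s pos lexeme then Some (tok, pos + String.length lexeme)
      else find rest
  in
  find operators
```

Classifying the whole word after scanning it, rather than testing for a keyword prefix, is what keeps identifiers like iffy or format from being split into a keyword and a remainder.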

Turning in the Assignment

To submit your assignment, create a zip file of a DIRECTORY named project3-handin containing ONLY the project-related source files. Then submit that zip file to the appropriate folder on D2L.

Grading Criteria

OCaml Style Guide

Programming Project Rubric