Due 11:59pm Wednesday, 28 March 2018

The goals of this lab are to:

- Gain familiarity with Turing Machine terminology and properties
- Take an idea from theory to implementation in high level language

This assignment consists of a written homework portion and a programming portion. Write your written solution using $\LaTeX$. A python sketch of the programming portion has been provided, but you are free to write a solution in C, C++, Java, or Python if desired. Submit both portions using github. This is a partnered assignment. You may work with one partner and submit a joint assignment. It is OK to discuss approaches at a high level with other students that are not your partner, but you must do your own write-up. Do not share your write up with other groups, and do not read other group's solutions. Please refer to the syllabus or ask me if you have questions about the policy.

Cloning the starting files

wherecd ~/cs46git clone git@github.swarthmore.edu:CS46-S18/lab06-<group>.git ./lab06

Parsing Context Free Languages

In the programming portion of this lab, you will implement an algorithm to determine if a given string $w$ is generated by context free grammar $G$ in Chomsky Normal Form. The end of section 2.1 defines Chomsky Normal Form (CNF) and proves that any grammar can be converted into an equivalent CNF grammar. You are not expected to know how this proof works or how to execute it, but it may provide some insight.
Given a grammar $G$ in Chomsky Normal Form, and an input string $w=w_1 w_2 \cdots w_n$, we can design a polynomial time algorithm to determine if $w \in L(G)$. The basic idea is to compute for all substrings $x=w_i \cdots w_j, j \geq i$ of $w$ if there is some rule in $G$ that generates $x$. Define $V_{i,j}$ to be the set of all variables in $V \in G$ that that can derive substring $x=w_i \cdots w_j$. $G$ can derive $w$ if $V_{1,n}$ contains the start symbol $S \in V$

Our algorithm for parsing strings computes the sets $V_{i,j}$ for $1 \leq i \leq n, j \geq i$. The algorithm proceeds in a series of rounds, where in each round, it considers substrings of length $s, 1 \leq s \leq n$. In the first round, the algorithm sets $V_{i,i}$ to $\{ A \in V, A \rightarrow w_i$ for some rule $A$ in the grammar $\}$.

For substrings longer than $1$, the algorithm checks if the substring $x=x_i \cdots x_{i+s-1}$ can be broken into two smaller pieces at some index $k$, $y=x_i \cdots x_k$ and $z=x_{k+1} \cdots x_{i+s-1}$ such that there is a rule $A \rightarrow BC$ in the grammar where $B \in V_{i,k}$ and $C \in V_{k+1,i+s-1}$. If such a rule $A$ exists, we add $A$ to $V_{i,i+s-1}$.

After finishing all rounds, we just need to check if $S \in V_{1,n}$. A pseudocode summary is below

for i in 1 to n: Add A to V[i,i] if A yields w[i] is a rule for s in 1 to n-1: for i in 1 to n-s: for k in i to i+s-1: if A yields BC is a rule where B is in V[i,k] and C is in V[k+1,i+s] then add A to V[i,i+s] return True if S is in V[1,n]Some example output for a CNF grammar for $a^nb^n$.

$cat grammar1.txta b S: E S: A1 B S: A B A1: A S A: a B: b $python cnf.py grammar1.txt string1.txt

abab: False a: False aaaabbbb: True ab: TrueFor the balanced parens

$cat grammar2.txt( ) S: E S: A1 B S: A B S: S S A1: A S A: ( B: ) $python cnf.py grammar2.txt string2.txt

(): True E: True ()(): True (()()): True )(: False (((()): False

The grammar format starts with a line containing the symbols of the alphabet separated by spaces, a blank line, and then the grammar rules. The first rule lists the start symbol on the left hand side. The rule format is `V: A B` to indicate the rule $V \rightarrow AB$. Note `A1` is a single variable not two variables. Variables on the right hand side are separated by spaces.

self.start = "S" #set start variable result = self.matchedSym("a") # call the matchedSym method with argument 'a' # result is saved in local variable result.

d = {} # create empty dictionary i=1 j=2 #store the value 1 in the dictionary with the key (1,2) d[(i,j)] = 1 #print the value associated with a key (i,j) #if the key does not already exists, it will throw an error print(d[(i,j)]) """ you can store any object as a value. A list may be handy. Python dictionary keys must be immutable, so a tuple (i,j) is fine, but a list [i,j] is not a valid key. """

if "a" in "apple": print("Yes!")

#print 1 through 9 inclusive for i in range(1,10): print(i)If

If you have questions about the existing code structure in `cnf.py`, please ask in lab or on Piazza. I don't want you to be spending hours deciphering python syntax.

Readme survey

Before submitting the lab. Please complete the short survey in the file Submit

Once you have edited the files, you should publish your changes using the following steps:

$git add hw06.tex Readme.md cnf.py$git commit -m "completed lab06"$git push

If you make changes to files after your push and want to share these changes, repeat the add, commit, push loop again to update the github server.

To recap, the git commit cycle is `git add, git commit, git
push`. Don't forget to `git push` when you have completed an
assignment.

You can review the basic git commands on the github help pages. Eventually, you should get in the habit of using git status to see if everything has been published, but we will talk about this more throughout the semester.

Remember
to *add, commit, push*. You may commit and push as often as you
like, and only the most recent push will be graded.