Data Structures and Algorithms

Lab 8: Scrabble Assistant

Due on Sunday, November 20th at 11:59 PM. This is a team lab. You and your assigned lab partner(s) will complete this lab together. Make sure that you are familiar with the Partner Etiquette guidelines. You may discuss the concepts of this lab with other classmates, but you may not share your code with anyone other than course staff and your lab partner(s). Do not look at solutions written by students other than your team. If your team needs help, please post on the Piazza forum or contact the instructor or ninjas. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.

Overview

In this lab, you will write a hash table implementation for a dictionary data structure. (Unit tests for this assignment have been provided.) You will then use that implementation to create a tool which assists in finding solutions to tiled word games (such as Scrabble). You will also provide a written analysis of the performance of your hash table.

Starting Code

Crossword letter tiles

Your starting code includes the following files. Bolded files are those which you may need to change.

Part I: HashTable

For the first part of this assignment, you will implement a separate chaining hash table as we discussed in class. (You will not implement a linear probing hash table.) One of the constructors allows the user to specify a maximum load factor; the other should use a sensible default value (e.g. 0.75). Unit tests have been provided for this lab; once you have completed your HashTable implementation, the appropriate unit tests should pass. Further, all operations on your HashTable (other than getKeys and getItems) must run in \(O(1)\) average time.

Note that hashing functions have been provided in hashes.h and hashes.cpp for keys of type int and string. If you want to use any other type of data as the key in your hash table, you’ll need to write an appropriate hashing function. Do not worry about experiment.cpp for this part of your lab; we will use that file later.

You should consider using an array of statically-allocated vector objects to represent the buckets of your HashTable. Statically-allocated vectors are easier to use and less error-prone, especially when they are stored in an array. You could also use the STL class list (a linked list implementation) or even the provided STLList class.

Memory Errors and Negative Modulus

In C++, the modulus operator can return a negative number:

cout << -5 % 3 << endl; // prints "-2"!
int h = hash(key); // could be a negative int
int index = h % capacity; // could be a negative number -- bad index!

This could be a problem if we then use that negative number as an array index!

To solve this, we can simply increase the result of modulus by the modulus base if turns out to be negative:

int x = -5 % 3;
if (x < 0) {
    x += 3;
}
cout << x << endl; // prints "1"

A helper function to do this for you, positive_modulus, appears in hashes.h/hashes.cpp. Make sure to use it!

Removing From A vector

The vector class provides a means to remove an element from the end of a list (pop_back) but doesn’t give a clear interface for removing elements e.g. from an arbitrary index. Vectors provide a general function called erase which can delete entire contiguous regions:

vector<int> vec = ...;
vec.erase(vec.begin() + start_index, vec.begin() + end_index);

The above function works much like a Python slice: the start index is included but the end index is not. Thus, remove_range_from_vector(1,3) would remove the second and third elements (indices 1 and 2) from the vector. If you want to delete a single element from a vector, you can do something like this:

vector<int> vec = ...; // whatever defines this vector
int index = ...; // e.g. 4
vec.erase(vec.begin() + index, vec.begin() + index + 1);

Part II: Scrabble Assistant

Dictionary

Once you have a working HashTable, you can write the Scrabble Assistant. We will design the application in two parts: a ScrabbleAssistant class (which does all of the interesting work) and a main function (which provides the interface that allows the end user to interact with that object).

The work of the ScrabbleAssistant can be broken into the following parts:

  1. Keeping track of which words are legal.
  2. Finding the anagrams of a particular word.
  3. Finding the “power set” of a string.
  4. Eliminating duplicate strings in a list.
  5. Finding all of the legal plays given a string containing the player’s tiles.

The spellcheck method of ScrabbleAssistant takes a string and returns a boolean indicating whether it is a word; that is, spellcheck("puppy") should return true but spellcheck("cloombish") returns false. The catch is that this function must run in \(O(1)\) average time!

To accomplish this, we will construct a hash table in ScrabbleAssistant which tracks legal words. The constructor of ScrabbleAssistant receives a vector of all of the legal words; you should add each of those words as a key to the hash table. (The value can be anything – false, 0, the word itself – because we will ignore it.) This way, the spellcheck method only needs to check this dictionary.

Once you have finished this task, the spellcheck-related tests should pass.

Anagrams

The anagramsOf method determines the dictionary words which can be created by rearranging a given string of characters. For instance, the word “bat” contains exactly the same letters as the word “tab” and no other English words contain exactly those letters. We also need this function to run in \(O(1)\) average time!

Similar to the above, we can accomplish this by pre-processing our word list in the ScrabbleAssistant constructor. First, write a helper function to sort the letters in a string (e.g. sortString("bat") should return "abt"). (You don’t have to do this yourself; the std::sort function of the algorithm library can help. Try std::sort(mystr.begin(), mystr.end()).) Then, in the constructor of ScrabbleAssistant, create a dictionary that maps each sorted string (e.g. "abt") to a vector of all words that sort to that string (here, a vector containing "bat" and "tab"). You’ll need to write this constructor code to add the words to the dictionary one at a time. This way, your findAnagrams function can return the vector to which the given string is mapped.

Power Set

Pressed Switch

The anagrams function lets us find all of the words made of exactly a given string of letters, but it doesn’t allow us to find words made of fewer letters. For instance, the string "at" can be made from letters in "atb", but it is not an anagram of them. To find all words, we can write a function to compute the power set of a given string: the strings which can be made by removing any number of letters from it. The power set of "atb", for instance, is made of the strings "atb", "at", "ab", "a", "tb", "t", "b", and "". Computing this power set does take \(O(2^n)\) time; fortunately, most words are small and so this doesn’t cost all that much.

You should write a stringPowerSet function to compute the power set of a given string. Here’s a recursive algorithm that you can use to write the function:

Eliminating Duplicate Strings

By the time we’re finished looking for all of the words we can make, we might find the same word multiple times. You should therefore write a deduplicate function that, given a vector<string>, returns a vector<string> which contains each word only once. You could approach this in different ways:

You can use any of the above approaches, although the last is the most efficient and least error-prone.

Finding Words

Once you’ve written the anagramsOf and stringPowerSet routines, you can combine them to solve the problem of finding words. The user will provide a string containing all of the letters available to make a word. You can find all legal words as follows:

  1. Find the power set of that string.
  2. Get all of the anagrams for each string in the power set.
  3. Put all of these anagrams into a single vector.
  4. Deduplicate that vector.
  5. Be happy!

Once you’ve implemented the findWords method as described above, your ScrabbleAssistant will be complete! You can run the provided program by providing the name of a dictionary file; if you don’t specify one, /usr/share/dict/words will be used. Congratulations!

Part III: Experimentation

For the final part of this lab, you will write some functions to perform experiments on your HashTable to compare its performance with that of the provided STLBST class (in the adts/impls/ directory). Your objective will be both to show that HashTable works well when given a good hashing algorithm and to show that it can work poorly when given a bad hashing algorithm.

The Experiment

Scientist Conducting Experiment

Most of the code for this experiment has been written in experiment.cpp. This code will create three different kinds of dictionaries, each of which contains a data type that stores a string and an int. One dictionary will be a HashTable using a “good” hash, another will be a HashTable using a “bad” hash, and the third will be the binary search tree STLBST. The experiment will add some number of elements to each dictionary and retrieve all of them; it reports the average amount of time per element spent inserting and retrieving mappings.

There are two different pair-of-int classes used in this experiment, both of which are defined in experimentClasses.cpp: the StringIntGoodHash class and the StringIntBadHash class. These classes are identical in every way except their name. In experimentHashes.cpp, however, each is given a different hash function. You should study the difference between these hash functions to understand what they do.

Your Task

In experiment.cpp, there is a single function that you need to write. The getExperimentKeys function is used by the experiment to determine which keys to add to the dictionary. You should write this function to produce keys that exploit the weakness of the bad hashing function; that is, write this function to cause hash collisions for StringIntBadHash. Note that every key returned by getExperimentKeys must be distinct; the experiment will not run if your mappings contain duplicate keys.

Once you have written your getExperimentKeys function, you use the experiment program to produce a graph much as we did with the mystery function program. This program does not require any arguments, so you can run it like so:

./experiment | gnuplot

This will both save a file called experiment.png as well as open a window showing you the results of your experiment. If you have successfully exploited the bad hash function, the runtime of the bad hash experiment should be comparatively high.

After you have completed your experiment, create a written summary that contains:

A couple paragraphs should be sufficient. As with previous assignments, this write-up is required to be in PDF form. If you would like to use LaTeX, the WrittenLab.tex file has been provided for you and you can use \includegraphics{experiment.png} to add your graph to the file.

Memory

Your code must run without any memory errors; make sure to use valgrind to find them. Memory leaks, on the other hand, are acceptable in small amounts and will not significantly impact your grade. In particular, “still reachable” memory is acceptable; “lost” memory should not occur in large amounts (e.g. millions of bytes).

Coding Style Requirements

You are required to observe some good coding practices:

Well-dressed people (style)

Peer Review

As this is a team assignment, you are required to complete a peer review of your lab partner or partners. You must do this even if everything is fine. You should complete this peer review after you have turned in your assignment. The peer review comes in the form of a simple Google Form and can be accessed here; in most cases, it will likely take less than a minute to complete. If you have trouble accessing this peer review, please make sure that you are logged into Google using your Swarthmore credentials.

Your peer review will be kept private; only course staff will have access to this information. You can also update your peer review after you have submitted it. However, if you do not submit a peer review, you will lose participation grade points. Please don’t forget!

Summary of Requirements

When you are finished, you should have