Data Structures and Algorithms

Lab 8: Scrabble Assistant

Due on Sunday, November 20th at 11:59 PM. This is a team lab. You and your assigned lab partner(s) will complete this lab together. Make sure that you are familiar with the Partner Etiquette guidelines. You may discuss the concepts of this lab with other classmates, but you may not share your code with anyone other than course staff and your lab partner(s). Do not look at solutions written by students other than your team. If your team needs help, please post on the Piazza forum or contact the instructor or ninjas. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.

Overview

In this lab, you will write a hash table implementation for a dictionary data structure. (Unit tests for this assignment have been provided.) You will then use that implementation to create a tool which assists in finding solutions to tiled word games (such as Scrabble). You will also provide a written analysis of the performance of your hash table.

Starting Code

Source: OpenClipArt.org

Your starting code includes the following files. Bolded files are those which you may need to change.

adts/: As usual, the ADT interface declarations are stored here. Reference implementations of previously-covered data structures (like lists) are stored in the subdirectory impls. Review the dictionary.h interface which should be familiar from the previous lab.
hashes.h/hashes.cpp: Some hash implementations for the types you will use in your table.
hashTable.h/hashTable__definitions.h: The HashTable class. Only the public methods have been provided as part of the declaration; you will need to write your own private fields and members.
scrabbleAssistant.h/scrabbleAssistant.cpp: The ScrabbleAssistant class, which bundles up the work of identifying anagrams and similar tasks.
main.cpp: The file where the main function for the program. This program is already written but relies upon the ScrabbleAssistant class you will write.
tests.cpp: Unit tests for your hash table data structure.
manualTests.cpp: A program which you can use to perform sandbox tests of your code.
experiment.cpp: A program which you will use to experiment with the performance of your hash table.
experimentClasses.h/experimentClasses.cpp: A module containing classes used while experimenting with the performance of HashTable.
experimentHashes.h/experimentHashes.cpp: A module containing the hash functions which are used in the experimental program.

Part I: `HashTable`

For the first part of this assignment, you will implement a separate chaining hash table as we discussed in class. (You will not implement a linear probing hash table.) One of the constructors allows the user to specify a maximum load factor; the other should use a sensible default value (e.g. 0.75). Unit tests have been provided for this lab; once you have completed your HashTable implementation, the appropriate unit tests should pass. Further, all operations on your HashTable (other than getKeys and getItems) must run in \(O(1)\) average time.

Note that hashing functions have been provided in hashes.h and hashes.cpp for keys of type int and string. If you want to use any other type of data as the key in your hash table, you’ll need to write an appropriate hashing function. Do not worry about experiment.cpp for this part of your lab; we will use that file later.

You should consider using an array of statically-allocated vector objects to represent the buckets of your HashTable. Statically-allocated vectors are easier to use and less error-prone, especially when they are stored in an array. You could also use the STL class list (a linked list implementation) or even the provided STLList class.

Memory Errors and Negative Modulus

In C++, the modulus operator can return a negative number:

cout << -5 % 3 << endl; // prints "-2"!
int h = hash(key); // could be a negative int
int index = h % capacity; // could be a negative number -- bad index!

This could be a problem if we then use that negative number as an array index!

To solve this, we can simply increase the result of modulus by the modulus base if turns out to be negative:

int x = -5 % 3;
if (x < 0) {
    x += 3;
}
cout << x << endl; // prints "1"

A helper function to do this for you, positive_modulus, appears in hashes.h/hashes.cpp. Make sure to use it!

Removing From A `vector`

The vector class provides a means to remove an element from the end of a list (pop_back) but doesn’t give a clear interface for removing elements e.g. from an arbitrary index. Vectors provide a general function called erase which can delete entire contiguous regions:

vector<int> vec = ...;
vec.erase(vec.begin() + start_index, vec.begin() + end_index);

The above function works much like a Python slice: the start index is included but the end index is not. Thus, remove_range_from_vector(1,3) would remove the second and third elements (indices 1 and 2) from the vector. If you want to delete a single element from a vector, you can do something like this:

vector<int> vec = ...; // whatever defines this vector
int index = ...; // e.g. 4
vec.erase(vec.begin() + index, vec.begin() + index + 1);

Part II: Scrabble Assistant

Source: OpenClipArt.org

Once you have a working HashTable, you can write the Scrabble Assistant. We will design the application in two parts: a ScrabbleAssistant class (which does all of the interesting work) and a main function (which provides the interface that allows the end user to interact with that object).

The work of the ScrabbleAssistant can be broken into the following parts:

Keeping track of which words are legal.
Finding the anagrams of a particular word.
Finding the “power set” of a string.
Eliminating duplicate strings in a list.
Finding all of the legal plays given a string containing the player’s tiles.

Legal Words

The spellcheck method of ScrabbleAssistant takes a string and returns a boolean indicating whether it is a word; that is, spellcheck("puppy") should return true but spellcheck("cloombish") returns false. The catch is that this function must run in \(O(1)\) average time!

To accomplish this, we will construct a hash table in ScrabbleAssistant which tracks legal words. The constructor of ScrabbleAssistant receives a vector of all of the legal words; you should add each of those words as a key to the hash table. (The value can be anything – false, 0, the word itself – because we will ignore it.) This way, the spellcheck method only needs to check this dictionary.

Once you have finished this task, the spellcheck-related tests should pass.

Anagrams

The anagramsOf method determines the dictionary words which can be created by rearranging a given string of characters. For instance, the word “bat” contains exactly the same letters as the word “tab” and no other English words contain exactly those letters. We also need this function to run in \(O(1)\) average time!

Similar to the above, we can accomplish this by pre-processing our word list in the ScrabbleAssistant constructor. First, write a helper function to sort the letters in a string (e.g. sortString("bat") should return "abt"). (You don’t have to do this yourself; the std::sort function of the algorithm library can help. Try std::sort(mystr.begin(), mystr.end()).) Then, in the constructor of ScrabbleAssistant, create a dictionary that maps each sorted string (e.g. "abt") to a vector of all words that sort to that string (here, a vector containing "bat" and "tab"). You’ll need to write this constructor code to add the words to the dictionary one at a time. This way, your findAnagrams function can return the vector to which the given string is mapped.

Power Set

Source: OpenClipArt.org

The anagrams function lets us find all of the words made of exactly a given string of letters, but it doesn’t allow us to find words made of fewer letters. For instance, the string "at" can be made from letters in "atb", but it is not an anagram of them. To find all words, we can write a function to compute the power set of a given string: the strings which can be made by removing any number of letters from it. The power set of "atb", for instance, is made of the strings "atb", "at", "ab", "a", "tb", "t", "b", and "". Computing this power set does take \(O(2^n)\) time; fortunately, most words are small and so this doesn’t cost all that much.

You should write a stringPowerSet function to compute the power set of a given string. Here’s a recursive algorithm that you can use to write the function:

Eliminating Duplicate Strings

By the time we’re finished looking for all of the words we can make, we might find the same word multiple times. You should therefore write a deduplicate function that, given a vector<string>, returns a vector<string> which contains each word only once. You could approach this in different ways:

Nested loop over the vector, looking for matches. This is \(O(n^2)\) time.
Linear search for adjacent equal strings. This only works if the vector is sorted, but takes \(O(n)\) time. (The sort takes \(O(n \log n)\) time.)
Add every string in the vector as a key to a hash table (with a dummy value like 0, as described above for checking valid words). Then, get all keys from the hash table. This takes \(O(n)\) time on average.

You can use any of the above approaches, although the last is the most efficient and least error-prone.

Finding Words

Once you’ve written the anagramsOf and stringPowerSet routines, you can combine them to solve the problem of finding words. The user will provide a string containing all of the letters available to make a word. You can find all legal words as follows:

Find the power set of that string.
Get all of the anagrams for each string in the power set.
Put all of these anagrams into a single vector.
Deduplicate that vector.
Be happy!

Once you’ve implemented the findWords method as described above, your ScrabbleAssistant will be complete! You can run the provided program by providing the name of a dictionary file; if you don’t specify one, /usr/share/dict/words will be used. Congratulations!

Part III: Experimentation

For the final part of this lab, you will write some functions to perform experiments on your HashTable to compare its performance with that of the provided STLBST class (in the adts/impls/ directory). Your objective will be both to show that HashTable works well when given a good hashing algorithm and to show that it can work poorly when given a bad hashing algorithm.

The Experiment

Source: OpenClipArt.org

Most of the code for this experiment has been written in experiment.cpp. This code will create three different kinds of dictionaries, each of which contains a data type that stores a string and an int. One dictionary will be a HashTable using a “good” hash, another will be a HashTable using a “bad” hash, and the third will be the binary search tree STLBST. The experiment will add some number of elements to each dictionary and retrieve all of them; it reports the average amount of time per element spent inserting and retrieving mappings.

There are two different pair-of-int classes used in this experiment, both of which are defined in experimentClasses.cpp: the StringIntGoodHash class and the StringIntBadHash class. These classes are identical in every way except their name. In experimentHashes.cpp, however, each is given a different hash function. You should study the difference between these hash functions to understand what they do.

Your Task

In experiment.cpp, there is a single function that you need to write. The getExperimentKeys function is used by the experiment to determine which keys to add to the dictionary. You should write this function to produce keys that exploit the weakness of the bad hashing function; that is, write this function to cause hash collisions for StringIntBadHash. Note that every key returned by getExperimentKeys must be distinct; the experiment will not run if your mappings contain duplicate keys.

Once you have written your getExperimentKeys function, you use the experiment program to produce a graph much as we did with the mystery function program. This program does not require any arguments, so you can run it like so:

./experiment | gnuplot

This will both save a file called experiment.png as well as open a window showing you the results of your experiment. If you have successfully exploited the bad hash function, the runtime of the bad hash experiment should be comparatively high.

After you have completed your experiment, create a written summary that contains:

Your experimental graph
A description of how your getExperimentKeys function exploits the bad hashing algorithm
A brief discussion of why the graph looks like it does (in terms of the big-O complexity of the different dictionaries)

A couple paragraphs should be sufficient. As with previous assignments, this write-up is required to be in PDF form. If you would like to use LaTeX, the WrittenLab.tex file has been provided for you and you can use \includegraphics{experiment.png} to add your graph to the file.

Memory

Your code must run without any memory errors; make sure to use valgrind to find them. Memory leaks, on the other hand, are acceptable in small amounts and will not significantly impact your grade. In particular, “still reachable” memory is acceptable; “lost” memory should not occur in large amounts (e.g. millions of bytes).

Coding Style Requirements

You are required to observe some good coding practices:

Source: OpenClipArt.org

You should pick meaningful variable names.

// Good
int* pixels = new int[size];
// Bad
int* p = new int[size];

You should use correct and consistent indentation. Lines of code within a block (that is, surrounded by { and }) should be indented four spaces further than the lines surrounding them.
```
// Good
if (condition) {
    cout << "Test" << endl;
}
// Bad
if (condition) {
cout << "Test" << endl;
    }
```
You should use a block whenever possible, even if it’s not necessary. This helps you avoid subtle or messy bugs in the future.
```
// Good
if (condition) {
    cout << "Something" << endl;
}
// Bad
if (condition)
    cout << "Something" << endl;
```
Any new methods or fields in your header files should have comments explaining their purpose and behavior. You are permitted to omit documentation for methods that are inherited from other classes; that is, if your class has a foo method because its superclass has a foo method, you don’t need to document that method.
```
// Good
public:
    /**
     * Saves the image represented by this object to a file.
     * @param filename The name of the file to save.
     * @return The number of pixels in the saved image.
     * @throws runtime_error If the image could not be saved due to an I/O error.
     */
    int save(std::string filename);

// Bad
public:
    int save(std::string filename);
```
Your method/field documentation does not have to be in the format above, but you must describe the method’s behavior, its parameters, its return value, and any exceptions that it may throw. (If you’re indifferent, the above syntax is a good one to know; it’s a de facto standard used by Javadoc, Doxygen, and other tools that automatically process source code comments into other formats like searchable webpages.)

Peer Review

As this is a team assignment, you are required to complete a peer review of your lab partner or partners. You must do this even if everything is fine. You should complete this peer review after you have turned in your assignment. The peer review comes in the form of a simple Google Form and can be accessed here; in most cases, it will likely take less than a minute to complete. If you have trouble accessing this peer review, please make sure that you are logged into Google using your Swarthmore credentials.

Your peer review will be kept private; only course staff will have access to this information. You can also update your peer review after you have submitted it. However, if you do not submit a peer review, you will lose participation grade points. Please don’t forget!

Summary of Requirements

When you are finished, you should have

A working HashTable data structure
A ScrabbleAssistant class that powers the provided program
An experimental write-up demonstrating the importance of good hash functions
No memory errors
Code which conforms to the style requirements above
A completed peer review

Lab 8: Scrabble Assistant

Overview

Starting Code

Part I: HashTable

Memory Errors and Negative Modulus

Removing From A vector

Part II: Scrabble Assistant

Legal Words

Anagrams

Power Set

Eliminating Duplicate Strings

Finding Words

Part III: Experimentation

The Experiment

Your Task

Memory

Coding Style Requirements

Peer Review

Summary of Requirements

Part I: `HashTable`

Removing From A `vector`