CS35 Lab5: Doubly Linked Lists

and Move to Front

Due 11:59pm Wednesday, 6 October 2010
For this assignment, you will further explore linked structures by implementing part of a doubly linked list. Additionally, you will implement a Move to Front heuristic and decide if it helps speed up query performance. You will primarily implement a placeList which allows users to search for cities in the United States.

A skeleton version of the programs will appear in your cs35/labs/05 directory when you run update35. The program handin35 will only submit files in this directory.

You may work with one partner on this assignment. If you work with a partner, be sure to put both names on your assignment when submitting.

For this lab, you will be using data in the file /usr/local/doc/places.txt which contains a list of 30000+ US cities. Below is a snippet of the file:
19078,39.878212,-075.323082,Ridley Park,Delaware,PA,7196
19079,39.901112,-075.267628,Sharon Hill,Delaware,PA,5468
19082,39.951662,-075.285429,Upper Darby,Delaware,PA,50
Each line contains the ZIP code, latitude, longitude, city name, county name, state, and population, all separated by commas. The population data is not terribly accurate as indicated even in this small example. Places with an unknown population are listed with a population of 50.

To aid in processing the data, I have provided you with two classes Place and PlaceIterator. You should only need the browse the appropriate .h files to get a feel of what these classes do. You do not need to read or understand the corresponding .cpp files. The PlaceIterator class allows you to quickly loop over the data as shown in test.cpp.

 PlaceIterator it(fname);
   p = it.next();
Your initial goal is to implement the PlaceList class described in placeList.h and partially implemented in placeList.cpp. The PlaceList class should be implemented as a doubly linked list, such that each node in the list has pointers to both the next and previous node. A templated DNode class is provided for you in dnode.*. You should not change the names, parameter list, or return types of any of the methods in the PlaceList class, but you are welcome to add additional methods as you see fit. Note that the PlaceList class is not templated. You can only create lists of Place objects. Each occurrence of a DNode in the PlaceList class is always DNode<Place>. Because PlaceList is not templated (it only uses the templated DNode class), you should implement your PlaceList in placeList.cpp.
Querying and Move to Front
In addition to practicing using linked structures and understanding doubly linked lists, a goal of this lab is to test a move to front heuristic for improving query performance. Think about how getPlace or hasPlace is implemented in a linked list. Searching for something near the end of the list takes longer. The move to front heuristic takes recent queries and moves the appropriate item to the front of the list. The hope is that by moving items towards the front, they will be easier to find later.

To test this hypothesis, you will run a series of tests. Each test will have the following general setup:

Randomly generate a list of queries
Execute the queries on a PlaceList and time the results
For the first part, you will generate lists of random queries in two separate ways. In the first way, you will select each place randomly with equal (or uniform) probability. In the second way, you will select places randomly but with probability weighted by population, such that bigger places will be picked more often.

For the second part, you will execute queries two different ways. Once without applying the move to front heuristic on any query, and once applying the move to front heuristic on all queries. You should have four tests total.

Since this is not a stats class, I have provided a placeSampler class which allows you to sample places uniformly or weighted by population. test.cpp shows examples on using each of these methods.

string fname = "/usr/local/doc/places.txt"; 
PlaceSampler sample(fname); 
Place p;
int nsamps  = 100;
for(i=0; i<nsamps; i++){
   p=sample.uniform(); //or sample.randomByPop()
   cout << i <<  ": ";

You should generate a PlaceList of queries and then execute these queries sequentially on the list of all Places. When comparing the runtimes with and without the move to front heuristic, use the same query list, so you are timing the effect of the heuristic, not the distribution of queries.

Further experiment with your test code by generating different number of queries (try 100, 500, 1000, etc.). You longest tests should run for at least 20 seconds on uniformly random queries and no move to front heuristic.

Along with your C++ source code, you should hand in a file named README.txt. Your README should include a brief summary of your results, including a discussion of the advantages/disadvantages of the move to front heuristic for uniformly sampled data and data that have elements that are significantly more popular than others. Describe how you tested your implementations. Do you have any problems/bugs in your code? For each of the methods in your PlaceList class, give big-Oh runtime bounds. You may separately highlight best-case, worst-case, and/or expected case bounds if the run-time depends on some factors other than the number of elements n in the list. Also provide the average time per query in each of the four tests (Uniform w/o heuristic, Uniform with heuristic, Population w/o heuristic, Population with heuristic). Be sure to state the number of queries executed in each test when reporting your results.
If you need extra motivation and have trouble tracking the crazy moves of doubly linked nodes, perhaps the following can inspire you, though it mostly describes a Take it back now y'all heuristic instead of move to the front.

Once you are satisfied with your code, hand it in by typing handin35. This will copy the code from your cs35/labs/05 to my grading directory. You may run handin35 as many times as you like, and only the most recent submission will be recorded.