CS35 Lab10: Oracle of Bacon

Due by midnight Wednesday, December 9

A skeleton version of the programs will appear in your cs35/lab/10 directory when you run update35. The program handin35 will only submit files in this directory.

You are encouraged to work with a partner on this lab.

Introduction
The Oracle of Bacon searches a movie database to link Kevin Bacon to other actors by a path of movies and co-stars that appeared in those movies. For example, if you wanted to link Kurt Russell with Kevin Bacon, the Oracle might report:
Kurt Russell was in Mean Season, The (1985) with Andy Garcia
Andy Garcia was in Air I Breathe, The (2007) with Kevin Bacon

Kurt Russell has a Kevin Bacon number of 2
This linking concept can be extended to any two actors. For example, if you wanted to link Reese Witherspoon with John Wayne, the Oracle might report:
Reese Witherspoon was in Rendition (2007) with Alan Arkin
Alan Arkin was in America at the Movies (1976) with John Wayne

Reese Witherspoon has a John Wayne number of 2

The same linking concept that originated with the Bacon Number has now been extended to mathematics with the Erdos Number and even to the wacky combination of the two with the Erdos-Bacon Number.

In order to implement the search for links between actors in the movie database you will use a number of the data structures and techniques that we've explored throughout this course, including lists, queues, dictionaries, and breadth-first search.

Requirements

Your program should provide an interactive menu that allows the user to request the following information from the database:

  1. Given an actor's name, report all the movies that he or she appeared in.
  2. Given a movie title and year, report all the actors that appeared in the movie.
  3. Given two actor's names, report the shortest chain that links them. If no chain exists, then inform the user.

Your program should handle incorrect information gracefully such as when the actor name or movie title entered is not in the database.

Data Files

The data for this lab can be found in the directory /usr/local/doc/. The complete set of data is over 100MB. A number of trimmed down versions of the data are available. I strongly urge you to initially test your program on the smaller versions before attempting the larger ones. At a minimum your program should be able to handle the two links version of the data.


Getting Started

  1. I have provided you a starting point file called OracleOfBacon.cpp that you must complete. It contains a function that will open and read the movie data. Run make and test that this works.
  2. In order to manipulate this large set of data efficiently, you should create two hash tables. Note that these two tables should be quite large in size (on the order of 99,999 buckets) and should remain in existence throughout the life of the program. This does not mean that the tables should be global variables! You can either define them in the main program and pass them to functions that need them, or define a class with methods that have access to the tables.

    One table will map actor names to a list of movie titles. The other table will map movie titles (including the year) to a list of actors that appeared in the movies. For the lists, you may use the ArrayList data structure provided or your own implementation.

  3. In the main program, create an initial version of the interactive menu. Note that if you read in the user's menu choice as an integer, then the newline that follows their choice will still be sitting in the input stream. In order to dispose of this you can do cin.get() before you try to read in the next string. Remember that you can use cin.getLine(...) to read in a string containing spaces.

    Using the two hash tables, your program should be able to (1) look up an actor's name and report all the movies the actor appeared in; and (2) look up a movie title and year and report all the actors that were in the movie.

  4. Once these simpler requests are working correctly you are ready to tackle the harder problem of finding the links between two actors. I have provided two starting point files called OracleNode.h and OracleNode.cpp that you must complete. Use these to define and implement a class to represent the nodes you create during your search. This class is relatively simple and is primarily used to group together the data you need to report the final results.
  5. Implement the search itself in the OracleOfBacon.cpp main program. You will not need to create a premanent graph. Instead, for each request from the user for the links between two actors, you will dynamically create OracleNodes and add them into a queue for breadth-first search.
  6. Because the movie database is so large, you should ensure that only new information is added to the queue. This can be done by creating a third hash table that simply stores all the actor's names encountered during the search. If the name is in the table, then you know that the data associated with that actor is already in the queue and need not be generated again. This third hash table should be created anew for each search.

Optional feature

If you'd like an additional challenge, try this optional feature. For a given actor, report another actor whose path is finite and maximal. In other words, given Kevin Bacon as a query, who is the least linkable to Kevin Bacon and how long is his/her path? Note the solution may not be unique. You just have to report one candidate.

Submit
Once you are satisfied with your code, hand it in by typing handin35. This will copy the code from your cs35/lab/10 to my grading directory. You may run handin35 as many times as you like, and only the most recent submission will be recorded. If you worked with a partner, only one of you needs to handin the code.