A skeleton version of the programs will appear in your
cs35/lab/10 directory when you run update35. The
program handin35 will only submit files in this
directory.
You are encouraged to work with a partner on this lab.
Introduction
The
Oracle of Bacon searches a
movie database to link Kevin Bacon to other actors by a
path of movies and co-stars that appeared in those movies. For
example, if you wanted to link Kurt Russell with Kevin Bacon, the
Oracle might report:
Kurt Russell was in Mean Season, The (1985) with Andy Garcia
Andy Garcia was in Air I Breathe, The (2007) with Kevin Bacon
Kurt Russell has a Kevin Bacon number of 2
This linking concept can be extended to any two actors. For example,
if you wanted to link Reese Witherspoon with John Wayne, the Oracle
might report:
Reese Witherspoon was in Rendition (2007) with Alan Arkin
Alan Arkin was in America at the Movies (1976) with John Wayne
Reese Witherspoon has a John Wayne number of 2
The same linking concept that originated with the Bacon Number
has now been extended to mathematics with the
Erdos Number
and even to the wacky combination of the two with the
Erdos-Bacon
Number.
In order to implement the search for links between actors in the
movie database you will use a number of the data structures and
techniques that we've explored throughout this course, including
lists, queues, dictionaries, and breadth-first search.
Requirements
Your program should provide an interactive menu that allows the user
to request the following information from the database:
- Given an actor's name, report all the movies that he
or she appeared in.
- Given a movie title and year, report all the actors
that appeared in the movie.
- Given two actor's names, report the shortest chain that links
them. If no chain exists, then inform the user.
Your program should handle incorrect information gracefully such as
when the actor name or movie title entered is not in the database.
Data Files
The data for this lab can be found in the directory
/usr/local/doc/. The complete set of data is over 100MB. A
number of trimmed down versions of the data are available. I strongly
urge you to initially test your program on the smaller versions before
attempting the larger ones. At a minimum your program should be able
to handle the two links version of the data.
- Bacon0Links.txt
All records for only the movies that
Kevin Bacon appeared in (57 records, 1.9KB)
- Bacon1Links.txt
All records for Kevin Bacon and his
co-stars (25324 records, 898KB)
- Bacon2Links.txt
All records for Kevin Bacon, his
co-stars, and his co-stars co-stars (424616 records, 16MB)
- BaconModernPopLinks.txt
All records for actors that
have appeared in at least 10 movies since 1970 (747497 records,
27MB)
- BaconAllLinks.txt
All records except for TV show
appearances or direct release to video (3,146,321 records, 113MB)
Getting Started
- I have provided you a starting point file called
OracleOfBacon.cpp that you must complete. It contains a
function that will open and read the movie data. Run make and
test that this works.
- In order to manipulate this large set of data efficiently, you
should create two hash tables. Note that these two tables should be
quite large in size (on the order of 99,999 buckets) and should remain
in existence throughout the life of the program. This does not mean
that the tables should be global variables! You can either define them
in the main program and pass them to functions that need them, or
define a class with methods that have access to the tables.
One table will map actor names to a list of movie titles. The
other table will map movie titles (including the year) to a list of
actors that appeared in the movies. For the lists, you may use the
ArrayList data structure provided or your own
implementation.
- In the main program, create an initial version of the
interactive menu. Note that if you read in the user's menu choice as
an integer, then the newline that follows their choice will still be
sitting in the input stream. In order to dispose of this you can do
cin.get() before you try to read in the next string.
Remember that you can use cin.getLine(...) to read in a
string containing spaces.
Using the two hash tables, your program should be
able to (1) look up an actor's name and report all the movies the actor
appeared in; and (2) look up a movie title and year and report all the
actors that were in the movie.
- Once these simpler requests are working correctly you are
ready to tackle the harder problem of finding the links between two
actors. I have provided two starting point files called
OracleNode.h and OracleNode.cpp that you must
complete. Use these to define and implement a class to represent the
nodes you create during your search. This class is relatively simple
and is primarily used to group together the data you need to report
the final results.
- Implement the search itself in the OracleOfBacon.cpp
main program. You will not need to create a premanent graph.
Instead, for each request from the user for the links between two
actors, you will dynamically create OracleNodes and add them
into a queue for breadth-first search.
- Because the movie database is so large, you should ensure that
only new information is added to the queue. This can be done by
creating a third hash table that simply stores all the actor's names
encountered during the search. If the name is in the table, then you
know that the data associated with that actor is already in the queue
and need not be generated again. This third hash table should be
created anew for each search.
Optional feature
If you'd like an additional challenge, try this optional
feature. For a given actor, report another actor whose path is finite
and maximal. In other words, given Kevin Bacon as a query, who is the
least linkable to Kevin Bacon and how long is his/her path? Note the
solution may not be unique. You just have to report one candidate.
Submit
Once you are satisfied with your code, hand it in by typing
handin35. This will copy the code from your
cs35/lab/10 to my grading directory. You may run
handin35 as many times as you like, and only the most recent
submission will be recorded. If you worked with a partner, only one of
you needs to handin the code.