CS35 Final Project Warmup

Oracles of Kevin Bacon or Baseball

Due 11:59pm Wednesday, 23 April 2008

This assignment is not complete yet.
You are encouraged to work with a partner on this project.

The Oracles of Bacon and Baseball
The Oracle of Bacon searches a movie database to link two actors/actresses by a path of movies and the co-stars who appeared in those movies. For example, if I want to link Kevin Bacon to Marilyn Monroe, the Oracle of Bacon would report:
The Oracle says: Marilyn Monroe has a Kevin Bacon number of 2.

Marilyn Monroe was in Misfits, The (1961) with Eli Wallach
Eli Wallach was in Mystic River (2003) with Kevin Bacon

Try playing this game yourself. Then check the Oracle. The Oracle usually finds a shorter answer. The number of movies you need to report is the link number. Can you find two movie stars with a link number greater than 3? Greater than 5? It is harder than it sounds. A similar concept is the Oracle of Baseball, which links baseball players through common teams. For example, linking Babe Ruth to David Ortiz reports:


Babe Ruth	played with	Ben Chapman	for the 1930 New York Yankees
Ben Chapman	played with	Early Wynn	for the 1941 Washington Senators
Early Wynn	played with	Tommy John	for the 1963 Cleveland Indians
Tommy John	played with	Roberto Kelly	for the 1988 New York Yankees
Roberto Kelly	played with	David Ortiz	for the 1997 Minnesota Twins

Give this Oracle a try too if you enjoy baseball. The terms Bacon Number and Erdős number are based on similar linking numbers in movie and math publication contexts. For more insanity, read about the Erdős-Bacon Number.

Your final project will ask you to implement a basic version of the Oracle of Bacon or the Oracle of Baseball. This week, you will be asked to start thinking about this project in an even more limited context.

Link Queries
For this week's lab, you will design one or more classes such that you can parse a data file and answer the following queries on movie data or baseball data (movie data shown below).
  1. Given an Actor/Actress, report all movies that he/she appeared in.
  2. Given a movie and year, report all actors/actresses in that movie.
The basic approach is to use two hash maps. One hash map uses actors' and actresses' first and last names as keys, and the other uses movie/year pairs. To simplify the implementation, you can treat both types of keys as Strings and simply concatenate the movie/year or first name/last name pairs. For example, Fargo (1996) can be represented as the key "Fargo 1996". You could also create Movie and Actor classes and use these as keys, but you have to be careful to define appropriate equals() and hashCode() methods, since a Java hash map relies on both to find a key. The values in each hash map should be some object type that can store a list of movies (in the actor/actress hash map) or a list of actors/actresses (in the other map).
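
The two-map approach described above could be sketched roughly as follows. This is only one possible design, not a required one: the class name LinkIndex and the methods addRecord, moviesFor, and actorsIn are made up for illustration, and the sketch assumes the simplified String keys ("First Last" and "Title Year") rather than Movie and Actor classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: the assignment does not prescribe any names.
class LinkIndex {
    // Key: "First Last"; value: movie keys of the form "Title Year".
    private final Map<String, List<String>> moviesByActor = new HashMap<>();
    // Key: "Title Year"; value: actor names of the form "First Last".
    private final Map<String, List<String>> actorsByMovie = new HashMap<>();

    // Record one (actor, movie) appearance in both maps.
    void addRecord(String first, String last, String title, String year) {
        String actor = first + " " + last;
        String movie = title + " " + year;
        moviesByActor.computeIfAbsent(actor, k -> new ArrayList<>()).add(movie);
        actorsByMovie.computeIfAbsent(movie, k -> new ArrayList<>()).add(actor);
    }

    // Query 1: all movies for an actor (empty list if the actor is unknown).
    List<String> moviesFor(String actor) {
        return moviesByActor.getOrDefault(actor, Collections.emptyList());
    }

    // Query 2: all actors in a movie (empty list if the movie is unknown).
    List<String> actorsIn(String title, String year) {
        return actorsByMovie.getOrDefault(title + " " + year, Collections.emptyList());
    }
}
```

Returning an empty list for an unknown key is a design choice made here so callers never see null; returning null and checking for it would work just as well.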
Data Files

For the baseball data, there is only one file: /usr/local/doc/BaseballLinks.txt. Each line represents one record with the following tab-delimited fields: player ID, first name, last name, year, team name, team abbreviation. A completely random ;) example is shown below:

vanslan01	Andy	Van Slyke	1991	Pittsburgh Pirates	PIT
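
As a rough illustration of the six-field baseball format, one record could be turned into an object like this. The class BaseballRecord and its playerKey and teamKey helpers are hypothetical names invented for this sketch; the key formats are an assumption modeled on the "Fargo 1996" style of concatenated String keys.

```java
// Hypothetical value class for one line of BaseballLinks.txt.
class BaseballRecord {
    final String playerId, firstName, lastName, year, teamName, teamAbbrev;

    private BaseballRecord(String[] f) {
        playerId = f[0];
        firstName = f[1];
        lastName = f[2];
        year = f[3];
        teamName = f[4];
        teamAbbrev = f[5];
    }

    // Split one tab-delimited line into its six fields.
    static BaseballRecord parse(String line) {
        String[] f = line.split("\t");
        if (f.length != 6) {
            throw new IllegalArgumentException("expected 6 tab-separated fields: " + line);
        }
        return new BaseballRecord(f);
    }

    // Key for the player map, e.g. "Andy Van Slyke".
    String playerKey() {
        return firstName + " " + lastName;
    }

    // Key for the team map, e.g. "1991 Pittsburgh Pirates".
    String teamKey() {
        return year + " " + teamName;
    }
}
```

Because the file also carries a player ID, that field could serve as the map key instead of the name if you ever want to tell apart two players who share a name.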

For the movie data, there are a number of files, each with the same format but a different number of records. Each line represents one record with the following tab-delimited fields: first name, last name, movie title, year. A sample is shown below:

Diedrich	Bader	Miss Congeniality 2: Armed & Fabulous	2005
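
Reading the movie files comes down to splitting each line on tabs. Below is a minimal sketch, assuming every valid line has exactly four fields; the class MovieRecordParser and its methods are illustrative names only, and skipping malformed lines (rather than stopping) is an assumption about how you might want to treat bad data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the tab-delimited movie files.
class MovieRecordParser {
    // Returns {first name, last name, title, year}, or null if the line
    // does not have exactly four tab-separated fields.
    static String[] parseLine(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 4) {
            return null;
        }
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        return fields;
    }

    // Reads every line from the reader and keeps only the well-formed records.
    static List<String[]> readAll(BufferedReader in) throws IOException {
        List<String[]> records = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            String[] fields = parseLine(line);
            if (fields != null) {
                records.add(fields);
            }
        }
        return records;
    }
}
```

To read a real file, wrap a java.io.FileReader for one of the data files in a BufferedReader and pass it to readAll.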

Because the full movie data file is over 100 MB, I have made some shorter versions. All are in the /usr/local/doc directory.

Requirements
By the lab deadline, your code should be able to answer the following queries for the data of your choice (baseball or movies, you do not need to support both). In the example below, I will refer to the movie data.
  1. Given an Actor/Actress, report all movies that he/she appeared in.
  2. Given a movie and year, report all actors/actresses in that movie.
  3. Write a short main program that tests your implementation.
  4. Allow an interactive option where users can repeatedly enter names of actors/actresses and see a list of movies in which they appeared. You should gracefully handle the case when a user types in an actor who is not in the database, but you do not need to handle misspelled names or disambiguate between actors who share the same name.
I recommend using one or more hash maps for this assignment. One can be a hash map on actors/actresses, and the other can be on movie-year pairs. Determine what the values of the entries in each hash map should be. You can decide to use something other than hash maps. Try to design a flexible approach so you can reuse much of your code for this lab in your final assignment.
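
The interactive option in requirement 4 might look something like the loop below. This is a sketch under assumptions, not a required design: the class ActorPrompt, the run method, the blank-line-to-quit convention, and the exact wording of the messages are all invented here. The input and output streams are passed as parameters so the loop can be exercised without a keyboard.

```java
import java.io.InputStream;
import java.io.PrintStream;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

// Hypothetical interactive front end over an actor -> movies map.
class ActorPrompt {
    static void run(Map<String, List<String>> moviesByActor,
                    InputStream in, PrintStream out) {
        Scanner scanner = new Scanner(in);
        out.println("Enter an actor's name (blank line to quit):");
        while (scanner.hasNextLine()) {
            String name = scanner.nextLine().trim();
            if (name.isEmpty()) {
                break; // blank line ends the session
            }
            List<String> movies = moviesByActor.get(name);
            if (movies == null) {
                // Unknown actor: report it and keep prompting.
                out.println(name + " is not in the database.");
            } else {
                for (String movie : movies) {
                    out.println("  " + movie);
                }
            }
        }
    }
}
```

From a main method this could be called as ActorPrompt.run(map, System.in, System.out), where map is whatever actor-keyed hash map you built while parsing the data file.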
Submitting
Along with your Java source code, you should hand in a short README file. These files will be imported automatically via handin35. Your README should give a short description of your implementation and any known problems.
About the Data
The baseball player data is the same data used by the site baseball-reference and is available in an alternate (more verbose) format at baseball-databank.org.

The movie data comes from The Internet Movie Database and is available in an alternate (way more huge) format as part of their alternate interfaces. The data has been cleaned up a bit to remove TV shows and a number of unsavory direct-to-video releases (e.g., "The Land Before Time XIII: The Wisdom of Friends").