CS41 — Interesting Problems



For the first few weeks of the semester while we are working on some math preliminaries, the class will be divided into teams, each of which will be assigned one of the interesting problems below. Your team will design an algorithm to solve your problem, implement it, analyze it's complexity, and present your solutions both in front of the class and in a written summary. The work should be done without reading ahead, or consulting other texts teams, students, or the internet.

Groups will be assigned by either yourselves or me and will not change for the duration of the project.

By the next lab please submit a 1-2 page write up of how you are attacking the problem, progress, and pending work.

By the end of the project, groups should have working code, an analysis of their algorithm, a discussion of the approach used including test data, and a discussion of things you might have liked to do but did not have time for.

CPU scheduling
You have recently been hired by a large data center to develop a new algorithm for scheduling jobs on a single processor. Clients can submit jobs to the data center. Each job a(i) takes a known time t(i) to process on a single CPU. Clients will pay for jobs you succesfully run, but only if the job is completed by a specified deadline d(i). If you succeed in completing a job before the deadline, you are paid and amount p(i), otherwise you are paid nothing. Suppose you are given a set of n jobs a(i)=<t(i), d(i), p(i)> and you want to maximize the profit you can obtain by scheduling the jobs. It may not be possible to schedule all the jobs and have them all finish before their respective deadline. Design an algorithm that will create a job schedule and maximize the profit. You may assume that all times, deadlines, and profits are integers. How long does it take for you algorithm to run? What parameters does the run time of your algorithm depend on, e.g., number of jobs, maximum profit, latest deadline, longest running job? How do you know if it is correct? Do you think your asymptotic running time is optimal?

Sum of n dice
Most people are familiar with basic probabilities when rolling one die or two dice. For example, what is the probability of a roll of two random six sided dice summing to seven? But not much is typically known about k > 2 dice. In particular, if two players each throw n dice, what is the probability that they will roll the same sum? Design an algorithm to solve this problem. Your program should determine for n six-sided dice, the number of different ways to get a sum of k. You should consider each die unique so a roll of (1,2,3) is different than (3,1,2). This is supposed to be a programming exercise, not an exercise in advanced probability, so think more like a computer scientist. If you feel like you are bogged down in probability, please contact me and I can help you out with answering any questions you may have about the problem and guide you in better direction. How long does it take for you algorithm to run? What parameters does the run time of your algorithm depend on? How do you know if it is correct? Do you think your asymptotic running time is optimal? What is the maximum number of dice for which you can compute an answer? What is the limiting factor?

Google Maps Spatial Queries
Google maps and other map search engines allow users to search using text queries that describe geographic locations. Another way to search for objects would be to allow searches to be describe by a geographic box, circle, or in the most general case, an arbitrary two dimensional simply connected planar region. How would you design a data structure to store and query two dimensional point data using such query regions. It suffices to focus on bounded rectangular axis-aligned queries. Can you design a structure to support queries and updates? Analyze the space and runtime complexity of your approach.

Keep 'em separated
Election fever is sweeping through campus and a variety of student organizations are planning to staff a variety of booths over the weekend to support their cause. Groups have place their rectangular booths across Cunningham field (the rugby teams were away that weekend) in a haphazard manner. With no football team and no ruggers on campus to break up anticipated disputes, the campus administration has decided it would be safest if they could set up a temporary partition or fence to separate the two main groups of booths: Democrats and Republicans. The partition must be a simple straight line. Depending on the configuration of booths, such a partition may not be possible. Design an algorithm that determines if two groups of booths can be separated by a single line and returns one possible line if a separation is possible, or informs users that no such separation is possible. What is the complexity of your algorithm. Will it work on all possible configurations of booths, or only under limiting assumptions? Explain.

Intersection monitoring
Traffic in Philadelphia has become a mess. City planners would like to remote traffic monitors to observe traffic patterns to eventually commission construction project to relieve the traffic. Suppose a traffic monitor can only be placed at intersections and can only observe traffic along any road meeting at that intersection until the next intersection along that road. (It is easier for me to draw a picture of this). Naturally, the city would like to monitor all roads in the city with the minimal number of monitors. Design an algorithm that can determine the minimal number of monitors needed and their location such that all roads segments can be seen by at least one monitor.
Union of Rectangles
Given a collection of n axis-aligned rectangles, design an algorithm to compute the area of their union. If the rectangles are disjoint, the problem is easy. I want you do consider the case in which multiple rectangles can overlap. One application where this might be relevant is in photogrammetry and remote sensing. Suppose an aircraft or sattelite takes a series of snapshots of the Earth. Figuring out the percent of the Earth covered by the snapshots is roughly equivalent to computing the union of the rectangular regions of all the snapshots (curvature of Earth's surface and non-axis-aligned rectangles make the problem harder. If you solve this problem quickly, think about rectangles that may be rotated)
Range Minimum Queries
What were the high and low values of the S&P 500 in the past day, week, month, year, or decade. More generally, given two calendar days, what were the high and low values of the market between the two given days. Since the cases are symmetric, we are interested in an algorithm for supporting such range minimum queries. Given an array A of size n, we wish to preprocess A such that we can report the minimum value in any subarray of A quickly. Design a data structure and a query algorithm to solve this problem. Analyze you solution in terms of preprocessing time, space utilization for the data structure, and query time. Two obvious extremes are to do zero preprocessing and simply scan the list (O(1) preproc, O(n) space, O(n) query) or to compute all possible answers ahead of time and use a tree or hash table to find the results (O(n*n) preproc, O(n*n) space, O(log n) or O(1) query). Can you improve on these bounds? The best we could hope for is (O(1) preproc, O(n) space, O(1) query). Perhaps this is impossible as we have no time to preprocess, but the next best hope is to scan the array a few times, build a small index, and still achieve O(1) query after linear preprocessing time.
Art Gallerys
A local museum is exhibiting a large collection of priceless art. Naturally they would not want the art to be stolen so they would like to set up a number of cameras to observe the room. Unfortunately with artists being creative types the room is not a rectangle which can be observed with a single camera. Suppose the room is decribed by a simple but non-convex polygon. We wish to place cameras at the vertices of the polygon such that the entire interior of the polygon is visible by the cameras. Suppose the each camera can see arbitrarily far in a straight line as long as no walls obstruct the view. Furthermore assume the cameras have a viewing angle of 360 degrees. How many cameras are needed to cover a polygonal art gallery with n vertices? Can you prove both upper and lower bounds? This may sound a lot like the intersection monitoring problem which turned out to be rather hard (in terms of runtime) to find a solution. This problem has a polynomial solution. Your upper bound should not be in terms of big-Oh. Use exact values if possible (i.e., include the constant).
Line Segment Intersection
Given a collection of n line segments, report all intersections between the line segments. A brute force quadratic solution is possible by computing all possible pairs, but can you devise a more clever solution that may depend on the actual number of intersections k? Such algorithms all called output sensitive. Is there an upper bound to k. This problem is often used to solve spatial join operations in spatial databases. Given two sets of polygons, compute the union, intersection, or difference of the two sets. You only need to solve the segment intersection problem, but the second problem is interesting too.
Image classification with limited memory
A common task in image processing is to identify the number, size, and shape of connected regions in the image that have the same (or similar) intensity values. For example, a traffic camera might detect cars as darker blobs against a lighter higway background. An infared camera might identify animals as brighter spots against a dark backround. Suppose I am given a grid of pixels where each pixel has an integer intensity value. I would like to assign a unique label to each connected component in the image where a connected component is a connected region in the image where all pixels in the component have the same intensity value and for which I can go from any pixel in the connected component to any other pixel in the component via a path that is completely within the component. One easy solution is to run a breadth first search on each unvisited pixel to identify its connected compenent. This works great if I can load the entire image into memory. Suppose I add the constraint that only a constant number of rows fit into memory. Design an algorithm that still solves the problem, but reduces the number of times I need to load a row into memory.