Week 8, Monday: Searching

review Quiz 3 (see cs21/inclass/w08-searching/q3.py)
lab 7 design due this week

searching

Imagine we are searching for some x in a list L. For example, we could have x=20 and L=[5,60,34,89,16,33]. Different search questions we can ask:

is x in the list L? True/False
where is it in the list?
how many times is it in the list?
if it is not in the list, what is the closest/best match?

For now, let's just focus on the first question. Python gives us a simple way to answer that:

>>> x=20
>>> L=[5,60,34,89,16,33]
>>> print(x in L)
False

If we didn't have the in operator, we could easily write this as a function:

def search(x, L):
   """return True if x found in list L"""
   for item in L:
      if item == x:
         return True

   # only gets here if not found
   return False

Analysis of Algorithms

Part of computer science is analyzing algorithms to see how good they are. Is the above search(x,L) function better, worse, or the same as just using x in L? Is there another way to search for x in L? How can we compare algorithms?

One simple way to compare algorithms is to just time how long they take on the same data. Another way is to examine the algorithm and figure out (roughly) how many steps it will take to execute. Furthermore, it is often useful to know how the number of steps needed depends on the size of the input data.

For example, in the above search(x,L) function, if len(L) is 10, then there are 10 comparisons done (if item == x). There are a few more steps in that function (e.g., returning True or False), but the for loop is the main part. If we double the size of the list L, we will double the number of steps needed. This is called a linear search, because there is a linear dependence between the size of the list and the number of steps needed (or the time it takes to run). If we were to plot the size of the list vs the number of steps needed, it would be a straight (linear) line.

binary search

If the data were sorted from low to high, is there a better way to search for x in L?

L = [5, 16, 33, 34, 60, 89]

If we are still searching for x=20, there is no point searching past the 33, since we know the list is sorted from low to high, and we are already past what we are looking for.

Here is another example: if I say "I am thinking of a number from 1 to 100", and that when you guess, I will tell you if you are too high or too low, what is the first number you would guess?

A linear search would guess one number at a time, from 1 to 100.

A binary search would pick the middle and first guess 50. If that is too low, we can then concentrate on the upper half (51-100), and we have already eliminated half of the list! In a binary search, each guess should split the remaining list in half, and eliminate one half.

importing our own modules

I want us to write two search functions: linearSearch(x,L) and binarySearch(x,L), which we can use in our other programs. For example, if we put these two functions in a file called searches.py, I want to be able to say from searches import * in my other programs and be able to use those two search functions!

Here is the start of searches.py, with the linearSearch(x,L) function in it. Note the test code at the bottom of the file:

def linearSearch(x,L):
  """return True if x is in L, False if not"""

  for item in L:
    if item == x:
      return True

  # only gets here if not found!
  return False

if __name__ == "__main__":
  # put test code here
  L = range(10)
  x = 5
  print("%d %s %s" % (x, str(L), linearSearch(x,L)))
  x = 500
  print("%d %s %s" % (x, str(L), linearSearch(x,L)))

The if __name__ == "__main__": part just decides if this searches.py file is being imported or run. If it is being run, like python searches.py, then we want to run the test code. If it is being imported, to be used in another program, then we do not want to run the test code (if we are importing it, we assume it already works!).

On Wednesday we will add the binarySearch(x,L) function to the searches.py file.