knerr cs21 notes...
back to schedule
WEEK08: searching, analysis of algorithms
---------------------------------------------------------------
M: linear and binary search
ANNOUNCEMENTS:
- quiz 4 this Friday...while loops, file I/O!
- Lab 7 due *next* Tuesday night (Nov 3)
- if you want to use Andy Danner's lab 6 functions in your
lab 7, do this:
from lab06 import *
then use his GetAConsonant, GetMenuOption, IsAVowel, PrintMenu
functions. For the function help info, do this:
>>> import lab06
>>> help(lab06)
NOTE: we're mostly done with teaching you python at this point.
There's lots more you could learn about python, but you've got
the basics now, and enough to write some interesting programs.
From here on out we'll focus more on Computer Science, rather
than computer programming...
SEARCHING:
- basic search question...is x in y, where x is some single
value, and y is a group of values
- for us, x is a simple variable, and y is a list
- if x = 12, and y = [7,30,-29,-4,88,49,-29,100,92], is x in y??
- you can actually use the "in" operator to answer this:
>>> x = 12
>>> y = [7,30,-29,-4,88,49,-29,100,92]
>>> x in y
False
>>> x = 88
>>> x in y
True
- how does the "in" operator work?? how would you search through
the list and see if a certain value is there or not??
LINEAR SEARCH: go one at a time, through the whole list (a for loop!)
def linSearch(x, nums):
for i in range(len(nums)):
if nums[i] == x: # item found, return the index value
return i
return -1 # loop finished, item was not in list
- is linSearch any good? how long does it take? does it
matter if len(nums) is 10 or 10,000,000,000???
- in the best case, the item you're looking for is first in the
list. In the worst case, the item is last or not in the list:
best case: takes 1 "step"
worst case: takes len(nums) "steps"
- if we double the size of the list (nums), the worst case also
doubles in size. This means the time it takes linSearch to work
is proportional to the size of the list
BINARY SEARCH:
- *if* the list is sorted, there's a better way!
- start in the middle of the list -- if that's the one you're looking
for, then you're done. If not, what you're looking for must either
be greater or less than the middle value. Either way, you can ignore
one half of the list and start the process over with the other half
of the list. Here's an example:
>>> y.sort()
>>> print y
[-29, -29, -4, 7, 30, 49, 88, 92, 100]
suppose we're looking for x = 12 in y.
start with the 4th element (30). That's not it, and we know the
list is sorted, and we know 12 is less than 30, so let's refocus
on the left half of the list [-29, -29, -4, 7].
Pick the middle of the left-half -- either the 1st or 2nd element,
and check for x. Let's say we picked the 1st element (-29), which
is not what we are looking for, and x=12 is greater than -29, so
now we can refocus on just [-4, 7] (and so on....)
- the point is, each time we *halve* the list.
- if the list has 8 elements in it, we can only halve that 3 times
until we've either found x or it's not in the list
- if the list has 16 elements in it, we can only halve that 4 times.
- so each time we double the size of the list, it's only one more
step we need to make!!! (unlike the linear search, where each time
we double the size of the list, we double the number of steps needed)
- so the time for binary search to do it's work is equal to "how many
times can we halve the size of the list"
- the question "how many times can we halve the list" can be answered
with logarithms: log(base 2) of N = how many times we can halve N.
try this in python:
>>> import math
>>> math.log(8,2)
3.0
>>> math.log(16,2)
4.0
>>> math.log(32,2)
5.0
each time we double N, that's only one more step...
and this pays off big when N is large:
>>> math.log(1000000,2)
19.93
so even if the list had 1 million elements, it would still only
take about 20 steps to find what we're looking for (linear search
would take 1 million steps in the worst case).
YOUR TURN:
- copy and try these programs for N=100000, 200000, 400000, 800000:
/home/jk/inclass/linearsearch.py
/home/jk/inclass/binarysearch.py
$ python linearsearch.py
enter N: 100000
linear search:
List built
1 2 3 4 5 6 7 8 9 10
average: 0.0199360847473
- note how linear search's time approximately doubles each time
you double N (try it!!! run the code for different N)
- also note how much faster binary search is, and how it's time
changes as you double N