WEEK09: searching, analysis of algorithms --------------------------------------------------------------- F: binary search, real-world example...anagrams REVIEW QUIZ 4 BINARY SEARCH: - last time we talked about linearSearch (and wrote searches.py example) - binarySearch is like the guessing game (guess a # from 1 to 100) but with info on each incorrect guess (too high or too low) - list of items must already be in order/sorted!! - let's try an example from the computer's point of view (ie, can't "see" all items in L at once): we have a variable, x we have a list of items, L, that are "in order" algorithm: find the middle item, compare it to x if x == middle item in L, return middle index elif x is < middle item in L, refocus on left half of list elif x is > middle item in L, refocus on right half of list repeat the above until we find x in L or we don't... How do we find the middle item in L??? low_index = 0 high_index = len(L) - 1 middle_index = (low_index + high_index)/2 EXAMPLE: x = 7 L = [-20, -12, -4, 1, 7, 44, 45, 46, 58, 99, 145] index: 0 1 2 3 4 5 6 7 8 9 10 low mid high - mid = (low + high)/2 = (0 + 10)/2 = 5 - compare x = 7 to L[5] = 44 - since 7 is less than 44, we set high = mid - 1 = 4 low = 0 high = 4 (now we focus on the list from index 0 to index 4) - recalculate mid = (low + high)/2 = (0 + 4)/2 = 2 - compare x = 7 to L[2] = -4 - since 7 is greater than -4, set low to mid + 1 low = 3 high = 4 (now we focus on the list from index 3 to index 4) - recalculate mid = (low + high)/2 = (3 + 4)/2 = 3 - compare x = 7 to L[3] = 1 - since 7 is greater than 1, set low to mid + 1 (again) low = 4 high = 4 (now we focus on the list from index 4 to index 4) - recalculate mid = (low + high)/2 = (4 + 4)/2 = 4 - compare x = 7 to L[4] = 7 - FOUND X in L, so return mid index (4) WORKSHEET: - copy this worksheet, which includes pseudo-code, and do the searches: cp /home/jk/searchWorksheet . (Note: assume you are the computer, doing a binarySearch for x in the list L) NOTE: comparisons work with numbers and strings! >>> "jeff" > "frances" True >>> "jeff" > "Jeff" True >>> "jeff" > "jeffrey" False >>> "apple" < "zebra" True YOUR TURN: - code up the binarySearch(x, L) algorithm in your searches.py file and TEST it ANALYSIS of BINARY SEARCH: How many "steps" does each algorithm take, if len(L) = N?? best case worst case --------- ---------- linear search: 1 N binary search: 1 ??? For the binary search, each time through the while LOOP we split the list in half. If N = 1000000, how many times can we split one million in half?? >>> N = 1000000 >>> while N > 1: ... print N ... N = N/2 ... 1000000 500000 250000 125000 62500 31250 15625 7812 3906 1953 976 488 244 122 61 30 15 7 3 If you count that up, it's about 19 times. Here's the mathy way to say that: log-base-2 of N >>> from math import * >>> log(1000000, 2) 19.931568569324174 If I double the amount of data we are searching through (N = 2000000), now how many steps does binary search require (in the worst case)??? >>> log(2000000, 2) 20.931568569324174 JUST 1 more step than the N=1000000 case!!! How many "steps" does each algorithm take, if len(L) = N?? best case worst case --------- ---------- linear search: 1 N binary search: 1 log(N) TIMING SEARCHES: - write some code to show how the time for a linear search doubles when we double the size of the data (N) - also, how does the "in" operator work? Does it do a linear search or a binary search?? from searches import * from random import * from time import time ############################################## def main(): """ask user for size of data, run searches""" N = input("size of data, N: ") T = input(" num trials, T: ") L = [] for i in range(N): L.append(randrange(-N,N)) L.sort() lintotal = 0 bintotal = 0 intotal = 0 for t in range(T): x = randrange(-N,N) t1 = time() linearSearch(x, L) t2 = time() lintotal = lintotal + (t2 - t1) t1 = time() binarySearch(x, L) t2 = time() bintotal = bintotal + (t2 - t1) t1 = time() x in L t2 = time() intotal = intotal + (t2 - t1) print "ave linear time: %0.7f" % (lintotal/float(T)) print "ave binary time: %0.7f" % (bintotal/float(T)) print "ave in time: %0.7f" % (intotal/float(T)) ############################################## main() $ python timesearches.py size of data, N: 200000 num trials, T: 20 ave linear time: 0.0186138 ave binary time: 0.0000101 ave in time: 0.0085083 $ python timesearches.py size of data, N: 400000 num trials, T: 20 ave linear time: 0.0469730 ave binary time: 0.0000128 <--- does not double! ave in time: 0.0235038 ANAGRAMS: $ python anagrams.py -- Welcome to A N A G R A M -- Enter a word and I'll find the anagrams (QUIT to quit): miles -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- miles limes smile slime -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- - the Anagram code just creates a list of all possible permutations of the given word. If you type in cat, Anagram creates the following list: ['cat', 'act', 'atc', 'cta', 'tca', 'tac'] - each permutations is then searched for in a list of real English words. - what happens when I type in an 8-letter word??? - why does it run so slow??? - for a 3-letter word there are 6 permutations. How many are there for a 4-letter word? How about a 5-letter word?? - for a 9-letter word there are factorial(9) permutations to look up. Should we use a linear or a binary search to look up each permutation in the list of English words???! >>> from math import * >>> factorial(9) 362880