WEEK08: searching, analysis of algorithms --------------------------------------------------------------- M: linear and binary search ANNOUNCEMENTS: - quiz 4 this Friday...while loops, file I/O! - Lab 7 due *next* Tuesday night (Nov 3) - if you want to use Andy Danner's lab 6 functions in your lab 7, do this: from lab06 import * then use his GetAConsonant, GetMenuOption, IsAVowel, PrintMenu functions. For the function help info, do this: >>> import lab06 >>> help(lab06) NOTE: we're mostly done with teaching you python at this point. There's lots more you could learn about python, but you've got the basics now, and enough to write some interesting programs. From here on out we'll focus more on Computer Science, rather than computer programming... SEARCHING: - basic search question...is x in y, where x is some single value, and y is a group of values - for us, x is a simple variable, and y is a list - if x = 12, and y = [7,30,-29,-4,88,49,-29,100,92], is x in y?? - you can actually use the "in" operator to answer this: >>> x = 12 >>> y = [7,30,-29,-4,88,49,-29,100,92] >>> x in y False >>> x = 88 >>> x in y True - how does the "in" operator work?? how would you search through the list and see if a certain value is there or not?? LINEAR SEARCH: go one at a time, through the whole list (a for loop!) def linSearch(x, nums): for i in range(len(nums)): if nums[i] == x: # item found, return the index value return i return -1 # loop finished, item was not in list - is linSearch any good? how long does it take? does it matter if len(nums) is 10 or 10,000,000,000??? - in the best case, the item you're looking for is first in the list. In the worst case, the item is last or not in the list: best case: takes 1 "step" worst case: takes len(nums) "steps" - if we double the size of the list (nums), the worst case also doubles in size. This means the time it takes linSearch to work is proportional to the size of the list BINARY SEARCH: - *if* the list is sorted, there's a better way! - start in the middle of the list -- if that's the one you're looking for, then you're done. If not, what you're looking for must either be greater or less than the middle value. Either way, you can ignore one half of the list and start the process over with the other half of the list. Here's an example: >>> y.sort() >>> print y [-29, -29, -4, 7, 30, 49, 88, 92, 100] suppose we're looking for x = 12 in y. start with the 4th element (30). That's not it, and we know the list is sorted, and we know 12 is less than 30, so let's refocus on the left half of the list [-29, -29, -4, 7]. Pick the middle of the left-half -- either the 1st or 2nd element, and check for x. Let's say we picked the 1st element (-29), which is not what we are looking for, and x=12 is greater than -29, so now we can refocus on just [-4, 7] (and so on....) - the point is, each time we *halve* the list. - if the list has 8 elements in it, we can only halve that 3 times until we've either found x or it's not in the list - if the list has 16 elements in it, we can only halve that 4 times. - so each time we double the size of the list, it's only one more step we need to make!!! (unlike the linear search, where each time we double the size of the list, we double the number of steps needed) - so the time for binary search to do it's work is equal to "how many times can we halve the size of the list" - the question "how many times can we halve the list" can be answered with logarithms: log(base 2) of N = how many times we can halve N. try this in python: >>> import math >>> math.log(8,2) 3.0 >>> math.log(16,2) 4.0 >>> math.log(32,2) 5.0 each time we double N, that's only one more step... and this pays off big when N is large: >>> math.log(1000000,2) 19.93 so even if the list had 1 million elements, it would still only take about 20 steps to find what we're looking for (linear search would take 1 million steps in the worst case). YOUR TURN: - copy and try these programs for N=100000, 200000, 400000, 800000: /home/jk/inclass/linearsearch.py /home/jk/inclass/binarysearch.py $ python linearsearch.py enter N: 100000 linear search: List built 1 2 3 4 5 6 7 8 9 10 average: 0.0199360847473 - note how linear search's time approximately doubles each time you double N (try it!!! run the code for different N) - also note how much faster binary search is, and how it's time changes as you double N