WEEK09: searching, analysis of algorithms
---------------------------------------------------------------
 F: binary search, real-world example...anagrams


REVIEW QUIZ 4

BINARY SEARCH:
 - last time we talked about linearSearch (and wrote searches.py example)
 - binarySearch is like the guessing game (guess a # from 1 to 100) but
   with info on each incorrect guess (too high or too low)

 - list of items must already be in order/sorted!!

 - let's try an example from the computer's point of view
   (ie, can't "see" all items in L at once):

    we have a variable, x
    we have a list of items, L, that are "in order"

    algorithm:
      find the middle item, compare it to x
      if x == middle item in L, return middle index
      elif x is < middle item in L, refocus on left half of list
      elif x is > middle item in L, refocus on right half of list
      repeat the above until we find x in L or we don't...


  How do we find the middle item in L???

  low_index = 0
  high_index = len(L) - 1
  middle_index = (low_index + high_index)/2

EXAMPLE:

x = 7
L = [-20, -12, -4, 1, 7, 44, 45, 46, 58, 99, 145]   

index: 0    1   2  3  4   5   6  7    8   9   10   

     low                 mid                 high 


 - mid = (low + high)/2 = (0 + 10)/2 = 5

 - compare x = 7 to L[5] = 44
 - since 7 is less than 44, we set high = mid - 1 = 4
   
      low = 0
      high = 4

 (now we focus on the list from index 0 to index 4)

 - recalculate mid = (low + high)/2 = (0 + 4)/2 = 2

 - compare x = 7 to L[2] = -4
 - since 7 is greater than -4, set low to mid + 1

      low = 3
      high = 4

 (now we focus on the list from index 3 to index 4)

 - recalculate mid = (low + high)/2 = (3 + 4)/2 = 3
 - compare x = 7 to L[3] = 1
 - since 7 is greater than 1, set low to mid + 1 (again)

      low = 4
      high = 4

 (now we focus on the list from index 4 to index 4)

 - recalculate mid = (low + high)/2 = (4 + 4)/2 = 4
 - compare x = 7 to L[4] = 7
 - FOUND X in L, so return mid index (4)


WORKSHEET:

 - copy this worksheet, which includes pseudo-code, and do the searches: 

     cp /home/jk/searchWorksheet  .

 (Note: assume you are the computer, doing a binarySearch for x in the list L)


NOTE: comparisons work with numbers and strings!

>>> "jeff" > "frances"
True
>>> "jeff" > "Jeff"
True
>>> "jeff" > "jeffrey"
False
>>> "apple" < "zebra"
True


YOUR TURN:

 - code up the binarySearch(x, L) algorithm in your searches.py file
   and TEST it

ANALYSIS of BINARY SEARCH:

 How many "steps" does each algorithm take, if len(L) = N??

                 best case          worst case
                 ---------          ----------
linear search:      1                   N
binary search:      1                   ???

 For the binary search, each time through the while LOOP we split
 the list in half. If N = 1000000, how many times can we split
 one million in half??

>>> N = 1000000
>>> while N > 1:
...   print N
...   N = N/2
... 
1000000
500000
250000
125000
62500
31250
15625
7812
3906
1953
976
488
244
122
61
30
15
7
3

 If you count that up, it's about 19 times.

 Here's the mathy way to say that: log-base-2 of N

>>> from math import *
>>> log(1000000, 2)
19.931568569324174

 If I double the amount of data we are searching through (N = 2000000),
 now how many steps does binary search require (in the worst case)???

>>> log(2000000, 2)
20.931568569324174

 JUST 1 more step than the N=1000000 case!!!


 How many "steps" does each algorithm take, if len(L) = N??

                 best case          worst case
                 ---------          ----------
linear search:      1                   N
binary search:      1                   log(N)


TIMING SEARCHES:

 - write some code to show how the time for a linear search
   doubles when we double the size of the data (N)

 - also, how does the "in" operator work? Does it do a linear
   search or a binary search??

from searches import *
from random import *
from time import time

##############################################

def main():
  """ask user for size of data, run searches"""

  N = input("size of data, N: ")
  T = input("  num trials, T: ")
  L = []
  for i in range(N):
    L.append(randrange(-N,N))

  L.sort()

  lintotal = 0
  bintotal = 0
  intotal = 0
  for t in range(T):
    x = randrange(-N,N)

    t1 = time()
    linearSearch(x, L)
    t2 = time()
    lintotal = lintotal + (t2 - t1)

    t1 = time()
    binarySearch(x, L)
    t2 = time()
    bintotal = bintotal + (t2 - t1)

    t1 = time()
    x in L
    t2 = time()
    intotal = intotal + (t2 - t1)

  print "ave linear time: %0.7f" % (lintotal/float(T))
  print "ave binary time: %0.7f" % (bintotal/float(T))
  print "ave   in   time: %0.7f" % (intotal/float(T))

##############################################

main()

$ python timesearches.py 
size of data, N: 200000
  num trials, T: 20
ave linear time: 0.0186138
ave binary time: 0.0000101
ave   in   time: 0.0085083

$ python timesearches.py 
size of data, N: 400000
  num trials, T: 20
ave linear time: 0.0469730
ave binary time: 0.0000128   <--- does not double!
ave   in   time: 0.0235038


ANAGRAMS:

$ python anagrams.py 

  -- Welcome to A N A G R A M --
  
  Enter a word and I'll find the anagrams (QUIT to quit): miles

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
miles
limes
smile
slime
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

 - the Anagram code just creates a list of all possible permutations
   of the given word. If you type in cat, Anagram creates the following
   list: ['cat', 'act', 'atc', 'cta', 'tca', 'tac']

 - each permutations is then searched for in a list of real English words.

 - what happens when I type in an 8-letter word???

 - why does it run so slow???

 - for a 3-letter word there are 6 permutations. How many are there
	 for a 4-letter word? How about a 5-letter word??

 - for a 9-letter word there are factorial(9) permutations to look
   up. Should we use a linear or a binary search to look up each 
   permutation in the list of English words???!


>>> from math import *
>>> factorial(9)
362880