WEEK08: searching, analysis of algorithms
---------------------------------------------------------------
 F: timing, real-world example...anagrams


PYLAB:
 - copy over pylabLinePlots.py and try it out
 - http://matplotlib.sourceforge.net (good data plotting from python) 
 - note how logn is better than linear for large N (we'll look
   at the nlogn and quad lines next week)

SHUFFLE and SORT:

>>> from random import *
>>> L = range(10)
>>> shuffle(L)
>>> print L
[4, 2, 1, 5, 8, 7, 0, 3, 6, 9]
>>> 
>>> L.sort()
>>> print L
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

sort() is a list method, shuffle is a function from the
random library. Both do their work "in place", meaning the
list (L) is changed.

TIMING SEARCHES:

 - write some code to show how the time for a linear search
   doubles when we double the size of the data (N)

 - also, how does the "in" operator work? Does it do a linear
   search or a binary search??

from searches import *
from random import *
from time import time

##############################################

def main():
  """ask user for size of data, run searches"""

  N = input("size of data, N: ")
  T = input("  num trials, T: ")
  L = []
  for i in range(N):
    L.append(randrange(-N,N))

  L.sort()

  lintotal = 0
  bintotal = 0
  intotal = 0
  for t in range(T):
    x = randrange(-N,N)

    t1 = time()
    linearSearch(x, L)
    t2 = time()
    lintotal = lintotal + (t2 - t1)

    t1 = time()
    binarySearch(x, L)
    t2 = time()
    bintotal = bintotal + (t2 - t1)

    t1 = time()
    x in L
    t2 = time()
    intotal = intotal + (t2 - t1)

  print "ave linear time: %0.7f" % (lintotal/float(T))
  print "ave binary time: %0.7f" % (bintotal/float(T))
  print "ave   in   time: %0.7f" % (intotal/float(T))

##############################################

main()

$ python timesearches.py 
size of data, N: 200000
  num trials, T: 20
ave linear time: 0.0186138
ave binary time: 0.0000101
ave   in   time: 0.0085083

$ python timesearches.py 
size of data, N: 400000
  num trials, T: 20
ave linear time: 0.0469730
ave binary time: 0.0000128   <--- does not double!
ave   in   time: 0.0235038

ANAGRAMS:

 - copy /home/jk/inclass/anagrams.py and run the code:

$ python anagrams.py 

  -- Welcome to A N A G R A M --
  
  Enter a word and I'll find the anagrams (QUIT to quit): miles

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
miles
limes
smile
slime
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

 - look at the code to see how it works. You should be able to understand
   all but the Anagram function, which uses recursion (we'll cover that
   next week)

 - the Anagram function just creates a list of all possible permutations
   of the given word. If you type in cat, Anagram creates the following
   list: ['cat', 'act', 'atc', 'cta', 'tca', 'tac']

 - this list of all permutations is sent to displayAnagram which only
   prints out the real English words.

 - what happens when I type in an 8-letter word???

 - why does it run so slow???

 - for a 3-letter word there are 6 permutations. How many are there
	 for a 4-letter word? How about a 5-letter word??


YOUR TURN:

 - change the displayAnagram function to use a binary search, instead
   of this linear search:

       if w in english and w not in uniqalist:

   make that something like this:

       if binarySearch(w,english) and w not in uniqalist:

   so your binarySearch function needs to return a True if the word is
   found, and a False if not found.


RESULTS:

 - how many anagrams does the word "education" have?

 - how long does it take to figure that out with 
   binarysearchanagrams.py vs. anagrams.py??