WEEK11: recursion
---------------------------------------------------------------
 F: binary search and merge sort

BINARY SEARCH:

 - this algorithm works well with recursion, since at each level
   we eliminate half of the list and repeat the procedure on the
   smaller remaining list

 - here are two ways to code binary search using recursion:

def recbinsearch1(x, y, low, high):
  """
  return True if x is in y. use binary search,
  assume y is already sorted...
  """
  if high < low:
    return False
  else:
    mid = (low + high) // 2   # integer division so mid is a valid index
    if x == y[mid]:
      return True
    elif x < y[mid]:
      return recbinsearch1(x,y,low,mid-1)
    else:
      return recbinsearch1(x,y,mid+1,high)

# ----------------------------------------------------- #

def recbinsearch2(x, y):
  """
  return True if x is in y. use binary search,
  assume y is already sorted...
  """
  if len(y) <= 0:
    return False
  else:
    low = 0
    high = len(y)-1
    mid = (low + high) // 2   # integer division so mid is a valid index
    if x == y[mid]:
      return True
    elif x < y[mid]:
      return recbinsearch2(x,y[:mid])
    else:
      return recbinsearch2(x,y[(mid+1):])
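
 Here's a quick sanity check you could run on both versions (the list and
 the test values below are just made up for illustration):

data = list(range(0, 100, 2))                    # sorted even numbers 0..98
print(recbinsearch1(42, data, 0, len(data)-1))   # True
print(recbinsearch1(43, data, 0, len(data)-1))   # False
print(recbinsearch2(42, data))                   # True
print(recbinsearch2(43, data))                   # False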

 - what are the differences between the two versions??
 - run timeBinarySearches.py and see if you can explain the timing results

MERGE SORT:

 - merge sort uses a divide-and-conquer method to sort a list:

    given a list, split it in two (if the size of the list > 1)
    sort each of the two halves of the list
    merge the two sorted halves back into the original list

 - this is easy to express using recursion:

def mergeSort(L):
  """
  sort L in place using merge sort
  """
  n = len(L)
  if n > 1:
    L1 = L[0:n//2]     # first half (note the integer division)
    L2 = L[n//2:]      # second half
    mergeSort(L1)      # recursively sort each half...
    mergeSort(L2)
    merge(L1,L2,L)     # ...then merge them back into L

 - that last step assumes we have a function called merge that
   merges the two sorted lists back into one big sorted list
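
 Here's one way such a merge function could look. This is just a sketch of
 the idea, not necessarily the exact version used in timeSorts.py:

def merge(L1, L2, L):
  """
  merge the sorted lists L1 and L2 back into L (overwriting L in place)
  """
  i = j = k = 0
  while i < len(L1) and j < len(L2):   # take the smaller front item each time
    if L1[i] <= L2[j]:
      L[k] = L1[i]
      i += 1
    else:
      L[k] = L2[j]
      j += 1
    k += 1
  while i < len(L1):                   # copy any leftovers from L1...
    L[k] = L1[i]
    i += 1
    k += 1
  while j < len(L2):                   # ...or from L2
    L[k] = L2[j]
    j += 1
    k += 1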

YOUR TURN:

 - copy /home/jk/inclass/timeSorts.py and view the mergeSort function

 - run timeSorts.py and see how long mergeSort takes for N=5000,10000,20000
   (you may want to comment out selectionSort and insertionSort
    for the N=20000 run)
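
 If you want to time mergeSort by hand, outside of timeSorts.py, something
 along these lines works (using the random and time modules, and assuming
 the merge function sketched above is defined):

import random, time

for N in (5000, 10000, 20000):
  L = [random.randrange(N) for i in range(N)]   # an unsorted list of size N
  start = time.time()
  mergeSort(L)
  print(N, time.time() - start)                 # elapsed seconds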

Here's what I get:

        N     5000       10000       20000
  -----------------------------------------
   L.sort     0.0053     0.0114      0.0102
selection     7.1494    14.5595     51.2008
insertion    11.7596    19.9309     77.8990
    merge     0.1288     0.1150      0.2420

 So merge sort is much better than selection sort or insertion sort, but
 still not as fast as the built-in .sort() method.

 - how does mergeSort's time depend on N, the size of the list?
 - how many "steps" does mergeSort take to finish its work?

 Each level of recursion requires copying all N values from the
 halved sub-lists back to the bigger original list. So that's
 N copies * how many levels of recursion. Each level of recursion
 requires splitting a given list into two smaller lists. So "how many 
 levels of recursion" is really just "how many times can we split
 the list in two?", which we already know is log(base2)N.

 So mergeSort's time is roughly N * log2(N), which is much better than the
 N * N growth of selection and insertion sort.
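
 As a rough back-of-the-envelope check, you can print N * log2(N) for the
 three N values in the table above. The constants won't match the measured
 times, but the growth from one N to the next should look similar:

from math import log2

for N in (5000, 10000, 20000):
  print(N, N * log2(N))   # about 61000, 133000, and 286000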

 See /home/jk/inclass/pylabLinePlots.py for a graphical comparison
 of NlogN vs logN vs linear vs quadratic.
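
 If you want to try drawing that kind of plot yourself, here's a minimal
 sketch using matplotlib (this is not the actual pylabLinePlots.py, just
 one way to do it):

import matplotlib.pyplot as plt
from math import log2

Ns = list(range(2, 201))
plt.plot(Ns, [log2(n) for n in Ns], label="log N")
plt.plot(Ns, Ns, label="N (linear)")
plt.plot(Ns, [n*log2(n) for n in Ns], label="N log N")
plt.plot(Ns, [n*n for n in Ns], label="N*N (quadratic)")
plt.xlabel("N")
plt.ylabel("steps")
plt.legend()
plt.show()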