knerr cs21 notes...

back to schedule

WEEK09: sorting (and more analysis of algorithms)
---------------------------------------------------------------
 W: selection sort analysis, start recursion

REVIEW QUIZ 4:

 - see /home/jk/inclass/q4.py and make sure you understand it!
 - see /home/jk/inclass/q4extra.py and make sure you understand it!

 (note how you can change *elements* of a list, and those changes
  are seen back in main)


FROM LAST TIME:

 - we fixed up anagrams.py to use a binary search
 - how do anagrams.py and binarysearchanagrams.py compare
   when given a 7-letter word (like parsley)? how about
   and 8-letter word (like predator)?

   (see /home/jk/inclass/binarysearchanagrams.py for details)


SORTING:

 - how does selection sort compare to python's built-in sort?

list size (n): 20000
L.sort time: 0.0246
-----------------------------------
selectionSort(L) time: 50.9708

    *** clearly L.sort is faster...why??

 - how does selection sort's time depend on the size of the list (N)?
 - try our selection sort for N=5000,10000,20000...what happens to
   the run time?

ANALYSIS:

 - how many "steps" does selection sort require to sort a list?

  for i in range(len(L)):    <--- for loop depends on N
    ismall = i  
    # find index of smallest element of rest of list 
    for j in range(i+1,len(L),1):   <--- another (nested) for loop
      if L[j] < L[ismall]:
        ismall = j
    # swap smallest element with L[i]
    L[i],L[ismall] = L[ismall],L[i]

 Here's how selection sort works on a sample list of length 7:

     [ 9,    50,    -2,     6,    9,    17,    20]
     <-- find smallest num, swap to position 0 -->
     [ -2,    50,    9,     6,    9,    17,    20]
             <-- find smallest, swap to pos  1 -->
     [ -2,     6,    9,    50,    9,    17,    20]
                    <-- find small,  swap to 2 -->
     [ -2,     6,    9,    50,    9,    17,    20]
                           <-- find/swap  to 3 -->
     [ -2,     6,    9,    9,    50,    17,    20]
                                 <-- swap to 4 -->
     [ -2,     6,    9,    9,    17,    50,    20]
                                        <--  5 -->
     [ -2,     6,    9,    9,    17,    20,    50]
                                               <->
     [ -2,     6,    9,    9,    17,    20,    50]


 See how that looks like an NxN matrix? Actually, it 
 looks like 0.5*(NxN), but that's still proportional to N-squared!!

 Another way to look at it is like this:

  - in the first line, you have to scan N elements to find the smallest
  - in the second line, you have to scan N-1 elements
  - in the third line, you have to scan N-2 elements
  - and so on until you scan 3, 2, and 1 element

 Summing all of those up can be written 
 S = N + N-1 + N-2 + ... + 3 + 2 + 1

 Now for some "fancy" math fun -- add two of those Sums together,
 but write the second one in reverse order:

   S = N + N-1 + N-2 + ... + 3  + 2   + 1
+  S = 1 +  2  +  3 + ... + N-2 + N-1 + N
---------------------------------------
  2S = N+1 + N+1 + N+1 + ... + N+1 + N+1 + N+1

 and note there are N N+1's on the right side of that equation,
 so 2S = N*(N+1), or S = 0.5*(N-squared + N)

 As N gets very large, the N-squared term is much bigger than
 the N term, so this is proportional to N-squared


YOUR TURN:

 - something proportional to N-squared is called quadratic, 
   because if you double N, the "something" quadruples!

 - run the /home/jk/inclass/lineplots.py program again to 
   see the comparison of things proportional to 
   N (linear) vs NlogN vs N-squared (quad)

RECURSION:

 - the python built-in sort uses a divide-and-conquer algorithm,
   but before we can learn that algorithm we need to understand
   recursive functions: a function that calls itself is recursive

 - recursion is most useful when, to solve a problem, you break 
   the original problem into smaller pieces, but can apply the 
   same algorithm to the smaller pieces

 - a great example is binary search and the number guessing game:

        * i'm thinking of a number between 1 and 1000
        
        * if you guess 500, and are told that's not it, but the
          answer is less than 500, then you can apply the same
          strategy to a smaller problem (i'm thinking of a number
          between 1 and 500...you guess 250)

 - two things you need to use recursion:

     1. a base case, or some way to stop the recursion
     2. all chains of recursion need to lead to the base case

   for example, in the binary search guessing game, the base case
   is when you guess the correct number, and all guesses lead to
   smaller and smaller ranges of numbers that contain the base
   case (the correct number)

FACTORIAL EXAMPLE:

 - the factorial function is defined as follows:

       n! = 1 if n=0, and n*((n-1)!) if n > 0

   so 0! = 1
      1! = 1 * 0! = 1 * 1 = 1
      2! = 2 * 1! = 2
      3! = 3 * 2! = 6
   and so on

   You can try this out in python:

>>> from math import *
>>> factorial(3)
6
>>> factorial(4)
24
>>> factorial(5)
120

   And the above definition is easily expressed using recursion.

   What is the base case??    n = 0
   How do we get to a smaller-sized problem??  n! = n * (n-1)!

   Will this work???

def factorial(n):
  if n == 0:
    return 1
  else:
    return n * factorial(n-1)

 - try running /home/jk/inclass/factorialstack.py to see the stack
   for the recursive factorial function:

$ python factorialstack.py

please enter a positive integer...
           n: 4
-------------------------------
  in factorial...stacknum = 1, n = 4
  calling 4 * factorial(3)
    in factorial...stacknum = 2, n = 3
    calling 3 * factorial(2)
      in factorial...stacknum = 3, n = 2
      calling 2 * factorial(1)
        in factorial...stacknum = 4, n = 1
        calling 1 * factorial(0)
          in factorial...stacknum = 5, n = 0
          returning 1
        returning 1 
      returning 2 
    returning 6 
  returning 24 
factorial(n): 24

 - NOTE: you can easily write a factorial function that does not
   use recursion. There are, however, some algorithms that are not
   easy to write without using recursion.