# Week 8, Wednesday: Searching

### binary search

Here is an example of the binary search:

``````****** looking for g in:

['a', 'b', 'b', 'f', 'g', 'g', 'h', 'j', 'm', 'p', 'q', 'r', 'u', 'u', 'v', 'v']
0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
L                                  M                                       H

Low:  0
High: 15
Mid: (low+high)/2 = 15/2 =  7

g < j, so high = mid - 1 = 6

['a', 'b', 'b', 'f', 'g', 'g', 'h', 'j', 'm', 'p', 'q', 'r', 'u', 'u', 'v', 'v']
0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
L              M              H

Low:  0
High:  6
Mid: (low+high)/2 = 6/2 =  3

g > f, so low = mid + 1 = 4

['a', 'b', 'b', 'f', 'g', 'g', 'h', 'j', 'm', 'p', 'q', 'r', 'u', 'u', 'v', 'v']
0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
L    M    H

Low:  4
High:  6
Mid: (low+high)/2 = 10/2 =  5

g found at index 5.
``````

And here is an example of what happens when the item we are looking for is NOT in the list:

``````****** looking for x in:

['c', 'e', 'g', 'i', 'k', 'k', 'l', 'o', 'o', 'r', 'r', 's', 's', 'z', 'z', 'z']
0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
L                                  M                                       H

Low:  0
High: 15
Mid: (low+high)/2 = 15/2 =  7

x > o, so low = mid + 1 = 8

['c', 'e', 'g', 'i', 'k', 'k', 'l', 'o', 'o', 'r', 'r', 's', 's', 'z', 'z', 'z']
0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
L              M                   H

Low:  8
High: 15
Mid: (low+high)/2 = 23/2 = 11

x > s, so low = mid + 1 = 12

['c', 'e', 'g', 'i', 'k', 'k', 'l', 'o', 'o', 'r', 'r', 's', 's', 'z', 'z', 'z']
0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
L    M         H

Low: 12
High: 15
Mid: (low+high)/2 = 27/2 = 13

x < z, so high = mid - 1 = 12

['c', 'e', 'g', 'i', 'k', 'k', 'l', 'o', 'o', 'r', 'r', 's', 's', 'z', 'z', 'z']
0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
LMH

Low: 12
High: 12
Mid: (low+high)/2 = 24/2 = 12

x > s, so low = mid + 1 = 13

------ Low (13) > High (12) -------

x is not in the list.
``````

Notice that the algorithm continues until either the item is found, or the low and high indecies go past each other (low > high).

Here is the pseudo-code for binary search:

``````set low = lowest-possible index
set high = highest possible index
LOOP:
calculate middle index = (low + high)/2
if item is at middle index, we are done (found it! return True)
elif item is < middle item,
set high to middle index - 1
elif item is > middle item,
set low to middle index + 1
if low is greater than high, item not here (done, return False)
``````

Write the `binarySearch(x, L)` function in `searches.py` and add some test code to make sure it works.

### testing with `assert`

The `assert` statement is one way to add test code to your modules and libraries. Here is an example of adding `assert` statements to our `searches.py` file:

``````if __name__ == "__main__":

N = 100
L = range(N)
assert linearSearch(N/2, L) == True
assert binarySearch(N/2, L) == True
assert linearSearch(N*2, L) == False
assert binarySearch(N*2, L) == False
assert linearSearch(2, []) == False
assert binarySearch(2, []) == False
``````

When you run that, there will be no output if all tests pass. If one of the tests fails, you will see an `AssertionError` message.

### analysis of algorithms

One simple way to compare algorithms is to count how many steps each requires, or measure how long each takes to run on the same computer with the same data. In addition to this, it is often useful to know how the number of steps (or the runtime) will change as the size of the data increases. For instance, if we are searching for an item in a list, will the runtime double if we double the size of the list? If so, we call this a linear algorithm, since the time needed to search the list is directly proportional to the size of the list.

How many steps will `linearSearch` take, for a list with `N` items in it? How about `binarySearch`? Let's look at each and compare the algorithms.

For a linear search, we need to, at worst, check each item in the list. There's just a simple compare (`if item == x`) done for each item in the list. So, for the worst case, there will be `N` compares or `N` steps needed. If the item we are looking for is the first item in the list, then only 1 compare is needed, but that is the best possible case. A better measure would be the average case, requiring `N/2` compares. Still, this clearly depends on the size of the list, `N`. So, if we double the size of the list, linear search will require, on average, or in the worst case, twice as many steps.

Since binary search splits the list in half each time, there are far fewer compares or steps. Given a list of size `N=16`, it is not hard to see there will at worst be 5 compares (just think about splitting 16 in half five times: 16-8-4-2-1). Mathematically, this can be computed with `log-base-2 of N`. If the size of the list is now doubled to `N=32` items, there is only one additional step required! (log-base-2 of 32 is 6, log-base-2 of 64 is 7, etc).

In python, you can explore the base-2 logs with the following:

``````>>> from math import *
>>> log(16,2)
4.0
>>> log(32,2)
5.0
>>> log(64,2)
6.0
>>> log(1000000,2)
19.931568569324174
>>> log(2000000,2)
20.931568569324174
``````

So, for one-million items in a sorted python list, binary search will require at most, about 20 steps. If we double the size of the list to two-million, binary search now requires only about 21 steps. Notice how this is very different from the linear search! A linear search would require, at worst, two-million steps, compared to binary searches 21 steps.

To summarize, we say linear search is an O(N) algorithm, and binary search is an O(logN) algorithm. The big-O notation just shows how each algorithm behaves versus the size of the data (N). Here are some other examples we will see in this class:

``````O(logN)     one extra step, each time size of data doubles
O(N)        number of steps doubles, each time size of data doubles
O(NlogN)
O(N*N)      number of steps quadruples, each time size of data doubles
``````

And here is a simple plot of each, comparing the number of steps or runtime versus the size of the data:

How many steps for each of the following? And what happens to the number of steps when n doubles to 2n?

``````#############################################
# 1
n = int(raw_input("n: "))
for i in range(n):
print i
#############################################
# 2
n = int(raw_input("n: "))
for i in range(100):
print i*n
#############################################
# 3
n = int(raw_input("n: "))
for i in range(n):
print i
for j in range(n):
print j
#############################################
# 4
n = int(raw_input("n: "))
for i in range(n):
for j in range(n):
print i, j
#############################################
# 5
n = int(raw_input("n: "))
for i in range(n):
for j in range(i,n):
print i, j
#############################################
# 6
n = int(raw_input("n: "))
for i in range(n):
for j in range(10):
print i, j
#############################################
# 7
n = int(raw_input("n: "))
while n > 1:
print n
n = n/2
#############################################
# 8
L = [1,2,5,7,13,21,24,25,26,33,34,38,50,57,58,63]
n = len(L)
mid = n/2
print L[mid]
#############################################
# 9
n = int(raw_input("n: "))
for i in range(n):
k = n
while k > 1:
print i, k
k = k/2
#############################################
``````