CS35 Lab1: Introduction to C++

Due 11:59pm Wednesday, September 8th

The goal of this lab is to gain more comfort with C++ functions and arrays. We will practice using arrays to process a list of numbers in various ways. An empty version of the program will appear in your cs35/labs/01 directory when you run update35. The program handin35 will only submit files in this directory. In later labs you can work with a partner, but for this lab you should work on your own.

Introduction

For this lab you will prompt the user to enter a series of integers in a given range. Then you will perform some statistical analyses on the data and report the results. The most common statistical measure is the mean, which is simply the average. Another useful measure is the standard deviation, which provides an indication of how much the individual values in the data differ from the mean. A small standard deviation indicates that the data is tightly clustered. A large standard deviation indicates that the data is widespread. Finally, a histogram is a graphical way of summarizing the data by dividing it into separate ranges and then indicating how many data values fall into each range. Here is a sample run of a program that performs these three statistical analyses:

This program calculates statistics on a set of given data.
It calculates the mean, standard deviation, and prints a 
histogram of the values.

Enter integers in the range 0-100 to be stored, -1 to end.
> 90
> 89
> 97
> 95
> 123
Invalid value, try again.
> 84
> 71
> 78
> 88
> 82
> 100
> 55
> -1
Read in 11 valid values.

Mean: 84.455

Standard deviation: 12.324

Histogram:
   0 -    9: 
  10 -   19: 
  20 -   29: 
  30 -   39: 
  40 -   49: 
  50 -   59: *
  60 -   69: 
  70 -   79: **
  80 -   89: ****
  90 -   99: ***
 100 -  100: *
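
The mean and standard deviation in the sample run above are consistent with the population form of the standard deviation (dividing by the number of values, not by one less). The following is a minimal sketch under that assumption; the function names mean and standardDeviation and their parameters are illustrative choices, not names the lab requires.

```cpp
#include <cassert>
#include <cmath>

// Arithmetic mean of the first n entries of data. Requires n > 0.
double mean(const int data[], int n) {
    assert(n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum / n;
}

// Population standard deviation: the square root of the average
// squared distance from the mean (divides by n, not n - 1).
double standardDeviation(const int data[], int n) {
    assert(n > 0);
    double avg = mean(data, n);
    double sumOfSquares = 0.0;
    for (int i = 0; i < n; i++) {
        double diff = data[i] - avg;
        sumOfSquares += diff * diff;
    }
    return std::sqrt(sumOfSquares / n);
}
```

Called on the 11 valid values from the sample run, these functions give results that round to the 84.455 and 12.324 shown there.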

Program Requirements

Develop your program incrementally. Create a main function first. Declare the array that will hold the data in main and pass it to the functions for processing. Add the required functions one at a time, testing each one by calling it from main. Remember that you will need to declare the functions above main and then define them below main. Once you are convinced that a function is correct, move on to the next required function.
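
As one way to picture the declare-above, define-below layout, here is a hedged sketch of a single input function. The name readValues, the capacity MAX_VALUES, and the choice to read from a std::istream (so the function can be tested without typing) are all assumptions made for illustration; main itself is left out and only indicated by a comment.

```cpp
#include <iostream>
#include <sstream>  // std::istringstream, handy for testing without typing

const int MAX_VALUES = 1000;  // hypothetical capacity; the lab does not fix one
const int MIN_VALUE = 0;
const int MAX_VALUE = 100;
const int SENTINEL = -1;

// Declaration (prototype): in a full program this sits above main.
int readValues(std::istream& in, int data[], int capacity);

// main would go here: it would declare `int data[MAX_VALUES];` and
// call readValues(std::cin, data, MAX_VALUES).

// Definition: in a full program this sits below main.
// Reads integers until the sentinel, the end of input, or a full array.
// Stores values in [MIN_VALUE, MAX_VALUE] and returns how many were stored.
int readValues(std::istream& in, int data[], int capacity) {
    int count = 0;
    int value;
    while (count < capacity) {
        std::cout << "> ";
        if (!(in >> value) || value == SENTINEL) {
            break;
        }
        if (value < MIN_VALUE || value > MAX_VALUE) {
            std::cout << "Invalid value, try again.\n";
            continue;
        }
        data[count] = value;
        count++;
    }
    return count;
}
```

Returning the count lets main pass both the array and the number of valid entries to each later function.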

Tips

Use printf to format your histogram nicely. You can read more about printf online.
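
The row labels in the sample histogram line up because each bound is printed right-aligned in a field four characters wide. A small sketch of that formatting follows; the helper names rowLabel and printRow are hypothetical, and snprintf is used for the label only because it accepts the same format string as printf while making the result easy to check.

```cpp
#include <cstdio>
#include <string>

// Builds the label for one histogram row, e.g. "  50 -   59: ".
// %4d right-aligns each bound in a field four characters wide.
std::string rowLabel(int low, int high) {
    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "%4d - %4d: ", low, high);
    return std::string(buffer);
}

// Prints one full row: the label followed by one '*' per counted value.
void printRow(int low, int high, int count) {
    std::printf("%s", rowLabel(low, high).c_str());
    for (int i = 0; i < count; i++) {
        std::printf("*");
    }
    std::printf("\n");
}
```

For example, printRow(70, 79, 2) prints the label for the 70 - 79 bin followed by two stars.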

One possible way of computing the histogram is to scan the entire array of input values once per bin and count the number of values that fall in that bin. This is rather slow. Another option is to create an array of bucket counts (one for each bin in the histogram) and, for each value v, scan the array of buckets to determine whether v should go in bucket i. This is also slow (is it equally slow?). A final option is, for each value in the data array, to compute the ID of the bucket containing that value and update the appropriate count. After processing all data values, you can then scan the list of bucket counts and print out the histogram. I encourage you to aim for this approach.
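
The last approach can be sketched as follows, assuming the bins from the sample run (ten wide, with 100 in a bin of its own) and values already validated to lie in 0-100. The names countBins, BIN_WIDTH, and NUM_BINS are illustrative.

```cpp
#include <cassert>

const int BIN_WIDTH = 10;
const int NUM_BINS = 11;  // 0-9, 10-19, ..., 90-99, and 100-100

// Fills counts[0..NUM_BINS-1] with the number of values in each bin.
// Assumes every value in data[0..n-1] is in the range 0-100.
void countBins(const int data[], int n, int counts[]) {
    for (int b = 0; b < NUM_BINS; b++) {
        counts[b] = 0;
    }
    for (int i = 0; i < n; i++) {
        assert(data[i] >= 0 && data[i] <= 100);
        int bin = data[i] / BIN_WIDTH;  // integer division gives the bin ID
        counts[bin]++;
    }
}
```

Each value is handled with one division and one increment, so the work grows with the number of values rather than with the number of values times the number of bins.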

Instead of typing in a bunch of numbers each time to test your program, you can save some sample data in a file, e.g., test1.txt containing only the input values:

90
89
97
95
123
84
71
78
88
82
100
55
-1

Then you can use input redirection to have your program read input from a file, e.g., ./stats < test1.txt. Try it. I have included test1.txt as one sample test. You may want to add others. This may be how I test your submission, so it is a good idea to try it before I do. (Hint: I do not use small data sets.)

Optional Extensions

There are some optional extensions you could add to this lab. Below I list a few exercises you may wish to try. These exercises are entirely optional and will neither raise nor lower your grade. Try these exercises only after you have completed the required portion of the lab.

An alternate way to compute the standard deviation is to take the average of the squares of the entries, subtract the square of the average of the entries (parse that sentence carefully), and take the square root of the result. Try this method and try to reuse your mean function twice to avoid code duplication.
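
A hedged sketch of this alternate computation is below. It assumes a mean function that takes an int array, uses a std::vector as scratch space for the squares (an ordinary array would work as well), and relies on the inputs being at most 100 so that each square fits in an int. The name standardDeviationAlt is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Arithmetic mean of the first n entries of values. Requires n > 0.
double mean(const int values[], int n) {
    assert(n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += values[i];
    }
    return sum / n;
}

// Standard deviation as sqrt(mean of squares - square of mean).
// mean is called twice: once on the squares, once on the data.
double standardDeviationAlt(const int data[], int n) {
    assert(n > 0);
    std::vector<int> squares(n);
    for (int i = 0; i < n; i++) {
        squares[i] = data[i] * data[i];  // at most 10000 for inputs in 0-100
    }
    double meanOfSquares = mean(squares.data(), n);
    double avg = mean(data, n);
    double variance = meanOfSquares - avg * avg;
    if (variance < 0.0) {
        variance = 0.0;  // guard against tiny negative rounding error
    }
    return std::sqrt(variance);
}
```

On the sample run's 11 valid values this also rounds to 12.324, matching the direct computation.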

The median of a set of numbers is the number that would appear in the middle of a sorted list of numbers. Write a function to compute the median of your values. Is it necessary to sort? Can you reuse some of the ideas used to compute the histogram?
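
One possible answer, sketched under stated assumptions: because every value lies in 0-100, you can count how many times each value occurs and then walk the counts until you pass the middle position, with no sorting. The name medianByCounting is hypothetical, and for an even number of values this sketch returns the upper of the two middle values, which is only one of several reasonable conventions.

```cpp
#include <cassert>

const int MAX_VALUE = 100;

// Returns the value at 0-based position n / 2 of the sorted data,
// found by counting occurrences instead of sorting.
// Assumes n > 0 and every value is in the range 0-MAX_VALUE.
int medianByCounting(const int data[], int n) {
    assert(n > 0);
    int counts[MAX_VALUE + 1] = {0};
    for (int i = 0; i < n; i++) {
        assert(data[i] >= 0 && data[i] <= MAX_VALUE);
        counts[data[i]]++;
    }
    int target = n / 2;  // sorted position of the middle element
    int seen = 0;        // how many values are <= v so far
    for (int v = 0; v <= MAX_VALUE; v++) {
        seen += counts[v];
        if (seen > target) {
            return v;
        }
    }
    return MAX_VALUE;  // not reached for valid input
}
```

For the 11 valid values in the sample run, the middle of the sorted list is 88.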

Submit

Once you are satisfied with your code, hand it in by typing handin35. This will copy the code from your cs35/labs/01 directory to my grading directory. You may run handin35 as many times as you like, and only the most recent submission will be recorded.