Arrays and Sorting

Topics Covered:

Things To Do Before Class

Things To Do After Class

Sample Problem

How would you write the pseudocode for a program that recorded grades for four courses, calculated and displayed the average, and then reported which courses were above the average? You might come up with something such as:

NUM_COURSES = 4;
Print "Enter grade #1: "
Get course1grade
Print "Enter grade #2: "
Get course2grade
Print "Enter grade #3: "
Get course3grade
Print "Enter grade #4: "
Get course4grade

avgGrade = (course1grade + course2grade + course3grade 
	+ course4grade) / NUM_COURSES
Print "Average:  ", avgGrade

Print "Above average courses:"
If course1grade > avgGrade
	Print "1"
If course2grade > avgGrade
	Print "2"
If course3grade > avgGrade
	Print "3"
If course4grade > avgGrade
	Print "4"

What if we wanted to use this program for any number of courses? How will we know how many course grade variables to use? What if we wanted to also record course codes, and what if we wanted to print a report of the courses and final grades in order from lowest grade to highest?

Lists of Data

When you are reading data in from a file, you will often need to sort it. Records in a file are generally not sorted in any particular order: they are usually stored in the order in which they were added to the file. When you are printing reports or summarizing data, you may find that this order is not appropriate. For example, if you wanted to print a list sales people grouped by sales region, showing the average sales per region and the average sales overall, you will need to sort your record before printing the report.

There are many sort algorithms available (last session you had an introduction to the Bubble Sort), but when the data is stored in a file, you can't sort it. You need to bring all of the data or records into memory and sort them there.

When we want to store a set of data or a list of items in memory, we use an array. An array is like a variable, except that instead of holding only one value, it can hold a number of values! You can safely hold one egg in your hand, but an egg carton, with all it's 12 compartments, can safely hold 12 eggs! An array is like a variable that is an egg carton... an array is defined to hold a particular type of data, and it has a number of "compartments" in which to hold different data values.

The "compartments" in an array are called elements. When you define an array, not only do you have to give that array variable a name and the data type that array should hold, but you will eventually need to identify the number of elements the array should have.

When you store a value in an array, or try to read a value from an array, you will need to know how to refer to a particular array element. For example, you can put a value in the 5th element, or see what's inside the 10th element. We refer to the element numbers using subscripts or indexes. A subscript or index can't be a negative value.

Different languages have different rules for declaring how many elements an array shall have, and what the values of the indexes will be. For example, in Visual Basic, you can define an array to hold 10 elements, from 1 to 10. You could also make that 10-element array have indexes from 2001 to 2010. In Java, an array with 10 elements will always be from 0 to 9. In pseudocode, you usually assume an array is 0-based. In other words, assume that the first index in any array is 0.

Arrays need to be declared. You need to give the array a name, number of elements, and a data type. An array can hold only one kind of data. For example, an array holds floating-point values, or an array holds strings. An array can't hold floating-points and strings. We usually declare an array with statements such as:

float arrayName[n]
string anotherArray[1 to 10]

Where n is the number of elements the array will hold.

Accessing Array Elements

There are lots of reasons to use an array, and lots of ways to populate an array with values. The array might be filled with values from the keyboard, or values from a file. The array might get its values from some kind of calculation, or a combination of any of these things. You must always indicate which index of the array you are accessing in the square brackets:

Print arrayName[5]
Assign "Fred" to arrayName[10]

If we wanted to write code to fill in an array based on user inputs, we might try something like this:

integer numbers[10]
counter = 1
While counter <= 10
   Print "Enter number ", counter
   Get numbers[counter]
   Add 1 to counter

If you wanted to write the code to print all the array elements and then calculate the average, you might have:

counter = 1
While counter <= 10
   Print numbers[counter]
   Add numbers[counter] to total
Print total / 10

Exercise

A string array called names[] with 10 elements contains the last names of all the salespeople in a company. Write the pseudocode to ask the user for a salesperson's name, then sequentially search the array for that person's name. If the name matches one in the list, display the index number. Otherwise, display a "not found" message.

Exercise

What if there is more than one sales person with the same last name? Modify your code to display the index values for all matching sales people (e.g. "Matches: 2 5"). If there are no matches at all, display "Matches: None".

Exercise

A floating-point array called grades[] contains grades for 25 courses a student took while at college. Write the pseudocode to figure out and display the largest grade and the smallest grade, without sorting the grades.

Sorting Arrays

In the previous session you learned how the bubble sort worked. We'll get to the logic for that in a moment, but first we need to look at one important piece: swapping.

Problem: You have two variables - num1 and num2. Write the statement(s) to swap the value from num1 into num2 and the value from num2 into num1.

What statements would you write? Is it sufficient to say:

Assign num2 to num1
Assign num1 to num2

What does this do? If we say that num1 = 5 and num2 = 3, the result of these two statements is:

The statement that assigns num2 to num1 wipes out num1's original value before we can assign it to num2. If we reversed these two statements, we would encounter a similar problem. We need to be able to hang onto one of the values before we overwrite it with another value.

It would be useful to make use of a third variable that could temporarily hold one of the values. We could assign the value of num1 to the variable, assign num2 to num1, and then assign the value of the variable to num2:

Assign num1 to tempValue
Assign num2 to num1
Assign tempValue to num2

This is the standard set of statements that swaps the values in two variables. This is going to be an important part of most sort algorithms.

The Bubble Sort Algorithm

Cool links to sort algorithms: Animated Bubble Sort, Another Animated Bubble Sort (press Init to start the applet, the press the play (>) button), Yet Another Animation (this one lets you watch the code, too, and so does this one). Compare the speed of various sort algorithms. If you're feeling adventurous, check out how to sort a two-dimensional array (we'll learn about 2-d arrays next week).

Recall a few things about the bubble sort from the previous session:

How would we determine if two adjacent elements need to be swapped? We'd have to check to see if the current element was larger or smaller (depending on how you're sorting) than the next element and, if necessary, swap them. If we sort a list of grades from lowest to highest, we might say:

If grades[currentElement] > grades[currentElement + 1] Then
	tempGrade = grades[currentElement]
	grades[currentElement] = grades[currentElement + 1]
	grades[currentElement + 1] = tempGrade

How would the code for one pass through the list look? We would have a loop that compared a set of elements, starting with 1/2, then 2/3, then 3/4, etc. until we reached the end of the list. Recall that the number of comparisons is the number of elements minus one, so as long as we had variables for these values, we might say:

numComparisons = numberElements - 1
currentElement = 1
While currentElement <= numComparisons
	If grades[currentElement] > grades[currentElement + 1] Then
		tempGrade = grades[currentElement]
		grades[currentElement] = grades[currentElement + 1]
		grades[currentElement + 1] = tempGrade
	Add 1 to currentElement

Trace this code using the following sample data:
grades[1] = 60
grades[2] = 85
grades[3] = 70
grades[4] = 68
grades[5] = 50

At the end, you should find that the order of the list is now 60, 75, 68, 50, 85.

We know from the previous session that it will take more than one pass to sort the entire list. We have to continue going through the array until we've completed each pass. The pseudocode we've written so far needs to be repeated for each pass. Therefore, we'll need an outer loop that executes the inner loop for each pass. This loop will start with the first pass and end at the last pass (numElements - 1). In addition, we want to make sure we don't waste time swapping the last elements that have already been placed in their proper positions: we have to ensure that this loop iterates one less time for each pass. We can do this by decrementing the numComparisons variable in each iteration of the outer loop:

passNumber = 1
numComparisons = numberElements - 1
While passNumber <= numberElements - 1
	currentElement = 1
	While currentElement <= numComparisons
		If grades[currentElement] > grades[currentElement + 1] Then
			tempGrade = grades[currentElement]
			grades[currentElement] = grades[currentElement + 1]
			grades[currentElement + 1] = tempGrade
		Add 1 to currentElement
	Subtract 1 from numComparisons
	Add 1 to passNumber

Starting over with the original list of grades, trace the algorithm to the very end to make sure it works.

Exercise

What if your array is sorted before you've completed all the passes? How can you tell if the array is sorted early? Modify the pseudocode - use a boolean flag to perform an "early exit" if the array is sorted early.

Parallel Arrays

Let's go back to the program we discussed in the beginning: For this one, record a list of final grades for a set of 6 courses, calculate the average grade, and then print how many grades were above the average. Try this out before checking the answer below.

float grades[1 to 6]
counter = 1
Do
	Print "Enter grade #", counter
	Get grades[counter]
	Add grades[counter] to totalGrade
	Add 1 to counter
	While counter <= 6
Calculate average as totalGrade / 6
Print "Average grade: ", average
counter = 1
numAbove = 0
Do
	If grades[counter] > average
		Add 1 to numAbove
	Add 1 to counter
	While counter <= 6
Print counter, " grades above average."

What if you also wanted to record the course codes, and display only those course codes where the final grade is above the average? You would have to record the course codes in a separate array. Then you'd have to ensure that the courseCodes array and the grades array are parallel. In other words, element 1 of courseCodes[] will contain the course code and element 1 of grades[] will contain the final grade for that course. This will also be true for element 2 of both arrays, and so on. These two arrays would be parallel.

Homework Exercise

Modify the previous example to record course codes and final grades for 6 courses. Display the average grade and the course codes for any courses whose grade is above the average.