Activity 1
Tell students that they will be exploring the relationship between different sets of data. “Suppose you surveyed 100 kids from ages 3 to 18 and for each of them, you recorded their age and their height. What kind of relationship would you expect to see between a person’s age and his/her height?” Students should recognize that as children get older, their height increases.
Sketch a quick scatter plot to represent fictional data about age and height. The scatter plot should show positive correlation, but should not be precisely linear. Depending on the class, you can ask students if the relationship will be linear, i.e., “Does a person grow the same amount each year?”
Label the scatter plot Positive Correlation and tell students a positive correlation can be thought of as “Whenever one of the quantities increases, the other quantity also increases.” Remind them that the quantities don’t have to increase by the same amount, just that they’re both increasing. “What are some other real-life examples of two things that might show a positive correlation?” Encourage students to brainstorm examples of a positive correlation. (Some possibilities include outside temperature vs. air conditioning bill, study time vs. test scores, fat grams vs. calories, etc.)
Ask students to work in pairs and list two or three different pairs of data that would have a positive correlation. Remind students that the key is that when one of the quantities increases, the other one also increases. After each pair has a couple of ideas, have students list them on the board. Depending on class size, have each pair list one or more ideas. Ask students if there are any combinations on the board that are “stronger” than others. In other words, are there any pairs where when one quantity increases, the other quantity always increases, maybe even by a fixed amount? If nothing on the board meets this criterion, you can suggest the relationship between the number of hours driven and the distance driven by someone driving 60 miles per hour. Explore the relationship between the quantities with a strong positive correlation and sketch another scatter plot to represent fictional data about these two quantities, making sure the data is more closely clumped together around an imaginary line with positive slope.
“This data definitely has a positive correlation, but we can also say it has a strong positive correlation. In other words, it is much clearer with this data that when one quantity increases, the other quantity also increases. For our example with age and height, it’s possible that when people age one year, their height barely increases, and sometimes their height increases by an average amount, and sometimes they grow incredibly fast. In general, there is still a positive correlation, but not as strong a correlation as in our second example.”
(Note: Students who have difficulty understanding how to interpret correlation often lose sight of the presence of both variables because they tend to think about correlation as one thing. Emphasize for them that there are two directions in which each variable can move. Positive correlation means the second variable moves in a positive direction as the first variable moves in a positive direction. Positive correlation also means that the second variable moves in a negative direction as the first variable moves in a negative direction. Highlight the qualitative rather than the quantitative characteristics before using real data.)
Activity 2
Repeat this activity with negative correlation, starting with an example such as the amount of money in people’s bank accounts and the length of vacation they go on. “As the length of their vacation increases, what happens to the amount of money in their bank account?” Students should note that it decreases. “Does it always decrease by the same amount?” Have students provide reasons for variation in the rate at which the amount of money would decrease (airline flights, changing hotels, eating at fancier restaurants, etc.).
“These two quantities have a negative correlation: when one quantity increases, like the length of their vacation, the other quantity decreases.” Sketch some fictional data on a scatter plot to represent a negative correlation. Make sure the data is not strongly correlated (it should have some spread to it). Have students work in pairs to come up with quantities with negative correlation and share their ideas as before. If there isn’t one that has a strong negative correlation, suggest the height of a candle and the length of time for which it has been burning, or, reversing the strong positive correlation idea, the distance a person is from his/her destination based on how long s/he has been driving at a constant rate. Sketch some fictional data for the situation with a strong negative correlation, and explain that this data has a strong negative correlation.
Activity 3
Now ask students about the relationship between a person’s height and the number of movies s/he has seen in the theater in the last year (or two other unrelated topics). Ask students to describe the relationship between the two quantities. They may come up with some ideas (for instance, an extremely tall person may be less likely to go to the theater because s/he doesn’t want to obstruct someone else’s view), but point out that in general, for 99% of the population, there is no real relationship between the data. Sketch a scatter plot that could represent this data and point out that there is no pattern. Make the data cluster around particular values of average height and a reasonable number of movies to have seen in the last year. Also, include an outlier or two to represent extremely tall people who don’t visit the theater. “Because one quantity doesn’t seem to increase or decrease based on the other quantity, we say this data has no correlation.” Circle the cluster of data and tell students, “This data also appears to cluster around a particular area, indicating that most of the people represented by this scatter plot are around the same height and went to the movies around the same number of times. This is an example of clustering: when lots of data points are all grouped together in the same area. What about these other data points?” Indicate the outliers. “What do they represent?”
Even though students may not use the term “outliers,” they should recognize that they “lie” on the outer edges of the data represented by the scatter plot and are far from the cluster of data.
Activity 4
“In mathematics, we generally want to use numbers to quantify or explain relationships. It’s okay to say that a set of data shows a positive correlation, for example, but we like to have a way to quantify that correlation, to assign a number to it.”
Have students work in groups with the applet at http://illuminations.nctm.org/ActivityDetail.aspx?ID=146.
(This activity can also be done with a graphing calculator, but the applet makes it much easier to plot points and move them (by checking “Move Points”) and watch the results update in real time.)
Some groups should plot data that has a positive correlation, some that has a negative correlation, and some that has no correlation at all. As students plot the data, have them click on the “Computer Fit” check box and note the value of r that the Web site returns.
“What happens to the value of r as your positive correlation becomes stronger and stronger?” (The value of r increases toward 1.) “What is the greatest value of r you can achieve?” Students should be able to get to 0.9 quite easily, but it’s doubtful they will get to exactly 1.0. Tell students, “The greatest possible correlation is 1. When the correlation is 1, what do you think the data looks like?” Based on the fact that the Web site applet also draws a line of best fit, guide students toward the recognition that data that all falls on the same line with positive slope has a correlation of 1.
Repeat the same line of questions about negative correlations, noting that the smallest possible correlation is −1, which occurs when all the data falls on the same line with negative slope.
“What do you think the value of r is for data that is not correlated at all?” Students should guess that it is 0.
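(Note: Teachers who want to check these three benchmark values away from the applet can use a short Python sketch like the one below. It assumes that r is the standard Pearson correlation coefficient; the lesson does not state how the applet computes it. The function name pearson_r, the 300-mile trip, and the "unrelated" values are made up for illustration. Only the 60-miles-per-hour driving example comes from the activities above.)

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]

# Distance driven at a constant 60 miles per hour:
# every point lies on one rising line.
miles_driven = [60 * h for h in hours]

# Distance remaining on a hypothetical 300-mile trip:
# every point lies on one falling line.
miles_remaining = [300 - 60 * h for h in hours]

# Made-up values with no overall upward or downward trend.
unrelated = [3, 1, 2, 1, 3]

r_driven = pearson_r(hours, miles_driven)
r_remaining = pearson_r(hours, miles_remaining)
r_unrelated = pearson_r(hours, unrelated)
print(r_driven, r_remaining, r_unrelated)
```

The three values come out as 1, −1, and 0, matching the answers students are guided toward. Real survey data will almost never land exactly on these values.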
“The value of r is called the correlation coefficient and is a numerical measure of the correlation of our data. It’s difficult to compute by hand, but calculators and spreadsheet programs like Excel can compute it quite easily (as can this Web site).”
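(Note: If students ask what the calculator is actually computing, the formula below is the standard definition of the Pearson correlation coefficient. It is assumed here, not confirmed by the applet, that this is the r being reported. The symbols are introduced only for this note: the data points are (x_i, y_i), and x-bar and y-bar are the averages of the x-values and the y-values.)

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when above-average x-values tend to pair with above-average y-values and negative when they tend to pair with below-average y-values, which is why the sign of r matches the direction of the correlation.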
Have students work in small groups to develop a study in which they determine two quantities they want to measure; design a plan; collect, analyze, and interpret data; and make predictions. Students will present their findings and visual representations to the class. Discussion of the relationship between the variables and correlation coefficient must be included. Students must connect the findings to the context of the study and thus make real-world interpretations.
An optional quiz on scatter plots is available at the following Web site. The quiz may be used to assess student understanding of the topic of scatter plots:
http://www.regentsprep.org/Regents/math/ALGEBRA/AD4/PracPlot.htm
Extension:
- Routine: Asking students to create and develop their own data based on their individual interests is a motivating force. Topics such as current movie box office sales, top-ten lists of video games, popular music downloads, and sports records all have large data troves that are relatively easy to search. Use partner grouping to help students understand the relationship between variables, line of best fit, and correlation coefficient.
Another way to review student understanding of correlations in scatter plots is to give the online quiz at the following Web site:
http://www.mathopolis.com/questions/q.php?id=3072&site=1&ref=/data/scatter-xy-plots.html&qs=3072_3073_3074_3075_3772_3773_3774_3775_3776
- Small Group: Assign groups to develop and write lists of data categories they believe will have strong correlations, for example, student grade level and average shoe size, or daily high temperature and ice cream sales. For students with more skill in representing correlations, assign lists where negative correlation is more likely, for example, average daily temperatures and winter clothing sales.
- Expansion: Have students do additional experimentation with the applet at http://illuminations.nctm.org/ActivityDetail.aspx?ID=146. Then ask students to form some general conclusions about strongly correlated data and weakly correlated data.