Analogical Reasoning With Examples Facilitates Inferring General Principles in Spite of the Risk of Negative Transfer

Analogical Feature Matching Predicts Ability to Apply a General Principle to New Surface Features

Robert S. Ryan

Kutztown University

Jonathan W. Schooler

University of Pittsburgh

Paper presented at the annual meeting of the American Psychological Society, New Orleans, June 6-9, 2002.

Correspondence:

Robert S. Ryan

385 Old Main

Kutztown University

Kutztown, PA 19530

rryan@kutztown.edu

Abstract

Participants solved or matched features of word problems that required different procedures. Solvers’ success in practice predicted performance on new-surface-feature test problems if the goal was the same as in practice. Feature matchers’ success in practice was similarly predictive, but even if the goal was new. This suggests that solving improved an abstract representation of a procedure, whereas analogical mapping improved an abstract representation of an implied general principle.

Analogical Feature Matching Predicts Ability to Apply a General Principle to New Surface Features

People often try to use a previously learned problem as an analogy to an unfamiliar problem. However, there are different reasons why a new problem could appear unfamiliar. One possibility is that it could have new surface features. For example, high school algebra students may have learned how to solve a mixture problem that involves combining different amounts of solutions that have different concentrations of acid in them. Then they might encounter a speed and distance problem that involves averaging two different speeds traveled for different amounts of time. If the unknown in the mixture problem were the concentration of acid in the combined mixture and the unknown in the speed and distance problem were the average speed, then the solution procedures for the two problems would actually be the same. However, people who have learned how to correctly apply a set of operations to one set of surface features typically do not spontaneously recognize how to correctly apply them to a new set (Gick & Holyoak, 1980; Gick & Holyoak, 1983; Reed, 1984; Reed, Ernst, & Banerji, 1974).

A second reason why a new problem could appear unfamiliar is that it could have a different goal. Two problems could each describe a situation in which the same kinds of problem elements were related in the same way, but they could have different problem elements as the unknown. For example, a mixture problem like the one described above could have the concentration of acid in the combined mixture as its unknown, whereas another mixture problem could describe the same situation and provide the concentration of acid in the combined mixture, but leave something else as the unknown. For example, the unknown could be the concentration of acid in one of the initial solutions, or even one of the amounts of solution. The procedure for solving for the old unknown would not apply directly to solving for the new one. However, the old solution procedure could be derived from the general principle of weighted averaging. If a person could infer that this was the underlying principle for solving the first problem, they might be able to use it to generate a modification of the procedure that could be used to solve the second problem. On the other hand, if the second problem had not only a new goal, but also different surface features, then the problem solver would need to understand the newly generated procedure well enough in an abstract sense to be able to apply it to the structure of the new problem in spite of the new surface features. Thus, each of these kinds of differences between an old and a new problem could present its own kind of challenge for a person trying to use the old problem as an analogy for the new one.
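To make the idea of generating a new procedure from the general principle concrete, the weighted averaging equation can be rearranged for each kind of unknown. The derivation below is an illustrative sketch rather than part of the training materials. It uses the notation of the Appendix: A_1 and A_2 are the two initial amounts, R_1 and R_2 are their ratios, and R_C is the combined ratio. It assumes nonzero amounts and R_C different from R_2.

```latex
% General principle (weighted averaging)
A_1 \cdot R_1 + A_2 \cdot R_2 = (A_1 + A_2) \cdot R_C

% Trained goal: the combined ratio is the unknown
R_C = \frac{A_1 \cdot R_1 + A_2 \cdot R_2}{A_1 + A_2}

% Different goal: an initial ratio is the unknown
R_2 = \frac{(A_1 + A_2) \cdot R_C - A_1 \cdot R_1}{A_2}

% Different goal: an amount is the unknown
% (collect the A_2 terms: A_1 (R_1 - R_C) = A_2 (R_C - R_2))
A_2 = \frac{A_1 \cdot (R_1 - R_C)}{R_C - R_2}
```

Each rearrangement is a different sequence of arithmetic operations, which is why a learner who knows only the first procedure cannot apply it directly when the goal changes.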

Gick and Holyoak (1980, 1983) found that training in how to solve two analogous example problems, and also comparing them, improved the spontaneous recognition of the analogy to a third problem. Other studies (e.g., Cummins, 1992) confirmed that it was specifically the interproblem comparison that improved the recognition of a structure in spite of differences in surface features. However, the goal of the test problems in those studies differed from the goal in the training examples only in terms of surface features. Therefore, the previous studies did not make clear whether either type of training (i.e., to solve or to compare) affected people’s ability to apply a procedure to new surface features when that procedure was newly derived from a general principle, rather than being the trained procedure itself.

Hypothesis

In order to examine this question we used algebra word problems that could be represented not just in terms of either surface features or an abstract solution procedure, but also in terms of a more general principle. We predict that, on the one hand, training in only solving will affect people’s understanding of the applicability of the trained solution procedure, but not their understanding of the applicability of a new procedure derived from the general principle. On the other hand, training in both solving and comparing will affect people’s understanding of the applicability of both.

We propose the following theoretical framework to explain the effects of solving only versus solving plus comparing on people’s concrete and abstract representations of a procedure and a general principle. The problems’ procedures can be represented concretely (i.e., in terms of their surface features) or abstractly (in terms of amounts and ratios). The general principle of weighted averaging, however, is more likely to be represented abstractly. The surface features that the participant encounters at test will affect whether a concrete or an abstract representation is activated. If the surface features of the test problem are the same as the training problems, then a concrete representation will be activated, which will most likely be a representation of the procedure. If the surface features are different, then an abstract representation will be activated, which could be either the trained procedure or the general principle.

The goal that the participant encounters at test, however, will directly affect whether the trained procedure or the general principle is activated. If the goal is the same, then the trained procedure (either in a concrete or an abstract form) will be activated. If the goal is different, then the general principle will be activated. We also predict that participants who practice solving-only will be more likely to respond first to the surface features of the test problems, whereas those who both solve and compare will be more likely to respond first to the goal. Finally, we propose that solving-only in training will improve people’s ability to apply the procedure to new surface features provided it is the procedure in which they were trained. On the other hand, matching features as well as solving in training will improve people’s ability to apply either the trained procedure or a procedure generated from the general principle to a test problem with new surface features.
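The framework described above can be read as a small decision procedure. The sketch below is a deterministic simplification of it, written only to make the predicted order of cues explicit. The framework itself speaks of tendencies ("more likely"), not certainties, and the names `Representation` and `activated_representation`, as well as the condition labels, are hypothetical ones introduced for this illustration.

```python
from enum import Enum


class Representation(Enum):
    CONCRETE_PROCEDURE = "concrete representation of the trained procedure"
    ABSTRACT_PROCEDURE = "abstract representation of the trained procedure"
    GENERAL_PRINCIPLE = "abstract representation of the general principle"


def activated_representation(condition: str, same_surface: bool, same_goal: bool) -> Representation:
    """Return the representation the framework expects to be activated at test.

    condition is "solve-only" or "match-features" (labels assumed for this sketch).
    same_surface and same_goal describe the test problem relative to training.
    """
    if condition == "solve-only":
        # Solvers are assumed to respond to the surface features first.
        if same_surface:
            # A concrete representation can only be the trained procedure,
            # because the general principle has no concrete form.
            return Representation.CONCRETE_PROCEDURE
        # New surface features activate an abstract representation; the goal
        # then selects between the trained procedure and the principle.
        return Representation.ABSTRACT_PROCEDURE if same_goal else Representation.GENERAL_PRINCIPLE
    if condition == "match-features":
        # Feature matchers are assumed to respond to the goal first.
        if not same_goal:
            return Representation.GENERAL_PRINCIPLE
        # Same goal: the surface features pick the concrete or abstract procedure.
        return Representation.CONCRETE_PROCEDURE if same_surface else Representation.ABSTRACT_PROCEDURE
    raise ValueError(f"unknown condition: {condition!r}")


if __name__ == "__main__":
    for condition in ("solve-only", "match-features"):
        for same_surface in (True, False):
            for same_goal in (True, False):
                result = activated_representation(condition, same_surface, same_goal)
                print(condition, same_surface, same_goal, "->", result.value)
```

The only cell in which the two conditions differ is the one with old surface features and a new goal: the sketch has solvers activating the concrete trained procedure there, whereas feature matchers activate the general principle.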

Method

Participants

The participants were 157 psychology undergraduate students at the University of Pittsburgh who participated as part of the requirements of their Introductory Psychology course. The data from 7 participants were not included because of procedural errors, leaving 150 participants' data in the analysis.

Materials

The materials consisted of problem pairs for training and problems for pretest and posttest (see the Appendix). Both the training problems and the test problems were constructed so that it would be possible for them to be either different from one another, or the same, in terms of both their surface features and their goals. The members of the training pairs always had different surface features from one another, but they sometimes had the same goal and sometimes had different goals. Thus, when participants compared training problems, they were focused specifically on whether the goals were the same or different. The two types of problems used in training (i.e., the two types of surface features) were mixture and distance problems. The two goals used in training were to find either the combined ratio (a final concentration of a substance in the mixture, or a final speed) or one of the initial ratios.

There were four kinds of test problems that were determined by how the test problem differed or did not differ from the training problems. One kind of test problem was called old-type, that is, it had the same surface features as one of the types used as training problems. It was also an old-goal problem, that is, it required finding the same unknown as one of the unknowns used in the training problems. An old-type test problem was always a mixture problem, and an old-goal test problem always required finding the final average ratio. Solving this test problem would require knowledge of the procedure, but the knowledge would not have to be transferred to new surface features. A second kind of test problem was called a new-type problem, but it was still an old-goal problem. The new type was always of a type that we called a group average problem. It was about the average value on some characteristic for two different-sized groups of people. Solving this problem would require transferring knowledge of the procedure for a mixture or distance problem to the new surface features. A third kind of test problem was an old-type problem, but it was a new-goal problem. That is, it had the relatively difficult goal of finding one of the amounts (the quantity of mixture, or a time traveled), rather than a ratio. Solving this problem would require knowledge of the general principle (i.e., the equation for weighted averaging) in order to generate a new procedure, but the new procedure would not have to be transferred to new surface features. The fourth kind of test problem was a new-type, new-goal problem. Solving this problem would require knowledge of the general equation, and the newly generated procedure derived from that equation would have to be transferred to new surface features.

Procedure

The procedure consisted of a pretest, a three part training session, and then a posttest. In the pretest the participants were presented with one each of the four kinds of test problems in a random order. They were allowed three minutes to work on each problem. Which of two equivalent tests was used as the pretest or the posttest was counterbalanced.

The three-part training procedure consisted of pairs of problems used as worked examples, guided practice, and unguided practice. There were three such training conditions, and the tasks during training differed depending upon the participants’ condition. All participants were trained in how to solve the examples. However, the first two of the conditions also included other tasks.

In the first condition (N = 50), called the match features condition, the problems were presented explicitly as pairs. The participants matched those elements from the members of the pairs that corresponded in terms of being either initial ratios (or their associated amounts), or being the final, averaged ratio (or its associated amount). They did this by simply writing a list of pairs of elements. Thus the match-features participants were mapping the members of the training pairs together as one would do when forming an analogy. In the second condition (N = 50), called the explain steps condition, the training problems were presented sequentially rather than explicitly as being in pairs. The participants wrote an explanation for why each step of the problem was necessary as they performed each step during solving. In the third condition (N = 50), called the solve only condition, the participants solved individually presented training problems. In this paper, however, we are interested in the differential effects of solving alone versus solving with comparing. Therefore, we will only present the differences between the first and third conditions.

The two pairs of training problems shown in the Appendix were used as worked examples. All participants received the same two worked examples in the same order. The first was a same-pair in which finding the final ratio was the goal in both. The second was a different-pair in which the goal of the first member was to find the final ratio and the goal of the second member was to find the initial ratio. The experimenter allowed the participants to read through the worked examples one at a time. The experimenter also gave a brief oral explanation of each one after the participants read them and answered any questions to make sure the participants understood the examples. This procedure usually took 10 to 15 minutes.

Two more training pairs were used for guided practice. All participants received the same two training pairs in the same order. The first pair was a same-pair in which finding the initial ratio was the goal. The second pair was a different-pair in which the goal of the first member was to find the initial ratio and the goal of the second member was to find the final ratio. The guided practice was untimed. The experimenter guided the participants until it appeared they had all reached the correct solution. This usually took about 10 minutes. In case any participant had not solved the problem correctly, the experimenter eventually told the participants the correct answer. In the match features condition, the experimenter also guided the participants through the matching task and made sure that they had done it correctly.

After the guided practice, the participants spent 15 minutes in unguided practice on more pairs of problems. The pairs were a random mixture of same-pairs and different-pairs with the constraint that there were different types of pairs within the first three pairs. The match-features participants used these pairs of problems to practice doing the matching task as well as solving, whereas the solve-only participants used them only to practice solving. The procedure concluded with the posttest. The same procedure was used on the posttest as was used on the pretest.

Design and Analysis

We analyzed the performance results on the pretest and posttest for each test problem separately. The analysis had a 2 (time of test: pretest, posttest) by 3 (training condition: match-features, explain-steps, solve-only) design. The time of test factor was within subjects, whereas the training condition factor was between subjects. This allowed us to calculate an MSE for each test problem to use for contrasts. We were interested in two sets of contrasts, but only involving the match-features and the solve-only conditions. First, we were interested in contrasts between pretest and posttest within individual conditions to show whether either type of training resulted in improved performance on each problem. Second, we were interested in the interaction between time of test and condition (i.e., the match-features versus the solve-only conditions). This allowed us to assess whether either of these training conditions was superior to the other in producing such improvement. Finally, we measured success at the solving and matching training tasks. This allowed us to compute correlations between success at each training task and success at each of the posttest problems.

Results

Each participant was given a score of one point for a correct solution for each problem or a zero for a failure to solve the problem. For each problem, an analysis of variance was conducted on those scores with time of test as a within subjects factor and training condition as a between subjects factor.

Effects of Training on Improvements in Test Problem Performance

Table 1 shows the amount of improvement from pretest to posttest on each problem as a function of type of training. The F statistics are for individual contrasts between the pretest and posttest scores for each condition.

Table 1.

Percentage Correct on the Pretest and Posttest, Gain, and Results of Pre/Post Contrasts for Each Test Problem as a Function of Training Condition.

___________________________________________________________________________
                           Time of Test               Pre/Post Contrast
Problem                  ___________________          _____________________
  Condition              Pretest    Posttest    Gain  F(1, 147)   p
___________________________________________________________________________
Old-Type/Old-Goal
  Match Features           44         88         44     33.29     < .00001
  Explain Steps            22         76         54     50.15     < .00001
  Solve Only               46         86         40     27.52     < .00001
New-Type/Old-Goal
  Match Features           48         68         20      5.42     .021
  Explain Steps            36         60         24      7.81     .006
  Solve Only               28         52         24      7.81     .006
Old-Type/New-Goal
  Match Features           14         26         12      6.54     .012
  Explain Steps             6          2         -4       .727    N.S.
  Solve Only                6          4         -2       .182    N.S.
New-Type/New-Goal
  Match Features           14         32         18     13.38     .00035
  Explain Steps             4          8          4       .661    N.S.
  Solve Only                8         18         10      4.13     .044
___________________________________________________________________________

Matching features resulted in relatively large improvements on all of the test problems, whereas solving resulted in similarly large improvements only on the old-goal problems. The solvers achieved a smaller, but significant, improvement on the new-type, new-goal problem.
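The pre/post contrasts can be approximately reconstructed from the cell means. The sketch below assumes the standard single-degree-of-freedom contrast formula with equal cell sizes (n = 50 per condition) and the pooled MSE from the mixed analysis of variance. That formula is an assumption of this illustration, since the computation is not spelled out in the text. The function name `pre_post_contrast_f` is hypothetical, the cell values are the old-type, new-goal entries of Table 1 expressed as proportions, and MSE = .055 is the value reported for that problem in the between-condition comparison.

```python
def pre_post_contrast_f(pre: float, post: float, mse: float, n: int = 50) -> float:
    """F for a single-df pretest versus posttest contrast within one condition.

    pre and post are proportions correct (0 to 1), mse is the pooled error
    term from the mixed ANOVA, and n is the number of participants per cell.
    Assumes contrast weights (-1, +1), whose squared weights sum to 2.
    """
    gain = post - pre
    return n * gain ** 2 / (2 * mse)


# Old-type, new-goal problem; MSE = .055 as reported for this problem.
MSE_OLD_TYPE_NEW_GOAL = 0.055

f_match = pre_post_contrast_f(0.14, 0.26, MSE_OLD_TYPE_NEW_GOAL)
f_explain = pre_post_contrast_f(0.06, 0.02, MSE_OLD_TYPE_NEW_GOAL)
f_solve = pre_post_contrast_f(0.06, 0.04, MSE_OLD_TYPE_NEW_GOAL)

if __name__ == "__main__":
    print(f"match features: F = {f_match:.3f}")
    print(f"explain steps:  F = {f_explain:.3f}")
    print(f"solve only:     F = {f_solve:.3f}")
```

Under these assumptions the computed values agree with the tabled F statistics for this problem (6.54, .727, and .182) to within rounding of the reported MSE.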

Comparison of Improvements Between Conditions

The analysis of variance for the old-type, old-goal problem indicated that the pretest to posttest gain of the match-features participants was not significantly greater than that of the solvers, F(1, 147) < 1. The same was true for the new-type, old-goal problem, F(1, 147) < 1. For the old-type, new-goal problem, however, the gain of the feature matchers was significantly greater than that of the solvers, F(1, 147) = 4.45, p = .037, MSE = .055, although this was not true on the new-type, new-goal problems, F(1,147) = 1.32, p > .05, MSE = .061.
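The between-condition comparisons can be checked in a similar way from the gains in Table 1. The sketch below assumes a single-degree-of-freedom interaction contrast with weights (+1, -1, -1, +1) over the four cells (two conditions by two times of test) and equal cell sizes of 50. This formula is an assumption of the illustration, not a statement of how the original analysis was run, and the function name `gain_interaction_f` is hypothetical.

```python
def gain_interaction_f(gain_a: float, gain_b: float, mse: float, n: int = 50) -> float:
    """F for the difference between two conditions' pre-to-post gains.

    gain_a and gain_b are gains in proportion correct, mse is the pooled
    error term, and n is the number of participants per condition.
    The four contrast weights (+1, -1, -1, +1) have squared weights summing to 4.
    """
    difference = gain_a - gain_b
    return n * difference ** 2 / (4 * mse)


# Gains (as proportions) taken from Table 1; MSE values as reported in the text.
f_old_type_new_goal = gain_interaction_f(0.12, -0.02, mse=0.055)  # match vs. solve
f_new_type_new_goal = gain_interaction_f(0.18, 0.10, mse=0.061)   # match vs. solve

if __name__ == "__main__":
    print(f"old-type, new-goal: F = {f_old_type_new_goal:.3f}")
    print(f"new-type, new-goal: F = {f_new_type_new_goal:.3f}")
```

With these inputs the sketch gives values close to the reported 4.45 and 1.32. The small discrepancy on the second problem is consistent with the MSE having been rounded to three decimal places.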

Correlations Between Success in Training and Test Problem Performance

Success in training was measured by calculating the proportion of correct items out of the number attempted during the 15 minutes of unguided practice. In the solve-only condition the items were the problems correctly solved. In the match-features condition the items were the problem elements correctly matched. The measure of success on the test problems was the posttest scores for each problem. Table 2 shows the correlations between the measures.
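The following sketch illustrates how a training success score and its correlation with a dichotomous posttest score could be computed. The data are invented solely for illustration and are not the study's data. The function names `training_success` and `pearson_r` are hypothetical, and the use of an ordinary Pearson correlation (equivalent to a point-biserial correlation when one variable is scored 0 or 1) is an assumption, because the type of correlation is not stated in the text.

```python
import math
from typing import Sequence


def training_success(correct: int, attempted: int) -> float:
    """Proportion of correct items out of the number attempted."""
    if attempted <= 0:
        raise ValueError("at least one item must have been attempted")
    return correct / attempted


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: (items correct, items attempted) in unguided practice,
# and a 0/1 posttest score on one test problem, for six made-up participants.
practice = [(1, 5), (2, 5), (3, 6), (3, 5), (4, 5), (6, 6)]
posttest = [0, 0, 0, 1, 1, 1]

success = [training_success(c, a) for c, a in practice]
r = pearson_r(success, posttest)

if __name__ == "__main__":
    print(f"r = {r:.4f}")
```

Scoring success as a proportion of items attempted, rather than as a raw count, keeps the measure from rewarding participants who simply worked faster during the 15 minutes.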

Table 2.

Correlations Between Success in Training and Success on Test Problems.

______________________________________________________________________
                                    Success in Training
Success on the           _____________________________________________
Test Problems            Match Features           Solve Only
______________________________________________________________________
Old-Type/Old-Goal        .1399 (p = .3324)        .1497 (p = .3047)
New-Type/Old-Goal        .4128 (p = .0029)        .3407 (p = .0166)
Old-Type/New-Goal        .1878 (p = .1915)        .1249 (p = .3924)
New-Type/New-Goal        .3522 (p = .0121)        .1815 (p = .2150)
______________________________________________________________________

Neither success at solving-only nor success at matching-features (in addition to solving) during training was related to applying a solution procedure to a test problem with surface features that were the same as the training problems (i.e., old-type problems). This was true regardless of whether the solution procedure being applied was the trained procedure (i.e., old-goal) or a new one that was generated from the underlying general principle (i.e., new-goal).

On the other hand, the new-type problems required applying a procedure to new surface features. For the new-type, old-goal problem, the procedure to be applied was the trained procedure. Success at solving and success at matching features both predicted ability to apply the trained procedure to a problem with new surface features. However, for the new-type, new-goal problem, the procedure to be applied was the new procedure that had to be generated from the general principle. Success at solving did not predict ability to apply this procedure to new surface features, even though it had predicted ability to apply the trained procedure to new surface features. Success at matching features, in contrast, did predict ability to apply the newly generated procedure to a test problem with new surface features.

Discussion

The results suggest the following interpretation regarding each test problem. For the same-goal, same-type problem, the solvers respond to the surface features first. Because these are the same as in the training problems, the solvers activate a concrete representation, which is their representation of the procedure. That is sufficiently useful for this test problem, and so they succeed. The feature matchers respond to the goal first. Because it is the same, they activate both of their representations of the procedure. Because the surface features are the same, they choose the concrete procedure, and, like the solvers, they succeed. There is no difference in the average quality of the two groups’ concrete representations; therefore, there is no difference in gains on this problem between the solvers and the feature matchers.

The improvement from pretest to posttest for both groups suggests that both types of training improved the participants’ concrete representations of the trained procedure. However, the lack of a correlation for either group between their success in training and their success on this problem suggests that one does not have to be especially good at the training task to benefit from it.

For a same-goal, different-type problem, the solvers again respond to the surface features first. Because this time they are different, the solvers activate their abstract representation of the procedure. That is what is needed due to the change in surface features; therefore, to the extent they have a good abstract representation, they succeed. Feature matchers respond to the goal first. Because it is the same, they activate both of their representations of the procedure. Because the surface features are different, they choose their abstract rather than their concrete version of the procedure. That is what is needed; therefore, as with the solvers, to the extent they have a good abstract representation, they succeed. There is no difference in the average quality of the two groups’ abstract representations; therefore, there is no difference in gains on this problem between the solvers and the feature matchers.

These improvements suggest that both types of training must improve the participants’ abstract representations of the trained procedure. Unlike the concrete representation, however, the improvement in the abstract representation does appear to be related to being more proficient at the training tasks because success at them was correlated with test problem performance.

For a different goal, same type problem, the solvers respond to the surface features first. Since they are the same, they activate their concrete representation, which is their representation of the procedure. Once they have activated a concrete representation, they cannot choose between a procedure and a general principle because there is no concrete general principle. Because they do not activate the general principle, they cannot use it to generate a new procedure, and therefore they fail. As a matter of fact, even some of the few participants who could solve this problem on the pretest are now even less likely to activate the general principle. Therefore, there is even a slight decrease in the average success rate from pretest to posttest. The feature matchers on the other hand are more likely to respond to the goal first. Since it is different, they activate their representation of the general principle. To the extent that they have a good representation of the general principle, they succeed at the problem. Because the training helps them to develop at least a somewhat better representation of the general principle, there is a modest, but statistically significant, gain in success on this problem from pretest to posttest. This results in a slightly superior gain in performance on this problem for the feature matchers, compared to the slight loss on the part of the solvers. Therefore, the difference in gain scores between the feature matchers and the solvers is also significant.

The improvement from pretest to posttest for the feature matchers suggests that such training improves their representation of the general principle. The evidence from this problem does not make clear whether or not the training improved the representation of the general principle for the solvers because the framework we propose explains their failure as being due to not activating such a representation. In any event, whatever improvements occur do not appear to be related to proficiency at the training tasks because of the lack of correlation between training and test performance.

For a different-goal, different-type problem, the solvers respond to the surface features first. Since they are different, the solvers activate their abstract representation, which is their representation of the general principle. To the extent that they have a good representation of the general principle, they succeed at the problem. Because the training helps them to develop a better representation of the general principle, there is a significant gain in performance on this problem from pretest to posttest. The feature matchers respond to the goal first. To the extent that they have a good representation of the general principle, they succeed at the problem. Because the training helps them to develop a better representation of the general principle, they also achieve a significant gain in performance on this problem from pretest to posttest. The fact that the gain is greater here than for the previous problem suggests that perhaps when the surface features are the same, the feature matchers have some tendency (although to a lesser extent than the solvers) to activate the procedure instead of the principle. Even though the feature matchers achieved a larger gain than the solvers, since the solvers also achieved a significant gain, the difference was not large enough to reach significance. Therefore, there was no condition by time-of-test interaction.

Unlike for the previous problem, the results for this problem do support the claim that both types of training resulted in an improved representation of the general principle. The evidence from this problem differs from that from the previous problem in another way as well. The evidence from the previous problem suggested that the improvement in the representation of the general principle was not related to proficiency at the training tasks. Yet success at one of the training tasks, matching features, was significantly correlated with performance on this problem. Although this problem, like the previous problem, required using the general principle, unlike the previous problem, it also requires applying the newly generated procedure to new surface features. Therefore, the correlations suggest that proficiency at matching features, but not at solving, is related to being able to make the best use of the procedure generated from the general principle.

Conclusion

Participants who were more proficient at the matching task in training may have had a clearer understanding of the weighted averaging principle, whereas those who were more proficient at solving may simply have been better at the arithmetic operations. This may help to explain why better feature matchers were also better at applying the newly generated procedure to the new-type, new-goal problems. For the solvers, the training improved their ability to generate the new procedure. However, being better at the arithmetic operations does not guarantee being able to recognize the applicability of the new procedure to a problem with new surface features. Applying the newly generated procedure to the new goal problem also requires a better abstract understanding of the structure of the test problem. Therefore the correlation between success at feature-matching in training and success at the new-goal, new-type problem at test may be due to better feature matchers having that abstract understanding.

References

Cummins, D. D. (1992). Role of analogical reasoning in the induction of problem categories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 1103-1124.

Gick, M. L., & Holyoak, K. J. (1980). Analogical problem solving. Cognitive Psychology, 12, 306-355.

Gick, M. L., & Holyoak, K. J. (1983). Schema induction and analogical transfer. Cognitive Psychology, 15, 1-38.

Reed, S. K. (1984). Estimating answers to algebra word problems. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 778-790.

Reed, S. K., Ernst, G. W., & Banerji, R. (1974). The role of analogy in transfer between similar problem states. Cognitive Psychology, 6, 436-450.

Appendix

Examples of Pairs of Training Problems With Their Equations, and Examples of Test Problems With Their Equations

A = Amount. In these problems the amounts can be amounts of mixtures or amounts of time. The amounts always serve as the weights in the formula.

R = Ratio. In these problems the ratios can be percentages that indicate the concentration of some substance in the mixtures, or they can be speeds.

Subscripts: 1 and 2 indicate the two initial amounts or their associated ratios. C indicates the combined amount or the ratio associated with it.

General equation for weighted averaging:

$$A_1 \cdot R_1 + A_2 \cdot R_2 = (A_1 + A_2) \cdot R_C \tag{1}$$
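As an illustration of how every solution procedure in this Appendix follows from the general equation, the sketch below solves the weighted averaging equation for whichever single quantity is left unknown. The function name `solve_weighted_average` and its keyword arguments are hypothetical names chosen for this example and are not part of the study materials.

```python
from typing import Optional


def solve_weighted_average(
    a1: Optional[float] = None,
    r1: Optional[float] = None,
    a2: Optional[float] = None,
    r2: Optional[float] = None,
    rc: Optional[float] = None,
) -> float:
    """Solve A1*R1 + A2*R2 = (A1 + A2)*RC for the one argument left as None."""
    values = {"a1": a1, "r1": r1, "a2": a2, "r2": r2, "rc": rc}
    missing = [name for name, value in values.items() if value is None]
    if len(missing) != 1:
        raise ValueError("exactly one quantity must be left unknown")
    unknown = missing[0]
    if unknown == "rc":
        # Trained goal: combined ratio.
        return (a1 * r1 + a2 * r2) / (a1 + a2)
    if unknown == "r1":
        # Trained goal: first initial ratio.
        return ((a1 + a2) * rc - a2 * r2) / a1
    if unknown == "r2":
        # Trained goal: second initial ratio.
        return ((a1 + a2) * rc - a1 * r1) / a2
    if unknown == "a1":
        # New goal: first amount, from A1*(R1 - RC) = A2*(RC - R2).
        return a2 * (rc - r2) / (r1 - rc)
    # New goal: second amount, from the same rearrangement.
    return a1 * (r1 - rc) / (rc - r2)


if __name__ == "__main__":
    # Pair 1, Problem A (rice mixture): find the combined ratio.
    print(solve_weighted_average(a1=150, r1=0.60, a2=100, r2=0.10))  # about 0.40
    # Pair 2, Problem B (Bill and Hillary): find an initial ratio (a speed).
    print(solve_weighted_average(a1=6, r1=80, a2=2, rc=75))  # 60 mph
```

Each branch corresponds to a distinct procedure in the sense used in the text: the arithmetic steps differ, although all of them are rearrangements of the same principle.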

Pair 1: Training Problems That Have the Same Solution Procedure Along With Their Equations

Problem A: Mixture problem - Goal: Find combined ratio. A grocery store sells rice that is a mixture of white rice and brown rice. They have 150 lbs. of mixed rice that is 60% brown rice (in other words a proportion of .60). If they combine it with 100 lbs. of mixed rice that is 10% brown rice (a proportion of .10), then what is the resulting percentage (that is, proportion times 100) of brown rice in the whole 250 lbs. of mixed rice?

$$A_1 \cdot R_1 + A_2 \cdot R_2 = (A_1 + A_2) \cdot X \tag{2}$$

Problem B: Distance problem - Goal: Find combined ratio. Two airplanes leave from the same city at the same time heading for the same destination. The first airplane flies for 2 hours at 150 mph. Then it encounters engine trouble and slows down to 100 mph. It flies for 8 more hours at 100 mph. The second airplane arrived at the destination at the same time as the first plane, but it flew at the same speed for the full 10 hours. How fast was the second airplane flying?

$$A_1 \cdot R_1 + A_2 \cdot R_2 = (A_1 + A_2) \cdot X \tag{3}$$

Pair 2: Training Problems That Have Different Solution Procedures Along With Their Equations

Problem A: Mixture problem - Goal: Find combined ratio. A dairy farmer mixed 1 quart of milk that was 2% fat (in other words a proportion of .02) with 3 quarts of milk that was 5% fat (a proportion of .05). What was the percentage (that is, proportion times 100) of fat in the whole 4 quarts of milk?

$$A_1 \cdot R_1 + A_2 \cdot R_2 = (A_1 + A_2) \cdot X \tag{4}$$

Problem B: Distance problem - Goal: Find initial ratio. A college student, Bill, and his girlfriend, Hillary, attend two different colleges. They have agreed to meet at a location that is exactly halfway between them. Bill and Hillary began driving to the meeting place at exactly the same time. Hillary, who always drives at 75 mph, will arrive in 8 hrs. Bill begins by traveling at 80 mph for the first 6 hrs., but he needs to slow down for the last 2 hrs. of the trip because he wants to arrive at the same time as Hillary. At what speed should Bill drive for the last 2 hours?

$$A_1 \cdot R_1 + A_2 \cdot X = (A_1 + A_2) \cdot R_C \tag{5}$$

Test Problems With Their Equations

Old-Type / Old-Goal. A chemist combines 5 qts. of a 40% acid solution with 15 qts. of a 20% acid solution. What is the resulting % of acid of the whole 20 qts. of solution?

$$A_1 \cdot R_1 + A_2 \cdot R_2 = (A_1 + A_2) \cdot X \tag{6}$$

New-Type / Old-Goal. In a small town with a population of 25,000 people, there are 5,000 long time residents, and their average income is $40,000.00 per year. The other 20,000 people are newcomers, and their average income is $45,000.00 per year. What is the average yearly income of the entire town?

$$A_1 \cdot R_1 + A_2 \cdot R_2 = (A_1 + A_2) \cdot X \tag{7}$$

Old-Type / New-Goal. A wine company has 40 gallons of wine that is 25% alcohol. They need to combine it with some wine that is 4% alcohol, so that the resulting wine will be 10% alcohol. How much of the 4% alcohol wine should be added to the original 40 gallons of wine?

$$A_1 \cdot R_1 + X \cdot R_2 = (A_1 + X) \cdot R_C \tag{8}$$

New-Type / New-Goal. In an experiment on the effects of smoking on heart rate, the average heart rate for 35 male subjects in the experimental group was 78 beats per minute. The average heart rate for the female subjects in the experimental group was 76. The average heart rate for the entire experimental group was 76.5. How many female subjects were there?

$$A_1 \cdot R_1 + X \cdot R_2 = (A_1 + X) \cdot R_C \tag{9}$$
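For readers who want to see the new-goal procedure carried out, the two new-goal test problems can be worked through as follows. These worked solutions are added for illustration and were not part of the materials shown to participants. In each case the unknown X is the second amount, so the terms containing X must be collected on one side before dividing.

```latex
% Old-Type / New-Goal (wine): A_1 = 40, R_1 = 0.25, R_2 = 0.04, R_C = 0.10
40 \cdot 0.25 + X \cdot 0.04 = (40 + X) \cdot 0.10
10 + 0.04X = 4 + 0.10X
6 = 0.06X
X = 100 \text{ gallons}

% New-Type / New-Goal (heart rate): A_1 = 35, R_1 = 78, R_2 = 76, R_C = 76.5
35 \cdot 78 + X \cdot 76 = (35 + X) \cdot 76.5
2730 + 76X = 2677.5 + 76.5X
52.5 = 0.5X
X = 105 \text{ female subjects}
```

The two solutions follow the same sequence of steps despite the change from mixtures to group averages, which is the transfer that the new-type, new-goal problem is intended to assess.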