1
Implications of the ACT-R Learning Theory: No Magic Bullets
John R. Anderson
Christian D. Schunn
Carnegie Mellon University
From Ebbinghaus onward, psychology has seen an enormous amount of research invested in the study of learning and memory. This research has produced a steady stream of results and, with a few mini-revolutions along the way, a steady increase in the understanding of how knowledge is acquired, retained, retrieved, and utilized. Throughout this history, there has been a concern with the relation of this research to educational applications. Anderson (2000a, 2000b) summarized some of the research and made efforts to identify the implications of this research for education. However, he left both textbooks feeling very dissatisfied: the intricacy of research and theory on the psychological side was not showing through in the intricacy of educational application. Psychology offers many claims of relevance of cognitive psychology research for education. However, these claims are loose and vague and contrast sharply with the crisp theory and results that exist in the field.
To be able to rigorously understand the implications of cognitive psychology research, one needs a rigorous theory that bridges the gap between the detail of the laboratory experiment and the scale of the educational enterprise. This chapter is based on the ACT-R theory (Anderson, 1993, 1996), which has been able to explain learning in basic psychology experiments and in a number of educational domains. ACT-R has been advertised as a "simple theory of learning and cognition." It proposes that complex cognition is composed of relatively simple knowledge units that are acquired according to relatively simple principles. Human cognition is complex and reflects the complex composition of basic elements and principles, just as a computer can produce complex aggregate behavior from simple computing elements.
The ACT-R perspective places a premium on the practice that is required to permanently learn the components of the desired competence. To learn a complex competence, the ACT-R theory claims that each component of that competence must be mastered. This is in sharp contrast to many educational claims, supposedly based in cognitive research, that there are moments of insight or transformations when whole knowledge structures become reorganized or learned. ACT-R implies instead that there is no "free lunch" and that each piece of knowledge requires its own due learning. Given the prevalence of the "free-lunch" myth, this chapter endeavors to show that it is not true empirically and explains why it cannot be true within the ACT-R theory.
This chapter describes the ACT-R theory and its learning principles. In light of this theory, this chapter identifies what the authors think are the important implications of psychological research for education. This chapter also addresses why much of the research on learning and memory falls short of significant educational application, devoting special attention to the issues of insight, learning with understanding, and transfer, which are part of the free-lunch myth. Finally, we describe how we have tried to bring the lessons of this analysis to bear in the design of our cognitive tutors (Anderson, Boyle, Corbett, & Lewis, 1990; Anderson, Corbett, Koedinger, & Pelletier, 1995).
THE ACT-R THEORY
The ACT-R theory admits three basic binary distinctions. First, there is a distinction between two types of knowledge: declarative knowledge of facts and procedural knowledge of how to do various cognitive tasks. Second, there is the distinction between the performance assumptions about how ACT-R deploys what it knows to solve a task and the learning assumptions about how it acquires new knowledge. Third, there is a distinction between the symbolic level in ACT-R that involves discrete knowledge structures and a subsymbolic level that involves neural-like, activation-based processes that determine the availability of these symbolic structures. We will first describe ACT-R at the symbolic level. The symbolic-level analysis of the knowledge structures in a domain basically corresponds to a task analysis of what needs to be learned in that domain. However, as is seen here, the availability of these symbolic structures critically depends on the subsymbolic processes.
Declarative and Procedural Knowledge
Declarative knowledge reflects the factual information that a person knows and can report. According to ACT-R, declarative knowledge is represented as a network of small units of primitive knowledge called chunks. Figure 1.1 is a graphical display of a chunk encoding the addition fact that 3 + 4 = 7 and some of its surrounding facts; these are some of the many facts that a child might have involving these numbers. Frequently, one encounters the question, "What does it mean to understand 3 or to understand numbers in general?" The answer in ACT-R is quite definite on this matter: Understanding involves a large number of declarative chunks such as those in Fig. 1.1 as well as a large number of procedural units that determine how this knowledge is used. According to the ACT-R theory, understanding requires nothing more or less than such a set of knowledge units. Understanding a concept results when one has enough knowledge about the concept to flexibly solve significant problems involving it.
FIG. 1.1. A graphical display of a chunk encoding the addition fact 3 + 4 = 7.
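As a rough illustration of the chunk network just described, the following Python sketch stores a few arithmetic facts as typed slot-value records and retrieves one by pattern. The class name `Chunk`, the slot names (`addend1`, `addend2`, `sum`, and so on), the particular neighboring facts, and the `retrieve` helper are all illustrative assumptions rather than the notation of the ACT-R implementation, and the sketch ignores the subsymbolic activation processes that govern availability.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Chunk:
    """A small unit of declarative knowledge: a type plus slot values."""
    name: str
    isa: str
    slots: Dict[str, int] = field(default_factory=dict)


# A few of the many facts a child might hold about 3, 4, and 7.
# Chunk names and slot names are hypothetical, chosen for readability.
declarative_memory: List[Chunk] = [
    Chunk("fact3+4", "addition-fact", {"addend1": 3, "addend2": 4, "sum": 7}),
    Chunk("fact4+3", "addition-fact", {"addend1": 4, "addend2": 3, "sum": 7}),
    Chunk("fact7-4", "subtraction-fact",
          {"minuend": 7, "subtrahend": 4, "difference": 3}),
]


def retrieve(memory: List[Chunk], isa: str, **pattern: int) -> Optional[Chunk]:
    """Return the first chunk of the given type whose slots match the pattern."""
    for chunk in memory:
        if chunk.isa == isa and all(
            chunk.slots.get(key) == value for key, value in pattern.items()
        ):
            return chunk
    return None  # retrieval failure: no matching chunk


fact = retrieve(declarative_memory, "addition-fact", addend1=3, addend2=4)
print(fact.slots["sum"] if fact else "retrieval failed")
```

On this view, "understanding 3" is nothing over and above having many such records, plus the procedures that use them.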
Procedural knowledge, such as mathematical problem-solving skill, is represented by a large number of rule-like units called productions. Production rules are condition-action units that respond to various problem-solving conditions with specific cognitive actions. The steps of thought in a production system correspond to a sequence of such condition-action rules that execute or (in the terminology of production systems) fire. Production rules in ACT-R specify, in their condition, the existence of specific goals and often involve the creation of subgoals. For instance, suppose a child was at the point illustrated here in the solution of a multicolumn addition problem:
Focused on the tens column, the following production rule might apply in the ACT-R simulation of multicolumn addition (Anderson, 1993):
IF the goal is to add n1 and n2 in a column and n1 + n2 = n3
THEN set as a subgoal to write n3 in that column
This production rule specifies in its condition the goal of working on the tens column and involves a retrieval of a declarative chunk like the 3 + 4 = 7 fact in Fig. 1.1. In its action, it creates a subgoal to write out the digit, which might involve things like processing a carry. It is many procedural rules like this, along with the chunks, that, in total, produce what we recognize as competence in a domain such as mathematics.
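The condition-action structure of this production rule can be sketched in Python as a function that tests the current goal, attempts a declarative retrieval, and, if both succeed, returns a new subgoal. This is a minimal sketch under stated assumptions: the dictionary-based goal format, the name `add_column_production`, and the `ADDITION_FACTS` lookup table are hypothetical conveniences, not the representation used in the ACT-R simulation.

```python
# Declarative memory reduced to a lookup table of addition facts (an assumption
# made for brevity; real retrieval would also depend on subsymbolic activation).
ADDITION_FACTS = {(3, 4): 7, (4, 5): 9}


def add_column_production(goal, facts):
    """IF the goal is to add n1 and n2 in a column and n1 + n2 = n3
    THEN set as a subgoal to write n3 in that column.

    Returns the new subgoal, or None when the condition does not match.
    """
    if goal.get("type") != "add-column":
        return None  # condition fails: this is not the right kind of goal
    n3 = facts.get((goal["n1"], goal["n2"]))
    if n3 is None:
        return None  # condition fails: the needed fact was not retrieved
    return {"type": "write-digit", "column": goal["column"], "value": n3}


goal = {"type": "add-column", "column": "tens", "n1": 3, "n2": 4}
subgoal = add_column_production(goal, ADDITION_FACTS)
print(subgoal)
```

Note that the rule fires only when both the goal and a retrievable fact are present, which is why procedural competence depends on declarative knowledge being available.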
Goal Structures
It might seem that these chunks and productions are all separate, disjointed pieces of knowledge and that there is nothing in the ACT-R theory to produce the overall organization and structure in cognition. However, this ignores the contribution of the goal structure. Each task is decomposed into a sequence of subgoals, which, in turn, may be decomposed into further subgoals. ACT-R maintains a stack of goals onto which subgoals are added; the higher-level goals are still remembered once the subgoals are achieved. Only the most recently added subgoal is used to select productions at any one point in time; once it is achieved, it is removed from the goal stack. This hierarchical organization of subgoals and limited focus of processing imposes a strong order on the way in which knowledge is accessed and skills are applied. In multicolumn addition, for instance, there is a goal structure that organizes the overall addition into specific column additions and processing carries. This produces an overall algorithm-like approach to solving multicolumn addition. Of course, some tasks may have multiple possible goal structures and so permit more variable behavior.
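To make the role of the goal stack concrete, the sketch below drives multicolumn addition entirely from a stack: only the top goal is consulted at each step, subgoals are pushed on top of their parents, and an achieved goal is popped so that its parent comes back into focus. The goal names (`add-problem`, `add-column`, `write-digit`) and the control details are assumptions made for illustration; this is not the ACT-R model of addition itself.

```python
def add_multicolumn(top, bottom):
    """Add two non-negative integers column by column, driven by a goal stack."""
    a = [int(d) for d in reversed(str(top))]      # ones digit first
    b = [int(d) for d in reversed(str(bottom))]
    width = max(len(a), len(b))
    a += [0] * (width - len(a))                   # pad the shorter number
    b += [0] * (width - len(b))

    answer, carry, next_col = [], 0, 0
    trace = []                                    # order in which goals are in focus
    stack = [{"type": "add-problem"}]

    while stack:
        goal = stack[-1]                          # only the newest goal is in focus
        trace.append(goal["type"])
        if goal["type"] == "add-problem":
            if next_col < width:
                stack.append({"type": "add-column", "col": next_col})
                next_col += 1
            elif carry:
                stack.append({"type": "write-digit", "value": carry})
            else:
                stack.pop()                       # whole problem achieved
        elif goal["type"] == "add-column":
            if goal.get("written"):
                stack.pop()                       # column achieved; parent resumes
            else:
                total = a[goal["col"]] + b[goal["col"]] + carry
                goal["written"] = True            # remembered while subgoal runs
                stack.append({"type": "write-digit", "value": total})
        else:                                     # write-digit
            answer.append(goal["value"] % 10)     # write the ones part
            carry = goal["value"] // 10           # process the carry
            stack.pop()

    return int("".join(str(d) for d in reversed(answer))), trace


result, trace = add_multicolumn(264, 716)
print(result)
print(trace[:5])
```

The trace shows the strict ordering that the stack imposes: the problem goal hands control to a column goal, which hands control to a digit-writing goal, and control returns upward as each is achieved.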
A simple example from education for which goal structures have played a prominent role is the case of multicolumn subtraction. As it is typically taught in America, multicolumn subtraction involves a subgoal of coordinating borrowing, especially from zero (VanLehn, 1990). Learning problems often occur because these goal structures are not particularly obvious. Many of the bugs in multicolumn subtraction are related to mastering the borrowing subgoal. For instance, when a child converts a 3 to a 13, but does not debit the next column, the child is failing to recognize the increment operator as part of the borrowing subgoal.
Learning Symbolic Structures
The important educational question concerns how these declarative and procedural units are learned. The ACT-R analysis of their acquisition is relatively straightforward. There are two ways in which declarative chunks can be acquired. The first way is encoding information from the environment. For example, a child might encode the fact 3 + 4 = 7 from reading an addition table. The second way is the storage of the results of past goals. For example, at some point in time, a child might have had the goal of finding the sum of 3 and 4 and solved this by counting. The result of this counting process, the sum 7, would be stored with the goal chunk. Thus, the addition fact in Fig. 1.1 could simply be a stored goal. This process of caching the results of past mental computations into chunks that can then be retrieved plays a major role in Logan's (1988) theory of skill acquisition. He accumulated a significant amount of data showing that this process is important in the development of expertise.
Thus, ACT-R holds that declarative knowledge can be acquired in a passive, receptive mode (encoding from the environment) or in an active, constructive mode (storing the result of past mental computations). The two modes of knowledge acquisition offer different advantages and disadvantages. Passive reception has the advantage of efficiency and accuracy. It is easier to read the sum of 3 + 4 than to calculate it, and there is not the danger of miscalculation. On the other hand, if one practices generating the knowledge, one is practicing a back-up strategy useful for when retrieval fails. However, according to ACT-R there is no inherent difference in the memorability of the two types of knowledge. There has been a fair amount of experimental work in memory on what is called the generation effect, which is concerned with the supposed advantage of self-generated material (e.g., Burns, 1992; Hirshman & Bjork, 1988; Slamecka & Graf, 1978; Slamecka & Katsaiti, 1987). The generation effect is actually somewhat elusive and not always obtained. When it does occur, it seems related to redundancy of encoding. That is, generating knowledge for oneself is helpful only if the generation process produces multiple ways to retrieve the material. There are no magical properties conferred on a knowledge structure just because it was self-generated. If all things were equal, it would be preferable to have children learn by generating the knowledge (due to the redundant encoding). However, because of difficulties of generation and dangers of misgeneration, things are not always equal, and it can be preferable to tell the knowledge.
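The two acquisition modes can be contrasted in a small Python sketch: `encode` stands for passively taking a fact from the environment, and `solve` stands for computing a result with a back-up strategy (counting) and then storing it with the goal so that it can be retrieved next time. The `Learner` class, its method names, and the `times_counted` counter are hypothetical illustrations, and the sketch deliberately treats both kinds of stored fact identically, in line with the claim that neither is inherently more memorable.

```python
def count_up(a, b):
    """Back-up strategy: find a + b by counting up from a, one step at a time."""
    total = a
    for _ in range(b):
        total += 1
    return total


class Learner:
    """Toy learner holding addition facts acquired in either mode."""

    def __init__(self):
        self.facts = {}          # (a, b) -> sum, however the fact was acquired
        self.times_counted = 0   # how often the back-up strategy was needed

    def encode(self, a, b, total):
        """Passive mode: encode a fact read from the environment."""
        self.facts[(a, b)] = total

    def solve(self, a, b):
        """Retrieve the sum if possible; otherwise count and cache the result."""
        if (a, b) in self.facts:
            return self.facts[(a, b)]
        self.times_counted += 1
        result = count_up(a, b)
        self.facts[(a, b)] = result   # active mode: store the achieved goal
        return result


child = Learner()
child.encode(2, 2, 4)            # read from an addition table
first = child.solve(3, 4)        # no stored fact yet, so the child counts
second = child.solve(3, 4)       # now retrieved from the stored goal
print(first, second, child.times_counted)
```

The only lasting difference between the two routes in this sketch is that the counting route also exercised `count_up`, the strategy that remains available when retrieval fails.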
For procedural knowledge, production rules are learned in ACT-R by a process we call analogy. For analogy to work, two things have to happen. First, ACT-R must come upon a situation in which it wants to solve a goal. In the case of the previously mentioned production rule for addition, the learner would come to a goal of wanting to perform multicolumn addition and be focused on adding two numbers in a column. Second, the learner needs an example of the solution of such a goal. So, there might be an example of solving 4 + 5 in some column. In this situation, the ACT-R analogy mechanism tries to abstract the principle in the example and form a production rule embodying this principle that can be applied in the current situation. Once formed, this production rule is then available to apply in other situations. ACT-R's theory of procedural learning claims that procedural skills are acquired by making references to past problem solutions, while actively trying to solve new problems. Thus, it is both a theory of learning by doing and a theory of learning by example.
Simply providing the learner with examples is not sufficient to guarantee learning in the ACT-R theory. The sufficiency of the production rules acquired depends on the understanding of the example. Example understanding can influence learning in two ways. First, it can influence which examples are retrieved for analogizing. When presented with a goal that cannot be solved with existing productions, ACT-R looks for previous examples that it has encountered involving similar goals. Obviously, the way it represents the previous examples and the current goal will affect which examples are retrieved. For instance, if the goal of solving one problem (e.g., solving algebra problems in class) is seen as very different from the goal of solving another problem (e.g., evaluating phone company rates), then the relevant example and accompanying solution procedure will not be retrieved. Second, example understanding influences the productions that are acquired by analogy to a given example. In the case of multicolumn subtraction, one could understand an example involving a column subtraction of 8 - 3 = 5 as either subtracting the bottom from the top number or as subtracting the smaller from the bigger number. The former understanding produces the correct production rule (always subtract the bottom number from the top number), whereas the latter understanding produces a buggy rule (always subtract the smaller from the larger). Similarly, Pirolli and Anderson (1985) showed that students can learn very different rules for recursive programming from the same example programs. Both of these factors place a premium on the explanations that accompany examples in instruction. Chi, Bassok, Lewis, Reimann, and Glaser (1989) found that better learners of physics are those who more carefully study and try to understand examples. This self-explanation effect can be understood in terms of whether students generate adequate understandings of the examples (see Chi, chap. 4, this volume).
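The subtraction case can be made concrete with two candidate rules that a learner might abstract from the same worked column, one keyed to digit position and one keyed to digit size. The sketch below is only an illustration: the function names are hypothetical, and returning `None` to signal that a borrow is needed is an assumption introduced here to keep the example short.

```python
def position_based_rule(top, bottom):
    """Correct reading: take the bottom digit away from the top digit.

    Returns None when the top digit is too small, signalling that the
    borrowing subgoal must be set before this column can be completed.
    """
    if top < bottom:
        return None
    return top - bottom


def size_based_rule(top, bottom):
    """Buggy reading: always take the smaller digit away from the larger."""
    return max(top, bottom) - min(top, bottom)


# The worked example (8 on top, 3 on the bottom) cannot tell the rules apart.
print(position_based_rule(8, 3), size_based_rule(8, 3))

# A column with 3 on top and 8 on the bottom exposes the difference.
print(position_based_rule(3, 8), size_based_rule(3, 8))
```

Because both rules reproduce the example perfectly, the example alone cannot determine which one is learned; that is settled by how the learner explains the example.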
The learning of both chunks and productions at the symbolic level in ACT-R is an example of all-or-none learning. In a single moment, a new symbolic structure is formed and is permanently added to the system....