Learning by Teaching a Synthetic Student: Using SimStudent to Study the Effect of Tutor Learning

Principal Investigator: 
Project Overview
Background & Purpose: 

The purpose of this project is to build an on-line (web-based) learning environment in which students learn algebra equation solving by teaching a synthetic computer agent, called SimStudent, which acts as a peer learning companion.  The on-line learning environment is called APLUS -- Artificial Peer Learning environment Using SimStudent.  We investigate cognitive, metacognitive, and social factors in a series of classroom (in vivo) studies to understand how and when students learn most by teaching in this learning environment (a.k.a. the effect of tutor learning).

Setting: 

The project will be conducted both in the Human-Computer Interaction Institute at Carnegie Mellon University and at the School of Education at the University of Pittsburgh. We also have a tight connection to the NSF-funded Pittsburgh Science of Learning Center, which supports conducting classroom studies in local schools. We plan to conduct studies in actual school settings in one or two local high school Algebra I classrooms per year.

Research Design: 

This is a cross-sectional project designed to generate causal [experimental, quasi-experimental] evidence. The planned study is integrated into a larger project where various factors affecting tutor learning are evaluated, but in this study, we focus on one particular metacognitive factor: the self-explanation effect for tutor learning. The study contains an intervention [an on-line, web-based learning environment where students are asked to self-explain their tutoring activities] and a comparison condition [the same learning environment without self-explanation]. Original data are collected through assessments of learning or achievement tests and survey research [self-completed questionnaires]. Pre- and post-tests will also be used to measure students’ learning achievement. The test items will contain (1) items to solve equations, (2) items that require explanations, (3) items that require adaptive use of conceptual knowledge, and (4) items that require error detection. We plan to use straightforward between-student comparisons with the learning gain as the dependent variable.
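The between-student comparison described above can be sketched as follows. This is a minimal illustration only: it assumes the learning gain is the raw difference between post-test and pre-test scores (the design text does not say whether a raw or normalized gain is used), and the function names, condition labels, and scores are all hypothetical.

```python
from statistics import mean

def learning_gain(pre, post):
    """Raw gain (assumed definition): post-test score minus pre-test score."""
    return post - pre

def mean_gain_by_condition(records):
    """records: iterable of (condition, pre, post) tuples.
    Returns a dict mapping each condition to its mean learning gain."""
    gains = {}
    for condition, pre, post in records:
        gains.setdefault(condition, []).append(learning_gain(pre, post))
    return {condition: mean(values) for condition, values in gains.items()}

# Made-up percentage scores for illustration only (not study data).
records = [
    ("self_explanation", 40, 70),
    ("self_explanation", 50, 60),
    ("control", 40, 50),
    ("control", 60, 80),
]
result = mean_gain_by_condition(records)
```

In an actual analysis the per-condition gains would be compared with an appropriate statistical test rather than by inspecting the means alone.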

Findings: 

We conducted four classroom (in vivo) evaluation studies to test four major hypotheses: (1) The engineering hypothesis -- the APLUS and SimStudent technologies are robust enough to let students learn algebra equations by teaching SimStudent in authentic classroom settings. (2) The self-explanation hypothesis -- tutor learning will be facilitated when students are asked to explain and justify their tutoring activities.  (3) The motivation hypothesis -- when students are motivated to teach SimStudent better, they will be more engaged in tutoring, and hence tutor learning will be facilitated.  (4) The meta-tutor hypothesis -- students can be taught how to tutor better, and improving the quality of students' tutoring will enhance tutor learning as well. 

Study I (Baseline Study): 

The initial classroom trial was conducted in March 2010 to compare the effects of the proposed intervention with the commercially available Cognitive Tutor Algebra I. A total of 100 students in two high schools participated in the classroom study.  Students were randomly assigned to one of two conditions – one for learning by teaching SimStudent, and one for learning with Cognitive Tutor Algebra I.  All students took an on-line pre-, post-, and delayed-test.

The major findings are that (i) prior knowledge significantly influences the effect of learning by teaching – for students with insufficient training on the target problems, learning by teaching may be significantly less effective than learning by tutored problem solving, and (ii) problem selection is key to successful learning – the data showed that when tutoring SimStudent, students repeatedly used only similar, perhaps “easy” problems, and thus the overall accuracy of solving the problems improved as the tutoring session advanced, but that did not necessarily lead to a large learning gain.

The above findings gave us insights into a better system design, including a problem bank that shows suggested problems to use, which has been integrated into more recent studies.  

Study II (Self-Explanation Study):

We conducted a second classroom study in which we measured the effect of self-explanation on tutor learning. For this study, we modified SimStudent so that it occasionally asks “why” questions to have the student justify his/her tutoring activities. These questions are asked (i) when SimStudent is given a new problem to solve, (ii) when the student provides negative feedback on a step that SimStudent has suggested, and (iii) when the student demonstrates a step as a hint after SimStudent failed to perform the step “correctly” (note that the student could incorrectly provide negative feedback even when SimStudent made a correct step). The study was conducted in December 2010 at a local high school under the supervision of the Pittsburgh Science of Learning Center. A total of 160 students participated from eight Algebra I classes taught by three different teachers.

The major findings suggest the following: (i) There is a main effect of test: students performed better on the post-test than on the pre-test.  The test consists of three sections of procedural questions and two sections of conceptual questions on algebra equation solving. (ii) There is no condition difference in terms of the test scores. However, (iii) students in the self-explanation condition used significantly fewer problems for tutoring (about 17 problems for self-explanation vs. 14 for control) when time on task was controlled. 

Study III (Game Show Study):

The goal of the Competitive Game Show study was to test the impact of motivational factors on tutor learning.  To achieve this goal, we implemented an online virtual Game Show where a pair of SimStudents tutored by two different students compete against each other by solving problems entered by the students as well as by the host of the Game Show. Because students compete against each other with their own SimStudents displayed in the Game Show platform, we allow students to customize their avatars. There are also two computer-controlled agents (i.e., pre-trained SimStudents) participating in the Game Show – one with a relatively high ability to solve difficult problems and one with a low ability.

The Game Show Study was held in April 2011 at a local middle school under the supervision of the Pittsburgh Science of Learning Center.  There were 140 students (8th grade and advanced 7th grade) in seven Algebra I classrooms taught by a total of three teachers.  The study was conducted in the same format as the previous studies – namely, a pre-test followed by three days of intervention, followed by a post-test.  There was also a delayed-test two weeks after the last day of intervention.

Major findings from this study include the following: (i) There is a significant test effect; students performed better on the post-test than on the pre-test.  Additionally, there was a weak trend of the Game Show condition performing better than the control condition, where no Game Show was available.  (ii) Game Show students tutored fewer problems when controlling for time on task.  Despite the smaller number of problems tutored, Game Show students still achieved the same test scores as the control students.  (iii) Within the tutoring session, Game Show students appeared to be more engaged because they responded to self-explanation prompts more often and more deeply than control students. (iv) Despite a deeper level of engagement, there was no condition difference in motivation based on questionnaire responses.  (v) Within the Game Show condition, there was no apparent correlation between Game Show performance and learning gain, although we did observe predatory-type relationships; students with higher ratings tended to challenge students with lower ratings for an easy win.  We therefore posit that our current Game Show design does not effectively align performance goals and learning goals, and we are investigating how to align them better. 

Study IV (Meta Tutor Study):

The final planned study in our proposal was to have a Meta Tutor integrated in our interface; that is, we embedded a third computer agent in the SimStudent program to assist students, upon request, with problem selection and when to administer a quiz to their SimStudent.  We hypothesized that our Meta Tutor, Mr. Williams, would increase students’ strategic approaches to learning.  Due to the popularity of customizing SimStudent avatars among the students in previous studies, we kept that feature for Study IV.

The Meta Tutor Study was held in April 2012 at a local middle school under the supervision of the Pittsburgh Science of Learning Center.  There were 173 students (advanced 7th grade and regular 8th grade) in 9 Algebra I and Pre-Algebra classes taught by a total of three teachers.  An additional study was also conducted at another local high school under the supervision of the Pittsburgh Science of Learning Center.  There were 3 classes of about 50 lower level 9th grade Algebra I students, taught by one teacher.  The studies were conducted in the same format as the previous studies – namely, a pre-test followed by three days of intervention, followed by a post-test.  There was also a delayed-test two weeks after the last day of intervention.

We are still collecting data from Study IV.  The detailed report on the findings from Study IV will be shared at a later date. 

Cross Study Analysis on Study I/II/III:

We extended the analysis across the three studies to identify cognitive and social facilitators of learning, noting positive correlations between student learning and (i) the number of target problems tutored (i.e., equations with variables on both sides) rather than easier types of problems, (ii) the number of self-explanations given on those target problems, and (iii) the correctness of student-provided hints and feedback given to SimStudent. 

There were also important correlations between variables showing how SimStudent was tutored and the tutee’s (i.e., SimStudent’s) learning outcome.  When students were median-split into two groups based on the number of quiz sections SimStudent passed (showing SimStudent’s outcome), the following variables showed significant differences between the high and low groups: (i) the correctness of feedback and hints, (ii) the probability of disregarding SimStudent’s hint request, and (iii) the frequency of repetition in the problems tutored. 
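A median split of this kind can be sketched as below. This is a hedged illustration, not the project's analysis code: the function name, the student identifiers, and the counts are hypothetical, and the rule that students exactly at the median fall into the low group is an assumption, since the report does not say how ties were handled.

```python
from statistics import median

def median_split(sections_passed):
    """Split students into (low, high) groups by the number of quiz
    sections their SimStudent passed.

    Assumption: students at or below the median go to the low group.
    """
    cut = median(sections_passed.values())
    low = sorted(s for s, n in sections_passed.items() if n <= cut)
    high = sorted(s for s, n in sections_passed.items() if n > cut)
    return low, high

# Made-up counts for illustration only (not study data).
sections_passed = {"s1": 0, "s2": 1, "s3": 2, "s4": 4}
low, high = median_split(sections_passed)
```

Tutoring variables such as feedback correctness would then be compared between the two groups.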

There is also a significant correlation between tutor learning (measured as the gain from pre- to post-test) and tutee learning (measured as the number of quiz sections SimStudent passed).

The above results show that students learn more by teaching when they teach correctly and appropriately. Unsuccessful students (who could not get their SimStudents to pass the quiz) taught SimStudent inaccurately and inappropriately, most likely without recognizing their poor teaching behavior.  In other words, learning by teaching does not happen automatically.

We also examined how prior knowledge affected tutor learning.  First, the students’ prior knowledge (measured as their pre-test score) is strongly predictive of the post-test scores for both the procedural and conceptual tests.  Second, SimStudent’s prior knowledge (measured as its accuracy in performing steps at startup) also significantly influences the accuracy of students’ responses, which in turn has a significant correlation with tutor learning as mentioned above.  One potential explanation of this finding is that it is easier to recognize correct steps as correct than to identify incorrect steps as incorrect.  Students arguably learned by observing correct steps performed by SimStudent, a.k.a. learning from examples. 

These findings have been published in the Journal of Educational Psychology (2012). 

Cross Study Analysis on Study II/III (Self Explanation and Game Show study): 

This analysis focused on students’ shallow teaching, which leads to SimStudent’s shallow learning.  We found that (i) there is a significant correlation between students’ learning and SimStudent’s learning, and (ii) the probability of a student correctly detecting SimStudent’s errors is a significant predictor of student learning.  We also investigated the quiz, which is the students’ diagnostic tool to test SimStudent’s knowledge.  Our observations are that (i) a fixed quiz set helps students to teach, but has a greater chance of encouraging SimStudent’s shallow learning, and (ii) blocked randomization is a better method, because it starts with a fixed problem type and varies the numbers and variables within a problem.
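One way to read the blocked randomization described above is sketched here: the sequence of problem types stays fixed, while the numbers and the variable letter are redrawn for every quiz. This is an assumption-laden illustration, not the APLUS implementation; the templates, the coefficient range, and the names PROBLEM_TYPES and make_quiz are all hypothetical.

```python
import random

# Hypothetical problem-type templates, kept in a fixed order (assumed).
PROBLEM_TYPES = [
    "{a}{v}={b}",
    "{a}{v}+{b}={c}",
    "{a}{v}+{b}={c}{v}+{d}",
]

def make_quiz(rng):
    """Build one quiz: the type order is fixed, the numbers and the
    variable letter are drawn at random for each problem."""
    quiz = []
    for template in PROBLEM_TYPES:
        values = {name: rng.randint(2, 9) for name in "abcd"}
        # Avoid equal coefficients on both sides in the last problem type.
        while values["c"] == values["a"]:
            values["c"] = rng.randint(2, 9)
        quiz.append(template.format(v=rng.choice("xyz"), **values))
    return quiz

# Two quizzes drawn with different seeds share the same type sequence.
quiz_1 = make_quiz(random.Random(1))
quiz_2 = make_quiz(random.Random(2))
```

Because the surface details change between quizzes, a SimStudent that merely memorized the answers to one fixed quiz set would not pass the next one.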

These findings have been published as a peer-reviewed conference paper presented at the International Conference on Intelligent Tutoring Systems (2012).

Publications & Presentations: 

Journal Papers

Matsuda, N., Yarzebinski, E., Keiser, V., Raizada, R., Stylianides, G. J., & Koedinger, K. R. (in press). Studying the Effect of Competitive Game Show in a Learning by Teaching Environment. International Journal of Artificial Intelligence in Education. [Invited paper for the special issue on the Best of ITS 2012]

Matsuda, N., Yarzebinski, E., Keiser, V., Raizada, R., Cohen, W. W., Stylianides, G. J., & Koedinger, K. R. (in press). Cognitive anatomy of tutor learning: Lessons learned with SimStudent. Journal of Educational Psychology.

Conference Papers 

Matsuda, N., Yarzebinski, E., Keiser, V., Raizada, R., Cohen, W. W., Stylianides, G., et al. (2012). Shallow learning as a pathway for successful learning both for tutors and tutees. In N. Miyake, D. Peebles & R. P. Cooper (Eds.), Proceedings of the Annual Conference of the Cognitive Science Society (pp. 731-736). Austin, TX: Cognitive Science Society. [38% acceptance rate out of 537 submissions]

Matsuda, N., Yarzebinski, E., Keiser, V., Raizada, R., Stylianides, G., Cohen, W. W., et al. (2012). Motivational factors for learning by teaching: The effect of a competitive game show in a virtual peer-learning environment. In S. Cerri & W. Clancey (Eds.), Proceedings of International Conference on Intelligent Tutoring Systems (pp. 101-111). Heidelberg, Berlin: Springer-Verlag. [16% acceptance rate out of 177 submissions]

Carlson, R., Matsuda, N., Koedinger, K. R., & Rose, C. (2012). Building a Conversational SimStudent. In S. Cerri & W. Clancey (Eds.), Proceedings of International Conference on Intelligent Tutoring Systems (pp. 563-569). Heidelberg, Berlin: Springer-Verlag.

Matsuda, N., Cohen, W. W., Koedinger, K. R., Keiser, V., Raizada, R., Yarzebinski, E., et al. (2012). Studying the Effect of Tutor Learning using a Teachable Agent that asks the Student Tutor for Explanations. In M. Sugimoto, V. Aleven, Y. S. Chee & B. F. Manjon (Eds.), Proceedings of the International Conference on Digital Game and Intelligent Toy Enhanced Learning (DIGITEL 2012) (pp. 25-32). Los Alamitos, CA: IEEE Computer Society. [13% acceptance rate out of 56 submissions] A finalist for the best paper award.

Ogan, A., Finkelstein, S., Mayfield, E., D'Adamo, C., Matsuda, N., & Cassell, J. (2012). “Oh, dear Stacy!” Social interaction, elaboration, and learning with teachable agents. Proceedings of CHI2012 (39-48). [23% acceptance rate out of 1577 submissions]

Matsuda, N., Yarzebinski, E., Keiser, V., Raizada, R., Stylianides, G., Cohen, W. W., et al. (2011). Learning by Teaching SimStudent – An Initial Classroom Baseline Study comparing with Cognitive Tutor. In G. Biswas & S. Bull (Eds.), Proceedings of the International Conference on Artificial Intelligence in Education (pp. 213-221): Springer. [32% acceptance rate out of 153 submissions]

Matsuda, N., Cohen, W. W., Koedinger, K. R., & Stylianides, G. (2010). Learning to solve algebraic equations by teaching a computer agent. In M. F. Pinto & T. F. Kawasaki (Eds.), Proceedings of the Conference of the International Group for the Psychology of Mathematics Education (Vol. 2, p. 69).

Matsuda, N., Keiser, V., Raizada, R., Tu, A., Stylianides, G., Cohen, W. W., et al. (2010). Learning by Teaching SimStudent: Technical Accomplishments and an Initial Use with Students. In V. Aleven, J. Kay & J. Mostow (Eds.), Proceedings of the International Conference on Intelligent Tutoring Systems (pp. 317-326). Heidelberg, Berlin: Springer.

Matsuda, N., Cohen, W. W., Koedinger, K. R., Stylianides, G., Keiser, V., & Raizada, R. (2010). Tuning Cognitive Tutors into a Platform for Learning-by-Teaching with SimStudent Technology. In Proceedings of the International Workshop on Adaptation and Personalization in E-B/Learning using Pedagogic Conversational Agents (APLeC) (pp. 20-25), Hawaii.

Matsuda, N., Lee, A., Cohen, W. W., & Koedinger, K. R. (2009). A Computational Model of How Learner Errors Arise from Weak Prior Knowledge. In N. Taatgen & H. van Rijn (Eds.), Proceedings of the Annual Conference of the Cognitive Science Society (pp. 1288-1293). Austin, TX: Cognitive Science Society.

Matsuda, N., Cohen, W. W., Sewall, J., Lacerda, G., & Koedinger, K. R. (2008). Why tutored problem solving may be better than example study: Theoretical implications from a simulated-student study. In B. P. Woolf, E. Aimeur, R. Nkambou & S. Lajoie (Eds.), Proceedings of the International Conference on Intelligent Tutoring Systems (pp. 111-121). Heidelberg, Berlin: Springer. http://www.cs.cmu.edu/~mazda/Doc/ITS2008/index.htm

Matsuda, N., Cohen, W. W., Sewall, J., Lacerda, G., & Koedinger, K. R. (2007). Predicting students performance with SimStudent that learns cognitive skills from observation. In R. Luckin, K. R. Koedinger & J. Greer (Eds.), Proceedings of the international conference on Artificial Intelligence in Education (pp. 467-476). Amsterdam, Netherlands: IOS Press. http://www.cs.cmu.edu/~mazda/Doc/AIED2007/index.htm

Matsuda, N., Cohen, W. W., Sewall, J., Lacerda, G., & Koedinger, K. R. (2007). Evaluating a simulated student using real students data for training and testing. In C. Conati, K. McCoy & G. Paliouras (Eds.), Proceedings of the international conference on User Modeling (LNAI 4511) (pp. 107-116). Berlin, Heidelberg: Springer. http://www.cs.cmu.edu/~mazda/Doc/UM2007/index.htm

Matsuda, N., Cohen, W. W., & Koedinger, K. R. (2005). Building Cognitive Tutors with Programming by Demonstration. In S. Kramer & B. Pfahringer (Eds.), Technical report: TUM-I0510 (Proceedings of the International Conference on Inductive Logic Programming) (pp. 41-46): Institut fur Informatik, Technische Universitat Munchen. http://www.cs.cmu.edu/~mazda/Doc/ILP2005/index.html

Matsuda, N., Cohen, W. W., & Koedinger, K. R. (2005). Applying Programming by Demonstration in an Intelligent Authoring Tool for Cognitive Tutors. In AAAI Workshop on Human Comprehensible Machine Learning (Technical Report WS-05-04) (pp. 1-8). Menlo Park, CA: AAAI association. http://www.cs.cmu.edu/~mazda/Doc/AAAI05/index.htm

Technical Reports

Noboru Matsuda, William W. Cohen, Jonathan Sewall, and Kenneth R. Koedinger (2006). Applying Machine Learning to Cognitive Modeling for Cognitive Tutors, Technical report CMU-ML-06-105, School of Computer Science, Carnegie Mellon University. http://www.cs.cmu.edu/~mazda/Doc/ML-06-105/index.htm

Noboru Matsuda, William W. Cohen, Jonathan Sewall, and Kenneth R. Koedinger (2006). What characterizes a better demonstration for cognitive modeling by demonstration? Technical report CMU-ML-06-106, School of Computer Science, Carnegie Mellon University. http://www.cs.cmu.edu/~mazda/Doc/ML-06-106/index.htm

Other Products: 

This project will produce an on-line learning environment, available as a web application, where students learn to solve linear equations in algebra by teaching SimStudent, a teachable peer student. The tutoring interaction is much like human-to-human tutoring – posing a problem, providing feedback on steps performed, and answering what-to-do-next questions.