Measuring the Impact of Online Discourse in Undergraduate STEM Courses: Semi-automatic Assessment of Large Discussion Board Corpora

Principal Investigator: 
Project Overview
Background & Purpose: 

The investigators seek to explore the extent to which the quantity and quality of student participation in course discussion boards (also known as online asynchronous discussions, or OADs) are associated with retention in, or attrition from, undergraduate computer science and industrial engineering majors. This is a planning and pilot study using data from discussion-board-enhanced STEM courses at the University of Southern California. The ultimate goal of the investigators' intended future research is to produce knowledge usable in making more effective use of this learning technology.

Setting: 

University of Southern California

Research Design: 

The project uses a comparative research design and will generate evidence that is descriptive [observational], associative/correlational [quasi-experimental], and causal [quasi-experimental and statistical modeling]. Original data are being collected on undergraduate and graduate engineering students through school records, assessments of learning, observation [Web logs], and survey research [self-completion questionnaire]. Instruments include a multidimensional questionnaire that measures four constructs: efficacy, motivation, multidisciplinary behavior, and interaction.

This is a quantitative study designed to connect participation in online discussions to course performance, socio-demographic characteristics of students, and course retention. We will develop and explore new measures for analyzing qualitative characteristics of individual student contributions and for classifying discussion thread patterns, and we will validate whether these measures can serve as variables for analyzing student discussions.

Preliminarily, we have designed our model with two dependent variables (retention and performance) and eleven independent variables, recognizing that these variables may change after initial fitting of the model. The multivariate model has the following variables:

  • Total population mean across student groups
  • Level of significance for group i (each group)
  • Effect size of discourse interaction type 1 (questioning)
  • Effect size of discourse interaction type 2 (response)
  • Effect size of discourse interaction type 3 (offline comment)
  • Effect size of gender
  • Effect size of course achievement (grades / concept inventory / z-scored project grade sums)
  • Effect size of familial responsibility
  • Effect size of engineering efficacy/engagement/interest
  • Effect size of minority status
  • Effect size of employment status
  • Effect size of level of course engagement (# interactions)
  • Effect size of year in school (undergrad versus grad)
  • Sample mean for groups i through j for discourse interaction type 1
  • Sample mean for groups i through j for discourse interaction type 2
  • Sample mean for groups i through j for discourse interaction type 3
  • Sample mean for groups i through j for gender
  • Sample mean for groups i through j for course achievement (grades / concept inventory / z-scored project grade sums)
  • Sample mean for groups i through j for familial responsibility
  • Sample mean for groups i through j for engineering efficacy/engagement/interest
  • Sample mean for groups i through j for minority status
  • Sample mean for groups i through j for employment status
  • Sample mean for groups i through j for level of course engagement (# interactions)
  • Sample mean for groups i through j for year in school (undergrad versus grad)
  • Maximum error for the variate response vector for groups i and j

We will use statistical techniques including hierarchical linear modeling with two levels.
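As a rough illustration of the planned approach, the two-level structure (students nested within course groups) can be fit with a mixed-effects model. The sketch below uses synthetic data and illustrative variable names (`performance`, `n_interactions`, `gender`, `group`); it is an assumption about how the model might be specified, not the study's actual analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: 8 course groups (level 2), 30 students each (level 1).
rng = np.random.default_rng(0)
n_groups, n_per = 8, 30
group = np.repeat(np.arange(n_groups), n_per)
group_effect = rng.normal(0, 0.5, n_groups)[group]  # level-2 variation
n_interactions = rng.poisson(5, n_groups * n_per)   # course engagement proxy
gender = rng.integers(0, 2, n_groups * n_per)
performance = (70 + 0.8 * n_interactions + 1.5 * gender
               + group_effect + rng.normal(0, 2, n_groups * n_per))

df = pd.DataFrame({"group": group, "n_interactions": n_interactions,
                   "gender": gender, "performance": performance})

# Level 1: student-level predictors; level 2: random intercept per group.
model = smf.mixedlm("performance ~ n_interactions + gender",
                    df, groups=df["group"])
result = model.fit()
print(result.params)
```

In a random-intercept specification like this, the group-level variance component absorbs between-course differences, so the fixed-effect coefficients estimate the within-group association between discussion activity and performance.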

Findings: 

We have analyzed the types of content in computer science student discussions, including abbreviations, quotes, emoticons, programming constructs, error messages, technical terms, and typos. Depending on the type of content analysis, some of these can be transformed or filtered out. A set of additional dialogue characteristics are captured as variables for analyzing student Q&A discussions, including both quantitative variables (e.g. number of messages, thread length) and qualitative variables (e.g. how early students ask questions before the deadline).
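The tagging and filtering step described above could be sketched with simple regular expressions. This is an illustrative example, not the project's actual pipeline; the patterns and feature names are placeholders.

```python
import re

# Placeholder patterns for a few of the content types named above.
EMOTICON = re.compile(r"[:;]-?[)(DP]")
ERROR_MSG = re.compile(r"\b\w*(Exception|Error)\b")
QUOTE = re.compile(r"^>.*$", re.MULTILINE)  # quoted lines from earlier posts

msg = "> did you try recursion?\nI get a NullPointerException :( in my code"

# Tag content types as per-message features.
features = {
    "has_emoticon": bool(EMOTICON.search(msg)),
    "has_error_message": bool(ERROR_MSG.search(msg)),
    "n_quoted_lines": len(QUOTE.findall(msg)),
}

# For analyses that need only the student's own prose, quoted material
# and emoticons can be filtered out before further processing.
cleaned = EMOTICON.sub("", QUOTE.sub("", msg)).strip()
print(features)
```

Whether a given content type is kept as a feature or stripped as noise depends on the analysis: emoticons are signal for tone measures but noise for coherence measures.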

Other Products: 

We will develop and explore new measures that we can use for analyzing qualitative characteristics of individual student contributions or classifying discussion thread patterns. Candidate qualitative measures include:

  • Coherence of the text written by the student
  • How much help the student received and how much help they provided
  • Positive and negative tones or emotional expressions in the text
  • Whether the initial question was answered
  • Kinds of topics discussed and kinds of questions asked
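One of the candidate measures above, tone, admits a deliberately simple baseline: counting positive versus negative words. The word lists below are placeholders; a validated sentiment lexicon would be substituted in practice.

```python
# Placeholder word lists; not a validated lexicon.
POSITIVE = {"thanks", "great", "helpful", "solved"}
NEGATIVE = {"stuck", "confused", "error", "fail"}

def tone_score(text: str) -> int:
    """Positive-minus-negative word count; >0 suggests positive tone."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    return (sum(w in POSITIVE for w in words)
            - sum(w in NEGATIVE for w in words))

print(tone_score("thanks, that was really helpful"))       # 2
print(tone_score("i am stuck and confused by this error"))  # -3
```

A baseline this crude mainly serves to validate the measure's construct: if even word counts correlate with human tone ratings, a richer classifier is worth building.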