Study of the reliability of multiple-choice tests using the Monte Carlo method

  1. José Calaf Chica (Universidad de Burgos)
  2. María José García Tárrago (Universidad de Burgos)

Affiliation: Universidad de Burgos, Burgos, Spain (ROR: https://ror.org/049da5t36)

Journal: Revista de educación

ISSN: 0034-8082

Year of publication: 2021

Issue: 392

Pages: 63-96

Type: Article

DOI: 10.4438/1988-592X-RE-2021-392-479 (open access)


Abstract

Throughout the twentieth century, many studies were published on the reliability of multiple-choice tests for subject evaluation; in particular, numerous theoretical and empirical works compare the different scoring methods applied to such tests. A novel algorithm was designed to generate hypothetical examinees defined by three characteristics: real knowledge, level of cautiousness and erroneous knowledge. The first establishes the probability of knowing the veracity or falsity of each answer choice in a multiple-choice test. The cautiousness level gives the probability of answering an unknown question by guessing. Finally, erroneous knowledge is false knowledge assimilated as true. The test setup required by the algorithm includes the test length, the number of choices per question and the scoring system. The algorithm administers tests to these hypothetical examinees and analyses the deviation between their real knowledge and the estimated knowledge (the test score). The most popular test scoring methods (positive marking, negative marking, free-choice tests and the dual-response method) were analysed and compared to measure their reliability. To validate the algorithm, it was compared with an analytical probabilistic model. The investigation verified that the presence of erroneous knowledge substantially alters the reliability of the scoring method most widely accepted in the educational community (negative marking). Since the existence of erroneous knowledge in examinees cannot be ascertained from a test, the examiner can either penalise its presence by using negative marking or seek a better-fitted estimate of real knowledge with the positive-marking method.
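As an illustration of the kind of Monte Carlo simulation described in the abstract, the sketch below generates hypothetical examinees from three probabilities (real knowledge, the chance of guessing on an unknown question, and erroneous knowledge) and compares the deviation between real knowledge and test score under positive and negative marking. All function names, scoring rules and numeric values are assumptions made for this example; they do not reproduce the authors' actual algorithm.

```python
import random

def wrong_penalty(n_choices, scoring):
    """Marks deducted for a wrong answer: classic negative marking or none."""
    return 1.0 / (n_choices - 1) if scoring == "negative" else 0.0

def simulate_examinee(n_questions, n_choices, knowledge, guessing,
                      erroneous, scoring):
    """Return the test score (0-1 scale) of one hypothetical examinee.

    Assumes knowledge + erroneous <= 1 so the three outcomes per question
    (known, wrongly "known", unknown) form a valid probability split.
    """
    score = 0.0
    for _ in range(n_questions):
        r = random.random()
        if r < knowledge:                    # question genuinely known
            score += 1.0
        elif r < knowledge + erroneous:      # false knowledge held as true
            score -= wrong_penalty(n_choices, scoring)
        elif random.random() < guessing:     # unknown question, examinee guesses
            if random.random() < 1.0 / n_choices:
                score += 1.0                 # lucky guess
            else:
                score -= wrong_penalty(n_choices, scoring)
        # otherwise the question is left blank (0 marks)
    return max(score, 0.0) / n_questions

def mean_deviation(n_examinees=20000, n_questions=40, n_choices=4,
                   knowledge=0.6, guessing=0.5, erroneous=0.1,
                   scoring="negative"):
    """Average |real knowledge - estimated knowledge| over many examinees."""
    total = 0.0
    for _ in range(n_examinees):
        score = simulate_examinee(n_questions, n_choices, knowledge,
                                  guessing, erroneous, scoring)
        total += abs(knowledge - score)
    return total / n_examinees

if __name__ == "__main__":
    for method in ("positive", "negative"):
        print(method, round(mean_deviation(scoring=method), 4))
```

Free-choice and dual-response scoring could be added in the same way by changing how blank and partially known answers are rewarded, which is what lets the different methods be compared on an equal footing.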
