Development of a Python algorithm for the simulation and reliability analysis of multiple-choice tests

Author: María José García Tárrago
Journal:
Advances in Building Education

ISSN: 2530-7940

Year of publication: 2020

Issue Title: May - August

Volume: 4

Issue: 2

Pages: 20-33

Type: Article

DOI: 10.20868/ABE.2020.2.4461


Abstract

There is an extensive literature on the reliability of true/false and multiple-choice tests and their application in higher education. Choices per question, positive or negative marking, rewards for partial knowledge, or how long the test should be… the combination of all these parameters shows the wide range of test setups that each examiner could design. Is there an optimal configuration? Extensive educational research has tried to answer these questions using probability calculations and empirical evaluations. In this investigation, a novel algorithm was written in Python to generate hypothetical examinees with specific features (real knowledge, degree of over-cautiousness, fatigue limit…). A high knowledge level implies a high probability of knowing whether an answer choice in a multiple-choice question is true or false. Over-cautiousness is related to the probability of answering an unknown question, that is, the risk capacity of the examinee. Finally, fatigue is directly related to the number of questions in the test: beyond its upper limit, the knowledge level decreases and over-cautiousness increases. The algorithm administered tests to the hypothetical examinees and analysed the deviation between the real knowledge (a feature of the examinee) and the estimated knowledge. The algorithm was then used to optimize the parameters of a test (test length, choices per question, scoring system…) so as to reduce the influence of fatigue and over-cautiousness on the final score. An empirical evaluation comparing different test setups was performed to verify and validate the algorithm.
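The abstract describes simulated examinees characterised by real knowledge, over-cautiousness and a fatigue limit, and a test run that yields an estimated knowledge to be compared with the real value. The Python sketch below illustrates that workflow under stated assumptions: the names Examinee, simulate_test, knowledge, caution, fatigue_limit, reward and penalty are illustrative, and the fatigue adjustment factors are placeholders; it does not reproduce the article's actual code.

```python
import random
from dataclasses import dataclass

@dataclass
class Examinee:
    knowledge: float      # probability of recognising whether a choice is true or false
    caution: float        # probability of leaving an unknown question unanswered
    fatigue_limit: int    # number of questions before performance degrades

def simulate_test(examinee, n_questions=40, n_choices=4,
                  reward=1.0, penalty=-0.25, seed=None):
    """Return the estimated knowledge (normalised score) of one simulated examinee."""
    rng = random.Random(seed)
    score = 0.0
    for q in range(n_questions):
        # Beyond the fatigue limit, knowledge drops and over-cautiousness rises
        # (the 0.8 and 1.2 factors are placeholder assumptions).
        fatigued = q >= examinee.fatigue_limit
        knowledge = examinee.knowledge * (0.8 if fatigued else 1.0)
        caution = min(1.0, examinee.caution * (1.2 if fatigued else 1.0))

        if rng.random() < knowledge:
            score += reward                      # question genuinely known
        elif rng.random() < caution:
            continue                             # over-cautious: question left blank
        else:
            # Blind guess among the available choices, with negative marking.
            if rng.random() < 1.0 / n_choices:
                score += reward
            else:
                score += penalty
    return max(score, 0.0) / n_questions         # estimated knowledge in [0, 1]

# Deviation between real and estimated knowledge for one hypothetical examinee.
examinee = Examinee(knowledge=0.7, caution=0.5, fatigue_limit=30)
estimated = simulate_test(examinee, n_questions=40, n_choices=4, seed=1)
print(f"real={examinee.knowledge:.2f}  estimated={estimated:.2f}  "
      f"deviation={abs(examinee.knowledge - estimated):.2f}")
```

With a simulator of this kind, different test setups (number of questions, choices per question, scoring rule) can be swept to find the configuration that minimises the deviation between real and estimated knowledge, which is the optimization described in the abstract.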

Bibliographic References

  • Ebel, R. L. (1979). Essentials of Educational Measurement. Englewood Cliffs, New Jersey: Prentice-Hall.
  • Lesage, E., Valcke, M. & Sabbe, E. (2013). Scoring Methods for Multiple Choice Assessment in Higher Education – Is it still a Matter of Number Right Scoring or Negative Marking? Studies in Educational Evaluation, vol. 39, pp. 188-193.
  • Burton, R.F. (2005). Multiple-choice and true/false tests: myths and misapprehensions. Assessment & Evaluation in Higher Education, vol. 30, pp. 65-72.
  • Burton, R.F. (2004). Multiple-choice and true/false tests: reliability measures and some implications of negative marking. Assessment & Evaluation in Higher Education, vol. 29, pp. 585-595.
  • Warwick, J., Bush, M. & Jennings, S. (2010). Analysis and Evaluation of Liberal (Free-Choice) Multiple-Choice Tests. Innovation in Teaching and Learning in Information and Computer Sciences, vol. 9, pp. 1-12.
  • Bush, M. (2001). A Multiple Choice Test that Rewards Partial Knowledge. Journal of Further and Higher Education, vol. 25, pp. 157-163.
  • Burton, R.F. & Miller, D.J. (1999). Statistical Modelling of Multiple-Choice and True/False Tests: ways of considering, and of reducing, the uncertainties attributable to guessing. Assessment & Evaluation in Higher Education, vol. 24, pp. 399-411.
  • Frary, R.B. (1989). Partial-Credit Scoring Methods for Multiple-Choice Tests. Applied Measurement in Education, vol. 2, pp. 79-96.