Programming Assessment in E-Learning through Rule-Based Automatic Question Generation with Large Language Models

Authors

  • Halim Teguh Saputro, Politeknik Negeri Malang
  • Usman Nurhasan, Politeknik Negeri Malang
  • Vivi Nur Wijayaningrum, Politeknik Negeri Malang

DOI:

https://doi.org/10.30871/jaic.v9i6.10901

Keywords:

Automatic Question Generation, E-learning, LLMs, Programming, Rule-Based, Bloom's Taxonomy

Abstract

This study develops an evaluation instrument for Python programming using a Rule-Based Automatic Question Generation (AQG) system integrated with Large Language Models (LLMs) and designed according to the Revised Bloom’s Taxonomy. The research is motivated by the limitations of conventional programming assessments, which are often time-consuming, insufficiently objective, and poorly aligned with cognitive learning levels. The proposed method applies assessment terms as rule-based constraints that guide LLM question generation, ensuring both pedagogical validity and structural consistency of the JSON output. A total of 91 questions were produced, consisting of multiple-choice and coding items, which were then validated by three programming experts and tested on 32 vocational students. The findings indicate that the instrument achieved an overall validity of 77.66% (valid category), with the highest accuracy at the Apply (96.30%) and Create (100%) levels. The reliability test using Cronbach’s Alpha yielded 0.721, indicating acceptable internal consistency. Item difficulty analysis revealed a strong dominance of easy questions (97.78%), with only 2.22% classified as moderate and none as difficult. Student performance was uneven across cognitive levels: high at Remember (94.79%), Understand (95.83%), and Create (95.60%), but lower at Apply (86.11%), Analyze (90.97%), and Evaluate (87.15%). These results confirm that integrating Rule-Based AQG with LLMs can produce valid, reliable, and adaptive evaluation instruments that capture basic programming competencies and partially address higher-order cognitive skills. The research contributes practically, by giving educators an efficient tool for generating evaluation items, and academically, by enriching the growing literature on AI-assisted assessment in programming education.
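
To make the rule-based constraint idea concrete, below is a minimal Python sketch of how Bloom-level rules might be embedded in an LLM prompt and the model's JSON output checked for structural consistency. The rule table, the prompt wording, and the call_llm stub are illustrative assumptions, not the authors' actual pipeline.

    import json

    # Hypothetical rule table: each Revised Bloom's Taxonomy level maps to the
    # action verbs and item type that constrain the generated question.
    RULES = {
        "Remember": {"verbs": ["define", "list"], "item_type": "multiple_choice"},
        "Apply": {"verbs": ["implement", "use"], "item_type": "coding"},
        "Create": {"verbs": ["design", "write"], "item_type": "coding"},
    }

    def build_prompt(topic, level):
        # Embed the rule for one Bloom level as explicit constraints.
        rule = RULES[level]
        return (
            f"Write one {rule['item_type']} question on the Python topic "
            f"'{topic}' at the '{level}' level, using one of the verbs "
            f"{rule['verbs']}. Respond only with JSON containing the keys "
            "question, level, item_type, and answer."
        )

    def call_llm(prompt):
        # Stub standing in for a real LLM API; replace with an actual client call.
        return ('{"question": "Implement a function that reverses a list.", '
                '"level": "Apply", "item_type": "coding", '
                '"answer": "def rev(xs): return xs[::-1]"}')

    def generate_item(topic, level):
        item = json.loads(call_llm(build_prompt(topic, level)))  # must parse as JSON
        # Structural consistency check on the expected schema.
        missing = {"question", "level", "item_type", "answer"} - item.keys()
        if missing or item["level"] != level:
            raise ValueError(f"schema violation: {missing or item['level']}")
        return item

    print(generate_item("list slicing", "Apply")["question"])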
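The reliability figure reported above can be reproduced with the standard Cronbach's Alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The sketch below computes it over a tiny illustrative score matrix, not the study's data.

    from statistics import pvariance

    def cronbach_alpha(scores):
        # scores: one row per student, one column per item (1 = correct, 0 = wrong).
        k = len(scores[0])                                       # number of items
        item_vars = sum(pvariance(col) for col in zip(*scores))  # per-item variances
        total_var = pvariance([sum(row) for row in scores])      # variance of totals
        return (k / (k - 1)) * (1 - item_vars / total_var)

    # Illustrative 4-student x 3-item matrix.
    print(round(cronbach_alpha([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]), 3))  # 0.75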
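Likewise, the item difficulty analysis follows classical test theory, where an item's difficulty index is the proportion of test-takers who answer it correctly. The 0.30 and 0.70 cut-offs below are the conventional thresholds, assumed here rather than quoted from the paper.

    # Difficulty index P = (students answering correctly) / (students tested).
    def difficulty_index(correct, total):
        return correct / total

    def classify(p):
        # Conventional classical-test-theory cut-offs (assumed, not from the paper).
        if p >= 0.70:
            return "easy"
        if p >= 0.30:
            return "moderate"
        return "difficult"

    p = difficulty_index(28, 32)      # e.g. 28 of the 32 students answered correctly
    print(p, classify(p))             # 0.875 easy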

References

[1] L. Kyung Choi, N. Iftitah, and P. Angela, “Developing Technopreneur Skills to Face Future Challenges,” IAIC Transactions on Sustainable Digital Innovation (ITSDI), vol. 5, no. 2, pp. 127–135, 2024.

[2] E. Mehmood, A. Abid, M. S. Farooq, and N. A. Nawaz, “Curriculum, Teaching and Learning, and Assessments for Introductory Programming Course,” IEEE Access, vol. 8, pp. 125961–125981, 2020, doi: 10.1109/ACCESS.2020.3008321.

[3] K. Ishaq and A. Alvi, “Personalization, Cognition, and Gamification-based Programming Language Learning: A State-of-the-Art Systematic Literature Review,” Sep. 2023.

[4] Z. Ullah, A. Lajis, M. Jamjoom, A. H. Altalhi, J. Shah, and F. Saleem, “A rule-based method for cognitive competency assessment in computer programming using bloom’s taxonomy,” IEEE Access, vol. 7, pp. 64663–64675, 2019, doi: 10.1109/ACCESS.2019.2916979.

[5] O. Keklik, T. Tuglular, and S. Tekir, “Rule-based automatic question generation using semantic role labeling,” IEICE Transactions on Information and Systems, vol. E102D, no. 7, pp. 1362–1373, 2019, doi: 10.1587/transinf.2018EDP7199.

[6] M. F. Naufal and S. F. Kusuma, “Otomatisasi Pembangkitan Pertanyaan untuk Bahasa Indonesia (Systematic Literature Review),” Jurnal Teknologi Informasi dan Ilmu Komputer, vol. 10, no. 1, pp. 185–192, Feb. 2023, doi: 10.25126/jtiik.2023106455.

[7] S. Maity, A. Deroy, and S. Sarkar, “How Effective is GPT-4 Turbo in Generating School-Level Questions from Textbooks Based on Bloom’s Revised Taxonomy?,” 2024.

[8] R. Kadar, S. A. Mohamed Yusoff, S. N. Warris, and M. S. Abu Bakar, “Students’ Assessments in Learning Programming based on Bloom’s Taxonomy,” Journal of Computing Research and Innovation, vol. 6, no. 3, pp. 13–21, Sep. 2021, doi: 10.24191/jcrinn.v6i3.223.

[9] S. R. Sobral, “Bloom’s taxonomy to improve teaching-learning in introduction to programming,” International Journal of Information and Education Technology, vol. 11, no. 3, pp. 148–153, Mar. 2021, doi: 10.18178/ijiet.2021.11.3.1504.

[10] E. A. O. Zijlmans, J. Tijmstra, L. A. van der Ark, and K. Sijtsma, “Item-score reliability as a selection tool in test construction,” Frontiers in Psychology, vol. 9, Jan. 2019, doi: 10.3389/fpsyg.2018.02298.

[11] N. Z. Zuhri, S. Syihabuddin, and T. Tatang, “Analisis Validitas, Reliabilitas, dan Tingkat Kesukaran Soal Bahasa Arab Tingkat SMP Berbasis Artificial Intelligence (AI) melalui Platform QuestionWell,” Jurnal Pendidikan dan Pembelajaran Indonesia (JPPI), vol. 4, no. 2, pp. 693–704, Jul. 2024, doi: 10.53299/jppi.v4i2.576.

[12] P. Blessy Paul and C. Kurian, “Generation of Bloom’s taxonomy-based complex-level questions using knowledge graph,” in Proc. 2024 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), 2024, doi: 10.1109/SPICES62143.2024.10779773.

[13] D. Kusumaningrum and A. Muslihasari, “Pengembangan Instrumen Literasi Lingkungan Ranah Kognitif untuk Siswa Sekolah Dasar di Kabupaten Malang,” Sep. 2020.

[14] A. Yaacoub, J. Da-Rugna, and Z. Assaghir, “Assessing AI-Generated Questions’ Alignment with Cognitive Frameworks in Educational Assessment,” 2025.

[15] S. Al Faraby, A. Romadhony, and Adiwijaya, “Analysis of LLMs for educational question classification and generation,” Computers and Education: Artificial Intelligence, vol. 7, Dec. 2024, doi: 10.1016/j.caeai.2024.100298.

[16] H. Touvron et al., “LLaMA: Open and Efficient Foundation Language Models,” Feb. 2023. [Online]. Available: http://arxiv.org/abs/2302.13971

[17] N. Scaria, S. Dharani Chenna, and D. Subramani, “Automated Educational Question Generation at Different Bloom’s Skill Levels Using Large Language Models: Strategies and Evaluation,” in Lecture Notes in Computer Science, Springer, 2024, pp. 165–179, doi: 10.1007/978-3-031-64299-9_12.

[18] K. Hwang, S. Challagundla, M. M. Alomair, L. Karen Chen, and F.-S. Choa, “Towards AI-Assisted Multiple Choice Question Generation and Quality Evaluation at Scale: Aligning with Bloom’s Taxonomy,” 2023. [Online]. Available: https://tinyurl.com/35am6sah

[19] L. A. Son, “Instrumentasi Kemampuan Pemecahan Masalah Matematis: Analisis Reliabilitas, Validitas, Tingkat Kesukaran dan Daya Beda Butir Soal,” 2019.

[20] M. Erfan et al., “Analisis Kualitas Soal Kemampuan Membedakan Rangkaian Seri dan Paralel Melalui Teori Tes Klasik dan Model Rasch,” Indonesian Journal of Educational Research and Review, vol. 3, no. 1, p. 11, 2020.

[21] K. Singhal et al., “Toward expert-level medical question answering with large language models,” Nature Medicine, vol. 31, no. 3, pp. 943–950, Mar. 2025, doi: 10.1038/s41591-024-03423-7.

[22] S. Ren et al., “CodeBLEU: a Method for Automatic Evaluation of Code Synthesis,” Sep. 2020. [Online]. Available: http://arxiv.org/abs/2009.10297

[23] J. Cressa and M. Mukhlis, “Level Kognitif Taksonomi Bloom pada Soal Mata Pelajaran Bahasa Indonesia,” vol. 3, 2023. [Online]. Available: https://journal.uir.ac.id/index.php/j-lelc

[24] A. Gomes and F. B. Correia, “Bloom’s Taxonomy Based Approach to Learn Basic Programming Loops,” IEEE, 2018.

[25] S. Marar, M. A. Hamza, M. Ayyash, and A. Abu-Shaheen, “Development and validation of an instrument to assess the knowledge and perceptions of predatory journals,” Heliyon, vol. 9, no. 11, Nov. 2023, doi: 10.1016/j.heliyon.2023.e22270.

[26] M. W. Gebremichael, B. Baraki, M. A. Mehari, and B. Assalfew, “Item analysis of multiple choice questions from assessment of health sciences students, Tigray, Ethiopia,” BMC Medical Education, vol. 25, no. 1, Dec. 2025, doi: 10.1186/s12909-025-06904-6.

Published

2025-12-06

How to Cite

[1] H. T. Saputro, U. Nurhasan, and V. N. Wijayaningrum, “Programming Assessment in E-Learning through Rule-Based Automatic Question Generation with Large Language Models”, JAIC, vol. 9, no. 6, pp. 3356–3362, Dec. 2025.
