Abstract Syntax Tree Model for Minimizing False Negative in Semantic Evaluation of Python Fill-in-the-Blank

Usman Nurhasan; Didik Dwi Prasetya; Syaad Patmanthara

doi:10.30871/jaic.v9i6.11090

Authors

Usman Nurhasan, Department of Electrical Engineering and Informatics, Universitas Negeri Malang, Indonesia
Didik Dwi Prasetya, Department of Electrical Engineering and Informatics, Universitas Negeri Malang, Indonesia
Syaad Patmanthara, Department of Electrical Engineering and Informatics, Universitas Negeri Malang, Indonesia

DOI:

https://doi.org/10.30871/jaic.v9i6.11090

Keywords:

Abstract Syntax Tree, Epistemology of Evaluation, Fill-in-the-Blank, Pedagogical Effectiveness, Semantic Evaluation

Abstract

This study develops and evaluates an automated assessment model based on Abstract Syntax Trees (AST) to overcome the limitations of string-matching techniques in assessing Fill-in-the-Blank (FIB) programming answers. Traditional string matching exhibits a relatively high False Negative Rate (FNR) of 21.5% when detecting semantically equivalent answers. The proposed model uses semantic structural triangulation to assess the semantic similarity of student answers. Technical evaluation shows that the AST approach markedly reduces the FNR, to 4.5%. The model shows high reliability (κ = 0.83) and strong classification performance (F1 score = 0.966), which supports its inferential validity. From a pedagogical perspective, implementing the system produces substantial learning gains, evidenced by a large effect size (Cohen's d = 1.82) and a high normalized gain (0.90). Multiple regression analysis identifies semantic accuracy as the primary predictor of improved student comprehension. Ontologically, AST is valid as a partial representation, but its limitations, particularly in establishing tree isomorphism for recursive structures, highlight the need to explore graph isomorphism approaches further. Control Flow Graphs (CFG) and Data Flow Graphs (DFG) offer more expressive relational models for capturing control and data dependencies. The model demonstrates functional feasibility, with a System Usability Scale (SUS) score of 76.47. Overall, the AST Triangulation Model is validated as pedagogically effective, inferentially robust, and supportive of evaluative transparency. For future research, we recommend validating the model on more complex tasks and releasing it as open-source software to support reproducibility.
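The central idea, comparing answers by syntax-tree structure rather than by raw text, can be sketched with Python's standard `ast` module. This is a minimal illustration, not the authors' triangulation model: it assumes each blank is filled with a self-contained Python expression or statement, and the helper names `string_match` and `ast_match` are hypothetical. It shows how formatting differences that make string matching produce false negatives (spacing, redundant parentheses, quote style) vanish at the AST level.

```python
import ast


def string_match(student: str, reference: str) -> bool:
    """Hypothetical baseline: exact text comparison after trimming whitespace."""
    return student.strip() == reference.strip()


def ast_match(student: str, reference: str) -> bool:
    """Hypothetical structural check: compare the ASTs of two code snippets.

    ast.dump() leaves out line/column attributes by default, so spacing,
    redundant parentheses and quote style do not change the result.
    """
    try:
        student_tree = ast.parse(student.strip())
        reference_tree = ast.parse(reference.strip())
    except SyntaxError:
        return False  # an unparsable answer cannot be structurally equal
    return ast.dump(student_tree) == ast.dump(reference_tree)


if __name__ == "__main__":
    pairs = [
        ("x+1", "x + 1"),                # spacing only
        ("(x + 1)", "x + 1"),            # redundant parentheses
        ("print('hi')", 'print("hi")'),  # quote style
        ("1 + x", "x + 1"),              # commuted operands
    ]
    for student, reference in pairs:
        print(f"{student!r} vs {reference!r}: "
              f"string={string_match(student, reference)}, "
              f"ast={ast_match(student, reference)}")
```

The last pair also hints at why an AST is only a partial representation of meaning: `1 + x` and `x + 1` are equivalent for numbers yet yield different trees, so a plain structural comparison still rejects that answer unless further normalization or a richer representation is added.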
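The abstract reports several evaluation metrics. The sketch below implements their standard textbook formulas so the reported figures can be read concretely. It is an assumption that the study used exactly these variants: κ is treated here as Cohen's kappa between two binary raters, Cohen's d uses the pooled sample standard deviation, and the normalized gain is the Hake-style single-score form. All function names are hypothetical and the toy data are invented for illustration; they are not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def confusion_counts(predicted, actual):
    """Count TP, FP, FN, TN for binary labels (True = answer accepted / truly correct)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    return tp, fp, fn, tn


def false_negative_rate(tp, fn):
    # Share of truly correct answers that the grader wrongly rejected.
    return fn / (tp + fn)


def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall, written in count form.
    return 2 * tp / (2 * tp + fp + fn)


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two binary raters."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    p_a = sum(1 for a in rater_a if a) / n
    p_b = sum(1 for b in rater_b if b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)


def normalized_gain(pre, post, max_score=100):
    # Fraction of the possible improvement that was actually achieved.
    return (post - pre) / (max_score - pre)


def cohens_d(group_1, group_2):
    """Standardized mean difference (group_2 minus group_1) with pooled sample SD."""
    n1, n2 = len(group_1), len(group_2)
    pooled_var = ((n1 - 1) * stdev(group_1) ** 2 + (n2 - 1) * stdev(group_2) ** 2) / (n1 + n2 - 2)
    return (mean(group_2) - mean(group_1)) / sqrt(pooled_var)


def sus_score(responses):
    """Standard SUS scoring for ten Likert responses on a 1-5 scale."""
    odd = sum(r - 1 for r in responses[0::2])   # items 1, 3, 5, 7, 9
    even = sum(5 - r for r in responses[1::2])  # items 2, 4, 6, 8, 10
    return (odd + even) * 2.5


if __name__ == "__main__":
    # Invented toy labels: human judgment vs. automated grader decision.
    human = [True, True, True, True, False, False, True, True]
    grader = [True, True, False, True, False, True, True, True]
    tp, fp, fn, tn = confusion_counts(grader, human)
    print("FNR:", round(false_negative_rate(tp, fn), 3))
    print("F1:", round(f1_score(tp, fp, fn), 3))
    print("kappa:", round(cohens_kappa(grader, human), 3))
    print("normalized gain:", normalized_gain(pre=40, post=70))
    print("Cohen's d:", round(cohens_d([1, 2, 3], [3, 4, 5]), 2))
    print("SUS:", sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))
```

Computing the FNR from counts makes the string-matching versus AST comparison explicit: the metric only counts answers a human judged correct, so it isolates exactly the wrongly rejected equivalent answers that the AST approach targets.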


