Comparative Evaluation of Agentic Workflow Capabilities in AI IDE Agents for Web-Based Learning Media Development

Authors

  • Erlangga Aditia, Universitas Pendidikan Indonesia
  • Ulva Elviani, Universitas Pendidikan Indonesia

DOI:

https://doi.org/10.30871/jaic.v10i3.12841

Keywords:

Agentic Workflow, AI IDE Agents, Comparative Evaluation, Web-Based Learning Media

Abstract

The rapid development of agentic IDEs calls for evaluation approaches that assess not only final outputs but also the agentic workflow enacted during software development. This study comparatively evaluates the workflow capabilities of four AI IDE agents (Cursor, Windsurf, Trae, and Antigravity) within a five-stage benchmark for developing a web-based learning media application, Next-Gen SPLDV. A descriptive comparative evaluation was conducted using five agentic maturity metrics: task decomposition (DC), tool-use effectiveness (TSR), autonomous recovery capability (ARC), human intervention cost (HIC), and time completion efficiency (TCT), complemented by interaction logs and internal artifacts. The findings indicate distinct performance trade-off profiles across systems. Descriptively, Antigravity appeared more stable (TSR 96.0%; HIC 3; ARC 8), whereas the other systems exhibited context-dependent strengths: Cursor showed more selective tool use, Trae was efficient in several stages but more vulnerable during database integration, and Windsurf was more exploratory but required more intervention and recovery effort. Qualitative evidence further suggests that these differences were associated with variations in plan-execute-verify strategies and error-response behavior. Overall, the evaluation of AI IDE agents is better interpreted as a contextual map of workflow trade-offs than as the identification of a single winner across all settings.

Published

2026-06-12

How to Cite

[1]
E. Aditia and U. Elviani, “Comparative Evaluation of Agentic Workflow Capabilities in AI IDE Agents for Web-Based Learning Media Development”, JAIC, vol. 10, no. 3, pp. 2556–2567, Jun. 2026.
