Improving Helpdesk Chatbot Performance with Term Frequency-Inverse Document Frequency (TF-IDF) and Cosine Similarity Models
Abstract
Helpdesk chatbots are growing in popularity because they can answer user questions quickly and effectively. Developing such a chatbot poses several challenges, including understanding user queries accurately, returning relevant responses, and resolving problems efficiently. In this research, we aim to improve the accuracy and efficiency of a helpdesk chatbot by implementing the Term Frequency-Inverse Document Frequency (TF-IDF) model together with the Cosine Similarity algorithm. TF-IDF weights a term by how often it occurs in a document, discounted by how common that term is across the entire document collection, while Cosine Similarity measures how similar two documents are by comparing their term-weight vectors. After implementing and testing the TF-IDF and Cosine Similarity models in the helpdesk chatbot, we achieved a 75% question recognition rate. Increasing accuracy and precision further requires enlarging the knowledge dataset and improving pre-processing, especially recognizing and correcting misspelled words.
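The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the tokenizer, the exact TF-IDF weighting variant, or the knowledge dataset, so the smoothed IDF formula, the `FAQ_QUESTIONS` list, and all function names below are assumptions chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; the paper's real dataset is not shown in the abstract.
FAQ_QUESTIONS = [
    "How do I reset my password",
    "How do I connect to the campus wifi",
    "Where can I download the VPN client",
]

def tokenize(text):
    """Lowercase the text and keep only alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_idf(docs_tokens):
    """Smoothed inverse document frequency (an assumed variant)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the collection are ignored."""
    counts = Counter(tokens)
    total = len(tokens)
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(query, questions):
    """Return (index, score) of the stored question most similar to the query."""
    docs_tokens = [tokenize(q) for q in questions]
    idf = build_idf(docs_tokens)
    doc_vectors = [tfidf_vector(tokens, idf) for tokens in docs_tokens]
    query_vector = tfidf_vector(tokenize(query), idf)
    scores = [cosine_similarity(query_vector, dv) for dv in doc_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

if __name__ == "__main__":
    index, score = best_match("I forgot my password, how to reset it", FAQ_QUESTIONS)
    print(FAQ_QUESTIONS[index], round(score, 3))
```

In a sketch like this, a query containing only misspelled or unseen words produces an empty vector and a similarity of zero against every stored question, which is consistent with the abstract's point that spelling correction in pre-processing and a larger knowledge dataset are the natural places to improve recognition.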
Copyright (c) 2023 Gede Herdian Setiawan, I Made Budi Adnyana
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.