Comparison of Hadoop Mapreduce and Apache Spark in Big Data Processing with Hgrid247-DE

  • Firmania Dwi Utami Universitas Teknologi Digital Indonesia
  • Femi Dwi Astuti Universitas Teknologi Digital Indonesia
Keywords: Big Data, Apache Spark, MapReduce, Hadoop

Abstract

In today’s rapidly evolving information technology landscape, managing and analyzing big data has become one of the most significant challenges. This paper compares two major frameworks for big data processing: Hadoop MapReduce and Apache Spark. Both frameworks were tested in three scenarios (sorting, summarizing, and grouping), using HGrid247-DE as the primary data processing tool. Datasets sourced from Kaggle, ranging in size from 3 MB to 260 MB, were used to evaluate the performance of each framework. The findings show that Apache Spark generally outperforms Hadoop MapReduce in processing speed because it handles data in memory. However, Hadoop MapReduce proved more efficient in specific scenarios, particularly for smaller tasks or when memory resources are limited, largely because Apache Spark can incur overhead when initializing tasks for small jobs. Furthermore, Hadoop MapReduce's reliance on disk I/O makes it better suited to tasks whose data volume exceeds available memory. In contrast, Spark excels where fast iterative processing and real-time data analysis are essential. The study clarifies the strengths and limitations of each framework and offers guidance for practitioners and researchers selecting a tool for specific big data processing requirements, particularly with respect to speed, memory usage, and task complexity.
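The three scenarios named in the abstract can be illustrated with a small single-machine sketch in plain Python. It is not the authors' HGrid247-DE workflow and uses neither Hadoop nor Spark; the records, field meanings, and function names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical and exist only to show what grouping, summarizing, and sorting mean in a map/shuffle/reduce style.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical (category, amount) records; not taken from the paper's Kaggle datasets.
records = [("b", 4), ("a", 10), ("c", 1), ("a", 5), ("b", 6)]


def map_phase(rows):
    """Emit one (key, value) pair per input row."""
    return [(key, value) for key, value in rows]


def shuffle(pairs):
    """Collect all values that share a key, as the shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups, fn):
    """Fold each key's values into a single result with a binary function."""
    return {key: reduce(fn, values) for key, values in groups.items()}


pairs = map_phase(records)
grouped = shuffle(pairs)                                 # grouping scenario
summarized = reduce_phase(grouped, lambda x, y: x + y)   # summarizing scenario (sum per key)
sorted_rows = sorted(records, key=lambda row: row[1])    # sorting scenario (by amount)

print(grouped)
print(summarized)
print(sorted_rows)
```

In a real cluster the intermediate `pairs` and `grouped` structures are where the two frameworks differ: MapReduce writes them to disk between phases, whereas Spark tries to keep them in memory, which is the trade-off the abstract describes.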

Published
2024-11-12
How to Cite
F. Utami and F. D. Astuti, “Comparison of Hadoop Mapreduce and Apache Spark in Big Data Processing with Hgrid247-DE”, JAIC, vol. 8, no. 2, pp. 390-399, Nov. 2024.