Performance Analysis of Deep Learning Model Quantization on NPU for Real-Time Automatic License Plate Recognition Implementation

Authors

  • Daniel Alexander, Faculty of Computer Science, Informatics Engineering, Dian Nuswantoro University
  • Wildanil Ghozi, Faculty of Computer Science, Informatics Engineering, Dian Nuswantoro University

DOI:

https://doi.org/10.30871/jaic.v9i4.9700

Keywords:

Deep Learning, Edge Computing, License Plate Recognition, Neural Processing Unit, Quantization

Abstract

Neural Processing Units (NPUs) are dedicated accelerators designed to perform efficient deep learning inference on edge devices with limited computational and power resources. In real-time applications such as automated parking systems, accurate and low-latency license plate recognition is critical. This study evaluates the effectiveness of quantization techniques, specifically Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT), in improving the performance of YOLOv8-based license plate detection models deployed on an Intel NPU integrated within the Core Ultra 7 155H processor. Three model configurations are compared: a full-precision float32 model, a PTQ model, and a QAT model. All models are converted to OpenVINO’s Intermediate Representation (IR) and benchmarked using the benchmark_app tool. Results show that PTQ and QAT significantly enhance inference efficiency. QAT achieves up to a 39.9% improvement in throughput and a 28.6% reduction in latency compared to the non-quantized model, while maintaining higher detection accuracy. Both quantized models also reduce model size by nearly 50%. Although PTQ is simpler to implement, QAT offers a better balance between accuracy and speed, making it more suitable for deployment in edge scenarios with real-time constraints. These findings highlight QAT as an optimal strategy for efficient and accurate license plate recognition on NPU-based edge platforms.
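
For readers who want to reproduce the workflow described above, the sketch below shows one way to apply NNCF post-training quantization to a YOLOv8 license plate detector, save the resulting OpenVINO IR, and compile it for the NPU device. It is a minimal illustration under stated assumptions, not the authors' exact pipeline: the weights file, export path, input resolution, and the load_calibration_images helper are hypothetical.

    # Minimal PTQ-to-NPU sketch (illustrative; file names, paths, and the
    # calibration loader are assumptions, not the authors' exact settings).
    import numpy as np
    import nncf
    import openvino as ov
    from ultralytics import YOLO

    # 1) Export a trained YOLOv8 detector to OpenVINO IR (the float32 baseline).
    YOLO("license_plate_yolov8n.pt").export(format="openvino")  # hypothetical weights file

    core = ov.Core()
    fp32_model = core.read_model(
        "license_plate_yolov8n_openvino_model/license_plate_yolov8n.xml"  # assumed export location
    )

    # 2) Post-Training Quantization with NNCF over a small calibration set.
    def to_input(image):
        # image: HxWx3 uint8 frame already resized to the network input (e.g. 640x640)
        blob = image.astype(np.float32) / 255.0
        return np.expand_dims(blob.transpose(2, 0, 1), 0)  # NCHW batch of one

    calibration_images = load_calibration_images()  # hypothetical loader returning a list of frames
    int8_model = nncf.quantize(fp32_model, nncf.Dataset(calibration_images, to_input))
    ov.save_model(int8_model, "yolov8n_lp_int8.xml")  # quantized IR is roughly half the float32 size

    # 3) Compile for the integrated NPU and run a single inference.
    compiled = core.compile_model(int8_model, device_name="NPU")
    detections = compiled(to_input(calibration_images[0]))

Throughput and latency on the NPU can then be measured with OpenVINO's benchmark_app, for example: benchmark_app -m yolov8n_lp_int8.xml -d NPU -api async. QAT follows the same export and benchmarking path, but fine-tunes the detector with quantization simulated during training, which is what lets it retain the higher detection accuracy reported above.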

Published

2025-08-03

How to Cite

[1] D. Alexander and W. Ghozi, “Performance Analysis of Deep Learning Model Quantization on NPU for Real-Time Automatic License Plate Recognition Implementation”, JAIC, vol. 9, no. 4, pp. 1227–1233, Aug. 2025.

Issue

Vol. 9 No. 4 (2025)

Section

Articles
