Comparison of Faster R-CNN and YOLO v12 on Passport Text Extraction Based on Optical Character Recognition

Authors

  • Masniari Samosir, Universitas Pamulang, Indonesia
  • Sajarwo Anggai, Universitas Pamulang, Indonesia
  • Taswanda Taryo, Universitas Pamulang, Indonesia

DOI:

https://doi.org/10.37012/jtik.v12i1.3307

Abstract

Advances in information technology are driving the digitalization of official identity documents, including passports, to improve service efficiency and reduce reliance on manual processes. However, passport digitalization still faces efficiency and accuracy challenges because data are often entered manually. This study compares the performance of Faster R-CNN and YOLO v12 in an automatic text extraction system based on Optical Character Recognition (OCR). The research used an experimental method with a comparative approach on 31 preprocessed passport images. YOLO v12 was integrated with EasyOCR, while Faster R-CNN was combined with a PyTorch-based OCR module. The evaluation metrics were mAP, Character Accuracy Rate (CAR), Word Error Rate (WER), F1-score, and inference time. The results show that YOLO v12 outperforms Faster R-CNN in object detection, achieving an mAP@50 of 95.0% and an mAP@50–95 of 90.0%, compared with 93.0% and 89.0% for Faster R-CNN. In text extraction accuracy, Faster R-CNN achieved a CAR of 50.01% and an F1-score of 55.75%, slightly higher than YOLO v12 (CAR 47.72%, F1-score 53.84%). However, YOLO v12 produced a lower WER and a faster inference time of 2.4202 seconds (0.45 FPS). The findings suggest that YOLO v12 excels in efficiency and detection performance, while Faster R-CNN achieves slightly higher text extraction accuracy as measured by CAR and F1-score.
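The abstract names CAR, WER, F1-score and mAP@50 without defining them. The Python sketch below shows common definitions so the reported numbers can be read in context. It assumes CAR = 1 - CER (character edit distance divided by reference length, clamped at 0), WER as word-level edit distance divided by the number of reference words, F1 as bag-of-words token overlap between reference and OCR output, and the usual IoU >= 0.5 match rule behind mAP@50. The paper's exact formulas may differ, and the example strings and boxes are hypothetical, not taken from the study's dataset.

```python
from collections import Counter


def levenshtein(ref, hyp):
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_accuracy_rate(ref: str, hyp: str) -> float:
    """Assumed definition: CAR = max(0, 1 - CER), CER = char edits / len(ref)."""
    if not ref:
        return 1.0 if not hyp else 0.0
    cer = levenshtein(ref, hyp) / len(ref)
    return max(0.0, 1.0 - cer)


def word_error_rate(ref: str, hyp: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return levenshtein(ref_words, hyp_words) / len(ref_words)


def token_f1(ref: str, hyp: str) -> float:
    """Assumed definition: F1 over the multiset of whitespace-separated tokens."""
    ref_tokens, hyp_tokens = Counter(ref.split()), Counter(hyp.split())
    overlap = sum((ref_tokens & hyp_tokens).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_tokens.values())
    recall = overlap / sum(ref_tokens.values())
    return 2 * precision * recall / (precision + recall)


def iou(a, b):
    """IoU of boxes (x1, y1, x2, y2); at mAP@50 a detection matches if IoU >= 0.5."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical field: OCR misreads one "O" as the digit "0".
    ref, hyp = "PASSPORT NO A1234567", "PASSP0RT NO A1234567"
    print(character_accuracy_rate(ref, hyp))  # 1 edit / 20 chars -> 0.95
    print(word_error_rate(ref, hyp))          # 1 wrong word / 3 words
    print(token_f1(ref, hyp))                 # 2 of 3 tokens match
    print(iou((0, 0, 10, 10), (0, 0, 10, 5)))  # 50 / 100 -> 0.5
```

The example illustrates why CAR and WER can rank systems differently: a single wrong character costs only 5% of CAR here but makes a whole word wrong, raising WER to one third.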

Published

2026-02-20