Translation to a Scarce-Resource Language through Iterations

Authors

  • Aasim Ali, Multan University of Science and Technology, Multan, Pakistan.
  • Sania Umer, COMSATS University Islamabad, Wah Cantt, Pakistan.
  • Ahsan Iqbal, Multan University of Science and Technology, Multan, Pakistan.
  • Ashfaq Ahmad, Lahore Garrison University, Lahore, Pakistan.

Keywords:

Urdu, Linguistic Improvement, Scarce-resource

Abstract

Machine translation for low-resource languages such as Urdu remains a significant challenge due to limited parallel corpora and the absence of linguistic annotation tools. This study presents an iterative statistical machine translation (SMT) approach that incrementally improves translation quality using only existing bilingual text. In each iteration, the translation output of the previous model is reused as the source side to retrain a new SMT system aligned with the original target sentences. The process continues until translation quality, measured by Bilingual Evaluation Understudy (BLEU) score, stabilizes. Experiments on an English–Urdu parallel corpus demonstrate that the proposed method achieves notable improvements over the baseline system without employing any morphological or syntactic pre-processing. These findings suggest that iterative retraining can partially capture implicit linguistic patterns from limited data, offering a viable path towards improving translation for scarce-resource languages.
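The iterative procedure summarized above can be sketched as a control loop. This is a minimal Python sketch, not the authors' implementation, and it rests on several assumptions: `train` stands in for building an SMT system from a parallel corpus, `score` stands in for corpus-level BLEU, test sentences are passed through the same chain of retained models, and "stabilizes" is read as a gain smaller than a hypothetical threshold `min_gain`. The helpers `toy_train` (a position-aligned word lookup) and `token_accuracy` are hypothetical toys included only so the loop can run; they are neither SMT nor BLEU.

```python
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

# A "model" maps a list of sentences to a list of translations (hypothetical interface).
Model = Callable[[List[str]], List[str]]
Trainer = Callable[[List[str], List[str]], Model]   # (source side, target side) -> model
Scorer = Callable[[List[str], List[str]], float]    # (hypotheses, references) -> score


def iterative_retraining(
    train_src: List[str],
    train_tgt: List[str],
    test_src: List[str],
    test_ref: List[str],
    train: Trainer,
    score: Scorer,
    min_gain: float = 0.1,   # hypothetical threshold for "the score has stabilized"
    max_iters: int = 10,     # hypothetical safety cap
) -> Tuple[List[Model], List[float]]:
    """Retrain on the previous model's output until the test score stops improving."""
    models: List[Model] = []
    scores: List[float] = []
    src, test_in = train_src, test_src
    for _ in range(max_iters):
        # Every round is trained against the ORIGINAL target sentences.
        model = train(src, train_tgt)
        test_out = model(test_in)
        current = score(test_out, test_ref)
        if scores and current - scores[-1] < min_gain:
            break  # gain too small (or negative): discard this model and stop
        models.append(model)
        scores.append(current)
        src = model(src)      # this round's output is the next round's source side
        test_in = test_out    # test sentences follow the same chain of models
    return models, scores


def translate_chain(models: Sequence[Model], sentences: List[str]) -> List[str]:
    """Translate by passing sentences through every retained model in order."""
    for model in models:
        sentences = model(sentences)
    return sentences


def toy_train(src: List[str], tgt: List[str]) -> Model:
    """Toy stand-in for SMT training: a word table learned from position-aligned tokens."""
    counts = defaultdict(Counter)
    for s, t in zip(src, tgt):
        for a, b in zip(s.split(), t.split()):
            counts[a][b] += 1
    table = {a: c.most_common(1)[0][0] for a, c in counts.items()}

    def model(sentences: List[str]) -> List[str]:
        # Unknown words are copied through unchanged.
        return [" ".join(table.get(w, w) for w in s.split()) for s in sentences]

    return model


def token_accuracy(hyp: List[str], ref: List[str]) -> float:
    """Toy stand-in for BLEU: percentage of position-matched tokens that agree."""
    total = correct = 0
    for h, r in zip(hyp, ref):
        hw, rw = h.split(), r.split()
        total += max(len(hw), len(rw))
        correct += sum(a == b for a, b in zip(hw, rw))
    return 100.0 * correct / total if total else 0.0
```

For example, `iterative_retraining(["a b", "b c"], ["x y", "y z"], ["a c"], ["x z"], toy_train, token_accuracy)` stops after the second round, because retraining on the first model's output yields no further gain, and it keeps a one-model chain. In a real setup, `train` would wrap an SMT toolkit and `score` a BLEU implementation; discarding the non-improving model, rather than keeping it, is one design choice among several.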

Published

2025-09-01

How to Cite

Ali, A., Umer, S., Iqbal, A., & Ahmad, A. (2025). Translation to a Scarce-Resource Language through Iterations. Journal of Computing & Biomedical Informatics, 9(02). Retrieved from https://www.jcbi.org/index.php/Main/article/view/1121