Translation to a Scarce-Resource Language through Iterations
Keywords:
Urdu, Linguistic Improvement, Scarce-resource
Abstract
Machine translation for low-resource languages such as Urdu remains a significant challenge due to limited parallel corpora and the absence of linguistic annotation tools. This study presents an iterative statistical machine translation (SMT) approach that incrementally improves translation quality using only existing bilingual text. In each iteration, the translation output of the previous model becomes the new source side and is paired with the original target sentences to train the next SMT system. The process continues until translation quality, measured by the Bilingual Evaluation Understudy (BLEU) score, stabilizes. Experiments on an English–Urdu parallel corpus demonstrate that the proposed method achieves notable improvements over the baseline system without employing any morphological or syntactic pre-processing. These findings suggest that iterative retraining can partially capture implicit linguistic patterns from limited data, offering a viable path towards improving translation for scarce-resource languages.
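The iterative procedure described in the abstract can be pictured as a simple loop. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: `train` and `score` are hypothetical callables standing in for an SMT training toolkit and a BLEU scorer, `tolerance` and `max_iterations` are illustrative parameters, and scoring on the training sentences themselves is a simplification (the abstract does not say which data the BLEU score is computed on).

```python
from typing import Callable, List, Sequence, Tuple

# A "model" here is any callable mapping a source sentence to a translation.
Model = Callable[[str], str]


def iterative_retraining(
    source: Sequence[str],
    target: Sequence[str],
    train: Callable[[Sequence[str], Sequence[str]], Model],  # hypothetical trainer
    score: Callable[[Sequence[str], Sequence[str]], float],  # hypothetical BLEU scorer
    tolerance: float = 0.1,   # assumed threshold for "BLEU has stabilized"
    max_iterations: int = 10,  # assumed safety cap
) -> Tuple[List[Model], List[float]]:
    """Retrain repeatedly, feeding each model's output back in as the source side."""
    models: List[Model] = []
    scores: List[float] = []
    current_source = list(source)

    for _ in range(max_iterations):
        # Train on (current source side, original target sentences).
        model = train(current_source, target)
        output = [model(sentence) for sentence in current_source]

        models.append(model)
        scores.append(score(output, target))

        # Stop once the gain over the previous iteration falls below tolerance.
        if len(scores) > 1 and scores[-1] - scores[-2] < tolerance:
            break

        # The output of this iteration becomes the next iteration's source side.
        current_source = output

    return models, scores
```

At translation time, a new sentence would presumably be passed through the returned models in order, since each model after the first was trained on the output of its predecessor.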
License
This is an open access article published by the Research Center of Computing & Biomedical Informatics (RCBI), Lahore, Pakistan, under the CC BY 4.0 International License.
							


