Handling Gender Bias in Neural Machine Translation: A Focus on English-Khasi Language Pair

Authors

  • Aiusha Vellintihun Hujon, North Eastern Hill University, India
  • Thoudam Doren Singh, National Institute of Technology Meghalaya, India
  • Khwairakpam Amitab, North Eastern Hill University, India

DOI:

https://doi.org/10.22232/stj.2025.13.01.17

Keywords:

Gender bias information, Khasi, Data augmentation, Transfer learning, Neural machine translation

Abstract

Machine translation systems have progressed tremendously over the years and have enhanced cross-lingual communication. Although machine translation has advanced from rule-based methods to neural machine translation, issues that occur in most machine translation systems, such as gender bias, cannot be ignored. Gender bias not only affects translation accuracy but also reinforces societal prejudices and often promotes inequalities and stereotypes. This study addresses the challenges of gender bias in neural machine translation for English-Khasi, a low-resource language pair, where data scarcity increases the risk of biased translations. To the best of our knowledge, this is the first reported study on gender bias in neural machine translation for the English-Khasi language pair. We use two methods: a data augmentation technique and a transfer learning method, both tailored to the linguistic and socio-cultural characteristics of the target language. To implement the two methods, we manually build a sizeable gender-balanced English-Khasi parallel corpus to handle gender bias in English-Khasi neural machine translation systems. Through empirical evaluation on the low-resource English-Khasi language pair, we demonstrate the effectiveness of the transfer learning approach in reducing gender bias while maintaining translation quality.

Author Biographies

Aiusha Vellintihun Hujon, North Eastern Hill University, India

Department of Information Technology

Thoudam Doren Singh, National Institute of Technology Meghalaya, India

Department of Computer Science and Engineering

Khwairakpam Amitab, North Eastern Hill University, India

Department of Information Technology

Published

2025-09-29

How to Cite

Aiusha Vellintihun Hujon, Thoudam Doren Singh, & Khwairakpam Amitab. (2025). Handling Gender Bias in Neural Machine Translation: A Focus on English-Khasi Language Pair. Science & Technology Journal, 13(1). https://doi.org/10.22232/stj.2025.13.01.17

Section

Research Articles
