Research Article | Peer-Reviewed

Joint Entity and Relation Extraction Using Machine Reading Comprehension for Urdu

Received: 4 July 2024     Accepted: 3 September 2024     Published: 26 September 2024
Abstract

Joint Entity and Relation Extraction (JERE) plays an important role in natural language processing (NLP) by identifying entities, such as names and locations, and the relationships among them in unstructured text. Despite extensive research on languages like English, JERE remains challenging for low-resource languages, particularly Urdu, because of limited annotated data and inherent linguistic complexities. In this paper, we propose a novel Machine Reading Comprehension (MRC)-based approach to the JERE task for Urdu. The method has two main components that work together: a text encoder and a question answering (QA) module. The text encoder converts Urdu text into a compact vector form, which is then fed into the QA module. The QA module generates answers to queries about the desired entities and relationships, producing a sequence of tokens that represents these entities and their interactions. The model is trained to minimize the difference between its predicted answers and the correct ones. We also introduce an annotated Urdu JERE dataset. Together, the approach and the dataset advance multilingual NLP and information extraction research. The insights gained can be applied to other low-resource languages, aiding the development of NLP tools and applications for a broader range of languages.
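The MRC framing described in the abstract can be illustrated with a minimal sketch. It assumes a two-turn scheme (first ask for entities of each type, then ask one relation question per detected head entity), which is a common design in MRC-based extraction but is not confirmed by the abstract. The question templates, the relation name `born_in`, the `Answerer` interface, and the lookup-table `toy_answerer` are all hypothetical; in a real system the answerer would be the trained text encoder and QA module.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# An answerer maps (question, context) to a list of answer strings.
# In a real system this would wrap the text encoder and QA module.
Answerer = Callable[[str, str], List[str]]

# Hypothetical question templates; the abstract does not list the actual queries.
ENTITY_QUESTIONS: Dict[str, str] = {
    "PERSON": "Which persons are mentioned in the text?",
    "LOCATION": "Which locations are mentioned in the text?",
}
RELATION_QUESTIONS: Dict[Tuple[str, str, str], str] = {
    ("PERSON", "born_in", "LOCATION"): "Where was {head} born?",
}


def extract_triples(context: str, answer: Answerer) -> List[Triple]:
    """Extract (head, relation, tail) triples by asking questions in two turns."""
    # Turn 1: one question per entity type.
    entities = {etype: answer(q, context) for etype, q in ENTITY_QUESTIONS.items()}

    # Turn 2: one question per (detected head entity, relation type).
    triples: List[Triple] = []
    for (head_type, relation, tail_type), template in RELATION_QUESTIONS.items():
        for head in entities.get(head_type, []):
            for tail in answer(template.format(head=head), context):
                # Keep only tails that were also detected as the expected type.
                if tail in entities.get(tail_type, []):
                    triples.append(Triple(head, relation, tail))
    return triples


# Toy context: "Iqbal was born in Sialkot."
CONTEXT = "اقبال سیالکوٹ میں پیدا ہوئے"
PERSON = "اقبال"
CITY = "سیالکوٹ"
BORN_IN_TEMPLATE = RELATION_QUESTIONS[("PERSON", "born_in", "LOCATION")]

# Lookup table standing in for the trained model (illustration only).
TOY_ANSWERS: Dict[str, List[str]] = {
    ENTITY_QUESTIONS["PERSON"]: [PERSON],
    ENTITY_QUESTIONS["LOCATION"]: [CITY],
    BORN_IN_TEMPLATE.format(head=PERSON): [CITY],
}


def toy_answerer(question: str, context: str) -> List[str]:
    return TOY_ANSWERS.get(question, [])


triples = extract_triples(CONTEXT, toy_answerer)
print(triples)
```

Because relation questions are built from the entities found in the first turn, entity and relation extraction share one model and one input format, which is the main appeal of casting JERE as question answering.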

Published in American Journal of Computer Science and Technology (Volume 7, Issue 3)
DOI 10.11648/j.ajcst.20240703.15
Page(s) 104-114
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Entity Recognition, Joint Entity and Relation Extraction, Machine Reading Comprehension, Natural Language Processing, Urdu Language

Cite This Article
  • APA Style

    Riasat, M. (2024). Joint Entity and Relation Extraction Using Machine Reading Comprehension for Urdu. American Journal of Computer Science and Technology, 7(3), 104-114. https://doi.org/10.11648/j.ajcst.20240703.15


    ACS Style

    Riasat, M. Joint Entity and Relation Extraction Using Machine Reading Comprehension for Urdu. Am. J. Comput. Sci. Technol. 2024, 7(3), 104-114. doi: 10.11648/j.ajcst.20240703.15


    AMA Style

    Riasat M. Joint Entity and Relation Extraction Using Machine Reading Comprehension for Urdu. Am J Comput Sci Technol. 2024;7(3):104-114. doi: 10.11648/j.ajcst.20240703.15


  • @article{10.11648/j.ajcst.20240703.15,
      author = {Maria Riasat},
      title = {Joint Entity and Relation Extraction Using Machine Reading Comprehension for Urdu},
      journal = {American Journal of Computer Science and Technology},
      volume = {7},
      number = {3},
      pages = {104-114},
      doi = {10.11648/j.ajcst.20240703.15},
      url = {https://doi.org/10.11648/j.ajcst.20240703.15},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajcst.20240703.15},
     year = {2024}
    }
    


  • TY  - JOUR
    T1  - Joint Entity and Relation Extraction Using Machine Reading Comprehension for Urdu
    AU  - Maria Riasat
    Y1  - 2024/09/26
    PY  - 2024
    N1  - https://doi.org/10.11648/j.ajcst.20240703.15
    DO  - 10.11648/j.ajcst.20240703.15
    T2  - American Journal of Computer Science and Technology
    JF  - American Journal of Computer Science and Technology
    JO  - American Journal of Computer Science and Technology
    SP  - 104
    EP  - 114
    PB  - Science Publishing Group
    SN  - 2640-012X
    UR  - https://doi.org/10.11648/j.ajcst.20240703.15
    VL  - 7
    IS  - 3
    ER  - 

