Document Details
Document Type: Thesis
Document Title: Pre-trained Transformer-Based Approach for Extractive and Abstractive Summarization of Arabic Text (Arabic title: نهج قائم على المحولات المدربة مسبقاً للتلخيص الاستخراجي والتجريدي للنص العربي)
Subject: Faculty of Computing and Information Technology
Document Language: Arabic
Abstract: Automatic Text Summarization (ATS) is a prominent research topic in Natural Language Processing (NLP) due to the variety and proliferation of information sources on the Internet. In this study, we explored ATS systems using two different approaches: extractive summarization and abstractive summarization. The extractive approach builds a summary by selecting the most important phrases and sentences from the original input text, without rewording them. Abstractive summarization, on the other hand, restates the original text in entirely new terms and sentences. A large body of research has been published on summarizing English text with increasingly advanced methods and results. Research on Arabic text summarization, however, progresses more slowly, owing to the nature of the Arabic language and the scarcity of basic reference datasets. Several pre-trained language models have recently shown excellent performance on many NLP tasks, so this study experiments with different pre-trained models for summarizing Arabic text. For extractive summarization, we fine-tuned and compared the base AraBERT model, the QARiB model, and the AraELECTRA model, training them on the KALIMAT and EASC Arabic datasets. The generated summaries were evaluated with the ROUGE package using the ROUGE-1, ROUGE-2, and ROUGE-L metrics. The best results were achieved by the AraBERT model, which obtained 0.44, 0.26, and 0.44, respectively, on the KALIMAT dataset. For abstractive summarization, we used the Text-to-Text Transfer Transformer (T5): we fine-tuned AraT5, the recently released Arabic variant of T5, on a dataset of 267,000 Arabic articles. Evaluated with ROUGE-1, ROUGE-2, ROUGE-L, and BLEU, the model scored 0.494, 0.339, 0.469, and 0.4224, respectively. On a second dataset of 300,000 articles and headlines, it achieved 0.53, 0.3, 0.36, and 0.48 on the same metrics. The fine-tuned AraT5 model also outperformed recent work based on the Sequence-to-Sequence (Seq2Seq) model.
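Illustration: the abstract describes fine-tuning AraT5 for abstractive summarization and scoring the generated summaries with ROUGE. The Python sketch below is a minimal, hypothetical illustration of that pipeline, not the thesis code: the public UBC-NLP/AraT5-base checkpoint stands in for the thesis's fine-tuned model, the generation settings are assumed defaults, and ROUGE-1 is computed here over whitespace tokens only to keep the sketch self-contained (the thesis used the full ROUGE package plus BLEU).

# Minimal sketch, assuming a public AraT5-style checkpoint on the
# Hugging Face Hub; the thesis fine-tuned its own model on 267,000
# Arabic articles, which is not reproduced here.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "UBC-NLP/AraT5-base"  # assumed stand-in checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def summarize(article: str, max_new_tokens: int = 64) -> str:
    """Generate an abstractive summary with beam search."""
    inputs = tokenizer(article, return_tensors="pt",
                       truncation=True, max_length=512)
    output_ids = model.generate(**inputs,
                                num_beams=4,
                                no_repeat_ngram_size=3,
                                max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap over whitespace
    tokens (the stock rouge_score tokenizer is Latin-centric and would
    drop Arabic characters, hence this hand-rolled variant)."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    overlap = sum(min(cand.count(w), ref.count(w)) for w in set(cand))
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Example: score a generated summary against a gold reference.
# article, reference = ...  (an Arabic article and its gold summary)
# print(rouge1_f1(summarize(article), reference))

Beam search with an n-gram repetition block is a common default for summarization decoding; the thesis's actual decoding configuration is not given in the abstract.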
Supervisor: Dr. Amal Almansour
Thesis Type: Master Thesis
Publishing Year: 1445 AH / 2023 AD
Added Date: Friday, November 10, 2023
Researchers
Einieh, Yasmin (ياسمين عينيه) | Researcher | Master