Title: | Arabic word segmentation |
Authors: | Abdelaziz Dinar, Author; Abdelakader Fekir, Thesis supervisor |
Document type: | manuscript text |
Publisher: | Université Mustapha Stambouli de Mascara: Faculté des sciences exactes, 2021 |
ISBN/ISSN/EAN: | SE01238T |
Format: | 79 / color illustrated cover / 29 cm |
Accompanying material: | digital optical disc (CD-ROM) |
Languages: | English |
Abstract: |
In this thesis, we addressed the segmentation of Arabic words, which we treat as a tagging problem similar to POS tagging, and applied NLP techniques to divide each word into its morphemes (clitics, affixes, stem). To implement the segmentation system, we proposed a statistical approach, a Bi-LSTM RNN coupled with a CRF model, in which segmentation is based on detecting the boundaries of each morpheme in each word. We consider that we have achieved a large part of the project's objectives despite the time constraints, and that we made good choices of implementation tools; our work should therefore be a useful starting point for future projects. For future work, we want to perform domain adaptation using large MSA data, such as the ATB, to improve segmentation results. We also plan to investigate building a new model capable of segmenting Arabic words with rewriting, without using any dictionary or lexicon, with minimal loss in accuracy. |
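
To illustrate the idea of segmenting by detecting morpheme boundaries, here is a minimal sketch of how per-character labels could be turned back into morphemes. It is not the thesis's implementation: the B/I label scheme, the function name `labels_to_segments`, and the Buckwalter-transliterated example word are all assumptions made for illustration, and the labels are written by hand rather than predicted by a Bi-LSTM-CRF tagger.

```python
def labels_to_segments(word, labels):
    """Split `word` into morphemes using one label per character.

    Assumed (hypothetical) scheme: 'B' marks the first character of a
    morpheme, 'I' marks a character that continues the current morpheme.
    """
    if len(word) != len(labels):
        raise ValueError("one label per character is required")
    segments = []
    for ch, tag in zip(word, labels):
        # Start a new morpheme on 'B', or if nothing has been opened yet.
        if tag == "B" or not segments:
            segments.append(ch)
        else:
            segments[-1] += ch
    return segments


# Hand-written labels for the transliterated word "wbktAbhm"
# (conjunction w + preposition b + stem ktAb + pronoun hm).
word = "wbktAbhm"
labels = ["B", "B", "B", "I", "I", "I", "B", "I"]
print("+".join(labels_to_segments(word, labels)))
```

In a full system, the label sequence would come from the sequence tagger, and this decoding step would only reassemble the predicted boundaries into clitics, affixes, and stem.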
Copies (2)
Barcode | Call number | Medium | Location | Section | Availability |
---|---|---|---|---|---|
SE01238T | INF410 | Audiobook | Bibliothèque des Sciences Exactes | 7-Master's theses | Open access, available |
SE02112T | INF410 | Audiobook | Bibliothèque des Sciences Exactes | 7-Master's theses | Open access, available |