Title: | Developing a machine learning part of speech tagger for Arabic language |
Authors: | Yamina Bouchra Keddar, Author; Mohamed Salem, Thesis supervisor |
Document type: | Manuscript text |
Publisher: | Université Mustapha Stambouli de Mascara: Faculté des sciences exactes, 2021 |
ISBN/ISSN/EAN: | SE01232T |
Format: | 86 p. / colour illustrated cover / 29 cm |
Accompanying material: | Digital optical disc (CD-ROM) |
Languages: | French |
Abstract: |
This graduation dissertation addresses one of the most important tasks in natural language processing: part-of-speech (POS) tagging for the Arabic language. The more tags the tagger assigns correctly, the better the results when it comes to automating the processing of Arabic. POS tagging in Arabic consists of determining the grammatical category of each word in a sentence. This task is not easy, because the correct tag depends both on the word itself and on the words around it. A further difficulty comes from the Arabic language itself: Arabic is grammatically very rich and has more than 150 tags, words do not have a fixed position in the sentence, and splitting a sentence into tokens is harder than it is for European languages. We used neural networks to address this problem because no formal set of rules exists yet, so the only alternative is to learn from examples. We developed an approach based on a sliding window to capture the dependence of a word on its neighbourhood, since each word is strongly linked to the words that precede and follow it. We obtained good results by training the proposed neural network with two window sizes, 3 and 5. However, the results clearly depend on the size of the dataset used and on whether it has been correctly tagged by an expert. Before starting this work, we aimed to build a tagging tool with an accuracy close to 70%. We set this target not because we could not do better, but because MLP classifier algorithms had not achieved higher accuracy on Arabic, and most researchers avoided this approach because of its lower accuracy and the lack of datasets. In most cases, our tagging approach achieved an accuracy above 80% when using the parameter settings given in the final chapter.
As future work, we aim to expand the processed dataset to millions of sentences, with the goal of generalizing this approach to most varieties of Arabic, such as dialectal Arabic, while maintaining high accuracy. We may also build an approach that combines the benefits of the MLP classifier with those of other approaches, to achieve both high accuracy and fast processing. |
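The sliding-window idea described in the abstract can be illustrated with a minimal sketch. It assumes sentences are already tokenized, that each window is centred on the target word, and that positions beyond the sentence edges are filled with a hypothetical `<PAD>` token; the one-hot encoding and the names `sliding_windows`, `build_vocab` and `encode_window` are illustrative assumptions, since the record does not specify the thesis's actual feature encoding. The resulting vectors could then be fed to an MLP classifier (for instance scikit-learn's `MLPClassifier`), which is not shown here.

```python
from typing import Dict, List, Sequence, Tuple

PAD = "<PAD>"  # hypothetical token for positions outside the sentence
UNK = "<UNK>"  # hypothetical token for words not seen during training


def sliding_windows(tokens: Sequence[str], size: int) -> List[Tuple[str, ...]]:
    """Return one window of `size` tokens centred on each token of the sentence."""
    if size < 1 or size % 2 == 0:
        raise ValueError("window size must be a positive odd number")
    half = size // 2
    # Pad both ends so that edge words also get a full-size window.
    padded = [PAD] * half + list(tokens) + [PAD] * half
    return [tuple(padded[i:i + size]) for i in range(len(tokens))]


def build_vocab(sentences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign an integer index to every token, reserving 0 for PAD and 1 for UNK."""
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for token in sentence:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode_window(window: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Concatenate one one-hot vector per window position (assumed MLP input)."""
    vector: List[int] = []
    for token in window:
        one_hot = [0] * len(vocab)
        one_hot[vocab.get(token, vocab[UNK])] = 1
        vector.extend(one_hot)
    return vector


if __name__ == "__main__":
    # "The boy went to school", already split into tokens.
    sentence = ["ذهب", "الولد", "إلى", "المدرسة"]
    vocab = build_vocab([sentence])
    for size in (3, 5):  # the two window sizes mentioned in the abstract
        for window in sliding_windows(sentence, size):
            print(size, window, len(encode_window(window, vocab)))
```

With a window of size 3, each word is described by itself and its immediate neighbours; size 5 adds one more word on each side, at the cost of a feature vector that grows linearly with the window size.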
Copies (2)
Barcode | Call number | Medium | Location | Section | Availability |
---|---|---|---|---|---|
SE01232T | INF415 | Audiobook | Bibliothèque des Sciences Exactes | 8-Bachelor's dissertations | Open access, available |
SE02117T | INF415 | Audiobook | Bibliothèque des Sciences Exactes | 8-Bachelor's dissertations | Open access, available |