Title: | Comparative study of Arabic language stemmers |
Authors: | Ghania Otmane, Author; Amel Rezki, Author; Mohamed Salem, Thesis supervisor |
Document type: | Manuscript text |
Publisher: | Université Mustapha Stambouli de Mascara: Faculté des sciences exactes, 2019 |
ISBN/ISSN/EAN : | SE00703T |
Format: | 59 p. / cover / 27 cm |
Languages: | French |
Abstract: |
Natural language processing techniques rely on different levels of analysis to give machines the ability to generate and understand human language. Stemming is the most important step in this process: it reduces a word to its stem. Many stemmers have been developed to exploit the rich linguistic features that stems provide. Some of these stemmers must make explicit decisions, either statistical or linguistic, to select a single stem. Others use ranking to express a preference among candidates rather than selecting a single root; in the end, however, a single root is still chosen. This dissertation presents well-known Arabic stemmers and their techniques (Khoja, Tashaphyne, ISRI and the Light stemmer), discusses their strengths and weaknesses, and carries out a theoretical and practical comparison between them using benchmark corpora and established metrics. The study led to the following results:
• The presented stemmers find the word's stem using different techniques; they are either rule-based or dictionary-based.
• Although some comparative studies of Arabic stemmers exist, none of them used the Paice metric or the same corpora as this work.
• In the experimental comparison, no stemmer obtained perfect results; each has its own weaknesses.
• The Light stemmer produced the correct stems on the corpora.
• Khoja, on the other hand, obtained the best results on individual words and on the Paice metric, which is the ratio of the over-stemming rate to the under-stemming rate.
• The ISRI stemmer performed poorly on words of three characters or fewer.
The work could be extended in several ways:
• First, by including more specialized corpora and evaluation metrics.
• Second, by exploiting the identified strengths and weaknesses of the stemmers to improve the behavior and performance of the Khoja algorithm. |
Online: | http://127.0.0.1/pmb/images/pdf/SE00703T.pdf |
Copies (2)
Barcode | Call number | Medium | Location | Section | Availability |
---|---|---|---|---|---|
SE00703T | INF235 | Audiobook | Bibliothèque des Sciences Exactes | 7-Master's theses | Open access, Available |
SE00921T | INF235 | Audiobook | Bibliothèque des Sciences Exactes | 7-Master's theses | Open access, Available |
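The abstract describes the Paice metric as a ratio of over-stemming to under-stemming. As an illustration only, the sketch below computes Paice's understemming index (UI), overstemming index (OI) and stemming weight SW = OI / UI from groups of words that a perfect stemmer should conflate. The function name `paice_indices`, the toy English word groups and the three-letter truncation stemmer are hypothetical and are not taken from the thesis; the evaluation code actually used in the dissertation may differ.

```python
from collections import Counter


def paice_indices(groups, stem):
    """Return (UI, OI, SW) following Paice's error-counting method.

    groups: list of conceptual groups; each group is a list of distinct words
            that a perfect stemmer should conflate to one stem.
    stem:   function mapping a word to the stem produced by the stemmer under test.
    """
    total_words = sum(len(g) for g in groups)
    gdmt = gumt = gdnt = 0.0
    stem_classes = {}  # stem -> Counter(group index -> number of words)

    for gi, group in enumerate(groups):
        n = len(group)
        gdmt += 0.5 * n * (n - 1)            # desired merge total
        gdnt += 0.5 * n * (total_words - n)  # desired non-merge total
        stems = Counter(stem(w) for w in group)
        # unachieved merges: pairs in the same group that received different stems
        gumt += 0.5 * sum(u * (n - u) for u in stems.values())
        for s, u in stems.items():
            stem_classes.setdefault(s, Counter())[gi] += u

    # wrong merges: pairs sharing a stem but coming from different groups
    gwmt = 0.0
    for counts in stem_classes.values():
        ns = sum(counts.values())
        gwmt += 0.5 * sum(v * (ns - v) for v in counts.values())

    ui = gumt / gdmt if gdmt else 0.0
    oi = gwmt / gdnt if gdnt else 0.0
    # SW is undefined when UI is 0; this sketch returns infinity in that case
    sw = oi / ui if ui else float("inf")
    return ui, oi, sw


# Toy data and a hypothetical stemmer that keeps the first three letters
groups = [["cat", "cats"], ["catalog", "catalogs"], ["run", "running", "ran"]]
ui, oi, sw = paice_indices(groups, lambda w: w[:3])
print(ui, oi, sw)  # expected: UI = 0.4, OI = 0.25, SW = 0.625
```

In this toy run, "ran" is separated from "run" (under-stemming), and "cat" and "catalog" are wrongly merged (over-stemming). A higher SW indicates a heavier stemmer that over-stems relative to how much it under-stems.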