Communications of the ACS, Vol 5, No 1 (2012)

Feature Extraction for Prophetic Traditions Texts Classification

Fouzi Harrag, Abdul Malik Salman Al-Salman, Eyas El-Qawasmah

Abstract


This paper presents a comparative study of three text preprocessing techniques for Arabic text categorization, using an in-house Arabic dataset. We evaluated and compared three stemming techniques (light stemming, root-based stemming, and dictionary-lookup stemming) for reducing the feature space to an input space of much lower dimension for two state-of-the-art classifiers: artificial neural networks and support vector machines. The results show that light stemming improves the performance of Arabic text categorization. They also show that the proposed artificial neural network model achieves high categorization effectiveness as measured by the macro-averaged F1 measure.
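For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how a macro-averaged F1 score is commonly computed: F1 is calculated separately for each category and the per-category scores are then averaged with equal weight. The function name `macro_f1` and the category labels are illustrative assumptions; the abstract does not describe the authors' exact evaluation procedure.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: unweighted mean of the per-category F1 scores.

    Illustrative sketch only; not taken from the paper.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted category gains a false positive
            fn[truth] += 1  # true category gains a false negative

    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical category labels, for illustration only.
y_true = ["faith", "faith", "prayer", "prayer"]
y_pred = ["faith", "prayer", "prayer", "prayer"]
score = macro_f1(y_true, y_pred)
```

Because every category contributes equally, macro-averaging gives small categories the same influence on the final score as large ones.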
