Information Filtering and Automatic Keyword Identification by Artificial Neural Networks. Academic Article

abstract

  • Information filtering (IF) systems usually filter data items by correlating a vector of terms (keywords) that represents the user profile with similar vectors of terms that represent the data items (e.g. documents). The terms that represent the data items can be determined by human experts (e.g. the authors of the documents) or by automatic indexing methods. In this study we employ an artificial neural network (ANN) as an alternative method for both filtering and term selection, and compare its effectiveness to "traditional" methods. In an earlier study we developed an IF system that employed content-based and stereotypic rule-based filtering methods in the domain of e-mail messages, and examined its performance. In this study we train a large-scale ANN-based filter that takes as input the meaningful terms in the same database of e-mail messages, and use it to predict the relevancy of those messages. The results show that the ANN predicts relevancy markedly better than the IF system: the correlation between the ANN prediction and the users' evaluation of message relevancy ranges from 0.76 to 0.99, compared to a range of 0.41 to 0.77 for the IF system. Moreover, we found very low correlation between the terms in the user profile (which were selected by the users) and the positive causal-index terms of the ANN (which indicate the important terms that appear in the messages). This indicates that users underestimate the importance of some terms and fail to include them in their profiles, which may explain the rather low prediction accuracy of the IF system, since it relies on user-generated profiles.
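
The profile-to-document matching and the correlation-based evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine measure, the term weights, the sample messages, and the user ratings below are all assumptions made up for illustration, and the names `cosine` and `pearson` are hypothetical helpers.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

# Assumed, made-up data: a user profile and three messages as term vectors.
profile = {"neural": 1.0, "filtering": 1.0}
messages = [
    {"neural": 1.0, "filtering": 1.0, "network": 1.0},
    {"meeting": 1.0, "lunch": 1.0},
    {"filtering": 1.0, "spam": 1.0},
]
# Assumed relevancy ratings given by the user to the same three messages.
user_ratings = [5, 1, 3]

# Score each message against the profile, then compare with the ratings.
scores = [cosine(profile, m) for m in messages]
r = pearson(scores, user_ratings)
print(scores, r)
```

A filter of this kind can only score terms that the user put in the profile, which is the limitation the abstract attributes to user-generated profiles.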

publication date

  • January 1, 2000