Unsupervised Commonsense Knowledge Enrichment for Domain-Specific Sentiment Analysis (Academic Article)

abstract

  • Sentiment analysis in natural language text is a challenging task that requires a deep understanding of both syntax and semantics. Leveraging the polarity of multiword expressions (or concepts) rather than single words can mitigate the difficulty of the task, because these expressions carry more contextual information than isolated words. Such contextual information is key to understanding both the syntactic and semantic structure of natural language text and is therefore useful in tasks such as sentiment analysis. In this work, we propose a new method to enrich SenticNet (a publicly available knowledge base for concept-level sentiment analysis) with domain-level concepts composed of aspect and sentiment word pairs, along with a measure of their polarity. We process a set of unlabeled texts and, using statistical co-occurrence information, generate a directed acyclic graph (DAG) of concepts. The polarity scores of known concepts are propagated through the graph and used to compute the polarity scores of new concepts. With our exhaustive algorithm, we are able to use a seed set containing only two sentiment words (good and bad). In our evaluation on a dataset of hotel reviews, SenticNet was enriched by a factor of three (from 30,000 to nearly 90,000 concepts). The experiments demonstrate the merit of the concepts discovered by our method in improving sentence-level and aspect-level sentiment analysis. A two-factor ANOVA test indicates, at a 95% confidence level, that the improvements are statistically significant.
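
The propagation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the weighted-average update rule, the function name `propagate_polarity`, the example concepts, and the edge weights are all assumptions made for illustration. The sketch only shows the general idea of pushing polarity from two seed words (good and bad) through a DAG whose edges carry co-occurrence weights.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def propagate_polarity(parents, seeds):
    """Propagate polarity scores through a DAG of concepts.

    parents: {concept: {parent_concept: co-occurrence weight}} (hypothetical format)
    seeds:   {concept: polarity in [-1, 1]}
    """
    # TopologicalSorter expects node -> predecessors, so parents are visited first.
    graph = {concept: set(p) for concept, p in parents.items()}
    order = TopologicalSorter(graph).static_order()

    polarity = dict(seeds)
    for concept in order:
        if concept in polarity:
            continue  # seed or already scored
        # Assumed update rule: weighted average over parents with a known score.
        known = {p: w for p, w in parents.get(concept, {}).items() if p in polarity}
        total = sum(known.values())
        if total > 0:
            polarity[concept] = sum(polarity[p] * w for p, w in known.items()) / total
    return polarity


# Hypothetical aspect/sentiment concepts with made-up co-occurrence weights.
parents = {
    "clean room": {"good": 3.0, "bad": 1.0},
    "rude staff": {"bad": 4.0},
    "noisy room": {"rude staff": 1.0, "bad": 1.0},
}
scores = propagate_polarity(parents, {"good": 1.0, "bad": -1.0})
```

Processing concepts in topological order means every parent is scored before its children, so a single pass over the graph is enough in this simplified setting.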

publication date

  • January 1, 2016