Online classification of nonstationary data streams Academic Article uri icon

abstract

  • Most classification methods are based on the assumption that the data conforms to a stationary distribution. However, the real-world data is usually collected over certain periods of time, ranging from seconds to years, and ignoring possible changes in the underlying concept, also known as concept drift, may degrade the predictive performance of a classification model. Moreover, the computation time, the amount of required memory, and the model complexity may grow indefinitely with the continuous arrival of new training instances. This paper describes and evaluates OLIN, an online classification system, which dynamically adjusts the size of the training window and the number of new examples between model re-constructions to the current rate of concept drift. By using a fixed amount of computer resources, OLIN produces models, which have nearly the same accuracy as the ones that would be produced by periodically re-constructing the model from all accumulated instances. We evaluate the system performance on sample segments from two real-world streams of non-stationary data.

publication date

  • January 1, 2002