Multiple hierarchical classification of free-text clinical guidelines Academic Article

abstract

  • Summary. Objective: Manual classification of free-text documents within a predefined hierarchy, commonly required in the medical domain, is a highly time-consuming task. We present an approach based on supervised learning to automate the classification of clinical guidelines into predefined hierarchical conceptual categories. Methods and material: Given a set of hierarchically categorized documents, in the training stage the learning algorithm exploits the hierarchical structure of the concepts in order to overcome the low number of training examples. The classification task is thus decomposed into a continuous decision process that follows the concept hierarchy and makes a single decision at each node on the path; unlike a search within a decision tree, multiple paths can be chosen. Classification is based on applying a similarity function at each concept. Several evaluation measures were used, based on the intended use of the hierarchy. In addition, conservative and aggressive stop-criterion strategies for stopping the search through the concept hierarchy were formulated. The approach, including several training methods and multiple evaluation measures, was evaluated using a training set of 1136 guidelines from the National Guideline Clearinghouse. Results: On a test collection of 1038 clinical practice guidelines (CPGs) classified along two hierarchies of roughly 5000 concepts, in which each CPG was classified by a mean of 10 concepts, precision ranged from 44% to 60% depending on the settings of the training methods. Conclusion: These results demonstrate the feasibility of the approach, especially considering the low ratio of guidelines to classification indices (concepts) in the evaluation data set used here.
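
The multi-path descent described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity function, the document representation, or the stop criteria, so cosine similarity over term-weight dictionaries, the single `threshold` parameter, the `Concept` class, and the toy hierarchy are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A node in a hypothetical concept hierarchy (not from the paper)."""
    name: str
    centroid: dict[str, float]  # term -> weight, assumed to come from training
    children: list["Concept"] = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(doc, node, threshold):
    """Descend into every child whose similarity passes the threshold.

    One decision is made per node, but several children may pass, so the
    result can contain concepts from multiple paths. The threshold is an
    assumed stand-in for a stop criterion: failing it ends that branch.
    """
    assigned = []
    for child in node.children:
        if cosine(doc, child.centroid) >= threshold:
            assigned.append(child.name)
            assigned.extend(classify(doc, child, threshold))
    return assigned


# Toy hierarchy and document, invented for this example.
root = Concept("root", {}, [
    Concept("disease", {"asthma": 1.0, "diabetes": 1.0}, [
        Concept("respiratory", {"asthma": 1.0}),
        Concept("endocrine", {"diabetes": 1.0}),
    ]),
    Concept("treatment", {"inhaler": 1.0, "surgery": 1.0}, [
        Concept("drug", {"inhaler": 1.0}),
        Concept("surgical", {"surgery": 1.0}),
    ]),
])
doc = {"asthma": 2.0, "inhaler": 1.0}

print(classify(doc, root, 0.3))
print(classify(doc, root, 0.5))
```

In this toy setup a lower threshold lets the search follow both the disease and treatment branches, while a higher one stops the treatment branch at its first node, which loosely mirrors the trade-off a stop-criterion strategy has to make.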

publication date

  • July 1, 2006