Classification of infectious diseases based on chemiluminescent signatures of phagocytes in whole blood

abstract

  • Objectives: Despite medical advances, infectious diseases remain a major cause of mortality, morbidity, disability and socio-economic upheaval worldwide. Early diagnosis, appropriate choice and immediate initiation of antibiotic therapy can greatly affect the outcome of an infection. Phagocytes play a central role in the innate immune response to infection. They form the first line of defense against infectious intruders and can produce large quantities of reactive oxygen species, which can be detected by chemiluminescence (CL). The data preparation approach implemented in this work corresponds to a dynamic assessment of phagocytic respiratory burst localization in a luminol-enhanced whole blood CL system. We have previously applied this approach to the problem of identifying various intra-abdominal pathological processes afflicting peritoneal dialysis patients in the Nephrology department and demonstrated 84.6% predictive accuracy with the C4.5 decision-tree algorithm. In this study, we apply the CL-based approach to a larger sample of patients from two departments (Nephrology and Internal Medicine) with the aim of finding the most effective and interpretable feature sets and classification models for fast and accurate identification of several infectious diseases.
    Materials and methods: Whole blood samples were collected from 78 patients (comprising 115 instances) with respiratory infections, with infections associated with renal replacement therapy, or without infections. CL kinetic parameters were calculated for each case, which was assigned to a specific clinical group according to the available clinical diagnostics. Wrapper and filter feature selection methods were applied to remove irrelevant and redundant features and to improve the predictive performance of the disease classification algorithms. Three data mining algorithms, the C4.5 (J48) decision tree, support vector machines and the naive Bayes classifier, were applied to induce disease classification models, and their performance in classifying the three clinical groups was evaluated by 10 runs of stratified 10-fold cross-validation.
    Results and conclusions: After feature selection, the predictive power of the best models obtained with the three evaluated algorithms ranged from 63.38 ± 2.18% to 70.68 ± 1.43%. The highest disease classification accuracy was reached by C4.5, which also provides the most informative model in the form of a decision tree, and the lowest accuracy was obtained with naive Bayes. The feature selection method attaining the best classification performance was the wrapper method in the forward direction. Moreover, the classification models exposed biological patterns specific to the clinical states, and the selected predictive features were found to be characteristic of a specific disorder. Based on these encouraging results, we believe that the CL-based data pre-processing approach, combined with the wrapper forward feature selection procedure and the C4.5 decision-tree algorithm, has clear potential to become a fast, informative and sensitive tool for predictive diagnostics of infectious diseases in clinics.
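The evaluation protocol named in the abstract (10 runs of stratified 10-fold cross-validation, reported as mean ± standard deviation) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: it uses only the Python standard library, a simple Gaussian naive Bayes stands in for the three algorithms of the study, the data are synthetic, and every function name, group label and parameter value here is a hypothetical choice made for the example.

```python
import math
import random
from collections import defaultdict
from statistics import mean, stdev


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal each class round-robin so every fold receives its share.
        for idx in members:
            folds[position % k].append(idx)
            position += 1
    return folds


def fit_gaussian_nb(rows, labels):
    """Estimate per-class priors, feature means and feature variances."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        columns = list(zip(*members))
        means = [mean(col) for col in columns]
        # A small floor avoids division by zero for constant features.
        variances = [
            max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (len(members) / len(rows), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for value, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - m) ** 2 / (2 * var)
        return score
    return max(sorted(model), key=log_posterior)


def repeated_stratified_cv(rows, labels, k=10, runs=10, seed=0):
    """Return one accuracy (in percent) per run of stratified k-fold CV."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(runs):
        correct = 0
        for test_idx in stratified_folds(labels, k, rng):
            held_out = set(test_idx)
            train = [i for i in range(len(rows)) if i not in held_out]
            model = fit_gaussian_nb([rows[i] for i in train],
                                    [labels[i] for i in train])
            correct += sum(predict_gaussian_nb(model, rows[i]) == labels[i]
                           for i in test_idx)
        accuracies.append(100.0 * correct / len(rows))
    return accuracies


def make_synthetic_data(seed=1):
    """Three made-up groups with two features each (not study data)."""
    rng = random.Random(seed)
    centers = {"group_a": (0.0, 0.0), "group_b": (3.0, 0.0), "group_c": (0.0, 3.0)}
    rows, labels = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(30):
            rows.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
            labels.append(label)
    return rows, labels


rows, labels = make_synthetic_data()
accuracies = repeated_stratified_cv(rows, labels)
print(f"accuracy: {mean(accuracies):.2f} +/- {stdev(accuracies):.2f}%")
```

Each run reshuffles the samples within every class before dealing them into folds, so the spread across the 10 run-level accuracies reflects sensitivity to the partitioning. Whether the ± values in the abstract were computed across runs or across individual folds is not stated; the sketch assumes the former.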

publication date

  • January 1, 2011