Effective Training Data Improved Ensemble Approaches for Urinalysis Model

Full Text (PDF, 168KB), pp. 25-31

Author(s)

Ping Wu 1,*, Min Zhu 1, Peng Pu 1, Tang Jiang 2

1. School of Information Science and Technology, East China Normal University, Shanghai 200062, China

2. Clinical Laboratory, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou 510080, China

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2011.04.04

Received: 11 Apr. 2011 / Revised: 5 Jun. 2011 / Accepted: 12 Jul. 2011 / Published: 8 Aug. 2011

Index Terms

Urinalysis, noisy data, imbalanced data, sampling methods, classification, ensembles

Abstract

Urinalysis remains one of the most commonly performed tests in clinical practice, and automated analysis techniques can greatly reduce laboratory workload. However, noisy and imbalanced urine samples make it very difficult to identify and classify urine-related diseases automatically. This paper proposes hybrid sampling-based ensemble learning strategies that improve both the training data and classification performance. In experiments comparing several learning classifiers and data processing techniques, the proposed methods achieved better classification accuracy than the other approaches.

Cite This Paper

Ping Wu, Min Zhu, Peng Pu, Tang Jiang, "Effective Training Data Improved Ensemble Approaches for Urinalysis Model", International Journal of Modern Education and Computer Science (IJMECS), vol. 3, no. 4, pp. 25-31, 2011. DOI: 10.5815/ijmecs.2011.04.04
