Abstract
This study proposes a classification method for labeled and unlabeled data sets that combines the Fuzzy C-Means (FCM) method, a modified form of the Huang-index function, and Variable Precision Rough Set (VPRS) theory. The method, designated the MVPRS-index method, partitions the values of each conditional attribute in the data set and seeks both the optimal number of clusters and the optimal VPRS classification accuracy. Its validity is confirmed by comparing the classification results it produces for UCI data sets and a typical stock market data set with those obtained from a supervised neural network classification method. Overall, the results show that the MVPRS-index method can be applied to data sets with labeled information as well as to those with unlabeled information, and therefore provides a more reliable basis for extracting decision-making rules from labeled/unlabeled data sets.
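The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the MVPRS-index method itself: the modified Huang-index function is not specified here, so the number of clusters is passed in by hand rather than optimized. The function names, the evenly spaced starting centers, the toy data, and the use of an inclusion threshold `beta` in (0.5, 1] for the VPRS lower approximation are all assumptions made for illustration.

```python
from collections import defaultdict


def fuzzy_c_means_1d(values, c, m=2.0, iters=200, tol=1e-9):
    """Standard Fuzzy C-Means on one numeric attribute; returns sorted centers."""
    lo, hi = min(values), max(values)
    # Assumption: deterministic start with centers spread evenly over the range.
    centers = [lo + (hi - lo) * (k + 0.5) / c for k in range(c)]
    power = 2.0 / (m - 1.0)
    for _ in range(iters):
        num, den = [0.0] * c, [0.0] * c
        for x in values:
            d = [abs(x - v) for v in centers]
            if min(d) == 0.0:
                # The point sits on a center: give it crisp membership there.
                hit = d.index(0.0)
                u = [1.0 if k == hit else 0.0 for k in range(c)]
            else:
                u = [1.0 / sum((d[k] / d[j]) ** power for j in range(c))
                     for k in range(c)]
            for k in range(c):
                w = u[k] ** m
                num[k] += w * x
                den[k] += w
        updated = [num[k] / den[k] if den[k] > 0 else centers[k]
                   for k in range(c)]
        shift = max(abs(a - b) for a, b in zip(updated, centers))
        centers = updated
        if shift < tol:
            break
    return sorted(centers)


def discretize(values, centers):
    """Replace each value by the index of its nearest cluster center."""
    return [min(range(len(centers)), key=lambda k: abs(x - centers[k]))
            for x in values]


def beta_lower_approximation(codes, decisions, target, beta):
    """Objects whose equivalence class has at least a fraction beta in target."""
    classes = defaultdict(list)
    for i, code in enumerate(codes):
        classes[code].append(i)
    lower = set()
    for members in classes.values():
        share = sum(decisions[i] == target for i in members) / len(members)
        if share >= beta:
            lower.update(members)
    return lower


def classification_quality(codes, decisions, beta):
    """Fraction of objects covered by some decision class's beta-lower approximation."""
    covered = set()
    for target in set(decisions):
        covered |= beta_lower_approximation(codes, decisions, target, beta)
    return len(covered) / len(codes)


# Hypothetical toy data: one conditional attribute and a decision label.
attribute = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
labels = ["up", "up", "down", "down", "down", "down"]
codes = discretize(attribute, fuzzy_c_means_1d(attribute, c=2))
print(codes, classification_quality(codes, labels, beta=0.6))
```

In this sketch, lowering `beta` admits equivalence classes that are only mostly consistent with a decision class, which is the relaxation that distinguishes VPRS from classical rough sets. With several conditional attributes, each object's code would be a tuple of per-attribute cluster indices.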
Cite this article
Huang, K. A heuristic approach to classifying labeled/unlabeled data sets. J Oper Res Soc 63, 1248–1257 (2012). https://doi.org/10.1057/jors.2011.103
DOI: https://doi.org/10.1057/jors.2011.103