A heuristic approach to classifying labeled/unlabeled data sets

  • General Paper
Journal of the Operational Research Society

Abstract

This study proposes a classification method for labeled/unlabeled data sets that combines the Fuzzy C-Means method, a modified form of the Huang-index function, and Variable Precision Rough Set (VPRS) theory. The proposed method, designated the MVPRS-index method, partitions the values of each conditional attribute in the data set so as to achieve both the optimal number of clusters and the optimal accuracy of VPRS classification. The validity of the approach is confirmed by comparing the classification results obtained with the MVPRS-index method for UCI data sets and a typical stock market data set against those obtained with a supervised neural network classification method. Overall, the results show that the MVPRS-index method can be applied to data sets with unlabeled as well as labeled information, and therefore provides a more reliable basis for extracting decision-making rules from labeled/unlabeled data sets.
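
To make the two building blocks named in the abstract more concrete, the following is a minimal Python sketch, not the author's MVPRS-index method. It assumes a standard one-dimensional Fuzzy C-Means update (fuzzifier m = 2, evenly spaced initial centers) to partition the values of a single conditional attribute, and it assumes the inclusion-degree form of VPRS, in which an equivalence class enters the β-positive region when its majority decision class covers at least a fraction β (with 0.5 < β ≤ 1) of its objects. The modified Huang-index function is not defined in the abstract and is not reproduced here; all function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def memberships(values, centers, m=2.0):
    """Standard FCM membership of each value in each cluster."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        dists = [abs(x - v) for v in centers]
        if 0.0 in dists:
            # Value sits exactly on a center: share full membership among those centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            rows.append([1.0 / sum((dk / dj) ** exponent for dj in dists)
                         for dk in dists])
    return rows


def fuzzy_c_means_1d(values, c, m=2.0, max_iter=200, tol=1e-9):
    """Cluster scalar attribute values into c fuzzy clusters.

    Assumption: centers start evenly spaced over the value range, which keeps
    the sketch deterministic; c should not exceed the number of distinct values.
    """
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (k + 0.5) / c for k in range(c)]
    for _ in range(max_iter):
        u = memberships(values, centers, m)
        new_centers = []
        for k in range(c):
            weights = [row[k] ** m for row in u]
            new_centers.append(
                sum(w * x for w, x in zip(weights, values)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(values, centers, m)


def discretize(values, c):
    """Replace each value by the index of its highest-membership cluster."""
    _, u = fuzzy_c_means_1d(values, c)
    return [row.index(max(row)) for row in u]


def vprs_quality(condition_rows, decisions, beta):
    """Fraction of objects in the beta-positive region (inclusion-degree form)."""
    classes = defaultdict(list)
    for row, decision in zip(condition_rows, decisions):
        classes[tuple(row)].append(decision)
    covered = 0
    for labels in classes.values():
        majority = Counter(labels).most_common(1)[0][1]
        # With beta > 0.5 at most one decision class can reach the threshold.
        if majority / len(labels) >= beta:
            covered += len(labels)
    return covered / len(decisions)


# Hypothetical toy data: one conditional attribute and a binary decision.
values = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
labels = [0, 0, 0, 1, 1, 1]
codes = discretize(values, 2)
quality = vprs_quality([(code,) for code in codes], labels, beta=0.8)
```

In this sketch the number of clusters is supplied by hand. Quality of classification alone tends to rise as partitions get finer, so it cannot by itself select the number of clusters; in the paper that trade-off is handled by the modified Huang-index function, which is not reproduced here.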

Huang, K. A heuristic approach to classifying labeled/unlabeled data sets. J Oper Res Soc 63, 1248–1257 (2012). https://doi.org/10.1057/jors.2011.103

  • DOI: https://doi.org/10.1057/jors.2011.103
