Abstract
We propose two multi-class classification methods based on a signomial function. Each method constructs a multi-class classifier directly by solving a single optimization problem. Because the number of possible signomial terms is extremely large, we propose a column generation method that iteratively generates good signomial terms. Both methods achieve classification accuracies better than or comparable to those of existing methods, while producing sparser classifiers.
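To make the idea concrete, the sketch below shows what a signomial discriminant looks like and how a multi-class decision is made by taking the class with the largest discriminant value. This is an illustrative toy, not the paper's formulation: the term exponents and coefficients are hand-picked here, whereas in the paper they are selected by column generation within a single optimization problem; inputs are assumed strictly positive so that real-valued exponents are well defined.

```python
import numpy as np

def signomial(x, coeffs, exponents):
    """Evaluate f(x) = sum_t c_t * prod_j x_j ** a_tj.

    A signomial allows real-valued exponents and coefficients of either
    sign, so x is assumed to be strictly positive componentwise.
    """
    return sum(c * np.prod(x ** a) for c, a in zip(coeffs, exponents))

def predict(x, classifiers):
    """Assign x to the class whose signomial discriminant is largest."""
    scores = {k: signomial(x, c, e) for k, (c, e) in classifiers.items()}
    return max(scores, key=scores.get)

# Toy two-class example with hypothetical, hand-picked terms.
# Class "A": f_A(x) = 1.0 * x1 - 0.5 * x2**2
# Class "B": f_B(x) = 0.8 * x1**0.5 * x2**0.5
clfs = {
    "A": ([1.0, -0.5], [np.array([1.0, 0.0]), np.array([0.0, 2.0])]),
    "B": ([0.8],       [np.array([0.5, 0.5])]),
}

x = np.array([2.0, 1.0])
print(predict(x, clfs))  # f_A = 1.5, f_B ≈ 1.13, so "A"
```

In the paper's setting, the set of candidate terms (the exponent vectors) is far too large to enumerate, which is why a column generation scheme that prices in promising terms one at a time is needed; this sketch only fixes a tiny term set by hand.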
Acknowledgements
This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2012–006351).
Cite this article
Hwang, K., Lee, K., Lee, C. et al. Multi-class classification using a signomial function. J Oper Res Soc 66, 434–449 (2015). https://doi.org/10.1057/jors.2013.180