Article

Security Journal (2008) 21, 278–290. doi:10.1057/palgrave.sj.8350073; published online 11 August 2008

A Bootstrap-based Simple Probability Model for Classifying Network Traffic and Detecting Network Intrusion

Yun Wang (a, b) and Inyoung Kim (c)

  a. Center for Outcomes Research and Evaluation, Yale University and Yale New Haven Health, 1 Church Street, Suite 200, New Haven, CT 06510, U.S.A. E-mail: yun.wang@yale.edu
  b. Qualidigm, 100 Roscommon Drive, Middletown, CT 06457, U.S.A.
  c. Department of Statistics, Virginia Tech, Blacksburg, VA 24061, U.S.A.

Abstract

Network traffic audit data provide unique and valuable information for network security. Although a comprehensive intrusion detection scheme draws on multiple data sources and multiple measurements, system-level traffic data provide important baseline information on anomalous traffic that could harm the network system, and such information can be learned from training data. However, when labeled abnormal data are unavailable, or abnormal events are too few in the training data, conventional supervised classification methods such as regression models and neural networks are not suitable. Using the bootstrap resampling method, we developed a simple probability model trained with an anomaly-free training sample. In detecting abnormal events in a testing sample, the model yielded a receiver operating characteristic area of 0.96, specificity of 0.96, sensitivity of 0.96, and a classification agreement rate of 0.96. The model provides a potential approach for classifying network traffic when limited or no abnormal information is available in training data.

Keywords:

intrusion detection, machine learning, classification, network traffic analysis, network security
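
As a rough illustration of the general idea in the abstract (bootstrap resampling from an anomaly-free training sample, then flagging test events that fall outside the range learned from it), a minimal Python sketch might look like the following. This is not the authors' model: the abstract does not specify the statistic, the traffic features, or the decision rule, so the single numeric feature, the percentile cut-offs, and the function names `bootstrap_bounds` and `is_abnormal` are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_bounds(normal_values, n_boot=500, lower_q=0.025, upper_q=0.975, seed=0):
    """Estimate lower/upper bounds of 'normal' traffic by bootstrap resampling.

    Assumption: one numeric feature per event and percentile-based bounds;
    neither choice is taken from the paper.
    """
    rng = random.Random(seed)
    n = len(normal_values)
    lows, highs = [], []
    for _ in range(n_boot):
        # Resample the anomaly-free training data with replacement.
        sample = sorted(rng.choices(normal_values, k=n))
        lows.append(sample[int(lower_q * (n - 1))])
        highs.append(sample[int(upper_q * (n - 1))])
    # Average the percentile estimates over all bootstrap replicates.
    return statistics.mean(lows), statistics.mean(highs)


def is_abnormal(value, bounds):
    """Flag a test observation that falls outside the bootstrap bounds."""
    low, high = bounds
    return value < low or value > high


# Hypothetical anomaly-free training feature (e.g. packets per connection).
train = [100 + (i % 21) - 10 for i in range(200)]  # values between 90 and 110
bounds = bootstrap_bounds(train)

print(is_abnormal(100, bounds))  # a typical value
print(is_abnormal(500, bounds))  # a value far outside the training range
```

Only anomaly-free data are used to fit the bounds, which is the property the abstract emphasises; no labeled attacks are needed at training time. A real detector would presumably combine several traffic features and tune the cut-offs against a labeled testing sample.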