For example, if a program runs for several days or more, each time that it is flagged as anomalous must be counted separately. As pointed out in , the simplest way to measure this is to count all the individual decisions. The false-positive rate is then computed as the percentage of decisions in which normal data were detected as anomalous.

In the experiments, the TD learning prediction method was applied to the above data sets. Every state in the Markov reward model is a system-call sequence of length 6, a length that has been widely employed in previous works. The reward function is defined by (18). A linear function approximator, which is a polynomial function of the observation states and has a dimension of 24, was used as the value function approximator. To compare the performance of TD learning prediction with that of previous approaches, the experimental results in , where HMM-based dynamic behavior modeling methods were applied to the same data sets, are also shown in Table 2.

Table 2. Performance comparisons between TD and HMM methods

To compare the performance of the kernel LS-TD approach with that of the linear LS-TD  and the HMM-based approach , experiments on host-based intrusion detection using system calls were conducted. In the experiments, the data set of system call traces generated from the Sendmail program was used. The system call traces were divided into two parts.
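The length-6 sequence states and the per-decision false-positive rate described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `sliding_states` and `false_positive_rate`, the stride of one system call between consecutive states, the one-decision-per-state pairing, and the toy trace and flags are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def sliding_states(trace: Sequence[int], length: int = 6) -> List[Tuple[int, ...]]:
    """Return every contiguous window of `length` system calls in the trace.

    Assumption: consecutive states overlap and are shifted by one call.
    """
    return [tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)]


def false_positive_rate(flags: Sequence[bool]) -> float:
    """Percentage of individual decisions on normal data flagged as anomalous."""
    if not flags:
        raise ValueError("no decisions to count")
    return 100.0 * sum(flags) / len(flags)


# Toy data, made up for illustration only (not taken from the Sendmail traces).
trace = [4, 2, 66, 66, 4, 138, 66, 5]   # hypothetical system-call numbers
states = sliding_states(trace)           # three states of length 6
flags = [False, True, False]             # one hypothetical decision per state
rate = false_positive_rate(flags)        # one of three normal states was flagged
```

Counting every decision separately, as in `false_positive_rate`, means that a long-running normal program contributes one count for each state in which it is flagged, rather than a single count per trace.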
One part is for model training and threshold determination and the other part is for performance evaluation. The numbers of normal traces used for training and testing are 13 and 67, respectively. The numbers of attack traces used for training and testing are 5 and 7, respectively. The total number of system calls in the data set is 223733. During the threshold determination process, the same traces as in the training process were used. The testing data are different from those used in model training, and they are usually larger than the training data.

In the learning prediction experiments for intrusion detection, the kernel LS-TD algorithm and the previous linear TD(λ) algorithm, i.e., LS-TD(λ), are both implemented for the learning prediction task. In the kernel-based LS-TD algorithm, a radial basis function (RBF) kernel is selected and its width parameter is set to 0.8 in all the experiments. A threshold parameter δ = 0.001 is selected for the sparsification procedure of the kernel-based LS-TD learning algorithm. The LS-TD(λ) algorithm uses a linear function approximator, which is a polynomial function of the observation states and has a dimension of 24.

* The false alarm rates were only computed for trace numbers, not for single states.

Table 3. Performance comparisons between different methods

The experimental results are shown in Table 3. It can be seen from the results that both RL methods, i.e., the kernel LS-TD and the linear LS-TD, have 100% detection rates, and that the kernel-based LS-TD approach achieves lower false alarm rates than the linear LS-TD method. This is mainly due to the learning prediction accuracy of kernel-based