The feasibility of deep learning approaches for P2P-botnet detection

Jos van Roosmalen

Document compiled at: 2017-01-08 17:49:20 CET
Configuration: Online, Final, HQ Plots, and TODO invisible.
VCS revision: 1278
VCS revision time stamp: 2017-01-08 15:46:53 UTC

© Jos van Roosmalen, 2016–2017

This thesis is typeset in Fedra Serif B and Fedra Sans, licensed by the author for educational or not-for-profit purposes from the type foundry Typotheque, The Hague, the Netherlands. Cover art: © Sarunyu_foto, shutterstock.com. The LaTeX layout class was initially derived from previous work by © Dr. Michael Ummels, 2010–2013; used with permission under the terms of the MIT License.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

A master's thesis submitted by Jos van Roosmalen in partial fulfillment of the requirements for the degree of Master of Science in Computer Science at the Open University, Faculty of Management, Science and Technology, to be defended publicly on Friday, January 27th, 2017 at 14:00.

Student: Jos van Roosmalen
Student number: 837385901
Course: IM990C
Chairman: Prof. Dr. M.C.J.D. van Eekelen
Supervisor: Dr. ir. H.P.E. Vranken

An electronic version of this thesis is available at .

Contents

Acronyms  xiv
Glossary  xvi
Abstract  1
Samenvatting  3

1 Introduction  5
  1.1 Problem statement  6
  1.2 Research objective  6
  1.3 Research method  6
  1.4 Document structure  7

2 P2P-Botnets  8
  2.1 Introduction  8
  2.2 Botnets  9
    2.2.1 Botnet impact: The case for botnet defense  9
    2.2.2 Botnet life-cycle  10
    2.2.3 Botnet architecture and design evolution  11
  2.3 P2P-architecture  13
    2.3.1 Communication in unstructured P2P  15
  2.4 P2P-botnet analysis  16
    2.4.1 ZeuS Game Over  16
    2.4.2 Sality  18
  2.5 P2P-botnet detection methods  18
    2.5.1 Taxonomy  18
    2.5.2 Level of passive monitoring  20
    2.5.3 Related work  21
    2.5.4 Common approach  23

3 Deep learning  25
  3.1 Notation  25
  3.2 What is learning?  26
    3.2.1 Task T  26
    3.2.2 Performance measure P  26
    3.2.3 Experience E  26
  3.3 Artificial neuron fundamentals  27
    3.3.1 Activation functions  28
  3.4 Neural network architecture  28
    3.4.1 Neural network topologies  31
  3.5 Feed-forward  31
    3.5.1 Feed-forward neural network learning  32
    3.5.2 Initialize  34
  3.6 Deep learning  34
    3.6.1 Definition and motivation  35
    3.6.2 Deep learning architectures  36
  3.7 Architecture selection  39
    3.7.1 Model selection  39
    3.7.2 Hyperparameter selection  40
  3.8 Architectures used in other studies  41
  3.9 Learning algorithms for deep learning  42
    3.9.1 Training layer-by-layer  42
    3.9.2 Semi-supervised learning  42
    3.9.3 Using a numerical optimizer  44
  3.10 Improving generalization  48
    3.10.1 Weight decay  49
    3.10.2 Dropout  49
    3.10.3 DropConnect  50
  3.11 Activation functions for deep learning  50
    3.11.1 Rectified linear  50
    3.11.2 Maxout  51
  3.12 Input preprocessing  51
  3.13 Selected architecture overview  53
    3.13.1 Pre-training via denoising auto-encoder  53
    3.13.2 Supervised deep neural network  54
    3.13.3 Semi-supervised learning  54
  3.14 Visualizing high-dimensional data  55

4 Approach  59
  4.1 Research design  60
  4.2 Research questions  60
    4.2.1 Approach and scope  60
    4.2.2 Scope and limitations  61
  4.3 Performance evaluation  61
    4.3.1 Metrics  61
    4.3.2 Statistical testing  65

5 Experimental setup  67
  5.1 Datasets  67
    5.1.1 Non-malicious datasets  67
    5.1.2 P2P-botnet datasets  69
  5.2 Data extraction and preprocessing  70
    5.2.1 PCAP processing  70
    5.2.2 Low-level feature generation  72
    5.2.3 Generating low-level features  74
    5.2.4 Data preprocessing  75
  5.3 Deep learning framework selection  77
    5.3.1 Framework overview  77
    5.3.2 Developments in 2016  80
  5.4 Software customization  81
    5.4.1 PDNN  81
    5.4.2 Ladder networks  81
    5.4.3 t-SNE visualizations  82
  5.5 Runtime infrastructure  82
    5.5.1 Amazon web services platform  82
    5.5.2 Dedicated GPU processing  83

6 Experiments and results  84
  6.1 Experiment design  84
    6.1.1 Architecture  84
    6.1.2 Weight matrix initialization  85
    6.1.3 Activation function  86
    6.1.4 Mini-batch size  86
    6.1.5 Learning  86
    6.1.6 Pre-training  87
    6.1.7 Regularization  87
    6.1.8 Stop criterion  87
    6.1.9 Loss function  88
  6.2 Searching the optimal parameters  88
    6.2.1 Experiment set 1  88
    6.2.2 Experiment set 2  89
    6.2.3 Experiment set 3  90
    6.2.4 Experiment set 4  91
  6.3 Semi-supervised training  91
  6.4 Discussion  92
  6.5 Data visualizations of P2P-botnet detection  93

7 Conclusion  100
  7.1 Answers to research questions  100
  7.2 Research contributions  102
  7.3 Discussion  102
  7.4 Threats to validity  103
  7.5 Future work  105
  7.6 Reflection  105

Bibliography: Academic references  107
Bibliography: Non-academic references  127

Appendix  131
  A P2P Botnet Characteristics  131
  B P2P Botnet Detection Characteristics  134
  C Dataset Breakdown  148
  D Experiments  156

List of Tables

2.1 C&C-topology characteristics  11
4.1 Deep learning approach for this study  60
4.2 Confusion matrix. In this study botnet is the positive class. For example: a false negative is a botnet trace (data class positive) which is classified as non-botnet (classified as negative)  62
4.3 Binary classification performance metrics  62
4.4 An example of two classifiers with different output but the same accuracy  65
5.1 Packet classification and flow generation strategy  68
5.2 Details of the datasets used  71
5.3 Networking headers used as features  73
5.4 Dataset statistics  77
5.5 Deep learning frameworks overview  78
5.6 Deep learning framework ranking  79
5.7 Support for the selected features  80
5.8 Matrix multiplication performance on different hardware  82
6.1 Used hyperparameters for t-SNE  94
A.1 P2P-botnet overview  132
B.1–B.9 Comparison of end host P2P-detection techniques (1)–(9)  135–143
B.10 Comparison of other P2P-detection techniques  144
B.11–B.13 Comparison of core network P2P-detection techniques (1)–(3)  145–147
C.1 Dataset breakdown  149
C.2 (Sub)classification breakdown of our best classifier  150–155
D.1 All architectures used in experiments  157–158
D.2 Pretraining details set 1  159
D.3 Supervised experiments details set 1  160–165
D.4 Pretraining details set 2  166
D.5 Supervised experiments details set 2  167–179
D.6 Pretraining details set 3  180–183
D.7 Supervised experiments details set 3  184–204
D.8 Pretraining details set 4  205
D.9 Supervised experiments details set 4  206–210
D.10 Semi-supervised experiments details  211

List of Figures

1.1 Classical machine learning vs. deep learning  6
2.1 Botnet C&C-topology  9
2.2 Botnet life-cycle  10
2.3 Botnet architectural evolution  12
2.4 Benign DNS A-records from google.com  13
2.5 ZeuS Gameover topology  17
2.6 Botnet detection taxonomy  19
3.1 Artificial neuron  27
3.2 Several activation functions  28
3.3 The derivatives of sigmoid and tanh  29
3.4 Shallow neural network  30
3.5 Example of a recurrent neural network  32
3.6 Effect of different learning rates  33
3.7 Deep neural network  36
3.8 Deep learning taxonomy  37
3.9 Discriminative versus generative  38
3.10 Ladder network  43
3.11 Optima: local and global minimum  45
3.12 Over-, balanced, and underfitting  48
3.13 Maxout activation function  52
3.14 Stacked autoencoder  53
3.15 Dimensionality reduction from 3D (left) to 2D (middle and right)  55
3.16 Visualizing high-dimensional data  56
5.1 The architecture of the botnet detection system based on deep learning  69
5.2 Empirical cumulative distribution functions (CDFs) for the (non-)botnet datasets  72
5.3 IP header  74
5.4 TCP header  74
5.5 UDP header  75
5.6 Input breakdown: 319 packets with 13 inputs and 1 with 12 inputs results in 4158 inputs  76
6.1 Error rate for different layer/node configurations for tanh/dropout 10%/40%  90
6.2 t-SNE process  94
6.3 t-SNE scatterplot of 50,000 input vectors (4158D)  95
6.4 t-SNE scatterplot of the output of the first hidden layer (5000D)  96
6.5 t-SNE scatterplot of the second hidden layer (3000D)  97
6.6 t-SNE scatterplot of the third hidden layer (1500D)  98
6.7 Scatterplot of the output layer  99

Acronyms

AUC  Area Under Curve.
C2  Command and Control.
C&C  Command and Control.
CPU  Central Processing Unit.
CSV  Comma Separated Value.
DBN  Deep Belief Network.
DDoS  Distributed Denial of Service.
DGA  Domain Generation Algorithm.
DNN  Deep Neural Network.
DNS  Domain Name System.
FFNN  Feed Forward Neural Network.
FN  False Negative.
FP  False Positive.
GPU  Graphics Processing Unit.
IDS  Intrusion Detection System.
IP  Internet Protocol.
MLP  Multi Layer Perceptron.
NHST  Null Hypothesis Significance Testing.
P2P  Peer-to-Peer.
PCA  Principal Component Analysis.
…