Detailed Diagnosis in Enterprise Networks

Srikanth Kandula, Ratul Mahajan, Patrick Verkaik (UCSD), Sharad Agarwal, Jitendra Padhye, Paramvir Bahl
Microsoft Research

ABSTRACT
By studying trouble tickets from small enterprise networks, we conclude that their operators need detailed fault diagnosis. That is, the diagnostic system should be able to diagnose not only generic faults (e.g., performance-related) but also application-specific faults (e.g., error codes). It should also identify culprits at a fine granularity, such as a process or firewall configuration. We build a system, called NetMedic, that enables detailed diagnosis by harnessing the rich information exposed by modern operating systems and applications. It formulates detailed diagnosis as an inference problem that more faithfully captures the behaviors and interactions of fine-grained network components such as processes. The primary challenge in solving this problem is inferring when a component might be impacting another. Our solution is based on an intuitive technique that uses the joint behavior of two components in the past to estimate the likelihood of them impacting one another in the present. We find that our deployed prototype is effective at diagnosing faults that we inject in a live environment. The faulty component is correctly identified as the most likely culprit in 80% of the cases and is almost always in the list of top five culprits.

Categories and Subject Descriptors
C.4 [Performance of Systems]: Reliability, availability, serviceability

General Terms
Algorithms, design, management, performance, reliability

Keywords
Enterprise networks, applications, fault diagnosis

1. INTRODUCTION
Diagnosing problems in computer networks is frustrating. Modern networks have many components that interact in complex ways.
Configuration changes in seemingly unrelated files, resource hogs elsewhere in the network, and even software upgrades can ruin what worked perfectly yesterday. Thus, the development of tools to help operators diagnose faults has been the subject of much research and commercial activity [2, 4, 5, 6, 11, 12, 17, 21].

Because little is known about faults inside small enterprise networks, we conduct a detailed study of these environments. We reach a surprising conclusion: as we explain below, existing diagnostic systems, designed with large, complex networks in mind, fall short at helping the operators of small networks.

Our study is based on trouble tickets that describe problems reported by the operators of small enterprise networks. We observe that most problems in this environment concern application-specific issues, such as certain features not working or servers returning error codes. Generic problems related to performance or reachability are in the minority. The culprits underlying these faults range from bad application or firewall configuration to software and driver bugs.
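The abstract states NetMedic's core intuition only briefly: the joint behavior of two components in the past is used to estimate how likely one is to be impacting the other now. The Python sketch below shows one hypothetical way to turn that intuition into a number; it is not NetMedic's actual algorithm. It assumes each component's state is recorded as a vector of variables normalized to [0, 1] per time bin, and the names `state_distance`, `edge_weight`, and the parameter `k` are illustrative inventions. The idea is to find the past bins in which the source component looked most like it does now, then check whether the destination component in those same bins also looked like it does now.

```python
from typing import List, Sequence

# Hypothetical representation: one state vector per time bin, each variable
# already normalized to [0, 1] (an assumption of this sketch).
State = Sequence[float]


def state_distance(a: State, b: State) -> float:
    """Mean absolute difference of two normalized state vectors (0 = identical)."""
    if len(a) != len(b) or not a:
        raise ValueError("state vectors must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def edge_weight(src_hist: List[State], dst_hist: List[State],
                src_now: State, dst_now: State, k: int = 3) -> float:
    """Hypothetical score in [0, 1] for "src may be impacting dst right now".

    Uses the components' joint past behavior: pick the k past time bins in
    which src was most similar to its current state, then measure how close
    dst was to its current state in those same bins.
    """
    if len(src_hist) != len(dst_hist) or not src_hist:
        raise ValueError("histories must be non-empty and aligned by time bin")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rank past bins by how closely src resembled its current state.
    ranked = sorted(range(len(src_hist)),
                    key=lambda t: state_distance(src_hist[t], src_now))
    nearest = ranked[:k]
    # If dst also resembled its current state in those bins, the history is
    # consistent with src influencing dst, so the weight is close to 1.
    avg_dst_diff = sum(state_distance(dst_hist[t], dst_now)
                       for t in nearest) / len(nearest)
    return 1.0 - avg_dst_diff


if __name__ == "__main__":
    # Toy data (made up for illustration): normalized CPU usage of a process
    # and the normalized response time of an application that depends on it.
    cpu_hist = [[0.1], [0.9], [0.2], [0.95]]
    latency_hist = [[0.2], [0.8], [0.25], [0.85]]
    unrelated_hist = [[0.8], [0.1], [0.9], [0.15]]

    related = edge_weight(cpu_hist, latency_hist, [0.92], [0.82], k=2)
    unrelated = edge_weight(cpu_hist, unrelated_hist, [0.92], [0.8], k=2)
    print(f"related: {related:.3f}, unrelated: {unrelated:.3f}")
```

In the toy data, the hypothetical latency series tracks the process's CPU usage, so its weight comes out high (about 0.975), while the series that does not track it scores much lower (about 0.325). Taking the k most similar past bins, rather than every bin under a fixed similarity threshold, is a choice made for this sketch; a threshold-based variant would be an equally reasonable reading of the abstract.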