10-Debugging

10-Debugging - CSE 265: System and Network Administration...

Spring 2008 CSE 265: System and Network Administration ©2004-2008 Brian D. Davison

CSE 265: System and Network Administration

Debugging
  Learn the customer's problem
  Find the root cause and fix it
  Have the right tools
Fixing things once
  Fix things once, rather than over and over
  Avoid the temporary fix trap
  Learning from carpenters

Learn the customer's problem
  Step one: understand (at a high level) what the user is trying to do, and what part is failing
  The customer expects a particular result from some action, but is getting something else
  Ex:
    My mail program is broken
    I can't reach the mail server
    My mailbox disappeared!
  Any could be true, but the real problem could be DNS, a power failure, a network problem, etc.
  When complete, make sure the customer agrees!
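A report such as "I can't reach the mail server" can be narrowed down by checking one layer at a time. The following is a minimal sketch, not part of the original slides: the function name `classify_reachability` and its return labels are hypothetical, and it only separates a name-resolution failure from a connection failure. It says nothing about power, the mail software itself, or the user's mailbox.

```python
import socket


def classify_reachability(host, port, timeout=3.0,
                          resolve=socket.getaddrinfo,
                          connect=socket.create_connection):
    """Return "dns", "network", or "ok" for a host/port pair.

    The resolve and connect parameters default to the standard socket
    functions; they are parameters only so the logic can be exercised
    without touching a real network.
    """
    try:
        # Step 1: can the name be resolved at all?
        resolve(host, port)
    except socket.gaierror:
        return "dns"
    try:
        # Step 2: can a TCP connection be opened to the service port?
        conn = connect((host, port), timeout=timeout)
    except OSError:
        return "network"
    conn.close()
    return "ok"
```

For example, `classify_reachability("mail.example.com", 25)` (a placeholder host and the usual SMTP port) would return `"dns"` if the name does not resolve. Whatever the result, it is only a hint about where to look next, and the restated problem should still be confirmed with the customer.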
Example #1: tape failures
As every System Administrator knows, reliable backups are a must. Because of this my team became suitably concerned when the operators handling our central database servers started to report "tape failures." The failures soon became regular, and required regular manual intervention to keep operational. In investigating the cause of this problem, corporate security and production floor rules forced us to depend on the operators for information. The operations staff placed the blame on the off-site tape-storage service's jostling tapes during transport, and requests for samples of failed tapes gave no indication as to the cause.

Example #1 continued
The root cause of the problem didn't become obvious until this had been going on for a couple of months. During a large system upgrade, my team was able to observe the operators at work. The operations staff had been out-sourced to a low-cost contracting firm that apparently contained a large percentage of fans of the local professional hockey team. The operators were skidding the 8mm tapes across the computer-room floor like a hockey puck instead of carrying them across the floor. Adding a rule prohibiting throwing, skipping, and sliding of backup tapes quickly restored backups to a reliable state.
Tape Hockey, by Allen Peckham
Find the problem's cause and fix it
  Workarounds are good, but fixing the root cause is much better
  Rebooting/restarting is a common workaround
  E.g., the solution to a full-disk problem is not to delete old log files
  Improving the speed of reboots is not really the solution either!
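For the full-disk example, a first step toward the root cause is to find out what is actually consuming the space, rather than deleting whatever looks old. The sketch below is an illustration only and does not come from the slides: the helper names `usage_by_subdir` and `percent_used` are hypothetical, and it assumes the filesystem can be walked with ordinary read permissions (unreadable entries are skipped).

```python
import os
import shutil


def usage_by_subdir(root):
    """Return (subdir_name, total_bytes) pairs under root, largest first."""
    totals = {}
    for entry in os.scandir(root):
        # Only immediate subdirectories are reported; symlinks are not followed.
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    total += os.lstat(path).st_size
                except OSError:
                    # File vanished or is unreadable; skip it.
                    continue
        totals[entry.name] = total
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def percent_used(path):
    """Return how full the filesystem containing path is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

Calling `usage_by_subdir` on the mount point that filled up shows which directory is growing. Knowing that points toward a lasting fix (for instance, log rotation or a misbehaving process that needs repair) instead of a cleanup that will have to be repeated.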