Troubleshooting Win2K: Crash causes

If you've worked with computers for any length of time, you've no doubt encountered situations in which a crash was so severe that it took a full day (or longer) to restore the system to working order. Meanwhile, your workload continued to pile up, and users were kept waiting for less significant repairs while you tackled the big one.

Unfortunately, there's no foolproof method of preventing a massive crash. You can significantly reduce the amount of time required to recover from a really big crash, though, if you know what caused it in the first place. This knowledge can also help you prevent crashes.

Let's take a look at the root causes of Win2K crashes. One note before we get started: Although people have different ideas of what constitutes a major crash, for the purposes of this article, I'll define a major crash as one that prevents the Windows 2000 operating system from booting.

Incorrect device drivers

One of the most frequent causes of a system failing to boot is an incorrect driver. An incorrect driver can be one of the easiest problems to track down and fix. If you change a driver and suddenly the system fails to boot, it's pretty obvious that the driver is probably what's causing the problem. What makes this problem even easier to track down is that most of the time, only a handful of device drivers have the potential to cause a boot failure.

If you suspect an incorrect device driver, the device with the incorrect driver is most often a video adapter, network card, sound card, or some other high-profile hardware component that's used during the boot process. A modem driver or a printer driver, even if incorrect, usually won't cause a boot failure because modems and printers usually aren't initialized during the boot sequence.

The incorrect device driver also probably won't have anything to do with lower-level system components such as hard drives, CD-ROM drives, USB, LPT, or serial ports because these items typically rely on generic device drivers that work for just about any system. The exception to this is SCSI devices, which rely on specific device drivers. An incorrect SCSI device driver can and usually will cause a boot failure.
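The rules of thumb above can be sketched as a small triage helper. This is only an illustration: the category names are taken from the two preceding paragraphs, and the function name `boot_failure_suspects` and both sets are hypothetical, not part of any Windows 2000 tool.

```python
# Device categories drawn from the article's rules of thumb (not exhaustive).
BOOT_CRITICAL = {"video adapter", "network card", "sound card", "scsi device"}
USUALLY_SAFE = {"modem", "printer", "hard drive", "cd-rom drive",
                "usb port", "lpt port", "serial port"}


def boot_failure_suspects(recently_changed):
    """Order recently changed devices from most to least likely boot culprit."""
    def rank(device):
        name = device.lower()
        if name in BOOT_CRITICAL:
            return 0  # used during boot, or needs a device-specific driver
        if name in USUALLY_SAFE:
            return 2  # generic driver, or not initialized during boot
        return 1      # unknown category: check after the obvious suspects
    # sorted() is stable, so devices with the same rank keep their input order.
    return sorted(recently_changed, key=rank)


print(boot_failure_suspects(["Printer", "SCSI device", "Modem", "Video adapter"]))
```

A list like this doesn't diagnose anything by itself; it just suggests which recently changed driver to roll back first.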

Bad device drivers

A bad device driver is one that has been loaded appropriately but is malfunctioning. Sometimes drivers go bad when a registry entry or file that's associated with the driver is accidentally modified, deleted, or corrupted. Many of the rules that apply to incorrect device drivers apply to bad device drivers. A bad device driver will cause a boot failure only if the device that's associated with the bad driver is used during the boot process.

Hard disk corruption or failure

Another common cause of major system crashes is hard disk failure or hard disk corruption. Obviously, if the hard disk that contains the boot or system partitions (or both) were to fail, booting the operating system would be impossible. Likewise, even if the hard disk doesn't physically fail, booting may be impossible if some or all of the information on the system or boot partitions becomes corrupted.

If the hard drive physically fails, the only real solution is to replace the drive and reload the operating system, restore a backup, or both. If the drive is still working but some corruption has occurred, things quickly become much more interesting. Getting the system back up and running becomes a question of what was on the failed drive or partition.

For example, if the failed drive or partition contained only the Windows 2000 operating system, the quickest and easiest solution might be to reformat the drive and restore a backup or reload the operating system. If, on the other hand, the affected drive or partition contained data, you'd probably be better off trying to salvage the drive or partition rather than simply reformatting it.
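The reasoning in the last two paragraphs can be summarized as a simple decision sketch. The function name `recovery_plan` and its three parameters are assumptions made for illustration; in particular, the `has_backup` flag is my own simplification of "restore a backup or reload the operating system."

```python
def recovery_plan(drive_failed_physically, holds_user_data, has_backup):
    """Return a rough list of recovery steps for a drive that won't boot."""
    # Restoring a backup is preferred when one exists; otherwise reload the OS.
    rebuild = "restore a backup" if has_backup else "reload the operating system"

    if drive_failed_physically:
        # A dead drive leaves no choice but replacement.
        return ["replace the drive", rebuild]
    if holds_user_data:
        # The drive still works and holds data, so try to save it first.
        return ["try to salvage the drive or partition"]
    # Corrupted but working drive that held only the operating system.
    return ["reformat the drive", rebuild]


print(recovery_plan(drive_failed_physically=False,
                    holds_user_data=False,
                    has_backup=True))
```

Real situations are rarely this clean, but the order of the questions (physical failure first, then what the drive held) tends to hold up.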

User tampering (security)

Another cause of massive failures is user tampering. I've seen more situations than I can count in which a user crashed a workstation. One example is the user who was running out of hard disk space and "corrected" the problem by erasing every file he didn't recognise (COMMAND.COM, WIN.COM, etc.).

User tampering isn't nearly as big an issue in Windows 2000 as it was in Windows 9x because of the integrated security. I have seen it happen, though. For example, in one situation, Windows 2000 workstations used the FAT file system instead of NTFS, and there was nothing preventing users from making changes to system files.

An even uglier situation involved the Windows 2000 security system. Windows 2000 is designed so that you can log in to either a domain or the local machine. Each individual workstation contains its own Administrator account that can be used to make changes to the individual machine's configuration.

A help desk technician logged in to a workstation to perform some routine maintenance. During the course of this maintenance, the technician received a phone call. The machine's user knew just enough about Windows 2000 to be dangerous and changed the local administrator's password. After the technician got off the phone, he finished the job, unaware of the password change. The rogue user then made a few changes and crashed the system. The technician was unable to get back into the system to fix the problem because the password had been changed.

As you can see, even a fairly secure operating system like Windows 2000 can be subject to user tampering if security policies and procedures are lax. It's impossible for me to tell you that if a user tampers with a system, you can follow a specified procedure to fix the problem. The user can do almost anything to the system. Fortunately, just about any type of tampering that's severe enough to crash the system falls into one of the other categories discussed here.

Incorrect version of files or missing files

Incorrect or missing system files can cause a crash. This can happen when files are accidentally deleted, when a buggy service pack is installed, or when a technician attempts to copy a missing file from another machine.
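One generic way to spot missing or altered files is to compare them against a list of known-good checksums recorded while the system was healthy. The sketch below assumes such a list exists; the function name `check_system_files` and the `manifest` dictionary (relative path mapped to a SHA-256 hex digest) are hypothetical and do not describe a built-in Windows 2000 feature.

```python
import hashlib
from pathlib import Path


def check_system_files(root, manifest):
    """Compare files under root against a known-good manifest.

    manifest maps a relative path to the expected SHA-256 hex digest.
    Returns a dict of relative path -> "missing" or "mismatch".
    """
    problems = {}
    for rel_path, expected in manifest.items():
        path = Path(root) / rel_path
        if not path.is_file():
            problems[rel_path] = "missing"
            continue
        # Hash the file contents and compare with the recorded digest.
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
        if actual != expected.lower():
            problems[rel_path] = "mismatch"
    return problems
```

A "mismatch" result covers both corruption and the wrong-version case, such as a file copied from another machine at a different service pack level.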

Viruses

All the e-mail viruses that have been going around lately have created an increased awareness of this threat. Although this type of virus usually can't prevent a system from booting, there are some that can, such as boot sector viruses and file viruses.

CPU failure

It may seem obvious that if a CPU fails, the system may not boot. Unfortunately, there are many types of CPU failures. For example, rather than the CPU completely going bad, one particular memory block may go bad.

Registry

One of the trickiest problems to fix is a bad registry entry. There's a very real chance that registry corruption can lead to a major crash.

Summary

Major system crashes can be extremely disruptive to both users and IT support staff. When a major crash occurs, your goal is usually to recover as quickly as possible with minimal data loss. The first step in recovering from a really big crash, whether on a server or a workstation, is to understand the factors that could have led to that crash. Only then can you effectively begin the troubleshooting process.

TechRepublic is the online community and information resource for all IT professionals, from support staff to executives. We offer in-depth technical articles written for IT professionals by IT professionals. In addition to articles on everything from Windows to e-mail to firewalls, we offer IT industry analysis, downloads, management tips, discussion forums, and e-newsletters.

©2001 TechRepublic, Inc.

Talkback 4 comments

    James R -- 31/12/01

    A crash that takes a *day* to recover from??? You ARE joking, aren’t you?!!

    What self-respecting computer user would subject themselves to that sort of self-abuse? What sort of self-respecting organisation would risk such an obvious joke of an OS? I don’t think there is *any* possible justification for using an OS that can ever crash as seriously as that; I value my data WAY too much to risk it with an OS like the one you describe.

    I am studying computer science at Uni (so I REALLY know how to crash a computer) and I have used Mac OS X 10 since public beta (I am now up to 10.1.2), and the severest problem I have had to contend with (including intensive FreeBSD command line experimentation) is the Encyclopædia Britannica forcing my session to log out, which required a soft reboot.

    I have thrashed OS X senseless, doing nearly all of the things you aren’t supposed to do, and I haven’t lost a single bit of data (that I didn’t *accidentally* secure delete myself); the most time I have ever lost is booting into that weird, retro world of command lines only, to run fsck (i.e. around 3 minutes TOTAL to boot back into OS X proper).

    As such, I simply do not get it; why would *anyone* use an OS that wastes an entire day from a single crash, let alone the risk to data that such a sloppy system poses???

    Can someone please explain this? I am genuinely amazed.

    gleenogs orson cart -- 26/10/08 (in reply to #120007680)

    zarabanda might help, try it please.

    Robert McKenzie -- 02/01/02

    The answer, James R, is Marketing & Monopoly.

    Microsoft, like IBM before it, is skilled in using FUD to get people to pay high prices for very mediocre products.

    Eventually these customers no longer have any real choice at all and then face paying even higher prices.....

    But why don't people learn? Because they are either ignorant or biased (or often both). Sadly the cynical sayings "There's a sucker born every minute" and "Nobody ever went broke by underestimating the general public" are all too true!

    Anonymous -- 29/01/04

    Give me a freakin break... you two guys have lost your minds!! I usually just don't respond to such moronic comments... but on the off chance that you might even understand what I am explaining:

    Why in the world would an aspiring hacker write anything to attack an Apple OS? Heavyweight boxers don't fight unknowns... they fight someone that is going to give them some notoriety. Attacking an Apple OS is like beating up your little sister. You are delusional if you think it can't be done. The sad fact is that nobody cares enough to do it. Quit playing and go get real computers! =)
