Friday, April 30, 2010

Saga of an overheating graphics card

Late March 2010: I'm playing a video game (Awakening, the expansion to Dragon Age: Origins, which is pretty good; not as good as the original, but I didn't really expect it to be). I've been playing for maybe half an hour when my computer crashes. The screen goes blank, and the audio is stuck looping in the fraction of a second it was on when the crash happened. The computer is still on--the power and wireless indicator lights shine blue, and it still seems open to receiving some sort of input signal--my mouse optical sensor still lights up. But I can't do anything with the screen blank.

It's unexpected for sure. I have literally changed nothing about this computer since December, when I added a second DVD-RW drive. This is the first time the computer has as much as frozen or locked up since I installed Windows 7 back in October, but I don't read too much into it. I instead take my own computer advice: "You can solve nine computer problems out of ten by shutting the computer off and turning it back on. You might never know what went wrong or why, but you probably don't need to." It turns on just fine, and within a few minutes I'm back killing darkspawn. Then it crashes again, same behavior as before, except that it only takes ten minutes this time.

Except this time it doesn't turn back on. Now I start to worry: this has all the marks of a fried graphics card, and that's a hundred dollar hassle I don't need. I leave the computer alone for a while, watch some TV, read a book, and eventually work up the courage to turn it on again. It works! Everything seems to be running fine, except that one of the fans is going a little crazy. Turns out it's the graphics card fan.

Computer operates fine outside gaming (except for an overtaxed fan), gaming makes it crash, and it takes a while for whatever problem happened in the first place to go away? Sounds like a heat issue to me. How do you solve a problem like overheating? Hold your moonbeam can of compressed air in your hand, crack the case, and drive some dust from its nest. The CPU, power supply, and GPU fans are all a little dusty, but not enough to cause major problems, same with the motherboard heat sink. The case fan is squeaky clean. Not feeling I've solved any problems, I turn the computer on again to find out that I haven't. The graphics card fan still runs hot, and ten minutes of Dragon Age makes it crash. I decide to leave the problem alone for a few days--I have more important things to worry about, like a trip to Texas--and I hope it goes away on its own.

Early April 2010: I do a Google search and find CPUID HW Monitor, a nice little free utility that monitors system health diagnostics, like fan RPM, temperature, and even voltages (if you're such a power user that seeing VIN0 at 1.68 V tells you anything at all, more power to you). I install it to confirm that it is in fact a heat problem, and boy is it. My CPU and hard drives are all running between 25 and 40 C, which is exactly where they should be. The GPU? Idling between 80 and 85 C, and peaking at 110 C while gaming before a crash. Granted, graphics cards are supposed--or at least accepted--to run hotter than the rest of your machine, but when it's putting out enough heat to boil water, that signals a problem.

Easy ways to alleviate a heat problem? Blow out all the dust (check). Make sure the airflow inside the case isn't obstructed (check). Aim a box fan at it (totally infeasible). "Buy a slot fan and aim it at your GPU" sounds like my next step. I do that.

Middle April 2010: My slot fan arrives from Newegg. I quickly realize a major problem: I don't have a free slot open. I'm not going to sacrifice my wireless internet for gaming, that's for sure, but I decide to (temporarily) take the hit and remove my TV tuner card. The new fan screws into place fine, the computer boots up fine, and I check the temperature to see that it's now idling between 65 and 70 C, about a 15 degree reduction from where it was before. More importantly, when I run Dragon Age for a few minutes, the card only reaches 90 C or so, and the computer doesn't crash.

Another major problem arises pretty quickly: this thing is loud. It was probably assembled in some factory in Guangdong Province by a few dudes who have never seen a computer in their lives. After running for a few minutes, it expels this nasty raspy "rahhhh" that sounds a little like Lady Gaga singing the hook section of "Bad Romance". For the time being, I decide that "loud computer that plays games but doesn't record television" is preferable to "quiet computer that crashes if you try to game"... but just barely. I look for an alternate solution.

Late April 2010: My new toy arrives, a contraption touted as a "0 dB fanless cooling system" that mounts on the graphics card. It's designed to act like a fancy heat sink: a conductive material (here, copper) contacts the hottest part of the card and conducts the heat to a series of intricate fins that can dissipate the heat effectively because of their large surface area. (Quick, someone calculate the Prandtl and Nusselt numbers!) I take out my graphics card to attach this cooling system and quickly realize a major problem (this is becoming a theme here): this thing is never, ever going to fit in my case. In my mind, "mount on" the graphics card suggests it would be, oh, smaller than the graphics card. Not so.
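(For anyone who actually wants to take up the Prandtl and Nusselt challenge, here is a rough sketch in Python. The air properties are ordinary textbook values near room temperature. The heat transfer coefficient and fin height are numbers I made up for illustration, not anything measured from this cooler, so treat the Nusselt result as an example of the arithmetic and nothing more.)

```python
# Rough room-temperature properties of air (textbook values, about 300 K).
CP_AIR = 1005.0   # specific heat, J/(kg*K)
MU_AIR = 1.85e-5  # dynamic viscosity, Pa*s
K_AIR = 0.026     # thermal conductivity, W/(m*K)


def prandtl(cp, mu, k):
    """Pr = cp * mu / k: momentum diffusivity relative to thermal diffusivity."""
    return cp * mu / k


def nusselt(h, length, k):
    """Nu = h * L / k: convective heat transfer relative to pure conduction."""
    return h * length / k


pr_air = prandtl(CP_AIR, MU_AIR, K_AIR)

# Hypothetical heat sink numbers, NOT measured from the cooler in this post:
# h = 10 W/(m^2*K) is a typical order of magnitude for natural convection
# in air, and 0.05 m is an invented fin height.
nu_fin = nusselt(h=10.0, length=0.05, k=K_AIR)

print(f"Pr (air) = {pr_air:.2f}")
print(f"Nu (fin) = {nu_fin:.1f}")
```

A Prandtl number a bit under 1 is what you'd expect for air. The Nusselt number depends entirely on the assumed h and fin height, so plug in real values if you have them.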

Dejected, I put the GPU back in the case, and serendipitously forget to put Guangdong Fan in. Immediately, I notice that the GPU fan isn't working nearly as hard as it was before. I boot up HW Monitor with cautious optimism and see... 64 C? I start up Dragon Age with optimism even more cautious and see... 86 C? As little sense as it makes, I'm left with a computer that can once again play games and record TV and not crash, a five-dollar paperweight, and a twenty-five dollar paperweight. My only guess is that some long-forgotten piece of dust in the GPU air channels was jarred when I took it out of the case or put it back in.

Morals of the story: first, in an arena with sound ratings and power draws and conductances, sometimes the most important spec of all is good old physical size. Second, computers do whatever they want, and far be it from us to do anything about it.

Currently listening: "Sacrae Symphony No. 12", Giovanni Gabrieli
