Sunday, October 6, 2013

Thermal-Aware Computing

Temperature has reared its head as an important issue.  And it's not just a challenge for server farms.  All types of embedded devices must be designed to meet thermal-related design specifications.

Thermal behavior affects system operation in several ways.  Heat generation causes problems for the environment in which the device operates.  In machine rooms, we want to pack as many CPUs and disk drives into the room as possible; perhaps even more important, we don't want to spend any more money than we have to on cooling the room.  Thus, the sheer quantity of energy transferred into the room from the processors is a big issue.  Machine room designers also have to worry about how air circulates.  Poor air circulation leads to hot spots that cause devices to fail.

Excess temperatures also lead to catastrophic component failure.  And I don't use the word catastrophic lightly.  Search for "overheating pentium" on YouTube to find a variety of videos that show CPUs running so hot that they start to smoke.  These failures are so catastrophic because they are the result of positive feedback: high temperatures increase leakage currents, and high leakage currents in turn increase temperature.
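
To make the feedback loop concrete, here is a minimal sketch of a toy model in Python. It assumes a single lumped thermal resistance between chip and ambient, and leakage power that grows exponentially with temperature. The function name and every parameter value below are hypothetical, chosen only for illustration and not taken from any real device.

```python
import math

def settle_temperature(p_dynamic_w, r_theta_c_per_w, t_ambient_c=25.0,
                       p_leak_ref_w=1.0, t_ref_c=25.0, leak_coeff_per_c=0.03,
                       t_limit_c=150.0, max_steps=1000, tol_c=1e-6):
    """Iterate T -> T_ambient + R_theta * (P_dynamic + P_leak(T)).

    Returns the steady-state temperature in deg C, or None if the
    temperature passes t_limit_c or never settles (thermal runaway
    in this toy model). All default values are assumptions.
    """
    temp_c = t_ambient_c
    for _ in range(max_steps):
        # Assumed leakage model: exponential in temperature.
        p_leak_w = p_leak_ref_w * math.exp(leak_coeff_per_c * (temp_c - t_ref_c))
        # More total power means a hotter chip, which feeds back into leakage.
        next_temp_c = t_ambient_c + r_theta_c_per_w * (p_dynamic_w + p_leak_w)
        if next_temp_c > t_limit_c:
            return None
        if abs(next_temp_c - temp_c) < tol_c:
            return next_temp_c
        temp_c = next_temp_c
    return None

# Same 20 W workload, two hypothetical cooling solutions.
good_cooling = settle_temperature(20.0, r_theta_c_per_w=0.5)
poor_cooling = settle_temperature(20.0, r_theta_c_per_w=3.0)
```

With the low thermal resistance the loop settles at a modest temperature. With the high one, each increase in leakage heats the chip enough to cause a still larger increase, no equilibrium exists, and the function returns None.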

In consumer electronics, the traditional goal for temperature management has been to avoid fans. A fan, even a well-designed one, creates some amount of background noise that is particularly undesirable in multimedia applications.  Of course, a fan on a cell phone is just plain ridiculous.  Today's cell phones can get pretty darn warm to the touch if you run a data-intensive application for a few minutes.

But temperature has a more subtle effect on system lifetime.  A CPU doesn't have to catch fire to be damaged.  Chips can fail in all sorts of ways: dopants rediffuse, oxides break down, wires break down.  Most of these failure mechanisms are temperature-dependent, and their rates can grow exponentially with temperature.  The hotter you run, the shorter your lifetime.  Running just under the maximum temperature for your device doesn't avoid trouble.
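
One common way to put a number on that exponential dependence is the Arrhenius model from reliability engineering. The post doesn't name a specific model, so treat the following Python as a rough sketch under that assumption. The 0.7 eV activation energy is an illustrative placeholder; real values depend on the failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_acceleration(t_cool_c, t_hot_c, activation_energy_ev=0.7):
    """Factor by which a failure mechanism speeds up at t_hot_c
    relative to t_cool_c under the Arrhenius model.

    activation_energy_ev is an assumed, mechanism-dependent value.
    """
    t_cool_k = t_cool_c + 273.15  # convert deg C to kelvin
    t_hot_k = t_hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_cool_k - 1.0 / t_hot_k)
    return math.exp(exponent)

# How much faster does wear-out proceed at 85 C than at 70 C?
speedup = arrhenius_acceleration(70.0, 85.0)
```

Under these assumptions, running 15 degrees hotter speeds up the failure mechanism by a factor of roughly 2.7, which cuts the expected lifetime by about the same factor even though neither temperature is anywhere near catching fire.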
