Nvidia had significant trouble releasing the first GF100 Fermi-based GPU back in March, with launch delays letting ATI rule the DX11 battleground for roughly six months. It didn’t help that when the GeForce GTX 480 1.5GB did find its way to our test rigs, it wasn’t fast enough for its price, and it also ran hot, loud and power-hungry. What a difference a few months make, as the new GeForce GTX 580 1.5GB solves all of these issues.
We’ll leave the GeForce GTX 580 1.5GB performance analysis until after the benchmarks, and instead talk about the upgrades that the GPU and the reference card have received on this and the next page.
GPU Upgrades
While Nvidia has made the obvious move of unlocking the 16th and final SM (Streaming Multiprocessor, or ‘stream processor cluster’ in neutral terminology) of the GF100 architecture for the GeForce GTX 580 1.5GB, the new GF110 codename reveals that this isn’t the full extent of the changes. The most significant difference between GF110 and GF100 is the use of different grades of transistor. Typically a GPU uses the fastest-switching transistors available to attain the highest possible frequencies, but these transistors are also the leakiest, resulting in higher power consumption and more waste heat. With the GF110 design, Nvidia has used less leaky transistors in the non-performance-critical areas of the GPU, thus lowering the overall power draw of the chip.
However, the power-saving transistors – despite their slower switching – haven’t led to a lower GPU frequency for the GeForce GTX 580 1.5GB. The GTX 580 1.5GB’s GPU core operates at 772MHz rather than the 700MHz of the GeForce GTX 480 1.5GB, with the 512 stream processors ripping along at 1,544MHz rather than 1,400MHz.
Nvidia has also added temperature and power draw monitoring to the GeForce GTX 580 1.5GB via two additional chips on the card. This means that if the GPU or the card’s VRMs get too hot or try to draw more power than is safe, the GPU will clock down to avoid damage to the hardware.
There are three things to note about this power management technology. The first is that, as it’s enabled by two separate chips on the card, board partners can choose to leave them off to lower the cost of their cards. Secondly, the GPU won’t increase in frequency if the power draw or temperature is lower than the maximum – the technology is more akin to Intel’s SpeedStep than to Turbo Boost. Finally, the monitoring is software-based and at the moment only detects OCCT and the latest version of FurMark. This means that any thermal or power draw test using these applications is inaccurate, but as we use 3DMark06 to stress the GPUs, our numbers are unaffected.
While a hardware-only implementation of power monitoring might be preferable – it would be automatic, based on the actual power draw or temperature of the components, and not dependent on driver interaction – Nvidia told us that a software-based implementation has its advantages. It gives Nvidia more flexibility, letting it add extra applications that it finds to be particularly power-hungry. However, the option to disable power monitoring isn’t exposed in the driver, so extreme overclockers and anyone wanting to verify that their card is working properly and not overheating will have to be careful.
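To make the behaviour concrete, here’s a minimal sketch in Python of how such a monitor-and-throttle loop might work. Every name, threshold and step size is our own illustrative stand-in, not Nvidia’s actual driver code; note how the clock only ever recovers to stock, never boosts above it.

```python
# A minimal sketch of the monitor-and-throttle behaviour described above.
# Every name, threshold and step size here is hypothetical -- the real
# logic lives in Nvidia's driver and reads two monitoring chips on the card.

STOCK_CLOCK_MHZ = 772          # GTX 580 stock core clock
POWER_LIMIT_W = 244            # card's rated maximum board power
TEMP_LIMIT_C = 97              # hypothetical safety ceiling
CLOCK_STEP_MHZ = 50            # hypothetical down-clock step
WATCHED_APPS = {"furmark.exe", "occt.exe"}   # software-based app detection

def next_clock(clock_mhz: int, app: str, power_w: float, temp_c: float) -> int:
    """Clock down when limits are exceeded; never boost above stock.

    Unlike Turbo Boost, headroom below the limits earns no extra
    frequency -- the clock only falls, then recovers towards stock.
    """
    if app in WATCHED_APPS and (power_w > POWER_LIMIT_W or temp_c > TEMP_LIMIT_C):
        return max(clock_mhz - CLOCK_STEP_MHZ, 0)   # throttle to protect the VRMs
    return min(clock_mhz + CLOCK_STEP_MHZ, STOCK_CLOCK_MHZ)  # recover, capped at stock
```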
Apart from the unlocked SM, the GPU’s high-precision fp16 capabilities have also been increased, which Nvidia claims is alone worth a 4-12 per cent performance improvement.
GeForce GTX 580 Specifications
The headline specification is undoubtedly the enabling of the 16th and final SM (Streaming Multiprocessor, or ‘stream processor cluster’ in neutral terminology) of the GF100 Fermi design. However, the new GF110 design still uses the 32 stream processors per SM layout of the original GF100, rather than the 48 per SM of the GeForce GTX 460’s GPU.
Even Nvidia concedes that the 32-SP-per-SM layout is the less efficient design, but we suspect that it’s not possible to fit four GPCs (Graphics Processing Clusters) of four SMs each onto a manufacturable die if each SM contained 48 stream processors rather than 32. In the end, brute force wins out.
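The arithmetic behind that trade-off is easy to sketch. Assuming the full four-GPC, 16-SM layout, here’s the stream processor count for both SM widths (our own back-of-the-envelope working, not Nvidia’s figures):

```python
# Stream processor counts for the two possible SM layouts of a full
# four-GPC, 16-SM chip -- back-of-the-envelope working, not Nvidia figures.
gpcs, sms_per_gpc = 4, 4
sms = gpcs * sms_per_gpc   # 16 SMs in the full GF110

print(sms * 32)   # GF110's actual layout: 16 x 32 = 512 stream processors
print(sms * 48)   # GTX 460-style SMs: 16 x 48 = 768 SPs -- a much larger die
```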
Perhaps this layout will change with TSMC’s 28nm process, but that’s not due until halfway through 2011, with GPUs based on this process (from ATI and Nvidia) pencilled in for the autumn of that year.
As well as the extra resources and the increased high-precision fp16 capabilities (which Nvidia claims are worth a 4-12 per cent performance increase), the GeForce GTX 580 1.5GB operates at higher frequencies than the GeForce GTX 480 1.5GB. While the GPU core of the latter runs at 700MHz (meaning its 480 stream processors operate at 1.4GHz), the GPU core of the GeForce GTX 580 1.5GB runs at 772MHz, with its 512 stream processors clocked at 1.544GHz.
The 1.5GB of GDDR5 memory also runs faster, with an effective frequency of 4.08GHz rather than 3.7GHz. While this gives the GTX 580 1.5GB more memory bandwidth, the rest of the GPU is the same, with the same 384-bit memory interface and 48 ROPs. The rise in texture unit count (from 60 to 64) is because each SM of the GF100 design contains four texture units – unlocking the 16th SM therefore adds four more.
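That bandwidth claim is easy to verify: peak bandwidth is simply the effective memory frequency multiplied by the bus width in bytes. Here’s a quick check against two of the figures in the table below (the function is our own, for illustration):

```python
def peak_bandwidth_gb_s(effective_clock_mhz: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth: transfers per second x bytes per transfer."""
    return effective_clock_mhz * 1e6 * (bus_width_bits / 8) / 1e9

print(peak_bandwidth_gb_s(3_700, 384))  # GTX 480: ~177.6, listed as 177GB/sec
print(peak_bandwidth_gb_s(4_200, 256))  # HD 6870: 134.4GB/sec exactly
```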
| | Nvidia GeForce GTX 580 1.5GB | Nvidia GeForce GTX 480 1.5GB | Nvidia GeForce GTX 470 1,280MB | ATI Radeon HD 5870 1GB | ATI Radeon HD 6870 1GB | ATI Radeon HD 5970 2GB |
| --- | --- | --- | --- | --- | --- | --- |
| GPU | | | | | | |
| Codename | GF110 | GF100 | GF100 | Cypress XT | Barts XT | Hemlock XT |
| Frequency | 772MHz | 700MHz | 607MHz | 850MHz | 900MHz | 725MHz |
| Stream Processors | 512 (1,544MHz) | 480 (1.4GHz) | 448 (1,215MHz) | 1,600 (850MHz) | 1,120 (900MHz) | 2 x 1,600 (725MHz) |
| Layout | 16 SMs, 4 GPCs | 15 SMs, 4 GPCs | 14 SMs, 4 GPCs | 20 SIMD engines | 14 SIMD engines | 2 x 20 SIMD engines |
| Rasterisers | 4 | 4 | 4 | 2 | 2 | 2 x 2 |
| Tessellation Units | 16 | 15 | 14 | 1 | 1 | 2 x 1 |
| Texture Units | 64 | 60 | 56 | 80 | 56 | 2 x 80 |
| ROPs | 48 | 48 | 40 | 32 | 32 | 2 x 32 |
| Transistors | 3 billion | 3 billion | 3 billion | 2.15 billion | 1.7 billion | 2 x 2.15 billion |
| Die Size | 530mm² | 530mm² | 530mm² | 334mm² | 255mm² | 2 x 334mm² |
| Process | 40nm | 40nm | 40nm | 40nm | 40nm | 40nm |
| Memory | | | | | | |
| Amount | 1.5GB GDDR5 | 1.5GB GDDR5 | 1,280MB GDDR5 | 1GB GDDR5 | 1GB GDDR5 | 2 x 1GB GDDR5 |
| Frequency | 1.02GHz (4.08GHz effective) | 924MHz (3.7GHz effective) | 837MHz (3.3GHz effective) | 1.2GHz (4.8GHz effective) | 1,050MHz (4.2GHz effective) | 1GHz (4GHz effective) |
| Interface | 384-bit | 384-bit | 320-bit | 256-bit | 256-bit | 2 x 256-bit |
| Bandwidth | 192.4GB/sec | 177GB/sec | 134GB/sec | 153.6GB/sec | 134.4GB/sec | 2 x 128GB/sec |
| Card Specifications | | | | | | |
| Power Connectors | 1 x 6-pin, 1 x 8-pin PCI-E | 1 x 6-pin, 1 x 8-pin PCI-E | 2 x 6-pin PCI-E | 2 x 6-pin PCI-E | 2 x 6-pin PCI-E | 1 x 6-pin, 1 x 8-pin PCI-E |
| Maximum Power Draw | 244W | 250W | 215W | 188W | 151W | 294W |
| Idle Power Draw | Unspecified | Unspecified | Unspecified | 27W | 19W | Unspecified |
| Recommended PSU | 600W | 600W | 550W | 500W | Unspecified | Unspecified |
| Typical Street Price | £400 | £330 | £200 | £320 | £220 | £490 |
How we tested
As always, we did our best to deliver a clean set of benchmarks: each test was repeated three times, and the average of those results is what we report here. In the rare case where performance was inconsistent, we continued repeating the test until we had three consistent results. The tests performed are a mixture of custom in-game timedemos and manually played sections, using FRAPS to record the average and minimum frame rates. We strive not only to record the real-world performance you will actually see, but also to present the results in a manner that is easy to digest.
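As a rough illustration of that procedure, the sketch below repeats a benchmark until the three most recent runs agree within a tolerance, then returns their mean. The run_benchmark callable and the 5 per cent tolerance are our own stand-ins; in practice we judge consistency by eye rather than by script.

```python
from statistics import mean

def consistent_average(run_benchmark, tolerance=0.05, max_runs=10):
    """Repeat a benchmark until the three most recent results sit within
    `tolerance` of their mean, then return that mean.

    `run_benchmark` is a stand-in for any callable returning an average
    fps figure; the 5 per cent tolerance is an illustrative choice.
    """
    results = []
    for _ in range(max_runs):
        results.append(run_benchmark())
        if len(results) >= 3:
            last_three = results[-3:]
            avg = mean(last_three)
            if all(abs(r - avg) / avg <= tolerance for r in last_three):
                return avg
    raise RuntimeError("benchmark never settled to three consistent runs")
```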
Intel Core i7 Test System
- Intel Core i7-965 processor (3.2GHz: 133MHz x 24)
- Asus P6T V2 motherboard (Intel X58 Express with three PCI-Express 2.0 x16 slots)
- 3x 2GB Corsair TR3X6G1333C9 memory modules (operating in triple channel at DDR3 1,600MHz 9-9-9-24-1T)
- Corsair X128 120GB SSD running v1 firmware
- Corsair HX1000W PSU
- Windows 7 Home Premium x64
- Antec Twelve Hundred Chassis
ATI graphics cards
- ATI Radeon HD 6870 1GB (900MHz GPU, 4.2GHz memory) using Catalyst 10.9 WHQL
- ATI Radeon HD 6850 1GB (775MHz GPU, 4GHz memory) using Catalyst 10.9 WHQL
- ATI Radeon HD 5970 2GB (2 x 725MHz GPU, 2 x 4GHz memory) using Catalyst 10.9 WHQL
- ATI Radeon HD 5870 1GB (850MHz GPU, 4.8GHz memory) using Catalyst 10.9 WHQL
- ATI Radeon HD 5850 1GB (725MHz GPU, 4.0GHz memory) using Catalyst 10.9 WHQL
- ATI Radeon HD 5770 1GB (850MHz GPU, 4.8GHz memory) using Catalyst 10.9 WHQL
Nvidia graphics cards
- Nvidia GeForce GTX 580 1.5GB (772MHz core, 1,544MHz stream processors, 4.08GHz memory) using the release driver
- Nvidia GeForce GTX 480 1.5GB (700MHz core, 1,400MHz stream processors, 3.7GHz memory) using GeForce 260.89 WHQL
- Nvidia GeForce GTX 470 1,280MB (607MHz core, 1,215MHz stream processors, 3.3GHz memory) using GeForce 260.89 WHQL
- Nvidia GeForce GTX 460 1GB (675MHz core, 1,350MHz stream processors, 3.6GHz memory) using GeForce 260.89 WHQL
- Nvidia GeForce GTX 460 768MB (675MHz core, 1,350MHz stream processors, 3.6GHz memory) using GeForce 260.89 WHQL
Games Tested
- Colin McRae: Dirt 2 (DX11)
- Arma II: Operation Arrowhead (DX9)
- Just Cause 2 (DX11)
- Battlefield: Bad Company 2 (DX11)
Colin McRae: Dirt 2
Publisher: Codemasters

From our Colin McRae: Dirt 2 review:
“While Dirt 2’s shift away from ‘pure’ rallying to a more contemporary styling will likely divide players, there’s no arguing that the game itself looks simply stunning, improving upon Race Driver: GRID's EGO engine to make Dirt 2 one of the best-looking racers we’ve ever seen. Both cars and tracksides are lavishly detailed, and there are dozens of gorgeous touches, from the spattering mud in jungle stages to the jaw-dropping water effects from the driver’s cam when you hit a water hazard.”
We drive a lap around the Battersea, London track, with a full eight-car grid, starting at the back. We use the maximum image quality settings in DX11 mode, repeating each test three times, discarding anomalous results and averaging the consistent ones.