Build Thread 1.0
An Nvidia engineer has stated that the 512MB is still around 4 times faster than system memory, and the GPU still has 224 GB/s bandwidth if memory is being accessed from both pools. The only time bandwidth is 28 GB/s is when the GPU is only accessing the 512MB pool, and it's possible for the OS, drivers, and game engine to stop that happening.
Hm, as far as I've gathered, if the GPU is accessing the 512MB in question, it has to do so exclusively. So it *will* impact performance significantly under these specific conditions, because not only is the bandwidth impaired for these 512MB, but the memory controller also has to switch modes between accessing parts of the other 3.5GB and these 512MB. But it is hard to construct settings where this pool is accessed exclusively: people testing for this issue specifically will have to create conditions where the memory allocation is *not* handled by the driver, which would try its best to keep the usage below 3.5GB or above 4GB to mask the issue, and where the memory allocated is strictly between 3.5 and 4GB, with the upper part being accessed most...
In normal usage, a user will probably just notice a little more stutter when some of the impaired memory blocks are accessed every now and then, or if the driver swaps data to/from main memory to avoid the weak 512MB. But if the "sour spot" is hit, something like this can result:
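To put a rough number on the exclusive-access argument, here is a minimal sketch. It assumes the two pools are read strictly one after the other, uses the 224 GB/s and 28 GB/s figures quoted in this thread, and ignores any mode-switching cost, so treat it as a back-of-the-envelope model rather than a description of how the memory controller actually schedules accesses. The function name and the example workload are made up for illustration.

```python
def effective_bandwidth_gbs(fast_gb, slow_gb, fast_gbs=224.0, slow_gbs=28.0):
    """Average bandwidth when the two pools are read one after the other.

    fast_gb / slow_gb: amount of data read from each pool, in GB.
    fast_gbs / slow_gbs: bandwidth figures quoted in the thread (assumed,
    not measured). Mode-switch overhead is deliberately ignored.
    """
    total_gb = fast_gb + slow_gb
    if total_gb <= 0:
        raise ValueError("need a positive amount of data")
    # Exclusive access: the transfer times simply add up.
    seconds = fast_gb / fast_gbs + slow_gb / slow_gbs
    return total_gb / seconds


# Hypothetical workload: every byte of a full 4 GB allocation is read once.
full_sweep = effective_bandwidth_gbs(3.5, 0.5)
print(f"3.5 GB fast + 0.5 GB slow: {full_sweep:.1f} GB/s")
```

Under these assumptions, touching the whole 4GB evenly lands at a bit under 120 GB/s, roughly half the headline figure, even though only an eighth of the data sits in the slow pool.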
GTX 980 underclocked to match the GTX 970 in compute, texture and pixel fillrate (it's unclear to me whether this equality in theoretical horsepower takes into account nVidia's original claims regarding the 970's ROP specs or the newly revised actual specs): http://www.pcgamesha...SAApng-pcgh.png
Frametimes and RAM usage (usage as reported by tools that are known not to cope well with the GTX 970's unique setup)
vs. said underclocked GTX 980
(if deep links fail, please manually copy / paste the links into the address bar)
As for the possibilities to avoid the situation: game engines almost certainly will not account for one GPU's particular and very specific weaknesses, unless nVidia pays the developers for the extra effort and layer of abstraction they'd have to insert for that. The same is true for an OS, especially if developed prior to the common knowledge of this issue - which is every OS that is available now or appearing soon. So if anywhere, I'd expect this to be handled at driver level.
And regarding that engineer's claim that 28GB/s is four times faster than main memory bandwidth: if he was not talking about access times or anything like that, I'd be ashamed if I were him. Even a very common setup of, let's say, a dual-channel pair of DDR3-1600 modules (aka PC3-12800, a name that gives the MB/s for one channel) offers similar bandwidth, about 25.6GB/s (at least between CPU and RAM; the bandwidth available between GPU and system RAM is capped by PCI-e 3.0 x16 at a theoretical maximum of about 16GB/s).
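For anyone who wants to check the arithmetic, here is a small sketch of the theoretical peak figures. It assumes a 64-bit (8-byte) bus per DDR3 channel and PCI-e 3.0's 8 GT/s per lane with 128b/130b encoding; the helper names are made up for this post, and real-world throughput will be lower than these peaks.

```python
def ddr_peak_gbs(mega_transfers_per_s, channels, bus_bytes=8):
    """Theoretical peak DDR bandwidth in GB/s (64-bit bus per channel)."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


def pcie3_peak_gbs(lanes):
    """Theoretical peak PCI-e 3.0 bandwidth in GB/s, one direction."""
    # 8 GT/s per lane, 128b/130b line encoding, 8 bits per byte.
    per_lane_bytes = 8e9 * (128 / 130) / 8
    return per_lane_bytes * lanes / 1e9


SLOW_POOL_GBS = 28.0  # figure quoted in this thread for the 512MB pool

dual_ddr3_1600 = ddr_peak_gbs(1600, channels=2)
pcie3_x16 = pcie3_peak_gbs(16)

print(f"Dual-channel DDR3-1600: {dual_ddr3_1600:.1f} GB/s")
print(f"PCI-e 3.0 x16:          {pcie3_x16:.2f} GB/s")
print(f"512MB pool vs. DDR3:    {SLOW_POOL_GBS / dual_ddr3_1600:.2f}x")
```

So against CPU-side system RAM the 512MB pool is about 1.1 times as fast, not four times; only against the PCI-e link does the gap widen, and even there it is under two times.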
To clarify: I still think the 970 offers quite competitive performance and even better efficiency for its price. It's just the misleading PR (down to actually false technical specs) surrounding it that I'm trying to debunk here.
Edited by Fionavar, 28 January 2015 - 09:06 PM.