Well, that's pretty much what we have. A graphics chip (for example) has the advantage of being built specifically for graphics, so it doesn't pay the cost of being a general-purpose processor. Its entire purpose is to perform intense vector calculations, and the chip on the graphics card can be designed to be particularly strong at exactly that kind of work.
There can also be an advantage with respect to memory. The memory architecture of a main system is much more static. Graphics cards were able to adopt DDR RAM, widen the memory bus (to 256 bits), and ramp up RAM speeds, while system RAM has remained relatively static. The graphics card also does not have to share its memory. This can be a double-edged sword though, as unused video memory is not particularly usable by the rest of the system. (IIRC, Apple made a bit of a hack to get the graphics card to pick up some of the cycles for running Mac OS X, but I'm not 100% sure. It is not common though.) Even if the system could use that memory, it would lose much of its speed advantage, as access would have to go through the system bus (which may be fast enough with PCI-Express).
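To put rough numbers on the memory point, here is a quick back-of-the-envelope sketch. The formula (bus width in bytes, times clock rate, times transfers per clock) is the usual way to estimate peak theoretical bandwidth. The 500 MHz graphics memory clock is just a figure I picked for illustration, not a specific card, and `peak_bandwidth_gb_s` is my own helper name. The system RAM line uses single-channel DDR-400 (64-bit bus, 200 MHz).

```python
def peak_bandwidth_gb_s(bus_width_bits, clock_mhz, transfers_per_clock=2):
    """Peak theoretical bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock is 2 for DDR, since data moves on both clock edges.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Single-channel DDR-400 system RAM: 64-bit bus at 200 MHz.
system_ram = peak_bandwidth_gb_s(64, 200)

# Hypothetical graphics card: 256-bit bus, DDR memory at an assumed 500 MHz.
video_ram = peak_bandwidth_gb_s(256, 500)

print(f"system RAM: {system_ram:.1f} GB/s")
print(f"video RAM:  {video_ram:.1f} GB/s")
print(f"ratio:      {video_ram / system_ram:.0f}x")
```

These are peak figures only; real throughput is lower on both sides, but the gap from the wider bus and faster clock is the point.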
I think decentralizing the threads helps from a cost standpoint as well. People on a budget can more easily mix and match parts.
Even with a multicore processor though, there's still something that brings the information together. The processors are just closer together :D