Wider memory bus
We could probably make computers faster with a wider memory bus. Say, for example, that the bus speed is 800 MHz and the CPU speed is 1.8 GHz. Then for every cycle of the bus, the CPU can process at most about two 64-bit values, or two 128-bit values with SIMD, or, with four cores, perhaps eight 128-bit values. Thus, if we have a bus width of X bytes, we can define an opcode to retrieve up to X bytes of RAM from location Y and put it in the cache, and do it in fewer cycles. X could be, say, 128 bytes. (In the above scenario, requesting more than 32 bytes at once wouldn't be necessary for a given core, but perhaps a 128-byte-wide memory bus would help when four cores each request 32 bytes at once.)
Since we don't know what the future of CPUs and bus speeds holds, for forward compatibility we should probably make this bus width as wide as is practical, unless it can be increased later in a way that's transparent to programmers.
This would save time whenever memory is accessed sequentially, which happens often.
I guess the pipelines within the RAM would have to be changed too? I don't know much about RAM.
Apparently a CPU has its own prediction mechanisms and automatic pre-caching. This automatic pre-caching could equally take advantage of a wider memory bus, and conversely, my explicit-opcode idea for pre-caching carries its own advantage independently of the idea of widening the bus. That is, why leave it all up to the processor to predict what's going to happen, when the programmer (or possibly the compiler/VM) can just *tell* it?