The primary focus is on 32-bit (single-precision floating-point) performance
anyway, for 3D, so if 64-bit operations happen to have half the number of
Reservation Stations / Function Units, and block more often, we actually
-don't mind so much.
+don't mind so much. Also, we can still apply the same "banks" trick to
+the Register File, this time with 4-way multiplexing across 32-bit-wide
+banks, and 4x4 crossbars routing the individual bytes:
+
+{{register_file_multiplexing.jpg}}
+
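+A minimal Python sketch of the idea follows. It assumes that the low two
+bits of the register number select one of four 32-bit-wide banks, that
+each bank has a single read port, and that the 4x4 byte crossbar is
+driven by a per-lane source-select list. `BankedRegFile`, `byte_crossbar`
+and `banks_conflict` are hypothetical names for illustration only, not
+the actual hardware design.
+
+```python
+from typing import List, Tuple
+
+# Hypothetical model: 4 banks of 32-bit words plus a 4x4 byte crossbar.
+# The bank mapping (reg % 4) and the select encoding are assumptions.
+NUM_BANKS = 4
+BYTES_PER_WORD = 4
+
+
+class BankedRegFile:
+    """Register file split into NUM_BANKS banks of 32-bit words."""
+
+    def __init__(self, num_regs: int):
+        assert num_regs % NUM_BANKS == 0
+        # Each bank holds every NUM_BANKS-th register.
+        self.banks = [[0] * (num_regs // NUM_BANKS) for _ in range(NUM_BANKS)]
+
+    @staticmethod
+    def locate(reg: int) -> Tuple[int, int]:
+        # Assumed mapping: low 2 bits pick the bank, the rest pick the row.
+        return reg % NUM_BANKS, reg // NUM_BANKS
+
+    def read(self, reg: int) -> int:
+        bank, row = self.locate(reg)
+        return self.banks[bank][row]
+
+    def write(self, reg: int, value: int) -> None:
+        bank, row = self.locate(reg)
+        self.banks[bank][row] = value & 0xFFFFFFFF  # banks are 32 bits wide
+
+
+def byte_crossbar(word: int, select: List[int]) -> int:
+    """4x4 crossbar: output byte lane i takes input byte lane select[i]."""
+    assert len(select) == BYTES_PER_WORD
+    in_bytes = [(word >> (8 * lane)) & 0xFF for lane in range(BYTES_PER_WORD)]
+    out = 0
+    for lane, src in enumerate(select):
+        out |= in_bytes[src] << (8 * lane)
+    return out
+
+
+def banks_conflict(regs: List[int]) -> bool:
+    """True if two same-cycle reads hit one bank (one read port per bank assumed)."""
+    banks = [BankedRegFile.locate(r)[0] for r in regs]
+    return len(banks) != len(set(banks))
+```
+
+Under these assumptions, reads of r0 and r4 collide in bank 0, whereas
+r0 to r3 can all be read in the same cycle, and
+`byte_crossbar(0x11223344, [3, 2, 1, 0])` routes the bytes in reverse
+order, giving `0x44332211`.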
+