From: Luke Kenneth Casson Leighton
Date: Tue, 5 Dec 2023 15:55:47 +0000 (+0000)
Subject: add improvements section
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=b421f0cc6b35bf138235ca2d3f505611c2dc5a2d;p=libreriscv.git

add improvements section
---

diff --git a/openpower/sv/cookbook/pospopcnt.mdwn b/openpower/sv/cookbook/pospopcnt.mdwn
index 0c3bd5eb4..9ec37adcb 100644
--- a/openpower/sv/cookbook/pospopcnt.mdwn
+++ b/openpower/sv/cookbook/pospopcnt.mdwn
@@ -178,4 +178,22 @@ the correct number of *blocks* (of length 8).
 sv.bc/all 16, *0, -0x28 # reduce CTR by VL and stop if -ve
 ```
 
+# Improvements
+
+There exist many opportunities for parallelism that simpler hardware
+would need to have in order to maximise performance.  On Out-of-Order
+hardware the above extremely simple and very clear algorithm will
+achieve extremely high levels of performance simply by exploiting
+standard Multi-Issue Register Hazard Management.
+
+However simpler hardware - in-order - will need a little bit of
+assistance, and that can come in the form of expanding to QTY4 or
+QTY8 64-bit blocks (so that `sv.popcntd` uses MVL=VL=32 or MVL=VL=64),
+`gbbd` becoming an `sv.gbbd` with VL set to the block count
+(QTY4 or QTY8), and the SV REMAP Parallel Reduction Schedule being
+applied to each intermediary result rather than using an array
+of straight accumulators `r16-r23`.  However this starts to push
+the boundaries of the number of registers needed, so it is left
+as an exercise for another time.
+
 [[!tag svp64_cookbook ]]
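
Reviewer note (not part of the patch): the Improvements text contrasts a linear chain of accumulators with a parallel reduction over several blocks. The Python reference model below is only a sketch of that contrast. It is not SVP64 assembler, and it does not model the actual REMAP schedule order or the register allocation. All function names are hypothetical, and treating the bit-gather step as an 8x8 bit-matrix transpose over one 8-byte block is an assumption made for illustration.

```python
def transpose8x8(block):
    """Bit-matrix transpose of 8 bytes (assumed model of the bit-gather step).

    Output byte i collects bit i of every input byte.
    """
    assert len(block) == 8
    out = [0] * 8
    for j, byte in enumerate(block):
        for i in range(8):
            out[i] |= ((byte >> i) & 1) << j
    return out


def block_counts(block):
    """Per-bit-position counts for one 8-byte block: transpose, then popcount."""
    return [bin(row).count("1") for row in transpose8x8(block)]


def tree_reduce(vectors):
    """Pairwise (log-depth) sum of equal-length count vectors."""
    vectors = list(vectors)
    while len(vectors) > 1:
        nxt = []
        for k in range(0, len(vectors) - 1, 2):
            nxt.append([a + b for a, b in zip(vectors[k], vectors[k + 1])])
        if len(vectors) % 2:
            nxt.append(vectors[-1])  # odd one out is carried to the next level
        vectors = nxt
    return vectors[0]


def pospopcnt_linear(data):
    """One block at a time, added into 8 running accumulators."""
    assert len(data) % 8 == 0
    counts = [0] * 8
    for off in range(0, len(data), 8):
        for i, c in enumerate(block_counts(data[off:off + 8])):
            counts[i] += c
    return counts


def pospopcnt_tree(data, qty=4):
    """qty blocks at a time; their partial counts are combined pairwise."""
    step = 8 * qty
    assert len(data) % step == 0
    counts = [0] * 8
    for off in range(0, len(data), step):
        group = [block_counts(data[o:o + 8]) for o in range(off, off + step, 8)]
        partial = tree_reduce(group)
        counts = [a + b for a, b in zip(counts, partial)]
    return counts
```

Both functions return the same totals. The difference is the dependency structure: the linear form makes every addition wait on the previous one, while the pairwise form lets the additions within one level proceed independently, at the cost of holding `qty` partial results at once.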