From: lkcl
Date: Sun, 16 Apr 2023 10:32:44 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls009_v1~29
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=e61935ea7e5a852d2c98668480259c7fc01ab1b2;p=libreriscv.git

---

diff --git a/openpower/sv/remap/appendix.mdwn b/openpower/sv/remap/appendix.mdwn
index 8c67325ab..7e6945e85 100644
--- a/openpower/sv/remap/appendix.mdwn
+++ b/openpower/sv/remap/appendix.mdwn
@@ -98,7 +98,7 @@ If the programmer can find any algorithm
 which has identical triple nesting then the FFT Schedule may be used
 even there.
 
-# 4x4 Matrix to vec4 Multiply Example
+# 4x4 Matrix to vec4 Multiply (4x4 by 1x4)
 
 The following settings will allow a 4x4 matrix (starting at f8), expressed
 as a sequence of 16 numbers first by row then by column, to be multiplied
@@ -140,14 +140,14 @@ equivalent sequence thus is issued:
     fmac f7, f3, f23, f7
 ```
 
-The only other instruction required is to ensure that f4-f7 are
-initialised (usually to zero).
+Hardware should easily pipeline the above FMACs and as long as each FMAC
+completes in 4 cycles or less there should be 100% sustained throughput,
+from the one single Vector FMAC.
 
-It should be clear that a 4x4 by 4x4 Matrix Multiply, being effectively
-the same technique applied to four independent vectors, can be done by
-setting VL=64, using an extra dimension on the SHAPE0 and SHAPE1 SPRs,
-and applying a rotating 1D SHAPE SPR of xdim=16 to f8 in order to get
-it to apply four times to compute the four columns worth of vectors.
+The only other instruction required is to ensure that f4-f7 are
+initialised (usually to zero) however obviously if used as part
+of some other computation, which is frequently the case, then
+clearly the zeroing is not needed.
 
 [[!tag standards]]
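The throughput claim added by this commit can be checked with a small model. The sketch below is hypothetical Python and is not part of libre-soc or the SVP64 specification. It makes several assumptions:

- f0-f3 hold the vec4.
- f8-f23 hold the matrix row by row.
- f4-f7 are the accumulators.
- `fmac rt, ra, rb, rc` computes rt = ra*rb + rc.
- The 16 FMACs are ordered with the accumulator index varying fastest. This ordering is consistent with the final `fmac f7, f3, f23, f7` in the hunk, but the full sequence is not shown here.

The names `schedule`, `run` and `stall_cycles` are illustrative only. Under a simple in-order, single-issue pipeline model, each accumulator is reused every fourth instruction. A 4-cycle FMAC latency therefore causes no stalls, while a 5-cycle latency stalls once per reuse.

```python
# Hypothetical model of the 16-FMAC schedule for a 4x4 matrix times vec4.
# Assumptions (not taken from the patch): vec4 in f0-f3, matrix row-major in
# f8-f23, accumulators f4-f7, accumulator index varying fastest, and
# "fmac rt, ra, rb, rc" meaning rt = ra * rb + rc.

def schedule():
    """Yield (rt, ra, rb) register numbers for 'fmac rt, ra, rb, rt'."""
    for j in range(4):          # input vector element f0+j
        for i in range(4):      # accumulator / output row f4+i
            yield 4 + i, j, 8 + 4 * i + j


def run(regs):
    """Execute the schedule on a register-file dict: rt = ra * rb + rt."""
    for rt, ra, rb in schedule():
        regs[rt] = regs[ra] * regs[rb] + regs[rt]
    return [regs[4 + i] for i in range(4)]


def stall_cycles(latency):
    """Count stall cycles for an in-order, single-issue pipeline in which an
    FMAC cannot issue until the previous write to its accumulator completes.
    Inputs f0-f3 and f8-f23 are never written, so only rt is tracked."""
    ready = {}   # register -> cycle at which its latest result is available
    cycle = 0    # earliest cycle at which the next instruction could issue
    stalls = 0
    for rt, _ra, _rb in schedule():
        issue = max(cycle, ready.get(rt, 0))
        stalls += issue - cycle
        ready[rt] = issue + latency
        cycle = issue + 1
    return stalls


if __name__ == "__main__":
    regs = {r: 0.0 for r in range(24)}        # f4-f7 zero-initialised
    for j, v in enumerate([1.0, 2.0, 3.0, 4.0]):
        regs[j] = v                           # vec4 in f0-f3
    for i in range(4):
        for j in range(4):
            regs[8 + 4 * i + j] = 4 * i + j + 1.0   # matrix rows in f8-f23
    print(run(regs))          # [30.0, 70.0, 110.0, 150.0]
    print(stall_cycles(4))    # 0
    print(stall_cycles(5))    # 3
```

Zeroing f4-f7 before `run` corresponds to the initialisation the patch mentions. If the accumulators already hold a partial result from some other computation, that step can be skipped and the FMACs simply add into it.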