From: lkcl
Date: Sun, 16 Apr 2023 10:34:19 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls009_v1~28
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=44abf4016b6ea62b016c5874e0970c55229f39db;p=libreriscv.git

---

diff --git a/openpower/sv/remap/appendix.mdwn b/openpower/sv/remap/appendix.mdwn
index 7e6945e85..0ef9e0b38 100644
--- a/openpower/sv/remap/appendix.mdwn
+++ b/openpower/sv/remap/appendix.mdwn
@@ -78,27 +78,8 @@ pipeline overlaps.
 Out-of-order / Superscalar micro-architectures with register-renaming will have
 an easier time dealing with this than DSP-style SIMD micro-architectures.
 
-## REMAP FFT, DFT, NTT
-
-The algorithm from a later section of this Appendix shows how FFT REMAP works,
-and it may be executed as a standalone python3 program.
-The executable code is designed to illustrate how a hardware
-implementation may generate Indices which are completely
-independent of the Execution of element-level operations,
-even for something as complex as a Triple-loop Tukey-Cooley
-Schedule. A comprehensive demo and test suite may be found
-[here](https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_fft.py;hb=HEAD)
-including Complex Number FFT which deploys Vertical-First Mode
-on top of the REMAP Schedules.
-
-Other uses include more than DFT and NTT: as abstracted RISC-paradigm
-the Schedules are not
-restricted in any way or tied to any particular instructtion.
-If the programmer can find any algorithm
-which has identical triple nesting then the FFT Schedule may be
-used even there.
-# 4x4 Matrix to vec4 Multiply (4x4 by 1x4)
+### 4x4 Matrix to vec4 Multiply (4x4 by 1x4)
 
 The following settings will allow a 4x4 matrix (starting at f8), expressed as a
 sequence of 16 numbers first by row then by column, to be multiplied
@@ -149,6 +130,26 @@ initialised (usually to zero) however obviously if used as part of some
 other computation, which is frequently the case, then clearly the zeroing
 is not needed.
 
+## REMAP FFT, DFT, NTT
+
+The algorithm from a later section of this Appendix shows how FFT REMAP works,
+and it may be executed as a standalone python3 program.
+The executable code is designed to illustrate how a hardware
+implementation may generate Indices which are completely
+independent of the Execution of element-level operations,
+even for something as complex as a Triple-loop Tukey-Cooley
+Schedule. A comprehensive demo and test suite may be found
+[here](https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_fft.py;hb=HEAD)
+including Complex Number FFT which deploys Vertical-First Mode
+on top of the REMAP Schedules.
+
+Other uses include more than DFT and NTT: as abstracted RISC-paradigm
+the Schedules are not
+restricted in any way or tied to any particular instructtion.
+If the programmer can find any algorithm
+which has identical triple nesting then the FFT Schedule may be
+used even there.
+
 [[!tag standards]]
 
 ---------