From cd8e492dba7209c609d07865187f596e21b802d4 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sat, 8 Apr 2023 11:32:42 +0100
Subject: [PATCH]

---
 openpower/sv/rfc/ls012.mdwn | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/openpower/sv/rfc/ls012.mdwn b/openpower/sv/rfc/ls012.mdwn
index 59ac55c23..395863cf2 100644
--- a/openpower/sv/rfc/ls012.mdwn
+++ b/openpower/sv/rfc/ls012.mdwn
@@ -15,7 +15,8 @@ months.
 *It is expected that readers visit and interact with the Libre-SOC resources in order to do due-diligence on the prioritisation evaluation. Otherwise the ISA WG is overwhelmed by piecemeal RFCs that may turn out not
-to be useful, against a background of having no guiding overview*.
+to be useful, against a background of having no guiding overview
+or pre-filtering*.
 Worth bearing in mind during evaluation that every "Defined Word" may or may not be Vectoriseable, but that every "Defined Word"
@@ -163,7 +164,24 @@
 implement the VSX PackedSIMD paradigm, it becomes necessary to upgrade SFFS such that it is stand-alone capable. One omission based on the assumption that VSX would always be present is an equivalent to `xvtstdcsp`.
+## (f)mv.swizzle
+[[sv/mv.swizzle]] is dicey. It is a 2-in 2-out operation whose value as a Scalar
+instruction is limited *except* if combined with `cmpi` and SVP64Single
+Predication, whereupon the end result is the RISC-synthesis of Compare-and-Swap,
+in two instructions.
+
+Where this instruction comes into its full value is when Vectorised. 3D GPU
+and HPC numerical workloads astonishingly contain between 10 to 15% swizzle
+operations: access to YYZ, XY, of an XYZW Quaternion, performing balancing
+of ARGB pixel data. The usage is so high that 3D GPU ISAs make Swizzle a first-class
+priority in their VLIW words. Even 64-bit Embedded GPU ISAs have a staggering
+24-bits dedicated to 2-operand Swizzle.
+
+So as not to radicalise the Power ISA the Libre-SOC team decided to introduce
+mv Swizzle operations, which can always be Macro-op fused in exactly the same
+way that ARM SVE predicated-move extends 3-operand "overwrite" opcodes to full
+independent 3-in 1-out.
--
2.30.2
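
For readers unfamiliar with the term, the swizzle patterns named in the added section (YYZ and XY of an XYZW vector, reordering of ARGB pixel data) can be illustrated with a small Python sketch. This is only a model of the general lane-selection idea: the function name `swizzle` and its `names` parameter are hypothetical, and nothing here reflects the operand encoding or the 2-in 2-out semantics of the proposed `mv.swizzle` instruction.

```python
def swizzle(vec, pattern, names="XYZW"):
    """Select elements of vec by lane name, in the order given by pattern.

    Hypothetical helper for illustration only; it is not taken from the
    Libre-SOC specification.
    """
    # Map each lane name to its position, e.g. X -> 0, Y -> 1, ...
    index = {name: i for i, name in enumerate(names)}
    # A lane may be repeated (YYZ) or omitted (XY) in the pattern.
    return [vec[index[lane]] for lane in pattern]


xyzw = [1.0, 2.0, 3.0, 4.0]
yyz = swizzle(xyzw, "YYZ")   # duplicates the Y lane, drops X and W
xy = swizzle(xyzw, "XY")     # keeps only the first two lanes

argb = [0xFF, 0x10, 0x20, 0x30]
bgra = swizzle(argb, "BGRA", names="ARGB")  # reorders pixel channels
```

A pattern may be shorter than the source vector and may repeat lanes, which is why a swizzle is more general than a simple permutation.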