From cd8e492dba7209c609d07865187f596e21b802d4 Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Sat, 8 Apr 2023 11:32:42 +0100
Subject: [PATCH]

---
 openpower/sv/rfc/ls012.mdwn | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/openpower/sv/rfc/ls012.mdwn b/openpower/sv/rfc/ls012.mdwn
index 59ac55c23..395863cf2 100644
--- a/openpower/sv/rfc/ls012.mdwn
+++ b/openpower/sv/rfc/ls012.mdwn
@@ -15,7 +15,8 @@ months.
 *It is expected that readers visit and interact with the Libre-SOC resources
 in order to do due-diligence on the prioritisation evaluation. Otherwise
 the ISA WG is overwhelmed by piecemeal RFCs that may turn out not
-to be useful, against a background of having no guiding overview*.
+to be useful, against a background of having no guiding overview
+or pre-filtering*.
 
 Worth bearing in mind during evaluation that every "Defined
 Word" may or may not be Vectoriseable, but that every "Defined Word"
@@ -163,7 +164,24 @@ implement the VSX PackedSIMD paradigm, it becomes necessary to upgrade SFFS
 such that it is stand-alone capable. One omission based on the assumption
 that VSX would always be present is an equivalent to `xvtstdcsp`.
 
+## (f)mv.swizzle
 
+[[sv/mv.swizzle]] is dicey. It is a 2-in 2-out operation whose value as a Scalar
+instruction is limited *except* when combined with `cmpi` and SVP64Single
+Predication, whereupon the end result is the RISC-synthesis of Compare-and-Swap,
+in two instructions.
+
+Where this instruction comes into its full value is when Vectorised.  3D GPU
+and HPC numerical workloads astonishingly contain between 10 and 15% swizzle
+operations: access to YYZ or XY of an XYZW Quaternion, or balancing
+of ARGB pixel data. The usage is so high that 3D GPU ISAs make Swizzle a first-class
+priority in their VLIW words. Even 64-bit Embedded GPU ISAs have a staggering
+24 bits dedicated to 2-operand Swizzle.
+
+So as not to radicalise the Power ISA, the Libre-SOC team decided to introduce
+mv Swizzle operations, which can always be Macro-op fused in exactly the same
+way that ARM SVE predicated-move extends 3-operand "overwrite" opcodes to full
+independent 3-in 1-out.
 
 
 
-- 
2.30.2