From 3d09f43519d76ef22e34de26cc0ebea0aeab9888 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Tue, 11 Apr 2023 18:05:33 +0100
Subject: [PATCH] add Vectorisation note

---
 openpower/sv/rfc/ls012.mdwn | 38 +++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/openpower/sv/rfc/ls012.mdwn b/openpower/sv/rfc/ls012.mdwn
index c376ed275..43c3226e5 100644
--- a/openpower/sv/rfc/ls012.mdwn
+++ b/openpower/sv/rfc/ls012.mdwn
@@ -390,6 +390,44 @@ offset computation, thus they are best placed in EXT0xx.
 
 \newpage{}
 
+# Vectorisation: SVP64 and SVP64Single
+
+To be submitted as part of [[ls001]], [[ls008]], [[ls009]] and [[ls010]],
+with SVP64Single to follow in a subsequent RFC, SVP64 is conceptually
+identical to the 50+ year old 8080 `REP` instruction and the Zilog Z80
+`CPIR` and `LDIR` instructions.  Parallelism is best achieved by exploiting
+a Multi-Issue Out-of-Order Micro-architecture.  It is extremely important
+to bear in mind that at no time does SVP64 add even one single actual
+Vector instruction.  It is a *pure* RISC-paradigm Prefixing concept only.
+
+This has some implications which need unpacking.  Firstly: in the future,
+the Prefixing may be applied to VSX.  The only reason it was not included
+in the initial proposal of SVP64 is because due to the number of VSX
+instructions the Due Diligence required is obviously five times higher
+than the 3+ years work done so far on the SFFS Subset.
+
+Secondly: **any** Scalar instruction involving registers **automatically**
+becomes a candidate for Vector-Prefixing.  This in turn means that when
+a new instruction is proposed, it becomes a hard requirement to consider
+not only the implications of its inclusion as a Scalar-only instruction,
+but how it will best be utilised as a Vectorised instruction **as well**.
+Extreme examples of this are the Big-Integer 3-in 2-out instructions that
+use one 64-bit register effectively as a Carry-in and Carry-out. The
+instructions were designed in a *Scalar* context to be inline-efficient
+in hardware (use of Operand-Forwarding to reduce the chain down to 2-in 1-out),
+but in a *Vector* context it is extremely straightforward to Micro-code
+an entire batch onto 128-bit SIMD pipelines, 256-bit SIMD pipelines, and
+to perform a large internal Forward-Carry-Propagation on for example the
+Vectorised-Multiply instruction.
+
+Thirdly: as far as Opcode Allocation is concerned, SVP64 needs to be
+considered as an independent stand-alone instruction (just like `REP`).
+In other words, the Suffix **never** gets decoded as a completely different
+instruction just because of the Prefix.  The cost of doing so is simply
+too high in hardware.
+
+--------
+
 # Guidance for evaluation
 
 Deciding which instructions go into an ISA is extremely complex, costly,
-- 
2.30.2