add Vectorisation note

diff --git a/openpower/sv/rfc/ls012.mdwn b/openpower/sv/rfc/ls012.mdwn

index b4fdc7a9d6a30a778333fbf82aa060a82e8a6beb..43c3226e5026739a99d8b81dd27f845e0ef1bf0c 100644
--- a/openpower/sv/rfc/ls012.mdwn
+++ b/openpower/sv/rfc/ls012.mdwn
@@ -46,17 +46,18 @@ or may not be Vectoriseable, but that every "Defined Word" should have
  merits on its own, not just when Vectorised.  An example of a borderline
  Vectoriseable Defined Word is `mv.swizzle` which only really becomes
  high-priority for Audio/Video, Vector GPU and HPC Workloads, but has
-less merit as a Scalar-only operation.
+less merit as a Scalar-only operation; yet, when SVP64Single-Prefixed,
+it can be part of an atomic Compare-and-Swap sequence.
  
  Although one of the top world-class ISAs,
  Power ISA Scalar (SFFS) has not been significantly advanced in 12
  years: IBM's primary focus has understandably been on PackedSIMD VSX.
  Unfortunately, with VSX being 914 instructions and 128-bit it is far too
-much for any new team to consider (10 years development effort) and far
+much for any new team to consider (10+ years development effort) and far
  outside of Embedded or Tablet/Desktop/Laptop power budgets. Thus bringing
  Power Scalar up-to-date to modern standards *and on its own merits*
  is a reasonable goal, and the advantage of the reduced focus is that
-SFFS remains RISC-paradigm, and that lessons can be learned from other
+SFFS remains RISC-paradigm, with lessons being learned from other
  ISAs from the intervening years.  Good examples here include `bmask`.
  
  SVP64 Prefixing - also known by the terms "Zero-Overhead-Loop-Prefixing"
@@ -389,13 +390,54 @@ offset computation, thus they are best placed in EXT0xx.
  
  \newpage{}
  
+# Vectorisation: SVP64 and SVP64Single
+
+To be submitted as part of [[ls001]], [[ls008]], [[ls009]] and [[ls010]],
+with SVP64Single to follow in a subsequent RFC, SVP64 is conceptually
+identical to the 40+ year old Intel 8086 `REP` prefix and the Zilog Z80
+`CPIR` and `LDIR` instructions.  Parallelism is best achieved by exploiting
+a Multi-Issue Out-of-Order Micro-architecture.  It is extremely important
+to bear in mind that at no time does SVP64 add even one single actual
+Vector instruction.  It is a *pure* RISC-paradigm Prefixing concept only.
+
+This has some implications which need unpacking.  Firstly: in the future,
+the Prefixing may be applied to VSX.  The only reason it was not included
+in the initial proposal of SVP64 is that, given the number of VSX
+instructions, the Due Diligence required is around five times higher
+than the 3+ years of work done so far on the SFFS Subset.
+
+Secondly: **any** Scalar instruction involving registers **automatically**
+becomes a candidate for Vector-Prefixing.  This in turn means that when
+a new instruction is proposed, it becomes a hard requirement to consider
+not only the implications of its inclusion as a Scalar-only instruction,
+but how it will best be utilised as a Vectorised instruction **as well**.
+Extreme examples of this are the Big-Integer 3-in 2-out instructions that
+use one 64-bit register effectively as a Carry-in and Carry-out. The
+instructions were designed in a *Scalar* context to be inline-efficient
+in hardware (use of Operand-Forwarding to reduce the chain down to 2-in 1-out),
+but in a *Vector* context it is extremely straightforward to Micro-code
+an entire batch onto 128-bit or 256-bit SIMD pipelines, and
+to perform a large internal Forward-Carry-Propagation on, for example, the
+Vectorised-Multiply instruction.
+
+Thirdly: as far as Opcode Allocation is concerned, SVP64 needs to be
+considered as an independent stand-alone instruction (just like `REP`).
+In other words, the Suffix **never** gets decoded as a completely different
+instruction just because of the Prefix.  The cost of doing so is simply
+too high in hardware.
+
+--------
+
  # Guidance for evaluation
  
  Deciding which instructions go into an ISA is extremely complex, costly,
  and a huge responsibility. In public standards mistakes are irrevocable,
  and in the case of an ISA the Opcode Allocation is a finite resource,
  meaning that mistakes punish future instructions as well.  This section
-therefore provides some Evaluation Guidance on the decision process.
+therefore provides some Evaluation Guidance on the decision process,
+particularly for people new to ISA development, given that this RFC
+is circulated widely and publicly.  Constructive feedback from experienced
+ISA Architects is welcomed to improve this section.
  
  **Does anyone want it?**
  
@@ -564,10 +606,13 @@ The key to headings and sections are as follows:
    instead.  see [[sv/po9_encoding]].
  * **regs** - a guide to register usage, to how costly Hazard Management
    will be, in hardware:
-  - 1R: reads one GPR/FPR/SPR/CR.
-  - 1W: writes one GPR/FPR/SPR/CR.
-  - 1r: reads one CR *Field* (not necessarily the entire CR)
-  - 1w: writes one CR *Field* (not necessarily the entire CR)
+
+```
+     - 1R: reads one GPR/FPR/SPR/CR.
+     - 1W: writes one GPR/FPR/SPR/CR.
+     - 1r: reads one CR *Field* (not necessarily the entire CR)
+     - 1w: writes one CR *Field* (not necessarily the entire CR)
+```
  
  [[!inline pages="openpower/sv/rfc/ls012/areas.mdwn" raw=yes ]]
  [[!inline pages="openpower/sv/rfc/ls012/xo_cost.mdwn" raw=yes ]]
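
Note on the new Vectorisation section: the "pure Prefixing" idea and the Big-Integer 3-in 2-out carry chain it mentions can be illustrated with a small behavioural model. The following is only a sketch under stated assumptions, not the SVP64 specification: `mul_add_carry`, `prefixed_loop`, `to_limbs` and `from_limbs` are hypothetical names invented here, and element width, predication and register-file mapping are all ignored. It shows an unmodified scalar operation being repeated `vl` times by a `REP`-style wrapper, with one 64-bit value acting as both Carry-in and Carry-out between elements.

```python
MASK64 = (1 << 64) - 1


def mul_add_carry(ra, rb, carry_in):
    """Hypothetical scalar 3-in 2-out operation (illustrative only).

    Computes ra * rb + carry_in as a 128-bit value and returns the low
    64 bits plus the high 64 bits, the latter serving as the carry-out.
    """
    product = ra * rb + carry_in
    return product & MASK64, product >> 64


def prefixed_loop(scalar_op, vec_a, rb, carry, vl):
    """Assumed REP-style wrapper: repeat the *unchanged* scalar op vl times.

    The scalar operation is never reinterpreted; the loop only steps
    through elements and threads the carry from one element to the next.
    """
    result = []
    for i in range(vl):
        low, carry = scalar_op(vec_a[i], rb, carry)
        result.append(low)
    return result, carry


def to_limbs(value, count):
    """Split a non-negative integer into `count` 64-bit limbs, LSB first."""
    return [(value >> (64 * i)) & MASK64 for i in range(count)]


def from_limbs(limbs):
    """Recombine 64-bit limbs (LSB first) into a single integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


# Usage: multiply a 256-bit big integer by a 64-bit scalar.
big = (1 << 200) + 12345
scalar = MASK64
limbs, carry_out = prefixed_loop(mul_add_carry, to_limbs(big, 4), scalar, 0, 4)
total = from_limbs(limbs) + (carry_out << 256)
```

In this model the carry-out always fits in 64 bits, because the largest possible value of `ra * rb + carry_in` is below 2^128. A hardware implementation would be free to batch several elements onto wider pipelines; the sequential loop here only models the architectural result.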