merits on its own, not just when Vectorised. An example of a borderline
Vectoriseable Defined Word is `mv.swizzle` which only really becomes
high-priority for Audio/Video, Vector GPU and HPC Workloads, but has
-less merit as a Scalar-only operation.
+less merit as a Scalar-only operation, yet, when SVP64Single-Prefixed,
+can be part of an atomic Compare-and-Swap sequence.
Although one of the top world-class ISAs,
Power ISA Scalar (SFFS) has not been significantly advanced in 12
years: IBM's primary focus has understandably been on PackedSIMD VSX.
Unfortunately, with VSX being 914 instructions and 128-bit it is far too
-much for any new team to consider (10 years development effort) and far
+much for any new team to consider (10+ years development effort) and far
outside of Embedded or Tablet/Desktop/Laptop power budgets. Thus bringing
Power Scalar up-to-date to modern standards *and on its own merits*
is a reasonable goal, and the advantages of the reduced focus is that
-SFFS remains RISC-paradigm, and that lessons can be learned from other
+SFFS remains RISC-paradigm, with lessons being learned from other
ISAs from the intervening years. Good examples here include `bmask`.
SVP64 Prefixing - also known by the terms "Zero-Overhead-Loop-Prefixing"
\newpage{}
+# Vectorisation: SVP64 and SVP64Single
+
+To be submitted as part of [[ls001]], [[ls008]], [[ls009]] and [[ls010]],
+with SVP64Single to follow in a subsequent RFC, SVP64 is conceptually
+identical to the 40+ year old 8086 `REP` instruction and the Zilog Z80
+`CPIR` and `LDIR` instructions. Parallelism is best achieved by exploiting
+a Multi-Issue Out-of-Order Micro-architecture. It is extremely important
+to bear in mind that at no time does SVP64 add even one single actual
+Vector instruction. It is a *pure* RISC-paradigm Prefixing concept only.
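
The "Prefixing only" idea above can be illustrated with a minimal sketch. It is a toy model, not the SVP64 specification: the names `scalar_add`, `sv_prefixed` and `vl` are hypothetical, and it assumes every operand is marked as a Vector so that each register number simply steps by the element index. The point it tries to show is that the Scalar operation itself is never altered; the Prefix only wraps it in a loop, much as `REP` does.

```python
MASK64 = (1 << 64) - 1

def scalar_add(regs, rt, ra, rb):
    """An unmodified Scalar operation (hypothetical): regs[rt] = regs[ra] + regs[rb]."""
    regs[rt] = (regs[ra] + regs[rb]) & MASK64

def sv_prefixed(scalar_op, regs, vl, rt, ra, rb):
    """Toy model of a Prefix: issue the same Scalar op vl times,
    stepping each register number by the element index.
    Assumption: all three operands are treated as Vectors."""
    for i in range(vl):
        scalar_op(regs, rt + i, ra + i, rb + i)

regs = [0] * 32
regs[8:12] = [1, 2, 3, 4]          # first source vector
regs[16:20] = [10, 20, 30, 40]     # second source vector
sv_prefixed(scalar_add, regs, 4, 24, 8, 16)
result = regs[24:28]               # four element-wise sums
```

Because `scalar_add` is passed in unchanged, any other register-based Scalar operation could be substituted without touching the loop. A Multi-Issue Out-of-Order implementation would be free to execute the loop iterations in parallel where no dependencies exist between elements.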
+
+This has some implications which need unpacking. Firstly: in the future,
+the Prefixing may be applied to VSX. The only reason it was not included
+in the initial proposal of SVP64 is that, due to the number of VSX
+instructions, the Due Diligence required is obviously five times higher
+than the 3+ years of work done so far on the SFFS Subset.
+
+Secondly: **any** Scalar instruction involving registers **automatically**
+becomes a candidate for Vector-Prefixing. This in turn means that when
+a new instruction is proposed, it becomes a hard requirement to consider
+not only the implications of its inclusion as a Scalar-only instruction,
+but how it will best be utilised as a Vectorised instruction **as well**.
+Extreme examples of this are the Big-Integer 3-in 2-out instructions that
+use one 64-bit register effectively as a Carry-in and Carry-out. The
+instructions were designed in a *Scalar* context to be inline-efficient
+in hardware (use of Operand-Forwarding to reduce the chain down to 2-in 1-out),
+but in a *Vector* context it is extremely straightforward to Micro-code
+an entire batch onto 128-bit or 256-bit SIMD pipelines, and
+to perform a large internal Forward-Carry-Propagation on, for example, the
+Vectorised-Multiply instruction.
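
As a rough illustration of the 3-in 2-out pattern described above, the following sketch models a hypothetical multiply-and-add operation, `muladd_carry`, that takes a 64-bit carry-in and produces a 64-bit carry-out. Neither the name nor the exact semantics are taken from the proposal; they are assumptions chosen to show how chaining the same Scalar operation across elements yields a big-integer by 64-bit-word multiply.

```python
MASK64 = (1 << 64) - 1

def muladd_carry(a, b, carry_in):
    """Hypothetical 3-in 2-out Scalar op.
    Returns (low 64 bits of a*b + carry_in, high 64 bits as carry-out)."""
    prod = a * b + carry_in          # always fits in 128 bits
    return prod & MASK64, prod >> 64

def bigint_mul_word(words, scalar):
    """Multiply a little-endian list of 64-bit words by one 64-bit scalar,
    feeding each element's carry-out into the next element's carry-in."""
    carry = 0
    out = []
    for w in words:
        lo, carry = muladd_carry(w, scalar, carry)
        out.append(lo)
    out.append(carry)                # final carry becomes the top word
    return out

def to_int(words):
    """Helper for checking: reassemble little-endian 64-bit words."""
    return sum(w << (64 * i) for i, w in enumerate(words))

x = [MASK64, MASK64, 1]
y = 0xDEADBEEFCAFEF00D
product = bigint_mul_word(x, y)
```

The loop in `bigint_mul_word` is sequential only because of the carry chain. That chain is what hardware could resolve internally when a whole batch of elements is issued onto a wide SIMD pipeline.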
+
+Thirdly: as far as Opcode Allocation is concerned, SVP64 needs to be
+considered as an independent stand-alone instruction (just like `REP`).
+In other words, the Suffix **never** gets decoded as a completely different
+instruction just because of the Prefix. The cost of doing so is simply
+too high in hardware.
+
+--------
+
# Guidance for evaluation
Deciding which instructions go into an ISA is extremely complex, costly,
and a huge responsibility. In public standards mistakes are irrevocable,
and in the case of an ISA the Opcode Allocation is a finite resource,
meaning that mistakes punish future instructions as well. This section
-therefore provides some Evaluation Guidance on the decision process.
+therefore provides some Evaluation Guidance on the decision process,
+particularly for people new to ISA development, given that this RFC
+is circulated widely and publicly. Constructive feedback from experienced
+ISA Architects is welcome, to improve this section.
**Does anyone want it?**
instead. see [[sv/po9_encoding]].
* **regs** - a guide to register usage, to how costly Hazard Management
will be, in hardware:
- - 1R: reads one GPR/FPR/SPR/CR.
- - 1W: writes one GPR/FPR/SPR/CR.
- - 1r: reads one CR *Field* (not necessarily the entire CR)
- - 1w: writes one CR *Field* (not necessarily the entire CR)
+
+```
+ - 1R: reads one GPR/FPR/SPR/CR.
+ - 1W: writes one GPR/FPR/SPR/CR.
+ - 1r: reads one CR *Field* (not necessarily the entire CR)
+ - 1w: writes one CR *Field* (not necessarily the entire CR)
+```
[[!inline pages="openpower/sv/rfc/ls012/areas.mdwn" raw=yes ]]
[[!inline pages="openpower/sv/rfc/ls012/xo_cost.mdwn" raw=yes ]]