From 3d09f43519d76ef22e34de26cc0ebea0aeab9888 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Tue, 11 Apr 2023 18:05:33 +0100
Subject: [PATCH] add Vectorisation note

---
 openpower/sv/rfc/ls012.mdwn | 38 +++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/openpower/sv/rfc/ls012.mdwn b/openpower/sv/rfc/ls012.mdwn
index c376ed275..43c3226e5 100644
--- a/openpower/sv/rfc/ls012.mdwn
+++ b/openpower/sv/rfc/ls012.mdwn
@@ -390,6 +390,44 @@ offset computation, thus they are best placed in EXT0xx.
 
 \newpage{}
 
+# Vectorisation: SVP64 and SVP64Single
+
+SVP64 is to be submitted as part of [[ls001]], [[ls008]], [[ls009]] and
+[[ls010]], with SVP64Single to follow in a subsequent RFC. SVP64 is
+conceptually identical to the 45+ year old 8086 `REP` prefix and the Zilog
+Z80 `CPIR` and `LDIR` instructions. Parallelism is best achieved by exploiting
+a Multi-Issue Out-of-Order Micro-architecture. It is extremely important
+to bear in mind that at no time does SVP64 add even one single actual
+Vector instruction. It is a *pure* RISC-paradigm Prefixing concept only.
+
+This has some implications which need unpacking. Firstly: in the future,
+the Prefixing may be applied to VSX. The only reason it was not included
+in the initial proposal of SVP64 is that, due to the number of VSX
+instructions, the Due Diligence required is five times higher
+than the 3+ years of work done so far on the SFFS Subset.
+
+Secondly: **any** Scalar instruction involving registers **automatically**
+becomes a candidate for Vector-Prefixing. This in turn means that when
+a new instruction is proposed, it becomes a hard requirement to consider
+not only the implications of its inclusion as a Scalar-only instruction,
+but how it will best be utilised as a Vectorised instruction **as well**.
+Extreme examples of this are the Big-Integer 3-in 2-out instructions that
+use one 64-bit register effectively as both Carry-in and Carry-out. The
+instructions were designed in a *Scalar* context to be inline-efficient
+in hardware (use of Operand-Forwarding to reduce the chain down to 2-in 1-out),
+but in a *Vector* context it is extremely straightforward to Micro-code
+an entire batch onto 128-bit or 256-bit SIMD pipelines, and
+to perform a large internal Forward-Carry-Propagation on for example the
+Vectorised-Multiply instruction.
+
+Thirdly: as far as Opcode Allocation is concerned, SVP64 needs to be
+considered as an independent stand-alone instruction (just like `REP`).
+In other words, the Suffix **never** gets decoded as a completely different
+instruction just because of the Prefix. The cost of doing so is simply
+too high in hardware.
+
+--------
+
 # Guidance for evaluation
 
 Deciding which instructions go into an ISA is extremely complex, costly,
-- 
2.30.2
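
The "Secondly" point in the Vectorisation note (a Scalar 3-in 2-out Big-Integer instruction becoming a Vector operation purely through Prefixing) can be illustrated with a minimal Python sketch. It is a behavioural model only, not the SVP64 specification: the names `mul_add_carry` and `vector_prefixed`, the argument order, and the little-endian limb layout are all assumptions made for illustration and do not appear in the patch.

```python
MASK64 = (1 << 64) - 1


def mul_add_carry(a, b, carry_in):
    """Hypothetical Scalar 3-in 2-out operation (name is illustrative only).

    Computes a * b + carry_in on 64-bit inputs and returns the low 64 bits
    plus the high 64 bits, the latter acting as the Carry-out.
    """
    product = a * b + carry_in
    return product & MASK64, product >> 64


def vector_prefixed(op, vl, a_elems, b, carry):
    """Hypothetical model of Prefixing: repeat one Scalar op for VL elements.

    The same register-sized `carry` value is fed from each element to the
    next, which is the Carry-in/Carry-out chain described in the note.
    """
    results = []
    for i in range(vl):
        low, carry = op(a_elems[i], b, carry)
        results.append(low)
    return results, carry


def to_limbs(n, count):
    """Split a non-negative integer into `count` little-endian 64-bit limbs."""
    return [(n >> (64 * i)) & MASK64 for i in range(count)]


def from_limbs(limbs):
    """Reassemble little-endian 64-bit limbs into one integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


# Usage: multiply a 256-bit integer by a 64-bit scalar, one limb per element.
big = (0x0123456789ABCDEF << 192) | (0xFFFFFFFFFFFFFFFF << 64) | 0xDEADBEEF
scalar = 0xFEDCBA9876543210
limbs, carry_out = vector_prefixed(mul_add_carry, 4, to_limbs(big, 4), scalar, 0)
assert from_limbs(limbs) + (carry_out << 256) == big * scalar
```

The loop body is the unmodified Scalar operation; only the element index and the chained carry change between iterations. That is the sense in which no new Vector instruction is added, and it is also why hardware is free to batch several elements onto a wider pipeline as long as the carry chain is preserved.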