From 05713681981e159c0b65e0f0309e7c8b5555249a Mon Sep 17 00:00:00 2001
From: lkcl
Date: Fri, 17 Sep 2021 14:51:17 +0100
Subject: [PATCH]

---
 openpower/sv/svp64/appendix.mdwn | 34 ++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn
index 3e49d6a90..4693bdd52 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -166,7 +166,9 @@ followed by
 Reduction in SVP64 is deterministic and somewhat of a misnomer. A normal
 Vector ISA would have explicit Reduce opcodes with defined characteristics
 per operation: in SX Aurora there is even an additional scalar argument
-containing the initial reduction value. SVP64 fundamentally has to
+containing the initial reduction value, and the default is either 0
+or 1 depending on the specifics of the explicit opcode.
+SVP64 fundamentally has to
 utilise *existing* Scalar Power ISA v3.0B operations, which presents some
 unique challenges.
 
@@ -185,6 +187,9 @@ but for Floating Point it is not permitted due to different
 results being obtained if the reduction is not executed in strict
 sequential order.
 
+In essence it becomes the programmer's responsibility to leverage the
+pre-determined schedules to desired effect.
+
 ## Scalar result reduce mode
 
 Scalar Reduction per se does not exist, instead is implemented in SVP64
@@ -193,9 +198,12 @@ Looping which would terminate if the destination was marked as a Scalar.
 Scalar Reduction by contrast *keeps issuing Vector Element Operations*
 even though the destination register is marked as scalar.
 Thus it is up to the programmer to be aware of this and observe some
-conventions. It is also important to appreciate that there is no
+conventions.
+
+It is also important to appreciate that there is no
 actual imposition or restriction on how this mode is utilised: there
-will therefore be several valuable uses (including Vector Iteration)
+will therefore be several valuable uses (including Vector Iteration
+and "Reverse-Gear")
 and it is up to the programmer to make best use of the capability
 provided.
 
@@ -205,7 +213,9 @@ Scalar reduction is thus categorised by:
 
 * One of the sources is a Vector
 * the destination is a scalar
-* optionally but most usefully when one source register is also the destination
+* optionally but most usefully when one source scalar register is
+  also the scalar destination (which may be informally termed
+  the "accumulator")
 * That the source register type is the same as the destination register
   type identified as the "accumulator". scalar reduction on `cmp`,
   `setb` or `isel` makes no sense for example because of the mixture
@@ -221,7 +231,8 @@ Implementors **MAY** choose to optimise such instructions in instances
 where their use results in "extraneous execution", i.e. where it is clear
 that the sequence of operations, comprising multiple overwrites to
 a scalar destination **without** cumulative, iterative, or reductive
-behaviour, may discard all but the last element operation. Identification
+behaviour (no "accumulator"), may discard all but the last element
+operation. Identification
 of such is trivial to do for `setb` and `cmp`: the source register type is
 a completely different register file from the destination*
 
@@ -238,11 +249,14 @@ However, *unless* the operation is marked as "mapreduce", SV ordinarily
 operation as "mapreduce" will it continue to issue multiple sub-looped
 (element) instructions in `Program Order`.
 
-To.perform the loop in reverse order, the ```RG``` (reverse gear) bit must be set. This is useful for leaving a cumulative suffix sum in reverse order:
-
-    for i in (VL-1 downto 0):
-        # RT-1 = RA gives a suffix sum
-        iregs[RT+i] = iregs[RA+i] - iregs[RB+i]
+To perform the loop in reverse order, the ```RG``` (reverse gear) bit must be set. This may be useful in situations where the results may be different
+(floating-point) if executed in a different order. Given that there is
+no actual prohibition on Reduce Mode being applied when the destination
+is a Vector, the "Reverse Gear" bit turns out to be a way to apply Iterative
+or Cumulative Vector operations in reverse. `sv.add/rg r3.v, r4.v, r4.v`
+for example will start at the opposite end of the Vector and push
+a cumulative series of overlapping add operations into the Execution units of
+the underlying hardware.
 
 Other examples include shift-mask operations where a Vector of inserts
 into a single destination register is required, as a way to construct
-- 
2.30.2
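
The element loop that the patched text describes for Reduce Mode can be sketched in Python. This is only an illustrative model, not the SVP64 specification: the function `element_loop_add`, its `vec` tuple and the flat `regs` list are hypothetical names introduced here, and predication, element width and every other SVP64 feature are deliberately left out. It assumes that a scalar register keeps the same register number on every iteration while a Vector register advances by the element index, and that Reverse Gear simply walks the element index from `VL-1` down to `0`.

```python
def element_loop_add(regs, rt, ra, rb, vl,
                     vec=(False, False, False), reverse_gear=False):
    """Hypothetical model of an `add` element loop in reduce mode.

    vec marks (RT, RA, RB) as Vector (True) or scalar (False).
    Assumption: the loop keeps issuing element operations even
    when the destination is scalar, as the text describes.
    """
    rt_vec, ra_vec, rb_vec = vec
    # Reverse Gear: start at the opposite end of the Vector.
    order = range(vl - 1, -1, -1) if reverse_gear else range(vl)
    for i in order:
        dst = rt + i if rt_vec else rt
        src_a = ra + i if ra_vec else ra
        src_b = rb + i if rb_vec else rb
        # Each element operation sees the results of earlier ones.
        regs[dst] = regs[src_a] + regs[src_b]
    return regs


def demo_scalar_reduce():
    # Scalar destination r3 doubles as a scalar source: the "accumulator".
    regs = [0] * 16
    regs[4:8] = [1, 2, 3, 4]
    element_loop_add(regs, rt=3, ra=3, rb=4, vl=4, vec=(False, False, True))
    return regs[3]


def demo_overlapping_add(reverse_gear):
    # Models the shape of `sv.add/rg r3.v, r4.v, r4.v` with VL=4.
    regs = [0] * 16
    regs[4:8] = [5, 6, 7, 1]
    element_loop_add(regs, rt=3, ra=4, rb=4, vl=4,
                     vec=(True, True, True), reverse_gear=reverse_gear)
    return regs[3:8]
```

Under these assumptions `demo_scalar_reduce()` accumulates the four Vector elements into `r3`. `demo_overlapping_add(True)` shows the cumulative effect of running in reverse, because each element operation reads a register that the previous one has just written, whereas `demo_overlapping_add(False)` reads each source before it is overwritten and so does not cascade.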