From: Luke Kenneth Casson Leighton
Date: Tue, 6 Feb 2024 15:19:36 +0000 (+0000)
Subject: bug 676 more on maxloc
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=9cfcf69bf157ddfed7a9f209d4532e9f54a4117c;p=libreriscv.git

bug 676 more on maxloc
---

diff --git a/openpower/sv/cookbook/fortran_maxloc.mdwn b/openpower/sv/cookbook/fortran_maxloc.mdwn
index e9bb553d2..c3ffca6c1 100644
--- a/openpower/sv/cookbook/fortran_maxloc.mdwn
+++ b/openpower/sv/cookbook/fortran_maxloc.mdwn
@@ -104,9 +104,9 @@ later when doing SVP64 assembler.
 def m2(a): # array a
     m, nm, i, n = 0, 0, 0, len(a)
     while i<n:
         while i<n and a[i]<=m: i += 1
-        while i<n and a[i]>m: m, nm, i = a[i], i, i+1
-    return nm;
+        while i<n and a[i]>m: m,nm,i = a[i],i,i+1 # only whilst bigger
+    return nm
 ```
 
 # Implementation in SVP64 Assembler
@@ -147,7 +147,7 @@ use than a binary index, as it can be used directly as a Predicate Mask
 
 The algorithm works by excluding previous operations using `i-in-unary`,
 combined with VL being truncated due to use of Data-Dependent Fail-First.
-What therefore happens for example on the `sv.com/ff=gt/m=ge` operation
+What therefore happens for example on the `sv.cmp/ff=gt/m=ge` operation
 is that it is *VL* (the Vector Length) that gets truncated to only
 contain those elements that are smaller than the current largest value
 found (`m` aka `r4`). Calling `sv.creqv` then sets **only** the
@@ -155,6 +155,36 @@ CR field bits up to length `VL`, which on the next loop will exclude
 them because the Predicate Mask is `m=ge` (ok if the CR field bit is
 **zero**).
 
+Therefore, the way that Data-Dependent Fail-First works, it attempts
+*up to* the current Vector Length, and on detecting the first failure
+will truncate at that point. In effect this is speculative sequential
+execution of `while (i<n and a[i]<=m)` [...] `while (i<n and a[i]>m)`
+again in a single instruction, but this time it is a little more
+involved. Firstly: mapreduce mode is used, with `r4` as both source
+and destination, `r4` acts as the sequential accumulator. Secondly,
+again it is masked (`m=ge`) which again excludes testing of previously-tested
+elements. The next few instructions extract the information provided
+by Vector Length (VL) being truncated - potentially even to zero!
+(Note that `mtcrf 128,0` takes care of the possibility of VL=0, which if
+that happens then CR0 would be left it in its previous state: a
+very much undesirable behaviour!)
+
+`crternlogi 0,1,2,127` will combine the setting of CR0.EQ and CR0.LT
+to give us a true Greater-than-or-equal, including under the circumstance
+where VL=0. The `sv.crand` will then take a copy of the `i-in-unary`
+mask, but only when CR0.EQ is set. This is why the third operand `BB`
+is a Scalar not a Vector (BT=16/Vector, BA=19/Vector, BB=0/Scalar)
+which effectively performs a broadcast-splat-ANDing, as follows:
+
+```
+    CR4.SO = CR4.EQ AND CR0.EQ (if VL >= 1)
+    CR5.SO = CR5.EQ AND CR0.EQ (if VL >= 2)
+    CR6.SO = CR6.EQ AND CR0.EQ (if VL >= 3)
+    CR7.SO = CR7.EQ AND CR0.EQ (if VL = 4)
+```
+
 [[!tag svp64_cookbook ]]
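
The two behaviours described in the added text (Vector Length being truncated at the first failing element, and the scalar `CR0.EQ` being ANDed into every vector CR field up to `VL`) can be modelled in a few lines of Python. This is only an illustrative sketch and is not part of the commit: the function names `ffirst_truncate` and `crand_broadcast` are hypothetical, CR bits are represented as plain booleans, and the truncation is assumed to exclude the failing element.

```python
def ffirst_truncate(values, vl, passes):
    """Hypothetical model of Data-Dependent Fail-First truncation.

    Tests elements 0..vl-1 in order and returns the new Vector Length:
    the number of leading elements for which passes(value) is true.
    Assumption: the first failing element is excluded from the new VL.
    The result may be zero if the very first element fails.
    """
    for idx in range(vl):
        if not passes(values[idx]):
            return idx  # truncate at the first failure
    return vl  # no failure: VL is left unchanged


def crand_broadcast(eq_bits, so_bits, cr0_eq, vl):
    """Hypothetical model of the broadcast-splat-AND table above.

    For each of the first vl CR fields: SO = EQ AND CR0.EQ.
    Fields at index vl and beyond keep their previous SO value.
    """
    result = list(so_bits)
    for idx in range(min(vl, len(eq_bits))):
        result[idx] = bool(eq_bits[idx]) and bool(cr0_eq)
    return result


# Example: with m = 4, elements are accepted whilst a[i] <= m.
new_vl = ffirst_truncate([1, 2, 5, 3], 4, lambda v: v <= 4)

# Example: EQ bits of CR4..CR7 ANDed with a set CR0.EQ, for VL = 3.
new_so = crand_broadcast([True, False, True, True], [False] * 4, True, 3)
```

In the second example the fourth field is left untouched because it lies beyond `VL`, which mirrors the `(if VL = 4)` condition on the `CR7.SO` row.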