From b2a7d4649daf46413d4c6b59df5a4677d81d6396 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Fri, 14 Apr 2023 16:14:45 +0100
Subject: [PATCH]

---
 openpower/sv/svstep.mdwn | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/openpower/sv/svstep.mdwn b/openpower/sv/svstep.mdwn
index e537b7545..9f7724dbc 100644
--- a/openpower/sv/svstep.mdwn
+++ b/openpower/sv/svstep.mdwn
@@ -75,19 +75,19 @@ the LE bit for the second, the GT bit for the outermost loop and the SO bit
 set on the very last element, when all loops reach their
 maximum extent.
 
-*Programmer's note (1): VL in some situations, particularly larger Matrices,
-may exceed 64,
-meaning that `sv.svshape` returning a considerable number of values. Under
-such circumstances `sv.svshape/ew=8` is recommended.*
+*Programmer's note: VL in some situations, particularly larger Matrices
+(5x7x3 will set MAXVL=105),
+will cause `sv.svstep` to return a considerable number of values. Under
+such circumstances `sv.svstep/ew=8` is recommended.*
 
-*Programmer's note (2): having conveniently obtained a pre-computed
+*Programmer's note: having conveniently obtained a pre-computed
 Schedule with `sv.svstep`,
 it may then be used as the input to Indexed REMAP Mode
 to achieve the exact same Schedule. It is evident however that
 before use some of the Indices may be arbitrarily altered as desired.
 `sv.svstep` helps the programmer avoid having to manually recreate Indices
 for certain
-types of common Loop patterns, and in its simplest form, without REMAP
+types of common Loop patterns. In its simplest form, without REMAP
 (SVi=5 or SVi=6),
 is equivalent to the `iota` instruction found in other Vector ISAs*
 
@@ -96,7 +96,8 @@ is equivalent to the `iota` instruction found in other Vector ISAs*
 Vertical First is effectively like an implicit single bit predicate applied to every SVP64 instruction.
 **ONLY** one element in each SVP64 Vector instruction is executed;
 srcstep and dststep do **not**
-increment, and the Program Counter progresses **immediately** to
+increment automatically on completion of one instruction,
+and the Program Counter progresses **immediately** to
 the next instruction just as it would for any standard
 scalar v3.0B instruction.
 
@@ -121,12 +122,19 @@ the underlying hardware that any masked-out element must be skipped.
 *This includes in Vertical-First Mode*, and programmers should be keenly
 aware that srcstep or dststep or both *may* jump by more than one as
 a result, because the actual request under these circumstances was to execute
-on the first available next *non-masked-out* element.
-
-*Programmers should be aware that VL, srcstep and dststep are global in nature.
+on the first available next *non-masked-out* element. It should be
+evident that it is the `sv.svstep` instruction that must be Predicated
+in order for the **entire** loop to use the Predicate correctly, and
+it is strongly recommended for all instructions within the same
+Vertical-First Loop to utilise the exact same Predicate Mask(s).*
+
+Programmers should be aware that VL, srcstep and dststep and
+the SUBVL substeps are global in nature.
 Nested looping with different schedules is perfectly possible, as is
-calling of functions, however SVSTATE (and any associated SVSTATE) should
-obviously be stored on the stack in order to achieve this benefit*
+calling of functions, however SVSTATE (and any associated SVSHAPEs
+if REMAP is being used) should
+obviously be stored on the stack in order to achieve this benefit
+not normally found in Vector ISAs.
 
 [[!tag standards]]
 
-- 
2.30.2
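
The stepping behaviour described in the patched text can be illustrated with a minimal Python sketch. This is a toy model under stated assumptions, not the SVP64 pseudocode: the names `matrix_maxvl`, `next_step` and `vertical_first_steps` are hypothetical, the predicate is treated as a plain integer with element 0 in the least significant bit, and a single step counter stands in for both srcstep and dststep.

```python
from math import prod


def matrix_maxvl(dims):
    """Element count of a Matrix schedule: the product of its dimensions."""
    return prod(dims)


def next_step(step, mask, vl):
    """Advance to the first non-masked-out element after `step`.

    Assumption: bit i of `mask` (LSB = element 0) enables element i.
    Returns `vl` when no enabled element remains.
    """
    step += 1
    while step < vl and not (mask >> step) & 1:
        step += 1
    return step


def vertical_first_steps(mask, vl):
    """List the element indices a predicated Vertical-First loop would visit.

    Each loop iteration stands for one pass through the loop body; the call
    to next_step models the explicit step at the end of that pass.
    """
    visited = []
    step = next_step(-1, mask, vl)  # find the first enabled element
    while step < vl:
        visited.append(step)
        step = next_step(step, mask, vl)  # may jump by more than one
    return visited


if __name__ == "__main__":
    print(matrix_maxvl((5, 7, 3)))
    print(vertical_first_steps(0b101101, 6))
```

With the mask `0b101101` and VL=6 the model visits elements 0, 2, 3 and 5, so the step jumps by two wherever an element is masked out. The 5x7x3 case gives 105 elements, matching the MAXVL quoted in the first Programmer's note.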