From: lkcl
Date: Sun, 26 Mar 2023 13:08:00 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls001_v3~53
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=8629527c6c6fbe064769883dde48f81ea0f8b7e1;p=libreriscv.git
---
diff --git a/openpower/sv/rfc/ls008.mdwn b/openpower/sv/rfc/ls008.mdwn
index ffeaa893c..57ad4d623 100644
--- a/openpower/sv/rfc/ls008.mdwn
+++ b/openpower/sv/rfc/ls008.mdwn
@@ -157,14 +157,8 @@ Special Registers Altered:
 CR0 (if Rc=1)
--------------
-
-**svstep "Mode of Enquiry"**
-It is possible to
-* `SVi=0`: appropriately step srcstep, dststep, subsrcstep and subdststep to the next
- element, taking pack and unpack into consideration.
 * `SVi=1`: test inner middle and outer loop end conditions from SVSTATE0 and store in CR.EQ CR.LE CR.GT
 * `SVi=2`: test SVSTATE1 (and return conditions)
@@ -177,6 +171,109 @@ It is possible to
 * `SVi=14`: `SVSTATE.pack` is set to zero and `SVSTATE.unpack` set to zero
 * `SVi=15`: `SVSTATE.pack` is set to zero and `SVSTATE.unpack` set to zero
+**Description**
+
+svstep may be used
+to enquire about the REMAP Schedule and it may be used to alter Vectorisation
+State. When `vf=1` then stepping occurs.
+When `vf=0` the enquiry is performed without altering internal
+state. If `SVi=0, Rc=0, vf=0` the instruction is a `nop`.
+The following Modes exist:
+
+* `SVi=0`: appropriately step srcstep, dststep, subsrcstep and subdststep to the next
+ element, taking pack and unpack into consideration.
+* When `SVi` is 1-4 the REMAP Schedule for a given SVSHAPE may be
+returned in `RT`. SVi=1 selects SVSHAPE0 current state,
+through to SVi=4 selects SVSHAPE3.
+* When `SVi` is 5, `SVSTATE.srcstep` is returned.
+* When `SVi` is 6, `SVSTATE.dststep` is returned.
+* When `SVi` is 0b1100 pack/unpack in SVSTATE is cleared
+* When `SVi` is 0b1101 pack in SVSTATE is set, unpack is cleared
+* When `SVi` is 0b1110 unpack in SVSTATE is set, pack is cleared
+* When `SVi` is 0b1111 pack/unpack in SVSTATE are set
+
+As this is a Single-Predicated (1P) instruction, predication may be applied
+to skip (or zero) elements.
+
+* Vertical-First Mode will return the requested index
+ (and move to the next state if `vf=1`)
+* Horizontal-First Mode can be used to return all indices,
+ i.e. walks through all possible states.
+
+**Vectorisation of svstep itself**
+
+As a 32-bit instruction, `svstep` may be itself be Vector-Prefixed, as
+`sv.svstep`. This will work perfectly well in Horizontal-First
+as it will in Vertical-First Mode.
+
+Example: to obtain the full set of possible computed element
+indices use `sv.svstep RT.v,SVI,1` which will store all computed element
+indices, starting from RT. If Rc=1 then a co-result Vector of CR Fields
+will also be returned, comprising the "loop end-points" of each of the inner
+loops when either Matrix Mode or DCT/FFT is set. In other words,
+for example, when the `xdim` inner loop reaches the end and on the next
+iteration it will begin again at zero, the CR Field `EQ` will be set.
+With a maximum of three loops within both Matrix and DCT/FFT Modes,
+the CR Field's EQ bit will be set at the end of the first inner loop,
+the LE bit for the second, the GT bit for the outermost loop and the
+SO bit set on the very last element, when all loops reach their maximum
+extent.
+
+*Programmer's note (1): VL in some situations, particularly larger Matrices,
+may exceed 64,
+meaning that `sv.svshape` returning a considerable number of values. Under
+such circumstances `sv.svshape/ew=8` is recommended.*
+
+*Programmer's note (2): having conveniently obtained a pre-computed
+Schedule with `sv.svstep`,
+it may then be used as the input to Indexed REMAP Mode
+to achieve the exact same Schedule. It is evident however that
+before use some of the Indices may be arbitrarily altered as desired.
+`sv.svstep` helps the programmer avoid having to manually recreate
+Indices for certain
+types of common Loop patterns, and in its simplest form, without REMAP
+(SVi=5 or SVi=6),
+is equivalent to the `iota` instruction found in other Vector ISAs*
+
+**Vertical First Mode**
+
+Vertical First is effectively like an implicit single bit predicate
+applied to every SVP64 instruction. **ONLY** one element in each
+SVP64 Vector instruction is executed; srcstep and dststep do **not**
+increment, and the Program Counter progresses **immediately** to
+the next instruction just as it would for any standard scalar v3.0B
+instruction.
+
+A mode of srcstep (SVi=0) is called which can move srcstep and
+dststep on to the next element, still respecting predicate
+masks.
+
+In other words, where normal SVP64 Vectorisation acts "horizontally"
+by looping first through 0 to VL-1 and only then moving the PC
+to the next instruction, Vertical-First moves the PC onwards
+(vertically) through multiple instructions **with the same
+srcstep and dststep**, then an explict instruction used to
+advance srcstep/dststep. An outer loop is expected to be
+used (branch instruction) which completes a series of
+Vector operations.
+
+Testing any end condition of any loop of any REMAP state allows branches to be
+used to create loops.
+
+Programmer's note: when Predicate Non-Zeroing is used this indicates to
+the underlying hardware that any masked-out element must be skipped.
+*This includes in Vertical-First Mode*, and programmers should be keenly
+aware that srcstep or dststep or both *may* jump by more than one as
+a result, because the actual request under these circumstances was to execute
+on the first available next *non-masked-out* element.
+
+*Programmers should be aware that VL, srcstep and dststep are global in nature.
+Nested looping with different schedules is perfectly possible, as is
+calling of functions, however SVSTATE (and any associated SVSTATE) should
+obviously be stored on the stack in order to achieve this benefit*
+
+-------------
+
 \newpage{}
@@ -294,52 +391,6 @@ Additionally, in reality it is **`VL`** being set.
 Therefore, rather than `CR0` testing `RT` when `Rc=1`, CR0.EQ is set if `VL=0`, CR0.GE is set if `VL` is non-zero.
-# Vertical First Mode
-
-Vertical First is effectively like an implicit single bit predicate
-applied to every SVP64 instruction. **ONLY** one element in each
-SVP64 Vector instruction is executed; srcstep and dststep do **not**
-increment, and the Program Counter progresses **immediately** to
-the next instruction just as it would for any standard scalar v3.0B
-instruction.
-
-An explicit mode of setvl is called which can move srcstep and
-dststep on to the next element, still respecting predicate
-masks.
-
-In other words, where normal SVP64 Vectorisation acts "horizontally"
-by looping first through 0 to VL-1 and only then moving the PC
-to the next instruction, Vertical-First moves the PC onwards
-(vertically) through multiple instructions **with the same
-srcstep and dststep**, then an explict instruction used to
-advance srcstep/dststep. An outer loop is expected to be
-used (branch instruction) which completes a series of
-Vector operations.
-
-```svfstep``` mode is enabled when vf=1, vs=0 and ms=0.
-When Rc=1 it is possible to determine when any level of
-loops reach an end condition, or if VL has been reached. The immediate can
-be reinterpreted as indicating which SVSTATE (0-3)
-should be tested and placed into CR0 (when Rc=1)
-
-When RT is not zero, an internal stepping index may also be returned,
-either the REMAP index or srcstep or dststep. This table is identical
-to that of [[sv/svstep]]:
-
-* `SVi=1`: also include inner middle and outer
- loop end conditions from SVSTATE0 into CR.EQ CR.LE CR.GT
-* `SVi=2`: test SVSTATE1 (and return conditions)
-* `SVi=3`: test SVSTATE2 (and return conditions)
-* `SVi=4`: test SVSTATE3 (and return conditions)
-* `SVi=5`: `SVSTATE.srcstep` is returned.
-* `SVi=6`: `SVSTATE.dststep` is returned.
-
-Testing any end condition of any loop of any REMAP state allows branches to be used to create loops.
-
-*Programmers should be aware that VL, srcstep and dststep are global in nature.
-Nested looping with different schedules is perfectly possible, as is
-calling of functions, however SVSTATE (and any associated SVSTATE) should be stored on the stack.*
-
 **SUBVL**
 Sub-vector elements are not be considered "Vertical". The vec2/3/4