From c56745a2ad3f2adbb2b86c3ba3ede1d93a024101 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Fri, 5 Jan 2024 16:04:36 +0000
Subject: [PATCH]

---
 openpower/sv/cookbook/daxpy_example.mdwn | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/openpower/sv/cookbook/daxpy_example.mdwn b/openpower/sv/cookbook/daxpy_example.mdwn
index 459fc233f..0519714a1 100644
--- a/openpower/sv/cookbook/daxpy_example.mdwn
+++ b/openpower/sv/cookbook/daxpy_example.mdwn
@@ -37,9 +37,10 @@ having to pre-subtract an offset before running the loop.
 
 For `sv.lfdup`, RA is Scalar so continuously accumulates
 additions of the immediate (8) but only *after* RA has been used
-as the Effective Address.
+as the Effective Address each time.
 The last write to RA is the address for
-the next block (the next time round the CTR loop).
+the next block (the next time round the CTR loop).
+
 To understand this it is necessary to appreciate that SVP64 is
 as if a sequence of loop-unrolled scalar instructions were issued.
 With that sequence all writing the new version of RA
@@ -47,6 +48,13 @@ before the next element-instruction, the end result is identical
 in effect to Element-Strided, except that RA points to the start
 of the next batch.
 
+If `sv.lfdup` was not available, `sv.lfdu` could be used to the same
+effect, but RA would have to be *pre-subtracted by one element*, outside
+of the loop. Due to the compactness of this highly hardware-parallelizable
+algorithm, that one additional instruction would increase the implementation
+code size by 5 percent! This helps explain why Post-Increment Update
+Load/Store instructions are so important.
+
 Use of Element-Strided on `sv.lfd/els` ensures the Immediate (8)
 results in a contiguous LD *without* modifying RA.
 
-- 
2.30.2
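
The addressing behaviour described in the patched text can be sketched as a small Python model. This is only an illustration under stated assumptions: the function names, the base address `0x1000` and the vector length of 4 are hypothetical, and the model tracks addresses only, not the SVP64 instructions themselves. It treats RA as a Scalar that is rewritten by each element of the unrolled sequence, compares post-increment update (as described for `sv.lfdup`) against pre-increment update (`sv.lfdu` with RA pre-subtracted by one element), and shows Element-Strided leaving RA untouched.

```python
def load_post_update(ra, imm, vl):
    """Post-increment update: RA is used as the Effective Address,
    and only afterwards has the immediate added to it."""
    addrs = []
    for _ in range(vl):
        addrs.append(ra)  # use RA as the EA first
        ra += imm         # then write back the updated RA
    return addrs, ra


def load_pre_update(ra, imm, vl):
    """Pre-increment update: the EA is RA + immediate,
    and that EA is written back to RA."""
    addrs = []
    for _ in range(vl):
        ra += imm         # compute EA and write it back to RA
        addrs.append(ra)  # the load uses the updated value
    return addrs, ra


def load_element_strided(ra, imm, vl):
    """Element-Strided: element i loads from RA + i * immediate,
    and RA itself is not modified."""
    return [ra + i * imm for i in range(vl)], ra


# Hypothetical values chosen for illustration only.
BASE, IMM, VL = 0x1000, 8, 4

post_addrs, post_ra = load_post_update(BASE, IMM, VL)
# The pre-increment form needs RA pre-subtracted by one element.
pre_addrs, pre_ra = load_pre_update(BASE - IMM, IMM, VL)
els_addrs, els_ra = load_element_strided(BASE, IMM, VL)

print([hex(a) for a in post_addrs], hex(post_ra))
print([hex(a) for a in pre_addrs], hex(pre_ra))
print([hex(a) for a in els_addrs], hex(els_ra))
```

In this model all three forms touch the same contiguous addresses. They differ in where RA ends up: the post-increment form leaves RA at the start of the next batch, the pre-increment form leaves it one element short (which is what its next iteration expects), and Element-Strided leaves it unchanged.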