doc(ls003): Added first draft rfc ls003 (only maddedu and divmod2du)

author Andrey Miroshnikov <andrey@technepisteme.xyz>

Thu, 20 Oct 2022 14:51:03 +0000 (15:51 +0100)

committer Andrey Miroshnikov <andrey@technepisteme.xyz>

Thu, 20 Oct 2022 14:51:03 +0000 (15:51 +0100)
diff --git a/openpower/sv/rfc/ls003.mdwn b/openpower/sv/rfc/ls003.mdwn
new file mode 100644
index 0000000..d0b8b40
--- /dev/null
+++ b/openpower/sv/rfc/ls003.mdwn
@@ -0,0 +1,205 @@
+# RFC ls003 Big Integer 
+
+**URLs**:
+
+* <https://libre-soc.org/openpower/sv/>
+* <https://libre-soc.org/openpower/sv/rfc/ls003/>
+* <https://bugs.libre-soc.org/show_bug.cgi?id=944>
+* <https://git.openpower.foundation/isa/PowerISA/issues/87>
+
+**Severity**: Major
+
+**Status**: New
+
+**Date**: -- Oct 2022 **(UPDATE)**
+
+**Target**: v3.2B
+
+**Source**: v3.0B
+
+**Books and Section affected**: **UPDATE**
+
+```
+    Book I 64-bit Fixed-Point Arithmetic Instructions 3.3.9.1
+    Appendix E Power ISA sorted by opcode
+    Appendix F Power ISA sorted by version
+    Appendix G Power ISA sorted by Compliancy Subset
+    Appendix H Power ISA sorted by mnemonic
+```
+
+**Summary**
+
+```
+    Instructions added
+    maddedu - Multiply-Add Extended Double Unsigned
+    divmod2du - Divide/Modulo Quad-Double Unsigned
+```
+
+**Submitter**: Luke Leighton (Libre-SOC)
+
+**Requester**: Libre-SOC
+
+**Impact on processor**:
+
+```
+    Addition of two new GPR-based instructions
+```
+
+**Impact on software**:
+
+```
+    Requires support for new instructions in assembler, debuggers,
+    and related tools.
+```
+
+**Keywords**:
+
+```
+    GPR, Big-integer, Double-word
+```
+
+**Motivation**
+
+`maddedu` is similar to `maddhdu` and `maddld`, but allows for a big-integer
+rolling accumulation effect. As the second result location is implicitly defined
+as the register after the first result (RS=RT+1), the Scalar Register set can
+be used for vector computation. `divmod2du` is similar to `divdeu` and has
+similar advantages to `maddedu`: the modulo result is available with the quotient.
+
+**Notes and Observations**:
+
+1. There is no need for an Rc=1 variant as VA-Form is being used.
+2. There is no need for Special Registers as VA-Form is being used. 
+
+**Changes**
+
+Add the following entries to:
+
+* the Appendices of Book I
+* Section 3.3.9.1 of Book I (the new instructions)
+
+----------------
+
+\newpage{}
+
+# Multiply-Add Extended Double Unsigned
+
+`maddedu RT, RA, RB, RC`
+
+|  0-5  | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form    |
+|-------|------|-------|-------|-------|-------|---------|
+| EXT04 | RT   |  RA   |  RB   |   RC  |  XO   | VA-Form |
+
+Pseudocode:
+
+```
+    prod[0:127] <- (RA) * (RB)   # Multiply RA and RB, result 128-bit
+    sum[0:127] <- EXTZ(RC) + prod # Zero extend RC, add product
+    RT <- sum[64:127]            # Store low half in RT
+    RS <- sum[0:63]              # RS implicit register, see below
+```
+
+Special registers altered:
+
+    None
+
+RC is zero-extended (not shifted, not sign-extended) and the 128-bit product is
+added to it; the lower half of that result is stored in RT and the upper half
+in RS.
+
+The difference from `maddhdu` is that `maddhdu` stores the upper half in RT,
+whereas `maddedu` stores the lower half in RT and the upper half in RS. There
+is **no equivalent to `maddld`** because `maddld` performs sign-extension on RC.
+
+RS is implicitly defined as the register following RT (RS=RT+1).
+
+*Programmer's Note:
+As a Scalar Power ISA operation, like `lq` and `stq`, RS=RT+1.
+To achieve a big-integer rolling-accumulation effect,
+assume the scalar to multiply is in r0,
+the vector to multiply by starts at r4 and the result vector
+starts at r20. Instructions may then be issued as `maddedu r20,r4,r0,r20`,
+`maddedu r21,r5,r0,r21` etc., where the first `maddedu` will have
+stored the upper half of its 128-bit result into r21, such
+that it may be picked up by the second `maddedu`. Repeat inline
+to construct a larger bigint scalar-vector multiply,
+as Scalar GPR register file space permits.*
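+
+A quick bound, assuming XLEN=64, suggests why the chain above never needs
+a carry-out: even with all three operands at their maximum value, the sum
+still fits in the 128 bits held by RS and RT.
+
+```
+    (2^64 - 1) * (2^64 - 1) + (2^64 - 1)
+        = (2^128 - 2^65 + 1) + (2^64 - 1)
+        = 2^128 - 2^64                  # < 2^128: RS = 2^64 - 1, RT = 0
+```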
+
+Examples:
+
+```
+    maddedu r4, r0, r1, r2 # ((r0)*(r1))+(r2), store lower in r4, upper in r5
+```
+
+# Divide/Modulo Quad-Double Unsigned
+
+**Should name be Divide/Modulo Double Extended Unsigned?**
+**Check the pseudo-code comments**
+
+`divmod2du RT,RA,RB,RC`
+
+|  0-5  | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form    |
+|-------|------|-------|-------|-------|-------|---------|
+| EXT04 | RT   |  RA   |  RB   |   RC  |  XO   | VA-Form |
+
+Pseudocode:
+
+    if ((RA) <u (RB)) & ((RB) != [0]*XLEN) then   # No overflow, no divide-by-0
+        dividend[0:(XLEN*2)-1] <- (RA) || (RC)    # RA is upper half, RC lower
+        divisor[0:(XLEN*2)-1] <- [0]*XLEN || (RB) # Zero-extend RB to 2*XLEN bits
+        result <- dividend / divisor              # Division
+        modulo <- dividend % divisor              # Modulo
+        RT <- result[XLEN:(XLEN*2)-1]             # Store quotient in RT
+        RS <- modulo[XLEN:(XLEN*2)-1]             # Store modulo in RS, implicit
+    else                                          # Overflow or divide-by-0
+        RT <- [1]*XLEN                            # RT all 1's
+        RS <- [0]*XLEN                            # RS all 0's
+
+Special registers altered:
+
+    None
+
+Divide/Modulo Quad-Double Unsigned is another VA-Form instruction
+that is near-identical to `divdeu` except that:
+
+* the lower 64 bits of the dividend, instead of being zero, contain a
+  register, RC.
+* it performs a fused divide and modulo in a single instruction, storing
+  the modulo in an implicit RS (similar to `maddedu`)
+
+RB, the divisor, remains 64 bit.  The instruction is therefore a 128/64
+division, producing a pair of 64 bit results (quotient and modulo), in the
+same way that Intel [divq](https://www.felixcloutier.com/x86/div) works.
+Overflow conditions
+are detected in exactly the same fashion as `divdeu`, except that rather
+than have `UNDEFINED` behaviour, RT is set to all ones and RS set to all
+zeros on overflow.
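+
+The `(RA) <u (RB)` test can be read as a quotient-overflow check. A short
+bound, assuming XLEN=64 and a nonzero RB, illustrates why the quotient then
+always fits in RT:
+
+```
+    dividend = (RA) * 2^64 + (RC)
+            <= ((RB) - 1) * 2^64 + (2^64 - 1)   # RA <= RB-1, RC <= 2^64-1
+             = (RB) * 2^64 - 1
+    quotient = dividend / (RB) < 2^64           # fits in 64 bits
+```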
+
+*Programmer's note: there are no Rc variants of any of these VA-Form
+instructions. `cmpi` will need to be used to detect overflow conditions:
+the saving in instruction count is that both RT and RS will have already
+been set to useful values (all ones and all zeros respectively)
+needed as part of implementing Knuth's
+Algorithm D.*
+
+For Scalar usage, just as for `maddedu`, `RS=RT+1` (similar to `lq` and `stq`).
+
+Examples:
+
+```
+    divmod2du r4, r0, r1, r2 # ((r0)||(r2)) / (r1), store in r4
+                            # ((r0)||(r2)) % (r1), store in r5
+```
+
+# Appendices
+
+    Appendix E Power ISA sorted by opcode
+    Appendix F Power ISA sorted by version
+    Appendix G Power ISA sorted by Compliancy Subset
+    Appendix H Power ISA sorted by mnemonic
+
+| Form | Book | Page | Version | mnemonic  | Description |
+|------|------|------|---------|-----------|-------------|
+| VA   | I    | #    | 3.0B    | maddedu   | Multiply-Add Extended Double Unsigned |
+| VA   | I    | #    | 3.0B    | divmod2du | Divide/Modulo Quad-Double Unsigned |
+
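
The semantics of the two instructions described in the RFC above can be sketched as a small Python model. This is an illustrative sketch only, assuming XLEN=64. The helpers `bigint_mul_scalar`, `bigint_div_scalar` and `limbs_to_int` are hypothetical names introduced here, not part of the RFC, and the multiply chain assumes the result vector starts out as zero.

```python
MASK64 = (1 << 64) - 1  # assumption: XLEN = 64


def maddedu(ra, rb, rc):
    """Model of maddedu: return (RT, RS) = (low, high) halves of RA*RB + RC."""
    total = (ra & MASK64) * (rb & MASK64) + (rc & MASK64)
    return total & MASK64, total >> 64


def divmod2du(ra, rb, rc):
    """Model of divmod2du: return (RT, RS) = (quotient, modulo) of (RA || RC) / RB."""
    ra, rb, rc = ra & MASK64, rb & MASK64, rc & MASK64
    if ra < rb and rb != 0:
        dividend = (ra << 64) | rc  # RA is the upper half, RC the lower half
        return dividend // rb, dividend % rb
    return MASK64, 0  # overflow or divide-by-zero: RT all ones, RS all zeros


def bigint_mul_scalar(limbs, scalar):
    """Hypothetical helper: multiply little-endian 64-bit limbs by a scalar.

    Each maddedu's upper half (RS) is fed to the next one as RC,
    mirroring the rolling accumulation in the Programmer's Note.
    """
    carry = 0
    out = []
    for limb in limbs:
        low, carry = maddedu(limb, scalar, carry)
        out.append(low)
    out.append(carry)
    return out


def bigint_div_scalar(limbs, divisor):
    """Hypothetical helper: divide little-endian limbs by a nonzero 64-bit divisor.

    The running remainder (RS) is fed back as RA, most significant limb first,
    so RA < RB always holds and the overflow path is never taken.
    """
    rem = 0
    quot = []
    for limb in reversed(limbs):
        q, rem = divmod2du(rem, divisor, limb)
        quot.append(q)
    quot.reverse()
    return quot, rem


def limbs_to_int(limbs):
    """Hypothetical helper: convert little-endian 64-bit limbs to a Python int."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))
```

Multiplying a limb vector with `bigint_mul_scalar` and then dividing the product by the same scalar with `bigint_div_scalar` should give back the original value with a zero remainder, which is a convenient sanity check for both models.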