# RFC ls003 Big Integer

**URLs**:

* <https://libre-soc.org/openpower/sv/>
* <https://libre-soc.org/openpower/sv/rfc/ls003/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=944>
* <https://git.openpower.foundation/isa/PowerISA/issues/87>

**Severity**: Major

**Status**: New

**Date**: -- Oct 2022 **(UPDATE)**

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**: **UPDATE**

```
Book I 64-bit Fixed-Point Arithmetic Instructions 3.3.9.1
Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
Instructions added
maddedu - Multiply-Add Extended Double Unsigned
divmod2du - Divide/Modulo Quad-Double Unsigned
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of two new GPR-based instructions
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers,
and related tools.
```

**Keywords**:

```
GPR, Big-integer, Double-word
```

**Motivation**

`maddedu` is similar to `maddhdu` and `maddld`, but allows for a big-integer
rolling-accumulation effect. Because the second result location is implicitly
defined as the register after the first result (RS=RT+1), the Scalar Register
set can be used for vector computation.

`divmod2du` is similar to `divdeu`, and has similar advantages to `maddedu`:
the modulo result is available together with the quotient.

**Notes and Observations**:

1. There is no need for an Rc=1 variant as VA-Form is being used.
2. There is no need for Special Registers as VA-Form is being used.

**Changes**

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.9.1

----------------

\newpage{}

# Multiply-Add Extended Double Unsigned

`maddedu RT, RA, RB, RC`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form |
|-------|------|-------|-------|-------|-------|---------|
| EXT04 | RT | RA | RB | RC | XO | VA-Form |

Pseudocode:

```
prod[0:127] <- (RA) * (RB) # Multiply RA and RB, result 128-bit
sum[0:127] <- EXTZ(RC) + prod # Zero extend RC, add product
RT <- sum[64:127] # Store low half in RT
RS <- sum[0:63] # RS implicit register, see below
```

Special registers altered:

None

RC is zero-extended (not shifted, not sign-extended) and added to the
128-bit product; the lower half of the result is stored in RT and the
upper half in RS.
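
As a minimal sketch (not part of the specification), the semantics above can be modelled in Python, assuming unbounded Python integers stand in for 64-bit GPRs. The function name `maddedu` and its `(RT, RS)` return convention are illustrative only. The model also shows why no carry can be lost: the largest possible sum is (2^64-1)*(2^64-1) + (2^64-1) = 2^128 - 2^64, which still fits in 128 bits.

```python
MASK64 = (1 << 64) - 1


def maddedu(ra: int, rb: int, rc: int) -> tuple[int, int]:
    """Illustrative model of maddedu: returns (RT, RS) for 64-bit unsigned inputs."""
    for value in (ra, rb, rc):
        if not 0 <= value <= MASK64:
            raise ValueError("operands must be 64-bit unsigned values")
    prod = ra * rb                 # full 128-bit product of RA and RB
    total = prod + rc              # RC zero-extended: a plain unsigned add
    # worst case (2^64-1)^2 + (2^64-1) = 2^128 - 2^64, so the sum fits in 128 bits
    assert total < (1 << 128)
    rt = total & MASK64            # lower half goes to RT
    rs = total >> 64               # upper half goes to RS (RT+1)
    return rt, rs


# (2^64-1)*2 + 3 = 2^65 + 1: lower half 1, upper half 2
print(maddedu(MASK64, 2, 3))  # (1, 2)
```

This mirrors the `maddedu r4, r0, r1, r2` example: RT corresponds to r4 and RS to r5.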

The difference from `maddhdu` is that `maddhdu` stores the upper half
in RT, whereas `maddedu` stores the upper half in RS. There is **no
equivalent to `maddld`** because `maddld` performs sign-extension on RC.

RS is implicitly defined as the register following RT (RS=RT+1).

*Programmer's Note:
As a Scalar Power ISA operation, like `lq` and `stq`, RS=RT+1.
To achieve a big-integer rolling-accumulation effect,
assuming the scalar to multiply is in r0,
the vector to multiply by starts at r4 and the result vector
starts at r20, instructions may be issued as `maddedu r20,r4,r0,r20`
followed by `maddedu r21,r5,r0,r21`, etc. The first `maddedu`
stores the upper half of its 128-bit result in r21, where it is
picked up as the addend (RC) by the second `maddedu`. Repeat inline
to construct a larger bigint scalar-vector multiply,
as Scalar GPR register file space permits.*
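
The rolling-accumulation pattern in the note above can be sketched in Python against a hypothetical 32-entry register-file model. The helper names (`maddedu_gpr`, `scalar_vector_mul`, `to_limbs`, `from_limbs`) are illustrative, and limbs are assumed to be stored least-significant first. Step `i` issues `maddedu r(20+i), r(4+i), r0, r(20+i)`, so the upper half written to r(21+i) becomes the RC addend of the next step, and r20 onwards ends up holding the bigint times the scalar plus the initial value of r20.

```python
MASK64 = (1 << 64) - 1


def maddedu_gpr(regs: list[int], rt: int, ra: int, rb: int, rc: int) -> None:
    """Hypothetical register-file model of maddedu, with RS = RT+1."""
    total = regs[ra] * regs[rb] + regs[rc]  # 128-bit product plus zero-extended RC
    regs[rt] = total & MASK64               # lower half -> RT
    regs[rt + 1] = total >> 64              # upper half -> RS = RT+1


def scalar_vector_mul(limbs: list[int], scalar: int, carry_in: int = 0) -> list[int]:
    """Multiply a bigint (limbs at r4 upwards) by r0; result at r20 upwards."""
    n = len(limbs)
    # the result needs n+1 registers starting at r20, so at most 11 limbs fit
    if n > 11:
        raise ValueError("not enough scalar GPRs for this many limbs")
    regs = [0] * 32
    regs[0] = scalar                        # scalar in r0
    regs[4:4 + n] = limbs                   # vector starts at r4
    regs[20] = carry_in                     # initial addend in r20
    for i in range(n):
        # maddedu r(20+i), r(4+i), r0, r(20+i)
        maddedu_gpr(regs, 20 + i, 4 + i, 0, 20 + i)
    return regs[20:20 + n + 1]


def to_limbs(x: int, n: int) -> list[int]:
    return [(x >> (64 * i)) & MASK64 for i in range(n)]


def from_limbs(limbs: list[int]) -> int:
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


x = 0x0123456789ABCDEF_FEDCBA9876543210_0F0F0F0F0F0F0F0F_F0F0F0F0F0F0F0F0
s = 0xDEADBEEFCAFEBABE
print(from_limbs(scalar_vector_mul(to_limbs(x, 4), s)) == x * s)  # True
```

The 11-limb limit is purely a property of this particular register layout (r4 upwards for the input, r20 to r31 for the result).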

Examples:

```
maddedu r4, r0, r1, r2 # ((r0)*(r1))+(r2), store lower in r4, upper in r5
```

# Divide/Modulo Quad-Double Unsigned

**Should name be Divide/Modulo Double Extended Unsigned?**
**Check the pseudo-code comments**

`divmod2du RT,RA,RB,RC`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form |
|-------|------|-------|-------|-------|-------|---------|
| EXT04 | RT | RA | RB | RC | XO | VA-Form |

Pseudo-code:

```
if ((RA) <u (RB)) & ((RB) != [0]*XLEN) then  # Quotient fits in XLEN bits, RB non-zero
    dividend[0:(XLEN*2)-1] <- (RA) || (RC)     # RA is the upper half, RC the lower half
    divisor[0:(XLEN*2)-1] <- [0]*XLEN || (RB)  # Zero-extend RB to XLEN*2 bits
    result <- dividend / divisor               # Unsigned division
    modulo <- dividend % divisor               # Unsigned modulo
    RT <- result[XLEN:(XLEN*2)-1]              # Store quotient in RT
    RS <- modulo[XLEN:(XLEN*2)-1]              # Store modulo in implicit RS
else                                           # Overflow or divide-by-zero
    RT <- [1]*XLEN                             # RT all 1s
    RS <- [0]*XLEN                             # RS all 0s
```

Special registers altered:

None

Divide/Modulo Quad-Double Unsigned is another VA-Form instruction
that is near-identical to `divdeu` except that:

* the lower 64 bits of the dividend, instead of being zero, contain
  the contents of register RC.
* it performs a fused divide and modulo in a single instruction, storing
  the modulo in an implicit RS (similar to `maddedu`).

RB, the divisor, remains 64-bit. The instruction is therefore a 128/64
division, producing a pair of 64-bit results (quotient and remainder), in
the same way that Intel [divq](https://www.felixcloutier.com/x86/div) works.
Overflow conditions are detected in exactly the same fashion as `divdeu`,
except that rather than having `UNDEFINED` behaviour, RT is set to all ones
and RS is set to all zeros on overflow.
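
A minimal Python model of the behaviour described above, including the overflow case, may help when checking the pseudo-code. As with any such sketch, unbounded Python integers stand in for 64-bit GPRs, and the function name and `(RT, RS)` return convention are illustrative rather than normative. The `RA < RB` guard is what guarantees a 64-bit quotient: the dividend is below (RA+1)*2^64, which is at most RB*2^64.

```python
MASK64 = (1 << 64) - 1


def divmod2du(ra: int, rb: int, rc: int) -> tuple[int, int]:
    """Illustrative model of divmod2du: returns (RT, RS) = (quotient, remainder)."""
    if ra < rb and rb != 0:
        dividend = (ra << 64) | rc       # RA is the upper half, RC the lower half
        quotient, remainder = divmod(dividend, rb)
        # RA < RB implies quotient < 2^64; the remainder is always < RB
        assert quotient <= MASK64 and remainder < rb
        return quotient, remainder
    # overflow or divide-by-zero: defined results instead of UNDEFINED
    return MASK64, 0


print(divmod2du(0, 7, 100))  # (14, 2)
print(divmod2du(1, 2, 0))    # (9223372036854775808, 0), i.e. 2^64 / 2
print(divmod2du(5, 5, 0))    # overflow: (18446744073709551615, 0)
```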

*Programmer's note: there are no Rc variants of any of these VA-Form
instructions. `cmpi` will need to be used to detect overflow conditions:
the saving in instruction count is that both RT and RS will already
have been set to useful values (all 1s and all zeros respectively),
which are needed as part of implementing Knuth's Algorithm D.*
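
One pattern that follows from these semantics (an illustration, not taken from this RFC) is dividing a multi-limb unsigned integer by a single 64-bit divisor, which is the single-limb case of long division. Working from the most-significant limb down, each remainder is fed back in as RA and the next limb as RC. Because the remainder is always less than the divisor, the overflow path is never taken. The sketch below includes its own illustrative `divmod2du` model; the names `divmod2du` and `bigint_div_by_limb` are hypothetical, and limbs are assumed to be stored least-significant first.

```python
MASK64 = (1 << 64) - 1


def divmod2du(ra: int, rb: int, rc: int) -> tuple[int, int]:
    """Illustrative model: (RA || RC) / RB and (RA || RC) % RB."""
    if ra < rb and rb != 0:
        return divmod((ra << 64) | rc, rb)
    return MASK64, 0                     # overflow: RT all ones, RS all zeros


def bigint_div_by_limb(limbs: list[int], divisor: int) -> tuple[list[int], int]:
    """Divide a bigint (limbs least-significant first) by one 64-bit limb."""
    if not 0 < divisor <= MASK64:
        raise ValueError("divisor must be a non-zero 64-bit value")
    quotient = [0] * len(limbs)
    remainder = 0                        # RA starts at zero
    for i in reversed(range(len(limbs))):
        # invariant: remainder < divisor, so divmod2du never reports overflow
        quotient[i], remainder = divmod2du(remainder, divisor, limbs[i])
    return quotient, remainder


x = 0x0123456789ABCDEF_FEDCBA9876543210_0F0F0F0F0F0F0F0F
d = 0x1234567890ABCDEF
q_limbs, r = bigint_div_by_limb([(x >> (64 * i)) & MASK64 for i in range(3)], d)
q = sum(limb << (64 * i) for i, limb in enumerate(q_limbs))
print(q == x // d and r == x % d)  # True
```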

For Scalar usage, just as for `maddedu`, `RS=RT+1` (similar to `lq` and `stq`).

Examples:

```
divmod2du r4, r0, r1, r2 # ((r0)||(r2)) / (r1), store in r4
                         # ((r0)||(r2)) % (r1), store in r5
```

# Appendices

* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| VA | I | # | 3.0B | maddedu | Multiply-Add Extended Double Unsigned |
| VA | I | # | 3.0B | divmod2du | Divide/Modulo Quad-Double Unsigned |