042f942b730f077a966f9fb9c9cbc4793c441a27
[libreriscv.git] / openpower / sv / rfc / ls003.mdwn
1 # RFC ls003 Big Integer
2
3 **URLs**:
4
5 * <https://libre-soc.org/openpower/sv/biginteger/analysis/>
6 * <https://libre-soc.org/openpower/sv/rfc/ls003/>
7 * <https://bugs.libre-soc.org/show_bug.cgi?id=960>
8 * <https://git.openpower.foundation/isa/PowerISA/issues/91>
9
10 **Severity**: Major
11
12 **Status**: New
13
14 **Date**: 20 Oct 2022
15
16 **Target**: v3.2B
17
18 **Source**: v3.0B
19
20 **Books and Section affected**: **UPDATE**
21
22 ```
23 Book I 64-bit Fixed-Point Arithmetic Instructions 3.3.9.1
24 Appendix E Power ISA sorted by opcode
25 Appendix F Power ISA sorted by version
26 Appendix G Power ISA sorted by Compliancy Subset
27 Appendix H Power ISA sorted by mnemonic
28 ```
29
30 **Summary**
31
32 ```
33 Instructions added
34 maddedu - Multiply-Add Extended Double Unsigned
35 divmod2du - Divide/Modulo Quad-Double Unsigned
36 ```
37
38 **Submitter**: Luke Leighton (Libre-SOC)
39
40 **Requester**: Libre-SOC
41
42 **Impact on processor**:
43
44 ```
45 Addition of two new GPR-based instructions
46 ```
47
48 **Impact on software**:
49
50 ```
51 Requires support for new instructions in assembler, debuggers,
52 and related tools.
53 ```
54
55 **Keywords**:
56
57 ```
58 GPR, Big-integer, Double-word
59 ```
60
61 **Motivation**
62
63 Similar to `maddhdu` and `maddld`, but allow for a big-integer rolling
64 accumulation affect: `RC` effectively becomes a 64-bit carry in chains
65 of highly efficient loop-unrolled arbitrary-length big-integer operations.
66 Similar to `divdeu`, and has similar advantages to `maddedu`,
67 Modulo result is available with the quotient.
68
69 **Notes and Observations**:
70
71 1. There is no need for an Rc=1 variant as VA-Form is being used.
72 2. There is no need for Special Registers as VA-Form is being used.
73 3. Both instructions have been present in Intel x86 for several decades.
74 4. Neither instruction is present in VSX: these are 128/64 whereas
75 VSX is 128/128.
76 5. `maddedu` and `divmod2du` are inverses of each other, including
77 when used for arbitrary-length big-integer arithmetic
78
79 **Changes**
80
81 Add the following entries to:
82
83 * the Appendices of Book I
84 * Instructions of Book I added to Section 3.3.9.1
85
86 ----------------
87
88 \newpage{}
89
90 # Multiply-Add Extended Double Unsigned
91
92 `maddedu RT, RA, RB, RC`
93
94 | 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form |
95 |-------|------|-------|-------|-------|-------|---------|
96 | EXT04 | RT | RA | RB | RC | XO | VA-Form |
97
98 Pseudocode:
99
100 ```
101 prod[0:127] <- (RA) * (RB) # Multiply RA and RB, result 128-bit
102 sum[0:127] <- EXTZ(RC) + prod # Zero extend RC, add product
103 RT <- sum[64:127] # Store low half in RT
104 RS <- sum[0:63] # RS implicit register, equal to RC
105 ```
106
107 Special registers altered:
108
109 None
110
111 RC is zero-extended (not shifted, not sign-extended), the 128-bit product added
112 to it; the lower half of that result stored in RT and the upper half
113 in RS.
114
115 The differences here to `maddhdu` are that `maddhdu` stores the upper
116 half in RT, where `maddedu` stores the upper half in RS.
117
118 The value stored in RT is exactly equivalent to `maddld` despite `maddld`
119 performing sign-extension on RC, because RT is the full mathematical result
120 modulo 2^64 and sign/zero extension from 64 to 128 bits produces identical
121 results modulo 2^64. This is why there is no maddldu instruction.
122
123 RS is implictly defined as the same register as RC.
124
125 *Programmer's Note:
126 As a Scalar Power ISA operation, like `lq` and `stq`, RS=RT+1.
127 To achieve a big-integer rolling-accumulation effect:
128 assuming the scalar to multiply is in r0,
129 the vector to multiply by starts at r4 and the result vector
130 in r20, instructions may be issued `maddedu r20,r4,r0,r20
131 maddedu r21,r5,r0,r21` etc. where the first `maddedu` will have
132 stored the upper half of the 128-bit multiply into r21, such
133 that it may be picked up by the second `maddedu`. Repeat inline
134 to construct a larger bigint scalar-vector multiply,
135 as Scalar GPR register file space permits.*
136
137 Examples:
138
139 ```
140 # (r0 * r1) + r2, store lower in r4, upper in r2
141 maddedu r4, r0, r1, r2
142 ```
143
144 # Divide/Modulo Quad-Double Unsigned
145
146 **Should name be Divide/Module Double Extended Unsigned?**
147 **Check the pseudo-code comments**
148
149 `divmod2du RT,RA,RB,RC`
150
151 | 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form |
152 |-------|------|-------|-------|-------|-------|---------|
153 | EXT04 | RT | RA | RB | RC | XO | VA-Form |
154
155 Pseudo-code:
156
157 if ((RA) <u (RB)) & ((RB) != [0]*XLEN) then # Check RA<RB, for divide-by-0
158 dividend[0:(XLEN*2)-1] <- (RA) || (RC) # Combine RA/RC, zero extend
159 divisor[0:(XLEN*2)-1] <- [0]*XLEN || (RB) # Extend to 128-bit
160 result <- dividend / divisor # Division
161 modulo <- dividend % divisor # Modulo
162 RT <- result[XLEN:(XLEN*2)-1] # Store result in RT
163 RS <- modulo[XLEN:(XLEN*2)-1] # Modulo in RC, implicit
164 else # In case of error
165 RT <- [1]*XLEN # RT all 1's
166 RS <- [0]*XLEN # RS all 0's
167
168 Special registers altered:
169
170 None
171
172 Divide/Modulo Quad-Double Unsigned is another VA-Form instruction
173 that is near-identical to `divdeu` except that:
174
175 * the lower 64 bits of the dividend, instead of being zero, contain a
176 register, RC.
177 * it performs a fused divide and modulo in a single instruction, storing
178 the modulo in an implicit RS (similar to `maddedu`)
179
180 RB, the divisor, remains 64 bit. The instruction is therefore a 128/64
181 division, producing a (pair) of 64 bit result(s), in the same way that
182 Intel [divq](https://www.felixcloutier.com/x86/div) works.
183 Overflow conditions
184 are detected in exactly the same fashion as `divdeu`, except that rather
185 than have `UNDEFINED` behaviour, RT is set to all ones and RS set to all
186 zeros on overflow.
187
188 *Programmer's note: there are no Rc variants of any of these VA-Form
189 instructions. `cmpi` will need to be used to detect overflow conditions:
190 the saving in instruction count is that both RT and RS will have already
191 been set to useful values (all 1s and all zeros respectively)
192 needed as part of implementing Knuth's
193 Algorithm D*
194
195 For Scalar usage, just as for `maddedu`, `RS=RC`
196 Examples:
197
198 ```
199 # ((r0 << 64) + r2) / r1, store in r4
200 # ((r0 << 64) + r2) % r1, store in r2
201 divmod2du r4, r0, r1, r2
202 ```
203
204 [[!tag opf_rfc]]
205
206 # Appendices
207
208 Appendix E Power ISA sorted by opcode
209 Appendix F Power ISA sorted by version
210 Appendix G Power ISA sorted by Compliancy Subset
211 Appendix H Power ISA sorted by mnemonic
212
213 | Form | Book | Page | Version | mnemonic | Description |
214 |------|------|------|---------|----------|-------------|
215 | VA | I | # | 3.0B | maddedu | Multiply-Add Extend Double Unsigned |
216 | VA | I | # | 3.0B | divmod2du | Floatif Move | Divide/Modulo Quad-Double Unsigned
217