# RFC ls003 Big Integer

**URLs**:

* <https://libre-soc.org/openpower/sv/biginteger/analysis/>
* <https://libre-soc.org/openpower/sv/rfc/ls003/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=960>
* <https://git.openpower.foundation/isa/PowerISA/issues/91>

**Severity**: Major

**Status**: New

**Date**: 20 Oct 2022

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**: **UPDATE**

```
Book I 64-bit Fixed-Point Arithmetic Instructions 3.3.9.1
Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
Instructions added
maddedu - Multiply-Add Extended Double Unsigned
divmod2du - Divide/Modulo Quad-Double Unsigned
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of two new GPR-based instructions
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers,
and related tools.
```

**Keywords**:

```
GPR, Big-integer, Double-word
```

**Motivation**

`maddedu` is similar to `maddhdu` and `maddld`, but allows for a big-integer
rolling accumulation effect: `RC` effectively becomes a 64-bit carry in chains
of highly-efficient loop-unrolled arbitrary-length big-integer operations.
`divmod2du` is similar to `divdeu` and has similar advantages to `maddedu`:
the modulo result is available together with the quotient in a single
instruction, allowing highly-efficient arbitrary-length big-integer division.

**Notes and Observations**:

1. There is no need for an Rc=1 variant as VA-Form is being used.
2. There is no need for Special Registers as VA-Form is being used.
3. Both instructions have been present in Intel x86 for several decades.
4. Neither instruction is present in VSX: these are 128/64-bit operations,
   whereas VSX is 128/128.
5. `maddedu` and `divmod2du` are full inverses of each other, including
   when used for arbitrary-length big-integer arithmetic.
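Note 5 can be checked numerically. The Python below is a minimal value-level sketch of both instructions, following the pseudocode in this RFC; `maddedu` and `divmod2du` here are hypothetical Python helpers, and which GPR receives RS is deliberately not modelled. It assumes the addend `c` is smaller than the divisor `b`: under that condition the upper half of `a*b + c` is always below `b`, so `divmod2du` never takes its overflow path and the round trip is exact.

```python
# Value-level sketches of maddedu and divmod2du for 64-bit registers.
# Register allocation (which GPR holds RS) is not modelled.
MASK64 = (1 << 64) - 1

def maddedu(ra, rb, rc):
    """Return (RT, RS): low and high halves of RA*RB + EXTZ(RC)."""
    total = ra * rb + rc          # at most 2**128 - 2**64, so it fits in 128 bits
    return total & MASK64, total >> 64

def divmod2du(ra, rb, rc):
    """Return (RT, RS): quotient and remainder of (RA || RC) / RB."""
    if ra < rb:                   # unsigned compare; also excludes RB == 0
        dividend = (ra << 64) | rc
        return dividend // rb, dividend % rb
    return MASK64, 0              # overflow: RT all ones, RS all zeros

# Round trip, assuming c < b: divmod2du recovers the original a and c.
a, b, c = 0xDEADBEEFCAFEBABE, 0x123456789ABCDEF0, 0x42
lo, hi = maddedu(a, b, c)
q, r = divmod2du(hi, b, lo)
assert (q, r) == (a, c)
```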

**Changes**

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.9.1

----------------

\newpage{}

# Multiply-Add Extended Double Unsigned

`maddedu RT, RA, RB, RC`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form |
|-------|------|-------|-------|-------|-------|---------|
| EXT04 | RT | RA | RB | RC | XO | VA-Form |

Pseudocode:

```
prod[0:127] <- (RA) * (RB) # Multiply RA and RB, result 128-bit
sum[0:127] <- EXTZ(RC) + prod # Zero extend RC, add product
RT <- sum[64:127] # Store low half in RT
RS <- sum[0:63] # RS implicit register, equal to RC
```

Special registers altered:

None

RC is zero-extended (not shifted, not sign-extended) and the 128-bit product
is added to it; the lower half of the result is stored in RT and the upper
half in RS.

The difference from `maddhdu` is that `maddhdu` stores the upper
half in RT, whereas `maddedu` stores the upper half in RS.

The value stored in RT is exactly equivalent to `maddld` despite `maddld`
performing sign-extension on RC, because RT is the full mathematical result
modulo 2^64, and sign- or zero-extension from 64 to 128 bits produces identical
results modulo 2^64. This is why there is no `maddldu` instruction.
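The relationships between `maddedu`, `maddhdu` and `maddld` described above can be illustrated with a small Python model. The functions below are hypothetical value-level sketches of each instruction's pseudocode (64-bit registers, two's-complement interpretation of the operands for `maddld`), written for illustration rather than taken from any existing simulator.

```python
MASK64 = (1 << 64) - 1

def to_signed(x):
    """Interpret a 64-bit register value as two's-complement signed."""
    return x - (1 << 64) if x >> 63 else x

def maddedu(ra, rb, rc):
    """Proposed op: (RT, RS) = (low, high) halves of RA*RB + EXTZ(RC)."""
    s = ra * rb + rc
    return s & MASK64, s >> 64

def maddhdu(ra, rb, rc):
    """Existing op: RT = high half of RA*RB + EXTZ(RC), all unsigned."""
    return (ra * rb + rc) >> 64

def maddld(ra, rb, rc):
    """Existing op: RT = low half of RA*RB + EXTS(RC)."""
    s = to_signed(ra) * to_signed(rb) + to_signed(rc)
    return s & MASK64  # & on a negative Python int keeps the two's-complement low bits

# RC has its sign bit set, so EXTS and EXTZ differ in the upper 64 bits
ra, rb, rc = MASK64, 0x8000000000000001, 0xFFFFFFFFFFFFFFF0
rt, rs = maddedu(ra, rb, rc)
assert rt == maddld(ra, rb, rc)   # same low half: no maddldu is needed
assert rs == maddhdu(ra, rb, rc)  # maddedu's RS is what maddhdu puts in RT
```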

RS is implicitly defined as the same register as RC.

*Programmer's Note:
As a Scalar Power ISA operation, like `lq` and `stq`, RS=RT+1.
To achieve a big-integer rolling-accumulation effect,
assuming the scalar to multiply is in r0,
the vector to multiply by starts at r4 and the result vector
starts at r20, the instructions `maddedu r20,r4,r0,r20` and
`maddedu r21,r5,r0,r21` may be issued, and so on, where the first
`maddedu` will have stored the upper half of the 128-bit multiply
into r21 so that it may be picked up by the second `maddedu`.
Repeat inline to construct a larger bigint scalar-vector multiply,
as Scalar GPR register file space permits.*
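The carry chain in the note above can be sketched in Python as a loop that computes the same values as the unrolled `maddedu` sequence: each step's upper half (RS) becomes the next step's carry-in (RC). This is a hypothetical value-level model; it assumes the first carry-in (r20 in the note) is zero and does not model register numbering. `mul_scalar_vector` and `from_limbs` are illustrative names only.

```python
MASK64 = (1 << 64) - 1

def maddedu(ra, rb, rc):
    """Value-level model: (RT, RS) = (low, high) halves of RA*RB + RC."""
    s = ra * rb + rc
    return s & MASK64, s >> 64

def mul_scalar_vector(vec, scalar):
    """Multiply little-endian 64-bit limbs `vec` by a 64-bit `scalar`.

    Each iteration stands for one maddedu in the unrolled chain: the
    upper half (RS) of one step is the carry-in (RC) of the next.
    """
    out = []
    carry = 0                 # assumed: the first carry-in is zero
    for limb in vec:
        lo, carry = maddedu(limb, scalar, carry)
        out.append(lo)
    out.append(carry)         # the final carry-out is the top limb
    return out

def from_limbs(limbs):
    """Helper: reassemble little-endian 64-bit limbs into a Python int."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

vec = [MASK64, 0x0123456789ABCDEF, MASK64]
scalar = 0xFEDCBA9876543210
assert from_limbs(mul_scalar_vector(vec, scalar)) == from_limbs(vec) * scalar
```

Unrolling the loop gives one `maddedu` per limb, which is what allows the chain to be issued inline without any separate add-with-carry instructions.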

Examples:

```
# (r0 * r1) + r2, store lower in r4, upper in r2
maddedu r4, r0, r1, r2
```

# Divide/Modulo Quad-Double Unsigned

**Should name be Divide/Modulo Double Extended Unsigned?**
**Check the pseudo-code comments**

`divmod2du RT, RA, RB, RC`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form |
|-------|------|-------|-------|-------|-------|---------|
| EXT04 | RT | RA | RB | RC | XO | VA-Form |

Pseudocode:

```
if ((RA) <u (RB)) & ((RB) != [0]*XLEN) then  # No overflow, no divide-by-0
    dividend[0:(XLEN*2)-1] <- (RA) || (RC)     # RA high half, RC low half
    divisor[0:(XLEN*2)-1] <- [0]*XLEN || (RB)  # Zero-extend RB to 128 bits
    result <- dividend / divisor               # Quotient
    modulo <- dividend % divisor               # Remainder
    RT <- result[XLEN:(XLEN*2)-1]              # Store quotient in RT
    RS <- modulo[XLEN:(XLEN*2)-1]              # Store remainder in implicit RS
else                                           # Overflow or divide-by-0
    RT <- [1]*XLEN                             # RT all 1s
    RS <- [0]*XLEN                             # RS all 0s
```

Special registers altered:

None

Divide/Modulo Quad-Double Unsigned is another VA-Form instruction
that is near-identical to `divdeu` except that:

* the lower 64 bits of the dividend, instead of being zero, contain the
  contents of register RC.
* it performs a fused divide and modulo in a single instruction, storing
  the modulo in an implicit RS (similar to `maddedu`).

RB, the divisor, remains 64-bit. The instruction is therefore a 128/64
division, producing a pair of 64-bit results, in the same way that
Intel [divq](https://www.felixcloutier.com/x86/div) works.
Overflow conditions ((RA) >= (RB) unsigned, which includes RB=0)
are detected in exactly the same fashion as `divdeu`, except that rather
than leaving the results `UNDEFINED`, RT is set to all ones and RS to all
zeros on overflow.
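The behaviour described above, including the defined overflow result, can be modelled in Python as follows. This is a hypothetical sketch of the pseudocode for 64-bit registers, not an official reference model; `divmod2du` here is a Python helper name.

```python
MASK64 = (1 << 64) - 1

def divmod2du(ra, rb, rc):
    """Value-level model: (RT, RS) = quotient and remainder of (RA || RC) / RB."""
    if ra < rb and rb != 0:          # quotient fits in 64 bits and RB is non-zero
        dividend = (ra << 64) | rc   # RA is the high half, RC the low half
        return divmod(dividend, rb)  # (quotient -> RT, remainder -> RS)
    return MASK64, 0                 # overflow: RT all ones, RS all zeros

# Normal case: (1 * 2**64 + 5) / 3
assert divmod2du(1, 3, 5) == divmod(2**64 + 5, 3)
# Overflow: RA >= RB, and divide by zero
assert divmod2du(7, 7, 0) == (MASK64, 0)
assert divmod2du(0, 0, 9) == (MASK64, 0)
```

The `ra < rb` test is what guarantees the quotient fits in 64 bits: the dividend is then below `rb * 2**64`.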

*Programmer's note: there are no Rc variants of any of these VA-Form
instructions, so `cmpi` will need to be used to detect overflow conditions.
The saving in instruction count is that both RT and RS will already have
been set to useful values (all ones and all zeros respectively), as
needed when implementing Knuth's Algorithm D.*

For Scalar usage, just as for `maddedu`, `RS=RC`.

Examples:

```
# ((r0 << 64) + r2) / r1, store in r4
# ((r0 << 64) + r2) % r1, store in r2
divmod2du r4, r0, r1, r2
```
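Chaining the instruction from the most significant limb downwards, with each remainder (RS) fed back in as the next RA, divides an arbitrary-length number by a single 64-bit divisor; it is the inverse of the `maddedu` scalar-vector chain. The Python below is a hypothetical value-level sketch of such a chain. It is not Knuth's Algorithm D itself (which handles multi-limb divisors), and `div_vector_scalar` and `from_limbs` are illustrative names.

```python
MASK64 = (1 << 64) - 1

def divmod2du(ra, rb, rc):
    """Value-level model: (RT, RS) = divmod((RA || RC), RB), or overflow."""
    if ra < rb:                   # also rules out RB == 0
        return divmod((ra << 64) | rc, rb)
    return MASK64, 0

def div_vector_scalar(limbs, divisor):
    """Divide little-endian 64-bit `limbs` by one non-zero 64-bit `divisor`.

    Walks from the most significant limb down. The remainder (RS) of each
    step becomes the high half (RA) of the next dividend; since it is
    always < divisor, the overflow path is never taken.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    quotient = [0] * len(limbs)
    rem = 0
    for i in reversed(range(len(limbs))):
        quotient[i], rem = divmod2du(rem, divisor, limbs[i])
    return quotient, rem

def from_limbs(limbs):
    """Helper: reassemble little-endian 64-bit limbs into a Python int."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

n = [0x0123456789ABCDEF, MASK64, 0xCAFEBABE]
d = 0x1000000000000007
q, r = div_vector_scalar(n, d)
assert from_limbs(q) == from_limbs(n) // d and r == from_limbs(n) % d
```

Because the running remainder always stays below a single-limb divisor, the overflow result described in the programmer's note only comes into play when the high half can reach the divisor, as it can in the multi-limb case handled by Algorithm D.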

[[!tag opf_rfc]]

# Appendices

* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| VA | I | # | 3.0B | maddedu | Multiply-Add Extended Double Unsigned |
| VA | I | # | 3.0B | divmod2du | Divide/Modulo Quad-Double Unsigned |
