# RFC ls013 Min/Max GPR/FPR

**URLs**:

* <https://libre-soc.org/openpower/sv/rfc/ls013/>
* <https://git.openpower.foundation/isa/PowerISA/issues/TODO>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1057>

**Severity**: Major

**Status**: New

**Date**: 14 Apr 2023

**Target**: v3.2B

**Source**: v3.1B

**Books and Section affected**:

```
Book I Fixed-Point and Floating-Point Instructions
Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
Instructions added
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of new GPR-based and FPR-based instructions
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers,
and related tools.
```

**Keywords**:

```
GPR, FPR, min, max, fmin, fmax
```

**Motivation**

Minimum and maximum are common operations that can take a surprisingly large
number of instructions to implement in software (32 for glibc's `fmax` on SFFS,
as shown at the end of this RFC). Additionally, Reduce-Min/Max are common vector
operations, and SVP64 Parallel Reduction needs a single Scalar instruction in
order to implement them effectively.

**Notes and Observations**:

1. SVP64 REMAP Parallel Reduction needs a single Scalar instruction to
    work with, for best effectiveness. With no SFFS minimum/maximum
    instructions, Simple-V min/max Parallel Reduction is severely compromised.
2. Once one FP min/max mode is implemented, the remaining modes require
    little additional hardware.
3. Similar instructions exist in VSX (although not the IEEE 754-2019 variants).
    This is frequently used to justify not adding them. However, SVP64/VSX may
    have a different meaning from SVP64/SFFS, so it is *really* crucial to have
    SFFS ops, even if "equivalent" to VSX, in order for SVP64 not to be
    compromised (non-orthogonal).
4. FP min/max are rather complex to implement in software: the most commonly
    used FP max function, `fmax` from glibc, compiles to an astounding
    32 instructions for SFFS.

**Changes**

Add the following entries to:

* the Appendices of Book I
* Book I 3.3.9 Fixed-Point Arithmetic Instructions
* Book I 4.6.6.1 Floating-Point Elementary Arithmetic Instructions
* Book I 1.6.1 and 1.6.2

----------------

\newpage{}

# Floating-Point Instructions

This group provides Floating-Point min/max. Because IEEE 754 has been revised
up to the 2019 edition, there are now several subtly different definitions of
min/max. These are selectable with a Mode Field, `FMM`.

## `FMM` -- Floating Min/Max Mode

<a id="fmm-floating-min-max-mode"></a>

| `FMM` | Extended Mnemonic | Origin | Semantics |
|-------|-------------------------------|--------------------------------|-------------------------------------------------|
| 0000 | fminnum08[s] FRT, FRA, FRB | IEEE 754-2008 | FRT = minNum(FRA, FRB) (1) |
| 0001 | fmin19[s] FRT, FRA, FRB | IEEE 754-2019 | FRT = minimum(FRA, FRB) |
| 0010 | fminnum19[s] FRT, FRA, FRB | IEEE 754-2019 | FRT = minimumNumber(FRA, FRB) |
| 0011 | fminc[s] FRT, FRA, FRB | x86 minss or Win32's min macro | FRT = FRA \< FRB ? FRA : FRB |
| 0100 | fminmagnum08[s] FRT, FRA, FRB | IEEE 754-2008 (TODO: (3)) | FRT = minmaxmag(FRA, FRB, False, fminnum08) (2) |
| 0101 | fminmag19[s] FRT, FRA, FRB | IEEE 754-2019 | FRT = minmaxmag(FRA, FRB, False, fmin19) (2) |
| 0110 | fminmagnum19[s] FRT, FRA, FRB | IEEE 754-2019 | FRT = minmaxmag(FRA, FRB, False, fminnum19) (2) |
| 0111 | fminmagc[s] FRT, FRA, FRB | - | FRT = minmaxmag(FRA, FRB, False, fminc) (2) |
| 1000 | fmaxnum08[s] FRT, FRA, FRB | IEEE 754-2008 | FRT = maxNum(FRA, FRB) (1) |
| 1001 | fmax19[s] FRT, FRA, FRB | IEEE 754-2019 | FRT = maximum(FRA, FRB) |
| 1010 | fmaxnum19[s] FRT, FRA, FRB | IEEE 754-2019 | FRT = maximumNumber(FRA, FRB) |
| 1011 | fmaxc[s] FRT, FRA, FRB | x86 maxss or Win32's max macro | FRT = FRA > FRB ? FRA : FRB |
| 1100 | fmaxmagnum08[s] FRT, FRA, FRB | IEEE 754-2008 (TODO: (3)) | FRT = minmaxmag(FRA, FRB, True, fmaxnum08) (2) |
| 1101 | fmaxmag19[s] FRT, FRA, FRB | IEEE 754-2019 | FRT = minmaxmag(FRA, FRB, True, fmax19) (2) |
| 1110 | fmaxmagnum19[s] FRT, FRA, FRB | IEEE 754-2019 | FRT = minmaxmag(FRA, FRB, True, fmaxnum19) (2) |
| 1111 | fmaxmagc[s] FRT, FRA, FRB | - | FRT = minmaxmag(FRA, FRB, True, fmaxc) (2) |

Note (1): for the purposes of minNum/maxNum, -0.0 is defined to be less than
+0.0. This is left unspecified in IEEE 754-2008.
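
The differences between the non-magnitude modes show up mainly on NaN and
signed-zero inputs. The following is a minimal Python sketch of three of the
min modes, not a reference implementation. It assumes all NaNs are quiet (so
signalling-NaN handling, which is where `fminnum08` and `fminnum19` are
understood to differ, is not modelled) and it ignores FPSCR side effects. The
helper `_lt` is a hypothetical name introduced here for illustration.

```python
import math

def _lt(x, y):
    """x < y for non-NaN inputs, with -0.0 ordered below +0.0 (see Note (1))."""
    if x == y == 0.0:
        return math.copysign(1.0, x) < math.copysign(1.0, y)
    return x < y

def fmin19(x, y):
    """Sketch of IEEE 754-2019 minimum: any NaN input produces NaN."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    return x if _lt(x, y) else y

def fminnum19(x, y):
    """Sketch of IEEE 754-2019 minimumNumber: a single NaN input is ignored."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if _lt(x, y) else y

def fminc(x, y):
    """The table's ternary, FRA < FRB ? FRA : FRB, with no special cases."""
    return x if x < y else y

nan = math.nan
print(fmin19(nan, 1.0))     # nan: NaN propagates
print(fminnum19(nan, 1.0))  # 1.0: the number wins
print(fminc(nan, 1.0))      # 1.0: comparison is false, so FRB is returned
print(fminc(1.0, nan))      # nan: comparison is false, so FRB is returned
```

Note that `fminc` is not commutative for NaN or signed-zero inputs, which is
why it is listed as a separate mode rather than folded into the IEEE 754 ones.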

Note (2): `minmaxmag(x, y, is_max, fallback)` is defined as:

```python
def minmaxmag(x, y, is_max, fallback):
    # compare magnitudes; both tests are False if either input is NaN
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a  # swap, so that the larger magnitude is selected
    if a:
        return x
    if b:
        return y
    # equal magnitudes, or NaN input(s)
    return fallback(x, y)
```
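
As an illustration of how the fallback is used, the sketch below repeats the
definition from Note (2) and pairs it with `fminc` and `fmaxc`, written here
as the plain ternaries from the `FMM` table, to model `fminmagc` and
`fmaxmagc`. It is a rough model only: FPSCR effects and signalling NaNs are
not considered.

```python
import math

def minmaxmag(x, y, is_max, fallback):
    # same definition as in Note (2)
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a
    if a:
        return x
    if b:
        return y
    return fallback(x, y)

def fminc(x, y):
    # FRA < FRB ? FRA : FRB
    return x if x < y else y

def fmaxc(x, y):
    # FRA > FRB ? FRA : FRB
    return x if x > y else y

# differing magnitudes: the sign is ignored when selecting
print(minmaxmag(-3.0, 2.0, False, fminc))  # 2.0
print(minmaxmag(-3.0, 2.0, True, fmaxc))   # -3.0

# equal magnitudes: the fallback decides
print(minmaxmag(-2.0, 2.0, False, fminc))  # -2.0
print(minmaxmag(-2.0, 2.0, True, fmaxc))   # 2.0

# NaN input: both magnitude tests are False, so the fallback decides
print(minmaxmag(math.nan, 1.0, False, fminc))  # 1.0
```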

Note (3): TODO: check whether IEEE 754-2008 defines min/maxMagNum operations
corresponding to IEEE 754-2019's minimum/maximumMagnitudeNumber.

----------------

\newpage{}

## Floating Minimum/Maximum MM-form

* fminmax FRT, FRA, FRB, FMM
* fminmax. FRT, FRA, FRB, FMM

```
|0     |6     |11    |16    |21    |25    |31  |
| PO   | FRT  | FRA  | FRB  | FMM  | XO   | Rc |
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Special Registers altered:

```
FX VXSNAN
CR1 (if Rc=1)
```

Extended Mnemonics:

see [`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

----------

## Floating Minimum/Maximum Single MM-form

* fminmaxs FRT, FRA, FRB, FMM
* fminmaxs. FRT, FRA, FRB, FMM

```
|0     |6     |11    |16    |21    |25    |31  |
| PO   | FRT  | FRA  | FRB  | FMM  | XO   | Rc |
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Special Registers altered:

```
FX VXSNAN
CR1 (if Rc=1)
```

Extended Mnemonics:

see [`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

----------

\newpage{}

# Fixed-Point Instructions

These instructions cover signed and unsigned, min and max. SVP64 Prefixing
defines Saturation semantics, so Saturated variants of these instructions need
not be proposed.

## `MMM` -- Integer Min/Max Mode

<a id="mmm-integer-min-max-mode"></a>

Bits are numbered MSB0, so bit 0 is the leftmost bit of `MMM` in the table
below:

* bit 0: set if word variant else dword
* bit 1: set if signed else unsigned
* bit 2: set if max else min

| `MMM` | Extended Mnemonic | Semantics |
|-------|-------------------|----------------------------------------------|
| 000 | `minu RT,RA,RB` | `RT = (uint64_t)RA < (uint64_t)RB ? RA : RB` |
| 001 | `maxu RT,RA,RB` | `RT = (uint64_t)RA > (uint64_t)RB ? RA : RB` |
| 010 | `mins RT,RA,RB` | `RT = (int64_t)RA < (int64_t)RB ? RA : RB` |
| 011 | `maxs RT,RA,RB` | `RT = (int64_t)RA > (int64_t)RB ? RA : RB` |
| 100 | `minuw RT,RA,RB` | `RT = (uint32_t)RA < (uint32_t)RB ? RA : RB` |
| 101 | `maxuw RT,RA,RB` | `RT = (uint32_t)RA > (uint32_t)RB ? RA : RB` |
| 110 | `minsw RT,RA,RB` | `RT = (int32_t)RA < (int32_t)RB ? RA : RB` |
| 111 | `maxsw RT,RA,RB` | `RT = (int32_t)RA > (int32_t)RB ? RA : RB` |

## Minimum/Maximum MM-Form

* minmax RT, RA, RB, MMM
* minmax. RT, RA, RB, MMM

```
|0     |6     |11    |16    |21    |24 |25    |31  |
| PO   | RT   | RA   | RB   | MMM  | / | XO   | Rc |
```

```
a <- (RA|0)
b <- (RB)
if MMM[0] then  # word mode
    # shift left by XLEN/2 to make the dword comparison
    # do word comparison of the original inputs
    a <- a[XLEN/2:XLEN-1] || [0] * (XLEN/2)
    b <- b[XLEN/2:XLEN-1] || [0] * (XLEN/2)
if MMM[1] then  # signed mode
    # invert sign bits to make the unsigned comparison
    # do signed comparison of the original inputs
    a[0] <- ¬a[0]
    b[0] <- ¬b[0]
# if Rc = 1 then store the result of comparing a and b to CR0
if Rc = 1 then
    if a <u b then
        CR0 <- 0b100 || XER.SO
    if a = b then
        CR0 <- 0b001 || XER.SO
    if a >u b then
        CR0 <- 0b010 || XER.SO
if MMM[2] then  # max mode
    # swap a and b to make the less than comparison do
    # greater than comparison of the original inputs
    t <- a
    a <- b
    b <- t
# store the entire selected source (even in word mode)
if a <u b then RT <- (RA|0)
else RT <- (RB)
```

Compute the integer minimum/maximum according to `MMM` of `(RA|0)` and `(RB)`
and store the result in `RT`. If Rc=1, CR0 records the comparison of the
operands before the max-mode swap, so lt/gt always describe `(RA|0)` relative
to `(RB)`.
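
The pseudo-code above can be modelled in Python for experimentation. This is a
sketch under stated assumptions rather than a reference model: XLEN is fixed
at 64, `(RA|0)` is treated as the plain contents of RA, CR0 is computed
unconditionally (as if Rc=1) and returned as a 4-bit integer in LT, GT, EQ, SO
order, and the names `minmax_model` and `so` (standing in for XER.SO) are
hypothetical.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def minmax_model(ra, rb, mmm, so=0):
    """Return (RT, CR0) for the 3-bit mode mmm, following the pseudo-code."""
    # MSB0 numbering: MMM[0] is the most significant of the three bits
    word = (mmm >> 2) & 1
    signed = (mmm >> 1) & 1
    is_max = mmm & 1

    a, b = ra & MASK, rb & MASK
    if word:
        # move the low word to the top so a dword compare acts on words
        a = (a << (XLEN // 2)) & MASK
        b = (b << (XLEN // 2)) & MASK
    if signed:
        # flip the sign bits so an unsigned compare acts as a signed one
        a ^= 1 << (XLEN - 1)
        b ^= 1 << (XLEN - 1)

    # CR0 is derived before the max-mode swap
    if a < b:
        cr0 = 0b1000 | so
    elif a == b:
        cr0 = 0b0010 | so
    else:
        cr0 = 0b0100 | so

    if is_max:
        a, b = b, a
    # the entire selected source register is stored, even in word mode
    rt = (ra if a < b else rb) & MASK
    return rt, cr0

minus_one = MASK  # -1 as a 64-bit two's complement value
print(minmax_model(minus_one, 1, 0b000))  # minu: (1, 4), -1 is large unsigned
print(minmax_model(minus_one, 1, 0b010) == (minus_one, 8))  # mins: True
```

In word mode the comparison only looks at the low 32 bits, so, for example,
`maxsw` applied to `0x1_0000_0005` and `0xFFFF_FFFF` selects the first operand
(5 is greater than -1) and returns all 64 bits of it.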

Special Registers altered:

```
CR0 (if Rc=1)
```

Extended Mnemonics:

see [`MMM` -- Integer Min/Max Mode](#mmm-integer-min-max-mode)

----------

\newpage{}

# Instruction Formats

Add the following entries to Book I 1.6.1 Word Instruction Formats:

## MM-FORM

```
|0     |6     |11    |16    |21    |24 |25    |31  |
| PO   | FRT  | FRA  | FRB  | FMM      | XO   | Rc |
| PO   | RT   | RA   | RB   | MMM  | / | XO   | Rc |
```

Add the following new fields to Book I 1.6.2 Word Instruction Fields:

```
FMM (21:24)
    Field used to specify minimum/maximum mode for fminmax[s].

    Formats: MM

MMM (21:23)
    Field used to specify minimum/maximum mode for integer minmax.

    Formats: MM
```

Add `MM` to the `Formats:` list for all of `FRT`, `FRA`, `FRB`, `XO (25:30)`,
`Rc`, `RT`, `RA` and `RB`.
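
To make the field positions concrete, the following Python sketch packs and
unpacks an MM-form word using the MSB0 bit ranges listed above. It is
illustrative only: this RFC does not assign PO or XO values, so the values
used below are arbitrary placeholders, and the helper names (`put`, `field`,
`encode_fminmax`, `encode_minmax`) are hypothetical.

```python
def put(value, start, end):
    """Place value into bits start..end (inclusive, MSB0) of a 32-bit word."""
    width = end - start + 1
    assert 0 <= value < (1 << width), "value does not fit in field"
    return value << (31 - end)

def field(word, start, end):
    """Extract bits start..end (inclusive, MSB0) of a 32-bit word."""
    width = end - start + 1
    return (word >> (31 - end)) & ((1 << width) - 1)

def encode_fminmax(po, frt, fra, frb, fmm, xo, rc=0):
    # | PO | FRT | FRA | FRB | FMM | XO | Rc |
    return (put(po, 0, 5) | put(frt, 6, 10) | put(fra, 11, 15) |
            put(frb, 16, 20) | put(fmm, 21, 24) | put(xo, 25, 30) |
            put(rc, 31, 31))

def encode_minmax(po, rt, ra, rb, mmm, xo, rc=0):
    # | PO | RT | RA | RB | MMM | / | XO | Rc |, bit 24 is reserved (zero)
    return (put(po, 0, 5) | put(rt, 6, 10) | put(ra, 11, 15) |
            put(rb, 16, 20) | put(mmm, 21, 23) | put(xo, 25, 30) |
            put(rc, 31, 31))

PO_PLACEHOLDER = 0b111111  # arbitrary, NOT an allocated primary opcode
XO_PLACEHOLDER = 0b101010  # arbitrary, NOT an allocated extended opcode

# FMM=0b1001 selects fmax19 in the FMM table
w = encode_fminmax(PO_PLACEHOLDER, 1, 2, 3, 0b1001, XO_PLACEHOLDER, rc=1)
print(field(w, 21, 24) == 0b1001)  # True

# MMM=0b011 selects maxs in the MMM table
v = encode_minmax(PO_PLACEHOLDER, 4, 5, 6, 0b011, XO_PLACEHOLDER)
print(field(v, 21, 23) == 0b011, field(v, 24, 24) == 0)  # True True
```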

----------

\newpage{}

# Appendices

* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | Mnemonic | Description |
|------|------|------|---------|----------|-------------|
| MM | I | # | 3.2B | fminmax | Floating Minimum/Maximum |
| MM | I | # | 3.2B | fminmaxs | Floating Minimum/Maximum Single |
| MM | I | # | 3.2B | minmax | Minimum/Maximum |

## fmax instruction count

Emulating `fmax` in SFFS requires 32 instructions. The following source, copied
from glibc, compiles to the assembly listing shown after it.

```
#include <stdint.h>
#include <string.h>

// reinterpret the bits of a double as a 64-bit unsigned integer
static inline uint64_t asuint64(double f) {
    union {
        double f;
        uint64_t i;
    } u = {f};
    return u.i;
}

static inline int issignaling(double v) {
    // copied from glibc:
    // https://github.com/bminor/glibc/blob/e2756903329365134089d23548e9083d23bc3dd9/sysdeps/ieee754/dbl-64/math_config.h#L101
    uint64_t ix = asuint64(v);
    return 2 * (ix ^ 0x0008000000000000) > 2 * 0x7ff8000000000000ULL;
}

double fmax(double x, double y) {
    // copied from glibc:
    // https://github.com/bminor/glibc/blob/e2756903329365134089d23548e9083d23bc3dd9/math/s_fmax_template.c
    if(__builtin_isgreaterequal(x, y))
        return x;
    else if(__builtin_isless(x, y))
        return y;
    else if(issignaling(x) || issignaling(y))
        return x + y;  // signalling NaN input: raise invalid, return quiet NaN
    else
        return __builtin_isnan(y) ? x : y;
}
```

Assembly listing:

```
fmax(double, double):
        fcmpu 0,1,2
        fmr 0,1
        cror 30,1,2
        beq 7,.L12
        blt 0,.L13
        stfd 1,-16(1)
        lis 9,0x8
        li 8,-1
        sldi 9,9,32
        rldicr 8,8,0,11
        ori 2,2,0
        ld 10,-16(1)
        xor 10,10,9
        sldi 10,10,1
        cmpld 0,10,8
        bgt 0,.L5
        stfd 2,-16(1)
        ori 2,2,0
        ld 10,-16(1)
        xor 9,10,9
        sldi 9,9,1
        cmpld 0,9,8
        ble 0,.L6
.L5:
        fadd 1,0,2
        blr
.L13:
        fmr 1,2
        blr
.L6:
        fcmpu 0,2,2
        fmr 1,2
        bnulr 0
.L12:
        fmr 1,0
        blr
        .long 0
        .byte 0,9,0,0,0,0,0,0
```

[[!tag opf_rfc]]