# RFC ls011 LD/ST-Update-PostIncrement

* Funded by NLnet under the Privacy and Enhanced Trust Programme, EU
  Horizon2020 Grant 825310, and NGI0 Entrust No 101069594
* <https://bugs.libre-soc.org/show_bug.cgi?id=1048>
* <https://libre-soc.org/openpower/sv/rfc/ls011/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1045>
* <https://git.openpower.foundation/isa/PowerISA/issues/TODO>

**Severity**: Major

**Status**: New

**Date**: 21 Apr 2023.

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
Chapter 3 Book I, new Fixed-Point Load / Store Sections 3.3.2 3.3.3
Chapter 4 Book I, new Floating-Point Load / Store Sections 4.6.2 4.6.3
```

**Summary**

```
TODO
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of new Load/Store Fixed and Floating Point instructions
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers, and related tools.
Reduces instructions in hot-loops
```

**Keywords**:

```

```

**Motivation**

Moving the update of RA to *after* the Memory operation saves on instruction count
both outside and inside hot-loops. strncpy may be reduced to 11 Vector instructions,
3 of which are the zeroing loop and 5 of which are the copy loop. Percentage-wise,
LD/ST Update Post-Increment represents a 20% reduction in instruction count.

**Notes and Observations**:

These types of instructions are already present in x86 (sort-of).

* x86 chose that store should be pre-indexed and load should be post-indexed
* Power ISA chose everything to be pre-indexed
* Motorola 68000 (decades old) has pre- and post- indexed

<https://tack.sourceforge.net/olddocs/m68020.html#2.2.2.%20Extra%20MC68020%20addressing%20modes>

<https://azeria-labs.com/memory-instructions-load-and-store-part-4/>

**Changes**

Add the following entries to:

* New Load/Store Sections
* Appendices

[[!tag opf_rfc]]

--------

\newpage{}

TODO (key stub notes below)

The LD/ST-Immediate-Post-Increment instructions are all Primary
Opcode: there are 13 of these. LD/ST-Indexed-Post-Increment
are all effectively 9-bit XO and consequently may easily
fit into one single Primary Opcode. EXT2xx is recommended.

One alternative idea is that bit 31 could be allocated (retrospectively)
to Post-Increment. Although it may be too late for Scalar Power ISA
it **may** be possible to consider for SVP64Single and/or SVP64-Vector,
but this risks creating a non-Orthogonal ISA.

```
# LD/ST-Postincrement
lbzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lbzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lhzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lhzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lhaup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lhaupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lwzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lwzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lwaupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
ldup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
ldupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
stbup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stbupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
sthup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
sthupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
stwup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stwupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
stdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W

# FP LD/ST-Postincrement
lfdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lfsup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lfdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lfsupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
stfdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stfsup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stfdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
stfsupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W

# LD/ST-Shifted-Postincrement
lbzuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lhzuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lhauspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lwzuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lwauspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lduspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
stbuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
sthuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
stwuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
stduspx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W

# FP LD/ST-Shifted-Postincrement
lfdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lfsupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
stfdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
stfsupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
```

# Example

In the annotated example below, the only change to the pseudo-code is that
the memory access uses `RA` itself as the address; everything else remains
the same. The address arithmetic itself (the sum placed into `RA`) is
unchanged in every one of the Post-Update instructions.

**Load Byte and Zero with Post-Update**

D-Form

* lbzup RT,D(RA)

Pseudo-code:

```
EA <- (RA)                           # EA is just RA
RT <- ([0] * (XLEN-8)) || MEM(EA, 1) # then load
RA <- (RA) + EXTS(D)                 # then update RA after
```

Special Registers Altered:

```
None
```

By comparison, the pseudo-code for the existing `lbzu` is:

```
EA <- (RA) + EXTS(D)                 # EA includes D
RT <- ([0] * (XLEN-8)) || MEM(EA, 1) # load from RA+D
RA <- EA                             # and update RA
```
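
The difference between the two forms can be modelled in a few lines of Python. This is only an illustrative sketch, not part of the RFC: XLEN is assumed to be 64, the register file is a plain list, memory is a `bytes` object, and the helper names `exts16`, `lbzu` and `lbzup` are made up for the illustration. Starting from the same RA, both forms leave the same value in RA; only the address that is accessed differs.

```python
MASK64 = (1 << 64) - 1  # assumption: XLEN = 64


def exts16(d):
    """Sign-extend a 16-bit immediate field (models EXTS(D))."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def lbzu(regs, mem, rt, d, ra):
    """Existing pre-update form: access RA+D, then RA <- EA."""
    ea = (regs[ra] + exts16(d)) & MASK64
    regs[rt] = mem[ea]  # a single byte, so already zero-extended
    regs[ra] = ea


def lbzup(regs, mem, rt, d, ra):
    """Proposed post-update form: access RA, then RA <- RA+D."""
    ea = regs[ra]
    regs[rt] = mem[ea]
    regs[ra] = (regs[ra] + exts16(d)) & MASK64


mem = bytes([0x10, 0x11, 0x12, 0x13])

pre = [0] * 32
pre[4] = 1                 # r4 holds address 1
lbzu(pre, mem, 3, 1, 4)    # loads the byte at address 2

post = [0] * 32
post[4] = 1                # same starting address
lbzup(post, mem, 3, 1, 4)  # loads the byte at address 1
```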

-----

\newpage{}

# Fixed-Point Load with Post-Update

Add the following additional Section to Fixed-Point Load: Book I 3.3.2.1

## Load Byte and Zero with Post-Update

D-Form

```
|0 |6 |9 |10 |11 |16 |31 |
| PO | RT | RA| D |
```

* lbzup RT,D(RA)

Pseudo-code:

```
EA <- (RA)
RT <- ([0] * (XLEN-8)) || MEM(EA, 1)
RA <- (RA) + EXTS(D)
```

Let the effective address (EA) be (RA).
The byte in storage addressed by EA is loaded into
RT[56:63]. RT[0:55] are set to 0.

The sum (RA)+D is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:

None

## Load Byte and Zero with Post-Update Indexed

X-Form

```
|0 |6 |7|8|9 |10 |11|12|13 |15|16|17 |20|21 |31 |
| PO | RT | RA | RB | XO | / |
```

* lbzupx RT,RA,RB

Pseudo-code:

```
EA <- (RA)
RT <- ([0] * (XLEN-8)) || MEM(EA, 1)
RA <- (RA) + (RB)
```

Let the effective address (EA) be (RA).
The byte in storage addressed by EA is loaded into
RT[56:63]. RT[0:55] are set to 0.

The sum (RA)+(RB) is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:

None
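
As an illustration of the Indexed form, the sketch below models `lbzupx` in Python and uses RB as a stride to walk down one column of a row-major byte matrix. It is not part of the RFC: XLEN is assumed to be 64, and the list-based register file, the `bytes` memory and the function name are assumptions made for the example.

```python
MASK64 = (1 << 64) - 1  # assumption: XLEN = 64


def lbzupx(regs, mem, rt, ra, rb):
    """Model of lbzupx: load from (RA), then RA <- (RA) + (RB)."""
    ea = regs[ra]
    regs[rt] = mem[ea]  # a single byte, so already zero-extended
    regs[ra] = (regs[ra] + regs[rb]) & MASK64


# A 3x4 row-major byte matrix whose elements are 0..11.
mem = bytes(range(12))

regs = [0] * 32
regs[4] = 1  # address of element [0][1]
regs[5] = 4  # stride in bytes: one row

column = []
for _ in range(3):
    lbzupx(regs, mem, 3, 4, 5)
    column.append(regs[3])
```

Because the first access uses RA unmodified, RA can be initialised with the address of the first element directly, with no bias by the stride beforehand.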

## Load Halfword and Zero with Post-Update

D-Form

```
|0 |6 |9 |10 |11 |16 |31 |
| PO | RT | RA| D |
```

* lhzup RT,D(RA)

Pseudo-code:

```
EA <- (RA)
RT <- ([0] * (XLEN-16)) || MEM(EA, 2)
RA <- (RA) + EXTS(D)
```

Let the effective address (EA) be (RA).
The halfword in storage addressed by EA is loaded into
RT[48:63]. RT[0:47] are set to 0.

The sum (RA)+D is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:

None

## Load Halfword and Zero with Post-Update Indexed

X-Form

```
|0 |6 |7|8|9 |10 |11|12|13 |15|16|17 |20|21 |31 |
| PO | RT | RA | RB | XO | / |
```

* lhzupx RT,RA,RB

Pseudo-code:

```
EA <- (RA)
RT <- ([0] * (XLEN-16)) || MEM(EA, 2)
RA <- (RA) + (RB)
```

Let the effective address (EA) be (RA).
The halfword in storage addressed by EA is loaded into
RT[48:63]. RT[0:47] are set to 0.

The sum (RA)+(RB) is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:

None

## Load Halfword Algebraic with Post-Update

D-Form

```
|0 |6 |9 |10 |11 |16 |31 |
| PO | RT | RA| D |
```

* lhaup RT,D(RA)

Pseudo-code:

```
EA <- (RA)
RT <- EXTS(MEM(EA, 2))
RA <- (RA) + EXTS(D)
```

Special Registers Altered:

None
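
The Algebraic forms differ from the Zero forms only in sign-extending the loaded value. A minimal Python sketch of `lhaup` follows; it is not part of the RFC. It assumes XLEN = 64 and, purely for the illustration, little-endian byte order; the helper names and the list-based register file are likewise assumptions.

```python
MASK64 = (1 << 64) - 1  # assumption: XLEN = 64


def exts16(d):
    """Sign-extend a 16-bit immediate field (models EXTS(D))."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def lhaup(regs, mem, rt, d, ra):
    """Model of lhaup: RT <- EXTS(MEM(RA, 2)), then RA <- (RA) + EXTS(D)."""
    ea = regs[ra]
    # Little-endian byte order is an assumption of this sketch.
    half = int.from_bytes(mem[ea:ea + 2], "little", signed=True)
    regs[rt] = half & MASK64  # sign-extended to 64 bits
    regs[ra] = (regs[ra] + exts16(d)) & MASK64


# Two halfwords in memory: -2 followed by 5.
mem = (-2).to_bytes(2, "little", signed=True) + (5).to_bytes(2, "little")

regs = [0] * 32
regs[4] = 0
lhaup(regs, mem, 3, 2, 4)
first = regs[3]   # the negative halfword, sign-extended
lhaup(regs, mem, 3, 2, 4)
second = regs[3]  # the positive halfword
```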

## Load Halfword Algebraic with Post-Update Indexed

X-Form

```
|0 |6 |7|8|9 |10 |11|12|13 |15|16|17 |20|21 |31 |
| PO | RT | RA | RB | XO | / |
```

* lhaupx RT,RA,RB

Pseudo-code:

```
EA <- (RA)
RT <- EXTS(MEM(EA, 2))
RA <- (RA) + (RB)
```

Special Registers Altered:

None

## Load Word and Zero with Post-Update

D-Form

```
|0 |6 |9 |10 |11 |16 |31 |
| PO | RT | RA| D |
```

* lwzup RT,D(RA)

Pseudo-code:

```
EA <- (RA)
RT <- [0] * 32 || MEM(EA, 4)
RA <- (RA) + EXTS(D)
```

Let the effective address (EA) be (RA).
The word in storage addressed by EA is loaded into
RT[32:63]. RT[0:31] are set to 0.

The sum (RA)+D is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:

None

## Load Word and Zero with Post-Update Indexed

X-Form

```
|0 |6 |7|8|9 |10 |11|12|13 |15|16|17 |20|21 |31 |
| PO | RT | RA | RB | XO | / |
```

* lwzupx RT,RA,RB

Pseudo-code:

```
EA <- (RA)
RT <- [0] * 32 || MEM(EA, 4)
RA <- (RA) + (RB)
```

Let the effective address (EA) be (RA).
The word in storage addressed by EA is loaded into
RT[32:63]. RT[0:31] are set to 0.

The sum (RA)+(RB) is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:

None

## Load Word Algebraic with Post-Update Indexed

X-Form

```
|0 |6 |7|8|9 |10 |11|12|13 |15|16|17 |20|21 |31 |
| PO | RT | RA | RB | XO | / |
```

* lwaupx RT,RA,RB

Pseudo-code:

```
EA <- (RA)
RT <- EXTS(MEM(EA, 4))
RA <- (RA) + (RB)
```

Special Registers Altered:

None

## Load Doubleword with Post-Update

DS-Form

* ldup RT,DS(RA)

Pseudo-code:

```
EA <- (RA)
RT <- MEM(EA, 8)
RA <- (RA) + EXTS(DS || 0b00)
```

Special Registers Altered:

None
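
The `EXTS(DS || 0b00)` term means the 14-bit DS field is extended with two zero bits before sign-extension, so the update is always a multiple of 4 bytes. The Python sketch below works through that arithmetic and a model of `ldup`; it is not part of the RFC, and it assumes XLEN = 64, little-endian byte order and hypothetical helper names.

```python
MASK64 = (1 << 64) - 1  # assumption: XLEN = 64


def exts_ds(ds):
    """Models EXTS(DS || 0b00): append two zero bits to the 14-bit
    DS field, then sign-extend the resulting 16-bit value."""
    value = (ds & 0x3FFF) << 2
    return value - 0x10000 if value & 0x8000 else value


def ldup(regs, mem, rt, ds, ra):
    """Model of ldup: RT <- MEM(RA, 8), then RA <- (RA) + EXTS(DS || 0b00)."""
    ea = regs[ra]
    # Little-endian byte order is an assumption of this sketch.
    regs[rt] = int.from_bytes(mem[ea:ea + 8], "little")
    regs[ra] = (regs[ra] + exts_ds(ds)) & MASK64


# Two doublewords in memory.
mem = (0x1111).to_bytes(8, "little") + (0x2222).to_bytes(8, "little")

regs = [0] * 32
regs[4] = 0
ldup(regs, mem, 3, 2, 4)  # DS = 2 encodes an update of +8 bytes
ldup(regs, mem, 3, 2, 4)  # second doubleword, RA advances again
```

A DS field of `0x3FFE` would encode an update of -8 bytes, which walks the same data in the opposite direction.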

## Load Doubleword with Post-Update Indexed

X-Form

```
|0 |6 |7|8|9 |10 |11|12|13 |15|16|17 |20|21 |31 |
| PO | RT | RA | RB | XO | / |
```

* ldupx RT,RA,RB

Pseudo-code:

```
EA <- (RA)
RT <- MEM(EA, 8)
RA <- (RA) + (RB)
```

Special Registers Altered:

None

-----

\newpage{}

# Fixed-Point Store with Post-Update

Add the following as a new section in Fixed-Point Store, Book I

## Store Byte with Post-Update

D-Form

```
|0 |6 |9 |10 |11 |16 |31 |
| PO | RS | RA| D |
```

* stbup RS,D(RA)

Pseudo-code:

```
EA <- (RA) + EXTS(D)
ea <- (RA)
MEM(ea, 1) <- (RS)[XLEN-8:XLEN-1]
RA <- EA
```

Special Registers Altered:

None
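
In the store pseudo-code, `EA` holds the updated address and the lower-case `ea` holds the address actually accessed. The following Python sketch of `stbup` keeps the two apart in the same way. It is not part of the RFC: XLEN = 64, the list-based register file, the `bytearray` memory and the helper names are all assumptions for the illustration.

```python
MASK64 = (1 << 64) - 1  # assumption: XLEN = 64


def exts16(d):
    """Sign-extend a 16-bit immediate field (models EXTS(D))."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def stbup(regs, mem, rs, d, ra):
    """Model of stbup: store to (RA), then RA <- (RA) + EXTS(D)."""
    new_ra = (regs[ra] + exts16(d)) & MASK64  # "EA" in the pseudo-code
    ea = regs[ra]                             # "ea" in the pseudo-code
    mem[ea] = regs[rs] & 0xFF                 # least-significant byte of RS
    regs[ra] = new_ra


mem = bytearray(4)

regs = [0] * 32
regs[3] = 0x1AB  # only the low byte, 0xAB, is stored
regs[4] = 3      # start at the last byte and walk downwards
stbup(regs, mem, 3, -1, 4)  # writes address 3, r4 becomes 2
stbup(regs, mem, 3, -1, 4)  # writes address 2, r4 becomes 1
```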

## Store Byte with Post-Update Indexed

X-Form

```
|0 |6 |7|8|9 |10 |11|12|13 |15|16|17 |20|21 |31 |
| PO | RS | RA | RB | XO | / |
```

* stbupx RS,RA,RB

Pseudo-code:

```
EA <- (RA) + (RB)
ea <- (RA)
MEM(ea, 1) <- (RS)[XLEN-8:XLEN-1]
RA <- EA
```

Special Registers Altered:

None

## Store Halfword with Post-Update

D-Form

```
|0 |6 |9 |10 |11 |16 |31 |
| PO | RS | RA| D |
```

* sthup RS,D(RA)

Pseudo-code:

```
EA <- (RA) + EXTS(D)
ea <- (RA)
MEM(ea, 2) <- (RS)[XLEN-16:XLEN-1]
RA <- EA
```

Special Registers Altered:

None

## Store Halfword with Post-Update Indexed

X-Form

```
|0 |6 |7|8|9 |10 |11|12|13 |15|16|17 |20|21 |31 |
| PO | RS | RA | RB | XO | / |
```

* sthupx RS,RA,RB

Pseudo-code:

```
EA <- (RA) + (RB)
ea <- (RA)
MEM(ea, 2) <- (RS)[XLEN-16:XLEN-1]
RA <- EA
```

Special Registers Altered:

None

## Store Word with Post-Update

D-Form

```
|0 |6 |9 |10 |11 |16 |31 |
| PO | RS | RA| D |
```

* stwup RS,D(RA)

Pseudo-code:

```
EA <- (RA) + EXTS(D)
ea <- (RA)
MEM(ea, 4) <- (RS)[XLEN-32:XLEN-1]
RA <- EA
```

Special Registers Altered:

None

## Store Word with Post-Update Indexed

X-Form

```
|0 |6 |7|8|9 |10 |11|12|13 |15|16|17 |20|21 |31 |
| PO | RS | RA | RB | XO | / |
```

* stwupx RS,RA,RB

Pseudo-code:

```
EA <- (RA) + (RB)
ea <- (RA)
MEM(ea, 4) <- (RS)[XLEN-32:XLEN-1]
RA <- EA
```

Special Registers Altered:

None

## Store Doubleword with Post-Update

DS-Form

* stdup RS,DS(RA)

Pseudo-code:

```
EA <- (RA) + EXTS(DS || 0b00)
ea <- (RA)
MEM(ea, 8) <- (RS)
RA <- EA
```

Special Registers Altered:

None

## Store Doubleword with Post-Update Indexed

X-Form

```
|0 |6 |7|8|9 |10 |11|12|13 |15|16|17 |20|21 |31 |
| PO | RS | RA | RB | XO | / |
```

* stdupx RS,RA,RB

Pseudo-code:

```
EA <- (RA) + (RB)
ea <- (RA)
MEM(ea, 8) <- (RS)
RA <- EA
```

Special Registers Altered:

None
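
To illustrate how a post-update load and store pair up in a hot-loop, the Python sketch below models a byte copy built from `lbzup` and `stbup`. It is not part of the RFC: XLEN = 64, the list-based register file, the register choices and the helper name `copy_bytes` are assumptions for the illustration. With the existing pre-update forms and D=1, both pointers would first have to be set to one byte before their buffers; here they start at the buffers themselves.

```python
MASK64 = (1 << 64) - 1  # assumption: XLEN = 64


def exts16(d):
    """Sign-extend a 16-bit immediate field (models EXTS(D))."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def lbzup(regs, mem, rt, d, ra):
    """Load byte from (RA), then RA <- (RA) + EXTS(D)."""
    ea = regs[ra]
    regs[rt] = mem[ea]
    regs[ra] = (regs[ra] + exts16(d)) & MASK64


def stbup(regs, mem, rs, d, ra):
    """Store low byte of RS to (RA), then RA <- (RA) + EXTS(D)."""
    ea = regs[ra]
    mem[ea] = regs[rs] & 0xFF
    regs[ra] = (regs[ra] + exts16(d)) & MASK64


def copy_bytes(regs, mem, n):
    """Hypothetical loop body: one load and one store per byte,
    each advancing its own pointer as a side effect."""
    for _ in range(n):
        lbzup(regs, mem, 3, 1, 4)  # r3 <- byte at r4, r4 += 1
        stbup(regs, mem, 3, 1, 5)  # byte at r5 <- r3, r5 += 1


mem = bytearray(b"abcd" + bytes(4))

regs = [0] * 32
regs[4] = 0  # source pointer: start of the source buffer
regs[5] = 4  # destination pointer: start of the destination buffer
copy_bytes(regs, mem, 4)
```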

[[!tag opf_rfc]]