# RFC ls011 LD/ST-Update-PostIncrement

* Funded by NLnet under the Privacy and Enhanced Trust Programme, EU
  Horizon2020 Grant 825310, and NGI0 Entrust No 101069594
* <https://bugs.libre-soc.org/show_bug.cgi?id=1048>
* <https://libre-soc.org/openpower/sv/rfc/ls011/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1045>
* <https://git.openpower.foundation/isa/PowerISA/issues/TODO>

**Severity**: Major

**Status**: New

**Date**: 21 Apr 2023.

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
Chapter 3 Book I, new Fixed-Point Load / Store Sections 3.3.2 3.3.3
Chapter 4 Book I, new Floating-Point Load / Store Sections 4.6.2 4.6.3
```

**Summary**

```
TODO
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of new Load/Store Fixed and Floating Point instructions
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers, and related tools.
Reduces instructions in hot-loops
```

**Keywords**:

```

```

**Motivation**

Moving the update of RA to *after* the Memory operation saves on instruction count
both outside and inside hot-loops. For example, strncpy may be reduced to 11 Vector
instructions, 3 of which are the zeroing loop and 5 of which are the copy.
Percentage-wise, LD/ST Update Post-Increment represents a 20% reduction in
instruction count.
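
The saving outside the loop can be illustrated with a small Python model. This is only a sketch: the function names, the dictionary used as memory, and the byte-summing loop are all hypothetical, and it does not reproduce the strncpy figures above. With pre-update addressing the pointer has to be biased by one element before the loop starts, because the first access is at RA plus the stride. With post-update addressing the pointer can be used as-is.

```python
def sum_bytes_pre_update(mem, base, n):
    """Loop shaped around pre-update (lbzu-style) addressing."""
    ra = base - 1            # extra setup step: bias the pointer before the loop
    total = 0
    for _ in range(n):
        ra = ra + 1          # address is RA+1, and RA becomes that address
        total += mem[ra]
    return total


def sum_bytes_post_update(mem, base, n):
    """Loop shaped around post-update (lbzup-style) addressing."""
    ra = base                # no bias needed
    total = 0
    for _ in range(n):
        total += mem[ra]     # address is RA itself
        ra = ra + 1          # RA is advanced only after the access
    return total


# Hypothetical memory: four bytes holding 1, 2, 3, 4 starting at 0x2000.
mem = {0x2000 + i: i + 1 for i in range(4)}
pre_total = sum_bytes_pre_update(mem, 0x2000, 4)
post_total = sum_bytes_post_update(mem, 0x2000, 4)
print(pre_total, post_total)
```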

**Notes and Observations**:

Instructions of this type are already present in x86 (sort-of).

* x86 chose that store should be pre-indexed and load should be post-indexed
* Power ISA chose everything to be pre-indexed
* Motorola 68000 (decades old) has both pre- and post-indexed addressing

<https://tack.sourceforge.net/olddocs/m68020.html#2.2.2.%20Extra%20MC68020%20addressing%20modes>

<https://azeria-labs.com/memory-instructions-load-and-store-part-4/>

**Changes**

Add the following entries to:

* New Load/Store Sections
* Appendices

[[!tag opf_rfc]]

--------

\newpage{}

TODO (key stub notes below)

The LD/ST-Immediate-Post-Increment instructions each occupy an entire Primary
Opcode: there are 13 of these. The LD/ST-Indexed-Post-Increment instructions
are all effectively 9-bit XO and consequently may easily
fit into one single Primary Opcode. EXT2xx is recommended.

One alternative idea is that bit 31 could be allocated (retrospectively)
to Post-Increment. Although it may be too late for Scalar Power ISA,
it **may** be possible to consider this for SVP64Single and/or SVP64-Vector,
but doing so risks creating a non-Orthogonal ISA.

```
# LD/ST-Postincrement
lbzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lbzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lhzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lhzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lhaup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lhaupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lwzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lwzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lwaupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
ldup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
ldupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
stbup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stbupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
sthup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
sthupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
stwup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stwupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
stdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W

# FP LD/ST-Postincrement
lfdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lfsup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lfdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lfsupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
stfdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stfsup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stfdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
stfsupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W

# LD/ST-Shifted-Postincrement
lbzupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lhzupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lhaupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lwzupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lwaupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
ldupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
stbupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
sthupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
stwupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
stdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W

# FP LD/ST-Shifted-Postincrement
lfdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lfsupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
stfdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
stfsupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W

```

# Example

Here is an annotated example in which the pseudo-code changes to
use just `RA` as the address of the memory access, otherwise remaining the same.
The sum `(RA) + EXTS(D)` is still computed exactly as before: it is written
back to `RA` after the access instead of being used as the address. The same
principle applies to all of the Post-Update instructions.

**Load Byte and Zero with Post-Update**

D-Form

* lbzup RT,D(RA)

Pseudo-code:

```
EA <- (RA)                           # EA just RA
RT <- ([0] * (XLEN-8)) || MEM(EA, 1) # then load
RA <- (RA) + EXTS(D)                 # then update RA after
```

Special Registers Altered:

```
None
```

For comparison, the existing pseudo-code for `lbzu` is:

```
EA <- (RA) + EXTS(D)                 # EA includes D
RT <- ([0] * (XLEN-8)) || MEM(EA, 1) # load from RA+D
RA <- EA                             # and update RA
```
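
The two listings can be compared side by side with a small executable model. The following Python sketch is illustrative only and is not the reference simulator: the register file and memory are plain dictionaries, the helper `exts16` and the sample addresses are hypothetical, and invalid-form checks (such as RA=0) are omitted. Both functions leave the same value in `RA`, but `lbzup` loads from the old `RA` whereas `lbzu` loads from `RA+D`.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def exts16(d):
    """Sign-extend a 16-bit D field to a Python int (hypothetical helper)."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def lbzu(regs, mem, rt, ra, d):
    """Existing pre-update form: load from RA+D, then RA <- RA+D."""
    ea = (regs[ra] + exts16(d)) & MASK
    regs[rt] = mem[ea] & 0xFF        # one byte, zero-extended
    regs[ra] = ea


def lbzup(regs, mem, rt, ra, d):
    """Proposed post-update form: load from RA, then RA <- RA+D."""
    ea = regs[ra]
    regs[rt] = mem[ea] & 0xFF        # one byte, zero-extended
    regs[ra] = (regs[ra] + exts16(d)) & MASK


# Hypothetical byte-addressed memory and register contents.
mem = {0x1000: 0xAA, 0x1001: 0xBB}
pre = {3: 0, 4: 0x1000}
post = {3: 0, 4: 0x1000}

lbzu(pre, mem, rt=3, ra=4, d=1)      # loads the byte at 0x1001
lbzup(post, mem, rt=3, ra=4, d=1)    # loads the byte at 0x1000

print(hex(pre[3]), hex(pre[4]))
print(hex(post[3]), hex(post[4]))
```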

-----

\newpage{}

# Fixed-Point Load with Post-Update

Add the following additional Section to Fixed-Point Load: Book I 3.3.2.1

[[!inline pages="openpower/isa/pifixedload" raw=yes ]]

-----

\newpage{}

# Fixed-Point Store with Post-Update

Add the following as a new section in Fixed-Point Store, Book I

[[!inline pages="openpower/isa/pifixedstore" raw=yes ]]

\newpage{}
[[!inline pages="openpower/isa/fixedload" raw=yes ]]
\newpage{}
[[!inline pages="openpower/isa/fixedstore" raw=yes ]]
\newpage{}
[[!inline pages="openpower/isa/fpload" raw=yes ]]
\newpage{}
[[!inline pages="openpower/isa/fpstore" raw=yes ]]

[[!tag opf_rfc]]