openpower/sv/rfc/ls004.mdwn

   1 # RFC ls004  Shift-And-Add
   2
   3 **URLs**:
   4
   5 * <https://libre-soc.org/openpower/sv/biginteger/analysis/>
   6 * <https://libre-soc.org/openpower/sv/rfc/ls004/>
   7 * bigint: <https://bugs.libre-soc.org/show_bug.cgi?id=960> TODO: maybe remove this link due to confusion and irrelevance?
   8 * <https://git.openpower.foundation/isa/PowerISA/issues/91>
   9 * shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
  10 * add shaddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>
  11
  12 **Severity**: Major
  13
  14 **Status**: New
  15
  16 **Date**: 31 Oct 2022
  17
  18 **Target**: v3.2B
  19
  20 **Source**: v3.0B
  21
  22 **Books and Section affected**:
  23
  24 ```
  25     Book I Fixed-Point Shift Instructions 3.3.14.2
  26     Appendix E Power ISA sorted by opcode
  27     Appendix F Power ISA sorted by version
  28     Appendix G Power ISA sorted by Compliancy Subset
  29     Appendix H Power ISA sorted by mnemonic
  30 ```
  31
  32 **Summary**
  33
  34 ```
  35     Instructions added
  36     shadd - Shift and Add
  37     shaddw - Shift and Add Signed Word
  38     shadduw - Shift and Add Unsigned Word
  39 ```
  40
  41 **Submitter**: Luke Leighton (Libre-SOC)
  42
  43 **Requester**: Libre-SOC
  44
  45 **Impact on processor**:
  46
  47 ```
  48     Addition of three new GPR-based instructions
  49 ```
  50
  51 **Impact on software**:
  52
  53 ```
  54     Requires support for new instructions in assembler, debuggers,
  55     and related tools.
  56 ```
  57
  58 **Keywords**:
  59
  60 ```
  61     GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing
  62 ```
  63
  64 **Motivation**
  65
  66 Power ISA lacks LD/ST Indexed with shift, which is present in both ARM and x86.
  67 Adding shifted LD/ST would take thirty-eight new instructions; the compromise is
  68 shift-and-add, which replaces an explicit shift-then-add pair in hot loops.
  69
  70 **Notes and Observations**:
  71
  72 1. `shadd` and `shadduw` operate on unsigned integers.
  73 2. `shadduw` is intended for performing address offsets,
  74     as the second operand is constrained to the lower 32 bits
  75     and zero-extended.
  76 3. All three are 2-in 1-out instructions.
  77 4. Shift-add operations are present in both x86 and aarch64,
  78     since they are useful for both general arithmetic and for
  79     computing addresses even when not immediately followed
  80     by a load/store.
  81 5. `shaddw` is often more useful than `shadduw` because C/C++ programmers like
  82     to use `int` for array indexing. For additional details see
  83     <https://bugs.libre-soc.org/show_bug.cgi?id=996>.
  84
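Note 5 can be illustrated with a small Python model. This is only a sketch: the helper names below are hypothetical and simply mirror the pseudocode given later in this RFC. It assumes a C `int` index of -1 is held in the lower word of RB with the upper word clear, and compares the signed and unsigned word variants.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def exts64_word(value):
    """Sign-extend the lower 32 bits of value to a 64-bit pattern."""
    word = value & MASK32
    if word & 0x8000_0000:
        word |= MASK64 ^ MASK32  # set the upper 32 bits
    return word


def shaddw(ra, rb, sm):
    """Hypothetical model: lower word of RB sign-extended, shifted, added to RA."""
    return ((exts64_word(rb) << (sm + 1)) + ra) & MASK64


def shadduw(ra, rb, sm):
    """Hypothetical model: lower word of RB zero-extended, shifted, added to RA."""
    return (((rb & MASK32) << (sm + 1)) + ra) & MASK64


base = 0x1000_0000
index = 0xFFFF_FFFF  # C int -1 in the lower word, upper word zero

signed_result = shaddw(base, index, 1)     # base + (-1 * 4)
unsigned_result = shadduw(base, index, 1)  # base + (0xFFFFFFFF * 4)

print(hex(signed_result))    # 0xffffffc
print(hex(unsigned_result))  # 0x40ffffffc
```

With the signed variant the result is four bytes below the base, which is what `array[-1]` on 4-byte elements expects; the unsigned variant lands roughly 16 GiB above it.
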
  85 TODO: signed 32-bit shift-and-add should be added, this needs to be addressed
  86 before submitting the RFC: <https://bugs.libre-soc.org/show_bug.cgi?id=996>
  87
  88 **Changes**
  89
  90 Add the following entries to:
  91
  92 * the Appendices of Book I
  93 * Instructions of Book I added to Section 3.3.14.2
  94
  95 ----------------
  96
  97 \newpage{}
  98
  99 # Table of LD/ST-Indexed-Shift
 100
 101 The following table lists the alternative instructions that could be
 102 considered for addition.  They all use a 9-bit XO, which is not hugely
 103 costly.  The totals are:
 104
 105 * 12 Load Indexed Shifted (with Update)
 106 * 3 Load Indexed Shifted Byte-reverse
 107 * 8 Store Indexed Shifted (with Update)
 108 * 3 Store Indexed Shifted Byte-reverse
 109 * 6 Floating-Point Load Indexed Shifted (with Update)
 110 * 6 Floating-Point Store Indexed Shifted (with Update)
 111
 112 Total count: 38 new 9-bit XO instructions, for an approximate total
 113 XO cost of 3 bits within a single Primary Opcode.  Given the savings
 114 that these instructions represent in hot loops, as evidenced by their
 115 inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
 116 justifiable.  However, there is no point in placing these in EXT2xx: they
 117 need to be in EXT0xx, because if they are added as 64-bit encodings the
 118 benefit of reduced binary size is not achieved.
 119
 120 |  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction          |
 121 |-------|------|-------|-------|-------|-------|----------------------|
 122 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lbzsx RT,RA,RB,sm    |
 123 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lbzusx RT,RA,RB,sm   |
 124 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhzsx RT,RA,RB,sm    |
 125 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhzusx RT,RA,RB,sm   |
 126 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhasx RT,RA,RB,sm    |
 127 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhausx RT,RA,RB,sm   |
 128 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwzsx RT,RA,RB,sm    |
 129 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwzusx RT,RA,RB,sm   |
 130 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwasx RT,RA,RB,sm    |
 131 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwausx RT,RA,RB,sm   |
 132 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldsx RT,RA,RB,sm     |
 133 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldusx RT,RA,RB,sm    |
 134 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhbrsx RT,RA,RB,sm   |
 135 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwbrsx RT,RA,RB,sm   |
 136 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldbrsx RT,RA,RB,sm   |
 137 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stbsx RS,RA,RB,sm    |
 138 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stbusx RS,RA,RB,sm   |
 139 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthsx RS,RA,RB,sm    |
 140 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthusx RS,RA,RB,sm   |
 141 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwsx RS,RA,RB,sm    |
 142 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwusx RS,RA,RB,sm   |
 143 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdsx RS,RA,RB,sm    |
 144 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdusx RS,RA,RB,sm   |
 145 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthbrsx RS,RA,RB,sm  |
 146 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwbrsx RS,RA,RB,sm  |
 147 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdbrsx RS,RA,RB,sm  |
 148 |  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfsxs FRT,RA,RB,sm   |
 149 |  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfsuxs FRT,RA,RB,sm  |
 150 |  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfdxs FRT,RA,RB,sm   |
 151 |  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfduxs FRT,RA,RB,sm  |
 152 |  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfiwaxs FRT,RA,RB,sm |
 153 |  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfiwzxs FRT,RA,RB,sm |
 154 |  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfsxs FRS,RA,RB,sm  |
 155 |  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfsuxs FRS,RA,RB,sm |
 156 |  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfdxs FRS,RA,RB,sm  |
 157 |  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfduxs FRS,RA,RB,sm |
 158 |  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfiwxs FRS,RA,RB,sm |
 159
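As an aid to reading the encoding table, the following Python sketch extracts the fields from a 32-bit instruction word using Power ISA MSB0 bit numbering (bit 0 is the most significant bit). The helper names are hypothetical, and the field values in the usage lines are made up for illustration: this RFC does not allocate PO or XO values for these instructions.

```python
def field(word, start, end):
    """Return bits start..end (inclusive, MSB0 numbering) of a 32-bit word."""
    width = end - start + 1
    return (word >> (31 - end)) & ((1 << width) - 1)


def decode_ldst_shifted(word):
    """Split a word into the fields of the table above.

    The second field is named RT here; for stores it would be RS,
    and for floating-point forms FRT or FRS.
    """
    return {
        "PO": field(word, 0, 5),
        "RT": field(word, 6, 10),
        "RA": field(word, 11, 15),
        "RB": field(word, 16, 20),
        "sm": field(word, 21, 22),
        "XO": field(word, 23, 31),
    }


# Assemble a word from illustrative field values (not real opcode assignments).
word = (31 << 26) | (4 << 21) | (1 << 16) | (2 << 11) | (3 << 9) | 0x155
fields = decode_ldst_shifted(word)
print(fields)
```

The 2-bit `sm` field sits immediately above the 9-bit XO, so the two together occupy bits 21-31.
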
 160 ----------------
 161
 162 \newpage{}
 163
 164 # Shift-and-Add
 165
 166 `shadd RT, RA, RB, sm`
 167
 168 |  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
 169 |-------|------|-------|-------|-------|-------|----|----------|
 170 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | Rc | Z23-Form |
 171
 172 Pseudocode:
 173
 174 ```
 175     shift <- sm + 1                     # Shift is between 1-4
 176     sum[0:63] <- ((RB) << shift) + (RA) # Shift RB, add RA
 177     RT <- sum                           # Result stored in RT
 178 ```
 179
 180 When `sm` is zero, the contents of register RB are multiplied by 2,
 181 added to the contents of register RA, and the result stored in RT.
 182
 183 `sm` is a 2-bit bit-field, and allows multiplication of RB by 2, 4, 8, 16.
 184
 185 Operands RA and RB, and the result RT are all 64-bit, unsigned integers.
 186
 188 Examples:
 189
 190 ```
 191     # sm=2 selects a shift of 3: adds r1 to (r2*8)
 192     shadd r4, r1, r2, 2
 193 ```
 194
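The `shadd` pseudocode can be checked with a short Python model. This is a sketch under the assumption that the result is truncated to 64 bits, as `sum[0:63]` implies; the function name is a hypothetical helper, not part of any existing tool.

```python
MASK64 = (1 << 64) - 1


def shadd(ra, rb, sm):
    """Model of the shadd pseudocode: RT = ((RB) << (sm + 1)) + (RA), modulo 2**64."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field")
    shift = sm + 1  # shift is between 1 and 4
    return (((rb & MASK64) << shift) + (ra & MASK64)) & MASK64


# RA = 100, RB = 3: each sm value selects a multiplier of 2, 4, 8 or 16.
results = [shadd(100, 3, sm) for sm in range(4)]
print(results)  # [106, 112, 124, 148]

# Bits shifted out of the top of RB are discarded.
wrapped = shadd(0, 1 << 63, 0)
print(wrapped)  # 0
```
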
 195 # Shift-and-Add Signed Word
 196
 197 `shaddw RT, RA, RB, sm`
 198
 199 |  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
 200 |-------|------|-------|-------|-------|-------|----|----------|
 201 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | Rc | Z23-Form |
 202
 203 Pseudocode:
 204
 205 ```
 206     shift <- sm + 1                  # Shift is between 1-4
 207     n <- EXTS64((RB)[32:63])         # Only use lower 32-bits of RB
 208     sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
 209     RT <- sum                        # Result stored in RT
 210 ```
 211
 212 When `sm` is zero, the sign-extended lower word of register RB is multiplied
 213 by 2, added to the contents of register RA, and the result stored in RT.
 214
 215 `sm` is a 2-bit bit-field, and allows multiplication of that word by 2, 4, 8, 16.
 216
 217 RA and the result RT are 64-bit signed integers; only the lower 32 bits of RB are used.
 218
 219 *Programmer's Note:
 220 This instruction is useful for computing address offsets. RA is the 64-bit base
 221 address. RB is a signed 32-bit offset into the data structure.*
 222
 223 Examples:
 224
 225 ```
 226 # sm=1 selects a shift of 2: adds r1 to (sign-extended low word of r2)*4
 227 shaddw r4, r1, r2, 1
 228 ```
 229
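A minimal Python model of the `shaddw` pseudocode, offered as a sketch only (the helper names are hypothetical). It assumes 8-byte elements and shows a negative offset stepping backwards from the base, and that the upper word of RB has no effect on the result.

```python
MASK64 = (1 << 64) - 1


def exts64_from_word(value):
    """EXTS64 of the lower 32 bits, returned as a (possibly negative) Python int."""
    word = value & 0xFFFF_FFFF
    return word - (1 << 32) if word & 0x8000_0000 else word


def shaddw(ra, rb, sm):
    """Model of the shaddw pseudocode, result truncated to 64 bits."""
    n = exts64_from_word(rb)
    return ((n << (sm + 1)) + ra) & MASK64


base = 0x2000

# Offset -3 with sm=2 (multiply by 8): base - 24.
addr_back = shaddw(base, (-3) & MASK64, 2)
print(hex(addr_back))  # 0x1fe8

# Only the lower word (5) of RB is used: base + 40.
addr_junk = shaddw(base, 0xDEAD_BEEF_0000_0005, 2)
print(hex(addr_junk))  # 0x2028
```
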
 230 [[!tag opf_rfc]]
 231
 232 # Shift-and-Add Unsigned Word
 233
 234 `shadduw RT, RA, RB, sm`
 235
 236 |  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
 237 |-------|------|-------|-------|-------|-------|----|----------|
 238 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | Rc | Z23-Form |
 239
 240 Pseudocode:
 241
 242 ```
 243     shift <- sm + 1                  # Shift is between 1-4
 244     n <- (RB)[32:63]                 # Only use lower 32-bits of RB
 245     sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
 246     RT <- sum                        # Result stored in RT
 247 ```
 248
 249 When `sm` is zero, the zero-extended lower word of register RB is multiplied
 250 by 2, added to the contents of register RA, and the result stored in RT.
 251
 252 `sm` is a 2-bit bit-field, and allows multiplication of that word by 2, 4, 8, 16.
 253
 254 RA and the result RT are 64-bit unsigned integers; only the lower 32 bits of RB are used.
 255
 256 *Programmer's Note:
 257 This instruction is useful for computing address offsets. RA is the 64-bit base
 258 address. RB is an unsigned 32-bit offset into the data structure.*
 259
 260 Examples:
 261
 262 ```
 263 # sm=1 selects a shift of 2: adds r1 to (zero-extended low word of r2)*4
 264 shadduw r4, r1, r2, 1
 265 ```
 266
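A minimal Python model of the `shadduw` pseudocode, as a sketch only (the function name is a hypothetical helper). It shows that a lower word with its top bit set is treated as a large positive offset, and that the upper word of RB is ignored.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def shadduw(ra, rb, sm):
    """Model of the shadduw pseudocode, result truncated to 64 bits."""
    n = rb & MASK32  # only the lower 32 bits of RB, zero-extended
    return ((n << (sm + 1)) + (ra & MASK64)) & MASK64


base = 0x8000_0000_0000

# Lower word 0x80000000 stays positive; sm=3 multiplies it by 16.
addr = shadduw(base, 0xFFFF_FFFF_8000_0000, 3)
print(hex(addr))  # 0x800800000000

# Largest scaled offset reachable from a 32-bit index.
max_offset = shadduw(0, MASK32, 3)
print(hex(max_offset))  # 0xffffffff0
```
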
 269 # Appendices
 270
 271     Appendix E Power ISA sorted by opcode
 272     Appendix F Power ISA sorted by version
 273     Appendix G Power ISA sorted by Compliancy Subset
 274     Appendix H Power ISA sorted by mnemonic
 275
 276 | Form | Book | Page | Version | mnemonic | Description |
 277 |------|------|------|---------|----------|-------------|
 278 | Z23  | I    | #    | 3.0B    | shadd    | Shift-and-Add |
 279 | Z23  | I    | #    | 3.0B    | shaddw   | Shift-and-Add Signed Word |
 280 | Z23  | I    | #    | 3.0B    | shadduw  | Shift-and-Add Unsigned Word |