openpower/sv/rfc/ls004.mdwn

   1 # RFC ls004  Shift-And-Add
   2
   3 **URLs**:
   4
   5 * <https://libre-soc.org/openpower/sv/biginteger/analysis/>
   6 * <https://libre-soc.org/openpower/sv/rfc/ls004/>
   7 * bigint: <https://bugs.libre-soc.org/show_bug.cgi?id=960> TODO: maybe remove this link due to confusion and irrelevance?
   8 * <https://git.openpower.foundation/isa/PowerISA/issues/91>
   9 * shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
  10 * add shaddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>
  11
  12 **Severity**: Major
  13
  14 **Status**: New
  15
  16 **Date**: 31 Oct 2022
  17
  18 **Target**: v3.2B
  19
  20 **Source**: v3.0B
  21
  22 **Books and Section affected**:
  23
  24 ```
  25     Book I Fixed-Point Shift Instructions 3.3.14.2
  26     Appendix E Power ISA sorted by opcode
  27     Appendix F Power ISA sorted by version
  28     Appendix G Power ISA sorted by Compliancy Subset
  29     Appendix H Power ISA sorted by mnemonic
  30 ```
  31
  32 **Summary**
  33
  34 ```
  35     Instructions added
  36     shadd - Shift and Add
  37     shaddw - Shift and Add Signed Word
  38     shadduw - Shift and Add Unsigned Word
  39 ```
  40
  41 **Submitter**: Luke Leighton (Libre-SOC)
  42
  43 **Requester**: Libre-SOC
  44
  45 **Impact on processor**:
  46
  47 ```
  48     Addition of three new GPR-based instructions
  49 ```
  50
  51 **Impact on software**:
  52
  53 ```
  54     Requires support for new instructions in assembler, debuggers,
  55     and related tools.
  56 ```
  57
  58 **Keywords**:
  59
  60 ```
  61     GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing
  62 ```
  63
  64 **Motivation**
  65
  66 Power ISA lacks LD/ST Indexed with shift, present in both ARM and x86.
  67 Adding it would take thirty-eight new LD/ST instructions; the compromise is
  68 shift-and-add, replacing an explicit shift/add instruction pair in hot loops.
  69
  70 **Notes and Observations**:
  71
  72 1. `shadd` and `shadduw` operate on unsigned integers.
  73 2. `shadduw` is intended for computing address offsets,
  74     as the second operand is restricted to its lower 32 bits
  75     and zero-extended.
  76 3. All three are 2-in 1-out instructions.
  77 4. shift-add operations are present in both x86 and aarch64,
  78     since they are useful for both general arithmetic and for
  79     computing addresses even when not immediately followed
  80     with a load/store.
  81 5. `shaddw` is often more useful than `shadduw` because C/C++ code commonly
  82     uses (signed) `int` for array indexing. For additional details see
  83     <https://bugs.libre-soc.org/show_bug.cgi?id=996>.
  84
  85 **Changes**
  86
  87 Add the following entries to:
  88
  89 * the Appendices of Book I
  90 * Instructions of Book I added to Section 3.3.14.2
  91
  92 ----------------
  93
  94 \newpage{}
  95
  96 # Table of LD/ST-Indexed-Shift
  97
  98 The following table lists the alternative LD/ST Indexed Shifted
  99 instructions that could be added instead. Each uses a 9-bit XO, which is
 100 not hugely costly.  The totals are:
 101
 102 * 12 Load Indexed Shifted (with Update)
 103 * 3 Load Indexed Shifted Byte-reverse
 104 * 8 Store Indexed Shifted (with Update)
 105 * 3 Store Indexed Shifted Byte-reverse
 106 * 6 Floating-Point Load Indexed Shifted (with Update)
 107 * 6 Floating-Point Store Indexed Shifted (with Update)
 108
 109 Total count: 38 new 9-bit XO instructions, for an approximate total
 110 XO cost of 3 bits within a single Primary Opcode.  Given the savings
 111 that these instructions offer in hot loops, as evidenced by their
 112 inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
 113 justifiable.  However, there is no point in placing these in EXT2xx: they
 114 need to be in EXT0xx, because if added as 64-bit encodings the benefit
 115 of reduced binary size is not achieved.
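
The "3 bits" figure can be reproduced with a little arithmetic. The sketch below assumes (the text does not spell this out) that "XO cost" means the share of one Primary Opcode's 9-bit XO space that the group occupies once its size is rounded up to a power of two. The variable names are illustrative only.

```python
# Instruction counts taken from the totals listed above.
counts = {
    "Load Indexed Shifted (with Update)": 12,
    "Load Indexed Shifted Byte-reverse": 3,
    "Store Indexed Shifted (with Update)": 8,
    "Store Indexed Shifted Byte-reverse": 3,
    "Floating-Point Load Indexed Shifted (with Update)": 6,
    "Floating-Point Store Indexed Shifted (with Update)": 6,
}

XO_BITS = 9                                  # width of the XO field (bits 23-31)
total = sum(counts.values())                 # number of new instructions
slot_bits = (total - 1).bit_length()         # bits needed to enumerate them
equivalent_xo_bits = XO_BITS - slot_bits     # XO bits left to pick the group

print(total, 2 ** slot_bits, equivalent_xo_bits)  # prints: 38 64 3
```

Under that assumption the group occupies 64 of the 512 XO slots, one eighth of a Primary Opcode, which is the same space a single instruction with a 3-bit XO would take.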
 116
 117 |  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction          |
 118 |-------|------|-------|-------|-------|-------|----------------------|
 119 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lbzsx RT,RA,RB,sm    |
 120 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lbzusx RT,RA,RB,sm   |
 121 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhzsx RT,RA,RB,sm    |
 122 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhzusx RT,RA,RB,sm   |
 123 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhasx RT,RA,RB,sm    |
 124 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhausx RT,RA,RB,sm   |
 125 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwzsx RT,RA,RB,sm    |
 126 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwzusx RT,RA,RB,sm   |
 127 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwasx RT,RA,RB,sm    |
 128 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwausx RT,RA,RB,sm   |
 129 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldsx RT,RA,RB,sm     |
 130 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldusx RT,RA,RB,sm    |
 131 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhbrsx RT,RA,RB,sm   |
 132 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwbrsx RT,RA,RB,sm   |
 133 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldbrsx RT,RA,RB,sm   |
 134 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stbsx RS,RA,RB,sm    |
 135 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stbusx RS,RA,RB,sm   |
 136 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthsx RS,RA,RB,sm    |
 137 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthusx RS,RA,RB,sm   |
 138 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwsx RS,RA,RB,sm    |
 139 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwusx RS,RA,RB,sm   |
 140 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdsx RS,RA,RB,sm    |
 141 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdusx RS,RA,RB,sm   |
 142 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthbrsx RS,RA,RB,sm  |
 143 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwbrsx RS,RA,RB,sm  |
 144 |  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdbrsx RS,RA,RB,sm  |
 145 |  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfsxs FRT,RA,RB,sm   |
 146 |  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfsuxs FRT,RA,RB,sm  |
 147 |  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfdxs FRT,RA,RB,sm   |
 148 |  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfduxs FRT,RA,RB,sm  |
 149 |  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfiwaxs FRT,RA,RB,sm |
 150 |  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfiwzxs FRT,RA,RB,sm |
 151 |  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfsxs FRS,RA,RB,sm  |
 152 |  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfsuxs FRS,RA,RB,sm |
 153 |  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfdxs FRS,RA,RB,sm  |
 154 |  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfduxs FRS,RA,RB,sm |
 155 |  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfiwxs FRS,RA,RB,sm |
 156
 157 ----------------
 158
 159 \newpage{}
 160
 161 # Shift-and-Add
 162
 163 `shadd RT, RA, RB, sm`
 164
 165 |  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
 166 |-------|------|-------|-------|-------|-------|----|----------|
 167 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | Rc | Z23-Form |
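
To make the field layout concrete, the sketch below packs the Z23-Form fields into a 32-bit word using Power ISA's MSB0 bit numbering (bit 0 is the most significant bit). The function name `encode_z23` is hypothetical, and because this RFC does not assign PO or XO values, the zeros passed in the usage line are placeholders rather than real opcodes.

```python
def encode_z23(po, rt, ra, rb, sm, xo, rc=0):
    """Pack Z23-Form fields into a 32-bit instruction word (MSB0 numbering)."""
    # (value, width in bits, last MSB0 bit) for each field of the table above
    fields = [
        (po, 6, 5),    # bits 0-5
        (rt, 5, 10),   # bits 6-10
        (ra, 5, 15),   # bits 11-15
        (rb, 5, 20),   # bits 16-20
        (sm, 2, 22),   # bits 21-22
        (xo, 8, 30),   # bits 23-30
        (rc, 1, 31),   # bit 31
    ]
    word = 0
    for value, width, end_bit in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        # MSB0 bit n corresponds to LSB0 bit position 31 - n
        word |= value << (31 - end_bit)
    return word


# Placeholder PO=0 and XO=0 (assumptions, not assigned opcodes).
word = encode_z23(po=0, rt=4, ra=1, rb=2, sm=2, xo=0)
print(f"{word:#010x}")  # prints: 0x00811400
```

Only the register and `sm` bits are meaningful in the printed word; the PO and XO bits stay zero until real values are allocated.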
 168
 169 Pseudocode:
 170
 171 ```
 172     shift <- sm + 1                     # Shift is between 1-4
 173     sum[0:63] <- ((RB) << shift) + (RA) # Shift RB, add RA
 174     RT <- sum                           # Result stored in RT
 175 ```
 176
 177 When `sm` is zero, the contents of register RB are multiplied by 2,
 178 added to the contents of register RA, and the result stored in RT.
 179
 180 `sm` is a 2-bit field, and allows multiplication of RB by 2, 4, 8 or 16.
 181
 182 Operands RA and RB, and the result RT, are all 64-bit unsigned integers.
 183
 185 Examples:
 186
 187 ```
 188     # adds r1 to (r2*8): sm=2 gives a shift of sm+1 = 3
 189     shadd r4, r1, r2, 2
 190 ```
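
The pseudocode can be modelled in a few lines of Python. This is a minimal sketch, not a reference implementation: the function names are illustrative, and 64-bit wrap-around is assumed to be modulo 2**64. It also checks the claim that `shadd` matches the explicit shift-then-add pair it is meant to replace.

```python
MASK64 = (1 << 64) - 1


def shadd(ra, rb, sm):
    """Model of shadd: RT = ((RB) << (sm + 1)) + (RA), modulo 2**64."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0..3)")
    shift = sm + 1                       # shift is between 1 and 4
    return (((rb & MASK64) << shift) + (ra & MASK64)) & MASK64


def shift_then_add(ra, rb, shift):
    """The two-step sequence (shift left, then add) that shadd replaces."""
    tmp = (rb << shift) & MASK64         # first instruction: shift
    return (tmp + ra) & MASK64           # second instruction: add


base, index = 0x1000, 5
print(hex(shadd(base, index, 2)))        # sm=2 scales by 8; prints: 0x1028
assert shadd(base, index, 2) == shift_then_add(base, index, 3)
```

Here `base + index * 8` is the address of element 5 in an array of 8-byte elements, the typical indexing case the instruction targets.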
 191
 192 # Shift-and-Add Signed Word
 193
 194 `shaddw RT, RA, RB, sm`
 195
 196 |  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
 197 |-------|------|-------|-------|-------|-------|----|----------|
 198 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | Rc | Z23-Form |
 199
 200 Pseudocode:
 201
 202 ```
 203     shift <- sm + 1                  # Shift is between 1-4
 204     n <- EXTS64((RB)[32:63])         # Only use lower 32-bits of RB
 205     sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
 206     RT <- sum                        # Result stored in RT
 207 ```
 208
 209 When `sm` is zero, the lower word of register RB is sign-extended, multiplied by 2,
 210 added to the contents of register RA, and the result stored in RT.
 211
 212 `sm` is a 2-bit field, and allows multiplication of the sign-extended word by 2, 4, 8 or 16.
 213
 214 Operand RA and the result RT are 64-bit integers; only the lower 32 bits of RB are used, as a signed word.
 215
 216 *Programmer's Note:
 217 This instruction is useful for computing address offsets. RA is the 64-bit base
 218 address. RB is the index into the data structure, limited to a signed 32-bit value.*
 219
 220 Examples:
 221
 222 ```
 223 # adds r1 to (sign-extended lower word of r2)*4: sm=1 gives a shift of 2
 224 shaddw r4, r1, r2, 1
 225 ```
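
The effect of the sign extension is easiest to see with a negative index, as arises when C code indexes an array with an `int`. The following is a minimal sketch of the pseudocode above; the helper names are illustrative and results are assumed to wrap modulo 2**64.

```python
MASK64 = (1 << 64) - 1


def exts64_word(value):
    """Sign-extend the low 32 bits of value to a 64-bit two's-complement pattern."""
    word = value & 0xFFFF_FFFF
    if word & 0x8000_0000:               # sign bit of the 32-bit word
        word |= 0xFFFF_FFFF_0000_0000
    return word


def shaddw(ra, rb, sm):
    """Model of shaddw: RT = (EXTS64(RB[32:63]) << (sm + 1)) + (RA)."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0..3)")
    n = exts64_word(rb)                  # upper 32 bits of RB are ignored
    return ((n << (sm + 1)) + (ra & MASK64)) & MASK64


base = 0x2000
index = 0xFFFF_FFFF                      # 32-bit pattern of the int value -1
print(hex(shaddw(base, index, 1)))       # base + (-1 * 4); prints: 0x1ffc
```

The result lands 4 bytes below `base`, which is what indexing element `-1` of an array of 4-byte elements requires.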
 226
 227 [[!tag opf_rfc]]
 228
 229 # Shift-and-Add Unsigned Word
 230
 231 `shadduw RT, RA, RB, sm`
 232
 233 |  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
 234 |-------|------|-------|-------|-------|-------|----|----------|
 235 |  PO   | RT   |  RA   |  RB   |  sm   |  XO   | Rc | Z23-Form |
 236
 237 Pseudocode:
 238
 239 ```
 240     shift <- sm + 1                  # Shift is between 1-4
 241     n <- EXTZ64((RB)[32:63])         # Zero-extend lower 32-bits of RB
 242     sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
 243     RT <- sum                        # Result stored in RT
 244 ```
 245
 246 When `sm` is zero, the lower word of register RB is zero-extended, multiplied by 2,
 247 added to the contents of register RA, and the result stored in RT.
 248
 249 `sm` is a 2-bit field, and allows multiplication of the zero-extended word by 2, 4, 8 or 16.
 250
 251 Operand RA and the result RT are 64-bit unsigned integers; only the lower 32 bits of RB are used, as an unsigned word.
 252
 253 *Programmer's Note:
 254 This instruction is useful for computing address offsets. RA is the 64-bit base
 255 address. RB is the index into the data structure, limited to an unsigned 32-bit value.*
 256
 257 Examples:
 258
 259 ```
 260 # adds r1 to (zero-extended lower word of r2)*4: sm=1 gives a shift of 2
 261 shadduw r4, r1, r2, 1
 262 ```
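
A short model shows how the zero extension treats a 32-bit index whose top bit is set. This is a minimal sketch of the intended behaviour, assuming the low word of RB is zero-extended to 64 bits before shifting (as Note 2 states) and that the result wraps modulo 2**64. The function name is illustrative.

```python
MASK64 = (1 << 64) - 1


def shadduw(ra, rb, sm):
    """Model of shadduw: RT = (zero-extended RB[32:63] << (sm + 1)) + (RA)."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0..3)")
    n = rb & 0xFFFF_FFFF                 # keep the low word, upper bits become 0
    return ((n << (sm + 1)) + (ra & MASK64)) & MASK64


base = 0x2000
index = 0xFFFF_FFFF                      # treated as 4294967295, not as -1
print(hex(shadduw(base, index, 1)))      # prints: 0x400001ffc
```

The same bit pattern that a signed-word interpretation reads as -1 is here a large positive offset, so the result lies far above `base`. The shifted value also exceeds 32 bits, which is why the extension to 64 bits has to happen before the shift.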
 263
 266 # Appendices
 267
 268     Appendix E Power ISA sorted by opcode
 269     Appendix F Power ISA sorted by version
 270     Appendix G Power ISA sorted by Compliancy Subset
 271     Appendix H Power ISA sorted by mnemonic
 272
 273 | Form | Book | Page | Version | mnemonic | Description |
 274 |------|------|------|---------|----------|-------------|
 275 | Z23  | I    | #    | 3.0B    | shadd    | Shift-and-Add |
 276 | Z23  | I    | #    | 3.0B    | shaddw   | Shift-and-Add Signed Word |
 277 | Z23  | I    | #    | 3.0B    | shadduw  | Shift-and-Add Unsigned Word |