# RFC ls004 Shift-And-Add

**URLs**:

* <https://libre-soc.org/openpower/sv/biginteger/analysis/>
* <https://libre-soc.org/openpower/sv/rfc/ls004/>
* bigint: <https://bugs.libre-soc.org/show_bug.cgi?id=960> TODO: maybe remove this link due to confusion and irrelevance?
* <https://git.openpower.foundation/isa/PowerISA/issues/91>
* shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
* add shaddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Severity**: Major

**Status**: New

**Date**: 31 Oct 2022

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
    Book I Fixed-Point Shift Instructions 3.3.14.2
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
    Instructions added
    shadd - Shift and Add
    shadduw - Shift and Add Unsigned Word
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of two new GPR-based instructions
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    GPR, Bit-manip, Shift, Arithmetic
```

**Motivation**

Power ISA lacks Load/Store Indexed with shift, which both ARM and x86
provide.  Adding shifted variants of the existing indexed LD/ST
instructions would take thirty-eight new instructions; a compromise is to
add shift-and-add, which replaces a pair of explicit instructions (a shift
followed by an add) in hot loops.

**Notes and Observations**:

1. `shadd` and `shadduw` operate on unsigned integers.
2. `shadduw` is intended for computing address offsets:
   its second operand is constrained to the lower 32 bits
   and zero-extended.
3. Both are 2-in, 1-out instructions.

TODO: a signed 32-bit shift-and-add should be added; this needs to be addressed
before submitting the RFC: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Changes**

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.14.2

----------------

\newpage{}

# Table of LD/ST-Indexed-Shift

The following table shows the alternative instructions that could
be considered for addition. They all use a 9-bit XO, which is not hugely
costly.  The totals are:

* 12 Load Indexed Shifted (with Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with Update)
* 6 Floating-Point Store Indexed Shifted (with Update)

Total count: 38 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode.  Given the savings
that these instructions represent in hot loops, as evidenced by their
inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
justifiable.  However, there is no point in placing these in EXT2xx: they
need to be in EXT0xx, because if they are added as 64-bit encodings the
benefit of reduced binary size is not achieved.

|  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction           |
|-------|------|-------|-------|-------|-------|-----------------------|
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lbzsx RT,RA,RB,sm     |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lbzusx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhzsx RT,RA,RB,sm     |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhzusx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhasx RT,RA,RB,sm     |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhausx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwzsx RT,RA,RB,sm     |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwzusx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwasx RT,RA,RB,sm     |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwausx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldsx RT,RA,RB,sm      |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldusx RT,RA,RB,sm     |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhbrsx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwbrsx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldbrsx RT,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stbsx RS,RA,RB,sm     |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stbusx RS,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthsx RS,RA,RB,sm     |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthusx RS,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwsx RS,RA,RB,sm     |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwusx RS,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdsx RS,RA,RB,sm     |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdusx RS,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthbrsx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwbrsx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdbrsx RS,RA,RB,sm   |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfsxs FRT,RA,RB,sm    |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfsuxs FRT,RA,RB,sm   |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfdxs FRT,RA,RB,sm    |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfduxs FRT,RA,RB,sm   |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfiwaxs FRT,RA,RB,sm  |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfiwzxs FRT,RA,RB,sm  |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfsxs FRS,RA,RB,sm   |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfsuxs FRS,RA,RB,sm  |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfdxs FRS,RA,RB,sm   |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfduxs FRS,RA,RB,sm  |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfiwxs FRS,RA,RB,sm  |

----------------

\newpage{}

# Shift-and-Add

`shadd RT, RA, RB, sm`

|  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | Rc | Z23-Form |

Pseudocode:

```
    shift <- sm + 1                       # Shift is between 1-4
    sum[0:63] <- ((RB) << shift) + (RA)   # Shift RB, add RA
    RT <- sum                             # Result stored in RT
```

When `sm` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result is stored in RT.

`sm` is a 2-bit field that allows multiplication of RB by 2, 4, 8 or 16.

Operands RA and RB, and the result RT, are all 64-bit unsigned integers.

**TODO: more examples are needed (how to embed `sm` in the assembler syntax is not yet settled).**

Examples:

```
    # sm=2 gives a shift of 3: adds r1 to (r2*8)
    shadd r4, r1, r2, 2
```
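
The pseudocode above can be sketched as a small Python reference model. This is an illustrative sketch only, not part of the RFC: the function name `shadd` and the constant `MASK64` are hypothetical, and it assumes the 64-bit sum wraps modulo 2^64, which is how `sum[0:63]` is read here.

```python
MASK64 = (1 << 64) - 1  # assumed GPR width of 64 bits


def shadd(ra: int, rb: int, sm: int) -> int:
    """Hypothetical model of shadd: RT = ((RB) << (sm + 1)) + (RA)."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field, so it must be in 0..3")
    shift = sm + 1  # shift amount is between 1 and 4
    # Assumption: the result is truncated to 64 bits (wraps modulo 2**64).
    return (((rb & MASK64) << shift) + (ra & MASK64)) & MASK64


if __name__ == "__main__":
    # sm=2 encodes a shift of 3, so RB is multiplied by 8.
    print(hex(shadd(0x1000, 5, 2)))  # prints 0x1028
```

With `sm=2` the model computes `0x1000 + 5*8`, the kind of base-plus-scaled-index calculation that would otherwise need a separate shift and add.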

# Shift-and-Add Unsigned Word

`shadduw RT, RA, RB, sm`

|  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | Rc | Z23-Form |

Pseudocode:

```
    shift <- sm + 1                    # Shift is between 1-4
    n <- (RB)[32:63]                   # Only use lower 32-bits of RB
    sum[0:63] <- (n << shift) + (RA)   # Shift n, add RA
    RT <- sum                          # Result stored in RT
```

When `sm` is zero, the lower word of register RB is multiplied by 2,
added to the contents of register RA, and the result is stored in RT.

`sm` is a 2-bit field that allows multiplication of the lower word of RB
by 2, 4, 8 or 16.

Operand RA and the result RT are 64-bit unsigned integers. Only the lower
32 bits of RB are used, and they are treated as an unsigned (zero-extended)
integer.

*Programmer's Note:
This instruction is useful for computing address offsets. RA is the 64-bit
base address; RB is the offset into a data structure, limited to 32 bits.*

Examples:

```
    # sm=0 gives a shift of 1: adds r1 to (lower 32 bits of r2, zero-extended)*2
    shadduw r4, r1, r2, 0
```
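
The address-offset use described in the Programmer's Note can be sketched with a Python model of the pseudocode. This is an illustrative sketch, not part of the RFC: the names `shadduw`, `MASK32` and `MASK64` are hypothetical, and wrapping of the sum modulo 2^64 is an assumption.

```python
MASK64 = (1 << 64) - 1  # assumed GPR width of 64 bits
MASK32 = (1 << 32) - 1  # selects (RB)[32:63], the lower word


def shadduw(ra: int, rb: int, sm: int) -> int:
    """Hypothetical model of shadduw: RT = ((RB)[32:63] << (sm + 1)) + (RA)."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field, so it must be in 0..3")
    shift = sm + 1      # shift amount is between 1 and 4
    n = rb & MASK32     # only the lower 32 bits of RB, zero-extended
    # Assumption: the result is truncated to 64 bits (wraps modulo 2**64).
    return ((n << shift) + (ra & MASK64)) & MASK64


if __name__ == "__main__":
    base = 0x7FFF_0000_0000          # 64-bit base address in RA
    index = 0xDEAD_BEEF_0000_0003    # upper word of RB holds unrelated bits
    # sm=2 encodes a shift of 3: base + 3*8, upper word of RB ignored.
    print(hex(shadduw(base, index, 2)))  # prints 0x7fff00000018
```

Because the upper word of RB is discarded before the shift, a 32-bit unsigned index can be used directly without a separate zero-extension instruction.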

[[!tag opf_rfc]]

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| Z23  | I    | #    | 3.0B    | shadd    | Shift-and-Add |
| Z23  | I    | #    | 3.0B    | shadduw  | Shift-and-Add Unsigned Word |
