lxo/ChangeLog

   1 2021-01-24
   2
   3         * GCC: Introduced vector modes, registers, classes, constraints,
   4         renumbered and remapped registers, went over literals referring to
   5         register numbers, and started implementation of move/load/store
   6         and add for the V*DI integral types.  Still have to test that the
   7         compiler still works after the renumbering.  The new insns are not
   8         generated yet; I haven't made the new registers usable for
   9         anything yet.  (12:13)
  10
  11 2021-01-22
  12
  13         * 578: Specifying and debating the task with Luke and, later,
  14         Jacob.  Difficulties in conveying the requirements and overcoming
  15         the complexities involved in figuring out how to parse each asm
  16         operand in Python, underspecification of the input language,
  17         disagreement as to the complexity and the amount of work required
  18         to duplicate existing binutils functionality in Python, and then
  19         duplicate this work one more time into binutils later, led Luke to
  20         take it upon himself.
  21         * 579: Talked to Jacob a bit about potential implementation
  22         strategies.  The need to build an immediate constant to use as the
  23         operand to .long/svp64 makes for plenty of complexity, even in
  24         C++.  I'm again unhappy with a plan that involves so much
  25         intentional waste of effort.  I'm also very surprised by the
  26         estimated amount of work involved in this task, compared with
  27         578, which is a much bigger one with all the rewriting of an asm
  28         parser, and likely more rewriting as the extended asm syntax
  29         evolves.  And thus pretty much a full workday ends up wasted,
  30         most of it complaining about planning to waste work.  (8:29)
  31
  32 2021-01-19
  33
  34         * Virtual Coffee (1:39)
  35
  36 2021-01-13
  37
  38         * Microwatts meeting (1:08)
  39
  40 2021-01-07
  41
  42         * 572: New, split out of 570, on what .[sv], elwidth, subvl
  43         affect in load/store ops: the address [vector] or the in-memory
  44         [vector]?
  45
  46 2021-01-06
  47
  48         * 570: New.  It's not specified whether the bytes selected by a
  49         sub-dword elwidth get byte-reversed into LE before or after the
  50         selection.  The specs say we convert loaded words to LE as quickly
  51         as possible, so that all internal operations are LE, but this
  52         would lead to reversal of sub-register vector elements when
  53         loading, even when using svp64 loads with the correct elwidth_src.
  54         * 569: New.  Also concerned about how to get bit arrays properly
  55         loaded into predicate registers so that the *bits* are reversed to
  56         match LE requirements.
  57         * 568: New.  After getting clarification from Jacob about setvl's
  58         behavior: VL gets set to MIN(VL, MAXVL); you can count on its not
  59         being a smaller value.  This is documented only in pseudocode; it
  60         could be made more self-evident.  (3:13)
  61
  62 2021-01-05
  63
  64         * 567: Cesar filed it for me; I clarified it a bit further.
  65
  66 2021-01-04
  67
  68         * 560: Tried to show I understand the effects of loads and
  69         byte-swapping loads in both endiannesses, and restated my
  70         suggestion of iteration order matching the natural memory layout
  71         of arrays/vectors.  (1:46)
  72
  73 2021-01-03
  74
  75         * 560: Pointed out the circular reasoning in assuming LE in
  76         showing it works for LE and BE, stated the problem with BE and how
  77         the current BE status is incompatible with both PPC vectors and
  78         with how svp64 vectors are said to be expected to work.
  79         Recommended ruling BE out entirely for now: if the approach is to
  80         not look into the problems, this will result in broken,
  81         self-inconsistent specs that we'll either have to discontinue or
  82         carry indefinitely.
  83         * 558: Looked at the riscv implementation, particularly commit
  84         4922a0bed80f8fa1b7d507eee6f94fb9c34bfc32, the testcases in
  85         299ed9a4eaa569a5fc2b2291911ebf55318e85e4, and the reduction of
  86         redundant setvli in e71a47e3cd553cec24afbc752df2dc2214dd3850, and
  87         5fa22b2687b1f6ca1558fb487fc07e83ffe95707 that enables vl to not be
  88         a power of two.
  89         * 560: Wrote up about significance, ordering, endianness and such
  90         conventions.  (6:21)
  91
  92 2020-12-30
  93
  94         * 559: Luke split out the issue of whether we should have
  95         automatic detection and reversal of direction of vectors, so that
  96         they always behave as if parallel, even if implemented as
  97         sequential.  Jacob pointed out that reversal is not enough for
  98         some 3-operand cases.
  99         * SVP64: Second review call.
 100         * 562: Filed, on elwidth encoding.
 101         * 558: Raised the need for the compiler to be able to save and
 102         restore VL, if it's exposed separately from maxvl; also brought up
 103         calling conventions.
 104         * 560: Commented on potential endianness issue: identity of
 105         register as scalar and of first element of vector starting at that
 106         register.  More questions on issues that arise in big endian mode,
 107         and compatibility we may wish to aim for.  Some difficulties in
 108         getting as much as a conversation going on endianness-influenced
 109         sub-register iteration order; presented a simple scenario that
 110         demonstrates the fundamental programming problems that will arise
 111         out of favoring LE as we seem to.
 112         * 558: Explained why disregarding things the compiler will do on
 113         its own and arguing it shouldn't do that doesn't make the initial
 114         project simpler, but harder, and also more fragile and likely to
 115         be throw-away code in the end.  Argued in favor of seeing
 116         where we want to get to in the end, and then mapping out what it
 117         takes to get features we want for the first stage so that it's a
 118         step in the general direction of the end goal.  (6:43)
 119
 120 2020-12-28
 121
 122         * 558: Commented on vector modes, insns, regalloc, scheduling,
 123         auto vectorization, intrinsics, and the possibilities of vector
 124         length and component modes as parameters to template insns and
 125         intrinsics, and of mechanical generation thereof.  (2:22)
 126
 127 2020-12-26
 128
 129         * SVP64: Reviewed overview and proposed encoding, posted more
 130         questions.  (2:30)
 131
 132 2020-12-25
 133
 134         * Email backlog.
 135         * SVP64: More studying, more making sense.  Asked about
 136         parallelism vs dependencies.  (3:02)
 137
 138         * 550: Implemented the first cut at svp64 prefix in the assembler,
 139         namely, a 32-bit pseudo-insn that takes a 24-bit immediate
 140         operand, encoding it as an insn with EXT01 as the major opcode,
 141         MSB0 bits 7 and 9 also set, and the top two bits of the immediate
 142         shuffled into bits 6 and 8.  Added patch to bugzilla and to the
 143         wiki.  Updated status.  (1:41)
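
The bit layout described in the 550 entry above can be sketched in Python.
This is only an illustration under stated assumptions, not the binutils patch
itself: the function names are hypothetical, the immediate's most significant
bit is assumed to land in MSB0 bit 6 and the next one in bit 8, and the
remaining 22 bits are assumed to fill MSB0 bits 10-31 unchanged.

```python
EXT01 = 1  # primary (major) opcode value used for EXT01


def msb0(bit, width=32):
    """Mask for one bit in MSB0 numbering (bit 0 is the most significant)."""
    return 1 << (width - 1 - bit)


def svp64_prefix(imm24):
    """Encode a 24-bit immediate into a 32-bit prefix word (hypothetical helper)."""
    if not 0 <= imm24 < (1 << 24):
        raise ValueError("immediate must fit in 24 bits")
    insn = EXT01 << 26            # MSB0 bits 0-5: major opcode
    insn |= msb0(7) | msb0(9)     # MSB0 bits 7 and 9 are always set
    if imm24 & (1 << 23):         # assumed: top immediate bit goes to bit 6
        insn |= msb0(6)
    if imm24 & (1 << 22):         # assumed: next immediate bit goes to bit 8
        insn |= msb0(8)
    insn |= imm24 & 0x3FFFFF      # assumed: low 22 bits fill MSB0 bits 10-31
    return insn


print(hex(svp64_prefix(0)))
```

With a zero immediate only the opcode and the two fixed bits remain set, which
makes the fixed part of the encoding easy to check by eye.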
 144
 145 2020-12-23
 146
 147         * SVP64: Review meeting.
 148         * 555: Reduce flag/s for fma.  Commented on the possibilities.
 149         (1:26)
 150
 151 2020-12-20
 152
 153         * 532: Implemented logic for mode-switching 32-bit insns with 6
 154         bits for the opcode, a 16-bit embedded compressed insn, and 10
 155         bits corresponding to subsequent insns, to tell whether or not
 156         each of them is compressed.  This nearly doubled the compression
 157         rate, using one such mode-switching insn per 3 compressed insns.
 158         (1:48)
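
A rough sketch of how such a mode-switching word could be packed, assuming
the 6 + 16 + 10 bit split from the entry above.  The field order (opcode in
the top bits, then the embedded insn, then the flags, with the first upcoming
insn's flag in the most significant flag bit) and the function names are
assumptions made for illustration; the entry does not specify them.

```python
def pack_mode_switch(opcode6, insn16, compressed_flags):
    """Pack opcode, embedded 16-bit insn and 10 per-insn flags into 32 bits."""
    if not 0 <= opcode6 < (1 << 6):
        raise ValueError("opcode must fit in 6 bits")
    if not 0 <= insn16 < (1 << 16):
        raise ValueError("embedded insn must fit in 16 bits")
    if len(compressed_flags) != 10:
        raise ValueError("exactly 10 flags are expected")
    flags = 0
    for flag in compressed_flags:
        # One bit per subsequent insn: 1 means compressed (assumed polarity).
        flags = (flags << 1) | (1 if flag else 0)
    return (opcode6 << 26) | (insn16 << 10) | flags


def unpack_mode_switch(word):
    """Inverse of pack_mode_switch, under the same assumed layout."""
    opcode6 = (word >> 26) & 0x3F
    insn16 = (word >> 10) & 0xFFFF
    flags = [bool((word >> (9 - i)) & 1) for i in range(10)]
    return opcode6, insn16, flags


word = pack_mode_switch(0x3F, 0x1234, [True, True, True] + [False] * 7)
print(hex(word), unpack_mode_switch(word))
```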
 159
 160 2020-12-14
 161
 162         * 532: Reported on compression ratio findings and analyses.
 163         (1:06)
 164
 165 2020-12-13
 166
 167         * 532: Questioned some bullets under 16-imm opcodes.  Implemented
 168         condition register and system opcodes, 16-imm opcodes, extended
 169         load and store to cover 16-imm modes, condition bit expression
 170         parsing and finally bc 16-imm and bclr 10- and 16-bit opcodes.
 171         Tested a bit by visual inspection, introduced logic to backtrack
 172         into 32-bit and count such pairs as 10-bit nop + 16-imm insn,
 173         followed by 32-bit.  Fixed size estimation: count[2] was still
 174         counted as 16+16-imm, rather than a single 16-imm.  (5:30)
 175
 176 2020-12-06
 177
 178         * 532: Adjusted the logic in comp16-v1-skel.py for 16-bit 16-imm
 179         rather than the 16+16 I'd invented.  Implemented the most relevant
 180         opcodes for 10-bit, and many of the 16-bit ones too.  Not yet
 181         implemented are conditional branches, Immediate, CR and System
 182         opcodes.  With all of nop, unconditional branch, ld/st,
 183         arithmetic, logical and floating-point, we get less than 3%
 184         compression in GCC, with not-entirely-unreasonable reg subsets.
 185         It's not looking good.  (8:27)
 186
 187 2020-12-02
 188
 189         * Microwatts meeting.
 190         * 238: Added some thoughts on bl and blr, and implications about
 191         modes.  Also detailed my worries about how to preserve dynamic
 192         state, specifically switch-back-to-compressed-after-insn, across
 193         interrupts.  (1:44)
 194
 195 2020-11-30
 196
 197         * 238: Settled the N-without-M issue; it was likely an error in
 198         the tables.  Raised an inconsistency in decoder pseudocode's
 199         reversal of M and N.  Returned to the uncertainty and need for
 200         specifying how to handle conflicts between
 201         standard-then-compressed followed by 10-bit with M=0.  Raised
 202         issue of missing documentation that branch targets are always
 203         uncompressed, not just 32-bit aligned.  Raised issue of the
 204         purpose of M and N bits, particularly in unconditional branches.
 205         Explained why I believe phase 1 decoder has to look at Cmaj.m bits
 206         to tell whether or not N is there, brought crnand and crand
 207         encodings as example, and asked whether crand with M=0 should
 208         switch to 32-bit mode for only one insn, because the bit that
 209         usually holds N is 1, or permanently, because there's no N field in
 210         the applicable encoding.  (2:33)
 211
 212         * 238: Detailed the motivations for my proposal of bit-shuffling
 213         in the 16-bit encoding, to reduce wires and selections in the
 214         realigning muxer.  Restated my question on N without M, as I can't
 215         relate the answer to the question; it appears to have been
 216         misunderstood.  Further expanded on the advantages of moving the
 217         Cmaj.m and M bits as suggested, even going as far as enabling an
 218         extended compressed opcode reusing the bit that signals a match
 219         for a 10-bit insn in uncompressed mode.  (3:29)
 220
 221 2020-11-29
 222
 223         * 238: Noted some apparent contradictions in the rejection of
 224         extended 16-bit insns in the face of 16+16-bit insns.  Luke hit me
 225         with clarification that there's no such thing as a 16+16-bit insn
 226         in compressed mode, and I could see how I'd totally made it up by
 227         myself by reviewing the proposal.  Hit and asked other questions:
 228         what's the N for when there's no M, and what are the SV prefixes
 229         mentioned there, now that I no longer assume them to be something
 230         like extend-next.  Then I recorded some thoughts on minimizing the
 231         bits the muxer has to look into by mapping the bits that encode N,
 232         Cmaj.m and M onto the same bits that, in traditional mode, encode
 233         the primary opcode.  Then I was hit by the realization that, if
 234         we change the perspective from "uncompressed insns used to be
 235         32-bit only" to "uncompressed can be 32- or 16-bit depending on
 236         the opcode", on account of the 10-bit insns, we already take the
 237         opcode into account to tell whether we're looking at a 16- or
 238         32-bit insn; so why is it ok there, but not ok in compressed mode?
 239         Finally, I proposed an encoding scheme that encodes lengths of
 240         subsequent insns in an early insn, achieving more coverage for
 241         16-bit insns, better limit compression, far more flexible mode
 242         switching, enabling savings at far more sparse settings, and
 243         without eating up a pair of primary opcodes: the 32-bit
 244         mode-switching insn could even be an extended opcode, though it
 245         would probably not have as many pre-length encoding bits then.  It
 246         would fit an entire 16-bit insn, which could do useful work, or
 247         queue up further pre-length bits, that correspond to static
 248         upcoming insns and tell whether to decode them as 32-bit or as
 249         (pairs of?) 16-bit ones.  Compared max ratio, representation
 250         overhead, and break-even density.  Shared some more thoughts on
 251         48- and 64-bit insns.  (7:39)
 252
 253         * 532: Got a little confused about some encodings; it's not clear
 254         whether the N and M bits in 16-bit instructions have uniform
 255         interpretation, or whether some proposed opcodes are repurposing
 256         them.  I'm surprised with such short immediate operands in the
 257         immediate instructions, if they don't get a 16-bit extension, or
 258         otherwise with the apparent requirement for an extended 16-bit
 259         immediate for something as simple as an mr encoded as addi.  Asked
 260         for clarification.  Not sure about how to proceed before I get it;
 261         the logic of the estimator would be too significantly impacted.
 262         (2:48)
 263
 264 2020-11-28
 265
 266         * 532: Figured out and implemented the logic to infer mode
 267         switching for best compression under attempt 1 proposed encoding,
 268         namely with 10-bit insns, 16-bit insns, 16+16-bit insns, and
 269         32-bit insns.  10-bit insns appear in uncompressed mode, and can
 270         be followed by insns in either mode; 16-bit ones appear in
 271         compressed mode, and can remain in compressed mode, or switch to
 272         uncompressed mode for 1 insn or for good; 16+16-bit ones appear in
 273         compressed mode, and cannot switch modes; 32-bit ones appear only
 274         in uncompressed mode, or in the single-insn slot after a 16-bit
 275         that requests it.  If we find a 16-bit insn while we're in
 276         uncompressed mode, use a 10-bit nop to tentatively switch.  Insns
 277         that can be encoded in 10 bits, but appear in compressed mode, had
 278         better be encoded in 16 bits, for that offers further subsequent
 279         encoding options, without downsides for size estimation.  Insns
 280         that can be encoded as 16+16-bit decay to 32-bit if in
 281         uncompressed mode, or if, after a sequence thereof, a later insn
 282         forces a switch to 32-bit mode without an intervening switching
 283         insn.  Still missing: the code to select what insns can be encoded
 284         in what modes.  (6:42)
 285
 286         * 532: Implemented a skeleton for compression ratio estimation,
 287         initially with the simpler mode switching of the 8-bit nop,
 288         odd-address 16-bit insns.  Next, rewrite it for all the complexity
 289         of mode switching envisioned for the "attempt 1" proposal.  (2:02)
 290
 291 2020-11-23
 292
 293         * 238: Debating various possibilities of 16-bit encoding.  (5:20)
 294
 295         * 532: Wrote a histogram Python script that breaks counts down
 296         per opcode, and within them, by operands.  (2:05)
 297
 298 2020-11-22
 299
 300         * 529: Brought up the possibilities of using 8-bit nops to switch
 301         between modes, so that 16-bit insns would be at odd addresses, so
 302         that we could use the full 16 bits; of using 2-operand insns
 303         instead of 3- for 16-bit mode so as to increase the coverage of
 304         the compact encoding.
 305         * 238: Luke moved the comment above here, where it belonged.
 306         * 529: Elaborated how using actual odd-addresses for 16-bit insns
 307         would be dealt with WRT endianness.  Prompted by Luke, added it to
 308         the wiki.
 309         * Wiki: Added self to team.  (11:50)
 310
 311 2020-11-21
 312
 313         * 532: Wrote patch for binutils to print insn histogram.
 314         * Mission: Restated the proposal of adding "and users" to the
 315         mission statement, next to customers, as those we wish to enable
 316         to trust our products.  (6:48)
 317
 318 2020-11-20
 319
 320         Reposted join message to the correct list.
 321         * 238: Started looking into it, from
 322         https://libre-soc.org/openpower/sv/16_bit_compressed/
 323
 324 2020-11-19
 325
 326         Joined.