2021-02-20 * GCC: Lowering DWARF_FRAME_REGISTERS, once I rebuilt the compiler, libgcc, and the test program, avoided the problem. That didn't make much sense, so I reversed that change and got back to debugging. The signal frame seemed to be unwound correctly, but instead of using the linux-unwind fallback frame stuff that I'd messed with a week before, I noticed it was using frame info from the __sigtramp64rt (sp?) entry point in the kernel-supplied vdso. Though I'm pretty sure that changing that file got me some different results the week before, with the vdso in use it couldn't possibly be where things went wrong. So I proceeded to unwind frame after frame until we hit the caller of the infinitely-recursive function, and found we got to the end of the stack before reaching it. Huh? A GDB stack trace also hit the same problem. Oh, maybe there was something wrong with the frame info for those early calls in the thread. But the stack trace stopped at only the third or fourth recursive call. That seemed fishy, so I started the program over, and checked the stack trace at the point of the signal delivery, and found it was fine. I stepped into the signal handler, and into the exception raising machinery, and it was still fine. Only after we started the unwinding did it get corrupted. At first I suspected something going wrong because of out-of-range accesses to the regs array, and recompiled compiler, library and program just to be sure; still the same issue. Finally it occurred to me to check where the alternate stack, in which the stack overflow signal was handled, was located, and found it running into the other end of the task's stack. Turns out the Ada runtime, when starting a task, allocates the alt stack used to handle stack overflows out of the task's own stack. With the larger register file, unwinding was taking up more of the alt stack space, overflowing it and thus overwriting part of the task's call stack, corrupting it to the point that the unwinder could no longer reach the exception handler in the task setup code that is supposed to catch an escaping exception for the task's parent to analyze/reraise. Growing the alt stack size in the Ada runtime fixes the problem, but since this explains why lowering DWARF_FRAME_REGISTERS avoided the problem, I'm now happy to have it set to the lower value, at least until call-saved SVP64 regs are needed. Adjusted other references to ARG_POINTER_REGNUM in libgcc to use a fixed index. Wrote a blog post about this, while regstrapping the fix: https://www.fsfla.org/blogs/lxo/2021-02-20-longest-debugging-session.en.html Success, no regressions. (9:09)

2021-02-13 * GCC: Found libgcc/config/linux-unwind.h using GCC's internal register numbers, and thus in need of renumbering as well. Alas, the right fix didn't jump out at me. There's some confusion about using mapped register numbers or not. Using the pristine libgcc_eh.a to link the program built with the new compiler, using newly-built libraries, it works, but with the new libgcc_eh.a it fails, whether using 291, 99 or 67 for R_AR, which used to be ARG_POINTER_REGNUM. Changing R_AR and rebuilding doesn't alter anything within gcc/ada, so it's not the Ada runtime. I guess I may have to go back to debugging, as it's not clear whether GCC is losing track of the frames or not finding the handler that would propagate the EH to the thread that activated the task. Tried experimenting with overriding DWARF_FRAME_REGISTERS to its original value. (6:13)
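To illustrate the 2021-02-20 failure mode above, here's a toy Python calculation: the unwinder's per-frame register state scales with DWARF_FRAME_REGISTERS, so a register file enlarged for SVP64 can overflow an alternate signal stack whose size was tuned for the original register count. Every constant below is a made-up assumption, not the actual Ada runtime or libgcc value.

    # Toy model of the alt-stack overflow; all sizes are illustrative assumptions.
    REG_SLOT_BYTES = 24        # assumed per-register bookkeeping in the unwind frame state
    FRAME_OVERHEAD = 512       # assumed fixed per-frame cost during unwinding
    ALT_STACK = 64 * 1024      # assumed alt-stack size carved out of the task's stack

    def unwind_alt_stack_use(dwarf_frame_registers, frames):
        """Rough bytes of alt stack the unwinder needs to unwind `frames` frames."""
        return frames * (FRAME_OVERHEAD + dwarf_frame_registers * REG_SLOT_BYTES)

    for nregs in (114, 160):   # roughly: without vs. with extra SVP64 registers
        used = unwind_alt_stack_use(nregs, frames=16)
        verdict = "overflows" if used > ALT_STACK else "fits in"
        print(f"{nregs:3} regs -> ~{used} bytes, {verdict} a {ALT_STACK}-byte alt stack")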
2021-02-10 * MW (0:48)

2021-02-09 * VC (1:59)

2021-02-05 * GCC: Started investigating the remaining regressions, all in Ada. They all turn out to be -fstack-check tests. (0:40)

2021-01-31 * GCC: Started debugging regressions in the stage1 non-svp64 compiler. Noticed that the renaming of mov to @altivec_mov removed expanders for some modes used by altivec but not by svp64. Reintroduced them, and added floating-point svp64 mov patterns. Split out of the main patch a preparation patch that could be submitted upstream right away, for it just prepares for register renumbering. Fixed conditional register usage that, when svp64 was not enabled, caused the LAST/MAX_* non-SVP64 registers to be marked as fixed, which caused the frame pointer to not be preserved across calls. Fixed the *logue routines to account for the register renumbering. Fixed the svp64 add expander to use a correct expansion, covering V2DI at power8. With that, we're down to a single C regression when not enabling svp64. The expected behavior is for the compiler to optimize gcc.target/powerpc/dform-3.c's gpr function so that p->c is (re)loaded to e.g. r10&r11 with a vsx_movv2df_64bit, because the MEM cost for its reload is negative, whereas the svp64-modified compiler keeps both such instructions, first loading to a VSX reg, then splitting it into a pair of GPRs. It is a performance bug, but the generated code works. Trying a bootstrap! Stage2 wouldn't build because of /* within comments in rs6000-modes.def; adjusted the commented-out entries I'd put in to avoid that. The memory move costs were off because of the use of literal regno 32 when computing the costs for FPR classes. Regstrapped the prepping patch successfully, then went back to the patch that introduces svp64 support, still disabled. -msvp64 is still slightly broken, but without enabling it we may be down to no regressions; testing should confirm. (14:37)

2021-01-30 * GCC: Fixed the boundaries of the loops that disable SVP64 registers when SVP64 is not enabled. Fixed macros used for parameter and return value assignment to reflect the new FP numbers. Required at least one register operand for the svp64 vector mov pattern. Added emitting of the altivec insn when not using svp64. Introduced a first svp64 reload change for preferred_reload, to avoid trying to reload constants into altivec registers. A lot more work will be needed for svp64 reloads. A non-svp64 native compiler builds stage1, but the compiler is still pretty broken, with thousands of regressions. An svp64 one builds stage1 and fails in libgcc, with a bunch of asm failures because of (unsupported) sv.* opcodes, and one reload failure in decContext that I started investigating. (8:22)

2021-01-27 * µW (1:10)

2021-01-26 * VCoffee (1:41)

2021-01-24 * GCC: Introduced vector modes, registers, classes, constraints, renumbered and remapped registers, went over literals referring to register numbers, and started implementation of move/load/store and add for the V*DI integral types. Still have to test that the compiler still works after the renumbering. The new insns are not generated yet; I haven't made the new registers usable for anything yet. (12:13)

2021-01-22 * 578: Specifying and debating the task with Luke and, later, Jacob. Difficulties in conveying the requirements and overcoming the complexities involved in figuring out how to parse each asm operand in Python, underspecification of the input language, and disagreement as to the complexity and the amount of work required to duplicate existing binutils functionality in Python, and then duplicate this work one more time into binutils later, led Luke to take it upon himself.
* 579: Talked to Jacob a bit about potential implementation strategies. The need to build an immediate constant to use as the operand to .long/svp64 makes for plenty of complexity, even in C++. I'm again unhappy with a plan that involves so much intentional waste of effort. I'm also very surprised by the estimated amount of work involved in this task, compared with 578, which is a much bigger one with all the rewriting of an asm parser, and likely more rewriting as the extended asm syntax evolves. And thus pretty much a full workday ends up wasted, most of it complaining about planning to waste work. (8:29)

2021-01-19 * Virtual Coffee (1:39)

2021-01-13 * Microwatts meeting (1:08)

2021-01-07 * 572: New, split out of 570, on what .[sv], elwidth, subvl affect in load/store ops: the address [vector] or the in-memory [vector]?

2021-01-06 * 570: New. It's not specified whether the selected elwidth sub-dword bytes get byte-reversed into LE before or after the selection. The specs say we convert loaded words to LE as quickly as possible, so that all internal operations are LE, but this would lead to reversal of sub-register vector elements when loading, even when using svp64 loads with the correct elwidth_src.
* 569: New. Also concerned about how to get bit arrays properly loaded into predicate registers so that the *bits* are reversed to match LE requirements.
* 568: New. After getting clarification from Jacob about setvl's behavior: VL gets set to MIN(VL, MAXVL), and you can count on its not being a smaller value. This is documented only in pseudocode; it could be made more self-evident. (3:13)

2021-01-05 * 567: Cesar filed it for me; I clarified it a bit further.

2021-01-04 * 560: Tried to show I understand the effects of loads and byte-swapping loads in both endiannesses, and restated my suggestion of iteration order matching the natural memory layout of arrays/vectors. (1:46)

2021-01-03 * 560: Pointed out the circular reasoning of assuming LE when showing it works for LE and BE, and stated the problem with BE and how the current BE status is incompatible both with PPC vectors and with how svp64 vectors are said to be expected to work. Recommended ruling BE out entirely for now; if the approach is to not look into the problems, this will result in broken, self-inconsistent specs that we'll either have to discontinue or carry indefinitely.
* 558: Looked at the riscv implementation, particularly commit 4922a0bed80f8fa1b7d507eee6f94fb9c34bfc32, the testcases in 299ed9a4eaa569a5fc2b2291911ebf55318e85e4, the reduction of redundant setvli in e71a47e3cd553cec24afbc752df2dc2214dd3850, and 5fa22b2687b1f6ca1558fb487fc07e83ffe95707, which enables vl to not be a power of two.
* 560: Wrote up about significance, ordering, endianness and such conventions. (6:21)

2020-12-30 * 559: Luke split out the issue of whether we should have automatic detection and reversal of direction of vectors, so that they always behave as if parallel, even if implemented as sequential. Jacob pointed out that reversal is not enough for some 3-operand cases.
* SVP64: Second review call.
* 562: Filed, on elwidth encoding.
* 558: Raised the need for the compiler to be able to save and restore VL, if it's exposed separately from maxvl; also brought up calling conventions.
* 560: Commented on a potential endianness issue: identity of a register as scalar and as first element of a vector starting at that register. More questions on issues that arise in big endian mode, and compatibility we may wish to aim for. Some difficulties in getting as much as a conversation going on endianness-influenced sub-register iteration order; presented a simple scenario that demonstrates the fundamental programming problems that will arise out of favoring LE as we seem to.
* 558: Explained why disregarding things the compiler will do on its own and arguing it shouldn't do that doesn't make the initial project simpler, but harder, and also more fragile and likely to be throw-away code in the end. Argued in favor of seeing where we want to get to in the end, and then mapping out what it takes to get the features we want for the first stage so that it's a step in the general direction of the end goal. (6:43)
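A minimal Python sketch of the setvl behavior clarified in the 568 entry above (2021-01-06); the function name and signature are mine, not the spec's:

    def setvl(requested_vl: int, maxvl: int) -> int:
        """Per the 568 clarification: VL is set to MIN(requested VL, MAXVL),
        never to some smaller value you can't predict."""
        return min(requested_vl, maxvl)

    assert setvl(10, 8) == 8   # capped at MAXVL
    assert setvl(5, 8) == 5    # otherwise exactly what was requested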
2020-12-28 * 558: Commented on vector modes, insns, regalloc, scheduling, auto vectorization, intrinsics, and the possibilities of vector length and component modes as parameters to template insns and intrinsics, and of mechanical generation thereof. (2:22)

2020-12-26 * SVP64: Reviewed overview and proposed encoding, posted more questions. (2:30)

2020-12-25 * Email backlog.
* SVP64: More studying, more making sense. Asked about parallelism vs dependencies. (3:02)
* 550: Implemented the first cut at the svp64 prefix in the assembler, namely a 32-bit pseudo-insn that takes a 24-bit immediate operand, encoding it as an insn with EXT01 as the major opcode, MSB0 bits 7 and 9 also set, and the top two bits of the immediate shuffled into bits 6 and 8. Added patch to bugzilla and to the wiki. Updated status. (1:41)

2020-12-23 * SVP64: Review meeting.
* 555: Reduce flag/s for fma. Commented on the possibilities. (1:26)

2020-12-20 * 532: Implemented logic for mode-switching 32-bit insns with 6 bits for the opcode, a 16-bit embedded compressed insn, and 10 bits corresponding to subsequent insns, to tell whether or not each of them is compressed. This nearly doubled the compression rate, using one such mode-switching insn per 3 compressed insns. (1:48)

2020-12-14 * 532: Reported on compression ratio findings and analyses. (1:06)

2020-12-13 * 532: Questioned some bullets under 16-imm opcodes. Implemented condition register and system opcodes, 16-imm opcodes, extended load and store to cover 16-imm modes, condition bit expression parsing, and finally bc 16-imm and bclr 10- and 16-bit opcodes. Tested a bit by visual inspection, introduced logic to backtrack into 32-bit and count such pairs as 10-bit nop + 16-imm insn, followed by 32-bit. Fixed size estimation: count[2] was still counted as 16+16-imm, rather than a single 16-imm. (5:30)

2020-12-06 * 532: Adjusted the logic in comp16-v1-skel.py for 16-bit 16-imm rather than the 16+16 I'd invented. Implemented the most relevant opcodes for 10-bit, and many of the 16-bit ones too. Not yet implemented are conditional branches, Immediate, CR and System opcodes. With all of nop, unconditional branch, ld/st, arithmetic, logical and floating-point, we get less than 3% compression in GCC, with not-entirely-unreasonable reg subsets. It's not looking good. (8:27)

2020-12-02 * Microwatts meeting.
* 238: Added some thoughts on bl and blr, and implications about modes. Also detailed my worries about how to preserve dynamic state, specifically switch-back-to-compressed-after-insn, across interrupts. (1:44)
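A rough Python sketch of the packing described in the 550 entry above (2020-12-25). Bit numbering is MSB0 (bit 0 is the most significant); placing the remaining 22 immediate bits in MSB0 bits 10-31 is my assumption, since the entry only pins down the opcode and bits 6-9:

    EXT01 = 0b000001  # major opcode in MSB0 bits 0-5

    def msb0_field(value, start_bit, width):
        """Place value into the MSB0 bit field [start_bit, start_bit+width) of a 32-bit word."""
        assert 0 <= value < (1 << width)
        return value << (32 - (start_bit + width))

    def svp64_prefix(imm24):
        """Encode a 24-bit immediate as the 32-bit svp64 prefix pseudo-insn."""
        assert 0 <= imm24 < (1 << 24)
        word = msb0_field(EXT01, 0, 6)
        word |= msb0_field((imm24 >> 23) & 1, 6, 1)    # top immediate bit -> MSB0 bit 6
        word |= msb0_field(1, 7, 1)                    # identifying bit, always set
        word |= msb0_field((imm24 >> 22) & 1, 8, 1)    # next immediate bit -> MSB0 bit 8
        word |= msb0_field(1, 9, 1)                    # identifying bit, always set
        word |= msb0_field(imm24 & ((1 << 22) - 1), 10, 22)  # assumed placement of the rest
        return word

    print(hex(svp64_prefix(0)))  # 0x5400000: opcode 1 with MSB0 bits 7 and 9 set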
2020-11-30 * 238: Settled the N-without-M issue; it was likely an error in the tables. Raised an inconsistency in the decoder pseudocode's reversal of M and N. Returned to the uncertainty and the need for specifying how to handle conflicts between standard-then-compressed followed by 10-bit with M=0. Raised the issue of missing documentation that branch targets are always uncompressed, not just 32-bit aligned. Raised the issue of the purpose of the M and N bits, particularly in unconditional branches. Explained why I believe the phase 1 decoder has to look at the Cmaj.m bits to tell whether or not N is there, brought the crnand and crand encodings as examples, and asked whether crand with M=0 should switch to 32-bit mode for only one insn, because the bit that usually holds N is 1, or permanently, because there's no N field in the applicable encoding. (2:33)
* 238: Detailed the motivations for my proposal of bit-shuffling in the 16-bit encoding, to reduce wires and selections in the realigning muxer. Restated my question on N without M, as I can't relate the answer to the question; it appears to have been misunderstood. Further expanded on the advantages of moving the Cmaj.m and M bits as suggested, even going as far as enabling an extended compressed opcode reusing the bit that signals a match for a 10-bit insn in uncompressed mode. (3:29)

2020-11-29 * 238: Noted some apparent contradictions in the rejection of extended 16-bit insns in the face of 16+16-bit insns. Luke hit me with the clarification that there's no such thing as a 16+16-bit insn in compressed mode, and I could see how I'd totally made it up by myself by reviewing the proposal. Hit back with other questions: what's the N for when there's no M, and what are the SV prefixes mentioned there, now that I no longer assume them to be something like extend-next. Then I recorded some thoughts on minimizing the bits the muxer has to look into, by mapping the bits that encode N, Cmaj.m and M onto the same bits that, in traditional mode, encode the primary opcode. Then I was hit by the realization that, if we change the perspective from "uncompressed insns used to be 32-bit only" to "uncompressed insns can be 32- or 16-bit depending on the opcode", on account of the 10-bit insns, then uncompressed mode already needs the opcode to tell whether we're looking at a 16- or 32-bit insn, so why is it ok there, but not ok in compressed mode? Finally, I proposed an encoding scheme that encodes the lengths of subsequent insns in an early insn, achieving more coverage for 16-bit insns, a better compression limit, far more flexible mode switching, savings in far sparser settings, and no need to eat up a pair of primary opcodes: the 32-bit mode-switching insn could even be an extended opcode, though it would probably not have as many pre-length encoding bits then. It would fit an entire 16-bit insn, which could do useful work, or queue up further pre-length bits that correspond to static upcoming insns and tell whether to decode them as 32-bit or as (pairs of?) 16-bit ones. Compared max ratio, representation overhead, and break-even density. Shared some more thoughts on 48- and 64-bit insns. (7:39)
* 532: Got a little confused about some encodings; it's not clear whether the N and M bits in 16-bit instructions have a uniform interpretation, or whether some proposed opcodes are repurposing them. I'm surprised by such short immediate operands in the immediate instructions, if they don't get a 16-bit extension, or otherwise by the apparent requirement for an extended 16-bit immediate for something as simple as an mr encoded as addi. Asked for clarification. Not sure how to proceed before I get it; the logic of the estimator would be too significantly impacted. (2:48)
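To make the "max ratio" and "break-even density" comparison above concrete, here's a toy Python calculation. It assumes the parameters of the later 2020-12-20 attempt (one 32-bit switching insn embedding a useful 16-bit insn and carrying length bits for the next 10 insns); the arithmetic, not the proposal, is mine:

    def group_bytes(k_compressed, n=10):
        """Bytes for one switching insn (which embeds a useful 16-bit insn)
        plus the n following insns, k of which compress to 16 bits."""
        return 4 + 2 * k_compressed + 4 * (n - k_compressed)

    baseline = 4 * 11  # the same 11 insns, all as plain 32-bit insns
    for k in (0, 3, 10):
        b = group_bytes(k)
        print(f"{k:2} of 10 compressed: {b} bytes vs {baseline} ({b / baseline:.0%})")
    # 0/10 merely breaks even (the embedded slot still does useful work);
    # 10/10 gives the max ratio of 24/44, roughly 55% of the original size.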
2020-11-28 * 532: Figured out and implemented the logic to infer mode switching for best compression under the "attempt 1" proposed encoding, namely with 10-bit insns, 16-bit insns, 16+16-bit insns, and 32-bit insns. 10-bit insns appear in uncompressed mode, and can be followed by insns in either mode; 16-bit ones appear in compressed mode, and can remain in compressed mode, or switch to uncompressed mode for 1 insn or for good; 16+16-bit ones appear in compressed mode, and cannot switch modes; 32-bit ones appear only in uncompressed mode, or in the single-insn slot after a 16-bit insn that requests it. If we find a 16-bit insn while we're in uncompressed mode, use a 10-bit nop to tentatively switch. Insns that can be encoded in 10 bits, but appear in compressed mode, had better be encoded in 16 bits, for that offers further subsequent encoding options, without downsides for size estimation. Insns that can be encoded as 16+16-bit decay to 32-bit if in uncompressed mode, or if, after a sequence thereof, a later insn forces a switch to 32-bit mode without an intervening switching insn. Still missing: the code to select what insns can be encoded in what modes. (6:42)
* 532: Implemented a skeleton for compression ratio estimation, initially with the simpler mode switching of the 8-bit nop and odd-address 16-bit insns. Next, rewrite it for all the complexity of mode switching envisioned for the "attempt 1" proposal. (2:02)

2020-11-23 * 238: Debating various possibilities of 16-bit encoding. (5:20)
* 532: Wrote a histogram Python script that breaks counts down per opcode, and within them, by operands. (2:05)

2020-11-22 * 529: Brought up the possibilities of using 8-bit nops to switch between modes, so that 16-bit insns would be at odd addresses and we could use the full 16 bits, and of using 2-operand insns instead of 3-operand ones in 16-bit mode, so as to increase the coverage of the compact encoding.
* 238: Luke moved the comment above here, where it belonged.
* 529: Elaborated on how using actual odd addresses for 16-bit insns would be dealt with WRT endianness. Prompted by Luke, added it to the wiki.
* Wiki: Added self to team. (11:50)

2020-11-21 * 532: Wrote patch for binutils to print insn histogram.
* Mission: Restated the proposal of adding "and users" to the mission statement, next to customers, as those we wish to enable to trust our products. (6:48)

2020-11-20 Reposted join message to the correct list.
* 238: Started looking into it, from https://libre-soc.org/openpower/sv/16_bit_compressed/

2020-11-19 Joined.
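For reference, a rough sketch of the kind of histogram script mentioned under 2020-11-23: it reads objdump -d output on stdin and counts insns per mnemonic, and within each mnemonic, per operand combination. The parsing and output format are my approximation, not the original script.

    #!/usr/bin/env python3
    # Approximate reconstruction of the opcode/operand histogram idea:
    # pipe "objdump -d some-binary" into this script.
    import re
    import sys
    from collections import Counter, defaultdict

    # objdump -d lines look like: "  10000abc:  38 21 00 30   addi  r1,r1,48"
    INSN = re.compile(r'^\s*[0-9a-f]+:\s+(?:[0-9a-f]{2}\s+)+(\S+)\s*(.*)$')

    per_opcode = Counter()
    per_operands = defaultdict(Counter)

    for line in sys.stdin:
        m = INSN.match(line)
        if not m:
            continue
        opcode, operands = m.group(1), m.group(2).strip()
        per_opcode[opcode] += 1
        per_operands[opcode][operands] += 1

    for opcode, count in per_opcode.most_common():
        print(f"{count:8} {opcode}")
        for operands, n in per_operands[opcode].most_common(5):
            print(f"         {n:8} {operands}")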