-with SVP64 Horizontal-First, because Horizontal Mode may only
-be applied to a single instruction at a time, and SVP64 is based on
-the principle of strict Program Order even at the element
-level. Thus it becomes
-necessary to add explicit more complex single instructions with
-more operands than would normally be seen in the average RISC ISA
-(3-in, 2-out, in some cases). If it
-was not for Power ISA already having LD/ST with update as well as
-Condition Codes and `lq` this would be hard to justify.
-
-With limited space in the `EXTRA` Field, and Power ISA opcodes
-being only 32 bit, 5 operands is quite an ask. `lq` however sets
-a precedent: `RTp` stands for "RT pair". In other words the result
-is stored in RT and RT+1. For Scalar operations, following this
-precedent is perfectly reasonable. In Scalar mode,
-`maddedu` therefore stores the two halves of the 128-bit multiply
-into RT and RT+1.
-
-What, then, of `sv.maddedu`? If the destination is hard-coded to
-RT and RT+1 the instruction is not useful when Vectorised because
-the output will be overwritten on the next element. To solve this
-is easy: define the destination registers as RT and RT+MAXVL
-respectively. This makes it easy for compilers to statically allocate
-registers even when VL changes dynamically.
+with SVP64 Horizontal-First, because Horizontal Mode may only be applied
+to a single instruction at a time, and SVP64 is based on the principle of
+strict Program Order even at the element level. Thus it becomes necessary
+to add explicit, more complex single instructions with more operands than
+would normally be seen in the average RISC ISA (3-in, 2-out, in some
+cases). Were it not for Power ISA already having LD/ST with update, as
+well as Condition Codes and `lq`, this would be hard to justify.
+
+With limited space in the `EXTRA` Field, and Power ISA opcodes being only
+32-bit, five operands is quite an ask. `lq`, however, sets a precedent:
+`RTp` stands for "RT pair", meaning that the result is stored in RT and
+RT+1.
+For Scalar operations, following this precedent is perfectly reasonable.
+In Scalar mode, `maddedu` therefore stores the two halves of the 128-bit
+multiply into RT and RT+1.
+
+What, then, of `sv.maddedu`? If the destination is hard-coded to RT and
+RT+1, the instruction is not useful when Vectorised, because the second
+half of each result will be overwritten by the next element. The solution
+is simple: define the destination registers as RT and RT+MAXVL
+respectively. Because MAXVL, unlike VL, is fixed statically, this makes
+it easy for compilers to statically allocate registers even when VL
+changes dynamically.
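
The register allocation described in the last paragraph can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the specification's pseudocode: the register file is a plain list, all operands are treated as vectors, the low 64 bits are assumed to go to RT and the high 64 bits to the second destination, and the function names `sv_maddedu_pairwise` and `sv_maddedu_maxvl` are hypothetical.

```python
MASK64 = (1 << 64) - 1

def sv_maddedu_pairwise(regs, rt, ra, rb, rc, vl):
    """Naive scheme: element i writes RT+i and RT+i+1 (assumed layout).

    The high half written by element i is clobbered by the low half
    written by element i+1, which is the problem described in the text.
    """
    for i in range(vl):
        total = regs[ra + i] * regs[rb + i] + regs[rc + i]
        regs[rt + i] = total & MASK64              # low 64 bits
        regs[rt + i + 1] = (total >> 64) & MASK64  # high 64 bits

def sv_maddedu_maxvl(regs, rt, ra, rb, rc, vl, maxvl):
    """Scheme from the text: low halves start at RT, high at RT+MAXVL.

    The offset of the second destination depends only on MAXVL,
    so it stays the same whatever value VL takes at runtime.
    """
    assert vl <= maxvl
    for i in range(vl):
        total = regs[ra + i] * regs[rb + i] + regs[rc + i]
        regs[rt + i] = total & MASK64                  # low 64 bits
        regs[rt + maxvl + i] = (total >> 64) & MASK64  # high 64 bits

def make_regs():
    """Build a 128-entry register file with three 3-element input vectors."""
    regs = [0] * 128
    regs[8:11] = [MASK64, 2, 3]    # RA vector
    regs[16:19] = [MASK64, 5, 7]   # RB vector
    regs[24:27] = [MASK64, 1, 0]   # RC vector
    return regs

# Pairwise layout: the high half of element 0 is lost.
naive = make_regs()
sv_maddedu_pairwise(naive, rt=32, ra=8, rb=16, rc=24, vl=3)

# RT / RT+MAXVL layout: both halves of every element survive.
fixed = make_regs()
sv_maddedu_maxvl(fixed, rt=32, ra=8, rb=16, rc=24, vl=3, maxvl=4)
```

With MAXVL=4 and VL=3, the second function leaves the low halves in registers 32 to 34 and the high halves in registers 36 to 38, whereas the pairwise version has overwritten the high half of element 0 by the time the loop completes.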