-Where a normal SIMD ISA requires explicit hand-crafted optimisation
-in order to achieve full utilisation of the underlying hardware,
-Simple-V instead can rely to a large extent on standard Multi-Issue
-hardware to achieve similar performance, whilst crucially keeping the
-algorithm implementation down to a shockingly-simple degree that makes
-it easy to understand an easy to review. Again also as with many
-other algorithms when implemented in Simple-V SVP54, by keeping to
-a LOAD-COMPUTE-STORE paradigm the L1 Data Cache usage is minimised,
-and in this case just as with chacha20 the entire algorithm, being
-only 9 lines of assembler fitting into 13 4-byte words it can fit
-into a single L1 I-Cache Line without triggering Virtual Memory TLB
-misses.
-
-Further performance improvements are achievable by using REMAP
-Parallel Reduction, still fitting into a single L1 Cache line,
-but beginning to approach the limit of the 128-long register file.
+Where a normal SIMD ISA requires explicit hand-crafted optimisation in
+order to achieve full utilisation of the underlying hardware, Simple-V
+instead can rely to a large extent on standard Multi-Issue hardware
+to achieve similar performance, whilst crucially keeping the algorithm
+implementation down to a shockingly-simple degree that makes it easy to
+understand and easy to review. Again also as with many other algorithms
+when implemented in Simple-V SVP64, by keeping to a LOAD-COMPUTE-STORE
+paradigm the L1 Data Cache usage is minimised, and in this case, just
+as with chacha20, the entire algorithm, being only 9 lines of assembler
+fitting into 13 4-byte words, can fit into a single L1 I-Cache Line
+without triggering Virtual Memory TLB misses.
+
+Further performance improvements are achievable by using REMAP Parallel
+Reduction, still fitting into a single L1 Cache line, but beginning to
+approach the limit of the 128-entry register file.