(no commit message)
[libreriscv.git] / openpower / sv / remap / discussion.mdwn
1 # TODO svshape offset concept
2
3 svshape2 offs,inv,yx,rmm,SVd,sk,mm
4
5 | 0.5|6..8|9 |10|11.15 |16..20 | 21..25 | 25 | 26..31| name |
6 | -- |----|---|--| --- | ----- | ------ | -- | ------| -------- |
7 |OPCD|offs|inv|yx| rmm | SVd | 100/mm | sk | XO | svshape |
8
9 * **offs** (3 bits) - unsigned offset
10 * **yx** (1 bit) - swap XY to YX
11 * **inv** (1 bit) inverts X if yx=0, Y if yx=1
12 * **SVd** dimension size
13 * **rmm** REMAP mask
14 * **mm** mask mode
15 * **sk** (1 bit) skips 1st dimension if set
16
17 Dimensions are calculated exactly as `svindex`. `rmm` and
18 `mm` are as per `svindex`.
19
20 # TODO
21
22 * investigate https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6879380/#!po=19.6429
23 in https://bugs.libre-soc.org/show_bug.cgi?id=653
24 * UTF-8 <https://bugs.libre-soc.org/show_bug.cgi?id=794>
25 * Triangular REMAP
26 * Cross-Product REMAP (actually, skew Matrix: https://en.m.wikipedia.org/wiki/Skew-symmetric_matrix)
27 * Convolution REMAP
28
29 # Warshall transitive closure algorithm
30
31 TODO move to [[sv/remap/discussion]] page, copied from here
32 http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-July/003286.html
33
34 with thanks to Hendrik.
35
36 <https://en.m.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm>
37
38 > Just a note: interpreting + as 'or', and * as 'and',
39 > operating on Boolean matrices,
40 > and having result, X, and Y be the exact same matrix,
41 > updated while being used,
42 > gives the traditional Warshall transitive-closure
43 > algorithm, if the loops are nested exactly in thie order.
44
45 this can be done with the ternary instruction which has
46 an in-place triple boolean input:
47
48 ```
49 RT = RT | (RA & RB)
50 ```
51
52 and also has a CR Field variant of the same
53
54 notes from conversations:
55
56 > > for y in y_r:
57 > > for x in x_r:
58 > > for z in z_r:
59 > > result[y][x] +=
60 > > a[y][z] *
61 > > b[z][x]
62
63 > This nesting of loops works for matrix multiply, but not for transitive
64 > closure.
65
66 > > it can be done:
67 > >
68 > >   for z in z_r:
69 > >    for y in y_r:
70 > >     for x in x_r:
71 > >       result[y][x] +=
72 > >          a[y][z] *
73 > >          b[z][x]
74 >
75 > And this ordering of loops *does* work for transitive closure, when a,
76 > b, and result are the very same matrix, updated while being used.
77 >
78 > By the way, I believe there is a graph algorithm that does the
79 > transitive closure thing, but instead of using boolean, "and", and "or",
80 > they use real numbers, addition, and minimum.  I think that one computes
81 > shortest paths between vertices.
82 >
83 > By the time the z'th iteration of the z loop begins, the algorithm has
84 > already peocessed paths that go through vertices numbered < z, and it
85 > adds paths that go through vertices numbered z.
86 >
87 > For this to work, the outer loop has to be the one on the subscript that
88 > bridges a and b (which in this case are teh same matrix, of course).
89
90 # SUBVL Remap
91
92 Remapping of SUBVL (vec2/3/4) elements is not permitted: the vec2/3/4
93 itself must be considered to be the "element". To perform REMAP
94 on the elements of a vec2/3/4, either use [[sv/mv.swizzle]], or,
95 due to the sub-elements themselves being contiguous, treat them as
96 such and use Indexing, or add one
97 extra dimension to Matrix REMAP, the inner dimension being the size
98 of the Subvector (2, 3, or 4).
99
100 Note that Swizzle on Sub-vectors may be applied on top of REMAP.
101 Where this is appropriate is the Rijndael MixColumns
102 stage:
103
104 <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/7/76/AES-MixColumns.svg/600px-AES-MixColumns.svg.png" width="400px" />
105
106 Assuming that the bytes are stored `a00 a01 a02 a03 a10 .. a33`
107 a 2D REMAP allows:
108
109 * the column bytes (as a vec4) to be iterated over as an inner loop,
110 progressing vertically (`a00 a10 a20 a30`)
111 * the columns themselves to be iterated as an outer loop
112 * a 32 bit `GF(256)` Matrix Multiply on the vec4 to be performed.
113
114 This entirely in-place without special 128-bit opcodes. Below is
115 the pseudocode for [[!wikipedia Rijndael MixColumns]]
116
117 ```
118 void gmix_column(unsigned char *r) {
119 unsigned char a[4];
120 unsigned char b[4];
121 unsigned char c;
122 unsigned char h;
123 // no swizzle here but vec4 byte-level
124 // elwidth overrides can be done though.
125 for (c = 0; c < 4; c++) {
126 a[c] = r[c];
127 h = (unsigned char)((signed char)r[c] >> 7);
128 b[c] = r[c] << 1;
129 b[c] ^= 0x1B & h; /* Rijndael's Galois field */
130 }
131 // These may then each be 4x 8bit Swizzled
132 // r0.vec4 = b.vec4
133 // r0.vec4 ^= a.vec4.WXYZ
134 // r0.vec4 ^= a.vec4.ZWXY
135 // r0.vec4 ^= b.vec4.YZWX ^ a.vec4.YZWX
136 r[0] = b[0] ^ a[3] ^ a[2] ^ b[1] ^ a[1];
137 r[1] = b[1] ^ a[0] ^ a[3] ^ b[2] ^ a[2];
138 r[2] = b[2] ^ a[1] ^ a[0] ^ b[3] ^ a[3];
139 r[3] = b[3] ^ a[2] ^ a[1] ^ b[0] ^ a[0];
140 }
141 ```
142
143 The application of the swizzles allows the remapped vec4 a, b and r
144 variables to perform four straight linear 32 bit XOR operations where a
145 scalar processor would be required to perform 16 byte-level individual
146 operations. Given wide enough SIMD backends in hardware these 3 bit
147 XORs may be done as single-cycle operations across the entire 128 bit
148 Rijndael Matrix.
149
150 The other alternative is to simply perform the actual 4x4 GF(256) Matrix
151 Multiply using the MDS Matrix.
152