From: lkcl
Date: Sat, 25 Sep 2021 21:05:49 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: DRAFT_SVP64_0_1~1
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=be6ef59b8757d9626cde29557c9d87141134229d;p=libreriscv.git

---

diff --git a/3d_gpu/architecture/dynamic_simd/assign.mdwn b/3d_gpu/architecture/dynamic_simd/assign.mdwn
index 1ca9557c6..4c2b081f4 100644
--- a/3d_gpu/architecture/dynamic_simd/assign.mdwn
+++ b/3d_gpu/architecture/dynamic_simd/assign.mdwn
@@ -84,10 +84,20 @@ This is similar to the parallel case except A is repeated
 | partition | o3 | o2 | o1 | o0 |
 | --------- | -- | -- | -- | -- |
 | 000 | [A7A7A7A7] | [A7A7A7A7] | A7A6A5A4 | A3A2A1A0 |
-| 001 | [A5A5A5A5] | [A5A5]A5A4 | A3A2A1A0 | [A1A1]A1A0 |
-| 010 | [A3A3A3A3] | A3A2A1A0 | [A3A3A3A3] | A3A2A1A0 |
-| 011 | [A3A3A3A3] | A3A2A1A0 | [A1A1]A1A0 | [A1A1]A1A0 |
-| 100 | [A1A1]A1A0 | [A5A5A5A5] | [A5A5]A5A4 | A3A2A1A0 |
-| 101 | [A1A1]A1A0 | [A3A3A3A3] | A3A2A1A0 | [A1A1]A1A0 |
-| 110 | [A1A1]A1A0 | [A1A1]A1A0 | [A3A3A3A3] | A3A2A1A0 |
-| 111 | [A1A1]A1A0 | [A1A1]A1A0 | [A1A1]A1A0 | [A1A1]A1A0 |
+| 001 | [A7A7A7A7] | A7A6A5A4 | A3A2A1A0 | A3A2A1A0 |
+| 010 | A7A6A5A4 | A3A2A1A0 | A7A6A5A4 | A3A2A1A0 |
+| 011 | A7A6A5A4 | A3A2A1A0 | A3A2A1A0 | A3A2A1A0 |
+| 100 | A3A2A1A0 | [A7A7A7A7] | A7A6A5A4 | A3A2A1A0 |
+| 101 | A3A2A1A0 | A7A6A5A4 | A3A2A1A0 | A3A2A1A0 |
+| 110 | A3A2A1A0 | A3A2A1A0 | A7A6A5A4 | A3A2A1A0 |
+| 111 | A3A2A1A0 | A3A2A1A0 | A3A2A1A0 | A3A2A1A0 |
+
+Note that when the entire partition set is open (1x 16-bit output)
+all of A is copied out, and either zero or sign extended
+in the top half of the output. At the other extreme are the
+4x 4-bit output partitions, which hold four copies of A, each truncated
+to the lowest 4 bits of A (A3-A0).
+
+Unlike the parallel case, A is not itself partitioned, so as much of it
+as possible is copied into each output partition. In some cases such as
+`1x 4-bit, 1x 12-bit` the 8-bit scalar source will need sign or zero extending.
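
The updated table can be checked against a small behavioural model. The following is a minimal Python sketch, not the project's implementation: the function name `scalar_assign`, the constants, and the `signed` flag are all hypothetical, and it assumes that bit `i` of the partition mask (least significant bit first) closes the boundary between lanes `o<i>` and `o<i+1>`, which is how the table rows read.

```python
LANE_BITS = 4   # width of each output lane o0..o3, as in the table
NUM_LANES = 4   # 4 lanes give the 16-bit output
SRC_BITS = 8    # width of the scalar source A


def scalar_assign(a, partition, signed=True):
    """Copy the 8-bit scalar `a` into every partition of a 16-bit output.

    Assumption: bit i of `partition` set means the boundary between
    lane i and lane i+1 is closed, so a new partition starts at lane i+1.
    """
    src_mask = (1 << SRC_BITS) - 1
    a &= src_mask
    out = 0
    start = 0  # lowest lane of the partition currently being filled
    for lane in range(NUM_LANES):
        is_top = lane == NUM_LANES - 1
        # A partition ends at the top lane or wherever a boundary bit is set.
        if not is_top and not (partition >> lane) & 1:
            continue
        width = (lane - start + 1) * LANE_BITS
        part_mask = (1 << width) - 1
        value = a
        if signed and width > SRC_BITS and (a >> (SRC_BITS - 1)) & 1:
            # Sign-extend: fill the bits above A with copies of A7.
            value |= part_mask & ~src_mask
        value &= part_mask  # truncate A when the partition is narrower
        out |= value << (start * LANE_BITS)
        start = lane + 1
    return out
```

With `A = 0xA5` (so A7 is set), `scalar_assign(0xA5, 0b000)` gives `0xFFA5`, `scalar_assign(0xA5, 0b001)` gives `0xFA55`, and `scalar_assign(0xA5, 0b111)` gives `0x5555`, matching rows 000, 001 and 111 of the new table when the bracketed entries are read as sign extension.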