add poly1305.mdwn - documentation

[libreriscv.git] / simple_v_extension / remap.mdwn
diff --git a/simple_v_extension/remap.mdwn b/simple_v_extension/remap.mdwn

index afccb13c39837e4e161295373bb4ebd20a03a114..85fcd35517e7620e8ef6cb95f6cf0a56c84b8efa 100644 (file)
--- a/simple_v_extension/remap.mdwn
+++ b/simple_v_extension/remap.mdwn
@@ -2,160 +2,4 @@
  
  # NOTE
  
-This section is under revision (and is optional)
-
-# REMAP CSR <a name="remap" />
-
-There is one 32-bit CSR which may be used to indicate which registers,
-if used in any operation, must be "reshaped" (re-mapped) from a linear
-form to a 2D or 3D transposed form, or "offset" to permit arbitrary
-access to elements within a register.
-
-Their primary use is for Matrix Multiplication, reordering of sequential data in-place.  Three CSRs are provided so that a single FMAC may be used in a single loop to perform 4x4 times 4x4 Matrix multiplication, generating 64 FMACs
-
-The 32-bit REMAP CSR may reshape up to 3 registers:
-
-| 29..28 | 27..26 | 25..24 | 23 | 22..16  | 15 | 14..8   | 7  | 6..0    |
-| ------ | ------ | ------ | -- | ------- | -- | ------- | -- | ------- |
-| shape2 | shape1 | shape0 | 0  | regidx2 | 0  | regidx1 | 0  | regidx0 |
-
-regidx0-2 refer not to the Register CSR CAM entry but to the underlying
-*real* register (see regidx, the value) and consequently is 7-bits wide.
-When set to zero (referring to x0), clearly reshaping x0 is pointless,
-so is used to indicate "disabled".
-shape0-2 refers to one of three SHAPE CSRs.  A value of 0x3 is reserved.
-Bits 7, 15, 23, 30 and 31 are also reserved, and must be set to zero.
-
-It is anticipated that these specialist CSRs not be very often used.
-Unlike the CSR Register and Predication tables, the REMAP CSRs use
-the full 7-bit regidx so that they can be set once and left alone,
-whilst the CSR Register entries pointing to them are disabled, instead.
-
-# SHAPE 1D/2D/3D vector-matrix remapping CSRs
-
-There are three "shape" CSRs, SHAPE0, SHAPE1, SHAPE2, 32-bits in each,
-which have the same format.  When each SHAPE CSR is set entirely to zeros,
-remapping is disabled: the register's elements are a linear (1D) vector.
-
-| 31..25 | 24..22  | 21-18   | 17..12  | 11..6   | 5..0    |
-| ------ | ------- | --      | ------- | ------- | --      | ------- |
-| modulo | permute | offs    | zdimsz  | ydimsz  | xdimsz  |
-
-modulo is applied to the output, causing it to cycle within the range 0..modulo-1. Note that zero indicates "unlimited". With VL being a maximum of 64, modulo is also 6 bits. Modulo is applied after dimensional remapping.
-
-offs is a 4-bit field, spread out across bits 7, 15 and 23, which
-is added to the element index during the loop calculation. It is added prior to the dimensional remapping.
-
-xdimsz, ydimsz and zdimsz are offset by 1, such that a value of 0 indicates
-that the array dimensionality for that dimension is 1.  A value of xdimsz=2
-would indicate that in the first dimension there are 3 elements in the
-array.  The format of the array is therefore as follows:
-
-    array[xdim+1][ydim+1][zdim+1]
-
-However whilst illustrative of the dimensionality, that does not take the
-"permute" setting into account.  "permute" may be any one of six values
-(0-5, with values of 6 and 7 being reserved, and not legal).  The table
-below shows how the permutation dimensionality order works:
-
-| permute | order | array format             |
-| ------- | ----- | ------------------------ |
-| 000     | 0,1,2 | (xdim+1)(ydim+1)(zdim+1) |
-| 001     | 0,2,1 | (xdim+1)(zdim+1)(ydim+1) |
-| 010     | 1,0,2 | (ydim+1)(xdim+1)(zdim+1) |
-| 011     | 1,2,0 | (ydim+1)(zdim+1)(xdim+1) |
-| 100     | 2,0,1 | (zdim+1)(xdim+1)(ydim+1) |
-| 101     | 2,1,0 | (zdim+1)(ydim+1)(xdim+1) |
-
-In other words, the "permute" option changes the order in which
-nested for-loops over the array would be done.  The algorithm below
-shows this more clearly, and may be executed as a python program:
-
-    # mapidx = REMAP.shape2
-    xdim = 3 # SHAPE[mapidx].xdim_sz+1
-    ydim = 4 # SHAPE[mapidx].ydim_sz+1
-    zdim = 5 # SHAPE[mapidx].zdim_sz+1
-
-    lims = [xdim, ydim, zdim]
-    idxs = [0,0,0] # starting indices
-    order = [1,0,2] # experiment with different permutations, here
-    offs = 0        # experiment with different offsets, here
-
-    for idx in range(xdim * ydim * zdim):
-        new_idx = offs + idxs[0] + idxs[1] * xdim + idxs[2] * xdim * ydim
-        print new_idx,
-        for i in range(3):
-            idxs[order[i]] = idxs[order[i]] + 1
-            if (idxs[order[i]] != lims[order[i]]):
-                break
-            print
-            idxs[order[i]] = 0
-
-Here, it is assumed that this algorithm be run within all pseudo-code
-throughout this document where a (parallelism) for-loop would normally
-run from 0 to VL-1 to refer to contiguous register
-elements; instead, where REMAP indicates to do so, the element index
-is run through the above algorithm to work out the **actual** element
-index, instead.  Given that there are three possible SHAPE entries, up to
-three separate registers in any given operation may be simultaneously
-remapped:
-
-    function op_add(rd, rs1, rs2) # add not VADD!
-      ...
-      ...
-      for (i = 0; i < VL; i++)
-        xSTATE.srcoffs = i # save context
-        if (predval & 1<<i) # predication uses intregs
-           ireg[rd+remap(id)] <= ireg[rs1+remap(irs1)] +
-                                 ireg[rs2+remap(irs2)];
-           if (!int_vec[rd ].isvector) break;
-        if (int_vec[rd ].isvector)  { id += 1; }
-        if (int_vec[rs1].isvector)  { irs1 += 1; }
-        if (int_vec[rs2].isvector)  { irs2 += 1; }
-
-By changing remappings, 2D matrices may be transposed "in-place" for one
-operation, followed by setting a different permutation order without
-having to move the values in the registers to or from memory.  Also,
-the reason for having REMAP separate from the three SHAPE CSRs is so
-that in a chain of matrix multiplications and additions, for example,
-the SHAPE CSRs need only be set up once; only the REMAP CSR need be
-changed to target different registers.
-
-Note that:
-
-* Over-running the register file clearly has to be detected and
-  an illegal instruction exception thrown
-* When non-default elwidths are set, the exact same algorithm still
-  applies (i.e. it offsets elements *within* registers rather than
-  entire registers).
-* If permute option 000 is utilised, the actual order of the
-  reindexing does not change!
-* If two or more dimensions are set to zero, the actual order does not change!
-* The above algorithm is pseudo-code **only**.  Actual implementations
-  will need to take into account the fact that the element for-looping
-  must be **re-entrant**, due to the possibility of exceptions occurring.
-  See MSTATE CSR, which records the current element index.
-* Twin-predicated operations require **two** separate and distinct
-  element offsets.  The above pseudo-code algorithm will be applied
-  separately and independently to each, should each of the two
-  operands be remapped.  *This even includes C.LDSP* and other operations
-  in that category, where in that case it will be the **offset** that is
-  remapped (see Compressed Stack LOAD/STORE section).
-* Offset is especially useful, on its own, for accessing elements
-  within the middle of a register.  Without offsets, it is necessary
-  to either use a predicated MV, skipping the first elements, or
-  performing a LOAD/STORE cycle to memory.
-  With offsets, the data does not have to be moved.
-* Setting the total elements (xdim+1) times (ydim+1) times (zdim+1) to
-  less than MVL is **perfectly legal**, albeit very obscure.  It permits
-  entries to be regularly presented to operands **more than once**, thus
-  allowing the same underlying registers to act as an accumulator of
-  multiple vector or matrix operations, for example.
-
-Clearly here some considerable care needs to be taken as the remapping
-could hypothetically create arithmetic operations that target the
-exact same underlying registers, resulting in data corruption due to
-pipeline overlaps.  Out-of-order / Superscalar micro-architectures with
-register-renaming will have an easier time dealing with this than
-DSP-style SIMD micro-architectures.
-
+This section is **OBSOLETE** and superceded by [[openpower/sv/remap]]