add appendix and compliancy levels to ls010,

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn
index 46b399834d3d414bb77f82935fb1004a3d2465d3..5e6478ae0d4eacf0483160775eeb1fc3a16cbef9 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -1,5 +1,3 @@
-[[!tag standards]]
-
  # Appendix
  
  * <https://bugs.libre-soc.org/show_bug.cgi?id=574> Saturation
@@ -17,7 +15,7 @@ Table of contents:
  
  [[!toc]]
  
-# Partial Implementations
+## Partial Implementations
  
  It is perfectly legal to implement subsets of SVP64 as long as illegal
  instruction traps are always raised on unimplemented features,
@@ -32,7 +30,7 @@ opportunity to emulate the context created by the given SPR.
  
  See [[sv/compliancy_levels]] for full details.
  
-# XER, SO and other global flags
+## XER, SO and other global flags
  
  Vector systems are expected to be high performance.  This is achieved
  through parallelism, which requires that elements in the vector be
@@ -81,7 +79,7 @@ may be performed by setting VL=8, and a one-instruction
  1024-bit Add-with-Carry by setting VL=16, and so on.  More on
  this in [[openpower/sv/biginteger]]
  
-# EXTRA Field Mapping
+## EXTRA Field Mapping
  
  The purpose of the 9-bit EXTRA field mapping is to mark individual
  registers (RT, RA, BFA) as either scalar or vector, and to extend
@@ -139,7 +137,7 @@ through the Power ISA WG Process). It would
  be similar to deciding that `add` should be changed from X-Form
  to D-Form.
  
-# Single Predication <a name="1p"> </a>
+## Single Predication <a name="1p"> </a>
  
  This is a standard mode normally found in Vector ISAs.  every element in every source Vector and in the destination uses the same bit of one single predicate mask.
  
@@ -201,7 +199,7 @@ The following schedule for srcstep and dststep will occur:
  
  Here, both srcstep and dststep remain in lockstep because sz=dz=1
  
-# Twin Predication <a name="2p"> </a>
+## Twin Predication <a name="2p"> </a>
  
  This is a novel concept that allows predication to be applied to a single
  source and a single dest register.  The following types of traditional
@@ -245,7 +243,7 @@ is not actually a Vector ISA: it is a loop-abstraction-concept that
  is applied *in general* to Scalar operations, just like the x86
  `REP` instruction (if put on steroids).
  
-# Pack/Unpack
+## Pack/Unpack
  
  The pack/unpack concept of VSX `vpack` is abstracted out as Sub-Vector
  reordering.
@@ -314,7 +312,7 @@ for Vertical-First Mode.
  
  Pack/Unpack is enabled (set up) through [[sv/svstep]].
  
-# Reduce modes
+## Reduce modes
  
  Reduction in SVP64 is deterministic and somewhat of a misnomer.  A normal
  Vector ISA would have explicit Reduce opcodes with defined characteristics
@@ -344,7 +342,7 @@ Order.
  In essence it becomes the programmer's responsibility to leverage the
  pre-determined schedules to desired effect.
  
-## Scalar result reduction and iteration
+### Scalar result reduction and iteration
  
  Scalar Reduction per se does not exist, instead is implemented in SVP64
  as a simple and natural relaxation of the usual restriction on the Vector
@@ -462,7 +460,7 @@ as far as the user is concerned, all exceptions and interrupts **MUST**
  be precise.
  
  
-# Fail-on-first <a name="fail-first"> </a>
+## Fail-on-first <a name="fail-first"> </a>
  
  Data-dependent fail-on-first has two distinct variants: one for LD/ST
  (see [[sv/ldst]],
@@ -548,7 +546,7 @@ will rely.
  REMAP will need to be activated to invert the ordering of element
  traversal.*
  
-## Data-dependent fail-first on CR operations (crand etc)
+### Data-dependent fail-first on CR operations (crand etc)
  
  Operations that actually produce or alter CR Field as a result
  do not also in turn have an Rc=1 mode.  However it makes no
@@ -566,7 +564,7 @@ There are two primary different types of CR operations:
  
  More details can be found in [[sv/cr_ops]].
  
-# pred-result mode
+## pred-result mode
  
  Pred-result mode may not be applied on CR-based operations.
  
@@ -584,7 +582,7 @@ there can be no pred-result mode  for mtcr and other CR-based instructions
  Arithmetic and Logical Pred-result, which does have Rc=1 or for which
  RC1 Mode makes sense, is covered in [[sv/normal]]
  
-# CR Operations
+## CR Operations
  
  CRs are slightly more involved than INT or FP registers due to the
  possibility for indexing individual bits (crops BA/BB/BT).  Again however
@@ -592,7 +590,7 @@ the access pattern needs to be understandable in relation to v3.0B / v3.1B
  numbering, with a clear linear relationship and mapping existing when
  SV is applied.
  
-## CR EXTRA mapping table and algorithm <a name="cr_extra"></a>
+### CR EXTRA mapping table and algorithm <a name="cr_extra"></a>
  
  Numbering relationships for CR fields are already complex due to being
  in BE format (*the relationship is not clearly explained in the v3.0B
@@ -667,7 +665,7 @@ batches of aligned 32-bit chunks (CR0-7, CR7-15).  This is to greatly
  simplify internal design.  If instructions are issued where CR Vectors
  do not start on a 32-bit aligned boundary, performance may be affected.
  
-## CR fields as inputs/outputs of vector operations
+### CR fields as inputs/outputs of vector operations
  
  CRs (or, the arithmetic operations associated with them)
  may be marked as Vectorised or Scalar.  When Rc=1 in arithmetic operations that have no explicit EXTRA to cover the CR, the CR is Vectorised if the destination is Vectorised.  Likewise if the destination is scalar then so is the CR.
@@ -725,7 +723,7 @@ and VL truncation provide several benefits.
  
  (see [[discussion]].  some alternative schemes are described there)
  
-## Rc=1 when SUBVL!=1
+### Rc=1 when SUBVL!=1
  
  sub-vectors are effectively a form of Packed SIMD (length 2 to 4). Only 1 bit of
  predicate is allocated per subvector; likewise only one CR is allocated
@@ -737,7 +735,7 @@ is to perform a bitwise OR or AND of the subvector tests.  Given that
  OE is ignored in SVP64, this field may (when available) be used to select OR or
  AND behavior.
  
-### Table of CR fields
+#### Table of CR fields
  
  CRn is the notation used by the OpenPower spec to refer to CR field #i,
  so FP instructions with Rc=1 write to CR1 (n=1).
@@ -753,15 +751,15 @@ are arranged.  TODO a python program that auto-generates a CSV file
  which can be included in a table, which is in a new page (so as not to
  overwhelm this one). [[svp64/cr_names]]
  
-# Register Profiles
+## Register Profiles
  
  Instructions are broken down by Register Profiles as listed in the
  following auto-generated page: [[opcode_regs_deduped]].  These tables,
  despite being auto-generated, are part of the Specification.
  
-# SV pseudocode illustration
+## SV pseudocode illustration
  
-## Single-predicated Instruction
+### Single-predicated Instruction
  
  illustration of normal mode add operation: zeroing not included, elwidth
  overrides not included.  if there is no predicate, it is set to all 1s
@@ -805,7 +803,7 @@ intended, then an all-Scalar operation should be used.
  
  See <https://bugs.libre-soc.org/show_bug.cgi?id=552>
  
-# Assembly Annotation
+## Assembly Annotation
  
  Assembly code annotation is required for SV to be able to successfully
  mark instructions as "prefixed".
@@ -861,7 +859,7 @@ For modes:
    - mr OR crm: "normal" map-reduce mode or CR-mode.
    - mr.svm OR crm.svm: when vec2/3/4 set, sub-vector mapreduce is enabled
  
-# Parallel-reduction algorithm
+## Parallel-reduction algorithm
  
  The principle of SVP64 is that SVP64 is a fully-independent
  Abstraction of hardware-looping in between issue and execute phases 
@@ -910,7 +908,7 @@ insert micro-architectural lane-crossing Move operations
  if necessary or desired, to give the level of efficiency or performance
  required.**
  
-# Element-width overrides <a name="elwidth"> </>
+## Element-width overrides <a name="elwidth"> </>
  
  Element-width overrides are best illustrated with a packed structure
  union in the c programming language.  The following should be taken
@@ -991,7 +989,7 @@ Thus it can be clearly seen that elements are packed by their
  element width, and the packing starts from the source (or destination)
  specified by the instruction.
  
-# Twin (implicit) result operations
+## Twin (implicit) result operations
  
  Some operations in the Power ISA already target two 64-bit scalar
  registers: `lq` for example, and LD with update.
@@ -1090,3 +1088,9 @@ with an implicit 2nd destination:
  * [[isa/svfixedarith]]
  * [[isa/svfparith]]
  
+[[!tag standards]]
+
+------
+
+\newpage()
+