# Simple-V Compliancy Levels

The purpose of the Compliancy Levels is to provide a documented,
stable base for implementors to achieve software interoperability
without incurring a high and unnecessary hardware cost unrelated
to their needs. The bare minimum requirement, particularly suited
to Ultra-embedded, is just one instruction and the reservation of
SPRs: the rest may be entirely Soft-emulated by raising Illegal
Instruction traps. At the other end of the spectrum is the full
REMAP Structure Packing suitable for traditional Vector Processing
workloads and high-performance energy-efficient DSP workloads.

To achieve full soft-emulated interoperability, all implementations
**must**, at the bare minimum, raise Illegal Instruction traps for
all SPRs including all reserved SPRs, all SVP64-related Context
instructions (REMAP), as well as for the entire SVP64 Prefix space.

*Even if the Power ISA Scalar Specification states that a given
Scalar instruction need not or must not raise an illegal instruction
on UNDEFINED behaviour, unimplemented parts of SVP64 **MUST** raise
an illegal instruction trap when (and only when) that same Scalar
instruction is Prefixed*. It is absolutely critical to note that when
not Prefixed, under no circumstances shall the Scalar instruction
deviate from the Scalar Power ISA Specification.

Summary of Compliancy Levels (each Level includes all lower Levels):

* **Zero-Level**: Simple-V is not implemented (at all) in hardware. This
  Level is required to be listed because all capabilities of Simple-V
  must be Soft-emulatable.
* **Ultra-embedded**: `setvl` instruction and context-switching of SVSTATE
  to/from SVSRR1. Register Files as Standard Power ISA. `scalar identity`
  implemented.
* **Embedded**: `svstep` instruction, and support for Hardware for-looping
  in both Horizontal-First and Vertical-First Mode, as well as Predication
  (Single and Twin) for the GPRs r3, r10 and r30. CR-Field-based
  Predicates do not need to be added.
* **Embedded DSP/AV**: 128 registers, element-width overrides, and
  Saturation and Mapreduce/Iteration Modes.
* **High-end DSP/AV**: Same as Embedded DSP/AV, but also
  including Indexed and Offset REMAP capability.
* **3D/Advanced/Supercomputing**: all SV Branch instructions;
  crweird and vector-assist instructions (`set-before-first` etc);
  Swizzle Move instructions; Matrix, DCT/FFT and Indexing
  REMAP capability; Fail-First and Predicate-Result Modes.

The requirements within each Level constitute the minimum mandatory
capabilities. Any Level is also permitted to include any part of a
higher Compliancy Level. For example: an Embedded Level implementation
is permitted to have 128 GPRs, FPRs and CR Fields, but the Compliance
Tests for Embedded will only test for 32. A DSP/VPU Level implementation
is permitted to implement the DCT REMAP capability, but is not
permitted to declare that it meets the 3D/Advanced Level unless it
implements *all* REMAP Capabilities.

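The inclusion rule above (each Level includes all lower Levels, and extra
features from a higher Level do not by themselves raise the declared Level)
can be modelled as a cumulative set check. The following is a minimal Python
sketch and not part of the specification: the `SVLevel` enum, the feature
labels and both function names are hypothetical, and the feature sets are
abridged from the summary list.

```python
from enum import IntEnum


class SVLevel(IntEnum):
    """Hypothetical ordering of the SV Compliancy Levels, lowest first."""
    ZERO = 0
    ULTRA_EMBEDDED = 1
    EMBEDDED = 2
    EMBEDDED_DSP_AV = 3
    HIGH_END_DSP_AV = 4
    ADVANCED_3D = 5


# Features that each Level adds on top of the Level below it.
# The labels are illustrative shorthand, not names from the specification.
ADDED = {
    SVLevel.ZERO: set(),
    SVLevel.ULTRA_EMBEDDED: {"setvl", "svstate-context-switch",
                             "scalar-identity"},
    SVLevel.EMBEDDED: {"svstep", "hw-looping", "gpr-predication"},
    SVLevel.EMBEDDED_DSP_AV: {"128-registers", "elwidth-overrides",
                              "saturation", "mapreduce"},
    SVLevel.HIGH_END_DSP_AV: {"indexed-remap", "offset-remap"},
    SVLevel.ADVANCED_3D: {"sv-branch", "crweird", "swizzle", "matrix-remap",
                          "dct-fft-remap", "fail-first", "predicate-result"},
}


def required(level):
    """Every feature mandated by `level`, including all lower Levels."""
    features = set()
    for lvl in SVLevel:
        if lvl <= level:
            features |= ADDED[lvl]
    return features


def highest_level(implemented):
    """Highest Level whose cumulative requirements are all implemented."""
    best = SVLevel.ZERO
    for lvl in SVLevel:
        if required(lvl) <= implemented:
            best = lvl
        else:
            break  # requirements are cumulative, so higher Levels fail too
    return best
```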
**Power ISA Compliancy Levels**

The SV Compliancy Levels have nothing to do with the Power ISA Compliancy
Levels (SFS, SFFS, Linux, AIX). They are separate and independent. It
is perfectly fine to implement Ultra-Embedded on AIX, and perfectly fine
to implement 3D/Advanced on SFS. **Compliance with SV Levels does not
convey or remove the obligation of Compliance with SFS/SFFS/Linux/AIX
Levels and vice-versa**.

## Zero-Level

This level exists to underline the critical importance of one rule:
any attempt to execute a Simple-V feature on hardware that has no
support at all for Simple-V is **required** to raise an Illegal
Instruction trap. **This includes existing Power ISA Implementations:**
IBM POWER being the most notable.

With parts of the Power ISA being "silently executed" (hints for example),
it is absolutely critical that all capabilities of Simple-V sit
entirely within the Illegal Instruction space of existing and future
Hardware.

## Ultra-Embedded Level

This level exists as an entry-level into SVP64, most suited to
resource-constrained soft cores, or Hardware implementations where unit
cost is a much higher priority than execution speed.

This level sets the bare minimum requirements, where everything with the
exception of `scalar identity` and the `setvl` instruction may be
software-emulated through JIT Translation or Illegal Instruction traps.
SVSTATE, as effectively a Sub-Program-Counter, joins MSR and PC (CIA, NIA)
as a direct peer and must be switched on any context-switch (Trap or
Exception):

* PC is saved/restored to/from SRR0
* MSR is saved/restored to/from SRR1
* SVSTATE **must** also be saved/restored to/from SVSRR1

Any implementation that implements Hypervisor Mode must also
correspondingly follow the Power ISA Spec guidelines for HSRR0 and HSRR1,
and must save/restore SVSTATE to/from HSVSRR1 in all circumstances
involving save/restore to/from HSRR0 and HSRR1.

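The save/restore pairing listed above can be sketched as a small behavioural
model. This is an illustrative Python sketch only: the `Core` class and the
`take_trap` and `return_from_interrupt` helpers are hypothetical names, the
model treats SRR0 and SRR1 as plain copies of PC and MSR (the Power ISA
defines their exact contents per interrupt type), and it does not model what
value SVSTATE holds while the handler runs.

```python
from dataclasses import dataclass


@dataclass
class Core:
    """Hypothetical minimal register state for one hardware thread."""
    pc: int = 0
    msr: int = 0
    svstate: int = 0
    srr0: int = 0
    srr1: int = 0
    svsrr1: int = 0


def take_trap(core, vector, handler_msr=0):
    """Enter a trap: save PC, MSR and SVSTATE as a group, then redirect."""
    core.srr0 = core.pc
    core.srr1 = core.msr
    core.svsrr1 = core.svstate  # SVSTATE is saved as a peer of PC and MSR
    core.pc = vector
    core.msr = handler_msr      # simplification: caller supplies handler MSR


def return_from_interrupt(core):
    """Leave the handler: restore all three registers together."""
    core.pc = core.srr0
    core.msr = core.srr1
    core.svstate = core.svsrr1
```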
Illegal Instruction Trap **must** be raised on:

* any SV instructions not implemented
* any unimplemented SV Context SPRs read or written
* all unimplemented uses of the SVP64 Prefix
* non-scalar-identity SVP64 instructions

Implementors are free to implement any other features of SVP64.
However, Compliance with the Ultra-Embedded Level is achieved only by
meeting all of the mandatory requirements above.

Note that `scalar identity` is defined as being when the execution of
an SVP64 Prefixed instruction is identical in every respect to
Scalar non-prefixed, i.e. as if the Prefix had not been present.
Additionally, all SV SPRs must be zero and the 24-bit `RM` field must
be zero.

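As a reading aid, the `scalar identity` conditions stated above (all SV SPRs
zero and a zero 24-bit `RM` field) can be written as a predicate. This is a
minimal Python sketch under those stated conditions only: the function names,
the dictionary representation of the SPRs and the returned strings are
hypothetical, and a real decoder would work on the encoded prefix rather than
on pre-extracted fields.

```python
def is_scalar_identity(rm, sv_sprs):
    """True when a Prefixed instruction must behave exactly as unprefixed.

    rm      -- the 24-bit RM field taken from the SVP64 Prefix
    sv_sprs -- mapping of SV SPR name to current value (hypothetical layout)
    """
    if not 0 <= rm < (1 << 24):
        raise ValueError("RM is a 24-bit field")
    return rm == 0 and all(value == 0 for value in sv_sprs.values())


def minimal_dispatch(rm, sv_sprs):
    """Bare-minimum behaviour: run scalar identity, trap on everything else."""
    if is_scalar_identity(rm, sv_sprs):
        return "execute-as-scalar"
    return "illegal-instruction-trap"
```

An implementation is free to execute more than this in hardware; the sketch
only shows the floor below which an Ultra-Embedded implementation may not go.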
## Embedded Level

This level is more suitable for Hardware implementations where
performance and power saving begin to matter. A second instruction,
`svstep`, used by Vertical-First Mode, is required, as is hardware-level
looping in Horizontal-First Mode. Illegal Instruction traps may not be
used to emulate `svstep`.

At the bare minimum, Twin and Single Predication must be supported for
at least the GPRs r3, r10 and r30. CR Field Predication may also be
supported in hardware but only by also increasing the number of CR Fields
to the required total of 128.

Another important aspect is that when Rc=1 is set, CR Field Vector co-results
are produced. Should these exceed CR7 (CR8-CR127) and the number of CR Fields
has not been increased to 128 then an Illegal Instruction Trap must be
raised. In practical terms, to avoid this occurrence in Embedded software,
MAXVL should not exceed 8 for Arithmetic or Logical operations with Rc=1.

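The Rc=1 rule above reduces to a bounds check on the CR Fields touched by
the co-results. The following Python sketch assumes, for illustration only,
that the co-results occupy one consecutive CR Field per element starting at
`first_cr` (defaulting to CR0); the function name and its parameters are
hypothetical.

```python
def rc1_needs_trap(vl, first_cr=0, num_cr_fields=8):
    """True if an Rc=1 Vector operation would write an unimplemented CR Field.

    vl            -- number of elements, hence number of CR co-results
    first_cr      -- CR Field receiving element 0 (assumed consecutive after)
    num_cr_fields -- 8 for an unextended CR file, 128 when fully extended
    """
    if vl <= 0:
        return False  # no elements, so no co-results are produced
    last_cr = first_cr + vl - 1
    return last_cr >= num_cr_fields
```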
Zeroing on source and destination for Predicates (sz, dz) must also be
supported. However, all other Modes (Saturation, Fail-First,
Predicate-Result, Iteration/Reduction) are entirely optional.
Implementation of Element-Width Overrides is also optional.

One of the important side-benefits of this SV Compliancy Level is that it
brings Hardware-level support for Scalar Predication (VL=MAXVL=1)
to the entire Scalar Power ISA, completely without modifying the Scalar
Power ISA. The cost in software is that Predicated instructions are
Prefixed to 64-bit.

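To illustrate how a predicated Horizontal-First loop with destination
zeroing behaves, and how VL=1 turns it into Scalar Predication, here is a
minimal Python sketch. It rests on assumptions that are not stated in this
page: element `i` is controlled by bit `i` of the mask (least significant bit
first), Vector elements live in consecutive registers, and both operands are
Vectors. The function name is hypothetical and the operation is a generic
element-wise add-immediate, not an exact model of any Power ISA instruction.

```python
MASK64 = (1 << 64) - 1


def predicated_add_imm(regs, rt, ra, imm, vl, mask, dz=False):
    """Single-predicated element-wise add of an immediate over vl elements.

    regs -- mutable list standing in for the GPR file
    mask -- predicate bits; bit i enables element i (assumed LSB-first)
    dz   -- destination zeroing: masked-out elements are written with zero
    """
    for i in range(vl):
        if mask & (1 << i):
            regs[rt + i] = (regs[ra + i] + imm) & MASK64
        elif dz:
            regs[rt + i] = 0
        # otherwise the masked-out destination element is left untouched


# Scalar Predication is the same loop with vl=1: a single mask bit decides
# whether the one scalar operation takes effect.
```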
## DSP / Audio / Video Level

This level is best suited to high-performance, power-efficient but
specialist Compute workloads. 128 GPRs, FPRs and CR Fields are all
required, as are element-width overrides to allow data processing
down to the 8-bit level. SUBVL support (Sub-Vector vec2/3/4) is also
required, as is the Pack/Unpack EXTRA format (which helps with Pixel and
Audio Stream Structured data).

All SVP64 Modes must be implemented in hardware: Saturation
in particular is a necessity for Audio DSP work, and Reduction
also assists with Audio/Video.

It is not mandatory for this Level to have DCT/FFT REMAP Capability in
hardware, but due to the high prevalence of DCT and FFT in Audio, Video
and DSP workloads it is strongly recommended. Matrix (Dimensional) REMAP
and Swizzle may also be useful to help with 24-bit (3 byte) Structured
Audio Streams, and are also recommended but not mandatory.

## High-end DSP

In this Compliancy Level the Offset and Index REMAP subsystem becomes
worth its hardware cost. In lower-performing DSP and A/V workloads it
is not.

## 3D / Advanced / Supercomputing

This Compliancy Level is for highest performance and energy efficiency.
All aspects of SVP64 must be entirely implemented, in full, in Hardware.
How that is achieved is entirely at the discretion of the implementor:
there are no hard requirements of any kind on the level of performance,
just as there are none in the Vulkan(TM) Specification.

Throughout the SV Specification however there are hints to
Micro-Architects: byte-level write-enable lines on Register Files
are strongly recommended, for example, in order to avoid unnecessary
Read-Modify-Write cycles and additional Register Hazard Dependencies
on fine-grained (8/16/32-bit) operations. Just as with SRAMs, multiple
write-enable lines may be raised to update higher-width elements.

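The byte-level write-enable hint can be pictured with a small model of one
64-bit register. This Python sketch is illustrative only: the function names
are hypothetical, and it assumes element 0 occupies the least significant
bytes of the register. A 16-bit element raises two adjacent enable lines and
a 32-bit element raises four, so no read of the untouched bytes is needed.

```python
def byte_enables(elwidth_bits, index):
    """8-bit write-enable mask selecting one element of a 64-bit register."""
    if elwidth_bits not in (8, 16, 32, 64):
        raise ValueError("unsupported element width")
    nbytes = elwidth_bits // 8
    if not 0 <= index < 8 // nbytes:
        raise ValueError("element index out of range for this width")
    return ((1 << nbytes) - 1) << (index * nbytes)


def write_element(reg, value, elwidth_bits, index):
    """Return the new register contents; only enabled byte lanes change."""
    enables = byte_enables(elwidth_bits, index)
    shifted = value << (index * elwidth_bits)
    result = reg
    for lane in range(8):
        if enables & (1 << lane):
            lane_mask = 0xFF << (8 * lane)
            result = (result & ~lane_mask) | (shifted & lane_mask)
    return result
```

For instance, `write_element(reg, 0xABCD, 16, 1)` raises enable lines 2 and 3
only, leaving the other six bytes of `reg` as they were.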
## Examples

Assuming that hardware implements scalar operations only,
and implements predication but not elwidth overrides:

    setvli r0, 4            # sets VL equal to 4
    sv.addi r5, r0, 1       # raises an 0x700 trap
    setvli r0, 1            # sets VL equal to 1
    sv.addi r5, r0, 1       # gets executed by hardware
    sv.addi/ew=8 r5, r0, 1  # raises an 0x700 trap
    sv.ori/sm=EQ r5, r0, 1  # executed by hardware

The first `sv.addi` raises an illegal instruction trap because
VL has been set to 4, and this is not supported. Likewise
elwidth overrides, if requested, always raise illegal instruction
traps.

Such an implementation would qualify for the "Ultra-Embedded" SV Level.
It would not qualify for the "Embedded" level because when VL=4 an
Illegal Instruction trap is raised, and the Embedded Level requires full
VL Loop support in hardware.

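The decisions made by the example hardware above can be restated as a tiny
model. This Python sketch is only a paraphrase of the listing: the function
name, its parameters and the returned strings are hypothetical, and the
model knows nothing about real instruction encodings.

```python
def example_hw_decides(vl, elwidth_override=False, predicated=False):
    """Decision of the example hardware: scalar-only, predication supported,
    no element-width overrides."""
    if vl != 1:
        return "trap-0x700"  # no hardware VL loop, so emulate in software
    if elwidth_override:
        return "trap-0x700"  # elwidth overrides are not implemented
    # Predication is implemented, so `predicated` does not change the outcome.
    return "execute"


# The four prefixed instructions of the listing, in order.
outcomes = [
    example_hw_decides(vl=4),                         # sv.addi after VL=4
    example_hw_decides(vl=1),                         # sv.addi after VL=1
    example_hw_decides(vl=1, elwidth_override=True),  # sv.addi/ew=8
    example_hw_decides(vl=1, predicated=True),        # sv.ori/sm=EQ
]
```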
[[!tag standards]]

-------

\newpage()