1 # External RFC ls012: Discuss priorities of Libre-SOC Scalar(Vector) ops
2
3 **Date: 2023apr10. v1**
4
5 * Funded by NLnet Grants under EU Horizon Grants 101069594 825310
6 * <https://git.openpower.foundation/isa/PowerISA/issues/121>
7 * <https://bugs.libre-soc.org/show_bug.cgi?id=1051>
8 * <https://bugs.libre-soc.org/show_bug.cgi?id=1052>
9
10 The purpose of this RFC is:
11
12 * to give a full list of upcoming Scalar opcodes developed by Libre-SOC
13 (being cognisant that *all* of them are Vectorisable)
14 * to give OPF Members and non-Members alike the opportunity to comment and get
15 involved early in RFC submission
16 * formally agree a priority order on an iterative basis with new versions
17 of this RFC,
18 * which ones should be EXT022 Sandbox, which in EXT0xx, which in EXT2xx, which
19 not proposed at all,
20 * keep readers summarily informed of ongoing RFC submissions, with new versions
21 of this RFC,
22 * for IBM (in their capacity as Allocator of Opcodes)
23 to get a clear advance picture of Opcode Allocation
24 *prior* to submission
25
26 As this is a Formal ISA RFC the evaluation shall ultimately define
27 (in advance of the actual submission of the instructions themselves)
28 which instructions will be submitted over the next 8-18 months.
29
30 *It is expected that readers visit and interact with the Libre-SOC
31 resources in order to do due-diligence on the prioritisation
32 evaluation. Otherwise the ISA WG is overwhelmed by "drip-fed" RFCs
33 that may turn out not to be useful, against a background of having
34 no guiding overview or pre-filtering, and everybody's precious time
35 is wasted. Also note that the Libre-SOC Team, being funded by NLnet
36 under Privacy and Enhanced Trust Grants, are **prohibited** from signing
37 Commercial-Confidentiality NDAs, as doing so is a direct conflict of
38 interest with their funding body's Charitable Foundation Status and
39 remit, and therefore the **entire** set of almost 150 new SFFS instructions
40 can only go via the External RFC Process. Also be advised and aware
41 that "Libre-SOC" != "RED Semiconductor Ltd". The two are completely **separate**
42 organisations*.
43
44 Worth bearing in mind during evaluation that every "Defined Word" may
45 or may not be Vectoriseable, but that every "Defined Word" should have
46 merits on its own, not just when Vectorised. An example of a borderline
47 Vectoriseable Defined Word is `mv.swizzle` which only really becomes
48 high-priority for Audio/Video, Vector GPU and HPC Workloads, but has
49 less merit as a Scalar-only operation.
50
51 Although one of the top world-class ISAs,
52 Power ISA Scalar (SFFS) has not been significantly advanced in 12
53 years: IBM's primary focus has understandably been on PackedSIMD VSX.
54 Unfortunately, with VSX being 914 instructions and 128-bit it is far too
55 much for any new team to consider (10 years development effort) and far
56 outside of Embedded or Tablet/Desktop/Laptop power budgets. Thus bringing
57 Power Scalar up-to-date to modern standards *and on its own merits*
58 is a reasonable goal, and the advantage of the reduced focus is that
59 SFFS remains RISC-paradigm, and that lessons can be learned from other
60 ISAs from the intervening years. Good examples here include `bmask`.
61
62 SVP64 Prefixing - also known by the terms "Zero-Overhead-Loop-Prefixing"
63 as well as "True-Scalable-Vector Prefixing" - also literally brings new
64 dimensions to the Power ISA. Thus when adding new Scalar "Defined Words"
65 their value when Vector-Prefixed, *as well as* SVP64Single-Prefixed,
66 unavoidably has to be taken into consideration at the same time.
67
68 **Target areas**
69
70 Whilst entirely general-purpose there are some categories that these
71 instructions are targeting: Bitmanipulation, Big-integer, cryptography,
72 Audio/Visual, High-Performance Compute, GPU workloads and DSP.
73
74 **Instruction count guide and approximate priority order**
75
76 * 6 - SVP64 Management [[ls008]] [[ls009]] [[ls010]]
77 * 5 - CR weirds [[sv/cr_int_predication]]
78 * 4 - INT<->FP mv [[ls006]]
79 * 19 - GPR LD/ST-PostIncrement-Update (saves hugely in hot-loops) [[ls011]]
80 * ~12 - FPR LD/ST-PostIncrement-Update (ditto) [[ls011]]
81 * 2 - Float-Load-Immediate (always saves one LD L1/2/3 D-Cache op) [[ls002]]
82 * 5 - Big-Integer Chained 3-in 2-out (64-bit Carry) [[sv/biginteger]]
83 * 6 - Bitmanip LUT2/3 operations. high cost high reward [[sv/bitmanip]]
84 * 1 - fclass (Scalar variant of xvtstdcsp) [[sv/fclass]]
85 * 5 - Audio-Video [[sv/av_opcodes]]
86 * 2 - Shift-and-Add (mitigates LD-ST-Shift; Cryptography e.g. twofish) [[ls004]]
87 * 2 - BMI group [[sv/vector_ops]]
88 * 2 - GPU swizzle [[sv/mv.swizzle]]
89 * 9 - FP DCT/FFT Butterfly (2/3-in 2-out)
90 * ~9 - Integer DCT/FFT Butterfly <https://bugs.libre-soc.org/show_bug.cgi?id=1028>
91 * 18 - Trigonometric (1-arg) [[openpower/transcendentals]]
92 * 15 - Transcendentals (1-arg) [[openpower/transcendentals]]
93 * 25 - Transcendentals (2-arg) [[openpower/transcendentals]]
94
95 Summary tables are created below by different sort categories. Additional
96 columns (and tables) can be requested, as necessary, as part of update revisions
97 to this RFC.
98
99 \newpage{}
100
101 # Target Area summaries
102
103 Please note that there are some instructions developed thanks to NLnet
104 funding that have not been included here for assessment. Examples
105 include `pcdec` and the Galois Field arithmetic operations. From a purely
106 practical perspective, due to the quantity, the lower-priority instructions
107 were simply left out. However they remain in the Libre-SOC resources.
108
109 Some of these SFFS instructions appear to be duplicates of VSX.
110 A frequent argument comes up that if instructions
111 are in VSX already they should not be added to SFFS, especially if
112 they are nominally the same. The logic that this effectively damages
113 performance of an SFFS-only implementation was raised earlier, however
114 there is a more subtle reason why the instructions are needed.
115
116 Future versions of SVP64 and SVP64Single are expected to be developed
117 by future Power ISA Stakeholders on top of VSX. The decisions made
118 there about the meaning of Prefixed Vectorised VSX may be **completely**
119 different from those made for Prefixed SFFS instructions. At which
120 point the lack of SFFS equivalents would penalise SFFS implementors
121 in a much more severe way, effectively expecting them and SFFS programmers
122 to work with a non-orthogonal paradigm, to their detriment.
123 The solution is to give the SFFS Subset the space and respect that it deserves
124 and allow it to be stand-alone on its own merits.
125
126 ## SVP64 Management instructions
127
128 These without question have to go in EXT0xx. Future extended variants,
129 bringing even more powerful capabilities, can be followed up later with
130 EXT1xx prefixed variants, which is not possible if placed in EXT2xx.
131 *Only `svstep` is actually Vectoriseable*, all other Management
132 instructions are UnVectoriseable. PO1-Prefixed examples include adding
133 psvshape in order to support both Inner and Outer Product Matrix
134 Schedules, by providing the option to directly reverse the order of the
135 triple loops. Outer is used for standard Matrix Multiply (on top
136 of a standard MAC or FMAC instruction), but Inner is
137 required for Warshall Transitive Closure (on top of a cumulatively-applied
138 max instruction).
139
140 The Management Instructions themselves are all Scalar Operations, so
141 PO1-Prefixing is perfectly reasonable. SVP64 Management instructions, of
142 which there are only 6, are all 5 or 6 bit XO, meaning that the opcode
143 space they take up in EXT0xx is not alarmingly high for their intrinsic
144 strategic value.
145
146 ## Transcendentals
147
148 Found at [[openpower/transcendentals]] these subdivide into high
149 priority for accelerating general-purpose and High-Performance Compute,
150 specialist 3D GPU operations suited to 3D visualisation, and low-priority
151 less common instructions where IEEE754 full bit-accuracy is paramount.
152 In 3D GPU scenarios for example even 12-bit accuracy can be overkill,
153 but for HPC Scientific scenarios 12-bit would be disastrous.
154
155 There are a **lot** of operations here, and they also bring Power
156 ISA up-to-date to IEEE754-2019. Fortunately the number of critical
157 instructions is quite low, but the caveat is that if those operations
158 are utilised to synthesise other IEEE754 operations (divide by `pi` for
159 example) full bitlevel accuracy (a hard requirement for IEEE754) is lost.
160
161 Also worth noting that the Khronos Group defines minimum acceptable
162 bit-accuracy levels for 3D Graphics: these are **nowhere near** the full
163 accuracy demanded by IEEE754, the reason for the Khronos definitions is
164 a massive reduction often four-fold in power consumption and gate count
165 when 3D Graphics simply has no need for full accuracy.
166
167 *For 3D GPU markets this definitely needs addressing*
168
169 ## Audio/Video
170
171 Found at [[sv/av_opcodes]] these do not require Saturated variants
172 because Saturation is added via [[sv/svp64]] (Vector Prefixing) and via
173 [[sv/svp64_single]] Scalar Prefixing. This is important to note for
174 Opcode Allocation because placing these operations in the UnVectoriseable
175 areas would irredeemably damage their value. Unlike PackedSIMD ISAs
176 the actual number of AV Opcodes is remarkably small once the usual
177 cascading-option-multipliers (SIMD width, bitwidth, saturation,
178 HI/LO) are abstracted out to RISC-paradigm Prefixing, leaving just
179 absolute-diff-accumulate, min-max, average-add etc. as "basic primitives".
180
181 ## Twin-Butterfly FFT/DCT/DFT for DSP/HPC/AI/AV
182
183 The number of uses in Computer Science for DCT, NTT, FFT and DFT,
184 is astonishing. The wikipedia page lists over a hundred separate and
185 distinct areas: Audio, Video, Radar, Baseband processing, AI, Reed-Solomon
186 Error Correction, the list goes on and on. ARM has special dedicated
187 Integer Twin-butterfly instructions. TI's MSP Series DSPs have had FFT
188 Inner loop support for over 30 years. Qualcomm's Hexagon VLIW Baseband
189 DSP can do full FFT triple loops in one VLIW group.
190
191 It should be pretty clear this is high priority.
192
193 With SVP64 [[sv/remap]] providing the Loop Schedules it falls to
194 the Scalar side of the ISA to add the prerequisite "Twin Butterfly"
195 operations, typically performing for example one multiply but in-place
196 subtracting that product from one operand and adding it to the other.
197 The *in-place* aspect is strategically extremely important for significant
198 reductions in Vectorised register usage, particularly for DCT.
199
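As a rough illustration of the in-place idea, the sketch below models a radix-2 butterfly in Python: one multiply, whose product is added to one operand and subtracted from the other, with both results written back over the inputs. The names `butterfly` and `fft_inplace` are hypothetical, and the operand arrangement is an assumption chosen to match a textbook decimation-in-time FFT; it is not taken from the proposed instruction definitions, and the loop nest here merely stands in for what a [[sv/remap]] Schedule would provide.

```python
import cmath

def butterfly(x, i, j, w):
    """In-place twin butterfly: one multiply, one add, one subtract.
    Both outputs overwrite the two inputs, so no extra storage is needed."""
    t = w * x[j]                      # the single multiply
    x[i], x[j] = x[i] + t, x[i] - t   # add to one operand, subtract for the other

def fft_inplace(x):
    """Iterative radix-2 FFT over a list whose length is a power of two."""
    n = len(x)
    # Bit-reversal permutation so that the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]
    # Triple loop: stage size, block start, position within the block.
    size = 2
    while size <= n:
        half = size // 2
        step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                butterfly(x, start + k, start + k + half, w)
                w *= step
        size *= 2
    return x
```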
200 ## CR Weird group
201
202 Outlined in [[sv/cr_int_predication]] these instructions massively save
203 on CR-Field instruction count. Multi-bit to single-bit and vice-versa
204 normally requiring several CR-ops (crand, crxor) are done in one single
205 instruction. The reason for their addition is down to SVP64 overloading
206 CR Fields as Vector Predicate Masks. Reducing instruction count in
207 hot-loops is considered high priority.
208
209 An additional need is to do popcount on CR Field bit vectors but adding
210 such instructions to the *Condition Register* side was deemed to be far
211 too much. Therefore, priority was given instead to transferring several
212 CR Field bits into GPRs, whereupon the full set of standard Scalar GPR
213 Logical Operations may be used. This strategy has the side-effect of
214 keeping the CRweird group down to only five instructions.
215
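The strategy of moving CR Field bits into a GPR and then using ordinary integer operations can be sketched as follows. This is only a behavioural model under stated assumptions: the CR is treated as a 32-bit integer with CR0 in the most significant nibble and LT as the most significant bit of each field, and the helper names (`cr_field`, `gather_cr_bit`, `popcount`) and the packing order of the result are hypothetical rather than taken from [[sv/cr_int_predication]].

```python
CR_BITS = {"LT": 0, "GT": 1, "EQ": 2, "SO": 3}

def cr_field(cr, n):
    """Return 4-bit CR Field n, with CR0 as the most significant nibble."""
    return (cr >> (4 * (7 - n))) & 0xF

def gather_cr_bit(cr, name):
    """Collect one named bit from each of the 8 CR Fields into an integer.
    Result bit n (LSB0, an assumption of this sketch) comes from CR Field n."""
    sel = 3 - CR_BITS[name]   # LT is the MSB of the 4-bit field
    mask = 0
    for n in range(8):
        if (cr_field(cr, n) >> sel) & 1:
            mask |= 1 << n
    return mask

def popcount(x):
    """Ordinary integer popcount, standing in for a Scalar GPR operation."""
    return bin(x).count("1")
```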
216 ## Big-integer Math
217
218 [[sv/biginteger]] has always been a high priority area for commercial
219 applications, privacy, Banking, as well as HPC Numerical Accuracy:
220 libgmp as well as cryptographic uses in Asymmetric Ciphers. poly1305
221 and ec25519 are finding their way into everyday use via OpenSSL.
222
223 A very early variant of the Power ISA had a 32-bit Carry-in Carry-out
224 SPR. Its removal from subsequent revisions is regrettable. An alternative
225 concept is to add six explicit 3-in 2-out operations that, on close
226 inspection, always turn out to be supersets of *existing Scalar
227 operations* that discard upper or lower DWords, or parts thereof.
228
229 *Thus it is critical to note that not one single one of these operations
230 expands the bitwidth of any existing Scalar pipelines*.
231
232 The `dsld` instruction for example merely places additional LSBs into the
233 64-bit shift (64-bit carry-in), and then places the (normally discarded)
234 MSBs into the second output register (64-bit carry-out). It does **not**
235 require a 128-bit shifter to replace the existing Scalar Power ISA
236 64-bit shifters.
237
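The point about shifter width can be illustrated with a small Python model. It is a sketch of the behaviour described above (64-bit carry-in supplying the new LSBs, normally-discarded MSBs returned as 64-bit carry-out), not the `dsld` specification: the function names are hypothetical and the shift amount is assumed to lie in the range 0 to 63. Note that each step only ever shifts a single 64-bit quantity.

```python
MASK64 = (1 << 64) - 1

def shl64_carry(a, carry_in, sh):
    """Model of a 64-bit left shift with carry-in and carry-out.
    Assumes 0 <= sh < 64. Returns (result, carry_out)."""
    a &= MASK64
    low_fill = carry_in & ((1 << sh) - 1)   # extra LSBs from the carry-in
    result = ((a << sh) & MASK64) | low_fill
    carry_out = a >> (64 - sh)              # MSBs that are normally discarded
    return result, carry_out

def bigint_shl(limbs, sh):
    """Shift a little-endian list of 64-bit limbs left by sh bits,
    chaining the carry-out of each limb into the next."""
    out, carry = [], 0
    for limb in limbs:
        lo, carry = shl64_carry(limb, carry, sh)
        out.append(lo)
    out.append(carry)
    return out
```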
238 The reduction in instruction count these operations bring, in critical
239 hotloops, is remarkably high, to the extent where a Scalar-to-Vector
240 operation of *arbitrary length* becomes just the one Vector-Prefixed
241 instruction.
242
243 Whilst these are 5-6 bit XO their utility is considered high strategic
244 value and as such are strongly advocated to be in EXT04. The alternative
245 is to bring back a 64-bit Carry SPR but how it is retrospectively
246 applicable to pre-existing Scalar Power ISA multiply, divide, and shift
247 operations at this late stage of maturity of the Power ISA is an entire
248 area of research on its own deemed unlikely to be achievable.
249
250 ## fclass and GPR-FPR moves
251
252 [[sv/fclass]] - just one instruction. With SFFS being locked down to
253 exclude VSX, and there being no desire within the nascent OpenPOWER
254 ecosystem outside of IBM to implement the VSX PackedSIMD paradigm, it
255 becomes necessary to upgrade SFFS such that it is stand-alone capable. One
256 omission based on the assumption that VSX would always be present is an
257 equivalent to `xvtstdcsp`.
258
259 Similar arguments apply to the GPR-FPR move operations, proposed in
260 [[ls006]], with the opportunity taken to add rounding modes present
261 in other ISAs that Power ISA VSX PackedSIMD does not have. Javascript
262 rounding, one of the worst offenders of Computer Science, requires a
263 phenomenal 35 instructions with *six branches* to emulate in Power
264 ISA! For desktop as well as Server HTML/JS back-end execution of
265 javascript this becomes an obvious priority, recognised already by ARM
266 as just one example.
267
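For readers unfamiliar with why Javascript conversion is awkward, the sketch below models the ECMAScript `ToInt32` abstract operation in Python. It is an assumption of this example that `ToInt32` is the conversion the paragraph above has in mind (it is the one targeted by ARM's FJCVTZS); the function name `js_to_int32` is hypothetical. The special cases (NaN, infinities, truncation, modulo wrap, sign fix-up) hint at where the branches come from when emulating it with ordinary instructions.

```python
import math

def js_to_int32(x):
    """Behavioural model of ECMAScript ToInt32 for a Python float."""
    if math.isnan(x) or math.isinf(x):
        return 0                          # NaN and infinities convert to 0
    n = math.trunc(x) % (1 << 32)         # truncate toward zero, wrap modulo 2**32
    return n - (1 << 32) if n >= (1 << 31) else n   # reinterpret as signed
```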
268 ## Bitmanip LUT2/3
269
270 These LUT2/3 operations are high cost high reward. Outlined in
271 [[sv/bitmanip]], the simplest ones already exist in PackedSIMD VSX:
272 `xxeval`. The same reasoning applies as to fclass: SFFS needs to be
273 stand-alone on its own merits and should an implementor
274 choose not to implement any aspect of PackedSIMD VSX the performance
275 of their product should not be penalised for making that decision.
276
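A ternary lookup-table operation of this general kind can be modelled in a few lines. In the sketch below the three input bits at each position form an index into an 8-bit immediate, and the selected immediate bit becomes the result bit. The function name `lut3` and the index ordering (`a` as the most significant index bit) are assumptions made for illustration; they are not claimed to match `xxeval` or the encodings in [[sv/bitmanip]].

```python
def lut3(a, b, c, imm8, width=64):
    """Bitwise ternary LUT: for each bit position, use (a, b, c) as a
    3-bit index into imm8 and take that bit as the result."""
    result = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((imm8 >> idx) & 1) << i
    return result
```

With this index ordering, `0x80` gives a three-input AND, `0x96` a three-input XOR and `0xCA` a bitwise select (`a ? b : c`), which shows how one immediate-driven operation subsumes many fixed logical instructions.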
277 With Predication being such a high priority in GPUs and HPC, CR Field
278 variants of Ternary and Binary LUT instructions were considered high
279 priority, and again just like in the CRweird group the opportunity was
280 taken to work on *all* bits of a CR Field rather than just one bit as
281 is done with the existing CR operations crand, cror etc.
282
283 The other high strategic value instruction is `grevlut` (and `grevluti`
284 which can generate a remarkably large number of regular-patterned magic
285 constants). The grevlut set requires of the order of 20,000 gates but
286 provide an astonishing plethora of innovative bit-permuting instructions
287 never seen in any other ISA.
288
289 The downside of all of these instructions is the extremely low XO bit
290 requirements: 2-3 bit XO due to the large immediates *and* the number of
291 operands required. The LUT3 instructions are already compacted down to
292 "Overwrite" variants. (By contrast the Float-Load-Immediate instructions
293 are a much larger XO because despite having 16-bit immediate only one
294 Register Operand is needed).
295
296 Realistically these high-value instructions should be proposed in EXT2xx
297 where their XO cost does not overwhelm EXT0xx.
298
299
300 ## (f)mv.swizzle
301
302 [[sv/mv.swizzle]] is dicey. It is a 2-in 2-out operation whose value
303 as a Scalar instruction is limited *except* if combined with `cmpi` and
304 SVP64Single Predication, whereupon the end result is the RISC-synthesis
305 of Compare-and-Swap, in two instructions.
306
307 Where this instruction comes into its full value is when Vectorised.
308 3D GPU and HPC numerical workloads astonishingly contain between 10 and 15%
309 swizzle operations: access to YYZ, XY, of an XYZW Quaternion, performing
310 balancing of ARGB pixel data. The usage is so high that 3D GPU ISAs make
311 Swizzle a first-class priority in their VLIW words. Even 64-bit Embedded
312 GPU ISAs have a staggering 24-bits dedicated to 2-operand Swizzle.
313
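For readers who have not met the term, a swizzle simply selects and reorders the lanes of a short vector by name. The sketch below is a generic illustration only: the function `swizzle` and the `LANES` table are hypothetical and say nothing about the encoding proposed in [[sv/mv.swizzle]].

```python
LANES = {"X": 0, "Y": 1, "Z": 2, "W": 3}

def swizzle(vec, pattern):
    """Return the lanes of a 4-element vector selected by a pattern
    such as "YYZ" or "WZYX". Lanes may be repeated or omitted."""
    return tuple(vec[LANES[ch]] for ch in pattern)
```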
314 So as not to radicalise the Power ISA the Libre-SOC team decided to
315 introduce mv Swizzle operations, which can always be Macro-op fused
316 in exactly the same way that ARM SVE predicated-move extends 3-operand
317 "overwrite" opcodes to full independent 3-in 1-out.
318
319 ## BMI (bitmanipulation) group.
320
321 Whilst the [[sv/vector_ops]] instructions are only two in number, in
322 reality the `bmask` instruction has a Mode field allowing it to cover
323 **24** instructions, more than have been added to any other CPUs by
324 ARM, Intel or AMD. Analysis of the BMI sets of these CPUs shows simple
325 patterns that can greatly simplify both Decode and implementation. These
326 are sufficiently commonly used, saving instruction count regularly,
327 that they justify going into EXT0xx.
328
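The kind of regularity referred to can be seen by writing a few well-known x86 BMI1 operations as instances of one parameterised pattern: combine `x` (optionally inverted) with `x` plus or minus one (optionally inverted) using AND, OR or XOR. The sketch below is illustrative only. The function `bmi_pattern` and its parameters are hypothetical and are not the `bmask` Mode field encoding from [[sv/vector_ops]].

```python
import operator

MASK64 = (1 << 64) - 1
OPS = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}

def bmi_pattern(x, op, delta, invert_a=False, invert_b=False):
    """Combine x with (x + delta), each optionally inverted, on 64 bits."""
    a = x & MASK64
    b = (x + delta) & MASK64
    if invert_a:
        a = ~a & MASK64
    if invert_b:
        b = ~b & MASK64
    return OPS[op](a, b) & MASK64

def blsr(x):
    """Reset lowest set bit: x & (x - 1)."""
    return bmi_pattern(x, "and", -1)

def blsi(x):
    """Isolate lowest set bit: x & ~(x - 1), which equals x & -x."""
    return bmi_pattern(x, "and", -1, invert_b=True)

def blsmsk(x):
    """Mask up to and including the lowest set bit: x ^ (x - 1)."""
    return bmi_pattern(x, "xor", -1)
```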
329 The other instruction is `cprop` - Carry-Propagation - which takes
330 the P and Q from carry-propagation algorithms and generates carry
331 look-ahead. Greatly increases the efficiency of arbitrary-precision
332 integer arithmetic by combining what would otherwise be half a dozen
333 instructions into one. However it is still not a huge priority unlike
334 `bmask` so is probably best placed in EXT2xx.
335
336 ## Float-Load-Immediate
337
338 Very easily justified. As explained in [[ls002]] these always save one
339 LD L1/2/3 D-Cache memory-lookup operation, by virtue of the Immediate
340 FP value being in the I-Cache side. It is such a high priority that
341 these instructions are easily justifiable adding into EXT0xx, despite
342 requiring a 16-bit immediate. By designing the second-half instruction
343 as a Read-Modify-Write it saves on XO bitlength (only 5 bits), and can be
344 macro-op fused with its first-half to store a full IEEE754 FP32 immediate
345 into a register.
346
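The two-instruction scheme can be modelled roughly as below. This is a sketch under assumptions, not the [[ls002]] definition: it assumes the first-half instruction supplies the upper 16 bits of an FP32 pattern (leaving the lower 16 zero) and that the second-half Read-Modify-Write instruction fills in the lower 16 bits. The helper names are hypothetical.

```python
import struct

def bits_to_f32(bits):
    """Interpret a 32-bit pattern as an IEEE754 FP32 value."""
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]

def f32_to_bits(value):
    """Return the 32-bit pattern of a value that is exactly representable as FP32."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

def load_upper_half(imm16):
    """First half: 16-bit immediate becomes the upper half of an FP32 pattern."""
    return bits_to_f32((imm16 & 0xFFFF) << 16)

def merge_lower_half(reg, imm16):
    """Second half (Read-Modify-Write): keep the upper 16 bits already in the
    register and insert the immediate as the lower 16 bits."""
    return bits_to_f32((f32_to_bits(reg) & 0xFFFF0000) | (imm16 & 0xFFFF))
```

Under these assumptions the first half alone already yields a usable reduced-precision constant (`0x4049` gives 3.140625), and the second half refines it to the full FP32 value.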
347 There is little point in putting these instructions into EXT2xx. Their
348 very benefit and inherent value *is* as 32-bit instructions, not 64-bit
349 ones. Likewise there is less value in taking up EXT1xx Encoding space
350 because EXT1xx only brings an additional 16 bits (approx) to the table,
351 and that is provided already by the second-half instruction.
352
353 Thus they qualify as both high priority and also EXT0xx candidates.
354
355 ## FPR/GPR LD/ST-PostIncrement-Update
356
357 These instructions, outlined in [[ls011]], save hugely in hot-loops.
358 Early ISAs such as PDP-8, PDP-11, which inspired the iconic Motorola
359 68000, 88100, Mitch Alsup's MyISA 66000, and can even be traced back to
360 the iconic ultra-RISC CDC 6600, all had both pre- and post- increment
361 Addressing Modes.
362
363 The reason is very simple: it is a direct recognition of the practice
364 in C to frequently utilise both `*p++` and `*++p` which itself stems
365 from common need in Computer Science algorithms.
366
367 The problem for the Power ISA is - was - that the opcode space needed
368 to support both was far too great, and the decision was made to go with
369 pre-increment, on the basis that outside the loop a "pre-subtraction"
370 may be performed.
371
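The difference between the two addressing styles, and the "pre-subtraction" workaround, can be shown with a toy model in which memory is a Python list and the pointer is an index. The function names are hypothetical and the model ignores element size; it only illustrates the ordering of access and update.

```python
def sum_pre_increment(mem, base, count):
    """Walk `count` elements using update-then-access, as in *++p.
    The pointer must be biased by one element before the loop starts."""
    p = base - 1            # the "pre-subtraction" performed outside the loop
    total = 0
    for _ in range(count):
        p += 1              # update first ...
        total += mem[p]     # ... then access
    return total

def sum_post_increment(mem, base, count):
    """Walk `count` elements using access-then-update, as in *p++.
    No adjustment of the pointer is needed before the loop."""
    p = base
    total = 0
    for _ in range(count):
        total += mem[p]     # access first ...
        p += 1              # ... then update
    return total
```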
372 Whilst this is a "solution" it is less than ideal, and the opportunity
373 exists now with the EXT2xx Primary Opcodes to correct this and bring
374 Power ISA up a level.
375
376 ## Shift-and-add
377
378 Shift-and-Add are proposed in [[ls004]]. They mitigate the need to add
379 LD-ST-Shift instructions which are a high-priority aspect of both x86
380 and ARM. LD-ST-Shift is normally just the one instruction: Shift-and-add
381 brings the Power ISA equivalent down to two, where it presently requires three.
382 Cryptography e.g. twofish also makes use of Integer double-and-add,
383 so the value of these instructions is not limited to Effective Address
384 computation. They will also have value in Audio DSP.
385
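A minimal model of the operation, assuming it computes `(index << shift) + base` truncated to 64 bits, is sketched below. The function name and operand order are hypothetical and are not taken from [[ls004]]. It shows both the Effective Address use (scaling an array index by the element size) and the double-and-add use.

```python
MASK64 = (1 << 64) - 1

def shift_and_add(index, base, shift):
    """Return (index << shift) + base, wrapped to 64 bits."""
    return ((index << shift) + base) & MASK64

# Address of element 3 in an array of 8-byte items starting at 0x1000.
element_address = shift_and_add(3, 0x1000, 3)

# Double-and-add: 2*x + y in a single operation.
doubled_sum = shift_and_add(21, 5, 1)
```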
386 Being a 10-bit XO it would be somewhat punitive to place these in EXT2xx
387 when their whole purpose and value is to reduce binary size in Address
388 offset computation, thus they are best placed in EXT0xx.
389
390 \newpage{}
391
392 # Guidance for evaluation
393
394 Deciding which instructions go into an ISA is extremely complex, costly, and a huge
395 responsibility. In public standards mistakes are irrevocable, and in the case of an ISA
396 the Opcode Allocation is a finite resource, meaning that mistakes punish future instructions
397 as well. This section therefore provides some Evaluation Guidance on the decision process.
398
399 **Does anyone want it?**
400
401 Sounds like an obvious question but if there is no driving need (no "Stakeholder")
402 then why is the instruction being proposed? If it is purely out of curiosity or
403 part of a Research effort not intended for production then it's probably best left in the
404 EXT022 Sandbox.
405
406 **How many registers does it need?**
407
408 The basic RISC Paradigm is not only to make instruction encoding simple (often
409 "wasting" encoding space compared to highly-compacted ISAs such as x86), but
410 also to keep the number of registers used down to a minimum.
411
412 Counter-examples are FMAC which had to be added to IEEE754 because the
413 *internal* product requires more accuracy than can fit into a register.
414 Another would be a dotproduct instruction, which again requires an accumulator
415 of at least double the width of the two vector inputs. And in the AMDGPU
416 ISA, there are Texture-mapping instructions taking up to an astounding
417 *twelve* input operands!
418
419 The downside of going too far however has to be a trade-off with the next
420 question. Both MIPS and RISC-V lack Condition Codes, which means that emulating
421 x86 Branch-Conditional requires *ten* MIPS instructions.
422
423 The downside of creating too complex instructions is that the Dependency Hazard
424 Management in high-performance multi-issue out-of-order microarchitectures
425 becomes infeasibly large, and even simple in-order systems may have performance
426 severely compromised by an overabundance of stalls. Also worth remembering
427 is that register file ports are insanely costly, not just to design: they also
428 use considerable power.
429
430 That said there do exist genuine reasons why more registers are better than fewer:
431 Compare-and-Swap has huge benefits but is costly to implement, and DCT/FFT Twin-Butterfly
432 instructions allow creation of in-place in-register algorithms reducing the number
433 of registers needed and thus saving power due to making the *overall* algorithm
434 more efficient, as opposed to micro-focussing on a localised power increase.
435
436 **Can other existing instructions (plural) do the same job?**
437
438 The general
439 rule being: if two or more instructions can do the same job, leave it out...
440 *unless* the number of occurrences of that instruction being missing is causing
441 huge increases in binary size. RISC-V has gone too far in this regard,
442 as explained here: <https://news.ycombinator.com/item?id=24459314>
443
444 Good examples are LD-ST-Indexed-shifted (multiply RB by 2, 4, 8 or 16)
445 which are high-priority instructions in x86 and ARM, but lacking in
446 Power ISA, MIPS, and RISC-V. With many critical hot-loops in Computer
447 Science having to perform shift and add as explicit instructions, adding
448 LD/ST-shifted should be considered high priority, except that the sheer
449 *number* of such instructions needing to be added takes us into the next
450 question.
451
452 **How costly is the encoding?**
453
454
455
456 # Tables
457
458 The original tables are available publicly as a CSV file at
459 <https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv/rfc/ls012/optable.csv;hb=HEAD>.
460 A python program auto-generates the tables in the following sections
461 by sorting into different useful priorities.
462
463 The key to headings and sections are as follows:
464
465 * **Area** - Target Area as described in above sections
466 * **XO Cost** - the number of bits required in the XO Field. Whilst not
467 the full picture it is a good indicator as to how costly in terms
468 of Opcode Allocation a given instruction will be. Lower number is
469 a higher cost for the Power ISA's precious remaining Opcode space.
470 "PO" indicates that an entire Primary Opcode is required.
471 * **rfc** the Libre-SOC External RFC resource,
472 <https://libre-soc.org/openpower/sv/rfc/> where advance notice of
473 upcoming RFCs in development may be found.
474 *Reading advance Draft RFCs and providing feedback strongly advised*,
475 it saves time and effort for the OPF ISA Workgroup.
476 * **SVP64** - Vectoriseable (SVP64-Prefixable) - also implies that
477 SVP64Single is also permitted (required).
478 * **page** - Libre-SOC wiki page at which further information can
479 be found. Again: **advance reading strongly advised due to the
480 sheer volume of information**.
481 * **PO1** - the instruction is capable of being PO1-Prefixed
482 (given an EXT1xx Opcode Allocation). Bear in mind that this option
483 is **mutually exclusively incompatible** with Vectorisation.
484 * **group** - the Primary Opcode Group recommended for this instruction.
485 Options are EXT0xx (EXT000-EXT063), EXT1xx and EXT2xx. A third area
486 (UnVectoriseable),
487 EXT3xx, was available in an early Draft RFC but has been made "RESERVED"
488 instead. See [[sv/po9_encoding]].
489 * **regs** - a guide to register usage, to how costly Hazard Management
490 will be, in hardware:
491 - 1R: reads one GPR/FPR/SPR/CR.
492 - 1W: writes one GPR/FPR/SPR/CR.
493 - 1r: reads one CR *Field* (not necessarily the entire CR)
494 - 1w: writes one CR *Field* (not necessarily the entire CR)
495
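As a reading aid for the **XO Cost** column, the statement that a lower number means a higher cost can be made concrete with simple arithmetic. Under the simplifying assumption that an instruction with a k-bit XO occupies one of the 2^k possible XO values within a single Primary Opcode, it consumes a fraction 1/2^k of that Primary Opcode, and "PO" corresponds to k = 0. The function name `po_fraction` is hypothetical, and real encodings may share or split fields in ways this model ignores.

```python
from fractions import Fraction

def po_fraction(xo_bits):
    """Fraction of one Primary Opcode consumed by an instruction whose
    XO field is xo_bits wide (simplified model, see assumptions above)."""
    return Fraction(1, 2 ** xo_bits)

# A 2-bit XO takes a quarter of a Primary Opcode; a 10-bit XO takes 1/1024.
examples = {bits: po_fraction(bits) for bits in (0, 2, 3, 5, 6, 10)}
```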
496 [[!inline pages="openpower/sv/rfc/ls012/areas.mdwn" raw=yes ]]
497 [[!inline pages="openpower/sv/rfc/ls012/xo_cost.mdwn" raw=yes ]]
498
499 [[!tag opf_rfc]]