[[!tag standards]]
# SVP64 Branch Conditional behaviour

**DRAFT STATUS**

Please note: SVP64 Branch instructions should be
considered completely separate and distinct from
standard scalar OpenPOWER-approved v3.0B branches.
**v3.0B branches are in no way impacted, altered,
changed or modified in any way, shape or form by
the SVP64 Vectorised Variants**.

Links

* <https://bugs.libre-soc.org/show_bug.cgi?id=664>
* <http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-August/003416.html>
* [[openpower/isa/branch]]

# Rationale

Scalar 3.0B Branch Conditional operations (`bc`, `bctar` etc.) test a
Condition Register. For parallel processing, however, it is impossible
to perform multiple independent branches: the Program Counter
cannot branch to multiple destinations based on multiple conditions.
The best that can be done is
to test multiple Conditions and make a decision on a *single* branch,
based on analysis of a *Vector* of CR Fields
which have just been calculated from a *Vector* of results.

In 3D Shader
binaries, which are inherently parallelised and predicated, testing all or
some results and branching based on multiple tests is extremely common,
and a fundamental part of Shader Compilers. Example:
without such multi-condition
test-and-branch, if a predicate mask is all zeros a large batch of
instructions may be masked out to `nop`, and it would waste
CPU cycles to run them. 3D GPU ISAs can test for this scenario
and, with the appropriate predicate-analysis instruction,
jump over fully-masked-out operations, by spotting that
*all* Conditions are false.

Unless Branches are aware and capable of such analysis, additional
instructions would be required which perform Horizontal Cumulative
analysis of Vectorised Condition Register Fields, in order to
reduce the Vector of CR Fields down to one single yes or no
decision that a Scalar-only v3.0B Branch-Conditional could cope with.
Such instructions would be unavoidable and costly
by comparison to a single Vector-aware Branch.
Therefore, in order to be commercially competitive, `sv.bc` and
other Vector-aware Branch Conditional instructions are a high priority
for 3D GPU workloads.

Given that Power ISA v3.0B is already quite powerful, particularly
the Condition Registers and their interaction with Branches, there
are opportunities to create extremely flexible and compact
Vectorised Branch behaviour. In addition, the side-effects (updating
of CTR, truncation of VL, described below) make it a useful instruction
even if the branch points to the next instruction (no actual branch).

# Overview

When considering an "array" of branch-tests, there are four useful modes:
AND, OR, NAND and NOR of all Conditions.
NAND and NOR may be synthesised from AND and OR by
inverting `BO[1]`, which leaves just two modes:

* Branch takes place on the **first** CR Field test to succeed
  (a Great Big OR of all condition tests)
* Branch takes place only if **all** CR Field tests succeed
  (a Great Big AND of all condition tests)

Early-exit is enacted such that the Vectorised Branch does not
perform needless extra tests, which helps reduce reads on
the Condition Register file.

*Note: Early-exit is **MANDATORY** (required) behaviour.
Branches **MUST** exit at the first point at which the outcome
is decided (first failure for AND, first success for OR), for
exactly the same reasons for which early-exit is mandatory in
programming languages: to avoid
damaging side-effects. Speculative testing of Condition
Register Fields is permitted, as is speculative updating
of CTR, as long as, as usual in any Out-of-Order microarchitecture,
that speculative work is cancelled should an early-exit occur.*

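The two reduction modes and the mandatory early-exit can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: `vector_branch_test` and its arguments are hypothetical names, and the per-element test results are assumed to have been computed already.

```python
def vector_branch_test(tests, all_mode):
    """Reduce per-element test results to a single branch decision.

    tests:    sequence of booleans, one per element, in element order
    all_mode: True for ALL (Great Big AND), False for ANY (Great Big OR)
    Returns (branch_taken, number_of_elements_examined).
    """
    cond_ok = all_mode  # an AND starts from True, an OR starts from False
    examined = 0
    for el_ok in tests:
        examined += 1
        if all_mode:
            cond_ok = cond_ok and el_ok
            if not el_ok:
                break  # first failure decides an AND
        else:
            cond_ok = cond_ok or el_ok
            if el_ok:
                break  # first success decides an OR
    return cond_ok, examined
```

For example, `vector_branch_test([True, False, True], all_mode=True)` returns `(False, 2)`: the AND is decided at the second element, so the third is never examined.
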
Additional useful behaviour involves two primary Modes (both of
which may be enabled and combined):

* **VLSET Mode**: the equivalent of Data-Dependent Fail-First Mode
  for Arithmetic SVP64 operations, but with more
  flexibility and a close interaction and integration into the
  underlying base Scalar v3.0B Branch instruction.
  Truncation of VL takes place around the early-exit point.
* **CTR-test Mode**: gives much more flexibility over when and why
  CTR is decremented, including options to decrement if a Condition
  test succeeds *or if it fails*.

With these side-effects, basic Boolean Logic Analysis advises that
it is important to provide a means
to enact each of them based on whether testing succeeds *or fails*. This
results in a not-insignificant number of additional Mode Augmentation bits,
accompanying VLSET and CTR-test Modes respectively.

Predicate skipping or zeroing may, as usual with SVP64, be controlled
by `sz`.
Where an element is masked out by the predicate and
zeroing is enabled,
the same Boolean Logic Analysis dictates that
rather than testing only against zero, the option to test
against one is also prudent. This introduces a new
immediate field, `SNZ`, which works in conjunction with
`sz`.

Vectorised Branches can be used
in either SVP64 Horizontal-First or Vertical-First Mode. Essentially,
at an element level, the behaviour is identical in both Modes,
although the `ALL` bit is meaningless in Vertical-First Mode.

It is also important
to bear in mind that, fundamentally, Vectorised Branch-Conditional
is still extremely close to the Scalar v3.0B Branch-Conditional
instructions, and that the same v3.0B Scalar Branch-Conditional
instructions are still
*completely separate and independent*, being unaltered and
unaffected by their SVP64 variants in every conceivable way.

*Programming note: SVP64 instructions are 64 bits wide
(8 bytes, not 4). This needs to be taken into consideration when computing
branch offsets: the offset is relative to the start of the instruction,
which **includes** the SVP64 Prefix.*

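As a rough illustration of the programming note, the Python sketch below measures a displacement from the start of the SVP64 Prefix. The helper names and the example addresses are hypothetical; only the 8-byte instruction size and the reference point come from the note.

```python
SVP64_INSN_BYTES = 8   # 4-byte SVP64 Prefix followed by the 4-byte instruction
SCALAR_INSN_BYTES = 4

def fallthrough_addr(branch_addr, is_svp64):
    """Address of the instruction that follows the branch."""
    size = SVP64_INSN_BYTES if is_svp64 else SCALAR_INSN_BYTES
    return branch_addr + size

def branch_displacement(branch_addr, target_addr):
    """Byte displacement, measured from the start of the branch instruction.

    For an SVP64 branch, branch_addr is the address of the Prefix.
    """
    disp = target_addr - branch_addr
    if disp % 4 != 0:
        raise ValueError("branch targets must be word-aligned")
    return disp
```

With an `sv.bc` placed at the made-up address `0x1000`, a "no actual branch" target (the next instruction) sits at `0x1008`, so the displacement is 8 bytes rather than the 4 that a scalar `bc` would use.
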
# Format and fields

With element-width overrides being meaningless for Condition
Register Fields, bits 4 through 7 of SVP64 RM may be used for additional
Mode bits.

SVP64 RM `MODE` (includes `ELWIDTH` and `ELWIDTH_SRC` bits) for Branch
Conditional:

| 4 | 5 | 6 | 7 | 19 | 20 | 21 | 22 23 | description |
| - | - | - | - | -- | -- | --- |---------|----------------- |
|ALL|SNZ| / | / | 0 | 0 | / | LRu sz | normal mode |
|ALL|SNZ| / |VSb| 0 | 1 | VLI | LRu sz | VLSET mode |
|ALL|SNZ|CTi| / | 1 | 0 | / | LRu sz | CTR-test mode |
|ALL|SNZ|CTi|VSb| 1 | 1 | VLI | LRu sz | CTR-test+VLSET mode |

Brief description of fields:

* **sz=1** When predication is enabled, `sz=1`, and a predicate
  element bit is zero, `SNZ` is
  substituted in place of the CR bit selected by `BI`
  as the Condition tested.
  Contrast this with
  normal SVP64 `sz=1` behaviour, where *only* a zero is put in
  place of masked-out predicate bits.
* **sz=0** When `sz=0`, skipping occurs as usual on
  masked-out elements. However, unlike all
  other SVP64 behaviour, which entirely skips an element with
  no related side-effects at all, there are certain
  special circumstances where CTR
  may be decremented. See CTR-test Mode, below.
* **ALL** When set, all branch conditional tests must pass in order for
  the branch to succeed. When clear, it is the first sequentially
  encountered successful test that causes the branch to succeed.
  This is identical behaviour to how programming languages perform
  early-exit on Boolean Logic chains.
* **VLI** Vector Length Inclusive. VLSET is identical to
  Data-dependent Fail-First mode.
  In VLSET mode, VL *may* (depending on `VSb`) be truncated.
  If VLI is clear,
  VL is truncated to *exclude* the current element, otherwise it is
  included. SVSTATE.MVL is not altered: only VL.
* **LRu** Link Register Update. When set, Link Register will
  only be updated if the Branch Condition succeeds. This avoids
  destruction of LR during loops (particularly Vertical-First
  ones).
* **VSb** In VLSET Mode, after testing,
  if VSb is set, VL is truncated if the branch succeeds. If VSb is clear,
  VL is truncated if the branch did **not** take place.
* **CTi** CTR inversion. CTR-test Mode normally decrements per element
  tested. CTR inversion decrements if a test *fails*. Only relevant
  in CTR-test Mode.

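The table can be read as a small decoder. The Python sketch below is illustrative only: it assumes that RM is held as a 24-bit integer and that the bit numbers in the table use MSB0 numbering (bit 0 is the most significant bit). Neither assumption is stated in this section, and `rm_bit` and `decode_branch_mode` are hypothetical helper names.

```python
RM_BITS = 24  # assumed width of the SVP64 RM field

MODE_NAMES = {  # keyed by (bit 19, bit 20), as in the table
    (0, 0): "normal mode",
    (0, 1): "VLSET mode",
    (1, 0): "CTR-test mode",
    (1, 1): "CTR-test+VLSET mode",
}

def rm_bit(rm, n):
    """Bit n of RM under the assumed MSB0 numbering."""
    return (rm >> (RM_BITS - 1 - n)) & 1

def decode_branch_mode(rm):
    ctrtest = rm_bit(rm, 19)
    vlset = rm_bit(rm, 20)
    return {
        "description": MODE_NAMES[(ctrtest, vlset)],
        "ALL": rm_bit(rm, 4),
        "SNZ": rm_bit(rm, 5),
        # fields shown as "/" in the table are reported as None
        "CTi": rm_bit(rm, 6) if ctrtest else None,
        "VSb": rm_bit(rm, 7) if vlset else None,
        "CTRtest": ctrtest,
        "VLSET": vlset,
        "VLI": rm_bit(rm, 21) if vlset else None,
        "LRu": rm_bit(rm, 22),
        "sz": rm_bit(rm, 23),
    }
```

Under these assumptions, an RM value with only bits 4, 20, 21 and 23 set decodes as "VLSET mode" with `ALL=1`, `VLI=1`, `sz=1`, `VSb=0`, and `CTi` reported as `None`.
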
# Vectorised CR Field numbering, and Scalar behaviour

It is important to keep in mind that, just as with all SVP64 instructions,
the `BI` field of the base v3.0B Branch Conditional instruction
may be extended by SVP64 EXTRA augmentation, as well as be marked
as either Scalar or Vector.

The `BI` field of Branch Conditional operations is five bits. In scalar
v3.0B this selects one bit of the 32-bit CR,
which comprises eight CR Fields of 4 bits each. In SVP64 there are
16 32-bit CRs, containing 128 4-bit CR Fields. Therefore, the 2 LSBs of
`BI` select the bit within the CR Field (LT, GT, EQ or SO), and the top 3 bits
are extended, as either scalar or vector, to select CR Fields 0..127
as specified in SVP64 [[sv/svp64/appendix]].

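The split of the 5-bit `BI` field can be shown with a few lines of Python. This sketch covers only the scalar v3.0B split and the field-count arithmetic. The SVP64 EXTRA extension of the top three bits is defined in the appendix and is deliberately not reproduced here. `split_bi` is a hypothetical helper name.

```python
CR_BIT_NAMES = ("LT", "GT", "EQ", "SO")  # bit order within a Power ISA CR Field

CRS = 16              # 32-bit CRs available in SVP64
FIELDS_PER_CR = 8     # 4-bit CR Fields per 32-bit CR
TOTAL_CR_FIELDS = CRS * FIELDS_PER_CR  # 128

def split_bi(bi):
    """Split a 5-bit BI into (CR Field number 0..7, bit name)."""
    if not 0 <= bi < 32:
        raise ValueError("BI is a 5-bit field")
    return bi >> 2, CR_BIT_NAMES[bi & 0b11]
```

For instance, `split_bi(2)` gives `(0, "EQ")` (the EQ bit of CR0) and `split_bi(29)` gives `(7, "GT")`.
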
When the CR Field selected by SVP64-Augmented `BI` is marked as scalar,
the usual SVP64 rules apply:
the Vector loop ends at the first element tested, after taking
predication into consideration. Thus, also as usual, when a predicate mask is
given, `BI` is marked as scalar, and `sz` is zero, srcstep
skips forward to the first element whose predicate bit is set, and only that
one element is tested.

In other words, the fact that this is a Branch
Operation (instead of an arithmetic one) does not result, ultimately,
in significant changes as to
how SVP64 is fundamentally applied, except with respect to
the unique properties associated with conditionally
changing the Program
Counter (aka "a Branch"), resulting in early-out
opportunities and CTR-testing, which are outlined below.

# Horizontal-First and Vertical-First Modes

In SVP64 Horizontal-First Mode, the first failure in ALL mode (Great Big
AND) results in early exit: no more updates to CTR occur (if requested);
no branch occurs, and LR is not updated (if requested). Likewise, in
non-ALL mode (Great Big OR), early exit also occurs on first success,
this time with the Branch proceeding. In both cases the testing
of the Vector of CRs should be done in linear sequential order (or in
REMAP re-sequenced order), such that tests that are sequentially beyond
the exit point are *not* carried out. (*Note: it is standard practice in
Programming languages to exit early from conditional tests, however it is
a little unusual to consider in an ISA that is designed for Parallel
Vector Processing. The reason is to have strictly-defined guaranteed
behaviour.*)

In Vertical-First Mode, setting the `ALL` bit results in `UNDEFINED`
behaviour. Given that only one element is being tested at a time
in Vertical-First Mode, a test designed to be done on multiple
bits is meaningless.

# Description and Modes

Predication in both INT and CR modes may be applied to `sv.bc` and other
SVP64 Branch Conditional operations, exactly as it may be applied to
other SVP64 operations. When `sz` is zero, any masked-out Branch-element
operations are not included in condition testing, exactly like all other
SVP64 operations. Side-effects such as potentially updating
LR or CTR are likewise skipped. There is *one* exception here,
which is when
`BO[2]=0, sz=0, CTR-test=0, CTi=1` and the relevant element
predicate mask bit is also zero:
under these special circumstances CTR will also decrement.

When `sz` is non-zero, this normally requests insertion of a zero
in place of the input data, when the relevant predicate mask bit is zero.
This would mean that a zero is inserted in place of `CR[BI+32]` for
testing against `BO`, which may not be desirable in all circumstances.
Therefore, an extra field, `SNZ`, is provided which, if set, will insert
a **one** in place of a masked-out element, instead of a zero.

(*Note: Both options are provided because it is useful to deliberately
cause the Branch-Conditional Vector testing to fail at a specific point,
controlled by the Predicate mask. This is particularly useful in `VLSET`
mode, which will truncate SVSTATE.VL at the point of the first failed
test.*)

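The interaction of the predicate bit, `sz` and `SNZ` with the per-element test can be sketched in Python as below. This is a minimal illustration: `element_test` is a hypothetical name, all inputs are single bits given as `0` or `1`, and CTR and LR side-effects are deliberately left out.

```python
def element_test(crbit, pred, sz, snz, bo0, bo1):
    """Return None if the element is skipped, else the element's test result.

    crbit: the CR bit selected by BI for this element
    pred:  this element's predicate mask bit
    """
    if pred:
        testbit = crbit   # normal case: test the real CR bit
    elif sz:
        testbit = snz     # zeroing: substitute SNZ (a zero or a one)
    else:
        return None       # sz=0: masked-out element is skipped
    # same form as the scalar test: BO[0] | not (testbit xor BO[1])
    return bool(bo0) or (testbit == bo1)
```

With `pred=0, sz=1, bo0=0, bo1=1`, the result is `True` when `snz=1` and `False` when `snz=0`, which is how a Predicate mask can be used to force the test to fail (or pass) at a chosen element.
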
Normally, CTR mode will decrement once per Condition Test, with the result
that, under normal circumstances, CTR reduces by up to VL in Horizontal-First
Mode. Just as v3.0B Branch-Conditional saves at
least one instruction on tight inner loops through auto-decrementation
of CTR, it is likewise possible to save instruction count for
SVP64 loops in both Vertical-First and Horizontal-First Mode, particularly
in circumstances where there is conditional interaction between the
element computation and testing, and the continuation (or otherwise)
of a given loop. The potential combinations of interactions are why CTR
testing options have been added.

Also, the unconditional bit `BO[0]` is still relevant when Predication
is applied to the Branch because in `ALL` mode all nonmasked bits have
to be tested, and when `sz=0` skipping occurs.
Even when VLSET mode is not used, CTR
may still be decremented by the total number of nonmasked elements,
acting in effect as either a popcount or cntlz depending on which
mode bits are set.
In short, Vectorised Branch becomes an extremely powerful tool.

## CTR-test

Where a standard Scalar v3.0B branch unconditionally decrements
CTR when `BO[2]` is clear, CTR-test Mode introduces more flexibility,
which allows CTR to be used for many more types of Vector loop
constructs.

CTR-test mode and CTi interaction is as follows. Note that
`BO[2]` is still required to be clear for CTR decrements to be
considered, exactly as is the case in Scalar Power ISA v3.0B.

* **CTR-test=0, CTi=0**: CTR decrements on a per-element basis
  if `BO[2]` is zero. Masked-out elements when `sz=0` are
  skipped (i.e. CTR is *not* decremented when the predicate
  bit is zero and `sz=0`).
* **CTR-test=0, CTi=1**: CTR decrements on a per-element basis
  if `BO[2]` is zero and a masked-out element is skipped
  (`sz=0` and predicate bit is zero). This one special case is the
  **opposite** of other combinations, as well as being
  completely different from normal SVP64 `sz=0` behaviour.
* **CTR-test=1, CTi=0**: CTR decrements on a per-element basis
  if `BO[2]` is zero and the Condition Test succeeds.
  Masked-out elements when `sz=0` are skipped (including
  not decrementing CTR).
* **CTR-test=1, CTi=1**: CTR decrements on a per-element basis
  if `BO[2]` is zero and the Condition Test *fails*.
  Masked-out elements when `sz=0` are skipped (including
  not decrementing CTR).

`CTR-test=0, CTi=1, sz=0` requires special emphasis because it is the
only time in the entirety of SVP64 that has side-effects when
a predicate mask bit is clear. **All** other SVP64 operations
entirely skip an element when `sz=0` and a predicate mask bit is zero.

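The four bullet cases can be tabulated as a per-element decision function. The Python sketch below encodes the prose of the bullets only, and reads the `CTR-test=0, CTi=1` case literally (CTR decrements exactly when a masked-out element is skipped). That reading is an assumption, as are the name `ctr_decrements` and its argument names.

```python
def ctr_decrements(bo2, ctrtest, cti, masked_out, sz, test_ok=None):
    """Decide whether CTR is decremented for one element.

    masked_out: True when the element's predicate bit is zero
    test_ok:    the element's Condition Test result (needed in CTR-test mode)
    """
    if bo2:
        return False               # BO[2]=1: CTR is never decremented
    skipped = masked_out and not sz
    if not ctrtest:
        if cti:
            return skipped         # the one special case: decrement on skip
        return not skipped         # plain per-element decrement
    if skipped:
        return False               # CTR-test=1: skipped elements do nothing
    if test_ok is None:
        raise ValueError("test_ok is required when CTR-test=1")
    # CTi=0: decrement on success; CTi=1: decrement on failure
    return bool(test_ok) != bool(cti)
```

For example, `ctr_decrements(0, 0, 1, masked_out=True, sz=0)` is `True`, whereas the same call with `cti=0` is `False`.
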
# VLSET Mode

VLSET Mode truncates the Vector Length so that subsequent instructions
operate on a reduced Vector Length. This is similar to
Data-dependent Fail-First and LD/ST Fail-First, where for VLSET the
truncation occurs at the Branch decision-point.

Interestingly, due to the side-effects of `VLSET` mode
it is actually useful to use Branch Conditional even
to perform no actual branch operation, i.e. to point to the instruction
after the branch. Truncation of VL would thus conditionally occur yet control
flow alteration would not.

`VLSET` mode with Vertical-First is particularly unusual. Vertical-First
is designed to be used for explicit looping, where an explicit call to
`svstep` is required to move both srcstep and dststep on to
the next element, until VL (or other condition) is reached.
Vertical-First Looping is expected (required) to terminate if the end
of the Vector, VL, is reached. If however that loop is terminated early
because VL is truncated, VLSET with Vertical-First becomes meaningless.
Resolving this would require two branches: one Conditional, the other
branching unconditionally to create the loop, where the Conditional
one jumps over it.

Therefore, `VSb` provides the option to decide whether truncation should occur if the
branch succeeds *or* if the branch condition fails, which gives the flexibility
required. This allows a Vertical-First Branch to *either* be used as
a branch-back (loop) *or* as part of a conditional exit or function
call from *inside* a loop, and for VLSET to be integrated into both
types of decision-making.

In the case of a Vertical-First branch-back (loop), with `VSb=0` the branch takes
place if success conditions are met, but on exit from that loop
(branch condition fails), VL will be truncated. This is extremely
useful.

`VLSET` mode with Horizontal-First when `VSb=0` is still
useful, because it can be used to truncate VL to the first predicated
(non-masked-out) element.

The truncation point for VL, when `VLI` is clear, must not include skipped
elements that preceded the current element being tested.
Example: `sz=0, VLI=0, predicate mask = 0b110010` and the Condition
failure point is at element 4.

* Testing at element 0 is skipped because its predicate bit is zero
* Testing at element 1 passed
* Testing at elements 2 and 3 is skipped because their
  respective predicate mask bits are zero
* Testing at element 4 fails, therefore VL is truncated to **2**,
  not 4, due to elements 2 and 3 being skipped.

If `sz=1` in the above example *then* VL would have been set to 4 because
in zeroing mode the masked-out elements are still effectively part of the
Vector (with their respective test bits set to `SNZ`).

If `VLI=1` then VL would be set to 5 regardless of `sz`, due to being inclusive
of the element actually being tested.

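The worked example can be reproduced with a short Python sketch. It is an interpretation chosen to match the three results quoted above (2, 4 and 5), not a definitive rule. `truncated_vl` is a hypothetical helper, and bit `i` of `mask` is taken to be the predicate bit of element `i`.

```python
def truncated_vl(mask, fail_at, sz, vli):
    """VL after truncation when the test at element fail_at triggers VLSET."""
    if vli:
        return fail_at + 1      # inclusive of the element being tested
    if sz:
        return fail_at          # zeroed elements still count as part of the Vector
    # sz=0: skipped elements immediately before the failure point do not count
    vl = 0
    for i in range(fail_at):
        if (mask >> i) & 1:
            vl = i + 1          # last element that was actually tested
    return vl

assert truncated_vl(0b110010, 4, sz=0, vli=0) == 2
assert truncated_vl(0b110010, 4, sz=1, vli=0) == 4
assert truncated_vl(0b110010, 4, sz=0, vli=1) == 5
```
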
## VLSET and CTR-test combined

If both CTR-test and VLSET Modes are requested, it is important to
observe the correct order. What occurs depends on whether `VLI`
is enabled, because `VLI` affects the length, VL.

If `VLI` (VL truncate inclusive) is set:

1. compute the test including whether CTR triggers
2. (optionally) decrement CTR
3. (optionally) truncate VL (VSb inverts the decision)
4. decide (based on step 1) whether to terminate looping
   (including not executing step 5)
5. decide whether to branch.

If `VLI` is clear, then when a test fails that element
and any following it
should **not** be considered part of the Vector. Consequently:

1. compute the branch test including whether CTR triggers
2. if the test fails against VSb, truncate VL to the *previous*
   element, and terminate looping. No further steps executed.
3. (optionally) decrement CTR
4. decide whether to branch.

# Boolean Logic combinations

There are an extraordinary number of different combinations which
provide completely different and useful behaviour.
Available options to combine:

* `BO[0]` to make an unconditional branch would seem irrelevant if
  it were not for predication and for side-effects (CTR Mode
  for example)
* Enabling CTR-test Mode and setting `BO[2]` can still result in the
  Branch
  taking place, not because the Condition Test itself failed, but
  because CTR reached zero **because**, as required by CTR-test mode,
  CTR was decremented as a **result** of Condition Tests failing.
* `BO[1]` to select whether the CR bit being tested is zero or nonzero
* `r30` and `~r30` and other predicate mask options including CR and
  inverted CR bit testing
* `sz` and `SNZ` to insert either zeros or ones in place of masked-out
  predicate bits
* `ALL` or `ANY` behaviour corresponding to `AND` of all tests and
  `OR` of all tests, respectively.
* Predicate Mask bits, which combine in effect with the CR being
  tested.
* Inversion of Predicate Masks (`~r3` instead of `r3`, or using
  `NE` rather than `EQ`) which results in an additional
  level of possible ANDing, ORing etc. that would otherwise
  need explicit instructions.

The most obviously useful combinations here are to set `BO[1]` to zero,
which inverts each element test and so, in terms of the CR bits themselves,
turns `ALL` into Great-Big-NOR and `ANY` into
Great-Big-NAND. Other Mode bits which perform behavioural inversion then
have to work round the fact that the Condition Testing is NOR or NAND.
Without the additional behavioural inversion bits
(`SNZ`, `VSb`, `CTi`), the alternative would be to have a second (unconditional)
branch directly after the first, which the first branch jumps over.
This contrived construct is avoided by the behavioural inversion bits.

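The effect of `BO[1]` on the two reduction modes can be checked exhaustively for a short vector. In this Python sketch `reduce_tests` is a hypothetical helper that models only the per-element test `testbit == BO[1]` followed by the `ALL` or `ANY` reduction. Predication, `BO[0]` and early-exit are left out because they do not change the final truth value.

```python
from itertools import product

def reduce_tests(bits, bo1, all_mode):
    """ALL/ANY reduction of the per-element test (CR bit == BO[1])."""
    tests = [bit == bo1 for bit in bits]
    return all(tests) if all_mode else any(tests)

# Exhaustive check over every 3-element vector of CR bits
for bits in product((0, 1), repeat=3):
    assert reduce_tests(bits, 1, True) == all(bits)          # AND
    assert reduce_tests(bits, 1, False) == any(bits)         # OR
    assert reduce_tests(bits, 0, True) == (not any(bits))    # NOR
    assert reduce_tests(bits, 0, False) == (not all(bits))   # NAND
```

Expressed in terms of the raw CR bits, `BO[1]=0` with `ALL` succeeds only when every bit is zero (a NOR), and `BO[1]=0` with `ANY` succeeds as soon as one bit is zero (a NAND), which is De Morgan's law applied to the inverted element tests.
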
# Pseudocode and examples

Pseudocode for Horizontal-First Mode:

```
# AND-reduction (ALL) starts from 1, OR-reduction (ANY) starts from 0
cond_ok = SVRMmode.ALL
# get SVP64 extended CR field 0..127
SVCRf = SVP64EXTRA(BI>>2)
for srcstep in range(VL):
    # select predicate bit or zero/one
    if predicate[srcstep]:
        CRbits = CR{SVCRf}
        testbit = CRbits[BI & 0b11]
        # testbit = CR[BI+32+srcstep*4]
    else if not SVRMmode.sz:
        # inverted CTR test skip mode
        if ¬BO[2] & CTRtest & ¬CTi then
            CTR = CTR - 1
        continue
    else
        testbit = SVRMmode.SNZ
    # actual element test here
    el_cond_ok <- BO[0] | ¬(testbit ^ BO[1])
    # merge in the test
    if SVRMmode.ALL:
        cond_ok &= el_cond_ok
    else
        cond_ok |= el_cond_ok
    # test for VL to be set (and exit)
    if VLSET and VSb = el_cond_ok then
        if SVRMmode.VLI
            SVSTATE.VL = srcstep+1
        else
            SVSTATE.VL = srcstep
        break
    # early exit?
    if SVRMmode.ALL:
        if ~el_cond_ok:
            break
    else
        if el_cond_ok:
            break
    if SVCRf.scalar:
        break
```

Pseudocode for Vertical-First Mode:

```
# get SVP64 extended CR field 0..127
SVCRf = SVP64EXTRA(BI>>2)
CRbits = CR{SVCRf}
# select predicate bit or zero/one
if predicate[srcstep]:
    if BRc = 1 then # CR0 vectorised
        CR{SVCRf+srcstep} = CRbits
    testbit = CRbits[BI & 0b11]
else if not SVRMmode.sz:
    # inverted CTR test skip mode
    if ¬BO[2] & CTRtest & ¬CTi then
        CTR = CTR - 1
    SVSTATE.srcstep = new_srcstep
    exit # no branch testing
else
    testbit = SVRMmode.SNZ
# actual element test here
cond_ok <- BO[0] | ¬(testbit ^ BO[1])
# test for VL to be set (and exit)
if VLSET and cond_ok = VSb then
    if SVRMmode.VLI
        SVSTATE.VL = new_srcstep+1
    else
        SVSTATE.VL = new_srcstep
```

v3.0B branch pseudocode including LRu and CTR skipping:

```
if (mode_is_64bit) then M <- 0
else M <- 32
cond_ok <- BO[0] | ¬(CR[BI+32] ^ BO[1])
ctrdec = ¬BO[2]
if CTRtest & (cond_ok ^ CTi) then
    ctrdec = 0b0
if ctrdec then CTR <- CTR - 1
ctr_ok <- BO[2] | ((CTR[M:63] != 0) ^ BO[3])
lr_ok <- SVRMmode.LRu
if ctr_ok & cond_ok then
    if AA then NIA <-iea EXTS(BD || 0b00)
    else NIA <-iea CIA + EXTS(BD || 0b00)
    lr_ok <- 0b1
if LK & lr_ok then LR <-iea CIA + 4
```

# Example Shader code

```
while(a > 2) {
    if(b < 5)
        f();
    else
        g();
    h();
}
```

which compiles to something like:

```
vec<i32> a, b;
// ...
pred loop_pred = a > 2;
while(loop_pred.any()) {
    pred if_pred = loop_pred & (b < 5);
    if(if_pred.any()) {
        f(if_pred);
    }
label1:
    pred else_pred = loop_pred & ~if_pred;
    if(else_pred.any()) {
        g(else_pred);
    }
    h(loop_pred);
}
```

which will end up as:

```
    sv.cmpi CR60.v, a.v, 2      # vector compare a into CR60 vector
    sv.crweird r30, CR60.GT     # transfer GT vector to r30
while_loop:
    sv.cmpi CR80.v, b.v, 5      # vector compare b into CR80 vector
    sv.bc/m=r30/~ALL/sz CR80.v.LT skip_f    # skip when none
    # only calculate loop_pred & pred_b because needed in f()
    sv.crand CR80.v.SO, CR60.v.GT, CR80.v.LT # if = loop & pred_b
    f(CR80.v.SO)
skip_f:
    # illustrate inversion of pred_b. invert r30, test ALL
    # rather than SOME, but masked-out zero test would FAIL,
    # therefore masked-out instead is tested against 1 not 0
    sv.bc/m=~r30/ALL/SNZ CR80.v.LT skip_g
    # else = loop & ~pred_b, need this because used in g()
    sv.crternari(A&~B) CR80.v.SO, CR60.v.GT, CR80.v.LT
    g(CR80.v.SO)
skip_g:
    # conditionally call h(r30) if any loop pred set
    sv.bclr/m=r30/~ALL/sz BO[1]=1 h()
    sv.bc/m=r30/~ALL/sz BO[1]=1 while_loop
```
585 sv.bc/m=r30/~ALL/sz BO[1]=1 while_loop
586 ```