# Positional popcount SVP64

* <https://bugs.libre-soc.org/show_bug.cgi?id=672>
* <https://github.com/clausecker/pospop/blob/master/countsse2_amd64.s>
* RISC-V Bitmanip Extension Document Version 0.94-draft Editor: Claire Wolf Symbiotic GmbH
  <https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmanip-draft.pdf>

Positional popcount in optimised assembler on SIMD ISAs typically takes
around 500 lines. Thanks to `bpermd`, Power ISA can be much more efficient,
and with SVP64 even more so. The reference implementation below shows
the concept.

```
// Copyright (c) 2020 Robert Clausecker <fuz@fuz.su>
// count8 reference implementation for tests. Do not alter.
func count8safe(counts *[8]int, buf []uint8) {
	for i := range buf {
		for j := 0; j < 8; j++ {
			counts[j] += int(buf[i] >> j & 1)
		}
	}
}
```

A simple but still hardware-parallelisable SVP64 assembler version for
8-bit input values (`count8safe`) is as follows:

```
mtspr 9, 3                # move r3 to CTR
# VL = MIN(CTR,MAXVL=8), Rc=1 (CR0 set if CTR ends)
setvl 3,0,8,0,1,1         # set MVL=8, VL=MIN(MVL,CTR)
# load VL bytes (update r4 addr) but compressed (dw=8)
addi 6, 0, 0              # initialise all 64-bits of r6 to zero
sv.lbzu/pi/dw=8 *6, 1(4)  # should be /lf here as well
# gather performs the transpose (which gets us to positional..)
gbbd 8,6
# now those bits have been turned around, popcount and sum them
setvl 0,0,8,0,1,1         # set MVL=VL=8
sv.popcntd/sw=8 *24,*8    # do the (now transposed) popcount
sv.add *16,*16,*24        # and accumulate in results
# branch back if CTR still non-zero. works even though VL=8
sv.bc/all 16, *0, -0x28   # reduce CTR by VL and stop if -ve
```

Array popcount is just the standard popcount function
([[!wikipedia Hamming weight]]) applied horizontally to each value in
an array. Positional popcount is different: it counts vertically.

<img src="/openpower/sv/cookbook/1_popcount.svg" alt="pospopcnt" width="70%" />

Positional popcount adds up, for each bit position, the number of bits
set to 1 at that position across an array of input values.

<img src="/openpower/sv/cookbook/2_popcount.svg" alt="pospopcnt" width="70%" />

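To make the horizontal/vertical distinction concrete, here is a minimal
Python sketch. It is an illustration only and not part of the SVP64
implementation: the names `array_popcount` and `positional_popcount` are
made up for this example, and the default width of 8 is an assumption
chosen to match the `count8safe` case.

```python
def array_popcount(values):
    """Horizontal: one bit count per input value."""
    return [bin(v).count("1") for v in values]


def positional_popcount(values, width=8):
    """Vertical: one count per bit position, summed over all values."""
    counts = [0] * width
    for v in values:
        for j in range(width):
            counts[j] += (v >> j) & 1
    return counts


data = [0b00000011, 0b00000101, 0b10000001]
print(array_popcount(data))       # [2, 2, 2]
print(positional_popcount(data))  # [3, 1, 1, 0, 0, 0, 0, 1]
```

The first result has one entry per input value, whereas the second always
has one entry per bit position, however long the input is.
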
# Visual representation of the pospopcount algorithm

# Walkthrough of the assembler

Firstly the CTR (Counter) SPR is set up. It is key to the looping,
as outlined further below.

```
mtspr 9, 3                # move r3 to CTR
```

The Vector Length (VL), which is limited to 8 by MVL (Maximum
Vector Length), is set up next. A special "CTR" Mode is used,
which automatically takes its input from the CTR SPR rather than
from register RA. (*Note that RA is set to zero to indicate this,
because there is limited encoding space. See the
[[openpower/sv/setvl]] instruction specification for details*).

The result of this instruction is that if CTR is greater than
8, VL is set to 8. If however CTR is less than or equal to 8,
then VL is set to CTR. Additionally, a copy of VL is placed
into RT (r3 in this case). This is again a consequence of the
limited encoding space, but in some cases (not here) it is
desirable, as it avoids an `mfspr` instruction to take
a copy of VL into a GPR.

```
# VL = MIN(CTR,MAXVL=8)
setvl 3,0,8,0,1,1         # set MVL=8, VL=MIN(MVL,CTR)
```

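The rule just described can be modelled in a few lines of Python. This is
only a sketch of the `VL = MIN(CTR, MAXVL)` behaviour stated in the
comments: `setvl_ctr` is a hypothetical helper name, and the copy of VL
into RT and the Rc=1 side effects are not modelled.

```python
MAXVL = 8  # the MVL immediate used in the setvl above


def setvl_ctr(ctr, maxvl=MAXVL):
    """Hypothetical model of setvl in CTR mode: VL = MIN(CTR, MAXVL)."""
    return min(ctr, maxvl)


# The three cases from the text: CTR above, equal to, and below 8.
for ctr in (20, 8, 3):
    print(ctr, "->", setvl_ctr(ctr))
```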
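Next, r6 is zeroed and VL bytes are loaded, with r4 updated as the
address. The destination element width of 8 (`dw=8`) packs the loaded
bytes into r6 instead of using one register per byte. Zeroing r6 first
means that when VL is less than 8 the bytes that were not loaded hold
zero and add nothing to the counts.

```
# load VL bytes (update r4 addr) but compressed (dw=8)
addi 6, 0, 0              # initialise all 64-bits of r6 to zero
sv.lbzu/pi/dw=8 *6, 1(4)  # should be /lf here as well
```

The gather instruction `gbbd` then transposes the bits of r6 into r8,
which is the step that gets from a per-byte view to a positional one.

```
# gather performs the transpose (which gets us to positional..)
gbbd 8,6
```
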
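As a rough model of what the transpose achieves, the following Python
sketch treats the eight loaded bytes as rows of an 8x8 bit matrix and
turns them into columns. The helper `transpose8x8` is hypothetical, and it
works on a plain list of bytes, so the exact bit and byte numbering that
`gbbd` uses within a 64-bit register is deliberately not modelled here.
After the transpose, an ordinary popcount of column `j` gives the number
of input bytes that had bit `j` set.

```python
def transpose8x8(rows):
    """Transpose an 8x8 bit matrix held as a list of 8 byte values.

    Bit i of result[j] equals bit j of rows[i].
    """
    if len(rows) != 8:
        raise ValueError("expected exactly 8 bytes")
    cols = [0] * 8
    for i, byte in enumerate(rows):
        for j in range(8):
            cols[j] |= ((byte >> j) & 1) << i
    return cols


rows = [0b00000011, 0b00000101, 0b10000001, 0, 0, 0, 0, 0]
cols = transpose8x8(rows)
print(cols)                               # [7, 1, 2, 0, 0, 0, 0, 4]
print([bin(c).count("1") for c in cols])  # [3, 1, 1, 0, 0, 0, 0, 1]
```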
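With the bits transposed, MVL and VL are both set to 8, one element per
bit position, and each byte of r8 is popcounted (`sw=8`) into the
registers starting at r24.

```
# now those bits have been turned around, popcount and sum them
setvl 0,0,8,0,1,1         # set MVL=VL=8
sv.popcntd/sw=8 *24,*8    # do the (now transposed) popcount
```

The eight counts are then accumulated into the running totals, which
start at r16.

```
sv.add *16,*16,*24        # and accumulate in results
```

Finally the loop branches back if CTR is still non-zero, reducing CTR
by VL as it does so.

```
# branch back if CTR still non-zero. works even though VL=8
sv.bc/all 16, *0, -0x28   # reduce CTR by VL and stop if -ve
```
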
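Putting the steps together, the whole loop can be sketched in Python and
checked against a straightforward per-bit count. This is a behavioural
model under stated assumptions, not a simulation of the instructions. The
names `count8_reference`, `transpose8x8` and `count8_stripmined` are
hypothetical. The loop exit condition follows the comments in the
assembler (CTR is reduced by 8 and the loop stops once it is exhausted)
and has not been checked against the branch specification.

```python
def count8_reference(buf):
    """Per-bit count, equivalent in result to the count8safe reference."""
    counts = [0] * 8
    for byte in buf:
        for j in range(8):
            counts[j] += (byte >> j) & 1
    return counts


def transpose8x8(rows):
    """Bit i of result[j] equals bit j of rows[i] (models the gather)."""
    cols = [0] * 8
    for i, byte in enumerate(rows):
        for j in range(8):
            cols[j] |= ((byte >> j) & 1) << i
    return cols


def count8_stripmined(buf, maxvl=8):
    """Assumed model of the loop: process up to 8 bytes per iteration."""
    counts = [0] * 8              # running totals (r16 onwards)
    ctr = len(buf)                # CTR, set up by mtspr
    pos = 0                       # stands in for the address in r4
    while ctr > 0:
        vl = min(ctr, maxvl)                  # setvl in CTR mode
        block = list(buf[pos:pos + vl])       # load VL bytes
        block += [0] * (maxvl - vl)           # r6 was zeroed beforehand
        pos += vl
        cols = transpose8x8(block)            # gbbd
        pops = [bin(c).count("1") for c in cols]         # popcount per byte
        counts = [a + b for a, b in zip(counts, pops)]   # accumulate
        ctr -= maxvl              # assumption: branch reduces CTR by VL=8
    return counts


data = bytes(range(20))
print(count8_stripmined(data) == count8_reference(data))  # True
```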
[[!tag svp64_cookbook ]]