whitespace
[libreriscv.git] / openpower / sv / cookbook / pospopcnt.mdwn
1 # Positional popcount SVP64
2
3 * <https://bugs.libre-soc.org/show_bug.cgi?id=672>
4 * <https://github.com/clausecker/pospop/blob/master/countsse2_amd64.s>
5 * RISC-V Bitmanip Extension Document Version 0.94-draft Editor: Claire Wolf Symbiotic GmbH
6 <https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmanip-draft.pdf>
7
8 Positional popcount in optimised assembler is typically done on SIMD ISAs in
9 around 500 lines. Power ISA thanks to `bpermd` can be much more efficient:
10 with SVP64 even more so. The reference implementation showing the concept
11 is below.
12
13 ```
14 // Copyright (c) 2020 Robert Clausecker <fuz@fuz.su>
15 // count8 reference implementation for tests. Do not alter.
16 func count8safe(counts *[8]int, buf []uint8) {
17 for i := range buf {
18 for j := 0; j < 8; j++ {
19 counts[j] += int(buf[i] >> j & 1)
20 }
21 }
22 }
23 ```
24
25 A simple but still hardware-paralleliseable SVP64 assembler for
26 8-bit input values (`count8safe`) is as follows:
27
28 ```
29 mtspr 9, 3" # move r3 to CTR
30 # VL = MIN(CTR,MAXVL=8), Rc=1 (CR0 set if CTR ends)
31 setvl 3,0,8,0,1,1" # set MVL=8, VL=MIN(MVL,CTR)
32 # load VL bytes (update r4 addr) but compressed (dw=8)
33 addi 6, 0, 0 # initialise all 64-bits of r6 to zero
34 sv.lbzu/pi/dw=8 *6, 1(4) # should be /lf here as well
35 # gather performs the transpose (which gets us to positional..)
36 gbbd 8,6
37 # now those bits have been turned around, popcount and sum them
38 setvl 0,0,8,0,1,1 # set MVL=VL=8
39 sv.popcntd/sw=8 *24,*8 # do the (now transposed) popcount
40 sv.add *16,*16,*24 # and accumulate in results
41 # branch back if CTR still non-zero. works even though VL=8
42 sv.bc/all 16, *0, -0x28 # reduce CTR by VL and stop if -ve
43 ```
44
45 Array popcount is just standard popcount function
46 ([[!wikipedia Hamming weight]]) on an array of values, horizontally,
47 however positional popcount is different (vertical)
48
49 <img src="/openpower/sv/cookbook/1_popcount.svg" alt="pospopcnt" width="70%" />
50
51 Positional popcount adds up the totals of each bit set to 1 in each
52 bit-position, of an array of input values.
53
54 <img src="/openpower/sv/cookbook/2_popcount.svg" alt="pospopcnt" width="70%" />
55
56 # Visual representation of the pospopcount algorithm
57
58 # Walkthrough of the assembler
59
60 ```
61 mtspr 9, 3" # move r3 to CTR
62 ```
63
64 ```
65 # VL = MIN(CTR,MAXVL=8), Rc=1 (CR0 set if CTR ends)
66 setvl 3,0,8,0,1,1" # set MVL=8, VL=MIN(MVL,CTR)
67 ```
68
69 ```
70 # load VL bytes (update r4 addr) but compressed (dw=8)
71 addi 6, 0, 0 # initialise all 64-bits of r6 to zero
72 sv.lbzu/pi/dw=8 *6, 1(4) # should be /lf here as well
73 ```
74
75 ```
76 # gather performs the transpose (which gets us to positional..)
77 gbbd 8,6
78 ```
79
80 ```
81 # now those bits have been turned around, popcount and sum them
82 setvl 0,0,8,0,1,1 # set MVL=VL=8
83 sv.popcntd/sw=8 *24,*8 # do the (now transposed) popcount
84 ```
85
86 ```
87 sv.add *16,*16,*24 # and accumulate in results
88 ```
89
90 ```
91 # branch back if CTR still non-zero. works even though VL=8
92 sv.bc/all 16, *0, -0x28 # reduce CTR by VL and stop if -ve
93 ```
94
95 [[!tag svp64_cookbook ]]