# Positional popcount SVP64

* <https://bugs.libre-soc.org/show_bug.cgi?id=672>
* <https://github.com/clausecker/pospop/blob/master/countsse2_amd64.s>

Optimised assembler implementations of positional popcount on SIMD ISAs
typically run to around 500 lines. Thanks to `bpermd`, Power ISA can be
much more efficient, and with SVP64 even more so. The reference
implementation showing the concept is below.

```
// Copyright (c) 2020 Robert Clausecker <fuz@fuz.su>
// count8 reference implementation for tests. Do not alter.
func count8safe(counts *[8]int, buf []uint8) {
	for i := range buf {
		for j := 0; j < 8; j++ {
			counts[j] += int(buf[i] >> j & 1)
		}
	}
}
```

A simple but still hardware-parallelisable SVP64 assembler implementation
for 8-bit input values (`count8safe`) is as follows:

```
mtspr 9, 3 # move r3 to CTR
# VL = MIN(CTR,MAXVL=8), Rc=1 (CR0 set if CTR ends)
setvl 3,0,8,0,1,1 # set MVL=8, VL=CTR and CR0 (Rc=1)
# load VL bytes (update r4 addr) but compressed (dw=8)
addi 6, 0, 0 # initialise all 64-bits of r6 to zero
sv.lbzu/pi/dw=8 *6, 1(4) # should be /lf here as well
# gather performs the transpose (which gets us to positional..)
gbbd 8,6
# now those bits have been turned around, popcount and sum them
setvl 0,0,8,0,1,1 # set MVL=VL=8
sv.popcntd/sw=8 *24,*8 # do the (now transposed) popcount
sv.add *16,*16,*24 # and accumulate in results
# branch back if CTR still non-zero. works even though VL=8
sv.bc/all 16, *0, -0x28 # reduce CTR by VL and stop if -ve
```
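
The transpose-then-popcount idea in the listing above can be modelled in plain Go. This is only a sketch under stated assumptions: `transpose8x8` and `count8transpose` are hypothetical names, and the transpose is written as a generic 8x8 bit-matrix transpose whose bit ordering is assumed and may not match the actual `gbbd` instruction. It packs up to 8 input bytes into one 64-bit word, transposes the bits, then popcounts each byte of the result and accumulates.

```go
package main

import (
	"fmt"
	"math/bits"
)

// transpose8x8 treats x as an 8x8 bit matrix (byte i is row i, bit j of
// that byte is column j) and returns its transpose. This is a software
// stand-in for the gather step; the bit ordering is an assumption.
func transpose8x8(x uint64) uint64 {
	var t uint64
	for i := 0; i < 8; i++ { // input byte index
		for j := 0; j < 8; j++ { // bit position within that byte
			bit := (x >> (8*i + j)) & 1
			t |= bit << (8*j + i)
		}
	}
	return t
}

// count8transpose (hypothetical name) accumulates positional counts,
// consuming up to 8 input bytes per iteration.
func count8transpose(counts *[8]int, buf []uint8) {
	for len(buf) > 0 {
		n := len(buf)
		if n > 8 {
			n = 8
		}
		// Pack up to 8 bytes into one 64-bit word; unused bytes stay zero
		// and therefore contribute nothing to any count.
		var word uint64
		for i := 0; i < n; i++ {
			word |= uint64(buf[i]) << (8 * i)
		}
		t := transpose8x8(word)
		// After the transpose, byte j holds bit j of every input byte,
		// so a per-byte popcount gives the count for bit position j.
		for j := 0; j < 8; j++ {
			counts[j] += bits.OnesCount8(uint8(t >> (8 * j)))
		}
		buf = buf[n:]
	}
}

func main() {
	var counts [8]int
	count8transpose(&counts, []uint8{0x01, 0x03, 0x07, 0xff})
	fmt.Println(counts) // prints [4 3 2 1 1 1 1 1]
}
```

Zeroing the word before packing plays the same role as clearing r6 before the load in the assembler: a final group shorter than 8 bytes leaves the remaining rows at zero, so they add nothing to the totals.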

Array popcount is simply the standard popcount function ([[!wikipedia Hamming weight]]) applied to an array of values, whereas positional popcount totals, for each bit position, the number of input values in the array that have that bit set to 1.
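
The difference between the two can be seen on a small input. The sketch below is illustrative only: `arrayPopcount` and `positionalPopcount` are hypothetical names, and reading "array popcount" as one count per input value is an assumption. The positional version follows the same per-bit loop as `count8safe`.

```go
package main

import (
	"fmt"
	"math/bits"
)

// arrayPopcount returns one count per input value: the number of bits
// set in that value (assumed reading of "array popcount").
func arrayPopcount(buf []uint8) []int {
	out := make([]int, len(buf))
	for i, b := range buf {
		out[i] = bits.OnesCount8(b)
	}
	return out
}

// positionalPopcount returns one count per bit position: counts[j] is
// the number of values in buf that have bit j set.
func positionalPopcount(buf []uint8) [8]int {
	var counts [8]int
	for _, b := range buf {
		for j := 0; j < 8; j++ {
			counts[j] += int(b >> j & 1)
		}
	}
	return counts
}

func main() {
	buf := []uint8{0x05, 0x04, 0x81}     // 00000101, 00000100, 10000001
	fmt.Println(arrayPopcount(buf))      // prints [2 1 2]
	fmt.Println(positionalPopcount(buf)) // prints [2 0 2 0 0 0 0 1]
}
```

Viewing the input as a matrix with one value per row, the first function sums along rows and the second sums along columns.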

<img src="/openpower/sv/cookbook/popcount.svg " alt="pospopcnt" width="70%" />

[[!tag svp64_cookbook ]]