rs6000: Support variable insert and Expand vec_insert in expander [PR79251]
author Xionghu Luo <luoxhu@linux.ibm.com>
Fri, 22 Jan 2021 03:01:24 +0000 (21:01 -0600)
committer Xionghu Luo <luoxhu@linux.ibm.com>
Fri, 22 Jan 2021 14:03:53 +0000 (08:03 -0600)
commit b29225597584b697762585e0b707b7cb4b427650
tree 60bcab945b7232d2cc66303b5b8c17304e288087
parent b46027c6544d3680b3647d3c771c9844b8b95772

vec_insert (i, v, n) accepts 3 arguments: arg0 is the value to be
inserted, arg1 is the input vector, and arg2 is the position in arg1 at
which to insert arg0.  The current expander generates stxv+stwx+lxv when
arg2 is variable rather than constant, which causes a serious
store-hit-load performance issue on Power.  This patch:
 1) Builds a VIEW_CONVERT_EXPR for vec_insert (i, v, n), as for
v[n&3] = i, to unify the gimple code, so that the expander can use
vec_set_optab to expand it.
 2) Expands the IFN VEC_SET to fast instructions: lvsr+insert+lvsl.
In this way, "vec_insert (i, v, n)" and "v[n&3] = i" are not expanded too
early in the gimple stage when arg2 is variable, which avoids generating
store-hit-load instructions.
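
As an illustration of the source form discussed above, the following is a
minimal sketch of "v[n&3] = i" written with the generic GCC vector
extension, so it does not need altivec.h.  The type name v4si and the
function name insert_var are hypothetical and are not taken from the patch
or its tests.

```c
/* Four 32-bit ints in one 16-byte vector (GCC/Clang vector extension).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: insert I into V at a run-time index N.
   The index is masked to the element count, mirroring v[n&3] = i.  */
v4si
insert_var (int i, v4si v, unsigned int n)
{
  v[n & 3] = i;
  return v;
}
```

Because the index is masked, a call such as insert_var (9, v, 5) writes
element 1 and leaves the other three elements unchanged.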

For Power9 V4SI:
addi 9,1,-16
rldic 6,6,2,60
stxv 34,-16(1)
stwx 5,9,6
lxv 34,-16(1)
=>
rlwinm 6,6,2,28,29
mtvsrwz 0,5
lvsr 1,0,6
lvsl 0,0,6
xxperm 34,34,33
xxinsertw 34,0,12
xxperm 34,34,32

Although the instruction count increases from 5 to 7, performance
improves by 60% in typical cases.
Tested with V2DI, V2DF, V4SI, V4SF, V8HI, V16QI on Power9-LE.
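
The rotate, insert, rotate-back idea behind the lvsr+insert+lvsl sequence
can be modelled in portable C.  This is only a conceptual sketch under
stated assumptions: it rotates the target element to byte 0, whereas the
Power9 sequence above inserts at byte offset 12, and it does not reproduce
the exact semantics of lvsl, lvsr, xxperm or xxinsertw.  The function names
are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Rotate a 16-byte buffer left by SH bytes (SH in 0..15).  */
static void
rotate_left_bytes (uint8_t out[16], const uint8_t in[16], unsigned int sh)
{
  for (unsigned int k = 0; k < 16; k++)
    out[k] = in[(k + sh) & 15];
}

/* Hypothetical model: insert VAL into element (N & 3) of V without a
   variable-address store into the vector.  */
void
insert_word_by_rotation (uint32_t v[4], uint32_t val, unsigned int n)
{
  unsigned int off = (n & 3) * 4;   /* byte offset of the target element */
  uint8_t a[16], b[16];

  memcpy (a, v, 16);
  rotate_left_bytes (b, a, off);              /* target now at bytes 0..3 */
  memcpy (b, &val, 4);                        /* fixed-position insert */
  rotate_left_bytes (a, b, (16 - off) & 15);  /* rotate back */
  memcpy (v, a, 16);
}
```

The variable index only influences the rotation amounts, so the insert
itself always happens at a fixed position; that is the property that lets
the real sequence stay in registers instead of going through the stack.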

2021-01-22  Xionghu Luo  <luoxhu@linux.ibm.com>

gcc/ChangeLog:

PR target/79251
PR target/98065

* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
Adjust variable index vec_insert from address dereference to
ARRAY_REF(VIEW_CONVERT_EXPR) tree expression.
* config/rs6000/rs6000-protos.h (rs6000_expand_vector_set_var):
New declaration.
* config/rs6000/rs6000.c (rs6000_expand_vector_set_var): New function.

2021-01-22  Xionghu Luo  <luoxhu@linux.ibm.com>

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr79251.p9.c: New test.
* gcc.target/powerpc/pr79251-run.c: New test.
* gcc.target/powerpc/pr79251.h: New header.
gcc/config/rs6000/rs6000-c.c
gcc/config/rs6000/rs6000-protos.h
gcc/config/rs6000/rs6000.c
gcc/testsuite/gcc.target/powerpc/pr79251-run.c [new file with mode: 0644]
gcc/testsuite/gcc.target/powerpc/pr79251.h [new file with mode: 0644]
gcc/testsuite/gcc.target/powerpc/pr79251.p9.c [new file with mode: 0644]