[AArch64] PR tree-optimization/90332: Implement vec_init<M><N> where N is a vector...
author Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Thu, 6 Jun 2019 13:59:07 +0000 (13:59 +0000)
committer Kyrylo Tkachov <ktkachov@gcc.gnu.org>
Thu, 6 Jun 2019 13:59:07 +0000 (13:59 +0000)
commit 41dab855dce20d5d7042c9330dd8124d0ece19c0
tree d2a266d73253c9d053aaf156075ef6bbc2accd30
parent a2dbc0bf2aa42f0f078d0d46f7d9cdafc5383d93
[AArch64] PR tree-optimization/90332: Implement vec_init<M><N> where N is a vector mode

This patch fixes the failing gcc.dg/vect/slp-reduc-sad-2.c testcase on aarch64
by implementing a vec_init optab that builds a full-width vector
by concatenating two half-width vectors.

In the gcc.dg/vect/slp-reduc-sad-2.c case it's a V8QI reg concatenated with a V8QI const_vector of zeroes.
This can be implemented efficiently using the aarch64_combinez pattern that just loads a D-register to make
use of the implicit zero-extending semantics of that load.
Otherwise it concatenates the two vectors using aarch64_simd_combine.
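
As a rough illustration of the two cases above, here is a plain-C model of the lane semantics only. It is not the GCC implementation: the types v8qi and v16qi and the functions combine, combinez and check_combinez are hypothetical names invented for this sketch, and the lane order assumes little-endian (the patch has a separate _be pattern for big-endian).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical plain-C stand-ins for a 64-bit and a 128-bit vector. */
typedef struct { uint8_t b[8]; } v8qi;
typedef struct { uint8_t b[16]; } v16qi;

/* General case: the low half comes from LO, the high half from HI.
   This models what a combine of two half-width vectors produces.  */
v16qi combine(v8qi lo, v8qi hi)
{
    v16qi r;
    memcpy(r.b, lo.b, 8);
    memcpy(r.b + 8, hi.b, 8);
    return r;
}

/* Zero-high-half case: only the low 64 bits are written and the
   upper 64 bits end up zero, which models the implicit zeroing
   that a write to a D register performs on AArch64.  */
v16qi combinez(v8qi lo)
{
    v16qi r = { {0} };
    memcpy(r.b, lo.b, 8);
    return r;
}

/* Sanity check: the shortcut must agree with the general
   concatenation when the high half is all zeroes.  */
void check_combinez(v8qi lo)
{
    v8qi zero = { {0} };
    v16qi general = combine(lo, zero);
    v16qi shortcut = combinez(lo);
    assert(memcmp(general.b, shortcut.b, sizeof general.b) == 0);
}
```

The point of the model is that combinez(lo) and combine(lo, zeroes) give the same result, so the expander is free to pick the cheaper form whenever the second operand is a const_vector of zeroes.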

With this patch, aarch64 now sees the effect of richi's original patch that added gcc.dg/vect/slp-reduc-sad-2.c,
and 525.x264_r improves by about 1.5%.

PR tree-optimization/90332
* config/aarch64/aarch64.c (aarch64_expand_vector_init):
Handle VALS containing two vectors.
* config/aarch64/aarch64-simd.md (*aarch64_combinez<mode>): Rename
to...
(@aarch64_combinez<mode>): ... This.
(*aarch64_combinez_be<mode>): Rename to...
(@aarch64_combinez_be<mode>): ... This.
(vec_init<mode><Vhalf>): New define_expand.
* config/aarch64/iterators.md (Vhalf): Handle V8HF.

From-SVN: r272002
gcc/ChangeLog
gcc/config/aarch64/aarch64-simd.md
gcc/config/aarch64/aarch64.c
gcc/config/aarch64/iterators.md