Differential D59422
[SelectionDAG] Use SimplifyDemandedBits on truncated SCALAR_TO_VECTORs
Abandoned, Public. Authored by RKSimon on Mar 15 2019, 10:22 AM.
Details
Summary
Definite win on AVX512, as it allows us to avoid some gpr2mask transfers when a rematerializable 'allones' constant could be used.
I don't think the aarch64 fmov -> dup change is a regression, but would like confirmation if possible.
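As a rough illustration of why demanded bits matter here (this is not the patch's code): SCALAR_TO_VECTOR allows an integer operand wider than the vector element type and implicitly truncates it, so only the low element-width bits of the operand are demanded. The standalone sketch below models that with plain integers; `demandedMask`, `truncatedElt`, `isAllOnesAfterTrunc`, and `runExamples` are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): mask of the low `eltBits` bits that
// survive when a wider scalar is implicitly truncated to the element width.
constexpr std::uint64_t demandedMask(unsigned eltBits) {
  return eltBits >= 64 ? ~0ULL : ((1ULL << eltBits) - 1);
}

// Hypothetical helper: the element value that actually lands in lane 0.
constexpr std::uint64_t truncatedElt(std::uint64_t scalar, unsigned eltBits) {
  return scalar & demandedMask(eltBits);
}

// Hypothetical helper: true if every demanded bit is set, i.e. the truncated
// element cannot be told apart from an all-ones constant of that width.
constexpr bool isAllOnesAfterTrunc(std::uint64_t scalar, unsigned eltBits) {
  return truncatedElt(scalar, eltBits) == demandedMask(eltBits);
}

// Small self-check of the model above.
inline void runExamples() {
  // An i8 element demands only the low 8 bits of a wider scalar.
  assert(demandedMask(8) == 0xFFu);
  // The high bits (0x12) are not demanded, so this behaves like all-ones.
  assert(isAllOnesAfterTrunc(0x12FFu, 8));
  // Bit 0 is demanded and clear, so this is not all-ones.
  assert(!isAllOnesAfterTrunc(0x12FEu, 8));
}
```

In this model, any scalar whose demanded bits are all set could be replaced by an all-ones constant without changing the resulting element, which is the kind of simplification the summary refers to.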
Revision Contents
Diff 190847
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
test/CodeGen/AArch64/arm64-build-vector.ll
test/CodeGen/X86/avx512-mask-op.ll
This is in fact a regression, at least on some targets; on an A57, it has higher latency and uses an extra execution unit.
I'm guessing there's some issue with the priority between splat vs. zeroing in the case where the high elements are all undef?