This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Call SimplifyDemandedVectorElts from EXTRACT_VECTOR_ELT
ClosedPublic

Authored by RKSimon on Jul 12 2018, 11:41 AM.

Details

Summary

If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision.Jul 12 2018, 11:41 AM

Makes sense.

test/CodeGen/AArch64/aarch64-be-bv.ll
33 ↗(On Diff #155238)

This is nice, but it's destroying the intent of the test, which is to check that we generate the correct movi instruction.

test/CodeGen/ARM/func-argpassing-endian.ll
3 ↗(On Diff #155238)

Regenerating this file LGTM; please commit separately.

RKSimon added inline comments.Jul 12 2018, 1:49 PM
test/CodeGen/AArch64/aarch64-be-bv.ll
33 ↗(On Diff #155238)

Am I missing something - why the extractelement - why not return the <8 x i16> add result directly?

efriedma added inline comments.Jul 12 2018, 3:17 PM
test/CodeGen/AArch64/aarch64-be-bv.ll
33 ↗(On Diff #155238)

Returning the result "directly" involves a bitcast, which is also likely to break in the future (this is big-endian, so it swaps the elements.)

Maybe store the result to memory instead.

RKSimon updated this revision to Diff 155334.Jul 13 2018, 2:42 AM

Rebased after the arm/aarch64 tests updates

You need to get reviewers for the test changes to AMDGPU and SystemZ. Otherwise LGTM.

test/CodeGen/X86/oddshuffles.ll
366 ↗(On Diff #155334)

It looks like the total instructions is increasing here? Maybe an issue with x86 shuffle lowering?

@arsenm @uweigand Any commments?

test/CodeGen/X86/oddshuffles.ll
366 ↗(On Diff #155334)

Pre-combine we've decreased the the number of shuffles, meaning that it now falls below the threshold for permitting domain swaps to use shufps - on older SSE2 machines we're better off avoiding the domain swap.

Hmm ... The SystemZ tests seem to be getting strictly worse. Before, we have in f3:

vaf     %v0, %v24, %v26
vlgvh   %r0, %v0, 6
vlgvh   %r2, %v28, 3
ar      %r2, %r0

and after the patch you're testing for:

vaf %v0, %v24, %v26
vrepf %v0, %v0, 3
vlgvh %r0, %v0, 2
vlgvh %r2, %v28, 3
ar %r2, %r0

(And similar for f4.)

Given that the point of this test to ensure that there is no superfluous vrep, this seems a clear regression. Can you check what's going on here?

RKSimon updated this revision to Diff 155685.Jul 16 2018, 8:12 AM
RKSimon edited the summary of this revision. (Show Details)
RKSimon added a reviewer: atanasyan.
RKSimon added a subscriber: atanasyan.

Updated with SystemZ fix to permit permute decode of target shuffles (well, SPLAT) as well - @uweigand does that look OK to you?

I've generalized DAGCombiner::visitEXTRACT_VECTOR_ELT to handle the case where the source vector has multiple uses, if all of them are EXTRACT_VECTOR_ELT we now accumulate the demanded mask accordingly - this simplifies some MIPS vector codegen so adding @atanasyan to take a look.

The SystemZ changes look good to me. Thanks for taking care of this!

MIPS changes LGTM

This revision was not accepted when it landed; it landed in state Needs Review.Jul 17 2018, 2:50 AM
This revision was automatically updated to reflect the committed changes.