This is an archive of the discontinued LLVM Phabricator instance.

[x86] use demanded bits to simplify masked store codegen
ClosedPublic

Authored by spatel on Oct 6 2018, 7:30 AM.

Details

Summary

As noted in D52747, if we prefer IR to use trunc for bool vectors rather than and+icmp, we can expose codegen shortcomings as seen here with masked store.

We can replace a hard-coded PCMPGT simplification with the more general demanded bits call here to improve things. The AVX1 pattern still isn't handled, so that's another potential dependency for the instcombine patch (although I'm not sure how much masked op usage we prefer with only AVX1).

Diff Detail

Event Timeline

spatel created this revision.Oct 6 2018, 7:30 AM
RKSimon added inline comments.Oct 6 2018, 7:38 AM
test/CodeGen/X86/masked_memop.ll
1282

We're going to have to add SimplifyDemandedBitsForTargetNode to handle this properly - @craig.topper didn't you have a patch that was going to add that at some point?

craig.topper added inline comments.Oct 6 2018, 3:23 PM
lib/Target/X86/X86ISelLowering.cpp
36512–36513

Can you drop the 2 spaces at the start of this blank line.

test/CodeGen/X86/masked_memop.ll
1282

It was in https://reviews.llvm.org/D38832, but I found a simpler approach for that specific case.

1283

Why might be able to get this without target support if we stop splitting v4i32<-v4i64 sign extends during DAG combine on AVX1 targets. We already handle the split in LowerSIGN_EXTEND so we shouldn't need to split in combine. The splitting creates a sequence we can't run SimplifyDemandedBits through because we ended up with 2 uses of the v4i32 input.

RKSimon added inline comments.Oct 7 2018, 6:15 AM
test/CodeGen/X86/masked_memop.ll
1283

Looking at this now - the SEXT isn't a problem, but ZEXT needs to be handled as well and is proving trickier.

spatel updated this revision to Diff 168588.Oct 7 2018, 6:52 AM

Patch updated:
Fixed whitespace diff.

RKSimon added inline comments.Oct 8 2018, 4:40 AM
lib/Target/X86/X86ISelLowering.cpp
36527

No need for the Depth = 0 at the end

spatel updated this revision to Diff 168653.Oct 8 2018, 6:47 AM

Patch updated:
Removed unnecessary depth param (copy-paste remnant).

Also, I confirmed that the combination of this patch + D52980 will remove the extra AVX1 shift from masked_store_bool_mask_demand_trunc_sext().

Also, I confirmed that the combination of this patch + D52980 will remove the extra AVX1 shift from masked_store_bool_mask_demand_trunc_sext().

D52980 has now landed, but I had to reduce it slightly so we need to confirm if the AVX1 shift is still removed.

RKSimon accepted this revision.Oct 9 2018, 5:27 AM

LGTM - the AVX1 regression needs SIGN_EXTEND_VECTOR_INREG support adding to SimplifyDemandedBits which I intend to do as a follow up patch

This revision is now accepted and ready to land.Oct 9 2018, 5:27 AM
spatel added a comment.Oct 9 2018, 6:38 AM

LGTM - the AVX1 regression needs SIGN_EXTEND_VECTOR_INREG support adding to SimplifyDemandedBits which I intend to do as a follow up patch

For reference, that was committed here:
rL344043

...so I'll update the test file and commit soon.

This revision was automatically updated to reflect the committed changes.