This is an archive of the discontinued LLVM Phabricator instance.

Demanded elements support for masked.load and masked.gather
ClosedPublic

Authored by reames on Jan 28 2019, 10:12 PM.

Details

Summary

Teach instcombine to propagate demanded elements through a masked load or masked gather instruction.

This is in the broader context of improving vector pointer instcombine under https://reviews.llvm.org/D57140.

Diff Detail

Repository
rL LLVM

Event Timeline

reames created this revision.Jan 28 2019, 10:12 PM
reames planned changes to this revision.Feb 1 2019, 11:21 AM

rebase after rL352653 ?

craig.topper added inline comments.Feb 1 2019, 2:39 PM
lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1449 ↗(On Diff #184849)

I think DemandedLoad should maybe be called DemandedPtrVec.

reames updated this revision to Diff 184857.Feb 1 2019, 3:03 PM

Address review comment re: naming.

I chose DemandedPtrs instead of DemandedPtrVec for consistency w/other variables in the same file.

RKSimon added inline comments.Feb 2 2019, 3:30 AM
lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1455 ↗(On Diff #184857)

We're not 100% consistent with this but for selects we try to use:

if (CElt->isNullValue())
  DemandedPtrs.clearBit(i);
else
  DemandedPassThrough.clearBit(i);
reames marked an inline comment as done.Feb 2 2019, 11:21 PM
reames added inline comments.
lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1455 ↗(On Diff #184857)

This would be wrong. Consider any constant expr or other constant of indeterminate value.

RKSimon added inline comments.Feb 3 2019, 4:50 AM
lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1455 ↗(On Diff #184857)

@spatel may be able to advise, but Instruction::Select does this:

Constant *CElt = CV->getAggregateElement(i);
if (isa<ConstantExpr>(CElt))
  continue;
// TODO: If a select condition element is undef, we can demand from
// either side. If one side is known undef, choosing that side would
// propagate undef.
if (CElt->isNullValue())
  DemandedLHS.clearBit(i);
else
  DemandedRHS.clearBit(i);
spatel added inline comments.Feb 3 2019, 5:26 AM
lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1455 ↗(On Diff #184857)

I haven't been following the details of this review, but that code specifically guards against ConstantExpr. Is there some other constant-derived possibility that we failed to account for?

The code here looks safe since it is explicitly checking for constant 0 or -1, but it doesn't account for the possibility of an undef element in the mask.

I'm not sure what we want the result to be in that case. Either way, we probably want a regression test to account for that (and a test with a constant expression, so we're sure we don't crash on that?).

reames planned changes to this revision.Feb 3 2019, 10:57 AM
reames marked an inline comment as done.
reames added inline comments.
lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1455 ↗(On Diff #184857)

I'm happy to add a test case for both constexpr, and undef. I specifically *do not* want to be aggressive about undef at this time though. That opens up a bunch of complexity since we need to be *consistent* about how we interpret an undef mask element, and I'd really rather separate that into it's own patch.

p.s. That TODO comment in Select looks wrong. We already interpret undef as falling into the else. I'm not sure that's correct, but that's a separate patch and a separate discussion.

p.p.s. In a future patch, abstracting out a helper function for this masked demanded bit splitting operation is planned.

reames updated this revision to Diff 190869.Mar 15 2019, 12:13 PM

Rebase on top of requested test change.

spatel accepted this revision.Mar 17 2019, 9:11 AM

LGTM

test/Transforms/InstCombine/masked_intrinsics.ll
38–45 ↗(On Diff #190869)

This constant expression is too easy. :)
As we can see from the optimized check lines, we fold the ConstantExpr into a regular constant, so it's not providing the hoped-for code coverage.
I'd change this to something like:

@g = external global i8

define <2 x double> @load_cemask(<2 x double>* %ptr, <2 x double> %passthru)  {
  %res = call <2 x double> @llvm.masked.load.v2f64.p0v2f64(<2 x double>* %ptr, i32 2, <2 x i1> <i1 1, i1 ptrtoint (i8* @g to i1)>, <2 x double> %passthru)
  ret <2 x double> %res
}
This revision is now accepted and ready to land.Mar 17 2019, 9:11 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 19 2019, 1:08 PM

Be aware that this patch turned out to contain a nasty miscompile. It's now been fixed in https://reviews.llvm.org/rL358299. I recommend anyone who has branched with this change to consider cherry-picking the fix.