This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel] Add combine for (x & mask) -> x when (x & mask) == x
ClosedPublic

Authored by paquette on Aug 6 2020, 11:57 AM.

Details

Summary

If we have a mask, and a value x, where (x & mask) == x, we can drop the AND and just use x.

This is about a 0.4% geomean code size improvement on CTMark at -O3 for AArch64.

In AArch64, this is most useful post-legalization. Patterns like this often show up when legalizing s1s, which must be extended to larger types.

e.g.

%cmp:_(s32) = G_ICMP ...
%and:_(s32) = G_AND %cmp, 1

Since G_ICMP only produces a single bit, there's no reason to mask it with the G_AND.

Diff Detail

Event Timeline

paquette created this revision.Aug 6 2020, 11:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 6 2020, 11:57 AM
paquette requested review of this revision.Aug 6 2020, 11:57 AM

Where is this handled in DAGCombiner? I don't see anything directly matching this; is this one of the cases that SimplifyDemandedBits handles more generally? The closest looking thing after skimming is this:

// if (and x, c) is known to be zero, return 0
unsigned BitWidth = VT.getScalarSizeInBits();
if (N1C && DAG.MaskedValueIsZero(SDValue(N, 0),
                                 APInt::getAllOnesValue(BitWidth)))
  return DAG.getConstant(0, SDLoc(N), VT);

I didn't port this directly from the DAGCombiner, I originally wrote it just to handle the G_ICMP + G_AND pattern, which appears fairly often in AArch64.

But then I noticed the DAGCombiner does this:

// fold (and x, -1) -> x

and decided to try to generalize it.

arsenm added inline comments.Aug 6 2020, 12:49 PM
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
1873

I would kind of expect combines to be marked with requiring this, and the generated matcher table would understand to skip this if known bits isn't available

1881

I don't see why vectors wouldn't work

paquette added inline comments.
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
1873

Yeah, that would make sense.

@dsanders, is there any way to do this in the combiner right now?

1881

Don't think there's any way to really test it right now because it looks like GISelKnownBits doesn't support vectors yet.

Also yeah, it looks like this is handled in SelectionDAG::GetDemandedBits:

case ISD::AND: {
  // X & -1 -> X (ignoring bits which aren't demanded).
  // Also handle the case where masked out bits in X are known to be zero.
  if (ConstantSDNode *RHSC = isConstOrConstSplat(V.getOperand(1))) {
    const APInt &AndVal = RHSC->getAPIntValue();
    if (DemandedBits.isSubsetOf(AndVal) ||
        DemandedBits.isSubsetOf(computeKnownBits(V.getOperand(0)).Zero |
                                AndVal))
      return V.getOperand(0);
  }

So I guess that it might make sense to partially move this into GISelKnownBits.

RKSimon added a subscriber: RKSimon.Aug 7 2020, 2:05 AM

Also yeah, it looks like this is handled in SelectionDAG::GetDemandedBits:

case ISD::AND: {
  // X & -1 -> X (ignoring bits which aren't demanded).
  // Also handle the case where masked out bits in X are known to be zero.
  if (ConstantSDNode *RHSC = isConstOrConstSplat(V.getOperand(1))) {
    const APInt &AndVal = RHSC->getAPIntValue();
    if (DemandedBits.isSubsetOf(AndVal) ||
        DemandedBits.isSubsetOf(computeKnownBits(V.getOperand(0)).Zero |
                                AndVal))
      return V.getOperand(0);
  }

So I guess that it might make sense to partially move this into GISelKnownBits.

Just a headsup - I'm trying to get rid of SelectionDAG::GetDemandedBits entirely and move all uses to SimplifyMultipleUseDemandedBits instead - its almost there, but GetDemandedBits is rather lax at OneUse checks when generating new ops/constants so the last few cases are taking a while (e.g. D77804)

aemerson accepted this revision.Aug 10 2020, 10:25 PM

LGTM. Improvements to the combiner can be done separately.

This revision is now accepted and ready to land.Aug 10 2020, 10:25 PM