This is an archive of the discontinued LLVM Phabricator instance.

[TargetLowering][AArch64] Teach DemandedBits about SVE count intrinsics
ClosedPublic

Authored by benmxwl-arm on Nov 21 2022, 5:52 AM.

Details

Summary

This allows DemandedBits to see that the SVE count intrinsics (CNTB,
CNTH, CNTW, CNTD) sans multiplier will only ever produce small
positive integers. The maximum value you could get here is 256, which
is CNTB on a machine with a 2048bit vector size (the maximum for SVE).

Using this various redundant operations (zexts, sexts, ands, ors, etc)
can be eliminated.

Diff Detail

Event Timeline

benmxwl-arm created this revision.Nov 21 2022, 5:52 AM
benmxwl-arm requested review of this revision.Nov 21 2022, 5:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 21 2022, 5:52 AM
Matt added a subscriber: Matt.Nov 21 2022, 3:11 PM

This may be better as two separate patches, one for the vscale and one for the cnt's

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1140

Could this just use construct known bits of both sides and use KnownBits::mul? It might be able to get value out of the low bits then too.

1147

The Negative = needn't be part of the if.

benmxwl-arm added inline comments.Nov 22 2022, 3:56 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1140

Please correct me if I've made a silly mistake here, but it does not seem to be exactly KnownBits::mul:

Optional<unsigned> MaxVScale = Attr.getVScaleRangeMax();
if (!MaxVScale.has_value())
  return false;
if (auto *MulImm = dyn_cast<ConstantSDNode>(Op.getOperand(0))) {
  unsigned RequiredBits = Log2_64(*MaxVScale) + 1;
  if (RequiredBits >= BitWidth)
    return false;
  Known.Zero.setHighBits(BitWidth - RequiredBits);
  Known = KnownBits::mul(Known, KnownBits::makeConstant(MulImm->getAPIntValue()));
}
return false;

If the MaxVScale is 16 (5 bits)

The known zero bits are everything above Log2(16 * Mul) + 1.
The above snippet seems to end up with Log2((2^5 - 1) * Mul) + 1 (which is off by 1 bit)

Also it seems that KnowBits::mul can't handle negative multipliers and always reports no known bits in that case.

This patch now only handles teaching DemandedBits about the SVE count intrinsics.
The VSCALE changes have been split out to: https://reviews.llvm.org/D138508

benmxwl-arm retitled this revision from [TargetLowering][AArch64] Teach DemandedBits about VSCALE and SVE CNTx to [TargetLowering][AArch64] Teach DemandedBits about SVE count intrinsics.Nov 22 2022, 9:31 AM
benmxwl-arm edited the summary of this revision. (Show Details)
benmxwl-arm marked an inline comment as done.Nov 22 2022, 9:34 AM
dmgreen added inline comments.Nov 23 2022, 1:36 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1140

Ah - I was worried about that - because it works on bits and not ranges, the results are not as exact as they could otherwise be. I was also considering the vscale turning into a shift, but that may be SVE specific.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
23320

Can you move this into the switch under INTRINSIC_WO_CHAIN. Just so it doesn't need to be checked for every instruction.

23326

Is there a reason why this is this based on SVEMaxBitsPerVector and not the maximum value in VScaleRange? Or a combo of both if vscale_range is unbounded for some reason.

23327

Nit: Log2_32?

benmxwl-arm retitled this revision from [TargetLowering][AArch64] Teach DemandedBits about SVE count intrinsics to [TargetLowering][AArch64] Teach DemandedBits about SVE count intrinsics.
benmxwl-arm edited the summary of this revision. (Show Details)
  • Moved CNTx intrinsic check into switch
  • Now base the MaxElements off VScaleRange if present, otherwise fallback to SVEMaxBitsPerVector
  • Switched to Log2_32
benmxwl-arm marked 3 inline comments as done.Nov 23 2022, 4:57 AM
benmxwl-arm added inline comments.Nov 23 2022, 4:59 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
23326

No real reason, it just was a little simpler. But it does not add much extra complexity doing the combo check.

Simplified to use Subtarget->getMaxSVEVectorSizeInBits() (which is already based off the vscale_range() attribute).

This now depends on https://reviews.llvm.org/D138575 (currently getMaxSVEVectorSizeInBits() only checks for hasSVE(),
which means it breaks for SME). Until that lands there will be a few test breakages.

dmgreen accepted this revision.Nov 24 2022, 2:07 AM

Sounds good. This LGTM if no one else has any other comments.

This revision is now accepted and ready to land.Nov 24 2022, 2:07 AM
This revision was landed with ongoing or failed builds.Nov 25 2022, 2:18 AM
This revision was automatically updated to reflect the committed changes.