This is an archive of the discontinued LLVM Phabricator instance.

SelectionDAG: Teach ComputeKnownBits about VSCALE
ClosedPublic

Authored by craig.topper on Dec 19 2022, 1:27 PM.

Details

Summary

This reverts commit 9b92f70d4758f75903ce93feaba5098130820d40. The issue
with the re-applied change was an implicit truncation due to the
multiplication. Although the operations were converted to APInt, the
values were implicitly converted to long due to the typing rules.

Fixes: #59594

Diff Detail

Event Timeline

compnerd created this revision.Dec 19 2022, 1:27 PM
compnerd requested review of this revision.Dec 19 2022, 1:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 19 2022, 1:27 PM

Please clean up your RISCV test to match the style (use riscv64 as the triple, use UTC, don't put the triple and datalayout in the IR, clean up all the verbose comments, clean up metadata and attributes)

Revert "Revert "Reland

Avoid double negation and perhaps just use the original subject with the original commit message.

You can attach a paragraph describing what is fixed in the new patch.

compnerd updated this revision to Diff 484071.Dec 19 2022, 2:26 PM
compnerd retitled this revision from Revert "Revert "Reland "[TargetLowering] Teach DemandedBits about VSCALE""" to TargetLowering: Teach DemandedBits about VSCALE.
craig.topper added inline comments.
llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll
48

Don't you need a vscale_range attribute to hit the bug?

craig.topper added a comment.EditedDec 19 2022, 2:46 PM

Where was there an implicit conversion to long? There is no implicit conversion operator on APInt.

It looks to me like this multiply overflows with vscale max and the APInt got a value of 0 causing the required bits to be 0.

%4 = tail call i8 @llvm.vscale.i8()                                            
%5 = shl i8 %4, 3
jrtc27 added inline comments.Dec 19 2022, 2:48 PM
llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll
2
4

Use UTC, really

6–8
9

I tend to prefer tests that don't have mangled C++ names in them

craig.topper added inline comments.Dec 19 2022, 2:51 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1136 ↗(On Diff #484071)

I'm not sure that hardcoding 64 here is the right fix. I think what we need to check is that the multiply doesn't overflow.

compnerd added inline comments.Dec 19 2022, 3:04 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1136 ↗(On Diff #484071)

@craig.topper I absolutely agree with you. I call this out in the commit message, that I'm just assuming that this is wide enough. This can potentially overflow still. Is there a good way to ensure that we have the proper width?

llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll
4

What do you mean by UTC?

9

Sure, I can rename it, don't really think that it matters though.

compnerd added inline comments.Dec 19 2022, 3:06 PM
llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll
48

I thought so, except, removing the attributes somehow still reproduces the difference? I don't think that the reduction is the best, and I'm happy to restore the attribute if you like.

jrtc27 added inline comments.Dec 19 2022, 3:07 PM
llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll
4

UpdateTestChecks, which in this case is update_llc_test_checks.py

craig.topper added inline comments.Dec 19 2022, 3:23 PM
llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll
48

I thought in my testing that it won't make it to the APInt multiply without the attribute?

MacDue added inline comments.Dec 19 2022, 4:05 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1136 ↗(On Diff #484071)

My thought was to do:

APInt Multiplier = Op.getConstantOperandAPInt(0);
unsigned MultiplyBits = Log2_32(*MaxVScale) + 1 + Multiplier.getActiveBits();
APInt VScaleResultUpperbound = APInt(MultiplyBits, *MaxVScale) * Multiplier.sextOrTrunc(MultiplyBits);

Which should always have enough bits to store the full result.

craig.topper added inline comments.Dec 19 2022, 4:18 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1136 ↗(On Diff #484071)

Here's what I was playing with

APInt VScaleResultUpperbound(BitWidth, *MaxVScale);
// TODO: Vscale max with no leading zeros requires special handling.
if (VScaleResultUpperbound.isNegative())
  return false;
bool Overflow;
VScaleResultUpperbound =
    VScaleResultUpperbound.smul_ov(Op.getConstantOperandAPInt(0), Overflow);
if (Overflow)
  return false;
if (VScaleResultUpperbound.isNegative())
  Known.One.setHighBits(VScaleResultUpperbound.countLeadingOnes());
else
  Known.Zero.setHighBits(VScaleResultUpperbound.countLeadingZeros());

BTW, why is this in SimplifyDemandedBits instead of computeKnownBits? It doesn't use the DemandedBits.

compnerd marked 3 inline comments as done.Dec 20 2022, 8:57 AM

BTW, why is this in SimplifyDemandedBits instead of computeKnownBits? It doesn't use the DemandedBits.

I couldn't answer that tbh, perhaps @benmxwl-arm can answer that. I am only trying to help restore the patch that I reverted due to it regressing RVV.

llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll
48

Ugh, I think I mixed up the command line outputs and thought it wasn't needed. It is definitely needed on the test function. Thanks for flagging that!

compnerd updated this revision to Diff 484295.Dec 20 2022, 8:58 AM
compnerd marked an inline comment as done.

Partially address feedback from Craig, address feedback from Jessica.

compnerd added inline comments.Dec 20 2022, 9:57 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1136 ↗(On Diff #484071)

I didn't know about the smul_ov function, that does seem better. The use of BitWidth is still a problem. The BitWidth is computed from the width of the DemandedBits which is computed from the type of the result of the operation. In this particular test (luckily identifies the issue!) the result type is i8, which results in the BitWidth being 8, but the vrange is 2,1024, which exceeds the range of i8, and thus when we do scale * multiplier and overflow, we end up with the wrong result. I think that we should either go with the wider bit width here.

craig.topper added inline comments.Dec 20 2022, 10:31 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1136 ↗(On Diff #484071)

You're right. I would need to at least check that the vscale max fits in the bitwidth.

compnerd added inline comments.Dec 20 2022, 11:08 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1136 ↗(On Diff #484071)

Can we somehow be assured that the multiplication between the vscale max and the scalar will not overflow? It seems that we need a wider width than the vscale max unless I'm not counting bits correctly.

llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll
4

Done, but this really does seem unnecessarily stringent testing. What we are looking to verify here is that the index counter is updated, the rest of the code around it really doesn't matter.

compnerd updated this revision to Diff 484329.Dec 20 2022, 11:11 AM

Further reduce test case by creating a hand written synthetic test case. We can get away with a 2-iteration unrolled loop avoiding any phi branches. The previous overflow would improperly remove the increment in between the two iterations.

compnerd updated this revision to Diff 484368.Dec 20 2022, 1:21 PM

Further reduce the test case. We create the splat and shuffle, offset it by vlenb and return the generated splat. This reduces down to the minimal assembly sequence that would exhibit the issue. If we were to accidentally truncate the scale, we would drop the read of vlenb and the adjustment for the shuffle.

There is as trySextValue in APInt: https://reviews.llvm.org/D139683. Maybe you need a tryMul.

compnerd added a comment.EditedDec 20 2022, 1:41 PM

There is as trySextValue in APInt: https://reviews.llvm.org/D139683. Maybe you need a tryMul.

From a look, that seems to do what this was doing - extending to 64-bits. It feels that both Craig and I want a tighter bounds on the bit width to compute the scale. But, I can certainly see the value in having a tryMul to mirror the trySextValue.

BTW, why is this in SimplifyDemandedBits instead of computeKnownBits? It doesn't use the DemandedBits.

I couldn't answer that tbh, perhaps @benmxwl-arm can answer that. I am only trying to help restore the patch that I reverted due to it regressing RVV.

Sorry, that's my bad. It's perfectly fine to move this over to computeKnownBits().
Just replace return falses with breaks and TLO.DAG.getMachineFunction() with getMachineFunction() and it works fine moving this case over there.

compnerd retitled this revision from TargetLowering: Teach DemandedBits about VSCALE to SelectionDAG: Teach ComputeKnownBits about VSCALE.Dec 21 2022, 1:45 PM
compnerd updated this revision to Diff 484667.Dec 21 2022, 1:48 PM
compnerd edited the summary of this revision. (Show Details)
foad added inline comments.Dec 22 2022, 12:53 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051

This seems pretty complex. Is the Multiplier operand guaranteed to have the same width as the result, i.e. BitWidth? If so you should be able to do all calculation in that width and the signedness of the multiplier should be irrelevant.

compnerd added inline comments.Dec 22 2022, 7:30 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051

No AFAIK, the multiplier has no guarantees of the width. It may be 32-bit, or 64-bit, though LLVM did truncate it further.

foad added inline comments.Dec 22 2022, 7:40 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051

Then the documentation for ISD::VSCALE should be clear about whether the multiplier is treated as signed or unsigned. I don't see it mentioned in ISDOpcodes.h.

Anyway, if it is signed, can't you first sextOrTrunc it to BitWdith, and then do all of the rest of the calculations in that width?

compnerd added inline comments.Dec 22 2022, 8:01 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051

My interpretation is that it is signed. The LangRef calls it out as being a positive number, thus, it must be signed. As to truncating it to BitWidth, again, that doesn't work. We need it to be the computed width as we need to ensure that we have sufficient space for the multiplication to not overflow (which was the original bug) and we need it to match to perform the operation (requirements from APInt). Is there a way to write this in a simpler way that I don't know about?

foad added inline comments.Dec 22 2022, 8:30 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051

The LangRef calls it out as being a positive number

No, that's talking about vscale itself. The multiplier is not mentioned in LangRef because it is specific to the ISD::VSCALE sdag node.

Anyway does this work:

Known.Zero.setBitsFrom(Log2_32(*MaxVScale) + 1);
Known = KnownBits::mul(Known, KnownBits::makeConstant(Multiplier.sextOrTrunc(BitWidth)));

?

compnerd added inline comments.Dec 22 2022, 9:22 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051
void llvm::APInt::setBits(unsigned int, unsigned int): Assertion `loBit <= BitWidth && "loBit out of range"' failed.

This is why I had kept the "complex" code as is. I had tried a few different things, I'd really rather prefer the smul_ov rather than the current implementation, but it seemed more complicated than this long-winded way.

craig.topper added inline comments.Dec 22 2022, 9:36 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051

The issue here is that we don't that vscale is *MaxVscale. We're trying to calculate an upper bound on the result of the multiply to determine the value of sign bits only.

foad added inline comments.Dec 22 2022, 9:42 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051

Fixing the assertion is easy:

Known.Zero.setBitsFrom(std::min(Log2_32(*MaxVScale) + 1, BitWidth));
Known = KnownBits::mul(Known, KnownBits::makeConstant(Multiplier.sextOrTrunc(BitWidth)));
benmxwl-arm added inline comments.Dec 22 2022, 10:02 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051

KnownBits::mul() does not perform the same operation (and will under-report the know bits) and does not work for negative multipliers.

(I did try using KnownBits::mul() in an earlier version of this patch)

compnerd added inline comments.Dec 22 2022, 10:39 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051

Fixing the assertion is easy:

Truncating the value to BitWidth seems incorrect. But also, as @benmxwl-arm mentions, that isn't exactly the same result.

foad added inline comments.Jan 3 2023, 6:13 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051

KnownBits::mul() does not perform the same operation (and will under-report the know bits)

Ack. I understand now that you want a result based on the actual maximum value of vscale, not just the known bits of vscale. Maybe value range propagation would be a better tool for this, but I guess we don't have that in SelectionDAG.

benmxwl-arm added inline comments.Jan 3 2023, 9:44 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040

I think this actually should be Multiplier.getSignificantBits() (which takes into account the sign).
Other than that this LGTM, though I'd wait for approval from a more experienced reviewer.

compnerd added inline comments.Jan 3 2023, 10:26 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040

Hmm, I think that it _should_ be the same given that Multiplier should never be negative. But I don't see any harm in changing this to Multiplier.getSignificantBits() as it may be less confusing, so seems like a good idea.

craig.topper added inline comments.Jan 3 2023, 10:30 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040

Why can't multiplier be negative?

benmxwl-arm added inline comments.Jan 3 2023, 11:03 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040

I think the multiplier can be negative (see the _with_negative_multiplier tests). Maybe you're thinking about the vscale intrinsic? The intrinsic has no multiplier and always returns a positive value, the DAG node can be the result of merging a vscale intrinsic and a multiply, and I don't think there's any restriction on the multiplier value or sign.

compnerd added inline comments.Jan 3 2023, 11:11 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040

Yes, I was thinking of the vscale intrinsic. Okay, in that case, yes, this should be getSignificantBits.

compnerd updated this revision to Diff 486298.Jan 4 2023, 8:11 AM

Use getSignificantBits

foad added a comment.Jan 6 2023, 1:49 AM

I think the multiplier can be negative (see the _with_negative_multiplier tests).

This needs documenting. Speciflcally, for the ISD::VSCALE node, if the output is wider than the input, does it zero- or sign-extend the input? (Or to put it another way, does it do an unsigned extending multiply or a signed extending multiply?)

I think the multiplier can be negative (see the _with_negative_multiplier tests).

This needs documenting. Speciflcally, for the ISD::VSCALE node, if the output is wider than the input, does it zero- or sign-extend the input? (Or to put it another way, does it do an unsigned extending multiply or a signed extending multiply?)

I'm not sure it's allowed to have a different type. In tablegen it is using SDTIntUnaryOp which requires the input and output types to match.

craig.topper commandeered this revision.May 25 2023, 11:32 AM
craig.topper added a reviewer: compnerd.

I'm going to take this over and finish it.

Use smul_ov instead of trying to size for the worst case.

nikic added a subscriber: nikic.May 26 2023, 12:43 AM
nikic added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3130

Can this whole block be reduced to something like this?

const Function &F = getMachineFunction().getFunction();
const APInt &Multiplier = Op.getConstantOperandAPInt(0);
Known = getVScaleConstantRange(&F, BitWidth).mul(Multiplier).toKnownBits();

Use ConstantRange.

nikic accepted this revision.May 26 2023, 9:55 AM

LGTM

This revision is now accepted and ready to land.May 26 2023, 9:55 AM
This revision was automatically updated to reflect the committed changes.