This is an archive of the discontinued LLVM Phabricator instance.


Differential D140347

SelectionDAG: Teach ComputeKnownBits about VSCALE
ClosedPublic

Authored by craig.topper on Dec 19 2022, 1:27 PM.


Details

Reviewers

MacDue
dmgreen
paulwalker-arm
benmxwl-arm
compnerd
nikic

Commits

rGa4f437f012b4: SelectionDAG: Teach ComputeKnownBits about VSCALE

Summary

This reverts commit 9b92f70d4758f75903ce93feaba5098130820d40. The issue
with the re-applied change was an implicit truncation due to the
multiplication. Although the operations were converted to APInt, the
values were implicitly converted to long due to the typing rules.

Fixes: #59594

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

compnerd created this revision.Dec 19 2022, 1:27 PM

Herald added a project: Restricted Project. · View Herald TranscriptDec 19 2022, 1:27 PM

Herald added subscribers: frasercrmck, luismarques, apazos and 19 others. · View Herald Transcript

compnerd requested review of this revision.Dec 19 2022, 1:27 PM

Herald added a project: Restricted Project. · View Herald TranscriptDec 19 2022, 1:27 PM

Herald added subscribers: • pcwang-thead, MaskRay. · View Herald Transcript

Please clean up your RISCV test to match the style (use riscv64 as the triple, use UTC, don't put the triple and datalayout in the IR, clean up all the verbose comments, clean up metadata and attributes)

Revert "Revert "Reland

Avoid double negation and perhaps just use the original subject with the original commit message.

You can attach a paragraph describing what is fixed in the new patch.

compnerd updated this revision to Diff 484071.Dec 19 2022, 2:26 PM

compnerd retitled this revision from Revert "Revert "Reland "[TargetLowering] Teach DemandedBits about VSCALE""" to TargetLowering: Teach DemandedBits about VSCALE.

compnerd added a reviewer: benmxwl-arm.Dec 19 2022, 2:36 PM

craig.topper added a subscriber: craig.topper.Dec 19 2022, 2:38 PM

craig.topper added inline comments.

llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll
48	Don't you need a vscale_range attribute to hit the bug?

Where was there an implicit conversion to long? There is no implicit conversion operator on APInt.

It looks to me like this multiply overflows with vscale max and the APInt got a value of 0 causing the required bits to be 0.

%4 = tail call i8 @llvm.vscale.i8()                                            
%5 = shl i8 %4, 3
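The wrap-around described in this comment can be reproduced with plain fixed-width arithmetic. The sketch below is not LLVM code: it emulates a narrow integer by masking, and it assumes a vscale maximum of 1024 (the value mentioned later in this review) and a multiplier of 8 (from the `shl` by 3). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low `bits` bits of `value`, as a fixed-width integer would.
constexpr uint64_t truncToBits(uint64_t value, unsigned bits) {
  return bits >= 64 ? value : (value & ((uint64_t{1} << bits) - 1));
}

// Upper bound of vscale * multiplier when every step is done in `bits` bits.
constexpr uint64_t narrowUpperBound(uint64_t maxVScale, uint64_t multiplier,
                                    unsigned bits) {
  return truncToBits(
      truncToBits(maxVScale, bits) * truncToBits(multiplier, bits), bits);
}

// Worked example with the assumed inputs (vscale max 1024, multiplier 8).
inline void checkExamples() {
  // In 8 bits, 1024 already truncates to 0, so the "upper bound" becomes 0.
  assert(narrowUpperBound(1024, 8, 8) == 0);
  // In 64 bits the real bound survives.
  assert(narrowUpperBound(1024, 8, 64) == 8192);
}
```

An upper bound of 0 would make every bit look known-zero, which is consistent with the required bits being reported as 0.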

jrtc27 added inline comments.Dec 19 2022, 2:48 PM

llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll
2
4	Use UTC, really
6–8
9	I tend to prefer tests that don't have mangled C++ names in them

craig.topper added inline comments.Dec 19 2022, 2:51 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1136 ↗	(On Diff #484071)	I'm not sure that hardcoding 64 here is the right fix. I think what we need to check is that the multiply doesn't overflow.

compnerd added inline comments.Dec 19 2022, 3:04 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1136 ↗	(On Diff #484071)	@craig.topper I absolutely agree with you. I call this out in the commit message, that I'm just assuming that this is wide enough. This can potentially overflow still. Is there a good way to ensure that we have the proper width?
llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll
4	What do you mean by UTC?
9	Sure, I can rename it, don't really think that it matters though.

compnerd added inline comments.Dec 19 2022, 3:06 PM

llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll
48	I thought so, except, removing the attributes somehow still reproduces the difference? I don't think that the reduction is the best, and I'm happy to restore the attribute if you like.

jrtc27 added inline comments.Dec 19 2022, 3:07 PM

llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll
4	UpdateTestChecks, which in this case is update_llc_test_checks.py

craig.topper added inline comments.Dec 19 2022, 3:23 PM

llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll
48	I thought in my testing that it won't make it to the APInt multiply without the attribute?

Harbormaster completed remote builds in B204007: Diff 484071.Dec 19 2022, 3:27 PM

MacDue added inline comments.Dec 19 2022, 4:05 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

1136 ↗

(On Diff #484071)

My thought was to do:

APInt Multiplier = Op.getConstantOperandAPInt(0);
unsigned MultiplyBits = Log2_32(*MaxVScale) + 1 + Multiplier.getActiveBits();
APInt VScaleResultUpperbound = APInt(MultiplyBits, *MaxVScale) * Multiplier.sextOrTrunc(MultiplyBits);

Which should always have enough bits to store the full result.
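The sizing argument here rests on a simple fact: a product of an a-bit and a b-bit unsigned value always fits in a + b bits. A minimal sketch of that reasoning follows, using standard C++ only. The helper names are hypothetical, and the sketch covers unsigned values only, so it does not address negative multipliers.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to represent `value` as an unsigned integer
// (0 for value == 0). For nonzero values this equals floor(log2(value)) + 1.
constexpr unsigned activeBits(uint64_t value) {
  unsigned bits = 0;
  while (value != 0) {
    ++bits;
    value >>= 1;
  }
  return bits;
}

// A width that is always enough to hold a * b without wrapping.
constexpr unsigned productBits(uint64_t a, uint64_t b) {
  return activeBits(a) + activeBits(b);
}

// Worked example, assuming a vscale maximum of 1024 and a multiplier of 8.
inline void checkExamples() {
  assert(activeBits(1024) == 11);     // floor(log2(1024)) + 1
  assert(activeBits(8) == 4);
  assert(productBits(1024, 8) == 15); // width chosen for the multiply
  assert(activeBits(uint64_t{1024} * 8) <= productBits(1024, 8));
}
```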

craig.topper added inline comments.Dec 19 2022, 4:18 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

1136 ↗

(On Diff #484071)

Here's what I was playing with

APInt VScaleResultUpperbound(BitWidth, *MaxVScale);
// TODO: Vscale max with no leading zeros requires special handling.
if (VScaleResultUpperbound.isNegative())
  return false;
bool Overflow;
VScaleResultUpperbound =
    VScaleResultUpperbound.smul_ov(Op.getConstantOperandAPInt(0), Overflow);
if (Overflow)
  return false;
if (VScaleResultUpperbound.isNegative())
  Known.One.setHighBits(VScaleResultUpperbound.countLeadingOnes());
else
  Known.Zero.setHighBits(VScaleResultUpperbound.countLeadingZeros());
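The idea of bailing out on overflow can be sketched without LLVM. The code below is an emulation, not `APInt::smul_ov`: it performs a signed multiply at a chosen width of at most 32 bits using 64-bit arithmetic, and it also rejects operands that do not fit in that width in the first place. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Result of a signed multiply carried out in a `bits`-wide integer.
struct MulResult {
  int64_t value;  // Only meaningful when overflow is false.
  bool overflow;
};

// True if `v` is representable as a `bits`-wide two's complement integer.
// Assumes 1 <= bits <= 32.
constexpr bool fitsSigned(int64_t v, unsigned bits) {
  const int64_t lo = -(int64_t{1} << (bits - 1));
  const int64_t hi = (int64_t{1} << (bits - 1)) - 1;
  return v >= lo && v <= hi;
}

// Signed multiply with overflow detection for widths up to 32 bits.
// Operands that do not fit in `bits` are reported as overflow.
constexpr MulResult checkedSignedMul(int64_t a, int64_t b, unsigned bits) {
  if (!fitsSigned(a, bits) || !fitsSigned(b, bits))
    return {0, true};
  const int64_t product = a * b;  // Cannot wrap: both operands fit in 32 bits.
  return {product, !fitsSigned(product, bits)};
}

// Worked examples under assumed inputs.
inline void checkExamples() {
  assert(checkedSignedMul(1024, 8, 8).overflow);  // 1024 does not fit in i8
  assert(checkedSignedMul(16, 8, 8).overflow);    // 128 exceeds the i8 maximum
  assert(!checkedSignedMul(16, 5, 32).overflow);
  assert(checkedSignedMul(16, -5, 8).value == -80);
}
```

Rejecting operands that do not fit matters for the i8 case in this review, where the vscale maximum itself is wider than the result type.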

BTW, why is this in SimplifyDemandedBits instead of computeKnownBits? It doesn't use the DemandedBits.

In D140347#4006583, @craig.topper wrote:

BTW, why is this in SimplifyDemandedBits instead of computeKnownBits? It doesn't use the DemandedBits.

I couldn't answer that tbh, perhaps @benmxwl-arm can answer that. I am only trying to help restore the patch that I reverted due to it regressing RVV.

llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll
48	Ugh, I think I mixed up the command line outputs and thought it wasn't needed. It is definitely needed on the test function. Thanks for flagging that!

Partially address feedback from Craig, address feedback from Jessica.

compnerd added inline comments.Dec 20 2022, 9:57 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1136 ↗	(On Diff #484071)	I didn't know about the `smul_ov` function; that does seem better. The use of `BitWidth` is still a problem, though. `BitWidth` is computed from the width of the `DemandedBits`, which is computed from the type of the result of the operation. In this particular test (which luckily identifies the issue!) the result type is `i8`, so `BitWidth` is 8, but the vscale range is 2,1024, which exceeds the range of i8. Thus, when we compute scale * multiplier and overflow, we end up with the wrong result. I think that we should go with the wider bit width here.

craig.topper added inline comments.Dec 20 2022, 10:31 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1136 ↗	(On Diff #484071)	You're right. I would need to at least check that the vscale max fits in the bitwidth.

Harbormaster completed remote builds in B204180: Diff 484295.Dec 20 2022, 10:44 AM

compnerd added inline comments.Dec 20 2022, 11:08 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1136 ↗	(On Diff #484071)	Can we somehow be assured that the multiplication between the vscale max and the scalar will not overflow? It seems that we need a wider width than the vscale max unless I'm not counting bits correctly.
llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll
4	Done, but this really does seem unnecessarily stringent testing. What we are looking to verify here is that the index counter is updated, the rest of the code around it really doesn't matter.

Further reduce test case by creating a hand written synthetic test case. We can get away with a 2-iteration unrolled loop avoiding any phi branches. The previous overflow would improperly remove the increment in between the two iterations.

Further reduce the test case. We create the splat and shuffle, offset it by vlenb and return the generated splat. This reduces down to the minimal assembly sequence that would exhibit the issue. If we were to accidentally truncate the scale, we would drop the read of vlenb and the adjustment for the shuffle.

There is a trySextValue in APInt: https://reviews.llvm.org/D139683. Maybe you need a tryMul.

In D140347#4009008, @tschuett wrote:

There is a trySextValue in APInt: https://reviews.llvm.org/D139683. Maybe you need a tryMul.

From a look, that seems to do what this was doing: extending to 64 bits. It feels like both Craig and I want a tighter bound on the bit width used to compute the scale. But I can certainly see the value in having a tryMul to mirror the trySextValue.

Harbormaster completed remote builds in B204230: Diff 484368.Dec 20 2022, 3:32 PM

In D140347#4008253, @compnerd wrote:

In D140347#4006583, @craig.topper wrote:

BTW, why is this in SimplifyDemandedBits instead of computeKnownBits? It doesn't use the DemandedBits.

I couldn't answer that tbh, perhaps @benmxwl-arm can answer that. I am only trying to help restore the patch that I reverted due to it regressing RVV.

Sorry, that's my bad. It's perfectly fine to move this over to computeKnownBits().
Just replace return falses with breaks and TLO.DAG.getMachineFunction() with getMachineFunction() and it works fine moving this case over there.

compnerd retitled this revision from TargetLowering: Teach DemandedBits about VSCALE to SelectionDAG: Teach ComputeKnownBits about VSCALE.Dec 21 2022, 1:45 PM

Herald added a subscriber: foad. · View Herald TranscriptDec 21 2022, 1:45 PM

compnerd updated this revision to Diff 484667.Dec 21 2022, 1:48 PM

compnerd edited the summary of this revision. (Show Details)

Harbormaster completed remote builds in B204448: Diff 484667.Dec 21 2022, 2:45 PM

foad added inline comments.Dec 22 2022, 12:53 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051	This seems pretty complex. Is the Multiplier operand guaranteed to have the same width as the result, i.e. BitWidth? If so you should be able to do all calculation in that width and the signedness of the multiplier should be irrelevant.

compnerd added inline comments.Dec 22 2022, 7:30 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051	No AFAIK, the multiplier has no guarantees of the width. It may be 32-bit, or 64-bit, though LLVM did truncate it further.

foad added inline comments.Dec 22 2022, 7:40 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051	Then the documentation for ISD::VSCALE should be clear about whether the multiplier is treated as signed or unsigned. I don't see it mentioned in ISDOpcodes.h. Anyway, if it is signed, can't you first sextOrTrunc it to BitWidth, and then do all of the rest of the calculations in that width?

compnerd added inline comments.Dec 22 2022, 8:01 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051	My interpretation is that it is signed. The LangRef calls it out as being a positive number, thus, it must be signed. As to truncating it to BitWidth, again, that doesn't work. We need it to be the computed width as we need to ensure that we have sufficient space for the multiplication to not overflow (which was the original bug) and we need it to match to perform the operation (requirements from APInt). Is there a way to write this in a simpler way that I don't know about?

foad added inline comments.Dec 22 2022, 8:30 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051	> The LangRef calls it out as being a positive number

No, that's talking about vscale itself. The multiplier is not mentioned in LangRef because it is specific to the ISD::VSCALE sdag node. Anyway, does this work?

Known.Zero.setBitsFrom(Log2_32(*MaxVScale) + 1);
Known = KnownBits::mul(Known, KnownBits::makeConstant(Multiplier.sextOrTrunc(BitWidth)));

compnerd added inline comments.Dec 22 2022, 9:22 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051	void llvm::APInt::setBits(unsigned int, unsigned int): Assertion `loBit <= BitWidth && "loBit out of range"' failed.

This is why I had kept the "complex" code as is. I had tried a few different things. I would really prefer `smul_ov` over the current implementation, but it seemed more complicated than this long-winded way.

craig.topper added inline comments.Dec 22 2022, 9:36 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051	The issue here is that we don't know that vscale is *MaxVscale. We're trying to calculate an upper bound on the result of the multiply to determine the value of sign bits only.

foad added inline comments.Dec 22 2022, 9:42 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051	Fixing the assertion is easy:

Known.Zero.setBitsFrom(std::min(Log2_32(*MaxVScale) + 1, BitWidth));
Known = KnownBits::mul(Known, KnownBits::makeConstant(Multiplier.sextOrTrunc(BitWidth)));

benmxwl-arm added inline comments.Dec 22 2022, 10:02 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051	`KnownBits::mul()` does not perform the same operation (and will under-report the known bits) and does not work for negative multipliers. (I did try using `KnownBits::mul()` in an earlier version of this patch.)

compnerd added inline comments.Dec 22 2022, 10:39 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051	> Fixing the assertion is easy:

Truncating the value to `BitWidth` seems incorrect. But also, as @benmxwl-arm mentions, that isn't exactly the same result.

foad added inline comments.Jan 3 2023, 6:13 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040–3051	> `KnownBits::mul()` does not perform the same operation (and will under-report the known bits)

Ack. I understand now that you want a result based on the actual maximum value of vscale, not just the known bits of vscale. Maybe value range propagation would be a better tool for this, but I guess we don't have that in SelectionDAG.

benmxwl-arm added inline comments.Jan 3 2023, 9:44 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040	I think this actually should be `Multiplier.getSignificantBits()` (which takes into account the sign). Other than that this LGTM, though I'd wait for approval from a more experienced reviewer.

compnerd added inline comments.Jan 3 2023, 10:26 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040	Hmm, I think that it _should_ be the same given that `Multiplier` should never be negative. But I don't see any harm in changing this to `Multiplier.getSignificantBits()` as it may be less confusing, so seems like a good idea.

craig.topper added inline comments.Jan 3 2023, 10:30 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040	Why can't multiplier be negative?

benmxwl-arm added inline comments.Jan 3 2023, 11:03 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040	I think the multiplier can be negative (see the `_with_negative_multiplier` tests). Maybe you're thinking about the vscale intrinsic? The intrinsic has no multiplier and always returns a positive value, the DAG node can be the result of merging a vscale intrinsic and a multiply, and I don't think there's any restriction on the multiplier value or sign.

compnerd added inline comments.Jan 3 2023, 11:11 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3040	Yes, I was thinking of the vscale intrinsic. Okay, in that case, yes, this should be `getSignificantBits`.
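The difference between the two bit counts discussed above shows up as soon as the multiplier is negative. The sketch below emulates both notions for 32-bit values in plain C++. It is not the APInt implementation, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bits needed to hold `v` when its 32-bit pattern is read as unsigned.
constexpr unsigned activeBits32(int32_t v) {
  uint32_t u = static_cast<uint32_t>(v);
  unsigned bits = 0;
  while (u != 0) {
    ++bits;
    u >>= 1;
  }
  return bits;
}

// Minimum two's complement width that still represents `v`, sign bit included.
constexpr unsigned significantBits32(int32_t v) {
  // For negative values, ~v is non-negative and needs the same number of
  // magnitude bits, so count those and add one bit for the sign.
  const int32_t folded = v < 0 ? ~v : v;
  return activeBits32(folded) + 1;
}

// Worked example with a multiplier of 5 and of -5.
inline void checkExamples() {
  assert(activeBits32(5) == 3);
  assert(significantBits32(5) == 4);
  assert(activeBits32(-5) == 32);      // every bit up to the sign bit is set
  assert(significantBits32(-5) == 4);  // 1011 in four bits is -5
}
```

With an unsigned count, a multiplier of -5 looks like a full-width value, which would inflate any width computed from it. A sign-aware count stays small.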

Use getSignificantBits

Harbormaster completed remote builds in B205709: Diff 486298.Jan 4 2023, 9:48 AM

I think the multiplier can be negative (see the _with_negative_multiplier tests).

This needs documenting. Specifically, for the ISD::VSCALE node, if the output is wider than the input, does it zero- or sign-extend the input? (Or to put it another way, does it do an unsigned extending multiply or a signed extending multiply?)

In D140347#4030800, @foad wrote:

I think the multiplier can be negative (see the _with_negative_multiplier tests).

This needs documenting. Specifically, for the ISD::VSCALE node, if the output is wider than the input, does it zero- or sign-extend the input? (Or to put it another way, does it do an unsigned extending multiply or a signed extending multiply?)

I'm not sure it's allowed to have a different type. In tablegen it is using SDTIntUnaryOp which requires the input and output types to match.

georges added a subscriber: georges.Jan 11 2023, 8:08 AM

I'm going to take this over and finish it.

Herald added a subscriber: luke. · View Herald TranscriptMay 25 2023, 11:32 AM

Use smul_ov instead of trying to size for the worst case.

Harbormaster completed remote builds in B234690: Diff 525850.May 25 2023, 4:31 PM

nikic added a subscriber: nikic.May 26 2023, 12:43 AM

nikic added inline comments.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3130	Can this whole block be reduced to something like this?

const Function &F = getMachineFunction().getFunction();
const APInt &Multiplier = Op.getConstantOperandAPInt(0);
Known = getVScaleConstantRange(&F, BitWidth).mul(Multiplier).toKnownBits();
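The range-based idea in this suggestion can be illustrated without LLVM. The sketch below is an emulation under stated assumptions, not the ConstantRange implementation: it handles only unsigned, non-wrapping 32-bit ranges and non-negative multipliers, and it derives known bits from the common leading prefix of the range bounds. All type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Closed unsigned interval [lo, hi] of 32-bit values.
struct Range {
  uint32_t lo;
  uint32_t hi;
};

// A set bit in `zero` / `one` means that bit is known to be 0 / 1.
struct Known {
  uint32_t zero;
  uint32_t one;
};

// Scale a range by a non-negative multiplier; the caller guarantees no wrap.
constexpr Range scale(Range r, uint32_t m) { return {r.lo * m, r.hi * m}; }

// Every value in [lo, hi] shares the common leading prefix of lo and hi.
constexpr Known toKnown(Range r) {
  uint32_t unknown = r.lo ^ r.hi;
  // Smear the highest differing bit downward to cover all lower positions.
  unknown |= unknown >> 1;
  unknown |= unknown >> 2;
  unknown |= unknown >> 4;
  unknown |= unknown >> 8;
  unknown |= unknown >> 16;
  return {~r.lo & ~unknown, r.lo & ~unknown};
}

// An `and` with `mask` is a no-op if every bit it clears is already known 0.
constexpr bool maskIsRedundant(Known k, uint32_t mask) {
  return (~mask & ~k.zero) == 0;
}

// Worked example: vscale in [1, 16] times 5 gives [5, 80].
inline void checkExamples() {
  const Known k = toKnown(scale(Range{1, 16}, 5));
  assert(k.zero == 0xFFFFFF80u);     // bits 7 and above are known zero
  assert(maskIsRedundant(k, 127));   // and with 127 can be dropped
  assert(!maskIsRedundant(k, 63));   // and with 63 must be kept
}
```

This matches the shape of the `vscale_with_multiplier` test in this revision, where the `and` with 127 disappears and the `and` with 63 remains.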

Use ConstantRange.

LGTM

This revision is now accepted and ready to land.May 26 2023, 9:55 AM

Herald added a subscriber: StephenFan. · View Herald TranscriptMay 26 2023, 9:55 AM

Harbormaster completed remote builds in B234884: Diff 526093.May 26 2023, 10:22 AM

Closed by commit rGa4f437f012b4: SelectionDAG: Teach ComputeKnownBits about VSCALE (authored by craig.topper). · Explain WhyMay 26 2023, 10:48 AM

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rGa4f437f012b4: SelectionDAG: Teach ComputeKnownBits about VSCALE.

Revision Contents

Path

Size

llvm/

lib/

CodeGen/

SelectionDAG/

SelectionDAG.cpp

6 lines

test/

CodeGen/

AArch64/

vscale-and-sve-cnt-demandedbits.ll

37 lines

RISCV/

vscale-demanded-bits.ll

25 lines

Diff 526123

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp


Show First 20 Lines • Show All 3,031 Lines • ▼ Show 20 Lines	KnownBits SelectionDAG::computeKnownBits(SDValue Op, const APInt &DemandedElts,
case ISD::MERGE_VALUES:		case ISD::MERGE_VALUES:
return computeKnownBits(Op.getOperand(Op.getResNo()), DemandedElts,		return computeKnownBits(Op.getOperand(Op.getResNo()), DemandedElts,
Depth + 1);		Depth + 1);
case ISD::SPLAT_VECTOR: {		case ISD::SPLAT_VECTOR: {
SDValue SrcOp = Op.getOperand(0);		SDValue SrcOp = Op.getOperand(0);
assert(SrcOp.getValueSizeInBits() >= BitWidth &&		assert(SrcOp.getValueSizeInBits() >= BitWidth &&
"Expected SPLAT_VECTOR implicit truncation");		"Expected SPLAT_VECTOR implicit truncation");
// Implicitly truncate the bits to match the official semantics of		// Implicitly truncate the bits to match the official semantics of
// SPLAT_VECTOR.		// SPLAT_VECTOR.
Known = computeKnownBits(SrcOp, Depth + 1).trunc(BitWidth);		Known = computeKnownBits(SrcOp, Depth + 1).trunc(BitWidth);
break;		break;
}		}
case ISD::BUILD_VECTOR:		case ISD::BUILD_VECTOR:
assert(!Op.getValueType().isScalableVector());		assert(!Op.getValueType().isScalableVector());
// Collect the known bits that are shared by every demanded vector element.		// Collect the known bits that are shared by every demanded vector element.
Known.Zero.setAllBits(); Known.One.setAllBits();		Known.Zero.setAllBits(); Known.One.setAllBits();
for (unsigned i = 0, e = Op.getNumOperands(); i != e; ++i) {		for (unsigned i = 0, e = Op.getNumOperands(); i != e; ++i) {
if (!DemandedElts[i])		if (!DemandedElts[i])
continue;		continue;

SDValue SrcOp = Op.getOperand(i);		SDValue SrcOp = Op.getOperand(i);
Known2 = computeKnownBits(SrcOp, Depth + 1);		Known2 = computeKnownBits(SrcOp, Depth + 1);

// BUILD_VECTOR can implicitly truncate sources, we must handle this.		// BUILD_VECTOR can implicitly truncate sources, we must handle this.
if (SrcOp.getValueSizeInBits() != BitWidth) {		if (SrcOp.getValueSizeInBits() != BitWidth) {
assert(SrcOp.getValueSizeInBits() > BitWidth &&		assert(SrcOp.getValueSizeInBits() > BitWidth &&
"Expected BUILD_VECTOR implicit truncation");		"Expected BUILD_VECTOR implicit truncation");
Known2 = Known2.trunc(BitWidth);		Known2 = Known2.trunc(BitWidth);
Show All 30 Lines	if (Known.isUnknown())
break;		break;
if (!!DemandedRHS) {		if (!!DemandedRHS) {
SDValue RHS = Op.getOperand(1);		SDValue RHS = Op.getOperand(1);
Known2 = computeKnownBits(RHS, DemandedRHS, Depth + 1);		Known2 = computeKnownBits(RHS, DemandedRHS, Depth + 1);
Known = Known.intersectWith(Known2);		Known = Known.intersectWith(Known2);
}		}
break;		break;
}		}
		case ISD::VSCALE: {
		const Function &F = getMachineFunction().getFunction();
		const APInt &Multiplier = Op.getConstantOperandAPInt(0);
		Known = getVScaleRange(&F, BitWidth).multiply(Multiplier).toKnownBits();
		break;
		}
case ISD::CONCAT_VECTORS: {		case ISD::CONCAT_VECTORS: {
if (Op.getValueType().isScalableVector())		if (Op.getValueType().isScalableVector())
break;		break;
// Split DemandedElts and test each of the demanded subvectors.		// Split DemandedElts and test each of the demanded subvectors.
Known.Zero.setAllBits(); Known.One.setAllBits();		Known.Zero.setAllBits(); Known.One.setAllBits();
EVT SubVectorVT = Op.getOperand(0).getValueType();		EVT SubVectorVT = Op.getOperand(0).getValueType();
unsigned NumSubVectorElts = SubVectorVT.getVectorNumElements();		unsigned NumSubVectorElts = SubVectorVT.getVectorNumElements();
unsigned NumSubVectors = Op.getNumOperands();		unsigned NumSubVectors = Op.getNumOperands();
Show All 10 Lines	for (unsigned i = 0; i != NumSubVectors; ++i) {
break;		break;
}		}
break;		break;
}		}
case ISD::INSERT_SUBVECTOR: {		case ISD::INSERT_SUBVECTOR: {
if (Op.getValueType().isScalableVector())		if (Op.getValueType().isScalableVector())
break;		break;
// Demand any elements from the subvector and the remainder from the src its		// Demand any elements from the subvector and the remainder from the src its
// inserted into.		// inserted into.
SDValue Src = Op.getOperand(0);		SDValue Src = Op.getOperand(0);
SDValue Sub = Op.getOperand(1);		SDValue Sub = Op.getOperand(1);
uint64_t Idx = Op.getConstantOperandVal(2);		uint64_t Idx = Op.getConstantOperandVal(2);
unsigned NumSubElts = Sub.getValueType().getVectorNumElements();		unsigned NumSubElts = Sub.getValueType().getVectorNumElements();
APInt DemandedSubElts = DemandedElts.extractBits(NumSubElts, Idx);		APInt DemandedSubElts = DemandedElts.extractBits(NumSubElts, Idx);
APInt DemandedSrcElts = DemandedElts;		APInt DemandedSrcElts = DemandedElts;
DemandedSrcElts.insertBits(APInt::getZero(NumSubElts), Idx);		DemandedSrcElts.insertBits(APInt::getZero(NumSubElts), Idx);

▲ Show 20 Lines • Show All 9,329 Lines • Show Last 20 Lines

llvm/test/CodeGen/AArch64/vscale-and-sve-cnt-demandedbits.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=aarch64 -mattr=+sve < %s | FileCheck %s		; RUN: llc -mtriple=aarch64 -mattr=+sve < %s | FileCheck %s

; This tests that various ands, sexts, and zexts (and other operations)		; This tests that various ands, sexts, and zexts (and other operations)
; operating on vscale or the SVE count instructions can be eliminated		; operating on vscale or the SVE count instructions can be eliminated
; (via demanded bits) due to their known limited range.		; (via demanded bits) due to their known limited range.

; On AArch64 vscale can be at most 16 (for a 2048-bit vector).		; On AArch64 vscale can be at most 16 (for a 2048-bit vector).
; The counting instructions (sans multiplier) have a value of at most 256		; The counting instructions (sans multiplier) have a value of at most 256
; (for a 2048-bit vector of i8s).		; (for a 2048-bit vector of i8s).

define i32 @vscale_and_elimination() vscale_range(1,16) {		define i32 @vscale_and_elimination() vscale_range(1,16) {
; CHECK-LABEL: vscale_and_elimination:		; CHECK-LABEL: vscale_and_elimination:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: rdvl x8, #1		; CHECK-NEXT: rdvl x8, #1
; CHECK-NEXT: lsr x8, x8, #4		; CHECK-NEXT: lsr x8, x8, #4
; CHECK-NEXT: and w9, w8, #0x1f		; CHECK-NEXT: and w9, w8, #0x1c
; CHECK-NEXT: and w8, w8, #0xfffffffc		; CHECK-NEXT: add w0, w8, w9
; CHECK-NEXT: add w0, w9, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%vscale = call i32 @llvm.vscale.i32()		%vscale = call i32 @llvm.vscale.i32()
%and_redundant = and i32 %vscale, 31		%and_redundant = and i32 %vscale, 31
%and_required = and i32 %vscale, 17179869180		%and_required = and i32 %vscale, 17179869180
%result = add i32 %and_redundant, %and_required		%result = add i32 %and_redundant, %and_required
ret i32 %result		ret i32 %result
}		}

▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	; CHECK-NEXT: ret
%result = add i64 %and_redundant, %and_required		%result = add i64 %and_redundant, %and_required
ret i64 %result		ret i64 %result
}		}

define i64 @vscale_trunc_zext() vscale_range(1,16) {		define i64 @vscale_trunc_zext() vscale_range(1,16) {
; CHECK-LABEL: vscale_trunc_zext:		; CHECK-LABEL: vscale_trunc_zext:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: rdvl x8, #1		; CHECK-NEXT: rdvl x8, #1
; CHECK-NEXT: lsr x8, x8, #4		; CHECK-NEXT: lsr x0, x8, #4
; CHECK-NEXT: and x0, x8, #0xffffffff
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%vscale = call i32 @llvm.vscale.i32()		%vscale = call i32 @llvm.vscale.i32()
%zext = zext i32 %vscale to i64		%zext = zext i32 %vscale to i64
ret i64 %zext		ret i64 %zext
}		}

define i64 @vscale_trunc_sext() vscale_range(1,16) {		define i64 @vscale_trunc_sext() vscale_range(1,16) {
; CHECK-LABEL: vscale_trunc_sext:		; CHECK-LABEL: vscale_trunc_sext:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: rdvl x8, #1		; CHECK-NEXT: rdvl x8, #1
; CHECK-NEXT: lsr x8, x8, #4		; CHECK-NEXT: lsr x0, x8, #4
; CHECK-NEXT: sxtw x0, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%vscale = call i32 @llvm.vscale.i32()		%vscale = call i32 @llvm.vscale.i32()
%sext = sext i32 %vscale to i64		%sext = sext i32 %vscale to i64
ret i64 %sext		ret i64 %sext
}		}

define i64 @count_bytes_trunc_zext() {		define i64 @count_bytes_trunc_zext() {
; CHECK-LABEL: count_bytes_trunc_zext:		; CHECK-LABEL: count_bytes_trunc_zext:
(85 lines hidden)

define i32 @vscale_with_multiplier() vscale_range(1,16) {		define i32 @vscale_with_multiplier() vscale_range(1,16) {
; CHECK-LABEL: vscale_with_multiplier:		; CHECK-LABEL: vscale_with_multiplier:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: rdvl x8, #1		; CHECK-NEXT: rdvl x8, #1
; CHECK-NEXT: mov w9, #5		; CHECK-NEXT: mov w9, #5
; CHECK-NEXT: lsr x8, x8, #4		; CHECK-NEXT: lsr x8, x8, #4
; CHECK-NEXT: mul x8, x8, x9		; CHECK-NEXT: mul x8, x8, x9
; CHECK-NEXT: and w9, w8, #0x7f		; CHECK-NEXT: and w9, w8, #0x3f
; CHECK-NEXT: and w8, w8, #0x3f		; CHECK-NEXT: add w0, w8, w9
; CHECK-NEXT: add w0, w9, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%vscale = call i32 @llvm.vscale.i32()		%vscale = call i32 @llvm.vscale.i32()
%mul = mul i32 %vscale, 5		%mul = mul i32 %vscale, 5
%and_redundant = and i32 %mul, 127		%and_redundant = and i32 %mul, 127
%and_required = and i32 %mul, 63		%and_required = and i32 %mul, 63
%result = add i32 %and_redundant, %and_required		%result = add i32 %and_redundant, %and_required
ret i32 %result		ret i32 %result
}		}

define i32 @vscale_with_negative_multiplier() vscale_range(1,16) {		define i32 @vscale_with_negative_multiplier() vscale_range(1,16) {
; CHECK-LABEL: vscale_with_negative_multiplier:		; CHECK-LABEL: vscale_with_negative_multiplier:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: rdvl x8, #1		; CHECK-NEXT: rdvl x8, #1
; CHECK-NEXT: mov x9, #-5		; CHECK-NEXT: mov x9, #-5
; CHECK-NEXT: lsr x8, x8, #4		; CHECK-NEXT: lsr x8, x8, #4
; CHECK-NEXT: mul x8, x8, x9		; CHECK-NEXT: mul x8, x8, x9
; CHECK-NEXT: orr w9, w8, #0xffffff80		; CHECK-NEXT: and w9, w8, #0xffffffc0
; CHECK-NEXT: and w8, w8, #0xffffffc0		; CHECK-NEXT: add w0, w8, w9
; CHECK-NEXT: add w0, w9, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%vscale = call i32 @llvm.vscale.i32()		%vscale = call i32 @llvm.vscale.i32()
%mul = mul i32 %vscale, -5		%mul = mul i32 %vscale, -5
%or_redundant = or i32 %mul, 4294967168		%or_redundant = or i32 %mul, 4294967168
%or_required = and i32 %mul, 4294967232		%or_required = and i32 %mul, 4294967232
%result = add i32 %or_redundant, %or_required		%result = add i32 %or_redundant, %or_required
ret i32 %result		ret i32 %result
}		}

		define i32 @pow2_vscale_with_negative_multiplier() vscale_range(1,16) {
		; CHECK-LABEL: pow2_vscale_with_negative_multiplier:
		; CHECK: // %bb.0:
		; CHECK-NEXT: cntd x8
		; CHECK-NEXT: neg x8, x8
		; CHECK-NEXT: orr w9, w8, #0xfffffff0
		; CHECK-NEXT: add w0, w8, w9
		; CHECK-NEXT: ret
		%vscale = call i32 @llvm.vscale.i32()
		%mul = mul i32 %vscale, -2
		%or_redundant = or i32 %mul, 4294967264
		%or_required = or i32 %mul, 4294967280
		%result = add i32 %or_redundant, %or_required
		ret i32 %result
		}

declare i32 @llvm.vscale.i32()		declare i32 @llvm.vscale.i32()
declare i64 @llvm.aarch64.sve.cntb(i32 %pattern)		declare i64 @llvm.aarch64.sve.cntb(i32 %pattern)
declare i64 @llvm.aarch64.sve.cnth(i32 %pattern)		declare i64 @llvm.aarch64.sve.cnth(i32 %pattern)
declare i64 @llvm.aarch64.sve.cntw(i32 %pattern)		declare i64 @llvm.aarch64.sve.cntw(i32 %pattern)
declare i64 @llvm.aarch64.sve.cntd(i32 %pattern)		declare i64 @llvm.aarch64.sve.cntd(i32 %pattern)
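
The masks that the tests above expect to disappear can be checked by hand. The following is a minimal sketch, not the code in SelectionDAG.cpp: `KnownBits32` and `knownBitsOfScaledVScale` are hypothetical names, and the approach (take the product range of `vscale * multiplier` and keep the bit prefix shared by its two endpoints) is an assumption about how such facts can be derived, not a description of the patch. It assumes i32 values and inputs small enough that the 64-bit products cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a known-bits pair on i32 values.
struct KnownBits32 {
  uint32_t zero; // bits that are 0 for every value in the range
  uint32_t one;  // bits that are 1 for every value in the range
};

// Known bits of (vscale * mul) for vscale in [minVScale, maxVScale].
// Assumes the products fit in int64_t; gives up (returns nothing known)
// if the range leaves i32 or straddles zero.
KnownBits32 knownBitsOfScaledVScale(int64_t minVScale, int64_t maxVScale,
                                    int64_t mul) {
  int64_t a = minVScale * mul;
  int64_t b = maxVScale * mul;
  int64_t lo = a < b ? a : b;
  int64_t hi = a < b ? b : a;
  if (lo < INT32_MIN || hi > INT32_MAX || (lo < 0 && hi >= 0))
    return {0, 0};

  uint32_t ulo = static_cast<uint32_t>(lo);
  uint32_t uhi = static_cast<uint32_t>(hi);

  // Every value between ulo and uhi shares the bits above the highest
  // position where the two endpoints differ.
  uint32_t diff = ulo ^ uhi;
  uint32_t prefix = ~0u;
  while (diff != 0) {
    diff >>= 1;
    prefix <<= 1;
  }
  return {~ulo & prefix, ulo & prefix};
}
```

With `vscale_range(1,16)`, this gives known-zero bits `0xFFFFFFE0` for plain vscale (so `and ..., 31` is redundant), known-zero bits `0xFFFFFF80` for a multiplier of 5 (so `and ..., 127` is redundant), and known-one bits `0xFFFFFF80` and `0xFFFFFFE0` for multipliers of -5 and -2 (so the corresponding `or` instructions are redundant).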

llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py

; RUN: llc -mtriple riscv64 -mattr +v -filetype asm -o - %s | FileCheck %s

jrtc27:

- ; RUN: llc -mtriple riscv64 -mattr +v -filetype asm -o - %s | FileCheck %s

+ ; RUN: llc -mtriple=riscv64 -mattr=+v < %s | FileCheck %s

; CHECK: vse8.v v8, (a5), v0.t

declare i8 @llvm.vscale.i8()

jrtc27: Use UTC, really

compnerd: What do you mean by UTC?

jrtc27: UpdateTestChecks, which in this case is update_llc_test_checks.py

compnerd: Done, but this really does seem like unnecessarily stringent testing. What we are looking to verify here is that the index counter is updated; the rest of the code around it really doesn't matter.

declare <vscale x 8 x i8> @llvm.experimental.stepvector.nxv8i8()

define <vscale x 8 x i8> @f() #0 {

; CHECK-LABEL: f:

jrtc27:

; CHECK: vadd.vx v8, v8, a3

- target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"

define dso_local void @_Z4FillPhi(ptr nocapture noundef writeonly %buffer, i32 noundef signext %n) local_unnamed_addr {

; CHECK: # %bb.0: # %entry

jrtc27: I tend to prefer tests that don't have mangled C++ names in them

compnerd: Sure, I can rename it; I don't really think that it matters, though.

; CHECK-NEXT: csrr a0, vlenb

; CHECK-NEXT: vsetvli a1, zero, e8, m1, ta, ma

; CHECK-NEXT: vid.v v8

; CHECK-NEXT: vadd.vx v8, v8, a0

; CHECK-NEXT: ret

entry:

%0 = tail call i8 @llvm.vscale.i8()

%1 = shl i8 %0, 3

%.splat.insert = insertelement <vscale x 8 x i8> poison, i8 %1, i64 0

%.splat = shufflevector <vscale x 8 x i8> %.splat.insert, <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer

%2 = tail call <vscale x 8 x i8> @llvm.experimental.stepvector.nxv8i8()

%3 = add <vscale x 8 x i8> %2, %.splat

ret <vscale x 8 x i8> %3

}

attributes #0 = { vscale_range(2,1024) }

craig.topper: Don't you need a vscale_range attribute to hit the bug?

compnerd: I thought so, except, removing the attributes somehow still reproduces the difference? I don't think that the reduction is the best, and I'm happy to restore the attribute if you like.

craig.topper: I thought in my testing that it won't make it to the APInt multiply without the attribute?

compnerd: Ugh, I think I mixed up the command line outputs and thought it wasn't needed. It is definitely needed on the test function. Thanks for flagging that!
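
To illustrate why the `vscale_range(2,1024)` attribute matters for an i8 vscale, here is a small sketch of the general pitfall: multiplying the range bound by the multiplier and then truncating to the value's bit width can produce bogus known bits. This is not a reconstruction of the reverted code; `widthMask`, `zerosAbove`, `naiveKnownZero` and `safeKnownZero` are hypothetical names, and only unsigned multipliers are handled.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the low `bitWidth` bits set (bitWidth in 1..64).
uint64_t widthMask(unsigned bitWidth) {
  return bitWidth >= 64 ? ~0ULL : ((1ULL << bitWidth) - 1);
}

// Bits above the highest set bit of `maxValue` are zero for every
// value in [0, maxValue].
uint64_t zerosAbove(uint64_t maxValue, unsigned bitWidth) {
  uint64_t low = 0;
  while (maxValue != 0) {
    maxValue >>= 1;
    low = (low << 1) | 1;
  }
  return widthMask(bitWidth) & ~low;
}

// Pitfall: the product is computed in a wide host integer and then
// truncated, so an out-of-range bound silently wraps.
uint64_t naiveKnownZero(unsigned bitWidth, uint64_t maxVScale, uint64_t mul) {
  uint64_t truncated = (maxVScale * mul) & widthMask(bitWidth);
  return zerosAbove(truncated, bitWidth);
}

// Conservative variant: claim nothing if the product may not fit.
uint64_t safeKnownZero(unsigned bitWidth, uint64_t maxVScale, uint64_t mul) {
  uint64_t limit = widthMask(bitWidth);
  if (mul == 0)
    return limit; // the product is always zero
  if (maxVScale > limit / mul)
    return 0; // the product may wrap at this width
  return zerosAbove(maxVScale * mul, bitWidth);
}
```

For the RISC-V test's shape (i8, upper bound 1024, `shl` by 3, i.e. a multiplier of 8), the naive version claims all eight bits are zero even though vscale = 2 yields 16, while the conservative version claims nothing.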


Diff 526123
