This is an archive of the discontinued LLVM Phabricator instance.

LangRef says "If the result value does not fit in the result type, then the result is a poison value." But does that mean the sign bit is guaranteed to be zero?

We could use the max-vscale attribute to determine whether the runtime value fits the type. If this cannot be determined, the optimization would be skipped; otherwise, we would change the type of llvm.vscale to the extended type.
Would that be a suitable approach?

I assume the max-vscale attribute will usually be present? In that case, checking it seems best.

(We might actually want to consider relaxing the LangRef rule; if we can easily determine whether the result fits into a given bitwidth, there isn't really any reason to make llvm.vscale return poison.)

There are instances where max-vscale won't be known at compile time, but this route should allow for the optimizations wherever possible.

I believe that, in terms of the LangRef, it's still possible for llvm.vscale to return poison, and it could still happen in cases where max-vscale is unknown until runtime. So personally I don't think any adjustment is required.

In D105994#2881236, @efriedma wrote:

I assume the max-vscale attribute will usually be present? In that case, checking it seems best.

(We might actually want to consider relaxing the LangRef rule; if we can easily determine whether the result fits into a given bitwidth, there isn't really any reason to make llvm.vscale return poison.)

Just to make sure that I understand the suggestion, do you mean changing the wording in the LangRef that the result is undefined instead of poison? If so, does that mean we could replace llvm.vscale.i8() with undef if max(vscale_range) doesn't fit the i8?

In D105994#2890403, @sdesmalen wrote:

Just to make sure that I understand the suggestion, do you mean changing the wording in the LangRef that the result is undefined instead of poison? If so, does that mean we could replace llvm.vscale.i8() with undef if max(vscale_range) doesn't fit the i8?

My suggestion is that, for example, @llvm.vscale.i8() returns "vscale & 0xFF". Not really sure if it's worthwhile, but it seems like an unnecessary source of poison.

In D105994#2890694, @efriedma wrote:

My suggestion is that, for example, @llvm.vscale.i8() returns "vscale & 0xFF". Not really sure if it's worthwhile, but it seems like an unnecessary source of poison.

What problem would that solve?

In D105994#2890907, @sdesmalen wrote:

What problem would that solve?

It would allow folding "trunc vscale". And it might be a little simpler to understand. Either way, not a big deal. :)
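To illustrate the wrapping semantics suggested above, here is a minimal sketch in standalone C++ (not LLVM IR) of why a `trunc` of a wider vscale would fold if the narrow intrinsic returned the low bits rather than poison. The function names are hypothetical and exist only for this illustration; this is not how LangRef defines the intrinsic.

```cpp
#include <cstdint>

// Hypothetical poison-free reading of @llvm.vscale.i8(): the low 8 bits of
// the runtime vscale, i.e. "vscale & 0xFF" as suggested in the review.
uint8_t wrappingVScaleI8(uint64_t VScale) {
  return static_cast<uint8_t>(VScale & 0xFF);
}

// What `trunc i64 %vscale to i8` computes: it also keeps only the low 8 bits.
uint8_t truncI64ToI8(uint64_t Value) {
  return static_cast<uint8_t>(Value);
}

// Under the wrapping reading, trunc(vscale.i64) and vscale.i8 always agree,
// so the trunc could be folded into a narrower intrinsic call.
bool truncFoldHolds(uint64_t VScale) {
  return truncI64ToI8(VScale) == wrappingVScaleI8(VScale);
}
```

With the current poison wording, the two sides only agree when vscale actually fits in 8 bits, which is why the fold is not generally available today.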

Added a check for the vscale_range attribute before performing the optimization.
If the attribute isn't present, or if its maximum value doesn't fit in the bitwidth of the original intrinsic, the optimization is skipped.

Updated the .ll test to cover the extra logic.

DylanFleming-arm added a parent revision: D106277: [SVE] Remove usage of getMaxVScale for AArch64, in favour of IR Attribute. Jul 21 2021, 12:00 PM

efriedma added inline comments. Jul 21 2021, 12:34 PM

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
Line 1371: This shift can overflow. I'd suggest using `Log2_32(MaxVScale)` instead.

Harbormaster completed remote builds in B115378: Diff 360544. Jul 21 2021, 2:28 PM

Changed bitshift to Log2_32
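The earlier shift-based revision is not shown in this archive, so the following is only a sketch of the concern, assuming the bound was compared against something like `1 << TypeWidth`. `log2Floor32` is a hypothetical stand-in that mirrors my understanding of `llvm::Log2_32` (floor of log2, with zero mapping to -1 as unsigned); the other names are likewise made up for illustration.

```cpp
#include <cstdint>

// Floor of log2(Value). For Value == 0 the result is 0xFFFFFFFF, which is
// how I understand llvm::Log2_32 to behave ("-1 if the value is zero").
uint32_t log2Floor32(uint32_t Value) {
  uint32_t Result = UINT32_MAX;
  while (Value != 0) {
    Value >>= 1;
    ++Result; // wraps from 0xFFFFFFFF to 0 on the first iteration
  }
  return Result;
}

// Shift-free check: for non-zero MaxVScale, MaxVScale < 2^TypeWidth holds
// exactly when floor(log2(MaxVScale)) < TypeWidth. No shift, so any width
// (including 32 and 64) is safe.
bool fitsViaLog2(uint32_t MaxVScale, unsigned TypeWidth) {
  return log2Floor32(MaxVScale) < TypeWidth;
}

// Hypothetical shift-based variant. Without the guard, `1u << TypeWidth`
// is undefined behaviour in C++ once TypeWidth reaches 32.
bool fitsViaShift(uint32_t MaxVScale, unsigned TypeWidth) {
  if (TypeWidth >= 32)
    return true; // every 32-bit bound fits in 32 or more bits
  return MaxVScale < (1u << TypeWidth);
}
```

The two checks agree for every non-zero bound. They differ at zero: the log2 form rejects it, which looks convenient if a maximum of 0 denotes an unbounded vscale_range (an assumption about the attribute's meaning, not something stated in this thread).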

LGTM

This revision is now accepted and ready to land. Jul 26 2021, 9:53 AM

Harbormaster completed remote builds in B116204: Diff 361699. Jul 26 2021, 11:10 AM

This revision was landed with ongoing or failed builds. Jul 30 2021, 8:03 AM

Closed by commit rGa7a39ec886a0: [SVE] Add folds for sign and zero extends of vscale (authored by DylanFleming-arm).

This revision was automatically updated to reflect the committed changes.

DylanFleming-arm added a commit: rGa7a39ec886a0: [SVE] Add folds for sign and zero extends of vscale.

Revision Contents

Path                                                        Size
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp        28 lines
llvm/test/Transforms/InstCombine/vscale_sext_and_zext.ll    85 lines

Diff 363097

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp

(1,355 unchanged lines hidden; context: Instruction *InstCombinerImpl::visitZExt(ZExtInst &CI) {)

   Value *And;
   if (SrcI && match(SrcI, m_OneUse(m_Xor(m_Value(And), m_Constant(C)))) &&
       match(And, m_OneUse(m_And(m_Trunc(m_Value(X)), m_Specific(C)))) &&
       X->getType() == CI.getType()) {
     Constant *ZC = ConstantExpr::getZExt(C, CI.getType());
     return BinaryOperator::CreateXor(Builder.CreateAnd(X, ZC), ZC);
   }

+  if (match(Src, m_VScale(DL))) {
+    if (CI.getFunction()->hasFnAttribute(Attribute::VScaleRange)) {
+      unsigned MaxVScale = CI.getFunction()
+                               ->getFnAttribute(Attribute::VScaleRange)
+                               .getVScaleRangeArgs()
+                               .second;
+      unsigned TypeWidth = Src->getType()->getScalarSizeInBits();
+      if (Log2_32(MaxVScale) < TypeWidth) {
+        Value *VScale = Builder.CreateVScale(ConstantInt::get(DestTy, 1));
+        return replaceInstUsesWith(CI, VScale);
+      }
+    }
+  }
+
   return nullptr;
 }

Inline comment by efriedma (Unsubmitted, Not Done), attached to the `if (Log2_32(MaxVScale) < TypeWidth) {` line: This shift can overflow. I'd suggest using `Log2_32(MaxVScale)` instead.

 /// Transform (sext icmp) to bitwise / integer operations to eliminate the icmp.
 Instruction *InstCombinerImpl::transformSExtICmp(ICmpInst *ICI,
                                                  Instruction &CI) {
   Value *Op0 = ICI->getOperand(0), *Op1 = ICI->getOperand(1);
   ICmpInst::Predicate Pred = ICI->getPredicate();

(228 unchanged lines hidden)

     Constant *NewShAmt = ConstantExpr::getSub(
         ConstantInt::get(DestTy, DestTy->getScalarSizeInBits()),
         NumLowbitsLeft);
     NewShAmt =
         Constant::mergeUndefsWith(Constant::mergeUndefsWith(NewShAmt, BA), CA);
     A = Builder.CreateShl(A, NewShAmt, CI.getName());
     return BinaryOperator::CreateAShr(A, NewShAmt);
   }

+  if (match(Src, m_VScale(DL))) {
+    if (CI.getFunction()->hasFnAttribute(Attribute::VScaleRange)) {
+      unsigned MaxVScale = CI.getFunction()
+                               ->getFnAttribute(Attribute::VScaleRange)
+                               .getVScaleRangeArgs()
+                               .second;
+      unsigned TypeWidth = Src->getType()->getScalarSizeInBits();
+      if (Log2_32(MaxVScale) < (TypeWidth - 1)) {
+        Value *VScale = Builder.CreateVScale(ConstantInt::get(DestTy, 1));
+        return replaceInstUsesWith(CI, VScale);
+      }
+    }
+  }
+
   return nullptr;
 }

 /// Return a Constant* for the specified floating-point constant if it fits
 /// in the specified FP type without changing its value.
 static bool fitsInFPType(ConstantFP *CFP, const fltSemantics &Sem) {
   bool losesInfo;
   APFloat F = CFP->getValueAPF();

(1,209 unchanged lines hidden)
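The two added folds differ only in how many bits of the source type they allow the bound to occupy: the zext fold uses the full width, while the sext fold reserves the top bit so the sign bit is known to be zero. A standalone sketch of that decision follows, assuming a hypothetical helper `floorLog2` in place of `llvm::Log2_32` and a made-up `canFoldExtOfVScale` wrapper. It models the condition only, not the InstCombine code itself.

```cpp
#include <cstdint>

enum class ExtKind { ZExt, SExt };

// Hypothetical stand-in for llvm::Log2_32: floor(log2(Value)), with zero
// mapping to 0xFFFFFFFF so that it never passes the width comparison.
static uint32_t floorLog2(uint32_t Value) {
  uint32_t Result = UINT32_MAX;
  while (Value != 0) {
    Value >>= 1;
    ++Result;
  }
  return Result;
}

// Models the guard in the diff above. HasVScaleRange is false when the
// function carries no vscale_range attribute, in which case nothing folds.
bool canFoldExtOfVScale(ExtKind Kind, bool HasVScaleRange, uint32_t MaxVScale,
                        unsigned SrcWidth) {
  if (!HasVScaleRange)
    return false;
  // sext must keep the sign bit clear, so one fewer bit is usable.
  unsigned UsableBits = (Kind == ExtKind::SExt) ? SrcWidth - 1 : SrcWidth;
  return floorLog2(MaxVScale) < UsableBits;
}
```

Read against the test file: vscale_range(0, 16) folds for both i8 and i32 sources; vscale_range(0, 192) blocks the i8 sext because 192 needs bit 7; vscale_range(0, 1024) blocks the i8 zext; and a function without the attribute never folds.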

llvm/test/Transforms/InstCombine/vscale_sext_and_zext.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -instcombine -S | FileCheck %s

define i64 @vscale_SExt_i32toi64() #0 {
; CHECK: entry:
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: ret i64 [[TMP0]]
entry:
  %0 = call i32 @llvm.vscale.i32()
  %1 = sext i32 %0 to i64
  ret i64 %1
}

define i32 @vscale_SExt_i8toi32() #0 {
; CHECK: entry:
; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.vscale.i32()
; CHECK-NEXT: ret i32 [[TMP0]]
entry:
  %0 = call i8 @llvm.vscale.i8()
  %1 = sext i8 %0 to i32
  ret i32 %1
}

define i32 @vscale_SExt_i8toi32_poison() vscale_range(0, 192) {
; CHECK: entry:
; CHECK-NEXT: [[TMP0:%.*]] = call i8 @llvm.vscale.i8()
; CHECK-NEXT: [[TMP1:%.*]] = sext i8 [[TMP0]] to i32
; CHECK-NEXT: ret i32 [[TMP1]]
entry:
  %0 = call i8 @llvm.vscale.i8()
  %1 = sext i8 %0 to i32
  ret i32 %1
}

define i64 @vscale_ZExt_i32toi64() #0 {
; CHECK: entry:
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: ret i64 [[TMP0]]
entry:
  %0 = call i32 @llvm.vscale.i32()
  %1 = zext i32 %0 to i64
  ret i64 %1
}

Inline comment by Matt (Unsubmitted, Not Done), attached after the zext line above, with a suggested edit:

entry:
- %0 = call i16@llvm.vscale.i16()
+ %0 = call i16 @llvm.vscale.i16()
%1 = zext i16 %0 to i64

Matt: Nit: Missing space.

define i64 @vscale_ZExt_i1toi64() vscale_range(0, 1) {
; CHECK: entry:
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: ret i64 [[TMP0]]
entry:
  %0 = call i1 @llvm.vscale.i1()
  %1 = zext i1 %0 to i64
  ret i64 %1
}

define i32 @vscale_ZExt_i8toi32_poison() vscale_range(0, 1024) {
; CHECK: entry:
; CHECK-NEXT: [[TMP0:%.*]] = call i8 @llvm.vscale.i8()
; CHECK-NEXT: [[TMP1:%.*]] = zext i8 [[TMP0]] to i32
; CHECK-NEXT: ret i32 [[TMP1]]
entry:
  %0 = call i8 @llvm.vscale.i8()
  %1 = zext i8 %0 to i32
  ret i32 %1
}

define i32 @vscale_ZExt_i16toi32_unknown() {
; CHECK: entry:
; CHECK-NEXT: [[TMP0:%.*]] = call i16 @llvm.vscale.i16()
; CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[TMP0]] to i32
; CHECK-NEXT: ret i32 [[TMP1]]
entry:
  %0 = call i16 @llvm.vscale.i16()
  %1 = zext i16 %0 to i32
  ret i32 %1
}

attributes #0 = { vscale_range(0, 16) }

declare i1 @llvm.vscale.i1()
declare i8 @llvm.vscale.i8()
declare i16 @llvm.vscale.i16()
declare i32 @llvm.vscale.i32()
