
Details

Reviewers

foad
arsenm

Commits

rG4f17b1784e94: Fix for AMDGPU MUL_I24 known bits calculation

Summary

At present, the code calculating known bits of AMDGPU MUL_I24 confuses the concepts of "non-negative number" and "positive number".

In some situations, it results in incorrect code. I have a case where the optimizer replaces the result of calculating MUL_I24(-5, 0) with -8.
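To make the distinction concrete, here is a minimal standalone sketch, not LLVM code. `signExtend24`, `mulI24` and `productAlwaysNegative` are hypothetical helpers that model a signed 24-bit multiply with a 32-bit result. The brute-force check suggests that "negative times non-negative" is not always negative, because the non-negative operand may be zero, while "negative times strictly positive" is.

```cpp
#include <cstdint>

// Hypothetical model: sign-extend the low 24 bits of a 32-bit value.
static int32_t signExtend24(int32_t V) {
  uint32_t U = static_cast<uint32_t>(V) & 0xFFFFFFu;
  if (U & 0x800000u)
    U |= 0xFF000000u;
  return static_cast<int32_t>(U);
}

// Hypothetical model of a signed 24-bit multiply: multiply the sign-extended
// operands and keep the low 32 bits of the product.
static int32_t mulI24(int32_t A, int32_t B) {
  int64_t P = static_cast<int64_t>(signExtend24(A)) * signExtend24(B);
  return static_cast<int32_t>(static_cast<uint32_t>(P));
}

// Returns true if L * R is negative for every negative L in [-16, -1] and
// every R in [MinR, 16].
static bool productAlwaysNegative(int32_t MinR) {
  for (int32_t L = -16; L < 0; ++L)
    for (int32_t R = MinR; R <= 16; ++R)
      if (mulI24(L, R) >= 0)
        return false;
  return true;
}
```

With these helpers, `productAlwaysNegative(0)` is false (the case R = 0 gives a zero product), while `productAlwaysNegative(1)` is true.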

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ekuznetsov139 created this revision. Nov 17 2019, 5:00 PM

Herald added a project: Restricted Project. Nov 17 2019, 5:00 PM

Herald added subscribers: llvm-commits, hiraditya, t-tye and 8 others.

Needs testcase

In D70367#1749187, @arsenm wrote:

Needs testcase

Do you want me to provide a testcase? And what exactly do you mean by "testcase": an example of the bug, or a unit test for the LLVM test suite that checks for the bug?

As for an example, compile the following:

test.ll (12 KB)

llc -mtriple amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-code-object-v3 -O2 -amdgpu-function-calls=0 -filetype=asm -o test.isa test.ll

The IR code:

%5 = tail call i32 @llvm.amdgcn.workitem.id.x() #28, !range !4
%tid_y = lshr i32 %5, 4
%tid_x = and i32 %5, 15

%y_div_5 = sdiv i32 %tid_y, 5
%6 = mul nsw i32 %y_div_5, -5
%y_mod_5 = add nsw i32 %6, %tid_y
%v1 = add nsw i32 %tid_x, %y_mod_5
%7 = icmp sgt i32 %v1, -2
%spec.select.i51 = select i1 %7, i32 %v1, i32 -2

The corresponding ISA:

v_and_b32_e32 v15, 15, v0
v_lshrrev_b32_e32 v14, 4, v0
s_load_dwordx2 s[0:1], s[4:5], 0x0
v_add3_u32 v0, v14, v15, -8
v_max_i32_e32 v0, -2, v0
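The mismatch can be checked by hand. The sketch below is a plain C++ transcription of the two excerpts above, under the assumption that the ISA lines shown are the whole computation and that `v_add3_u32 v0, v14, v15, -8` reads as v14 + v15 + (-8). The function names `expectedFromIR` and `observedFromISA` are made up for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// What the IR excerpt computes for a given workitem id.
static int32_t expectedFromIR(int32_t Tid) {
  int32_t TidY = static_cast<int32_t>(static_cast<uint32_t>(Tid) >> 4); // lshr
  int32_t TidX = Tid & 15;                                              // and
  int32_t YDiv5 = TidY / 5;                                             // sdiv
  int32_t YMod5 = YDiv5 * -5 + TidY;                                    // mul + add
  int32_t V1 = TidX + YMod5;
  return std::max(V1, -2);                                              // icmp sgt + select
}

// What the ISA excerpt computes, assuming the multiply was folded to -8.
static int32_t observedFromISA(int32_t Tid) {
  int32_t V15 = Tid & 15;
  int32_t V14 = static_cast<int32_t>(static_cast<uint32_t>(Tid) >> 4);
  return std::max(-2, V14 + V15 - 8);
}
```

For workitem id 0, the IR yields 0 (the multiply is 0 * -5 = 0), whereas the ISA as read here yields max(-2, -8) = -2.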

I mean this testcase needs to be prepared as a lit test for the test suite in the patch itself

Should probably go in test/CodeGen/AMDGPU/mul_int24.ll and/or test/CodeGen/AMDGPU/mul_uint24.ll to go with similar tests

ekuznetsov139 updated this revision to Diff 229770. Nov 18 2019, 2:23 AM

ekuznetsov139 updated this revision to Diff 229773. Nov 18 2019, 2:25 AM

arsenm added inline comments. Nov 19 2019, 4:40 AM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
16–26	This testcase is far too big. You should be able to reduce it to a handful of instructions, probably these ones

The logic looks correct to me.

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
4455	Space between "if" and "(".
4457	Likewise.

Simpler testcase

ekuznetsov139 updated this revision to Diff 230139. Nov 19 2019, 1:41 PM

arsenm added inline comments. Nov 25 2019, 8:20 PM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
3	FUNC isn’t specified as a check prefix, but there’s also no reason to use a second prefix
4	Run line should be first. Most of the flags to llc can also be removed
5	Should use positive checks. I don’t know what this would exclude since we won’t emit anything with dashes
6	Drop this comment
18	Should probably merge this with the other mul24 test files

ekuznetsov139 updated this revision to Diff 231001. Nov 25 2019, 8:58 PM

ekuznetsov139 added inline comments. Nov 25 2019, 9:01 PM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
5	This would exclude global_store_dword v[0:1], v2, off offset:-128

arsenm added inline comments. Nov 25 2019, 11:17 PM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
5	Ok, that’s not obvious. Positive checks are much less error prone

ekuznetsov139 marked an inline comment as done. Dec 2 2019, 1:07 PM

ekuznetsov139 added inline comments.

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
5	This is a negative test. We are testing to make sure that the optimizer does not assume %v1 to be always equal to -32. A negative check fits right in. A positive check would be harder to write since multiple possible correct codes could be generated.

ekuznetsov139 added reviewers: foad, arsenm. Dec 2 2019, 6:30 PM

arsenm added inline comments. Dec 2 2019, 9:59 PM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
5	That’s fine. A negative test is easily breakable and should be avoided

ekuznetsov139 updated this revision to Diff 231956. Dec 3 2019, 11:33 AM

ekuznetsov139 marked 3 inline comments as done.

ekuznetsov139 marked 6 inline comments as done.

arsenm added inline comments. Dec 5 2019, 12:25 AM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
3	Just generate the checks at this point. This is going to successfully match the name of the function

I give up. I don't know what kind of test you want. I'll let someone who knows how to do these things submit a patch.

In D70367#1770168, @ekuznetsov139 wrote:

I give up. I don't know what kind of test you want. I'll let someone who knows how to do these things submit a patch.

Please don't give up! Matt's suggestion is to use exactly the .ll test case you already have, but to run utils/update_llc_test_checks.py on it to automatically generate the CHECK: (or GCN:) lines for it.

In D70367#1770168, @ekuznetsov139 wrote:

I give up. I don't know what kind of test you want. I'll let someone who knows how to do these things submit a patch.

You can either use update_llc_test_checks, or follow the other examples in test/CodeGen/AMDGPU/mul_int24.ll

I don't see how this is better than GCN-NOT:-128, but here you go

LGTM

This revision is now accepted and ready to land. Dec 9 2019, 10:21 AM

Closed by commit rG4f17b1784e94: Fix for AMDGPU MUL_I24 known bits calculation (authored by foad). Dec 16 2019, 2:48 AM

This revision was automatically updated to reflect the committed changes.

Diff 234008

llvm/include/llvm/Support/KnownBits.h

@@ public:
   }

   /// Returns true if this value is known to be negative.
   bool isNegative() const { return One.isSignBitSet(); }

   /// Returns true if this value is known to be non-negative.
   bool isNonNegative() const { return Zero.isSignBitSet(); }

+  /// Returns true if this value is known to be positive.
+  bool isStrictlyPositive() const { return Zero.isSignBitSet() && !One.isNullValue(); }

   /// Make this value negative.
   void makeNegative() {
     One.setSignBit();
   }

   /// Make this value non-negative.
   void makeNonNegative() {
     Zero.setSignBit();
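As an illustration of what the three predicates mean, here is a small self-contained sketch. `ToyKnownBits` is a hypothetical 32-bit stand-in for `llvm::KnownBits`, not the real class; it only mirrors the Zero/One mask idea and the predicate added in this diff.

```cpp
#include <cstdint>

constexpr uint32_t SignBit = 0x80000000u;

// Toy stand-in for llvm::KnownBits (hypothetical): a bit set in Zero is known
// to be 0, a bit set in One is known to be 1, a bit set in neither is unknown.
struct ToyKnownBits {
  uint32_t Zero = 0;
  uint32_t One = 0;

  bool isNegative() const { return (One & SignBit) != 0; }
  bool isNonNegative() const { return (Zero & SignBit) != 0; }
  // Mirrors the new predicate: sign bit known zero and at least one bit
  // known to be one, so the value cannot be zero.
  bool isStrictlyPositive() const { return isNonNegative() && One != 0; }
};

// Known bits of a fully known constant.
static ToyKnownBits knownConstant(uint32_t V) {
  ToyKnownBits K;
  K.Zero = ~V;
  K.One = V;
  return K;
}

// Known bits of (x & Mask) for a completely unknown x.
static ToyKnownBits knownMasked(uint32_t Mask) {
  ToyKnownBits K;
  K.Zero = ~Mask;
  return K;
}
```

In this model, `knownMasked(3)` (a value in the range 0..3) is non-negative but not strictly positive, since it may be zero; `knownConstant(3)` satisfies both predicates.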

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

@@ case AMDGPUISD::MUL_I24: {
     // Skip extra check if all bits are known zeros.
     if (TrailZ >= 32)
       break;

     // Truncate to 24 bits.
     LHSKnown = LHSKnown.trunc(24);
     RHSKnown = RHSKnown.trunc(24);

-    bool Negative = false;
     if (Opc == AMDGPUISD::MUL_I24) {
       unsigned LHSValBits = 24 - LHSKnown.countMinSignBits();
       unsigned RHSValBits = 24 - RHSKnown.countMinSignBits();
       unsigned MaxValBits = std::min(LHSValBits + RHSValBits, 32u);
       if (MaxValBits >= 32)
         break;
       bool LHSNegative = LHSKnown.isNegative();
-      bool LHSPositive = LHSKnown.isNonNegative();
+      bool LHSNonNegative = LHSKnown.isNonNegative();
+      bool LHSPositive = LHSKnown.isStrictlyPositive();
       bool RHSNegative = RHSKnown.isNegative();
-      bool RHSPositive = RHSKnown.isNonNegative();
-      if ((!LHSNegative && !LHSPositive) || (!RHSNegative && !RHSPositive))
-        break;
-      Negative = (LHSNegative && RHSPositive) || (LHSPositive && RHSNegative);
-      if (Negative)
-        Known.One.setHighBits(32 - MaxValBits);
-      else
+      bool RHSNonNegative = RHSKnown.isNonNegative();
+      bool RHSPositive = RHSKnown.isStrictlyPositive();
+      if((LHSNonNegative && RHSNonNegative) || (LHSNegative && RHSNegative))
         Known.Zero.setHighBits(32 - MaxValBits);
+      else if((LHSNegative && RHSPositive) || (LHSPositive && RHSNegative))
+        Known.One.setHighBits(32 - MaxValBits);
     } else {
       unsigned LHSValBits = 24 - LHSKnown.countMinLeadingZeros();
       unsigned RHSValBits = 24 - RHSKnown.countMinLeadingZeros();
       unsigned MaxValBits = std::min(LHSValBits + RHSValBits, 32u);
       if (MaxValBits >= 32)
         break;
       Known.Zero.setHighBits(32 - MaxValBits);
     }
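The behavioral difference between the old and new sign logic can be sketched in isolation. The code below is a simplified illustration, not the LLVM implementation: `SignInfo`, `HighBits`, `oldRule` and `newRule` are hypothetical names, and the `MaxValBits >= 32` early exit is deliberately left out.

```cpp
// Which statement can be made about the high bits of the product.
enum class HighBits { Unknown, KnownZero, KnownOne };

// Sign facts about one operand (hypothetical stand-in for the KnownBits queries).
struct SignInfo {
  bool Negative;    // sign bit known to be 1
  bool NonNegative; // sign bit known to be 0
  bool Positive;    // non-negative and known to be non-zero
};

// Pre-patch decision: "Positive" was really isNonNegative(), so zero counted
// as positive.
static HighBits oldRule(SignInfo L, SignInfo R) {
  bool LPos = L.NonNegative, RPos = R.NonNegative;
  if ((!L.Negative && !LPos) || (!R.Negative && !RPos))
    return HighBits::Unknown;
  bool Negative = (L.Negative && RPos) || (LPos && R.Negative);
  return Negative ? HighBits::KnownOne : HighBits::KnownZero;
}

// Post-patch decision: high bits are claimed to be ones only when the
// non-negative operand is also known to be non-zero.
static HighBits newRule(SignInfo L, SignInfo R) {
  if ((L.NonNegative && R.NonNegative) || (L.Negative && R.Negative))
    return HighBits::KnownZero;
  if ((L.Negative && R.Positive) || (L.Positive && R.Negative))
    return HighBits::KnownOne;
  return HighBits::Unknown;
}
```

For an operand that may be zero (`{false, true, false}`) multiplied by a known-negative one (`{true, false, false}`), `oldRule` claims the high bits are ones, which is wrong when the product is zero, while `newRule` makes no claim.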

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck --check-prefix=GCN %s

define weak_odr amdgpu_kernel void @test_mul24_knownbits_kernel(float addrspace(1)* %p) #4 {
; GCN-LABEL: test_mul24_knownbits_kernel:
; GCN: ; %bb.0: ; %entry
; GCN-NEXT: v_and_b32_e32 v0, 3, v0
; GCN-NEXT: v_mul_i32_i24_e32 v0, 0xfffffb, v0
; GCN-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
; GCN-NEXT: v_and_b32_e32 v0, 0xffffffe0, v0
; GCN-NEXT: v_ashrrev_i32_e32 v1, 31, v0
; GCN-NEXT: v_lshlrev_b64 v[0:1], 2, v[0:1]
; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v2, s1
; GCN-NEXT: v_add_co_u32_e32 v0, vcc, s0, v0
; GCN-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc
; GCN-NEXT: v_mov_b32_e32 v2, 0
; GCN-NEXT: global_store_dword v[0:1], v2, off
; GCN-NEXT: s_endpgm
entry:
  %0 = tail call i32 @llvm.amdgcn.workitem.id.x() #28, !range !4
  %tid = and i32 %0, 3
  %1 = mul nsw i32 %tid, -5
  %v1 = and i32 %1, -32
  %v2 = sext i32 %v1 to i64
  %v3 = getelementptr inbounds float, float addrspace(1)* %p, i64 %v2
  store float 0.000, float addrspace(1)* %v3, align 4
  ret void
}

; Function Attrs: nounwind readnone speculatable
declare i32 @llvm.amdgcn.workitem.id.x() #20

!4 = !{i32 0, i32 1024}

This is an archive of the discontinued LLVM Phabricator instance.

Fix for AMDGPU MUL_I24 known bits calculation
ClosedPublic
