Details

Reviewers

foad
arsenm

Commits

rG4f17b1784e94: Fix for AMDGPU MUL_I24 known bits calculation

Summary

At present, the code calculating known bits of AMDGPU MUL_I24 confuses the concepts of "non-negative number" and "positive number".

In some situations, it results in incorrect code. I have a case where the optimizer replaces the result of calculating MUL_I24(-5, 0) with -8.
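
To make the distinction concrete, here is a minimal standalone C++ sketch. It is not LLVM code: `SignInfo`, `ProductSign`, `productSign` and `buggyProductSign` are hypothetical names introduced only for illustration, and the sketch assumes the product does not overflow (the real code guards that separately with its MaxValBits check). It contrasts the sign rule that treats "non-negative" as "positive" with a rule that keeps the two apart.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the sign knowledge carried by known bits.
struct SignInfo {
  bool Negative;    // value is known to be < 0
  bool NonNegative; // value is known to be >= 0 (it may be zero)
  bool Positive;    // value is known to be > 0
};

enum class ProductSign { Unknown, NonNegative, Negative };

// What can safely be concluded about the sign of LHS * RHS (no overflow assumed).
inline ProductSign productSign(const SignInfo &LHS, const SignInfo &RHS) {
  assert(!(LHS.Negative && LHS.NonNegative) && "contradictory sign info");
  assert(!(RHS.Negative && RHS.NonNegative) && "contradictory sign info");
  if ((LHS.NonNegative && RHS.NonNegative) || (LHS.Negative && RHS.Negative))
    return ProductSign::NonNegative;
  if ((LHS.Negative && RHS.Positive) || (LHS.Positive && RHS.Negative))
    return ProductSign::Negative;
  return ProductSign::Unknown; // e.g. negative * (possibly zero)
}

// The confusion described in the summary: "non-negative" is used as "positive".
inline ProductSign buggyProductSign(const SignInfo &LHS, const SignInfo &RHS) {
  bool LHSPositive = LHS.NonNegative; // wrong: zero is non-negative but not positive
  bool RHSPositive = RHS.NonNegative;
  if ((!LHS.Negative && !LHSPositive) || (!RHS.Negative && !RHSPositive))
    return ProductSign::Unknown;
  bool Negative =
      (LHS.Negative && RHSPositive) || (LHSPositive && RHS.Negative);
  return Negative ? ProductSign::Negative : ProductSign::NonNegative;
}
```

With one operand known negative (such as -5) and the other only known non-negative (so it may be 0), the confused rule reports a negative product, while the careful rule reports that the sign is unknown, which is the only safe answer because -5 * 0 = 0.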

Diff Detail

Event Timeline

ekuznetsov139 created this revision. Nov 17 2019, 5:00 PM

Herald added a project: Restricted Project. Nov 17 2019, 5:00 PM

Herald added subscribers: llvm-commits, hiraditya, t-tye and 8 others.

Needs testcase

In D70367#1749187, @arsenm wrote:

Needs testcase

Do you want me to provide a testcase? And what exactly do you mean by "testcase": an example of the bug, or a unit test for the LLVM test suite that checks for the bug?

As for an example, compile the following:

test.ll (12 KB)

llc -mtriple amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-code-object-v3 -O2 -amdgpu-function-calls=0 -filetype=asm -o test.isa test.ll

The IR code:

%5 = tail call i32 @llvm.amdgcn.workitem.id.x() #28, !range !4
%tid_y = lshr i32 %5, 4
%tid_x = and i32 %5, 15

%y_div_5 = sdiv i32 %tid_y, 5
%6 = mul nsw i32 %y_div_5, -5
%y_mod_5 = add nsw i32 %6, %tid_y
%v1 = add nsw i32 %tid_x, %y_mod_5
%7 = icmp sgt i32 %v1, -2
%spec.select.i51 = select i1 %7, i32 %v1, i32 -2

The corresponding ISA:

v_and_b32_e32 v15, 15, v0
v_lshrrev_b32_e32 v14, 4, v0
s_load_dwordx2 s[0:1], s[4:5], 0x0
v_add3_u32 v0, v14, v15, -8
v_max_i32_e32 v0, -2, v0
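
The mismatch can be checked by hand with a small C++ sketch. This is only an illustration under stated assumptions: it takes the v_and, v_lshrrev, v_add3 and v_max instructions above as the complete lowering of the IR snippet (the s_load is treated as unrelated), and the function names `expectedFromIR` and `computedByISA` are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Reference semantics of the IR snippet for one workitem id.
inline int32_t expectedFromIR(uint32_t Id) {
  int32_t TidY = static_cast<int32_t>(Id >> 4);  // %tid_y = lshr i32 %5, 4
  int32_t TidX = static_cast<int32_t>(Id & 15u); // %tid_x = and i32 %5, 15
  int32_t YDiv5 = TidY / 5;                      // %y_div_5 = sdiv i32 %tid_y, 5
  int32_t YMod5 = YDiv5 * -5 + TidY;             // mul by -5, then add %tid_y
  int32_t V1 = TidX + YMod5;                     // %v1
  return std::max(V1, -2);                       // icmp sgt + select
}

// What the quoted ISA computes for the same workitem id.
inline int32_t computedByISA(uint32_t Id) {
  int32_t V15 = static_cast<int32_t>(Id & 15u); // v_and_b32 v15, 15, v0
  int32_t V14 = static_cast<int32_t>(Id >> 4);  // v_lshrrev_b32 v14, 4, v0
  int32_t V0 = V14 + V15 + (-8);                // v_add3_u32 v0, v14, v15, -8
  return std::max(-2, V0);                      // v_max_i32 v0, -2, v0
}
```

For workitem id 0 the IR yields 0, whereas the instruction sequence yields max(-2, -8) = -2, so the two disagree even for the simplest input.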

I mean this testcase needs to be prepared as a lit test for the test suite in the patch itself

Should probably go in test/CodeGen/AMDGPU/mul_int24.ll and/or test/CodeGen/AMDGPU/mul_uint24.ll to go with similar tests

ekuznetsov139 updated this revision to Diff 229770. Nov 18 2019, 2:23 AM

ekuznetsov139 updated this revision to Diff 229773. Nov 18 2019, 2:25 AM

arsenm added inline comments. Nov 19 2019, 4:40 AM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
16–26	This testcase is far too big. You should be able to reduce it to a handful of instructions, probably these ones

The logic looks correct to me.

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
4455	Space between "if" and "(".
4457	Likewise.

Simpler testcase

ekuznetsov139 updated this revision to Diff 230139. Nov 19 2019, 1:41 PM

arsenm added inline comments. Nov 25 2019, 8:20 PM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
3	FUNC isn’t specified as a check prefix, but there’s also no reason to use a second prefix
4	Run line should be first. Most of the flags to llc can also be removed
5	Should use positive checks. I don’t know what this would exclude since we won’t emit anything with dashes
6	Drop this comment
18	Should probably merge this with the other mul24 test files

ekuznetsov139 updated this revision to Diff 231001. Nov 25 2019, 8:58 PM

ekuznetsov139 added inline comments. Nov 25 2019, 9:01 PM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
5	This would exclude global_store_dword v[0:1], v2, off offset:-128

arsenm added inline comments. Nov 25 2019, 11:17 PM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
5	Ok, that’s not obvious. Positive checks are much less error prone

ekuznetsov139 marked an inline comment as done. Dec 2 2019, 1:07 PM

ekuznetsov139 added inline comments.

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
5	This is a negative test. We are testing to make sure that the optimizer does not assume %v1 to be always equal to -32. A negative check fits right in. A positive check would be harder to write since several different correct instruction sequences could be generated.
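
The values involved can be enumerated with a short C++ sketch of the reduced test's arithmetic. The function names `computeV1` and `possibleV1Values` are hypothetical, and the sketch assumes two's-complement 32-bit integers and the !range of [0, 1024) used by the reduced test.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Mirrors the reduced test:
//   %tid = and i32 %id, 3 ; %1 = mul nsw i32 %tid, -5 ; %v1 = and i32 %1, -32
inline int32_t computeV1(uint32_t WorkitemId) {
  assert(WorkitemId < 1024 && "outside the !range used by the test");
  int32_t Tid = static_cast<int32_t>(WorkitemId & 3u);
  int32_t Product = Tid * -5; // 0, -5, -10 or -15
  return Product & -32;       // keeps only bit 5 and above
}

// Collects every value %v1 can take over the whole workitem id range.
inline std::set<int32_t> possibleV1Values() {
  std::set<int32_t> Values;
  for (uint32_t Id = 0; Id < 1024; ++Id)
    Values.insert(computeV1(Id));
  return Values;
}
```

%v1 is 0 when %tid is 0 and -32 otherwise, so it is not a constant. Scaled by the 4-byte size of a float, a constant -32 corresponds to a byte offset of -128, which is the offset:-128 that the GCN-NOT line is meant to rule out.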

ekuznetsov139 added reviewers: foad, arsenm. Dec 2 2019, 6:30 PM

arsenm added inline comments. Dec 2 2019, 9:59 PM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
5	That’s fine. A negative test is easily breakable and should be avoided

ekuznetsov139 updated this revision to Diff 231956. Dec 3 2019, 11:33 AM

ekuznetsov139 marked 3 inline comments as done.

ekuznetsov139 marked 6 inline comments as done.

arsenm added inline comments. Dec 5 2019, 12:25 AM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
3	Just generate the checks at this point. This is going to successfully match the name of the function

I give up. I don't know what kind of test you want. I'll let someone who knows how to do these things submit a patch.

In D70367#1770168, @ekuznetsov139 wrote:

I give up. I don't know what kind of test you want. I'll let someone who knows how to do these things submit a patch.

Please don't give up! Matt's suggestion is to use exactly the .ll test case you already have, but to run utils/update_llc_test_checks.py on it to automatically generate the CHECK: (or GCN:) lines for it.

In D70367#1770168, @ekuznetsov139 wrote:

I give up. I don't know what kind of test you want. I'll let someone who knows how to do these things submit a patch.

You can either use update_llc_test_checks, or follow the other examples in test/CodeGen/AMDGPU/mul_int24.ll

I don't see how this is better than GCN-NOT:-128, but here you go

LGTM

This revision is now accepted and ready to land. Dec 9 2019, 10:21 AM

Closed by commit rG4f17b1784e94: Fix for AMDGPU MUL_I24 known bits calculation (authored by foad). Dec 16 2019, 2:48 AM

This revision was automatically updated to reflect the committed changes.

Diff 230138

llvm/include/llvm/Support/KnownBits.h

Context not available.
   /// Returns true if this value is known to be non-negative.
   bool isNonNegative() const { return Zero.isSignBitSet(); }
 
+  /// Returns true if this value is known to be positive.
+  bool isStrictlyPositive() const { return Zero.isSignBitSet() && !One.isNullValue(); }
+
   /// Make this value negative.
   void makeNegative() {
     One.setSignBit();
Context not available.

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

Context not available.
 LHSKnown = LHSKnown.trunc(24);
 RHSKnown = RHSKnown.trunc(24);
 
-bool Negative = false;
 if (Opc == AMDGPUISD::MUL_I24) {
   unsigned LHSValBits = 24 - LHSKnown.countMinSignBits();
   unsigned RHSValBits = 24 - RHSKnown.countMinSignBits();
Context not available.
   if (MaxValBits >= 32)
     break;
   bool LHSNegative = LHSKnown.isNegative();
-  bool LHSPositive = LHSKnown.isNonNegative();
+  bool LHSNonNegative = LHSKnown.isNonNegative();
+  bool LHSPositive = LHSKnown.isStrictlyPositive();
   bool RHSNegative = RHSKnown.isNegative();
-  bool RHSPositive = RHSKnown.isNonNegative();
-  if ((!LHSNegative && !LHSPositive) || (!RHSNegative && !RHSPositive))
-    break;
-  Negative = (LHSNegative && RHSPositive) || (LHSPositive && RHSNegative);
-  if (Negative)
-    Known.One.setHighBits(32 - MaxValBits);
-  else
+  bool RHSNonNegative = RHSKnown.isNonNegative();
+  bool RHSPositive = RHSKnown.isStrictlyPositive();
+  if ((LHSNonNegative && RHSNonNegative) || (LHSNegative && RHSNegative))
     Known.Zero.setHighBits(32 - MaxValBits);
+  else if ((LHSNegative && RHSPositive) || (LHSPositive && RHSNegative))
+    Known.One.setHighBits(32 - MaxValBits);
 } else {
   unsigned LHSValBits = 24 - LHSKnown.countMinLeadingZeros();
   unsigned RHSValBits = 24 - RHSKnown.countMinLeadingZeros();
Context not available.

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll

This file was added.


+; FUNC-LABEL: {{^}}test_mul24_knownbits_kernel:
+; RUN: llc -mtriple amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-code-object-v3 -O2 -amdgpu-function-calls=0 < %s | FileCheck --check-prefix=GCN %s
+; GCN-NOT: -128
+; Function Attrs: alwaysinline convergent norecurse nounwind
+define weak_odr amdgpu_kernel void @test_mul24_knownbits_kernel(float addrspace(1)* %p) #4 {
+entry:
+  %0 = tail call i32 @llvm.amdgcn.workitem.id.x() #28, !range !4
+  %tid = and i32 %0, 3
+  %1 = mul nsw i32 %tid, -5
+  %v1 = and i32 %1, -32
+  %v2 = sext i32 %v1 to i64
+  %v3 = getelementptr inbounds float, float addrspace(1)* %p, i64 %v2
+  store float 0.000, float addrspace(1)* %v3, align 4
+  ret void
+}
+
+; Function Attrs: nounwind readnone speculatable
+declare i32 @llvm.amdgcn.workitem.id.x() #20
+
+!4 = !{i32 0, i32 1024}

llvm/test/CodeGen/AMDGPU/mul_int24.ll

Context not available.
   ret void
 
 }
 
+
+; FUNC-LABEL: {{^}}test_mul24_knownbits_kernel:
+; RUN: llc -mtriple amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-code-object-v3 -O2 -amdgpu-function-calls=0 < %s | FileCheck --check-prefix=GCN %s
+; GCN-NOT: -128
+; Function Attrs: alwaysinline convergent norecurse nounwind
+define weak_odr amdgpu_kernel void @test_mul24_knownbits_kernel(float addrspace(1)* %p) #4 {
+entry:
+  %0 = addrspacecast float* %p to float addrspace(1)*
+  %1 = tail call i32 @llvm.amdgcn.workitem.id.x() #28, !range !4
+  %tid = and i32 %1, 3
+  %2 = mul nsw i32 %tid, -5
+  %v1 = and i32 %2, -32
+  %v2 = sext i32 %v1 to i64
+  %v3 = getelementptr inbounds float, float addrspace(1)* %0, i64 %v2
+  store float 0.000, float addrspace(1)* %v3, align 4
+  ret void
+}
+
+; Function Attrs: nounwind readnone speculatable
+declare i32 @llvm.amdgcn.workitem.id.x() #20
+
 attributes #0 = { nounwind }
+!4 = !{i32 0, i32 1024}
Context not available.

This is an archive of the discontinued LLVM Phabricator instance.

Fix for AMDGPU MUL_I24 known bits calculation
Closed, Public
