Download Raw Diff

Details

Reviewers

foad
arsenm

Commits

rG4f17b1784e94: Fix for AMDGPU MUL_I24 known bits calculation

Summary

At present, the code calculating known bits of AMDGPU MUL_I24 confuses the concepts of "non-negative number" and "positive number".

In some situations, it results in incorrect code. I have a case where the optimizer replaces the result of calculating MUL_I24(-5, 0) with -8.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ekuznetsov139 created this revision.Nov 17 2019, 5:00 PM

Herald added a project: Restricted Project. · View Herald TranscriptNov 17 2019, 5:00 PM

Herald added subscribers: llvm-commits, hiraditya, t-tye and 8 others. · View Herald Transcript

Needs testcase

In D70367#1749187, @arsenm wrote:

Needs testcase

Do you want me to provide a testcase? and what exactly do you mean by "testcase" - to supply an example of the bug, or to write a unit test for llvm testing for the bug?

As far as an example, compile the following

test.ll12 KBDownload

llc -mtriple amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-code-object-v3 -O2 -amdgpu-function-calls=0 -filetype=asm -o test.isa test.ll

The IR code:

%5 = tail call i32 @llvm.amdgcn.workitem.id.x() #28, !range !4
%tid_y = lshr i32 %5, 4
%tid_x = and i32 %5, 15

%y_div_5 = sdiv i32 %tid_y, 5
%6 = mul nsw i32 %y_div_5, -5
%y_mod_5 = add nsw i32 %6, %tid_y
%v1 = add nsw i32 %tid_x, %y_mod_5
%7 = icmp sgt i32 %v1, -2
%spec.select.i51 = select i1 %7, i32 %v1, i32 -2

The corresponding ISA:

v_and_b32_e32 v15, 15, v0
v_lshrrev_b32_e32 v14, 4, v0
s_load_dwordx2 s[0:1], s[4:5], 0x0
v_add3_u32 v0, v14, v15, -8
v_max_i32_e32 v0, -2, v0

In D70367#1749255, @ekuznetsov139 wrote:

As far as an example, compile the following

test.ll12 KBDownload

llc -mtriple amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-code-object-v3 -O2 -amdgpu-function-calls=0 -filetype=asm -o test.isa test.ll

The IR code:

%5 = tail call i32 @llvm.amdgcn.workitem.id.x() #28, !range !4
%tid_y = lshr i32 %5, 4
%tid_x = and i32 %5, 15

%y_div_5 = sdiv i32 %tid_y, 5
%6 = mul nsw i32 %y_div_5, -5
%y_mod_5 = add nsw i32 %6, %tid_y
%v1 = add nsw i32 %tid_x, %y_mod_5
%7 = icmp sgt i32 %v1, -2
%spec.select.i51 = select i1 %7, i32 %v1, i32 -2

The corresponding ISA:

v_and_b32_e32 v15, 15, v0
v_lshrrev_b32_e32 v14, 4, v0
s_load_dwordx2 s[0:1], s[4:5], 0x0
v_add3_u32 v0, v14, v15, -8
v_max_i32_e32 v0, -2, v0

I mean this testcase needs to be prepared as a lit test for the test suite in the patch itself

In D70367#1749263, @arsenm wrote:
In D70367#1749255, @ekuznetsov139 wrote:
As far as an example, compile the following

test.ll12 KBDownload
llc -mtriple amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-code-object-v3 -O2 -amdgpu-function-calls=0 -filetype=asm -o test.isa test.ll
The IR code:
%5 = tail call i32 @llvm.amdgcn.workitem.id.x() #28, !range !4
%tid_y = lshr i32 %5, 4
%tid_x = and i32 %5, 15

%y_div_5 = sdiv i32 %tid_y, 5
%6 = mul nsw i32 %y_div_5, -5
%y_mod_5 = add nsw i32 %6, %tid_y
%v1 = add nsw i32 %tid_x, %y_mod_5
%7 = icmp sgt i32 %v1, -2
%spec.select.i51 = select i1 %7, i32 %v1, i32 -2
The corresponding ISA:
v_and_b32_e32 v15, 15, v0
v_lshrrev_b32_e32 v14, 4, v0
s_load_dwordx2 s[0:1], s[4:5], 0x0
v_add3_u32 v0, v14, v15, -8
v_max_i32_e32 v0, -2, v0
I mean this testcase needs to be prepared as a lit test for the test suite in the patch itself

Should probably go in test/CodeGen/AMDGPU/mul_int24.ll and/or test/CodeGen/AMDGPU/mul_uint24.ll to go with similar tests

ekuznetsov139 updated this revision to Diff 229770.Nov 18 2019, 2:23 AM

ekuznetsov139 updated this revision to Diff 229773.Nov 18 2019, 2:25 AM

arsenm added inline comments.Nov 19 2019, 4:40 AM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
15–25 ↗	(On Diff #229773)	This testcase is far too big. You should be able to reduce it to a handful of instsructions, probably these ones

The logic looks correct to me.

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
4455	Space between "if" and "(".
4457	Likewise.

Simpler testcase

ekuznetsov139 updated this revision to Diff 230139.Nov 19 2019, 1:41 PM

arsenm added inline comments.Nov 25 2019, 8:20 PM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
2 ↗	(On Diff #230139)	FUNC isn’t specified as a check prefix, but there’s also no reason to use a second prefix
3 ↗	(On Diff #230139)	Run line should be first. Most of the flags to llc can also be removed
4 ↗	(On Diff #230139)	Should use positive checks. I don’t know what this would exclude since we won’t emit anything with dashes
5 ↗	(On Diff #230139)	Drop this comment
17 ↗	(On Diff #230139)	Should probably merge this with the other mul24 test files

ekuznetsov139 updated this revision to Diff 231001.Nov 25 2019, 8:58 PM

ekuznetsov139 added inline comments.Nov 25 2019, 9:01 PM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
4 ↗	(On Diff #230139)	This would exclude global_store_dword v[0:1], v2, off offset:-128

arsenm added inline comments.Nov 25 2019, 11:17 PM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
4 ↗	(On Diff #230139)	Ok, that’s not obvious. Positive checks are much less error prone

ekuznetsov139 marked an inline comment as done.Dec 2 2019, 1:07 PM

ekuznetsov139 added inline comments.

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
4 ↗	(On Diff #230139)	This is a negative test. We are testing to make sure that the optimizer does not assume %v1 to be always equal to -32. A negative check fits right in. A positive check would be harder to write since multiple possible correct codes could be generated.

ekuznetsov139 added reviewers: foad, arsenm.Dec 2 2019, 6:30 PM

arsenm added inline comments.Dec 2 2019, 9:59 PM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
4 ↗	(On Diff #230139)	That’s fine. A negative test is easily breakable and should be avoided

ekuznetsov139 updated this revision to Diff 231956.Dec 3 2019, 11:33 AM

ekuznetsov139 marked 3 inline comments as done.

ekuznetsov139 marked 6 inline comments as done.

arsenm added inline comments.Dec 5 2019, 12:25 AM

llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
2 ↗	(On Diff #231956)	Just generate the checks at this point. This is going to successfully match the name of the function

I give up. I don't know what kind of test you want. I'll let someone who knows how to do these things submit a patch.

In D70367#1770168, @ekuznetsov139 wrote:

I give up. I don't know what kind of test you want. I'll let someone who knows how to do these things submit a patch.

Please don't give up! Matt's suggestion is to use exactly the .ll test case you already have, but to run utils/update_llc_test_checks.py on it to automatically generate the CHECK: (or GCN:) lines for it.

In D70367#1770168, @ekuznetsov139 wrote:

I give up. I don't know what kind of test you want. I'll let someone who knows how to do these things submit a patch.

You can either use updatet_llc_test_checks, or follow the other examples in tesst/CodeGen/AMDGPU/mul_int24.ll

I don't see how this is better than GCN-NOT:-128, but here you go

LGTM

This revision is now accepted and ready to land.Dec 9 2019, 10:21 AM

Closed by commit rG4f17b1784e94: Fix for AMDGPU MUL_I24 known bits calculation (authored by foad). · Explain WhyDec 16 2019, 2:48 AM

This revision was automatically updated to reflect the committed changes.

This is an archive of the discontinued LLVM Phabricator instance.

Fix for AMDGPU MUL_I24 known bits calculation
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 229734

llvm/include/llvm/Support/KnownBits.h

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

This is an archive of the discontinued LLVM Phabricator instance.

Fix for AMDGPU MUL_I24 known bits calculationClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 229734

llvm/include/llvm/Support/KnownBits.h

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

Fix for AMDGPU MUL_I24 known bits calculation
ClosedPublic