This is an archive of the discontinued LLVM Phabricator instance.

Fix for AMDGPU MUL_I24 known bits calculation
ClosedPublic

Authored by ekuznetsov139 on Nov 17 2019, 5:00 PM.

Details

Summary

At present, the code calculating known bits of AMDGPU MUL_I24 confuses the concepts of "non-negative number" and "positive number".

In some situations, it results in incorrect code. I have a case where the optimizer replaces the result of calculating MUL_I24(-5, 0) with -8.

Diff Detail

Event Timeline

ekuznetsov139 created this revision.Nov 17 2019, 5:00 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 17 2019, 5:00 PM

Needs testcase

Needs testcase

Do you want me to provide a testcase? and what exactly do you mean by "testcase" - to supply an example of the bug, or to write a unit test for llvm testing for the bug?

As far as an example, compile the following

llc -mtriple amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-code-object-v3 -O2 -amdgpu-function-calls=0 -filetype=asm -o test.isa test.ll

The IR code:

%5 = tail call i32 @llvm.amdgcn.workitem.id.x() #28, !range !4
%tid_y = lshr i32 %5, 4
%tid_x = and i32 %5, 15

%y_div_5 = sdiv i32 %tid_y, 5
%6 = mul nsw i32 %y_div_5, -5
%y_mod_5 = add nsw i32 %6, %tid_y
%v1 = add nsw i32 %tid_x, %y_mod_5
%7 = icmp sgt i32 %v1, -2
%spec.select.i51 = select i1 %7, i32 %v1, i32 -2

The corresponding ISA:

v_and_b32_e32 v15, 15, v0
v_lshrrev_b32_e32 v14, 4, v0
s_load_dwordx2 s[0:1], s[4:5], 0x0
v_add3_u32 v0, v14, v15, -8
v_max_i32_e32 v0, -2, v0

As far as an example, compile the following

llc -mtriple amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-code-object-v3 -O2 -amdgpu-function-calls=0 -filetype=asm -o test.isa test.ll

The IR code:

%5 = tail call i32 @llvm.amdgcn.workitem.id.x() #28, !range !4
%tid_y = lshr i32 %5, 4
%tid_x = and i32 %5, 15

%y_div_5 = sdiv i32 %tid_y, 5
%6 = mul nsw i32 %y_div_5, -5
%y_mod_5 = add nsw i32 %6, %tid_y
%v1 = add nsw i32 %tid_x, %y_mod_5
%7 = icmp sgt i32 %v1, -2
%spec.select.i51 = select i1 %7, i32 %v1, i32 -2

The corresponding ISA:

v_and_b32_e32 v15, 15, v0
v_lshrrev_b32_e32 v14, 4, v0
s_load_dwordx2 s[0:1], s[4:5], 0x0
v_add3_u32 v0, v14, v15, -8
v_max_i32_e32 v0, -2, v0

I mean this testcase needs to be prepared as a lit test for the test suite in the patch itself

As far as an example, compile the following

llc -mtriple amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-code-object-v3 -O2 -amdgpu-function-calls=0 -filetype=asm -o test.isa test.ll

The IR code:

%5 = tail call i32 @llvm.amdgcn.workitem.id.x() #28, !range !4
%tid_y = lshr i32 %5, 4
%tid_x = and i32 %5, 15

%y_div_5 = sdiv i32 %tid_y, 5
%6 = mul nsw i32 %y_div_5, -5
%y_mod_5 = add nsw i32 %6, %tid_y
%v1 = add nsw i32 %tid_x, %y_mod_5
%7 = icmp sgt i32 %v1, -2
%spec.select.i51 = select i1 %7, i32 %v1, i32 -2

The corresponding ISA:

v_and_b32_e32 v15, 15, v0
v_lshrrev_b32_e32 v14, 4, v0
s_load_dwordx2 s[0:1], s[4:5], 0x0
v_add3_u32 v0, v14, v15, -8
v_max_i32_e32 v0, -2, v0

I mean this testcase needs to be prepared as a lit test for the test suite in the patch itself

Should probably go in test/CodeGen/AMDGPU/mul_int24.ll and/or test/CodeGen/AMDGPU/mul_uint24.ll to go with similar tests

arsenm added inline comments.Nov 19 2019, 4:40 AM
llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
16–26

This testcase is far too big. You should be able to reduce it to a handful of instsructions, probably these ones

foad added a subscriber: foad.Nov 19 2019, 12:32 PM

The logic looks correct to me.

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
4455

Space between "if" and "(".

4457

Likewise.

Simpler testcase

arsenm added inline comments.Nov 25 2019, 8:20 PM
llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
3

FUNC isn’t specified as a check prefix, but there’s also no reason to use a second prefix

4

Run line should be first. Most of the flags to llc can also be removed

5

Should use positive checks. I don’t know what this would exclude since we won’t emit anything with dashes

6

Drop this comment

18

Should probably merge this with the other mul24 test files

ekuznetsov139 added inline comments.Nov 25 2019, 9:01 PM
llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
5

This would exclude

global_store_dword v[0:1], v2, off offset:-128
arsenm added inline comments.Nov 25 2019, 11:17 PM
llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
5

Ok, that’s not obvious. Positive checks are much less error prone

ekuznetsov139 marked an inline comment as done.Dec 2 2019, 1:07 PM
ekuznetsov139 added inline comments.
llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
5

This is a negative test. We are testing to make sure that the optimizer does not assume %v1 to be always equal to -32. A negative check fits right in. A positive check would be harder to write since multiple possible correct codes could be generated.

arsenm added inline comments.Dec 2 2019, 9:59 PM
llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
5

That’s fine. A negative test is easily breakable and should be avoided

ekuznetsov139 marked 3 inline comments as done.
ekuznetsov139 marked 6 inline comments as done.
arsenm added inline comments.Dec 5 2019, 12:25 AM
llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
3

Just generate the checks at this point. This is going to successfully match the name of the function

I give up. I don't know what kind of test you want. I'll let someone who knows how to do these things submit a patch.

foad added a comment.Dec 5 2019, 1:46 AM

I give up. I don't know what kind of test you want. I'll let someone who knows how to do these things submit a patch.

Please don't give up! Matt's suggestion is to use exactly the .ll test case you already have, but to run utils/update_llc_test_checks.py on it to automatically generate the CHECK: (or GCN:) lines for it.

I give up. I don't know what kind of test you want. I'll let someone who knows how to do these things submit a patch.

You can either use updatet_llc_test_checks, or follow the other examples in tesst/CodeGen/AMDGPU/mul_int24.ll

I don't see how this is better than GCN-NOT:-128, but here you go

arsenm accepted this revision.Dec 9 2019, 10:21 AM

LGTM

This revision is now accepted and ready to land.Dec 9 2019, 10:21 AM
This revision was automatically updated to reflect the committed changes.