This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Avoid using s_cmpk when src0 is not register
ClosedPublic

Authored by ruiling on Jul 1 2020, 8:00 PM.

Details

Reviewers
arsenm
nhaehnle
Summary

The hardware spec require src0 of s_cmpk should be a register. So, we
should not optimize s_cmp to s_cmpk if src0 is not register.

Diff Detail

Event Timeline

ruiling created this revision.Jul 1 2020, 8:00 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2020, 8:00 PM
arsenm added inline comments.Jul 1 2020, 8:14 PM
llvm/test/CodeGen/AMDGPU/cmp_shrink.mir
6

positive checks are more useful. Also you can just generate these checks. Can you reproduce this with an IR test too?

ruiling marked an inline comment as done.Jul 1 2020, 8:59 PM
ruiling added inline comments.
llvm/test/CodeGen/AMDGPU/cmp_shrink.mir
6

will try positive check, how to generate the checks? could you give a little bit more info? The original test case that hit the issue is over-complex I think. Normally, a constant expression at IR level is easy to be optimized off by the middle-end. so I think a .mir test is enough for this issue.

ruiling updated this revision to Diff 275003.Jul 1 2020, 9:28 PM

auto-generate the checks in test

arsenm accepted this revision.Jul 6 2020, 7:05 AM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/cmp_shrink.mir
6

So what is the context this appears? Why wasn't it optimized out?

This revision is now accepted and ready to land.Jul 6 2020, 7:05 AM

Did you see @arsenm's comment?

ruiling marked an inline comment as done.Jul 6 2020, 5:28 PM
ruiling added inline comments.
llvm/test/CodeGen/AMDGPU/cmp_shrink.mir
6

well I didn't carefully check the program yet to understand why the optimization algorithms in llvm fails to optimize the program. but I think that is another problem that worth a careful investigation. I will investigate and try to optimize it off later. But I think this patch can be merged, right? can anyone help to merge? I don't have commit access.

ruiling marked an inline comment as done.Jul 7 2020, 2:21 AM
ruiling added inline comments.
llvm/test/CodeGen/AMDGPU/cmp_shrink.mir
6

@arsenm The issue occurs when running vulkancts again AMD open-source vulkan driver. I did a little more check, the test-case has ~40 BBs and lots of phi instructions which is later simplified and proved to be constant. And the problem may be because LLPC choose a subset of llvm optimization passes considering compilation time (https://github.com/GPUOpen-Drivers/llpc/blob/dev/lgc/patch/Patch.cpp#L223). I tried use LLVM standard set of passes, the constant was optimized off. I think the optimization passes for vulkan may need further tuning to reach a better trade-off between compile-time and quality of generated code.

ping. can you help push the patch? @arsenm @nhaehnle

foad added a subscriber: foad.Jul 13 2020, 6:42 AM

ping. can you help push the patch? @arsenm @nhaehnle

Do you have commit access? If not, I can commit this. See https://llvm.org/docs/DeveloperPolicy.html#id16 if you want to request access.

@foad I don't have commit access, please go help push the patch if you think this is ok. I will apply commit access after some patches accepted. Thanks for pointing the useful link:)

foad closed this revision.Jul 14 2020, 1:17 AM

Committed in 1658b8d7ddb65eb78e1304b009f1043ab6d9463f (sorry I forgot to put a link to this review in the commit message).