This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Clear kill flags when optimizing vcmp save exec sequence
ClosedPublic

Authored by kzhuravl on Jun 14 2022, 10:35 AM.

Details

Summary

It was causing bad machine code for several blender scenes:

*** Bad machine code: Using an undefined physical register ***
- function:    kernel_holdout_emission_blurring_pathtermination_ao
- basic block: %bb.28 if.end40.i (0x7f84861a2320)
- instruction: V_CMPX_EQ_U32_nosdst_e64 0, $vgpr3, implicit-def $exec, implicit $exec
- operand 1:   $vgpr3

Diff Detail

Event Timeline

kzhuravl created this revision.Jun 14 2022, 10:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2022, 10:35 AM
kzhuravl requested review of this revision.Jun 14 2022, 10:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2022, 10:35 AM
Herald added a subscriber: wdng. · View Herald Transcript
kzhuravl edited the summary of this revision. (Show Details)Jun 14 2022, 10:36 AM

Why not to remove original v_cmp instead? It should have no other uses.

Why not to remove original v_cmp instead? It should have no other uses.

Original v_cmp_eq_u32 got removed in favor of v_cmpx_eq_u32. v_cmp_ne_u32 is left as is.

Why not to remove original v_cmp instead? It should have no other uses.

v_cmp result is used as source for the saveexec (mostly vcc).

Looking good to me.

Why not to remove original v_cmp instead? It should have no other uses.

Original v_cmp_eq_u32 got removed in favor of v_cmpx_eq_u32. v_cmp_ne_u32 is left as is.

OK, I see. That's another instruction in between kills it..

What if src1 is not register?

Why not to remove original v_cmp instead? It should have no other uses.

Original v_cmp_eq_u32 got removed in favor of v_cmpx_eq_u32. v_cmp_ne_u32 is left as is.

OK, I see. That's another instruction in between kills it..

What if src1 is not register?

Fixed and added test for the non register case. Thanks!

kzhuravl updated this revision to Diff 436893.Jun 14 2022, 12:39 PM

Handle non register case and add a test for it.

This revision is now accepted and ready to land.Jun 14 2022, 12:40 PM
tsymalla accepted this revision.Jun 14 2022, 12:52 PM
arsenm added inline comments.Jun 14 2022, 1:40 PM
llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
493

Can you get away with clearing it on the one operand?

kzhuravl updated this revision to Diff 438434.Jun 20 2022, 10:01 AM
kzhuravl marked an inline comment as done.

Clear kill flags for Src0 if its a reg.

llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
493

In the case I was looking at - yes. However, better to clear on both. Updated and added the test.

@arsenm, any additional feedback?

arsenm accepted this revision.Jun 24 2022, 7:12 AM