Page MenuHomePhabricator

[AMDGPU] SI_INDIRECT_DST_V* pseudos expansion should place EXEC restore to separate basic block
ClosedPublic

Authored by alex-t on Mar 2 2020, 10:23 AM.

Details

Summary

When SI_INDIRECT_DST_V* pseudos has indexes in VGPR, they get expanded into the self-looped basic block that modifies EXEC in a loop.

To keep EXEC consistent it is stored before and then re-stored after the pseudo expansion result.

%95:vreg_512 = SI_INDIRECT_DST_V16 %93:vreg_512(tied-def 0), %94:sreg_32, 0, killed %1500:vgpr_32

results to

s_mov_b64 s[6:7], exec

BB0_16:

v_readfirstlane_b32 s8, v28
v_cmp_eq_u32_e32 vcc, s8, v28
s_and_saveexec_b64 vcc, vcc
s_set_gpr_idx_on s8, gpr_idx(DST)
v_mov_b32_e32 v6, v25
s_set_gpr_idx_off
s_xor_b64 exec, exec, vcc
s_cbranch_execnz BB0_16

; %bb.17:

s_mov_b64 exec, s[6:7]

The bug appeared in case this expansion occurs in the ELSE block of the CF.

Originally

%110:vreg_512 = SI_INDIRECT_DST_V16 %103:vreg_512(tied-def 0), %85:vgpr_32, 0, %107:vgpr_32,
 %112:sreg_64 = SI_ELSE %108:sreg_64, %bb.19, 0, implicit-def dead $exec, implicit-def dead $scc, implicit $exec

expanded to

  • <== here exec has "THEN" context

    s_mov_b64 s[6:7], exec

BB0_16:

v_readfirstlane_b32 s8, v28
v_cmp_eq_u32_e32 vcc, s8, v28
s_and_saveexec_b64 vcc, vcc
s_set_gpr_idx_on s8, gpr_idx(DST)
v_mov_b32_e32 v6, v25
s_set_gpr_idx_off
s_xor_b64 exec, exec, vcc
s_cbranch_execnz BB0_16

; %bb.17:

s_or_saveexec_b64 s[4:5], s[4:5]   <-- exec mask is restored for "ELSE" but immediately overwritten.
s_mov_b64 exec, s[6:7]

The rest of the "ELSE" block is executed not by the workitems which constitute the "else mask" but by those which constitute "then mask"

SILowerControlFlow::emitElse always considers the basic block begin() as an insertion point for s_or_saveexec.

Proposed fix: The SI_INDIRECT_DST_V* procedure should split the reminder block to create landing pad for the EXEC restoration.

Diff Detail

Event Timeline

alex-t created this revision.Mar 2 2020, 10:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2020, 10:23 AM

Needs tests

alex-t updated this revision to Diff 247693.Mar 2 2020, 11:13 AM

Test updated

vpykhtin accepted this revision.Mar 4 2020, 6:05 AM

I cannot follow all the consequences of adding the landing pad, but this looks a robust solution since it's hard to find appropriate insertion point for s_or_saveexec instruction in the beginning of SI_ELSE constaining block.

This revision is now accepted and ready to land.Mar 4 2020, 6:05 AM
This revision was automatically updated to reflect the committed changes.