This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Don't clobber source register for V_SET_INACTIVE_*
ClosedPublic

Authored by ruiling on Jan 17 2022, 6:43 AM.

Details

Summary

The WWM register has unmodeled register liveness, For v_set_inactive_*,
clobberring source register is dangerous because it will overwrite the
inactive lanes. When the source vgpr is dead at v_set_inactive_lane,
the inactive lanes may be not really dead. This may make common
optimizations doing wrong.

For example in a simple if-then cfg in Machine IR:
bb.if:

%src =

bb.then:

%src1 = COPY %src
%dst = V_SET_INACTIVE %src1(tied-def 0), %inactive

bb.end

... = PHI [0, %bb.then] [%src, %bb.if]

The register coalescer will think it is safe to optimize "%src1 = COPY %src"
in bb.then. And at the same time, there is no interference for the PHI in
bb.end. The source and destination values of the PHI will be assigned
the same register. The single PHI register will be overwritten by the
v_set_inactive, then we would get wrong value in bb.end.

With this change, we will copy the content of the source register before
setting inactive lanes after register allocation. Yes, this will sacrifice
the WWM code generation a little, but I don't have any better idea to do things
correctly.

Diff Detail

Event Timeline

ruiling created this revision.Jan 17 2022, 6:43 AM
ruiling requested review of this revision.Jan 17 2022, 6:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2022, 6:43 AM
foad added inline comments.Jan 17 2022, 7:21 AM
llvm/test/CodeGen/AMDGPU/set-inactive-wwm-overwrite.ll
1

Can you precommit this test so we can see the codegen changes?

ruiling updated this revision to Diff 400679.Jan 17 2022, 5:05 PM

Move tests into separate change D117527

Ping. This is used to fix issue from real app. Can anyone take a close look?

llvm/test/CodeGen/AMDGPU/set-inactive-wwm-overwrite.ll
19

The over-write to inactive lanes of v1 will cause the later buffer_store_dword v0, v1 in bb .end write wrong value out.

If I understand correctly, the root issue is the Register Coalescer not being aware of the wave-level CFG?
It is unfortunate that we have to add these copies, but I agree this seems the simplest fix right now.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
1890

Perhaps we could add a comment above this e.g.
// We remove this COPY if/when optimizations (e.g. Register Coalescer) are made aware of the wave-level CFG.

llvm/lib/Target/AMDGPU/SIInstructions.td
183

This is just tidy up?

185

This removes the tie constraint, correct?

If I understand correctly, the root issue is the Register Coalescer not being aware of the wave-level CFG?

Actually it is because the WWM register have a weird liveness:( I don't have a good idea to model it. Even with wave-level CFG, it is still hard, right?

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
1890

If I understand correctly, how to model WWM register liveness is still an open question even with wave-level CFG.

llvm/lib/Target/AMDGPU/SIInstructions.td
183

No, this is a small optimization, we don't need the source to be in VGPR now. Both SGPR/VGPR source operand works. Without this line change, there would be one more COPY from SGPR to VGPR.

185

Yes

foad accepted this revision.Jan 26 2022, 1:16 AM

I don't know much about WWM codegen, but I'll approve it because the fix looks safe, and it definitely seems to fix a bug in set-inactive-wwm-overwrite.ll. Thanks.

This revision is now accepted and ready to land.Jan 26 2022, 1:16 AM
This revision was landed with ongoing or failed builds.Feb 5 2022, 8:38 PM
This revision was automatically updated to reflect the committed changes.