This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Always split blocks for si_end_cf
Accepted · Public

Authored by arsenm on Mar 5 2023, 12:47 PM.

Details

Reviewers
rampitec
critson
foad
nhaehnle
cdevadas
Group Reviewers
Restricted Project
Summary

This is to fix incorrect VGPR spill placement with fastregalloc.
If SGPR spills used for the exec mask were inserted by the first
regalloc run, those instructions would break the block prolog. The
second regalloc run would then incorrectly insert VGPR spills
before the point where exec was set up.
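A pseudo-MIR sketch of the failure mode (block numbers, stack slots, and opcode spellings are illustrative, not taken from the actual reproducer):

```
; After the first (SGPR) regalloc run, the reload feeding the exec
; restore sits at the top of the join block:
bb.2:
  $sgpr0_sgpr1 = SI_SPILL_S64_RESTORE %stack.0   ; reload of saved exec bits
  $exec = S_OR_B64 $exec, killed $sgpr0_sgpr1    ; SI_END_CF: rejoin threads

; The second (VGPR) regalloc run treats the block top as a safe
; insertion point, so a VGPR reload can land BEFORE exec is restored:
bb.2:
  $vgpr0 = SI_SPILL_V32_RESTORE %stack.1         ; runs with the wrong exec mask
  $sgpr0_sgpr1 = SI_SPILL_S64_RESTORE %stack.0
  $exec = S_OR_B64 $exec, killed $sgpr0_sgpr1
```

With the split, the SI_END_CF sequence gets its own block, so later insertions at a block top always execute after exec has been restored.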

Fixes #61083

Diff Detail

Event Timeline

arsenm created this revision.Mar 5 2023, 12:47 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 5 2023, 12:47 PM
arsenm requested review of this revision.Mar 5 2023, 12:47 PM
Herald added a subscriber: wdng. · View Herald Transcript

Unfortunately this interferes with WQM mode change insertion.
You can see this in the reordered s_or + s_and instruction pairs.
I guess this was always a risk with block splitting.

Seems like we need to modify the WQM pass to handle terminators that modify exec.
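For example (hypothetical sequence; the register choices are illustrative): the intended order restores exec at the control-flow join before WQM drops to exact mode, but with the split the pair can come out swapped:

```
; intended order
s_or_b64  exec, exec, s[0:1]   ; SI_END_CF: restore threads at the join
s_and_b64 exec, exec, s[2:3]   ; WQM: enter exact mode

; observed after block splitting: the WQM insertion point lands too
; early, swapping the pair
s_and_b64 exec, exec, s[2:3]
s_or_b64  exec, exec, s[0:1]
```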

llvm/test/CodeGen/AMDGPU/collapse-endcf.ll
264

Is this reordering fixing the bug mentioned in the description?
(Exec mask is restored before buffer_load, rather than after.)

arsenm added a comment.Mar 6 2023, 5:07 AM

I was assuming WQM needs to split blocks more aggressively itself to avoid the same problems.

llvm/test/CodeGen/AMDGPU/collapse-endcf.ll
264

Yes, previously we would only correctly handle spills used for the exec source value, not other spills.

I know we've talked about this in the past, but I'm not a fan because it's difficult to justify with a "clean" foundational semantics story.

I think a clean story would be:

  • Every basic block has a canonical exec mask
  • Basic blocks can have prologs and epilogs in which the canonical exec mask doesn't apply. (SI_END_CF is the prolog case.)

If SI_END_CF is split off, the new basic block containing the SI_END_CF still doesn't have a canonical exec mask that makes sense.
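Roughly, in pseudo-MIR (block names illustrative):

```
; Before splitting: SI_END_CF is a prolog of bb.1; the body runs
; under bb.1's canonical exec mask.
bb.1:
  SI_END_CF %saved_mask    ; prolog: still runs under the predecessor's exec
  <body>                   ; canonical exec applies from here

; After splitting: the new block holding SI_END_CF runs under the
; pre-join exec, while its terminator changes exec for the fallthrough
; successor, so no single canonical mask describes it either.
bb.1a:
  SI_END_CF %saved_mask
bb.1b:
  <body>
```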

Yes, what I'm thinking of as a clean story would require changing the register allocator a little.

OK, so we'll leave things broken for at least 5 more years. Terminators and phis are the only insertion points we can reasonably expect to work today, so I think we just need to do this. We could explore better options whenever we get around to trying to explicitly track both CFGs in the IR at the same time. Right now I think it's a lower barrier to entry to teach more passes to understand consecutive fall-through blocks.

My intention is to sit down and try make WQM work with this.
It is a non-trivial change, so no promises, but I will try to look at it in the next few days.

ping on this

Sorry for the delay. I believe D151797 should allow this to proceed.

I will also look at adding block splitting to WQM, but I need to take care with this as it can have some unintended effects on reg alloc.
In general we have not had to worry about spilling correctness in graphics for pixel shaders (where WQM occurs), as spilling is generally a no-go for good performance in graphics.

arsenm updated this revision to Diff 528807.Jun 6 2023, 5:26 AM

Rebase on top of wqm patches

critson accepted this revision.Jun 8 2023, 10:04 PM

LGTM w.r.t. WQM test changes.

I will run some local testing for verification, but feel free to proceed with this.

This revision is now accepted and ready to land.Jun 8 2023, 10:04 PM
ye-luo added a subscriber: ye-luo.Jun 9 2023, 6:02 PM

ping @arsenm could you finalize this patch and close https://github.com/llvm/llvm-project/issues/61083?

Testing found a new crash in blender so I need to look at this more

Any progress?

I'm having a really hard time getting the last version of Blender that supports OpenCL to build. However, I have found a few machine verifier errors from its kernels, which will hopefully happen to fix the precheck crash.

The good news: the blender failure has seemingly disappeared (I'd assume it's related to the spill patches, but I'm not going to bother tracking down why).
The bad news: now a few rocBLAS tests fail.