Page MenuHomePhabricator

[AMDGPU] Fix shortfalls in WQM marking
ClosedPublic

Authored by critson on Mar 14 2021, 5:59 PM.

Details

Summary

When tracking defined lanes through phi nodes in the live range
graph each branch of the phi must be handled independently.
Also rewrite the marking algorithm to reduce unnecessary
operations.

Previously a shared set of defined lanes was used which caused
marking to stop prematurely. This was observable in existing lit
tests, but test patterns did not cover this detail.

Diff Detail

Event Timeline

critson created this revision.Mar 14 2021, 5:59 PM
critson requested review of this revision.Mar 14 2021, 5:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2021, 5:59 PM
foad added a comment.Mar 15 2021, 2:28 AM

Just some nits inline.

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
367–368

I think you could use a SetVector here instead of a separate set and vector.

375

"predecessors"

423

PhiStack.back()

critson updated this revision to Diff 330586.Mar 15 2021, 2:49 AM
  • Switch to SmallSetVector
  • Address other review comments
piotr added inline comments.Mar 15 2021, 3:33 AM
llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
349

I think it would be clearer if you mentioned that NextPredIdx is only needed for the phi case, as opposed to DefinedLanes for instance. Can you please add a comment or change its name to include "phi" somehow (whatever seems right for you)?

368–370

Instead of starting with Idx = 0 and skipping, can you start with Idx = NextPredIdx? That would remove the need for the "continue".

critson added inline comments.Mar 15 2021, 4:36 AM
llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
349

Yep, I can rename this variable.

368–370

What is happening here is we are skipping through the list of predecessor blocks to the correct number. I do not think there is a way to access a specific predecessor by index?

critson updated this revision to Diff 330610.Mar 15 2021, 4:41 AM
  • Add comment to NextPredIdx
piotr accepted this revision.Mar 15 2021, 5:05 AM
piotr added inline comments.
llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
368–370

auto PI = MBB->pred_begin() + NextPredIdx; ?

This revision is now accepted and ready to land.Mar 15 2021, 5:05 AM
critson marked 6 inline comments as done.Mar 15 2021, 5:23 AM
critson added inline comments.
llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
368–370

Thanks!
I didn't know you could do that with iterators.
(You can see the limits of my C++ knowledge.)

critson updated this revision to Diff 330617.Mar 15 2021, 5:31 AM
critson marked an inline comment as done.
  • Address comments
piotr added inline comments.Mar 15 2021, 5:38 AM
llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
365

To keep the original behaviour, you also need to adjust Idx here and assign it to NextPredIdx, because Idx is used after the for loop in line 377, right?

critson marked an inline comment as done.Mar 15 2021, 5:40 AM
critson added inline comments.
llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
365

Yes, I noticed that immediately after pushing this diff.

critson updated this revision to Diff 330621.Mar 15 2021, 5:43 AM
critson marked an inline comment as done.
  • Fix mistake in previous diff
piotr accepted this revision.Mar 15 2021, 5:45 AM

LGTM

This revision was landed with ongoing or failed builds.Mar 15 2021, 5:45 AM
This revision was automatically updated to reflect the committed changes.