This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Run hazard recognizer pass later
ClosedPublic

Authored by kerbowa on Oct 8 2020, 4:10 PM.

Details

Summary

If instructions were removed in peephole passes after the hazard recognizer was
run it is possible that new hazards could be introduced.

Fixes: SWDEV-253090

Diff Detail

Event Timeline

kerbowa created this revision.Oct 8 2020, 4:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 8 2020, 4:10 PM
kerbowa requested review of this revision.Oct 8 2020, 4:10 PM
rampitec accepted this revision.Oct 8 2020, 4:13 PM

LGTM

This revision is now accepted and ready to land.Oct 8 2020, 4:13 PM
arsenm requested changes to this revision.Oct 8 2020, 5:15 PM

Needs test

This revision now requires changes to proceed.Oct 8 2020, 5:15 PM
t-tye added a comment.Oct 8 2020, 5:33 PM

Is this now running after the waitcnt insertion pass? That would avoid the NOPs currently being inserted to split memory clauses that are not necessary as the waitcnt instructions will split the clauses.

Is this now running after the waitcnt insertion pass? That would avoid the NOPs currently being inserted to split memory clauses that are not necessary as the waitcnt instructions will split the clauses.

It was running after the waitcnt pass before.

Is this now running after the waitcnt insertion pass? That would avoid the NOPs currently being inserted to split memory clauses that are not necessary as the waitcnt instructions will split the clauses.

We also insert nops in the post-RA scheduler.

t-tye added a comment.Oct 8 2020, 7:31 PM

Is this now running after the waitcnt insertion pass? That would avoid the NOPs currently being inserted to split memory clauses that are not necessary as the waitcnt instructions will split the clauses.

We also insert nops in the post-RA scheduler.

In earlier conversation it was suggested that the spurious NOPs were explained as happening because the hazard recognizer inserted them to break memory clauses, and then the waitcnt pass ran. There would be no need to insert the NOPs if the waitcnt instructions were already there. So seems that was not a valid explanation, perhaps the post-RA scheduler is an explanation, but I am unclear why it put those ones in. @rampitec can you help explain?

Is this now running after the waitcnt insertion pass? That would avoid the NOPs currently being inserted to split memory clauses that are not necessary as the waitcnt instructions will split the clauses.

We also insert nops in the post-RA scheduler.

In earlier conversation it was suggested that the spurious NOPs were explained as happening because the hazard recognizer inserted them to break memory clauses, and then the waitcnt pass ran. There would be no need to insert the NOPs if the waitcnt instructions were already there. So seems that was not a valid explanation, perhaps the post-RA scheduler is an explanation, but I am unclear why it put those ones in. @rampitec can you help explain?

Post-RA scheduler and hazard recognizer is the same pass if you run post-RA scheduler. If not it is a separate pass.

foad added a comment.Oct 9 2020, 2:01 AM

Is this now running after the waitcnt insertion pass? That would avoid the NOPs currently being inserted to split memory clauses that are not necessary as the waitcnt instructions will split the clauses.

We also insert nops in the post-RA scheduler.

In earlier conversation it was suggested that the spurious NOPs were explained as happening because the hazard recognizer inserted them to break memory clauses, and then the waitcnt pass ran. There would be no need to insert the NOPs if the waitcnt instructions were already there. So seems that was not a valid explanation, perhaps the post-RA scheduler is an explanation, but I am unclear why it put those ones in. @rampitec can you help explain?

Post-RA scheduler and hazard recognizer is the same pass if you run post-RA scheduler. If not it is a separate pass.

Just to expand on this, GCNHazardRecognizer is not a pass, it's a class that gets plugged into two different passes:

  1. The post-RA scheduler (which runs before waitcnt insertion)
  2. The standalone hazard recognizer pass (which should be run pretty late, after any other pass that could introduce hazards)

The post-RA scheduler can insert NOPs which turn out to be unnecessary because a waitcnt will be inserted there anyway, which would fix the hazard. I reckon the right way to fix this is to teach GCNHazardRecognizer to return Hazard (an "advisory" hazard) instead of NoopHazard (a "mandatory" hazard) when it is called from the post-RA scheduler. That way, the post-RA scheduler gets to avoid some hazards by reordering instructions (but not inserting nops), and the rest of the hazards get fixed in the late hazard recognizer pass by inserting nops. I have tried to implement this in the past but never quite finished it.

foad added inline comments.Oct 9 2020, 2:17 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1037

I am nervous about running PostRAHazardRecognizer after SIInsertHardClauses because you get undefined behaviour if you insert random instructions of the wrong type into the middle of a hard clause. In particular it is illegal to include s_waitcnt inside a clause, and I know the hazard recognizer does insert s_waitcnt_depctr in some cases.

SIInsertHardClauses does bundle the clauses it forms, but the hazard recognizer can still insert instructions inside bundles in some cases. (I think it is wrong to do this, and I have some ideas about fixing it.)

So I'm not sure if this will cause a problem in practice but it makes me nervous. It seems like we have more than one pass that wants to run as late as possible and I'm not sure how to resolve that. I know @nhaehnle has asked about this in the past.

foad added inline comments.Oct 9 2020, 3:07 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1037

I've run some Vulkan CTS testing and this patch doesn't seem to cause any problems in practice, so I won't block it.

kerbowa updated this revision to Diff 297498.Oct 11 2020, 10:28 PM

Add test.

arsenm accepted this revision.Oct 16 2020, 7:41 AM
This revision is now accepted and ready to land.Oct 16 2020, 7:41 AM
t-tye added a comment.Oct 16 2020, 9:50 AM

Will there be a separate review to parametrize the hazard recognizer so that when run early it only resolves the hazards necessary for the register allocator et al. That allows the other hazards to only be resolved in the final run. This would avoid splitting memory clauses early before the waitcnt/invalidate instructions have been inserted that may themselves split the memory clauses.

This revision was automatically updated to reflect the committed changes.