If instructions were removed in peephole passes after the hazard recognizer was
run, new hazards could be introduced.
Fixes: SWDEV-253090
Differential D89077
[AMDGPU] Run hazard recognizer pass later
Authored by kerbowa on Oct 8 2020, 4:10 PM.
Event Timeline

Is this now running after the waitcnt insertion pass? That would avoid the NOPs currently being inserted to split memory clauses, which are not necessary as the waitcnt instructions will split the clauses.

In earlier conversation it was suggested that the spurious NOPs appeared because the hazard recognizer inserted them to break memory clauses, and then the waitcnt pass ran. There would be no need to insert the NOPs if the waitcnt instructions were already there. So it seems that was not a valid explanation. Perhaps the post-RA scheduler is an explanation, but I am unclear why it put those ones in. @rampitec can you help explain?

The post-RA scheduler and the hazard recognizer are the same pass if you run the post-RA scheduler. If not, it is a separate pass.

Just to expand on this, GCNHazardRecognizer is not a pass, it's a class that gets plugged into two different passes:
The post-RA scheduler can insert NOPs which turn out to be unnecessary because a waitcnt will be inserted there anyway, which would fix the hazard. I reckon the right way to fix this is to teach GCNHazardRecognizer to return Hazard (an "advisory" hazard) instead of NoopHazard (a "mandatory" hazard) when it is called from the post-RA scheduler. That way, the post-RA scheduler gets to avoid some hazards by reordering instructions (but not inserting nops), and the rest of the hazards get fixed in the late hazard recognizer pass by inserting nops. I have tried to implement this in the past but never quite finished it.
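To make the advisory-versus-mandatory idea above concrete, here is a minimal standalone sketch. The three-way `HazardType` result mirrors the values `NoHazard`, `Hazard` and `NoopHazard` used by LLVM's `ScheduleHazardRecognizer`; everything else (the `Caller` enum, the `classifyHazard` function and its wait-state parameter) is hypothetical and only illustrates the proposal. It is not the actual GCNHazardRecognizer code.

```cpp
// Toy model of the proposal: report the same hazard as advisory or mandatory
// depending on who is asking. Not LLVM code; names below are hypothetical.

// Mirrors the three result values of LLVM's ScheduleHazardRecognizer.
enum class HazardType { NoHazard, Hazard, NoopHazard };

// Hypothetical: which client is querying the recognizer.
enum class Caller { PostRAScheduler, LateHazardPass };

// Hypothetical classifier. waitStatesNeeded is the number of wait states
// still missing before the instruction can issue safely.
HazardType classifyHazard(int waitStatesNeeded, Caller caller) {
  if (waitStatesNeeded <= 0)
    return HazardType::NoHazard;
  // The scheduler only gets a hint, so it may reorder but never pads with
  // nops. The late pass gets the mandatory result and inserts the nops.
  return caller == Caller::PostRAScheduler ? HazardType::Hazard
                                           : HazardType::NoopHazard;
}
```

Under this assumption, any hazard the scheduler cannot avoid by reordering is left in place and resolved once, by the late pass, after the waitcnt instructions already exist.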
Will there be a separate review to parametrize the hazard recognizer so that, when run early, it only resolves the hazards necessary for the register allocator et al.? That would allow the other hazards to be resolved only in the final run. This would avoid splitting memory clauses early, before the waitcnt/invalidate instructions that may themselves split the memory clauses have been inserted.
I am nervous about running PostRAHazardRecognizer after SIInsertHardClauses because you get undefined behaviour if you insert random instructions of the wrong type into the middle of a hard clause. In particular it is illegal to include s_waitcnt inside a clause, and I know the hazard recognizer does insert s_waitcnt_depctr in some cases.
SIInsertHardClauses does bundle the clauses it forms, but the hazard recognizer can still insert instructions inside bundles in some cases. (I think it is wrong to do this, and I have some ideas about fixing it.)
So I'm not sure if this will cause a problem in practice but it makes me nervous. It seems like we have more than one pass that wants to run as late as possible and I'm not sure how to resolve that. I know @nhaehnle has asked about this in the past.
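The concern about instructions landing inside a hard clause can be illustrated with a small standalone check. This is only a sketch: the `Inst` struct and `firstIllegalClauseMember` function are hypothetical, and treating every opcode that starts with `s_waitcnt` (including `s_waitcnt_depctr`) as disallowed inside a clause is an assumption made for illustration, not a statement of the hardware rules.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction: an opcode name plus a flag saying whether
// the instruction sits inside a hard clause.
struct Inst {
  std::string opcode;
  bool inClause;
};

// Returns the index of the first instruction inside a clause whose opcode
// starts with "s_waitcnt", or -1 if there is none.
// Assumption: all s_waitcnt* opcodes are treated as illegal clause members.
int firstIllegalClauseMember(const std::vector<Inst> &insts) {
  for (std::size_t i = 0; i < insts.size(); ++i) {
    // rfind(prefix, 0) == 0 is the usual pre-C++20 "starts with" test.
    if (insts[i].inClause && insts[i].opcode.rfind("s_waitcnt", 0) == 0)
      return static_cast<int>(i);
  }
  return -1;
}
```

A check of this kind, run after the last pass that may insert instructions, would at least flag the case described above instead of leaving it as silent undefined behaviour.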