This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Limit runs of fixLdsBranchVmemWARHazard
ClosedPublic

Authored by piotr on Jun 14 2021, 5:05 AM.

Details

Summary

The code in fixLdsBranchVmemWARHazard looks for patterns of a vmem/lds
access followed by a branch, followed by an lds/vmem access.

Handling the hazard may require examining an arbitrary number of
instructions. In the worst case, where a function has a vmem access but no
lds accesses, all instructions are examined only to conclude that the hazard
cannot occur.

Add a pre-processing stage which detects whether both lds and vmem accesses
are present in the function, and only then run the more costly search.
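
The idea of the pre-processing stage can be sketched roughly as below. This
is a minimal standalone illustration, not the code from the patch: the
InstKind enum and the shouldRunFixup helper are hypothetical stand-ins for
the real MachineInstr queries and for
shouldRunLdsBranchVmemWARHazardFixup().

```cpp
#include <vector>

// Hypothetical, simplified instruction kinds (assumption for illustration;
// the real pass inspects MachineInstr objects instead).
enum class InstKind { Other, Branch, Lds, Vmem };

// Returns true only if the function contains at least one lds access and at
// least one vmem access. If either kind is missing, the
// vmem/lds -> branch -> lds/vmem pattern cannot occur, so the costly
// per-instruction search can be skipped entirely.
bool shouldRunFixup(const std::vector<InstKind> &Insts) {
  bool HasLds = false;
  bool HasVmem = false;
  for (InstKind K : Insts) {
    HasLds |= (K == InstKind::Lds);
    HasVmem |= (K == InstKind::Vmem);
    if (HasLds && HasVmem)
      return true; // Both kinds seen; no need to scan further.
  }
  return false;
}
```

The scan is a single linear pass with an early exit, so its cost is small
compared with a search that may walk backwards from every candidate
instruction.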

This patch significantly improves compilation time in cases where the hazard
cannot happen. In one pathological case I looked at, IsHazardInst is
needlessly called 88.6 million times.

The numbers could also be improved by introducing a map around the
inner calls to ::getWaitStatesSince in fixLdsBranchVmemWARHazard, but
nothing will beat not running fixLdsBranchVmemWARHazard at all in the cases
detected by shouldRunLdsBranchVmemWARHazardFixup().

Event Timeline

piotr created this revision. Jun 14 2021, 5:05 AM
piotr requested review of this revision. Jun 14 2021, 5:05 AM
Herald added a project: Restricted Project. Jun 14 2021, 5:05 AM
piotr edited the summary of this revision.
rampitec accepted this revision. Jun 14 2021, 12:55 PM

LGTM apart from tidy comments on variable names.

This revision is now accepted and ready to land. Jun 14 2021, 12:55 PM
This revision was landed with ongoing or failed builds. Jun 14 2021, 1:31 PM
This revision was automatically updated to reflect the committed changes.